@bobwyman can you help me out? I'm trying to remember the term of art for the opposite of real-time search. Is it just "static search"? I feel like there's a better term for it.

@evan I've found that "real-time search" isn't really as useful a term as people would expect... But, if you insist on using it, then the opposite of real-time is going to be either "periodic," "repeated," or "queued," depending on what you're trying to say.

There are two kinds of search: Retrospective and Prospective.
1/ (see following notes)

Bob Wyman

@evan A retrospective search compares a query to one or more documents that have been previously indexed. Such a query tells you what has been indexed in the past. Because the query looks to the past, it is "retrospective." You can process that query in "real-time," as Google and others do, or you can queue the query for processing one or more times in the future, but the query will always be looking to the past. Most "real-time" searches are implemented as repeated retrospective searches.
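
A minimal sketch of retrospective search, assuming whitespace tokenization and AND-only queries. The class and method names are hypothetical and are not taken from any system mentioned in this thread. It shows that the documents are what gets indexed, and that a query only ever sees what was indexed before it ran.

```python
from collections import defaultdict

class RetrospectiveIndex:
    """Index documents now; answer queries about them later."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it

    def add_document(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, *terms):
        """Return ids of already-indexed documents containing every term."""
        if not terms:
            return set()
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets)


index = RetrospectiveIndex()
index.add_document(1, "real-time search is hard")
index.add_document(2, "prospective search indexes queries")
first = index.search("search", "queries")    # sees only documents 1 and 2
index.add_document(3, "retrospective search indexes documents, not queries")
second = index.search("search", "queries")   # must be re-run to see document 3
```

Running the same query a second time is the only way it learns about document 3, which is the "repeated retrospective" pattern described above.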
2/

@evan A prospective search compares each member of a sequence of one or more documents to a previously indexed set of one or more queries. In such a system a query asks: "Tell me whenever!" Because each query looks to the future, the queries are "prospective." Many notification systems rely on prospective search. Similarly, much of the internal operation of SocialWeb systems can be modeled as prospective searches (e.g. copy a post to my feed "whenever" I am mentioned in it).
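
A minimal sketch of the inverse arrangement, assuming each stored query is a plain conjunction of terms. All names are hypothetical. Here the queries are indexed, and each arriving document is matched once against the whole query set.

```python
from collections import defaultdict

class ProspectiveIndex:
    """Index queries now; match each future document against all of them."""

    def __init__(self):
        self.queries = {}                 # query_id -> set of required terms
        self.by_term = defaultdict(set)   # term -> ids of queries that need it

    def subscribe(self, query_id, *terms):
        required = {t.lower() for t in terms}
        self.queries[query_id] = required
        for term in required:
            self.by_term[term].add(query_id)

    def match(self, text):
        """Return ids of stored queries satisfied by this one incoming document."""
        hits = defaultdict(int)
        for term in set(text.lower().split()):
            for query_id in self.by_term.get(term, ()):
                hits[query_id] += 1
        # A query matches when every one of its required terms was seen.
        return {q for q, n in hits.items() if n == len(self.queries[q])}


psi = ProspectiveIndex()
psi.subscribe("evan-mentions", "@evan")
psi.subscribe("ap-search", "activitypub", "search")
posts = ["hello @evan", "activitypub needs better search", "search is hard"]
notified = [psi.match(post) for post in posts]
```

Only queries that share at least one term with the document are touched, so the work per document does not grow with the number of unrelated queries.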
3/

@evan The key to understanding the difference is to consider what is indexed. A retrospective system indexes "documents" which are usually just collections of static data. On the other hand, a query can contain ranges, boolean expressions, etc. which can't be indexed using the "inverted trees" and other methods typically used when indexing documents. Thus, true prospective systems, which index queries, require very different indexing structures than do retrospective systems.
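
As one illustration of why indexing queries needs different structures, here is a hedged sketch of an index over numeric range queries. It assumes half-open ranges [low, high) and a static query set, and it uses a simple "elementary segments" table. This is one textbook-style approach chosen for brevity, not the method used by any system named in this thread, and all names are hypothetical.

```python
from bisect import bisect_right

class RangeQueryIndex:
    """Static index over half-open numeric ranges [low, high)."""

    def __init__(self, ranges):
        # ranges: dict of query_id -> (low, high)
        self.bounds = sorted({b for low, high in ranges.values() for b in (low, high)})
        # covering[i] holds the queries spanning the segment [bounds[i], bounds[i+1])
        self.covering = [
            {q for q, (low, high) in ranges.items() if low <= left and right <= high}
            for left, right in zip(self.bounds, self.bounds[1:])
        ]

    def match(self, value):
        """Return ids of every stored range containing value, using one binary search."""
        i = bisect_right(self.bounds, value) - 1
        if i < 0 or i >= len(self.covering):
            return set()
        return self.covering[i]


prices = RangeQueryIndex({"cheap": (0, 100), "mid": (50, 500), "premium": (100, 10000)})
at_75 = prices.match(75)
at_120 = prices.match(120)
below = prices.match(-5)
```

A term-to-document posting list has no natural slot for "between 50 and 500"; the range itself has to become the indexed object, which is the shift in structure the post describes.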
4/

@evan Because most people don't understand how to index queries, and because this problem is typically not discussed in schools or in books on search, most systems that should use prospective search are actually implemented by doing repeated *retrospective* searches on incrementally built document indexes. While this is inefficient, it is easily hacked together and, given the power of machines today, most folk don't have enough queries to really notice the limitations of repeated retrospective search.
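
A simplified sketch of the repeated-retrospective workaround, assuming term-only queries and a periodic poll. A real system would query an incrementally built document index rather than scan, but it would still run once per saved query. The names and the `evaluations` counter are hypothetical and exist only to make the cost visible.

```python
class PollingNotifier:
    """Emulates "tell me whenever" by re-running saved queries on new documents."""

    def __init__(self):
        self.saved_queries = {}   # query_id -> set of required terms
        self.pending = []         # documents added since the last poll
        self.evaluations = 0      # number of (query, document) checks performed

    def save_query(self, query_id, *terms):
        self.saved_queries[query_id] = {t.lower() for t in terms}

    def add_document(self, text):
        self.pending.append(text)

    def poll(self):
        """Run every saved query over every pending document."""
        results = []
        for text in self.pending:
            terms = set(text.lower().split())
            for query_id, required in self.saved_queries.items():
                self.evaluations += 1
                if required <= terms:
                    results.append((query_id, text))
        self.pending.clear()
        return results


notifier = PollingNotifier()
for i in range(1000):
    notifier.save_query(f"q{i}", f"term{i}")
notifier.add_document("a post mentioning term7")
matches = notifier.poll()
```

One new document costs 1000 query evaluations here to find a single match. With few queries that is unnoticeable, which is why the approach survives.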

@evan When the semantics of a prospective system are very simple, it is possible to use simple implementations. For instance, if the queries are limited to specifying a single string term (e.g. a user name) then one can build a simple, high-performing prospective system that extracts usernames from incoming posts and compares them to a hash-table of "queries" which are just usernames.

A proper system would allow Booleans, ranges, etc., but that requires a dramatic increase in complexity.
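
The single-term case can be sketched in a few lines. The mention pattern `@name` and the feed identifiers are assumptions made for illustration; real mention syntax (for example `@user@domain`) would need a more careful parser.

```python
import re
from collections import defaultdict

MENTION = re.compile(r"@(\w+)")    # assumed mention syntax, for illustration only

subscriptions = defaultdict(set)   # username -> feeds that want posts mentioning it

def subscribe(username, feed):
    subscriptions[username.lower()].add(feed)

def route(post):
    """Return the feeds that should receive this post."""
    feeds = set()
    for name in MENTION.findall(post):
        feeds |= subscriptions.get(name.lower(), set())
    return feeds


subscribe("evan", "evan-feed")
subscribe("bobwyman", "bob-feed")
to_evan = route("@evan do you mind if I paraphrase?")
to_both = route("@bobwyman @evan I think you have a typo")
to_none = route("no mentions here")
```

Each post costs one hash lookup per mention it contains, however many usernames are subscribed.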

@evan Someday, I'll write this stuff up in a form which is useful for general consumption. It's really not that hard if you recognize that prospective search requires approaches different from retrospective. I've always found it odd that proper prospective search technology isn't more widely used outside large platforms like Google, Yahoo, Facebook, Bloomberg, etc. Many applications could much better serve their users' needs if they did this properly.

@bobwyman Until then, do you mind if I paraphrase for my ActivityPub book, with acknowledgment? "Prospective" and "retrospective" are way better terms than what I was using.

@evan I have no objection to anyone using these terms. Frankly I'd be pleased if more people knew the difference between retrospective and prospective search because almost all thinking about search today is about only half of the whole search problem -- the retrospective half. If more people recognized the distinction, we might get more people working on addressing that, as yet largely unexplored, half of the search problem.

@bobwyman @evan I think you have a typo in this message? You wrote "prospective" twice. Great stuff, I learned some new things!

@tedmielczarek @evan Thanks for catching that. I've edited the note to point out that people often use repeated *retrospective* searches to create the impression of having done a prospective search.

@bobwyman @evan Umm at AWS I did a ton of work on what we called "event filtering" where we compiled a bunch of "rules" together and fired blobs of JSON at it and for each it would report which rules matched and take appropriate action. Sounds like what you're talking about? Ran at millions of events/second.

@timbray @evan Yes, an index of "compiled rules" that is queried with JSON documents is the kind of thing I'm talking about. All the large platforms have built such systems, but they aren't used much in most smaller applications -- this may be because the query/rules indexing methods haven't been provided as open-source components for developers.
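
A toy sketch of rules compiled into one index and queried with JSON events. The pattern format (a top-level field mapped to a list of acceptable values) and every name here are assumptions for illustration; this is not the syntax or API of any AWS service or open-source library.

```python
import json
from collections import defaultdict

class RuleSet:
    def __init__(self):
        self.needed = {}                # rule name -> number of fields it constrains
        self.index = defaultdict(set)   # (field, value) -> rule names accepting it

    def add_rule(self, name, pattern):
        """pattern maps a top-level field to a list of acceptable values."""
        self.needed[name] = len(pattern)
        for field, allowed in pattern.items():
            for value in allowed:
                self.index[(field, value)].add(name)

    def matches_for(self, event_json):
        """Return the names of all rules matched by one JSON event."""
        event = json.loads(event_json)
        satisfied = defaultdict(int)
        for field, value in event.items():
            if isinstance(value, (dict, list)):
                continue  # nested values are out of scope for this sketch
            for name in self.index.get((field, value), ()):
                satisfied[name] += 1
        return sorted(n for n, c in satisfied.items() if c == self.needed[n])


rules = RuleSet()
rules.add_rule("follow-requests", {"type": ["Follow"]})
rules.add_rule("public-notes", {"type": ["Note", "Article"], "visibility": ["public"]})
note = rules.matches_for('{"type": "Note", "visibility": "public", "actor": "evan"}')
follow = rules.matches_for('{"type": "Follow", "actor": "bobwyman"}')
private = rules.matches_for('{"type": "Note", "visibility": "private"}')
```

Every rule is folded into the same lookup table, so an event is checked against all rules in one pass over its own fields.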

(Note: Back about 20 years ago Werner Vogels wanted to buy PubSub.com to get our prospective search tech for AWS... Long story...)

@timbray @evan Note: These systems can be built to achieve very high event rates. At Google, we had a system that processed tens of millions of search queries in "real-time" against every document which was being inserted into the search index. (i.e. The equivalent of billions of retrospective queries per second.)

See: Google Abandoned Patent Application re: "Predictive searching and associated cache management" patents.google.com/patent/US20

patents.google.com: US20100318538A1 - Predictive searching and associated cache management - Google Patents
Abstract: A computer system including instructions stored on a computer-readable medium, may include a query manager configured to manage a query corpus including at least one predictive query, and a document manager configured to receive a plurality of documents from at least one document source, and configured to manage a document corpus including at least one document obtained from the at least one document source. The computer system also may include a predictive result manager configured to associate the at least one document with the at least one predictive query to obtain a predictive search result, and configured to update a predictive cache using the predictive search result, and may include a search engine configured to access the predictive cache to associate a received query with the predictive search result, and configured to provide the predictive search result as a search result of the received query, the search result including the at least one document.

@timbray @evan You mentioned this on your blog some years ago and I remember taking a look at it then. But, what I'm seeing on GitHub seems to be more developed than what I remember. I'll have some fun taking a fresh look at it. Thanks for the link.

@timbray @evan I've often wanted to specify a server system (e.g. ActivityPub) using prospective search tech (or event rules). The idea is that server processing rules can be broken into two types: (like speech acts...)
1. Constitutive Rules, that determine what constitutes types of event, and
2. Regulative Rules, that specify how to respond to different events.
A server would then be defined as a set of these two rule types. The result would be an executable specification. Makes sense?
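
One possible reading of this idea, sketched under heavy assumptions: constitutive rules are predicates that classify a raw message as an event type, and regulative rules map each event type to a response. The message fields, event names, and action names below are all hypothetical, and a real prospective-search implementation would index the predicates instead of scanning them.

```python
# Constitutive rules: event type -> predicate deciding whether a raw
# message counts as an instance of that event.
constitutive = {
    "mention": lambda msg: bool(msg.get("mentions")),
    "follow_request": lambda msg: msg.get("type") == "Follow",
}

# Regulative rules: event type -> how to respond, expressed as a list
# of (action, target) pairs for the server to carry out.
regulative = {
    "mention": lambda msg: [("copy_to_feed", user) for user in msg["mentions"]],
    "follow_request": lambda msg: [("notify", msg["object"])],
}

def handle(msg):
    """Classify a message with the constitutive rules, then apply the regulative ones."""
    actions = []
    for event_type, is_instance in constitutive.items():
        if is_instance(msg):
            actions.extend(regulative[event_type](msg))
    return actions


on_note = handle({"type": "Note", "mentions": ["evan", "timbray"]})
on_follow = handle({"type": "Follow", "object": "bobwyman"})
on_like = handle({"type": "Like"})
```

In this sketch the two dictionaries are the whole server definition, so changing the protocol means editing rules and leaving `handle` alone.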

@timbray @evan The interesting thing is that if one were to build a system that handled both Constitutive and Regulative Rules, then that one system would be able to implement a wide variety of different protocols and could be easily modified and extended.

@timbray @evan Can either event-rule or Quamina process JSON structures that contain embedded, repeating, multi-field structures? (e.g. "Find Bills-of-Material that contain a 'Phone' priced at more than $100.")

How do you flatten such an object?

@bobwyman @evan both of those packages assume JSON so "embedded and repeating" is just an array. How that's flattened is actually one of the tricky bits but the semantics are pretty well described in the README, well at least for Quamina. But yes, it works about as one would expect.
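
To make the tricky bit concrete, here is a generic sketch of flattening that remembers which array element each leaf came from. It is not Quamina's algorithm or semantics (see its README for those). The document shape and the field names `items`, `name`, and `price` are assumptions for illustration.

```python
def flatten(value, path=(), trail=()):
    """Yield (path, leaf, trail); trail records the array elements enclosing the leaf."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, path + (key,), trail)
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from flatten(child, path, trail + ((path, i),))
    else:
        yield path, value, trail

def naive_has_item(doc, name, min_price):
    """Ignores array positions, so fields from different elements can combine."""
    leaves = [(path, leaf) for path, leaf, _ in flatten(doc)]
    return (("items", "name"), name) in leaves and any(
        path == ("items", "price") and leaf > min_price for path, leaf in leaves
    )

def has_item(doc, name, min_price):
    """True only if one single element of doc["items"] satisfies both conditions."""
    named, pricey = set(), set()
    for path, leaf, trail in flatten(doc):
        if path == ("items", "name") and leaf == name:
            named.add(trail)
        elif path == ("items", "price") and leaf > min_price:
            pricey.add(trail)
    return bool(named & pricey)


bom = {"items": [{"name": "Phone", "price": 80}, {"name": "Laptop", "price": 900}]}
wrong = naive_has_item(bom, "Phone", 100)   # pairs the Phone name with the Laptop price
right = has_item(bom, "Phone", 100)
```

The naive version reports a match because it sees a "Phone" and a price above 100 somewhere in the array; keeping the element trail is what stops the two fields from being paired across elements.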

@timbray @evan Another use of my Prospective Search Infrastructure (PSI) at Google was by Google Ads. Every ad is a prospective query. Something like: "Whenever a page includes the word 'locksmith' and the user is in Florida, make this ad a candidate for scoring." So, all the ads' queries were indexed and then, for each page view, the combination of the page and the user context was used as the document which was matched against all queries. That required, of course, very high volume matching.
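
A toy sketch of the ad example, assuming each ad is a query of the form "page contains this word AND user is in this region". The function names, ad ids, and region codes are hypothetical, and nothing here reflects how Google's system was actually built.

```python
from collections import defaultdict

ads_by_word = defaultdict(list)   # page word -> [(ad_id, required_region)]

def register_ad(ad_id, word, region):
    """Index one ad as a stored query."""
    ads_by_word[word.lower()].append((ad_id, region))

def candidates(page_text, user_region):
    """Treat page plus user context as one document and match it against all ads."""
    found = set()
    for word in set(page_text.lower().split()):
        for ad_id, region in ads_by_word.get(word, ()):
            if region == user_region:
                found.add(ad_id)
    return found


register_ad("ad-1", "locksmith", "FL")
register_ad("ad-2", "locksmith", "NY")
register_ad("ad-3", "plumber", "FL")
in_florida = candidates("Find a locksmith near you", "FL")
in_california = candidates("Find a locksmith near you", "CA")
```

The page view is the document, the ads are the indexed queries, and the result is only a candidate set that a separate scoring step would rank.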