How Search Results are Indexed

Search - how does it work?

We use a custom Elasticsearch cluster to index and query Posts, Comments, Pages, Users and Groups. In addition, we use Google APIs to query Drive files, Calendar events and Gmail.

How are the results sorted?

In the Elasticsearch cluster, the search query is parsed into terms, which are then run against the relevant index, such as *Posts*.
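As a rough illustration of this step, the sketch below splits a query into lowercase terms and builds an Elasticsearch-style `match` request for one index. The index names, the `content` field and both helper functions are hypothetical; in the real cluster the term parsing is done by Elasticsearch's own analyzers, whose configuration is not described here.

```python
import re

# Hypothetical mapping from content type to index name (assumed, not the real names).
INDEX_BY_TYPE = {
    "post": "posts",
    "comment": "comments",
    "page": "pages",
    "user": "users",
    "group": "groups",
}


def parse_terms(query: str) -> list[str]:
    """Split a query into lowercase word terms (a stand-in for a real analyzer)."""
    return re.findall(r"\w+", query.lower())


def build_search_request(query: str, content_type: str) -> dict:
    """Build a request that targets the index for the given content type."""
    return {
        "index": INDEX_BY_TYPE[content_type],
        # "content" is an assumed field name; Elasticsearch analyzes the
        # query text into terms itself when it runs a match query.
        "body": {"query": {"match": {"content": {"query": query}}}},
    }
```

For example, `parse_terms("Quarterly Budget-Report")` yields the three terms `quarterly`, `budget` and `report`, and `build_search_request("budget", "post")` targets the assumed `posts` index.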

Documents in the queried index (in this example, *Posts*) are scored by ranking algorithms and sorted by score.

These algorithms take into account:

  • Term match in the document
  • Term frequency in the document
  • Term frequency overall
  • Term importance in the document (does the term appear once in a 300-word document, or once in a 10-word document?)
  • Partial matches
  • Document age (older documents are less relevant). This is weighted more heavily for posts and comments, less for pages, and not at all for users and groups.
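
To make these factors concrete, here is a toy scoring function. It is a simplified sketch, not the cluster's actual formula: the `score` function, the `AGE_WEIGHT` values, the 0.5 credit for partial (prefix) matches and the one-year decay scale are all assumptions chosen for illustration.

```python
import math

# Hypothetical age weights: strongest for posts and comments, weaker for
# pages, and zero (age ignored) for users and groups.
AGE_WEIGHT = {"post": 1.0, "comment": 1.0, "page": 0.3, "user": 0.0, "group": 0.0}


def score(query_terms, doc_terms, doc_age_days, doc_type, doc_freq, total_docs):
    """Toy relevance score combining the factors listed above.

    doc_freq maps a term to the number of documents that contain it.
    """
    if not doc_terms:
        return 0.0
    relevance = 0.0
    for term in query_terms:
        exact = doc_terms.count(term)
        # Partial match: a document term that merely starts with the query term.
        partial = sum(1 for t in doc_terms if t != term and t.startswith(term))
        # Dividing by document length makes one hit in a 10-word document
        # count for more than one hit in a 300-word document.
        tf = (exact + 0.5 * partial) / len(doc_terms)
        # Terms that are rare across all documents carry more weight.
        idf = math.log(1 + total_docs / (1 + doc_freq.get(term, 0)))
        relevance += tf * idf
    # Older documents are penalized, scaled by the per-type age weight.
    decay = 1.0 / (1.0 + AGE_WEIGHT[doc_type] * doc_age_days / 365.0)
    return relevance * decay
```

With these assumed weights, a year-old post scores half as much as an identical new post, while a user profile scores the same regardless of age.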

Note: When querying the Google APIs, results are returned by Google as-is; we do not re-score or re-sort them.