Monday, December 21, 2015

How Google Might Make Better Synonym Substitutions Using Knowledge Base Categories

Leigh Miller - Yankee Stadium, francis_leigh, Some rights reserved

A couple of months ago, I wrote about a Google patent on rewriting queries, in a post titled Investigating Google RankBrain and Query Term Substitutions. There’s likely a lot more to how Google’s RankBrain approach works, but I came across a patent that seems related to the one I covered in that post, and thought it was worth sharing and starting a discussion about. The earlier patent was Using concepts as contexts for query term substitutions. The new one has a very similar title (Synonym identification based on categorical contexts) and was granted on December 1st of this year.

The new patent starts off describing a scenario that is a good example of how it works. The inventors tell us:

For example, learning that “restaurants” is a good synonym for “food” in the query [food in San Francisco] is relatively straightforward, because the volume of query traffic including the query term “San Francisco” is very large. For much smaller cities, such as Grey Bull, Wyo., the query stream may have never seen any supporting evidence for this synonym substitution.

Because both cities are entities that fit into the same category, that of “Cities,” a synonym substitution that works in queries about one of them could potentially work in queries about the other. That’s what the inventors of this patent tell us specifically, using the San Francisco and Grey Bull example:

For example, if “San Francisco” and “Grey Bull” are both cities, and “restaurants” is a good synonym for “food” in queries about San Francisco, the synonym relationship may apply to queries related to “Grey Bull” as well. Thus, the category “city” may be considered a useful category when identifying synonyms for query expansion in circumstances such as this.

So, we are told that the process in this patent identifies categories from a knowledge base that group a number of entities, where a synonym substitution that works for one entity in the category could potentially work for the other entities in similar contexts. The process involves identifying those entities in a query stream, and identifying the category as what the patent calls a “coherent” category.
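To make the San Francisco and Grey Bull example concrete, here is a minimal Python sketch of a synonym rule that is keyed on an entity's category rather than on the entity itself. It is only an illustration under my own assumptions, not the method claimed in the patent: the `ENTITY_CATEGORY` and `CATEGORY_SYNONYMS` tables, the function names, and the simple substring entity matching are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical knowledge base: entity -> category (illustrative data only).
ENTITY_CATEGORY: Dict[str, str] = {
    "san francisco": "city",
    "grey bull": "city",
}

# Hypothetical synonym rules, assumed to have been learned from high-traffic
# queries and stored against (query term, category of the entity in the query).
CATEGORY_SYNONYMS: Dict[Tuple[str, str], List[str]] = {
    ("food", "city"): ["restaurants"],
}


def find_entity(query: str) -> Optional[str]:
    """Return the first known entity mentioned in the lowercased query."""
    for entity in ENTITY_CATEGORY:
        if entity in query:
            return entity
    return None


def expand_query(query: str) -> List[str]:
    """Return the query plus variants with category-scoped synonyms swapped in."""
    q = query.lower()
    entity = find_entity(q)
    if entity is None:
        return [q]
    category = ENTITY_CATEGORY[entity]
    tokens = q.split()
    variants = [q]
    for i, token in enumerate(tokens):
        # The rule is looked up by category, so it applies to any city,
        # including one the query stream has rarely or never seen.
        for synonym in CATEGORY_SYNONYMS.get((token, category), []):
            variants.append(" ".join(tokens[:i] + [synonym] + tokens[i + 1:]))
    return variants
```

With these toy tables, `expand_query("food in Grey Bull")` also produces a "restaurants in grey bull" variant, even though no rule was ever stored for Grey Bull itself, while a query with no known entity comes back unchanged.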

The patent tells us that a coherent category is one in which a certain threshold of terms tend to co-occur in a query stream involving those entities. For instance, a category that includes entities that are cities, villages, and towns might see a lot of co-occurring terms involving hotels and roads. If the number of co-occurring terms appearing in that query stream meets a certain threshold, the category would be considered coherent, and synonym substitutions that work for one entity in that category could possibly then be applied to the others.
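One way to picture the coherence test described above is the rough Python sketch below. The patent summary here does not give a formula, so the two thresholds (`min_entity_share` and `min_shared_terms`), the function names, and the sample queries are all my own assumptions, and a real system would presumably also filter out stop words such as "in".

```python
from collections import defaultdict
from typing import Dict, Sequence, Set


def cooccurring_terms(queries: Sequence[str],
                      entities: Sequence[str]) -> Dict[str, Set[str]]:
    """Map each entity to the set of other terms seen alongside it in queries."""
    terms: Dict[str, Set[str]] = defaultdict(set)
    for query in queries:
        q = query.lower()
        for entity in entities:
            if entity in q:
                # Drop the entity text so only the surrounding terms remain.
                terms[entity].update(q.replace(entity, " ").split())
    return terms


def is_coherent(queries: Sequence[str], entities: Sequence[str],
                min_entity_share: float = 0.6,
                min_shared_terms: int = 2) -> bool:
    """Assumed test: enough terms co-occur with a large share of the entities."""
    by_entity = cooccurring_terms(queries, entities)
    entity_count: Dict[str, int] = defaultdict(int)
    for terms in by_entity.values():
        for term in terms:
            entity_count[term] += 1
    shared = [term for term, count in entity_count.items()
              if count / len(entities) >= min_entity_share]
    return len(shared) >= min_shared_terms


# Made-up query streams for illustration.
city_queries = [
    "hotels in san francisco", "san francisco roads",
    "grey bull hotels", "roads near grey bull", "springfield hotels",
]
cities = ["san francisco", "grey bull", "springfield"]

name_queries = ["washington state parks", "lincoln car prices"]
names = ["washington", "lincoln"]
```

In this toy data, "hotels" and "roads" each co-occur with at least 60% of the cities, so `is_coherent(city_queries, cities)` is true. The two entities in the second list share no surrounding terms, so that category would not count as coherent under these assumed thresholds.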

The patent in question is:

Synonym identification based on categorical contexts
Invented by: Zachary A. Garrett, Takahiro Nakajima, Tasuku Oonishi
Assignee: Google
US Patent 9,201,945
Granted December 1, 2015
Filed: March 8, 2013

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training recognition canonical representations corresponding to named-entity phrases in a second natural language based on translating a set of allowable expressions with canonical representations from a first natural language, which may be generated by expanding a context-free grammar for the allowable expressions for the first natural language.

Takeaways

When I wrote about the query term substitution patent I referred to at the start of this post, I included a number of examples of queries that were rewritten by substituting query terms. Those substitutions might seem reasonable to a search engine looking at words that tend to show up together, or co-occur, in a query stream involving those search terms.

For instance, someone searching for [New York Yankees stadium] was likely searching for results that involved “baseball” since queries that included “New York Yankees” and “stadium” also often included the term “baseball.”

That patent didn’t use the term “co-occur,” nor did it explain how knowledge base categories might be used to carry substitutions across entities in the same category, as this one does. But the idea that a shared context like entity categories can be used to trigger substitutions in a query is interesting.

It’s worth spending time with both patents, reading through each of them multiple times, and thinking about how they might be used.



The post How Google Might Make Better Synonym Substitutions Using Knowledge Base Categories appeared first on SEO by the Sea.
