BERT – Search Engine Evolution

In late October 2019, Google launched a new algorithm to optimise search engine results. The algorithm is called BERT, which stands for Bidirectional Encoder Representations from Transformers.

Described as the largest step in the evolution of the search engine industry in the past five years, it is estimated to affect one in ten searches.

In this post, Letrário explains what sets this algorithm apart, shows how it changes search results and highlights its advantages for the translation industry.

The greatest innovative element

  • As indicated by the “B”, meaning “bidirectional”, BERT analyses the context of each keyword in both directions. In other words, traditional search algorithms either do not contextualise a term at all or contextualise it unidirectionally (taking into account only the words that precede it, or only the words that follow it), whereas BERT analyses the entire context of the keywords used in the search.

In 2018, Google illustrated this approach in an article about natural language processing, which also announced BERT’s first steps:

“[…] in the sentence ‘I accessed the bank account,’ a unidirectional contextual model would represent ‘bank’ based on ‘I accessed the’ but not ‘account.’ However, BERT represents ‘bank’ using both its previous and next context – ‘I accessed the […] account’ […].”

By contrast, in a context-free model the word “bank” would have the same representation in “bank account” and “bank of the river.” As a result, traditional search engines would present mixed results, some more accurate than others. Thanks to BERT, Google’s search engine “understands” the meaning of each expression in its context and presents more relevant results, a benefit of the deep neural network on which it is built.
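To make the difference concrete, here is a minimal Python sketch of the two kinds of context described above, using the sentence from Google’s example. It is only a toy illustration: the function names are our own, and it shows which words each approach can “see”, not how BERT actually computes its representations.

```python
# Toy illustration only: these helper names are hypothetical and this is
# not BERT itself. It shows which neighbouring words are visible to a
# unidirectional (left-to-right) model versus a bidirectional one.

def left_context(tokens, index):
    """Words available to a left-to-right model: only those before the term."""
    return tokens[:index]

def bidirectional_context(tokens, index):
    """Words available to a bidirectional model: those before and after the term."""
    return tokens[:index] + tokens[index + 1:]

sentence = "I accessed the bank account".split()
position = sentence.index("bank")

left = left_context(sentence, position)
both = bidirectional_context(sentence, position)

print("Unidirectional context:", left)
print("Bidirectional context:", both)
```

Only the bidirectional context contains “account”, the word that tells us this “bank” is a financial institution and not a riverbank.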

BERT was pre-trained on large text corpora, including English Wikipedia, which allowed it to capture subtleties of human language.

The most noticeable result of this innovation will be BERT’s ability to address more complex searches and questions than usual.

Search engines generally rely on the keywords entered by users, ignoring word order, articles and prepositions when presenting results. That is, they presume that the essential vocabulary will provide enough information. BERT, in turn, takes all these linguistic elements into account, which helps provide more efficient results, i.e. results that better address users’ searches. BERT has been rolled out for more than 70 languages besides English.

An example of search engine change

When looking for information on the requirements for travelling from the USA to Portugal, for example, an American citizen might type “documents needed when travelling from the USA to Portugal”.

Prior to BERT, Google’s first hits would probably point to information about the documents a Portuguese citizen would need if they wished to move to the USA. Google wasn’t able to effectively interpret the relationship expressed by word order and prepositions.

With BERT, Google is able to tell that the trip’s point of origin is, in fact, the USA, and not Portugal.
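The problem can be reproduced with a small Python sketch. It assumes a deliberately simplified keyword matcher (the stop-word list and function names are our own invention, not Google’s actual implementation) and compares the query above with its mirror image.

```python
# Simplified sketch under stated assumptions: STOP_WORDS and both helper
# functions are hypothetical and do not reflect how Google Search works.

STOP_WORDS = {"from", "to", "the", "when", "a", "an", "of"}

def keyword_set(query):
    """Keyword-style view: drop small words and ignore word order."""
    return frozenset(w for w in query.lower().split() if w not in STOP_WORDS)

def ordered_tokens(query):
    """Order-aware view: keep every word in its original position."""
    return tuple(query.lower().split())

to_portugal = "documents needed when travelling from the USA to Portugal"
to_usa = "documents needed when travelling from Portugal to the USA"

# With keywords alone, the two opposite trips look identical.
same_keywords = keyword_set(to_portugal) == keyword_set(to_usa)

# Keeping prepositions and word order tells them apart.
same_sequence = ordered_tokens(to_portugal) == ordered_tokens(to_usa)

print("Same keywords:", same_keywords)
print("Same word sequence:", same_sequence)
```

Once “from” and “to” are discarded, nothing distinguishes the two queries. A model that reads the whole sequence, as BERT does, still has the information it needs to identify the point of origin.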

Of course, we won’t notice any difference in simple searches, in which single terms are used. This is why it is estimated the effect will only be seen in one out of ten searches. Nevertheless, the significance of this breakthrough should not be downplayed.

The power of search engines for translation

In addition to implementing BERT in its search engine, Google released the software as open source, allowing any company to use it and even adapt the technology to its own systems. This is yet another step fostering progress in human language analysis technology.

Modern devices have ever more resources to better understand human speech, which allows for a wider range of options and correct lexical and terminological solutions, increasing the quality of the translation and proofreading services we provide.
