How Google BERT Vs. SMITH Algorithms Work Together - Semalt Overview

Google recently released a research paper on its new NLP algorithm, SMITH. The paper enlightened many SEO professionals about the changes that could cause increases or drops in SERP rankings. Our concern here, however, is how this new SMITH algorithm compares to BERT.

In the paper, Google claims that SMITH outperforms BERT at understanding long search queries and long documents. What makes SMITH so interesting is that it can understand passages within a document, similar to what BERT does with words and sentences. This enables SMITH to understand longer documents with ease.

But before we go any further, we must inform you that, as of right now, SMITH isn't live in Google's algorithms. If our speculations are right, it will launch alongside passage indexing or precede it. If you are truly interested in learning how to rank on SERPs, an interest in machine learning inevitably goes hand in hand with that.

So, back to the topic: is BERT about to be replaced? Won't most documents on the web, which are vast, robust, and therefore longer, perform better with SMITH?

Let's dig further and see what we've concluded. SMITH can read both robust and thin documents. Think of it like a bazooka: it can cause great damage, but it can also open doors.

To Begin With, Why BERT Or SMITH?

The real question here is why a search engine requires Natural Language Processing (NLP) to provide search results. The answer is simple: search engines need NLP to move from understanding strings (keywords) to understanding things (webpages).

Without NLP, Google would have no idea what else is on a page other than the keywords, or whether the content being indexed even makes sense in relation to the search query. Thanks to NLP, Google can understand the context of the characters typed into its search box.
Thanks to NLP, Google can distinguish a user's intentions when they say "river bank" and "bank account." It can also recognize statements such as "Caroline met up with her friends for a drink, drinks, pint, ale, brew…" as unnatural.

As experts in SEO, we must say that understanding search queries has come a long way. Believe us, finding the right articles on the internet used to be excessively difficult.

Understanding BERT

BERT is currently the best NLP model we have for many, if not most, applications, especially when it comes to understanding complex language structures. Many consider its bidirectional character (the "B" in BERT) to be the biggest leap forward in this algorithm. Rather than only reading from left to right, BERT understands each word in relation to the context on both sides of it. This way, it doesn't return results for the individual words in a query but matches webpages based on the collective meaning of the words in the search query.

Here is an example to facilitate your understanding. Take a statement in which the word "truck" comes before the word "light."

If you were to interpret that statement from left to right, upon reaching the word "light," you would classify the truck as something with light. That is because the truck came before the light in the statement.

But if we want to classify things on trucks, we may leave out "light" because we do not come across it before "truck."

It is hard to consider the statement in one direction alone. 
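
As a rough illustration of the difference, the sketch below compares what a strict left-to-right reader has seen at a given word with what a bidirectional reader can use. The sentence and the helper names `left_context` and `bidirectional_context` are assumptions made up for this example; this is not how BERT is implemented internally.

```python
def left_context(words, i):
    """Words a strict left-to-right reader has seen before position i."""
    return words[:i]

def bidirectional_context(words, i):
    """Words on both sides of position i, as a bidirectional reader sees them."""
    return words[:i] + words[i + 1:]

# Hypothetical sentence, chosen only for illustration.
words = "the truck has a light".split()
truck = words.index("truck")

print(left_context(words, truck))           # ['the']
print(bidirectional_context(words, truck))  # ['the', 'has', 'a', 'light']
```

Reading left to right, "light" is not yet available when the reader reaches "truck"; with context from both sides, it is.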

Additionally, BERT has another, less obvious benefit: it processes language effectively at a lower resource cost than previous models. That is an important factor to consider when one wants to apply it to the entire web.

The use of tokens is yet another evolution that has accompanied BERT. BERT's vocabulary has about 30,000 tokens. Each one represents a common word, with a few extra tokens for characters and word fragments in case a word falls outside the 30,000.
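
To make the idea of word fragments concrete, here is a minimal sketch of greedy longest-match-first splitting, the general approach used by WordPiece-style tokenizers. The tiny vocabulary `TOY_VOCAB` and the function name `wordpiece_tokenize` are assumptions for this example; BERT's real vocabulary is far larger, and its tokenizer does additional text cleanup first.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # marks a fragment that continues a word
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces

# Toy vocabulary (an assumption for this example, not BERT's real one).
TOY_VOCAB = {"bank", "river", "##bank", "##s", "flow", "##ing"}

print(wordpiece_tokenize("bank", TOY_VOCAB))        # ['bank']
print(wordpiece_tokenize("riverbanks", TOY_VOCAB))  # ['river', '##bank', '##s']
print(wordpiece_tokenize("flowing", TOY_VOCAB))     # ['flow', '##ing']
print(wordpiece_tokenize("xyz", TOY_VOCAB))         # ['[UNK]']
```

A common word maps to a single token, while a rarer word is rebuilt from fragments, which is how a fixed vocabulary can still cover words outside it.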

Through its use of tokens and transformers, BERT understands the content, which in turn lets it understand sentences adequately.

So suppose we say, "The young lady went to the bank. She later sat on the river bank and watched the river flow."

BERT will assign different values to the word "bank" in those two sentences because they refer to two different things.

Understanding SMITH

Then comes SMITH, an algorithm with better resources and numbers for processing larger documents. BERT uses about 256 tokens per document, and beyond this threshold the computing cost gets too high for optimal function. In contrast, SMITH can handle up to 2,048 tokens per document. That's about 8X the number of tokens BERT uses.

To understand why computing costs go up within a single NLP model, we must first consider what it takes to understand a sentence versus a paragraph. When dealing with a sentence, there is only one general concept to understand. There are fewer words relating to one another, hence fewer connections between words and ideas to hold in memory.

By making sentences into paragraphs, the connections between these words are multiplied greatly. Processing 8X the text requires many times more speed and memory when using the same model. This is where SMITH makes all the difference, basically by batching and doing a lot of offline processing. Interestingly, SMITH still depends on BERT to function properly.
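
A quick back-of-the-envelope calculation shows why the cost grows so fast. The sketch below assumes a standard full self-attention layer, in which every token is compared against every token, so the number of comparisons grows with the square of the input length. The function name `pairwise_connections` is made up for this example.

```python
def pairwise_connections(num_tokens):
    """Number of ordered token pairs a full self-attention layer compares."""
    return num_tokens * num_tokens

short_doc = 256           # token count quoted above for BERT
long_doc = 8 * short_doc  # 8X the text

ratio = pairwise_connections(long_doc) / pairwise_connections(short_doc)
print(long_doc)  # 2048
print(ratio)     # 64.0
```

Under that assumption, 8X the text means roughly 64X the word-to-word connections, which is why simply feeding a longer document to the same model does not scale.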

Here is a description of how SMITH processes a document at its core:
  1. It first breaks the document into blocks of a size that is easier to manage.
  2. It then processes each block of sentences individually.
  3. A transformer then learns a contextual representation of each block, after which the block representations are combined into a document representation.
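
The three steps above can be sketched in a few lines. This is only a toy under stated assumptions: word counts stand in for the block-level encoder, and a simple sum stands in for the document-level transformer, so the function names and the encoding are illustrative and not SMITH's actual method.

```python
import re
from collections import Counter

def split_into_blocks(document, sentences_per_block=2):
    """Step 1: break the document into small blocks of sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_block])
        for i in range(0, len(sentences), sentences_per_block)
    ]

def encode_block(block):
    """Step 2: toy stand-in for a sentence-block encoder (word counts)."""
    return Counter(re.findall(r"[a-z]+", block.lower()))

def encode_document(block_vectors):
    """Step 3: toy stand-in for the document-level model (sum of blocks)."""
    doc = Counter()
    for vec in block_vectors:
        doc.update(vec)
    return doc

document = (
    "The young lady went to the bank. She opened an account. "
    "Later she sat on the river bank. She watched the river flow."
)
blocks = split_into_blocks(document)
doc_vector = encode_document([encode_block(b) for b in blocks])

print(len(blocks))         # 2
print(doc_vector["bank"])  # 2
```

The point of the structure is that each block is handled on its own, so no single step has to hold the whole document at once.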

How Does SMITH Work?  

The SMITH model is trained in two ways, both of which build on how BERT is trained:

To train BERT, a word is taken out of a sentence, and alternative options are supplied.

The better trained BERT is, the more successful it will be at choosing the right option from the alternatives provided. For example, if BERT is given the sentence:

The happy brown ------ jumped over the picket fence. 
  • Option one - tomatoes.
  • Option two - dog.
The better trained BERT is, the better its chances of choosing the right option, which is option two.
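
The fill-in-the-blank exercise above can be mimicked with a toy scorer. In this sketch, the hand-written table `TOY_AFFINITY` stands in for what a trained model would have learned; it and the helper names are assumptions for illustration, since a real model scores candidates with learned weights rather than a lookup table.

```python
def mask_word(sentence, target, mask="[MASK]"):
    """Replace the first occurrence of `target` with a mask token."""
    words = sentence.split()
    i = words.index(target)
    return " ".join(words[:i] + [mask] + words[i + 1:])

def choose_option(masked_sentence, options, affinity):
    """Pick the option whose known context words overlap most with the sentence."""
    context = set(masked_sentence.lower().replace(".", "").split())
    return max(options, key=lambda opt: len(context & affinity.get(opt, set())))

# Hypothetical "learned" associations standing in for a trained model.
TOY_AFFINITY = {
    "dog": {"happy", "brown", "jumped", "fence"},
    "tomatoes": {"red", "ripe", "salad"},
}

masked = mask_word("The happy brown dog jumped over the picket fence.", "dog")
print(masked)  # The happy brown [MASK] jumped over the picket fence.
print(choose_option(masked, ["tomatoes", "dog"], TOY_AFFINITY))  # dog
```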

This training method is applied in SMITH as well.

SMITH Is Trained For Large Documents

The better trained SMITH is, the better its chances of recognizing omitted sentences. It's the same idea as with BERT but a different application. This part is particularly interesting because it paints a world of Google-generated content pieced together into walled-in search engine result pages. Of course, users can leave, but they won't, because Google can piece together short and long-form content from all of the best sources on its result page.

If you doubt this will happen, you should know that it has already started, and even though Google hasn't mastered it yet, it is a start.

Is SMITH Better Than BERT?

With all you've read, it's completely natural to assume that SMITH is better, and in many tasks, it truly is. But consider for a moment how you use the internet: what questions do you regularly type into a search box?
  • "What is the weather forecast for today?" 
  • "Directions to a restaurant".
Answering such search queries usually requires short content, often with limited and uncomplicated data. SMITH is more involved in understanding longer and more complex documents and long and complex search queries. 

This will include piecing together several documents and topics to create its answers. It determines how content can be broken down, enabling Google to know the right thing to display. It will help Google understand how pages of content are related to each other, and it provides a scale on which links may be valued, among other benefits.

With that being said, we conclude by saying both BERT and SMITH are important, and they both serve their unique purpose. 


SMITH is the bazooka: we need it to paint a clear picture of how things fit together as a whole. It costs more in resources because it does a bigger job, but it costs far less than BERT would when doing that same job.

BERT helps SMITH understand short queries and tiny content chunks. That will hold, however, only until Google develops another NLP algorithm that replaces both, and then we will move on and catch up with yet another advancement in SEO.

Interested in SEO? Check out our other articles on the Semalt blog.