User data is important in Google’s ranking systems. What we learned from Liz Reid’s appeal statement

I found some interesting things in the latest document on the DOJ-Google trial. Google has appealed the ruling, which says the company must share confidential information with competitors.

Image source: Marie Haynes

Key Takeaways:

  • Google was ordered to share information with competitors as a remedy after being found to be an illegal monopolist. Google doesn’t want to reveal its extensive user data.
  • Google’s signals for page quality and freshness are proprietary. Google doesn’t want to give them away.
  • Indexed pages are annotated, including with signals that identify spam.
  • If spammers got hold of these spam signals, it would be harder for Google to stop spam.
  • User data is important to Google’s Glue system. This is where Google stores information about each search query, what the user saw, and how they interacted with the search results.
  • User data is important for training RankEmbed BERT, one of the deep learning systems behind search.

Okay, let’s get to the interesting stuff!

Google has proprietary signals for page quality and freshness

That’s really no surprise. I found it interesting that freshness signals are at the core of Google’s proprietary secrets.

Image source: Marie Haynes

Here you can learn more about the importance of Google’s proprietary freshness signals:

Image source: Marie Haynes

Crawled pages are marked with “proprietary page comprehension annotations.”

Each page in the Google index is annotated to make the page easier to understand. This includes signals to detect spam and duplicate pages. I’ve already written about how every page in the index has a spam score.

Image source: Marie Haynes

Spam scores could be used to reverse engineer ranking systems

Google does not want to share information about these scores with its competitors.

Image source: Marie Haynes

If the spam scores are made public, it could lead to more spam and make it harder for Google to combat spam.

Image source: Marie Haynes

Google builds the index from these annotated pages

Pages that Google has marked with page comprehension annotations are organized by how often Google expects the content to be accessed and how fresh the content needs to be.

Image source: Marie Haynes
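
To picture how annotations could feed into index organization, here is a minimal sketch. Everything in it is an assumption made for illustration: the `AnnotatedPage` fields, the tier names, and the thresholds are hypothetical, and the idea that spam or duplicate signals exclude a page from a tier is my own simplification. Google's actual annotations and index structure are not public.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedPage:
    """A crawled page plus hypothetical annotations (all names are assumptions)."""
    url: str
    spam_score: float              # 0.0 (clean) to 1.0 (spam), illustrative scale
    is_duplicate: bool
    expected_daily_accesses: int   # how often the page is expected to be accessed
    max_staleness_hours: int       # how fresh the content needs to be


def choose_tier(page: AnnotatedPage) -> str:
    """Pick an index tier; the thresholds are made up for illustration."""
    if page.is_duplicate or page.spam_score > 0.8:
        return "excluded"
    if page.expected_daily_accesses >= 1000 or page.max_staleness_hours <= 1:
        return "hot"
    return "cold"


pages = [
    AnnotatedPage("news.example/live", 0.1, False, 50, 1),
    AnnotatedPage("blog.example/old-post", 0.2, False, 3, 720),
    AnnotatedPage("spam.example/pills", 0.95, False, 5000, 24),
]
tiers = {p.url: choose_tier(p) for p in pages}
print(tiers)
```

The point of the sketch is only that access frequency and freshness needs are separate inputs: a rarely visited page can still land in the "hot" tier if its content has to be very current.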

Only a fraction of the pages make it into the Google index

Google argues that providing competitors with a list of indexed URLs would allow them to “forgo crawling and analyzing the larger web and instead focus on crawling only the fraction of the pages that Google has included in its index.” Building this index costs Google a lot of time and money. They don’t want to give that away for free.

Image source: Marie Haynes

The role of user data in Google’s ranking systems

This is the most interesting part. I feel like we don’t pay enough attention to how Google uses user data. (Stay tuned to my YouTube channel, as I will soon be releasing a very interesting video with my thoughts on the importance of user-side data, which is probably the most important factor in Google’s ranking systems.)

User data is used to build Glue and RankEmbed models

Google Glue is a huge table of user activity. It collects the text of searched queries, the user’s language, location, and device type, as well as information about what was displayed on the SERP, what the user clicked or hovered over, how long they stayed on a SERP, and more.
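
To make the shape of such a table concrete, here is a minimal sketch of what one row might look like. The class name `GlueRecord`, its field names, and the `click_through_rate` helper are my own assumptions for illustration, based only on the kinds of data listed above. Google's actual Glue schema is not public.

```python
from dataclasses import dataclass, field


@dataclass
class GlueRecord:
    """One hypothetical row of a Glue-like user activity table.

    All field names are illustrative assumptions, not Google's schema.
    """
    query_text: str
    language: str
    location: str
    device_type: str
    shown_results: list[str]                                   # what was displayed on the SERP
    clicked_results: list[str] = field(default_factory=list)   # what the user clicked
    hovered_results: list[str] = field(default_factory=list)   # what the user hovered over
    serp_dwell_seconds: float = 0.0                            # how long they stayed on the SERP

    def click_through_rate(self) -> float:
        """Share of shown results that received at least one click."""
        if not self.shown_results:
            return 0.0
        return len(set(self.clicked_results)) / len(self.shown_results)


record = GlueRecord(
    query_text="best running shoes",
    language="en",
    location="CA",
    device_type="mobile",
    shown_results=["a.example", "b.example", "c.example", "d.example"],
    clicked_results=["b.example"],
    serp_dwell_seconds=12.5,
)
print(record.click_through_rate())  # 1 clicked out of 4 shown -> 0.25
```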

Even more interesting is RankEmbed BERT. RankEmbed BERT is one of the deep learning systems underlying search. From Pandu Nayak’s testimony, we learned that RankEmbed BERT is used to rerank the results returned by traditional ranking systems. RankEmbed BERT is trained on click and query data from actual users.
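
The general idea of reranking can be sketched in a few lines. This is not how RankEmbed BERT works internally, which is not public. It is a generic two-stage pattern, assuming a first-stage list that is already sorted by a traditional ranker and toy, hand-made embedding vectors. The function names `cosine` and `rerank`, the `top_k` parameter, and the example URLs are all hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(query_vec: list[float], candidates: list[dict], top_k: int = 3) -> list[str]:
    """Reorder the top_k first-stage candidates by embedding similarity.

    `candidates` is assumed to be already sorted by a traditional ranker.
    Results below top_k keep their original order.
    """
    head, tail = candidates[:top_k], candidates[top_k:]
    head = sorted(head, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["url"] for c in head + tail]


# Toy vectors; a real system would learn embeddings from click and query data.
query_vec = [1.0, 0.0]
candidates = [
    {"url": "a.example", "vec": [0.0, 1.0]},  # ranked first by the traditional system
    {"url": "b.example", "vec": [1.0, 0.0]},
    {"url": "c.example", "vec": [1.0, 1.0]},
    {"url": "d.example", "vec": [1.0, 0.0]},  # outside top_k, so left in place
]
reranked = rerank(query_vec, candidates)
print(reranked)
```

In this toy run, the page the traditional ranker put first drops to third because its vector is least similar to the query, while the fourth result stays where it was because only the top three are reranked.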

The AI systems behind search are constantly learning so they can give searchers more satisfying results. Google looks at what searchers click on and whether or not they return to the SERPs. Google also runs live experiments that look at what searchers click on and stay on. These actions help train RankEmbed BERT. Further fine-tuning comes from the scores given by the quality raters. I will post more about this soon. For now, I want to stress that user satisfaction is by far the most important thing we should optimize for!

From Liz Reid’s document that we are analyzing today, we can see that user data is used to train, build and operate RankEmbed models.

Image source: Marie Haynes

Once again, we learn that the user data used to train these models includes query, location, time of search, and how the user interacted with what was shown to them.

Image source: Marie Haynes

All of this concerns the actions that users take within Google search results. What I really want to know is what role Chrome data plays. Does Google check whether users interact with your pages, fill out your forms, make your recipes, and more? I think they do. The judgment summary from this trial notes that Chrome data is used in the ranking systems, but it doesn’t share many details.

Image source: Marie Haynes

Google says if someone has the Glue and RankEmbed user data, they could use it to train an LLM

This user data is the key to Google’s success.

Image source: Marie Haynes

It’s worth reading the whole statement from Liz Reid.


This post was originally published on Marie Haynes Consulting.


Featured image: N Universe/Shutterstock

