Yandex leaked code containing 1,922 search ranking factors: Ranktracker explains all ranking factors

  • Felix Rose-Collins
  • 7 min read
Intro

You have probably heard of Yandex: it is the fourth-biggest search engine worldwide by market share. Yesterday, proprietary Yandex source code was leaked.

The most interesting part for the SEO community is the list of all 1,922 ranking factors used in the search algorithm.

We have downloaded the code, analysed it, and presented the findings here in a digestible form.

Yandex leak

The incident should not come as a surprise, since Yandex and its products are frequent targets of cyberattacks. In 2016, Hackread.com exclusively reported that a dark web vendor was selling data from 6.3 million Yandex user accounts.

In September 2021, the Russian search engine giant was hit by one of the largest DDoS attacks powered by 200,000 compromised IoT devices.

Meet Ranktracker

The All-in-One Platform for Effective SEO

Behind every successful business is a strong SEO campaign. But with countless optimization tools and techniques out there to choose from, it can be hard to know where to start. Well, fear no more, cause I've got just the thing to help. Presenting the Ranktracker all-in-one platform for effective SEO

We have finally opened registration to Ranktracker absolutely free!

Create a free account

Or Sign in using your credentials

Yandex git sources

Why is this big?

Yandex is one of the largest IT companies in Russia. Within the country it provides a wider range of services than Google. Imagine one company that replaces Google, Uber, Amazon, Netflix and Spotify.

Is this leak real?

I have never worked at Yandex myself, but I know several people who worked there at different times or still work there. I verified that at least some of the archives definitely contain current source code for company services, as well as documentation pointing to real intranet URLs.

What’s inside

The leaker has shared a magnet link containing 44.7GB of files linked to Yandex git sources. The files were allegedly stolen from Yandex in July 2022. Apart from containing anti-spam guidelines, the code repositories are believed to have Yandex’s source code.

The leak revealed 1,922 ranking factors that the search engine uses in its search algorithm. The code was leaked as a torrent. Per the analysis posted by Twitter user Alex Buraks, the leaked data covers numerous ranking factors, including text relevancy, PageRank, content age, freshness, etc.

The list also covers end-user behaviour factors, link-related factors, and host reliability. SEOs have found some unusual ranking factors, such as the number of unique visitors, average domain ranking across queries, and percent of organic traffic.

It looks like source code for at least all major Yandex services has been leaked:

  • Search Engine and Indexing Bot
  • Maps - Like Google Maps and Street View
  • Alice - AI assistant like Siri / Alexa
  • Taxi - Uber-like taxi service
  • Direct - Ads service like Google Ads / AdWords
  • Mail - Mail service like Gmail
  • Disk - File storage service like Google Drive
  • Market - Marketplace like Amazon
  • Travel - Like Booking.com plus airplane, train and bus tickets
  • Yandex360 - Like Google Workspace for services on your own domain
  • Cloud - Probably not all infrastructure code was leaked
  • Pay - Payment processing like Stripe, but with a limited set of features
  • Metrika - Like Google Analytics
  • At least the backend part of the majority of other company services is also there. The largest archive, called “frontend”, is yet to be explored.

Arseniy Shestakov, who wrote up the contents of the archives, also noted some API keys, which most likely have been used to test deployment.

Details about this leak can be found here:

https://arseniyshestakov.com/2023/01/26/yandex-services-source-code-leak/

Yandex Denies Hacking Attempt

Yandex claims that it is aware of the leak and has already initiated an investigation to check how source code ‘fragments’ were exposed to the public. It is worth noting that the leak doesn’t include user or employee personal data.

However, considering the significance of Yandex in Russia’s IT infrastructure and leaked data, it could be assumed that the attack was motivated by the country’s invasion of Ukraine. So, pro-Ukraine hackers could be involved.

In its official statement, Yandex clarified that the company wasn’t hacked and a former employee could be involved in leaking its source code in the public domain. Russia’s leading IT firm noted that the leaked archive includes code fragments that are part of an internal repository, the data of which is different from what is used in the latest version of the repository.

“Yandex was not hacked. Our security service found code fragments from an internal repository in the public domain, but the content differs from the current version of the repository used in Yandex services,” the company’s statement read.

Nevertheless, source code leaks pose serious security risks to organisations, since threat actors can study the company’s intellectual property and system data. Leaked source code can help attackers create targeted exploits.

Theoretically, what is the difference between algorithms used in Google and in Yandex?

They are quite similar:

  • there is a RankBrain analogue - MatrixNet;
  • they are using PageRank (almost the same as in Google);
  • a lot of text algorithms are the same.

Yandex vs Google

  • There are a lot of ex-Googlers in Yandex;
  • Yandex was built as a Google clone;
  • SEO specialists in Russia are using almost the same white hat SEO tactics for Yandex and for Google.

Of course there are a lot of differences, but the approach and the majority of ranking factors seem to be similar.

In practice, when you compare Google and Yandex search results, they are a ~70% match.
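One simple way to put a number on such a comparison is to count how many URLs two top-10 lists share. The sketch below is only an illustration: the function name and the sample results are hypothetical, and we are not claiming this is how the ~70% figure was measured.

```python
def top_n_overlap(results_a, results_b, n=10):
    """Share of the top-n URLs that appear in both result lists."""
    top_a = set(results_a[:n])
    top_b = set(results_b[:n])
    if not top_a or not top_b:
        return 0.0
    # Divide by the smaller list so short result pages are not penalised.
    return len(top_a & top_b) / min(len(top_a), len(top_b))


# Hypothetical top results for the same query in two search engines.
google_results = ["a.example", "b.example", "c.example", "d.example"]
yandex_results = ["b.example", "a.example", "e.example", "d.example"]

print(f"{top_n_overlap(google_results, yandex_results):.0%}")
```

This ignores the order of results; a position-aware comparison would need a different metric.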

According to Statcounter, Yandex is close to Yahoo and Bing by market share:

search engine market share worldwide

The file with ranking factors: https://dropbox.com/s/toyehkkfduogbwk/factors_gen.txt?dl=0

Structure for each factor:

  1. name
  2. link to internal wiki (restricted)
  3. AntiSeoUpperBound (haha)
  4. description (it's in Russian, I translated it for you)
  5. etc

The first factor in the list is PageRank.

Main insights after analysing this list:

1. Age of links is a ranking factor.

2. Traffic and % of organic traffic are ranking factors.

Buying PPC affects rankings.

3. Numbers in URLs are bad for rankings

4. Too many slashes in URLs are bad for rankings

5. Hard pessimization equals PR=0

6. Host reliability is a ranking factor

The fewer 40x/50x errors you have, the better for your organic traffic.

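To see where you stand, you can estimate your own share of 40x/50x responses from server logs or a crawl. This is a minimal sketch, assuming you already have a list of HTTP status codes; the function name and sample data are ours, and the leak does not tell us how Yandex actually measures host reliability.

```python
def error_rate(status_codes):
    """Share of responses with a 4xx or 5xx status code."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 400 <= code < 600)
    return errors / len(status_codes)


# Hypothetical status codes collected from a crawl of your own site.
codes = [200, 200, 301, 404, 200, 503, 200, 200, 200, 200]
print(f"Error rate: {error_rate(codes):.0%}")
```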

7. There is a separate ranking factor for uplifting Wikipedia

8. A lot of ranking factors are connected with user behaviour - CTR, last-click, time on site, bounce rate

Note: We are almost sure that in Yandex these factors have a much bigger impact than in Google.

9. Document age and last update are both ranking factors

10. Average domain position across all queries is a ranking factor

11. Crawl depth is a ranking factor

Keep your important pages close to the main page:

  • top pages: 1 click from the main page
  • important pages: <3 clicks

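Click depth is easy to check yourself with a breadth-first search over your internal links. The sketch below assumes you have the link graph as a dictionary of page to linked pages (for example from a crawler export); the function name and the sample site are hypothetical.

```python
from collections import deque


def click_depths(links, home="/"):
    """Minimum number of clicks from the main page to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link graph: page -> pages it links to.
site = {
    "/": ["/pricing", "/blog"],
    "/blog": ["/blog/yandex-leak"],
    "/blog/yandex-leak": ["/blog/archive/2019/old-post"],
}

depths = click_depths(site)
too_deep = [page for page, depth in depths.items() if depth >= 3]
print(too_deep)
```

Pages that show up in `too_deep` are candidates for extra internal links from the main page or from top-level sections.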

12. Additionally, there is a ranking factor for orphan pages

You can find these via our website audit tool.

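If you want to check for orphan pages by hand, the idea is to compare the pages you know exist (for example from your sitemap) with the pages that receive at least one internal link. A minimal sketch under those assumptions; the function name and sample data are hypothetical.

```python
def find_orphans(all_pages, links, home="/"):
    """Pages that exist but are never linked to from another page."""
    linked = {target for targets in links.values() for target in targets}
    # The main page is the entry point, so it is not treated as an orphan.
    return sorted(p for p in all_pages if p not in linked and p != home)


# Hypothetical data: URLs from a sitemap and the internal link graph.
sitemap_pages = ["/", "/pricing", "/blog", "/old-landing-page"]
internal_links = {"/": ["/pricing", "/blog"], "/blog": ["/"]}

print(find_orphans(sitemap_pages, internal_links))
```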

13. Backlinks from main pages are more important than from internal pages

14. Number of search queries for your site/URL is a ranking factor

The more, the better.

15. Traffic from Wikipedia is a ranking factor

16. If your URL is the last one in the search session (the user found what they needed), it impacts rankings

There are strict factors for this and predictable factors as well.

17. Bookmarks ranking factor

The more users bookmark a URL, the higher the factor value.

18. Special ranking factors for short videos (TikTok, Shorts, Reels)

19. Maps JS API on the page (for example Google Maps) is a ranking factor

In Google (for example in the travel niche), adding maps with useful info/functionality works as well.

20. Keywords in the URL are ranking factors

As we can see from the description, the optimal URL includes up to 3 words from the search query.

21. Returning users are a ranking factor

Build products with good retention and it will benefit your SEO (there are a lot of ranking factors for measuring it).

22. Percentage of CAPITAL LETTERS in <title> is a ranking factor

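You can check your own titles with a few lines of code. This is a rough sketch: we assume the percentage is taken over letters only (ignoring digits, spaces and punctuation), which the factor description does not confirm, and the function name is ours.

```python
def uppercase_share(title):
    """Share of letters in the title that are uppercase."""
    letters = [ch for ch in title if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch.isupper() for ch in letters) / len(letters)


# Hypothetical page titles.
for title in ["BUY CHEAP SHOES NOW", "Buy cheap shoes online"]:
    print(f"{uppercase_share(title):.0%}  {title}")
```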

23. Percentage of direct traffic is a ranking factor

In other words, if all your traffic comes from organic search, it is suspicious and bad for rankings.

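To see how your own traffic mix looks, you can compute the share of each source from your analytics export. A minimal sketch with made-up numbers; the source names and the function name are hypothetical, and the leak does not tell us which proportions Yandex treats as healthy.

```python
def traffic_shares(visits_by_source):
    """Share of total visits contributed by each traffic source."""
    total = sum(visits_by_source.values())
    if total == 0:
        return {source: 0.0 for source in visits_by_source}
    return {source: visits / total for source, visits in visits_by_source.items()}


# Hypothetical monthly visits by source.
visits = {"organic": 600, "direct": 300, "paid": 100}
for source, share in traffic_shares(visits).items():
    print(f"{source}: {share:.0%}")
```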

24. One more ranking factor for content quality - broken embedded video on the page

  • Embedded videos - good for rankings.
  • Broken embedded videos - bad.

25. Verified accounts on social networks rank differently from other URLs

Important for brand searches: ideally, when someone searches for your brand, the top 10 should contain only your domains and verified social network accounts.

26. If your backlink anchors contain all words from the keywords, it's good for SEO

If they are all in one link, it's more beneficial, especially if the order of words is the same.

27. Ratio of “good” vs “bad” backlinks is a ranking factor

28. The quality rank of texts on the domain is a ranking factor

Pages with low quality content affect the entire domain.

29. Number of advertisements on a page is a ranking factor

30. There is randomness as a separate ranking factor

When you don't understand why some of the pages are on top - it could be just random (to test behaviour factors).

31. JS from Google Analytics is a ranking factor

Predictably, good websites use Google Analytics (GA) more often than bad websites.

32. Backlinks from the top 100 websites by PageRank impact rankings

33. URL has no digits

/100-best-credit-cards

/best-credit-cards

34. Number of slashes in URL

/finance/articles/2023/investment-advices

/investment-advices

35. Number of non-letters in URL

/pet-toys&all$currency=dollar#mobile

/pet-toys

36. '?' symbol in the URL is a ranking factor

/movies?genre=action

/action-movies

37. Search query = URL, including dots and spaces (??)

Search query is "Franklin D. Roosevelt":

/roosevelt

/Franklin_D._Roosevelt

38. Old date in the URL

/2009/12/01/how-to-tie-a-tie

/how-to-tie-a-tie

39. Keywords are in the URL, but not in the text of the page

/video-games & page is about music

/video-games & page is about video games

40. URL coverage with trigrams from the search query

/hotels-new-zealand

/nz

/cheap-hotels-in-new-zealand-best-deals

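To get a feel for what "coverage with trigrams" could mean, here is a rough sketch that measures how many three-character sequences from the query also appear in the URL. We are assuming character trigrams on lowercased text; the leak summary does not say whether Yandex uses character or word trigrams, and the function names are ours.

```python
import re


def char_trigrams(text):
    """Set of 3-character sequences after lowercasing and normalising separators."""
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def trigram_coverage(query, url):
    """Share of the query's trigrams that also occur in the URL."""
    query_grams = char_trigrams(query)
    if not query_grams:
        return 0.0
    return len(query_grams & char_trigrams(url)) / len(query_grams)


query = "hotels new zealand"
for url in ["/hotels-new-zealand", "/nz", "/cheap-hotels-in-new-zealand-best-deals"]:
    print(f"{trigram_coverage(query, url):.2f}  {url}")
```

Note that this sketch only measures how much of the query the URL covers; it does not penalise extra words in a long URL.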

  • Include the 1-3 most important words in the URL;
  • Use fewer slashes, digits and non-letters, unless they are part of your keyword.
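The URL factors above (digits, slashes, non-letters, "?", dates) are simple enough to check in bulk. The sketch below extracts them for a URL using Python's standard library. The feature names and the date pattern are our own assumptions, not names or rules from the leaked file, and the date check only detects a /YYYY/MM/DD/ pattern without judging how old it is.

```python
import re
from urllib.parse import urlsplit


def url_features(url):
    """Simple URL characteristics loosely mirroring the factors above."""
    path = urlsplit(url).path
    return {
        "has_digits": any(ch.isdigit() for ch in path),
        "slashes": path.count("/"),
        # Counted over the whole URL, including query string and fragment.
        "non_letters": sum(1 for ch in url if not ch.isalpha()),
        "has_query": "?" in url,
        "has_date": re.search(r"/(19|20)\d{2}/\d{1,2}/\d{1,2}(/|$)", path) is not None,
    }


for url in ["/2009/12/01/how-to-tie-a-tie", "/how-to-tie-a-tie", "/movies?genre=action"]:
    print(url, url_features(url))
```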

41. Initial weights of Yandex ranking factors

The final weights are calculated by AI (MatrixNet), but the initial values are useful as well.

Conclusion

Well, there we have it: this is all we are sharing for now. It gives you a rough overview of what’s in there.

We’re just scratching the surface here, with many more valuable insights ahead.

Still, many of our outside assumptions and interpretations of how such an extensive search engine works turned out to be quite right, at least regarding links.

All in all, the Yandex code leak offers a fascinating insight into the inner workings of a modern search engine.

Although not all of the findings can be directly applied to Google, many assumptions made in recent years about the general functioning of large Internet search engines are confirmed.

I assume the SEO industry still has a few interesting months ahead of it with new insights from this leak.

Keep your eye on this page as we will continue to add ranking factors over the coming weeks & months.

Special credits to https://twitter.com/alex_buraks

Felix Rose-Collins

Ranktracker's CEO/CMO & Co-founder

Felix Rose-Collins is the Co-founder and CEO/CMO of Ranktracker. With over 15 years of SEO experience, he has single-handedly scaled the Ranktracker site to over 500,000 monthly visits, with 390,000 of these stemming from organic searches each month.
