Intro
You have probably heard of Yandex: it is the fourth-biggest search engine by market share worldwide. Yesterday, Yandex's proprietary source code was leaked.
The most interesting part for the SEO community is the list of all 1,922 ranking factors used in the search algorithm.
We have downloaded the code and analysed it, and we present the findings here in an easy-to-digest form.
The incident should not come as a surprise, since Yandex and its products are frequent targets of cyber attacks. In 2016, Hackread.com exclusively reported on a dark web vendor selling data from 6.3 million Yandex user accounts.
In September 2021, the Russian search engine giant was hit by one of the largest DDoS attacks powered by 200,000 compromised IoT devices.
The All-in-One Platform for Effective SEO
Behind every successful business is a strong SEO campaign. But with countless optimization tools and techniques out there to choose from, it can be hard to know where to start. Well, fear no more, cause I've got just the thing to help. Presenting the Ranktracker all-in-one platform for effective SEO
We have finally opened registration to Ranktracker absolutely free!
Why is this big?
Yandex is one of the largest IT companies in Russia. Within the country it provides a wider range of services than Google. Imagine one company that replaces Google, Uber, Amazon, Netflix and Spotify.
Is this leak real?
I personally never worked at Yandex, but I know several people who worked there at different times or work there still. I verified that at least some of the archives for sure contain modern source code for company services as well as documentation pointing to real intranet URLs.
What’s inside
The leaker has shared a magnet link containing 44.7GB of files linked to Yandex git sources. The files were allegedly stolen from Yandex in July 2022. Apart from containing anti-spam guidelines, the code repositories are believed to have Yandex’s source code.
The leak revealed around 1,922 ranking factors the search engine uses in its search algorithm. The code was leaked as a torrent. Per the analysis posted by Twitter user Alex Buraks, the leaked data includes numerous ranking factors, including text relevancy, PageRank, content age, freshness, etc.
“You probably heard about Yandex, it’s the 4th biggest search engine by market share worldwide. Yesterday proprietary source code of Yandex was leaked. The most interesting part for SEO community is: the list of all 1922 ranking factors used in the search algorithm [🧵THREAD]”
Alex Buraks (@alex_buraks), January 27, 2023
The list also covers several end-user behaviour factors, link-related factors, and host reliability. SEOs have found some unusual ranking factors, such as the number of unique visitors, average domain ranking across queries, and percentage of organic traffic.
It looks like source code for at least all major Yandex services has been leaked:
- Search Engine and Indexing Bot
- Maps - Like Google Maps and Street View
- Alice - AI assistant like Siri / Alexa
- Taxi - Uber-like taxi service
- Direct - Ads service like Google Ads / Adwords
- Mail - Mail service like GMail
- Disk - File storage service like Google drive
- Market - Marketplace like Amazon
- Travel - Like a Booking.com plus Airplane, Train and Bus tickets
- Yandex360 - Like Google Workspaces for services on your own domain
- Cloud - Probably not all infrastructure code was leaked.
- Pay - Payment processing like Stripe, but with limited set of features
- Metrika - Like Google Analytics
- And at least the backend part of the majority of other company services is there. Largest archive called “frontend” is yet to be explored.
Arseniy Shestakov, who examined the archives, further noted that they contain some API keys, which were most likely used for testing deployments.
Details about this leak can be found here:
https://arseniyshestakov.com/2023/01/26/yandex-services-source-code-leak/
Yandex Denies Hacking Attempt
Yandex claims that it is aware of the leak and has already initiated an investigation to check how source code ‘fragments’ were exposed to the public. It is worth noting that the leak doesn’t include user or employee personal data.
However, considering the significance of Yandex in Russia’s IT infrastructure and leaked data, it could be assumed that the attack was motivated by the country’s invasion of Ukraine. So, pro-Ukraine hackers could be involved.
In its official statement, Yandex clarified that the company wasn’t hacked and a former employee could be involved in leaking its source code in the public domain. Russia’s leading IT firm noted that the leaked archive includes code fragments that are part of an internal repository, the data of which is different from what is used in the latest version of the repository.
“Yandex was not hacked. Our security service found code fragments from an internal repository in the public domain, but the content differs from the current version of the repository used in Yandex services,” the company’s statement read.
Nevertheless, source code leaks pose serious security risks to organisations, since threat actors can study the company's intellectual property and system data. Leaked source code helps attackers create targeted security exploits.
Theoretically, what is the difference between algorithms used in Google and in Yandex?
They are quite similar:
- There is a RankBrain analogue: MatrixNet.
- Yandex uses PageRank (almost the same as in Google).
- A lot of the text algorithms are the same.
- There are a lot of ex-Googlers at Yandex.
- Yandex was built as a Google clone.
- SEO specialists in Russia use almost the same white hat SEO tactics for Yandex and for Google.
Of course there are a lot of differences, but the approach and the majority of ranking factors seem to be similar.
In practice, when you compare Google and Yandex search results, they match by roughly 70%.
According to Statcounter, Yandex is close to Yahoo and Bing by market share.
The file with ranking factors: https://dropbox.com/s/toyehkkfduogbwk/factors_gen.txt?dl=0
Structure for each factor:
- name
- link to internal wiki (restricted)
- AntiSeoUpperBound (haha)
- description (it's in Russian, I translated it for you)
- etc
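To work with the list, it helps to load each factor into a small record. Here is a minimal sketch of that idea. The class name, field names, field types, and the sample entries are our own illustrative assumptions, not the actual layout of the leaked file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RankingFactor:
    """One entry of the factor list (field names are illustrative)."""
    name: str
    description: str                               # translated from Russian
    wiki_url: Optional[str] = None                 # internal wiki link (restricted)
    anti_seo_upper_bound: Optional[float] = None   # type is an assumption


def find_factors(factors: List[RankingFactor], keyword: str) -> List[RankingFactor]:
    """Return the factors whose name or description mentions the keyword."""
    needle = keyword.lower()
    return [
        f for f in factors
        if needle in f.name.lower() or needle in f.description.lower()
    ]


# Placeholder entries for demonstration, NOT real rows from the leak.
factors = [
    RankingFactor(name="ExampleUrlFactor",
                  description="Placeholder: something about the address"),
    RankingFactor(name="ExampleLinkFactor",
                  description="Placeholder: something about backlinks",
                  anti_seo_upper_bound=0.5),
]

matches = find_factors(factors, "url")
print([f.name for f in matches])
```

With records like these you can filter the 1,922 entries by topic instead of scrolling through the raw text file.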
The first factor in the list is PageRank.
Main insights after analysing this list:
1. Age of links is a ranking factor.
2. Traffic and % of organic traffic are ranking factors.
Buying PPC affects rankings.
3. Numbers in URLs are bad for rankings
4. Too many slashes in URLs are bad for rankings
5. Hard pessimization is equivalent to PR = 0
6. Host reliability is a ranking factor
The fewer 40x/50x errors you have, the better for your organic traffic
7. There is a separate ranking factor for uplifting Wikipedia
8. A lot of ranking factors connected with user behaviour - CTR, last-click, time on site, bounce rate
Note: We are almost sure that these factors carry much more weight in Yandex than in Google.
9. Document age and last update are both ranking factors
10. Average domain position across all queries is a ranking factor
11. Crawl depth is a ranking factor
Keep your important pages closer to main page:
- top pages: 1 click from the main page
- important pages: <3 clicks
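If you want to check click depth on your own site, a breadth-first search over the internal link graph is enough. The sketch below is our own illustration, not code from the leak. The `site` dictionary is a made-up example, and the "3 clicks or more is too deep" threshold simply follows the rule of thumb above.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search: minimum number of clicks from the home page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:          # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site structure: page -> pages it links to.
site = {
    "/": ["/pricing", "/blog"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-1/comments"],
    "/old-landing": [],                       # nothing links here
}

depths = click_depths(site, "/")
too_deep = [page for page, depth in depths.items() if depth >= 3]
unreachable = [page for page in site if page not in depths]

print(depths)
print("Too deep:", too_deep)
print("Not reachable from the main page:", unreachable)
```

Pages that never receive a depth have no path from the main page at all, so they deserve a look as well.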
12. Additionally: ranking factor for orphan pages
You can find this via our website audit tool
13. Backlinks from main pages are more important than from internal pages
14. Number of search queries for your site/URL is a ranking factor
The more, the better
15. Traffic from Wikipedia is a ranking factor
16. If your URL is the last one clicked in the search session (the user found what they needed), it impacts rankings
There are strict factors for this and predictable factors as well.
17. Bookmarks ranking factor
The more users bookmark a URL, the higher the factor's value
18. Special ranking factors for short videos (tiktok, shorts, reels)
19. Maps js-api on page (for example Google Maps) is a ranking factor
In Google (for example, in the travel niche), adding maps with useful info/functionality works as well.
20. Keywords in the URL are a ranking factor
As we can see from the description, the optimum is to include up to 3 words from the search query.
21. Returning users is a ranking factor
Build products with good retention and it would benefit your SEO (there are a lot of ranking factors for measuring it).
22. Percentage of CAPITAL LETTERS in <title> is a ranking factor
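You can measure this for your own titles with a few lines of code. This is a rough sketch under our own assumption that the share is counted over letters only. We do not know exactly how Yandex computes it, and the function name is ours.

```python
def uppercase_share(title):
    """Share of letters in the title that are uppercase (0.0 to 1.0)."""
    letters = [ch for ch in title if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch.isupper() for ch in letters) / len(letters)


print(uppercase_share("BEST Credit Cards"))   # 6 of 15 letters are capitals
print(uppercase_share("best credit cards"))
```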
23. Percentage of direct traffic is a ranking factor
In other words, if all your traffic comes from organic search, it's suspicious and bad for rankings.
24. One more ranking factor for content quality - broken embedded video on the page
- Embed videos - good for rankings.
- Broken embed videos - bad.
25. Verified accounts on social networks rank differently from other URLs
Important for brand searches: ideally, a search for your brand should show only your domains and verified social network accounts in the top 10
26. If your backlink anchors contain all the words of the keyword, it's good for SEO
If all the words are in a single link, it's more beneficial, especially if the order of words is the same.
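Here is a small sketch for checking a single anchor text against a target keyword. It is our own illustration, not Yandex's implementation. We assume "same order" means the keyword's words appear in the anchor in the same sequence, possibly with other words in between.

```python
import re


def words(text):
    """Lowercase word tokens of a text."""
    return re.findall(r"\w+", text.lower())


def anchor_match(anchor, keyword):
    """Return (has_all_words, same_order) for one anchor text."""
    anchor_words = words(anchor)
    keyword_words = words(keyword)
    has_all = all(w in anchor_words for w in keyword_words)
    # Subsequence check: each keyword word must be found after the previous one.
    remaining = iter(anchor_words)
    same_order = has_all and all(w in remaining for w in keyword_words)
    return has_all, same_order


print(anchor_match("best credit cards for students", "best credit cards"))
print(anchor_match("credit cards: the best picks", "best credit cards"))
print(anchor_match("cheap cards", "best credit cards"))
```

The first anchor has all the words in the right order, the second has all the words but in a different order, and the third is missing words.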
27. Ratio "good" vs "bad" backlinks is a ranking factor
![Ratio "good" vs "bad" backlinks is a ranking factor](https://www.ranktracker.com/media/yandex-leaked-code-containing-search-ranking-factors-ranktracker-explains-all-ranking-factors/images/i84.png)
28. The quality rank of texts on the domain is a ranking factor
Pages with low quality content affect the entire domain.
29. Amount of advertisements on a page is a ranking factor
30. There is randomness as a separate ranking factor
When you don't understand why some of the pages are on top - it could be just random (to test behaviour factors).
31. JS from Google Analytics is a ranking factor
Predictably, good websites use Google Analytics (GA) more often than bad websites.
32. Backlinks from the top 100 websites by PageRank impact rankings
33. URL has no digits
❌ /100-best-credit-cards
✅ /best-credit-cards
34. Number of slashes in URL
❌ /finance/articles/2023/investment-advices
✅ /investment-advices
35. Number of non-letters in URL
❌ /pet-toys&all$currency=dollar#mobile
✅ /pet-toys
36. '?' symbol in the URL is a ranking factor
❌ /movies?genre=action
✅ /action-movies
37. Search query = URL, including dots and spaces (??)
Search query is "Franklin D. Roosevelt":
❌ /roosevelt
✅ /Franklin_D._Roosevelt
38. Old date in the URL
❌ /2009/12/01/how-to-tie-a-tie
✅ /how-to-tie-a-tie
39. Keyword is in the URL but not in the text of the page
❌ /video-games & page is about music
✅ /video-games & page is about video games
40. URL coverage with trigrams from the search query
✅ /hotels-new-zealand
❌ /nz
❌ /cheap-hotels-in-new-zealand-best-deals
- Include the 1-3 most important words in the URL.
- Use fewer slashes, digits, and non-letters, unless they are part of your keyword.
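To audit a list of URLs against these shape-related factors, you can compute a few simple signals per URL. The sketch below is our own approximation, not the formulas from the leaked code. The function name and the choice to treat "/" and "-" as normal separators rather than "non-letters" are our assumptions.

```python
def url_features(url):
    """Simple URL shape signals, inspired by the factors listed above."""
    return {
        "has_digits": any(ch.isdigit() for ch in url),
        "slashes": url.count("/"),
        # Characters that are not letters, digits, "/" or "-".
        "other_symbols": sum(
            1 for ch in url if not ch.isalnum() and ch not in "/-"
        ),
        "has_query": "?" in url,
    }


for url in [
    "/100-best-credit-cards",
    "/finance/articles/2023/investment-advices",
    "/pet-toys&all$currency=dollar#mobile",
    "/movies?genre=action",
    "/best-credit-cards",
]:
    print(url, url_features(url))
```

A URL like `/best-credit-cards` comes out clean on every signal, while the other examples each trip at least one of them.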
41. Initial weights of Yandex ranking factors
The final weights are calculated by machine learning (MatrixNet), but the initial values are useful as well.
Conclusion
Well, there we have it. This is all we are sharing for now, and it gives you a rough overview of what's in the leak.
We're just scratching the surface here, with many more valuable insights ahead.
Still, many of the assumptions and interpretations made from the outside about how such an extensive search engine works turned out to be right, at least regarding links.
All in all, the Yandex code leak offers a fascinating insight into the inner workings of a modern search engine.
Although not all of the findings can be directly applied to Google, many assumptions made in recent years about the general functioning of large Internet search engines are confirmed.
I assume the SEO industry still has a few interesting months ahead of it with new insights from this leak.
Keep your eye on this page as we will continue to add ranking factors over the coming weeks & months.
Special credits to https://twitter.com/alex_buraks