Intro
In a recent interview at SERP Conf 2024 in Bulgaria, Google’s Gary Illyes discussed why pages end up with the frustrating "crawled but not indexed" status, a common problem for site owners. His answers point to several distinct causes and suggest where to look when trying to resolve it.
Key Points:
1. Content Similarity:
- Illyes confirmed that one reason for this error is content similarity. If a page's content closely mirrors other already indexed content, Google may choose not to index it.
2. General Site Quality:
- The overall quality of a website significantly impacts indexing. A high number of "crawled but not indexed" pages can indicate quality issues with the site.
3. Technical Issues:
- Technical problems, such as serving the same content across multiple URLs, can also lead to this error. Google’s perception of the site might change if such issues are detected.
4. Duplication:
- Duplicate content is another major factor. Google might crawl a page but decide not to index it if a similar version with better signals already exists in its index.
Detailed Explanation:
During the interview, Illyes responded to a question about whether the "crawled but not indexed" error could result from a page being too similar to already indexed content. He confirmed this could be one reason but emphasized that several factors contribute to this issue.
Granularity and Complexity:
Illyes noted that these errors are hard to categorize neatly because of the way data is handled internally at Google. He explained that while duplicate content is a significant factor, there are many other potential reasons for this error.
Quality Issues:
Illyes highlighted that the general quality of a site can greatly influence indexing. A surge in "crawled but not indexed" pages might hint at a decline in Google’s perception of the site’s quality. That decline could stem from various causes, including poor content or technical errors.
Technical Problems:
Technical issues, such as a website mistakenly serving the same content for different URLs, can also lead to this problem. Such errors can cause Google to reconsider its indexing decisions.
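Google hasn't said how it detects URLs that serve the same content, but a rough self-check on your own site is straightforward. The Python sketch below assumes you already have each URL's HTML in memory (fetching and crawling are left out). `normalize` and `duplicate_url_groups` are hypothetical helper names, and the regex-based tag stripping is a crude stand-in for a real HTML parser. It only catches pages that match exactly after normalization, so near-duplicates will slip through.

```python
import hashlib
import re
from collections import defaultdict


def normalize(html: str) -> str:
    """Crude normalization: strip tags, collapse whitespace, lowercase.

    Assumption: regex tag stripping is good enough for a quick audit;
    a real audit would use a proper HTML parser.
    """
    text = re.sub(r"<[^>]+>", " ", html)  # drop anything that looks like a tag
    text = re.sub(r"\s+", " ", text)      # collapse runs of whitespace
    return text.strip().lower()


def duplicate_url_groups(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose normalized content is identical (hypothetical helper)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, html in pages.items():
        # Hash the normalized text so identical content lands in the same bucket.
        digest = hashlib.sha256(normalize(html).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only buckets holding more than one URL indicate duplication.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


if __name__ == "__main__":
    # Example input: a tracking parameter produces a second URL for the same page.
    pages = {
        "https://example.com/shoes": "<h1>Red Shoes</h1><p>Size 9</p>",
        "https://example.com/shoes?ref=home": "<h1>Red  Shoes</h1>\n<p>Size 9</p>",
        "https://example.com/hats": "<h1>Blue Hat</h1>",
    }
    for group in duplicate_url_groups(pages):
        print(group)
```

Each group it returns lists URLs that serve the same normalized content, which makes them candidates to consolidate into a single URL.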
Site Signals:
Illyes mentioned that if another site with better signals hosts the same content, Google might prefer to index that site instead. This scenario often occurs with syndicated content where the original publisher’s version is not indexed.
Practical Takeaways:
Understanding these causes can help webmasters debug and fix "crawled but not indexed" errors. Key actions include:
- Review Content Similarity: Make sure your content is unique and not too similar to content that is already indexed.
- Enhance Site Quality: Focus on improving the overall quality of your site so that Google perceives it more favorably.
- Resolve Technical Issues: Address any technical issues that might cause duplicate content or other problems.
- Monitor Site Signals: Be aware of how your site’s signals compare to those of other sites, especially if your content is syndicated.
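For the first takeaway, reviewing content similarity, keep in mind that Google hasn't disclosed how it measures similarity between pages, so any self-check is only an approximation. One common near-duplicate heuristic is w-shingling: split each text into overlapping k-word sequences and compare the two sets with Jaccard similarity (shared shingles divided by all distinct shingles). The Python sketch below assumes you have plain text for your page and for a page you suspect is similar. `shingles`, `jaccard`, and the default `k=3` are illustrative choices, not anything Google is known to use.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0.

    Assumption: this is a generic near-duplicate heuristic, not Google's method.
    """
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    mine = "Our guide explains how to repot a fiddle leaf fig in spring."
    other = "This guide explains how to repot a fiddle leaf fig in early spring."
    print(round(jaccard(mine, other), 2))
```

Scores near 1.0 mean the two texts share most of their word sequences. What counts as "too similar" is a judgment call, so treat the score as a prompt to review a page rather than as a prediction of how Google will index it.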
By addressing these areas, you can improve your chances of having your pages indexed by Google.