
3 Types of Not Indexed Pages


Adam Gent

Ever wondered why some of your important pages aren't indexed in Google?

Despite submitting your URLs through XML sitemaps and following best practices, many pages still end up in the dreaded "Not Indexed" category in Google Search Console.

There are 3 common types of Not Indexed pages every SEO professional should know about. In this article, we'll help you identify which category your pages fall into.

So, let's dive in.

3 Types of Not Indexed Pages

The Page Indexing report provides insights into why pages aren't indexed.

Although this can be very useful, it's important to group the many Not Indexed reasons it reports into 3 categories, as this clarifies exactly what action we need to take to fix each indexing issue.

The 3 categories of Not Indexed pages are:

  1. Technical: These pages don't meet Google's technical requirements.
  2. Duplication: These pages are a duplicate of another page.
  3. Quality: These pages have been actively removed by Google.
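If you export the Not Indexed reasons from Search Console, a small lookup table is enough to sort them into these 3 categories. The Python sketch below is a hypothetical helper, not a Search Console feature. The technical and quality labels are the ones used in this article. The three duplicate labels are assumed from the wording of Search Console's report and may not match your export exactly.

```python
from collections import Counter

# Hypothetical lookup table: Not Indexed reason -> category.
# Technical and quality labels follow this article; the duplicate
# labels are ASSUMED from Search Console's report wording.
CATEGORY_BY_REASON = {
    # 1) Technical
    "URL blocked by robots.txt": "technical",
    "Blocked due to unauthorized request (401)": "technical",
    "Blocked due to access forbidden (403)": "technical",
    "URL blocked due to other 4xx issue": "technical",
    "Server error (5xx)": "technical",
    "Redirect error": "technical",
    "Not found (404)": "technical",
    "Page with redirect (3xx)": "technical",
    "URL marked 'noindex'": "technical",
    "Soft 404": "technical",
    # 2) Duplication (assumed labels)
    "Duplicate without user-selected canonical": "duplication",
    "Duplicate, Google chose different canonical than user": "duplication",
    "Alternate page with proper canonical tag": "duplication",
    # 3) Quality
    "Crawled - currently not indexed": "quality",
    "URL is unknown to Google": "quality",
}


def categorize(reasons):
    """Count Not Indexed URLs per category; unmapped reasons count as 'unknown'."""
    return Counter(CATEGORY_BY_REASON.get(reason, "unknown") for reason in reasons)


if __name__ == "__main__":
    export = ["Soft 404", "Crawled - currently not indexed", "Not found (404)"]
    print(categorize(export))
```

Anything that lands in "unknown" is a label the table doesn't cover yet, which is a prompt to check the wording in your own export.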

1) Technical

The pages in this category don't meet Google's technical requirements.

What indexing states are in this category?

These pages either don't meet Google's basic technical requirements or have directives that explicitly tell Google not to index them.

Why are pages grouped into this category?

Google detected that the page does not meet the minimum technical requirements.

To be eligible for indexing, a page must meet the following minimum technical requirements:

  1. Googlebot isn't blocked
  2. Google receives an HTTP 200 (success) status code
  3. The page has indexable content

If we group the technical errors in Google Search Console, each one corresponds to one of the minimum requirements:

  1. Googlebot isn't blocked
    1. URL blocked by robots.txt
    2. Blocked due to unauthorized request (401)
    3. Blocked due to access forbidden (403)
    4. URL blocked due to other 4xx issue
  2. Google receives an HTTP 200 (success) status code
    1. Server error (5xx)
    2. Redirect error
    3. Not found (404)
    4. Page with redirect (3xx)
  3. The page has indexable content
    1. URL marked ‘noindex’
    2. Soft 404
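To see how the three requirements map to these reasons, here is a minimal Python sketch that checks a response you've already fetched and returns the first matching reason label. The function name and its inputs (robots.txt text, status code, headers, HTML) are hypothetical. It relies on the standard library's `urllib.robotparser`, which does not apply Google's exact rule-precedence logic, and it cannot detect a Soft 404 or a Redirect error, which need content analysis or the full redirect chain.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() in ("robots", "googlebot"):
            self.directives.append((attrs.get("content") or "").lower())


def technical_reason(url, robots_txt, status, headers, html):
    """Return the label of the first failed requirement, or None if all pass.

    Hypothetical helper: Soft 404s and redirect errors are NOT detected.
    """
    # Requirement 1: Googlebot isn't blocked
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch("Googlebot", url):
        return "URL blocked by robots.txt"
    if status == 401:
        return "Blocked due to unauthorized request (401)"
    if status == 403:
        return "Blocked due to access forbidden (403)"

    # Requirement 2: Google receives an HTTP 200 (success) status code
    if status == 404:
        return "Not found (404)"
    if 400 <= status < 500:
        return "URL blocked due to other 4xx issue"
    if 500 <= status < 600:
        return "Server error (5xx)"
    if 300 <= status < 400:
        return "Page with redirect (3xx)"

    # Requirement 3: the page has indexable content (noindex check only)
    lowered = {name.lower(): value for name, value in headers.items()}
    header_directive = lowered.get("x-robots-tag", "").lower()
    meta = RobotsMetaParser()
    meta.feed(html)
    if "noindex" in header_directive or any("noindex" in d for d in meta.directives):
        return "URL marked 'noindex'"
    return None


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(technical_reason("https://example.com/private/a", rules, 200, {}, ""))
```

The checks run in the same order as the list above, so a page that fails several requirements is reported against the first one it fails.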

2) Duplicate Content

These are pages which contain duplicate or similar content.

What are these types of errors?

These types of errors relate to Google's canonicalization process in the indexing pipeline.

Why are pages grouped into this category?

Pages are grouped into this category because of Google’s canonicalization algorithm.

When Google identifies duplicate pages across your website it:

  1. Groups the pages into a cluster.
  2. Analyses the canonical signals around the pages in the cluster.
  3. Selects a canonical URL from the cluster to appear in the search results.
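Google's duplicate detection isn't public, and it also catches near-duplicates. As a rough illustration of step 1 only, this Python sketch groups URLs whose text is identical after normalising whitespace and case. The SHA-256 fingerprint and the function names are assumptions used as a stand-in for whatever Google actually does.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash of whitespace-normalised, lower-cased text (exact duplicates only)."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def cluster_duplicates(pages):
    """Group URLs with identical normalised content; single-URL groups are dropped."""
    clusters = defaultdict(list)
    for url, text in pages.items():
        clusters[fingerprint(text)].append(url)
    return [sorted(urls) for urls in clusters.values() if len(urls) > 1]


if __name__ == "__main__":
    pages = {
        "https://example.com/shoes": "Red shoes\n  on sale",
        "https://example.com/shoes?ref=nav": "red shoes on sale",
        "https://example.com/hats": "Blue hats",
    }
    print(cluster_duplicates(pages))
```

Here the tracking-parameter URL lands in the same cluster as the clean URL, which is exactly the kind of cluster steps 2 and 3 then have to resolve.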

This process is called canonicalization. However, the process isn't static.

Google continuously evaluates the canonical signals to determine which URL should be the canonical URL for the cluster. It looks at:

  1. 3xx Redirects
  2. Sitemap inclusion
  3. Canonical tag signals
  4. Internal linking patterns
  5. URL structure preferences

If a page was previously the canonical URL but new signals make Google select another URL in the cluster, then your original page gets removed from search results.
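This re-evaluation can be illustrated with a toy scoring model. Google doesn't publish how it weighs these signals, so the weights, the `Signals` fields and the function names in this Python sketch are invented purely for illustration. The point is only that when the signals change, the selected canonical can change with them.

```python
from dataclasses import dataclass

# HYPOTHETICAL weights for illustration only; Google does not publish these.
WEIGHTS = {
    "redirect_target": 5.0,  # other URLs in the cluster 3xx-redirect here
    "in_sitemap": 2.0,       # URL is listed in the XML sitemap
    "canonical_tag": 4.0,    # rel=canonical tags in the cluster point here
    "internal_links": 0.1,   # per internal link pointing at the URL
    "clean_url": 1.0,        # e.g. no tracking parameters
}


@dataclass
class Signals:
    redirect_target: bool = False
    in_sitemap: bool = False
    canonical_tag: bool = False
    internal_links: int = 0
    clean_url: bool = False


def score(s: Signals) -> float:
    """Weighted sum of the canonical signals for one URL."""
    return (WEIGHTS["redirect_target"] * s.redirect_target
            + WEIGHTS["in_sitemap"] * s.in_sitemap
            + WEIGHTS["canonical_tag"] * s.canonical_tag
            + WEIGHTS["internal_links"] * s.internal_links
            + WEIGHTS["clean_url"] * s.clean_url)


def select_canonical(cluster: dict[str, Signals]) -> str:
    """Pick the highest-scoring URL; ties go to the alphabetically first URL."""
    return max(sorted(cluster), key=lambda url: score(cluster[url]))


if __name__ == "__main__":
    before = {
        "https://example.com/a": Signals(in_sitemap=True, canonical_tag=True,
                                         internal_links=10, clean_url=True),
        "https://example.com/a?x=1": Signals(internal_links=2),
    }
    after = {
        "https://example.com/a": Signals(internal_links=10, clean_url=True),
        "https://example.com/b": Signals(redirect_target=True, in_sitemap=True,
                                         canonical_tag=True, clean_url=True),
    }
    print(select_canonical(before), select_canonical(after))
```

In the second cluster, `/a` now redirects to `/b` and the sitemap and canonical tag point at `/b`, so `/b` wins and `/a` would drop out of the results even though it still has more internal links.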


3) Quality

The final category is about pages being actively removed by Google.

What are these types of errors?

These types of indexing errors are split into 3 groups based on the signals Google collects about pages over time.

Why are pages grouped into this category?

Google is actively removing these pages from its search results and index.

Our research has found that nearly 80% of pages in the 'crawled - currently not indexed' state had previously been crawled and indexed.

But that's not all.

Our research into how Google manages its index highlights a mechanism that uses page quality to decide whether pages are removed from search results.

We also found that index coverage states indicate crawl priority in Google's architecture, and that this crawl priority is based on page quality.

Our 130-Day Indexing Rule research also identified that Google actively removes pages from its search results. If an indexed page has not been recrawled within 130 days, there is a 99% chance it will be moved to Not Indexed.

But that's not all.

Our 190-Day Indexing Rule research revealed a critical discovery: Google doesn't merely remove pages from its index, it actively forgets them. When a page goes 190 days without being recrawled, Google purges it from memory entirely, marking it as 'URL is unknown to Google'.
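The two thresholds from this research can be turned into a simple check on each page's last crawl date (for example, the "Last crawl" date shown in the URL Inspection tool). This is a hypothetical Python helper: the 130- and 190-day figures come from the research above, not from Google, and the function only flags risk rather than predicting what Google will do.

```python
from datetime import date

# Thresholds from the 130-Day and 190-Day Indexing Rule research (not from Google).
NOT_INDEXED_AFTER_DAYS = 130  # ~99% chance an indexed page moves to Not Indexed
FORGOTTEN_AFTER_DAYS = 190    # page likely reported as 'URL is unknown to Google'


def crawl_risk(last_crawled: date, today: date) -> str:
    """Classify a page by how long it has gone without being recrawled."""
    days = (today - last_crawled).days
    if days >= FORGOTTEN_AFTER_DAYS:
        return "at risk of 'URL is unknown to Google'"
    if days >= NOT_INDEXED_AFTER_DAYS:
        return "at risk of Not Indexed"
    return "ok"


if __name__ == "__main__":
    print(crawl_risk(date(2024, 1, 1), date(2024, 5, 15)))  # 135 days since last crawl
```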

All of this research points to the same thing: The 3 index coverage states are a strong indicator of page quality and crawl priority.

If your website has a high proportion of indexed pages, this signals that you've exceeded Google's page quality benchmark.

However, if your website has a high number of 'crawled - currently not indexed' pages, this indicates a low-quality website and that Google is actively removing pages from its search results.

Finally, if you've got a lot of pages in the 'URL is unknown to Google' report, it's a strong indication these pages have zero crawl priority.
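Before judging whether a proportion is "high", you need the proportions themselves. A minimal Python sketch, assuming you already have one index coverage state per URL (for example from URL Inspection checks); the function name is hypothetical and no threshold is implied.

```python
from collections import Counter


def state_shares(states):
    """Return each index coverage state's share (0 to 1) of all tracked URLs."""
    counts = Counter(states)
    total = sum(counts.values())
    if not total:
        return {}
    return {state: n / total for state, n in counts.items()}


if __name__ == "__main__":
    states = (["Indexed"] * 6
              + ["Crawled - currently not indexed"] * 3
              + ["URL is unknown to Google"])
    print(state_shares(states))
```

Tracking these shares over time is more informative than a single snapshot, since a rising 'crawled - currently not indexed' share is the pattern described above.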

Summary

There are 3 categories of Not Indexed pages: technical, duplicate, and quality.

Technical barriers and duplicate content issues are generally within your control to fix through standard optimisation practices.

Quality issues, however, require deeper analysis and often signal more significant problems with how your content meets user and search engine expectations.

Regularly monitoring your indexation status is crucial to identifying which category your Not Indexed pages fall into and taking appropriate action.


Adam Gent

SEO Product Manager and Technical SEO. I’m currently an independent consultant who works with organisations to plan, scope and execute SEO projects that drive results.

3 Types of Not Indexed Pages | Indexing Insight