3 Types of Not Indexed Pages

Ever wondered why some of your important pages aren't indexed in Google?

Despite submitting your URLs through XML sitemaps and following best practices, many pages still end up in the dreaded "Not Indexed" category in Google Search Console.

There are 3 common types of Not Indexed pages every SEO professional should know about. In this article, we'll help you identify which category your pages fall into.

So, let's dive in.

3 Types of Not Indexed Pages

The Page Indexing report provides insights into why pages aren't indexed.

Although this is very useful, it helps to group the Not Indexed reasons in the report into 3 categories, because doing so clarifies exactly what action we need to take to fix each indexing issue.

The 3 categories of Not Indexed pages are:

Technical: These pages don't meet Google's technical requirements.
Duplication: These pages are a duplicate of another page.
Quality: These pages have been actively removed by Google.

1) Technical

The pages in this category don't meet Google's technical requirements.

What indexing states are in this category?

These pages either don't meet Google's basic technical requirements or have directives that explicitly tell Google not to index them:

Server error (5xx)
Redirect error
URL blocked by robots.txt
URL marked ‘noindex’
Soft 404
Blocked due to unauthorized request (401)
Not found (404)
Blocked due to access forbidden (403)
URL blocked due to other 4xx issue
Page with redirect

Why are pages grouped into this category?

Google detected that the page does not meet the minimum technical requirements.

For a page to be eligible to be indexed it must meet the following technical requirements:

Googlebot isn't blocked
Google receives an HTTP 200 (success) status code
The page has indexable content

If we group the technical errors in Google Search Console, they correspond with one of the minimum requirements:

Googlebot isn't blocked
1. URL blocked by robots.txt
2. Blocked due to unauthorized request (401)
3. Blocked due to access forbidden (403)
4. URL blocked due to other 4xx issue
Google receives an HTTP 200 (success) status code
1. Server error (5xx)
2. Redirect error
3. Not found (404)
4. Page with redirect (3xx)
The page has indexable content
1. URL marked ‘noindex’
2. Soft 404
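To make the three minimum requirements concrete, here is a minimal Python sketch that checks a page against them. It is an illustration under stated assumptions, not how Google evaluates pages: the function name `check_technical_requirements`, the failure labels, and the example URLs are ours, and the check only looks at robots.txt rules, the HTTP status code, and robots meta tags. It does not cover `X-Robots-Tag` headers or soft 404 detection.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(attr.get("content", "").lower())


def check_technical_requirements(url, robots_txt, status_code, html):
    """Return a list of failed requirements (hypothetical labels).

    An empty list means the page passes these simplified checks.
    """
    failures = []

    # Requirement 1: Googlebot isn't blocked.
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch("Googlebot", url):
        failures.append("blocked by robots.txt")

    # Requirement 2: Google receives an HTTP 200 status code.
    # 401 and 403 are reported as "blocked", following the grouping above.
    if status_code in (401, 403):
        failures.append(f"blocked ({status_code})")
    elif status_code != 200:
        failures.append(f"non-200 status ({status_code})")

    # Requirement 3: the page has indexable content (no noindex meta tag).
    parser = RobotsMetaParser()
    parser.feed(html)
    if any("noindex" in directive for directive in parser.directives):
        failures.append("noindex directive")

    return failures


# Example inputs (made up for illustration).
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"
PAGE_HTML = "<html><head><title>Post</title></head><body><p>Hello</p></body></html>"
print(check_technical_requirements("https://example.com/blog/post", ROBOTS_TXT, 200, PAGE_HTML))
```

The inputs are passed in as plain values so the sketch runs without network access. In practice you would fetch the robots.txt file, status code, and HTML yourself before calling it.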

2) Duplicate Content

These are pages which contain duplicate or similar content.

What are these types of errors?

These errors relate to Google's canonicalization process in the indexing pipeline (I’ve provided descriptions as these are a bit more complicated):

Alternate page with proper canonical tag: The page has indicated that another page is the canonical URL that will appear in search results.
Duplicate without user-selected canonical: Google has detected that this page is a duplicate of another page, that it does not have a user-selected canonical, and that Google has chosen another page as the canonical URL.
Duplicate, Google chose different canonical than user: Although you have specified another page as the canonical URL, Google has chosen a different page as the canonical URL to appear in search results.

Why are pages grouped into this category?

Pages are grouped into this category because of Google’s canonicalization algorithm.

When Google identifies duplicate pages across your website it:

Groups the pages into a cluster.
Analyses the canonical signals around the pages in the cluster.
Selects a canonical URL from the cluster to appear in the search results.

This process is called canonicalization. However, the process isn't static.

Google continuously evaluates the canonical signals to determine which URL should be the canonical URL for the cluster. It looks at:

3xx Redirects
Sitemap inclusion
Canonical tag signals
Internal linking patterns
URL structure preferences

If a page was previously the canonical URL but new signals make Google select another URL in the cluster, then your original page gets removed from search results.
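The idea of a cluster being re-evaluated as signals change can be sketched in a few lines of Python. This is a toy model only: Google does not publish how it weighs canonical signals, so the `WEIGHTS` values, the `PageSignals` fields, and the example URLs below are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    """Canonical signals observed for one URL in a duplicate cluster."""
    url: str
    redirects_pointing_here: int = 0   # 3xx redirects targeting this URL
    in_sitemap: bool = False           # URL is listed in the XML sitemap
    canonical_tags_pointing_here: int = 0
    internal_links: int = 0
    is_https: bool = True              # stand-in for a URL structure preference


# Hypothetical weights, chosen only to make the example readable.
WEIGHTS = {
    "redirect": 5.0,
    "canonical_tag": 3.0,
    "sitemap": 2.0,
    "internal_link": 0.5,
    "https": 1.0,
}


def score(page):
    """Combine the signals into a single number (toy scoring)."""
    return (
        WEIGHTS["redirect"] * page.redirects_pointing_here
        + WEIGHTS["canonical_tag"] * page.canonical_tags_pointing_here
        + WEIGHTS["sitemap"] * int(page.in_sitemap)
        + WEIGHTS["internal_link"] * page.internal_links
        + WEIGHTS["https"] * int(page.is_https)
    )


def select_canonical(cluster):
    """Pick the URL with the strongest combined signals."""
    return max(cluster, key=score)


# Two duplicate URLs in one cluster.
page_a = PageSignals("https://example.com/shoes", in_sitemap=True,
                     canonical_tags_pointing_here=2, internal_links=4)
page_b = PageSignals("https://example.com/shoes/", internal_links=6)
first_choice = select_canonical([page_a, page_b]).url

# New signals arrive: two redirects now point at the second URL.
page_b.redirects_pointing_here = 2
second_choice = select_canonical([page_a, page_b]).url
```

In this toy run the first URL is selected initially, and the second URL takes over once the redirects are added. That mirrors the situation described above, where a previously canonical page drops out of search results after the signals shift.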

3) Quality

The final category is about pages being actively removed by Google.

What are these types of errors?

These indexing states are split into 3 groups based on the signals Google collects around pages over time:

Crawled - currently not indexed: The page has either been discovered and crawled but not indexed, OR a historically indexed page has been actively removed from Google’s search results.
Discovered - currently not indexed: A new page has been discovered but not yet crawled, OR Google is actively forgetting a historically indexed page.
URL is unknown to Google: The page has never been seen by Google, OR Google has actively forgotten a historically crawled and indexed page.

Why are pages grouped into this category?

Google is actively removing these pages from its search results and index.

Our research has found that nearly 80% of pages in the 'crawled - currently not indexed' index coverage state were historically crawled and indexed.

But that's not all.

Our research into how Google manages its index highlights a mechanism that uses page quality to decide whether pages are removed from search results.

But that's not all.

We also found that index coverage states indicate crawl priority in Google's architecture, and that this crawl priority is based on page quality.

But that's not all.

Our 130-Day Indexing Rule research has identified that Google actively removes pages from its search results. If an indexed page has not been recrawled in 130 days, there is a 99% chance it will be changed to Not Indexed.

But that's not all.

Our 190-Day Indexing Rule research revealed a critical discovery: Google doesn't merely remove pages from its index, it actively forgets them. When a page goes 190 days without being recrawled, Google purges it from memory entirely, marking it as 'URL is unknown to Google'.
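The two thresholds can be turned into a simple check on a page's last crawl date. The sketch below is a rough illustration: the 130-day and 190-day values come from our research described above, not from Google documentation, and the function name `recrawl_risk` and its return labels are hypothetical.

```python
from datetime import date, timedelta

# Thresholds from the 130-Day and 190-Day Indexing Rule research (not official Google values).
NOT_INDEXED_AFTER_DAYS = 130
FORGOTTEN_AFTER_DAYS = 190


def recrawl_risk(last_crawled, today=None):
    """Bucket a page by the number of days since Googlebot last crawled it."""
    today = today or date.today()
    days_since_crawl = (today - last_crawled).days

    if days_since_crawl >= FORGOTTEN_AFTER_DAYS:
        return "likely 'URL is unknown to Google'"
    if days_since_crawl >= NOT_INDEXED_AFTER_DAYS:
        return "likely Not Indexed"
    return "within recrawl window"


# Example: a page last crawled 150 days before the reference date.
reference_day = date(2025, 1, 1)
print(recrawl_risk(reference_day - timedelta(days=150), today=reference_day))
```

The last crawl date for a URL is shown in the URL Inspection tool in Google Search Console, so a check like this could be run over a list of inspected pages to spot the ones drifting towards removal.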

All of this research points to the same thing: The 3 index coverage states are a strong indicator of page quality and crawl priority.

If your website has a high proportion of indexed pages, this signals that you've exceeded Google's page quality benchmark.

However, if your website has a high number of 'crawled - currently not indexed' pages, this indicates a low-quality website, and Google is actively removing its pages from search results.

Finally, if you've got a lot of pages in the 'URL is unknown to Google' state, it's a strong indication these pages have zero crawl priority.

Summary

There are 3 categories of Not Indexed pages: technical, duplicate content, and quality.

Technical barriers and duplicate content issues are generally within your control to fix through standard optimisation practices.

Quality issues, however, require deeper analysis and often signal more significant problems with how your content meets user and search engine expectations.

Regularly monitoring your indexation status is crucial to identifying which category your Not Indexed pages fall into and taking appropriate action.
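As a starting point for that monitoring, the grouping used in this article can be written as a small lookup. This is a minimal sketch, assuming you already have a list of Not Indexed reasons (for example, copied from the Page Indexing report). The names `CATEGORY_BY_STATE`, `categorise`, and `summarise` are ours, and the mapping simply mirrors the three categories above.

```python
from collections import Counter

# Mapping of Page Indexing reasons to the 3 categories used in this article.
CATEGORY_BY_STATE = {
    "Server error (5xx)": "Technical",
    "Redirect error": "Technical",
    "URL blocked by robots.txt": "Technical",
    "URL marked 'noindex'": "Technical",
    "Soft 404": "Technical",
    "Blocked due to unauthorized request (401)": "Technical",
    "Not found (404)": "Technical",
    "Blocked due to access forbidden (403)": "Technical",
    "URL blocked due to other 4xx issue": "Technical",
    "Page with redirect": "Technical",
    "Alternate page with proper canonical tag": "Duplicate content",
    "Duplicate without user-selected canonical": "Duplicate content",
    "Duplicate, Google chose different canonical than user": "Duplicate content",
    "Crawled - currently not indexed": "Quality",
    "Discovered - currently not indexed": "Quality",
    "URL is unknown to Google": "Quality",
}


def normalise(state):
    """Smooth over dash, quote, spacing, and case differences in a reason label."""
    for dash in ("\u2014", "\u2013"):      # em dash, en dash
        state = state.replace(dash, " - ")
    for quote in ("\u2018", "\u2019"):     # curly single quotes
        state = state.replace(quote, "'")
    return " ".join(state.split()).lower()


_LOOKUP = {normalise(state): category for state, category in CATEGORY_BY_STATE.items()}


def categorise(state):
    """Return the category for a reason, or 'Uncategorised' if it isn't listed."""
    return _LOOKUP.get(normalise(state), "Uncategorised")


def summarise(states):
    """Count how many reasons fall into each category."""
    return Counter(categorise(state) for state in states)


# Example: one reason per affected URL (made-up sample).
sample = ["Soft 404", "Not found (404)", "Crawled - currently not indexed"]
print(summarise(sample))
```

Normalising the labels before the lookup is a small design choice that keeps the mapping tolerant of the different dash and quote styles these reason names are often written with.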
