Why Does This Happen?
A page disallowed by `robots.txt` can still appear in search results, for example, if other pages link to it. This can happen when:

- The page doesn't have a `noindex` rule defined; or
- The crawler can't see the `noindex` rule because the `robots.txt` file blocks access.
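For example, a `robots.txt` rule like the following stops compliant crawlers from fetching the page at all, so they never get to see any `noindex` rule defined on it (the path is hypothetical, used only for illustration):

```
# hypothetical path used for illustration
User-agent: *
Disallow: /members-area/
```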
To check if this is happening on Google, you can look at the "Page indexing" section on the URL Inspection page in Google Search Console. If the page is blocked by `robots.txt`, it will display an "Indexed, though blocked by robots.txt" warning.
On this page, you can also see where Google found the URL and verify whether indexing is allowed despite the page being disallowed by `robots.txt`. For example, check the "Page indexing" section for details like:

```
Referring page: https://www.example.com/some-referrer
...
Crawling allowed? No: Blocked by robots.txt file
Page fetch: Error: Blocked by robots.txt file
Indexing allowed? Yes
```
How to Fix the Issue?
To prevent this from happening, you need to make sure of the following:
- Page Has `noindex` Rule

  Apply the `noindex` rule to tell search engines (like Google) not to index the page's content. You can do this in two ways:

  - Using a `<meta>` Tag

    If you only have access to the frontend of your page, you can simply add the following `<meta>` tag in the `<head>` of your page:

    ```html
    <meta name="robots" content="noindex" />
    ```

    This applies the `noindex` rule to all crawlers that honor it. To also tell crawlers not to follow the links on the page, you can add the `nofollow` rule:

    ```html
    <meta name="robots" content="noindex,nofollow" />
    ```
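    For instance, a minimal page with the rule in place might look like the following sketch (the title and content are placeholders):

    ```html
    <!DOCTYPE html>
    <html>
      <head>
        <!-- tells compliant crawlers not to index this page or follow its links -->
        <meta name="robots" content="noindex,nofollow" />
        <!-- placeholder title -->
        <title>Members Area</title>
      </head>
      <body>
        <!-- placeholder for content that shouldn't be indexed -->
      </body>
    </html>
    ```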
  - Using HTTP Response Header

    If you have access to the backend of your page, you can send the following `X-Robots-Tag` response header to the client:

    ```http
    X-Robots-Tag: noindex
    ```

    This response header applies the `noindex` rule to all crawlers that honor it. To also tell crawlers not to follow the links on the page, you can add the `nofollow` rule:

    ```http
    X-Robots-Tag: noindex, nofollow
    ```
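    How you send this header depends on your stack; as one example, on an Apache server with `mod_headers` enabled, a snippet like the following in your `.htaccess` file should work (the filename is hypothetical; adjust it to match your page):

    ```apacheconf
    # hypothetical filename used for illustration
    <Files "members-area.html">
      Header set X-Robots-Tag "noindex, nofollow"
    </Files>
    ```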
- Page Isn't Blocked By `robots.txt` File

  You must ensure that the `robots.txt` file doesn't block the page; otherwise, crawlers can't see the `noindex` rule. You can do so by removing the `Disallow` directives for the page from your `robots.txt` file.
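  For example, if your `robots.txt` file contains a rule like the following (using the same hypothetical path as before):

  ```
  User-agent: *
  # removing (or narrowing) this line lets crawlers fetch the page
  # and see its noindex rule
  Disallow: /members-area/
  ```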
Following these steps ensures that crawlers can actually see the `noindex` rule, and instructs search engines, like Google, not to display the page in search results.