Why Does a Robots.txt Blocked Page Still Show in Search Results?

Why Does This Happen?

A page disallowed by robots.txt can still appear in search results, for example, when other pages link to it. This is because robots.txt only controls crawling, not indexing, and it can happen when:

  1. The page doesn't have a noindex rule defined; or
  2. The crawler can't see the noindex rule because the robots.txt file blocks it from fetching the page.

To check if this is happening on Google, you can look at the "Page indexing" section on the URL Inspection page in Google Search Console. If the page is indexed despite being blocked by robots.txt, it will display an "Indexed, though blocked by robots.txt" warning.

On this page, you can also see where Google found the URL and confirm that indexing is allowed even though crawling is disallowed by robots.txt. For example, check the "Page indexing" section for details like:

Referring page: https://www.example.com/some-referrer
...
Crawling Allowed? No: Blocked by robots.txt file
Page fetch: Error: Blocked by robots.txt file
Indexing allowed? Yes

How to Fix the Issue?

To prevent this from happening, you need to make sure of the following:

  1. Page Has noindex Rule

    Apply the noindex rule to tell search engines (like Google) not to index the page's content. You can do this in two ways:

    1. Using <meta> Tag

      If you only have access to the frontend, you can simply add the following <meta> tag in the <head> of your page:

      <meta name="robots" content="noindex" />
      

      This applies the noindex rule to all crawlers that honor it. To also prevent the crawler from following links on the page, you can add the nofollow rule:

      <meta name="robots" content="noindex,nofollow" />
      
    2. Using HTTP Response Header

      If you have access to the backend of your page, you can send the following X-Robots-Tag response header to the client:

      X-Robots-Tag: noindex
      

      This response header applies the noindex rule to all crawlers that honor it; a minimal backend sketch follows this list. To also prevent the crawler from following links on the page, you can add the nofollow rule:

      X-Robots-Tag: noindex, nofollow
      
  2. Page Isn't Blocked By robots.txt File

    You must ensure that the robots.txt file doesn't block the page; otherwise, crawlers can never fetch it and will never see its noindex rule. You can do so by removing the "Disallow" directives that match the page's URL, as shown in the robots.txt example after this list.
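How exactly you send the X-Robots-Tag header depends on your backend. As a minimal sketch, this is how it could look with a Node.js Express server written in TypeScript; Express, the /private path, and the port are illustrative assumptions, not part of the steps above:

    import express from "express";

    const app = express();

    // Hypothetical section of the site that should stay out of search results.
    app.use("/private", (_req, res, next) => {
      // Ask crawlers that honor the header not to index the response
      // or follow the links it contains.
      res.setHeader("X-Robots-Tag", "noindex, nofollow");
      next();
    });

    app.get("/private", (_req, res) => {
      res.send("This content should not appear in search results.");
    });

    app.listen(3000);

Any server, framework, or CDN that lets you set response headers can send the same X-Robots-Tag value; the header is particularly handy for non-HTML resources (such as PDFs), where a <meta> tag isn't an option.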
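As for the robots.txt change itself, the idea looks roughly like this, with /private/ as a hypothetical path:

    # Before: this rule blocks crawling, so crawlers never get to see
    # the page's noindex rule
    User-agent: *
    Disallow: /private/

    # After: the matching rule is removed (an empty Disallow value
    # disallows nothing), so the page can be crawled and its noindex
    # rule honored
    User-agent: *
    Disallow: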

Following these steps instructs search engines, like Google, not to index the page, so it no longer appears in search results and the "Indexed, though blocked by robots.txt" warning is resolved.
