Why Does This Happen?
A page disallowed by `robots.txt` can still appear in search results, for example, if other pages link to it. This can happen when:

- The page doesn't have a `noindex` rule defined; or
- The crawler can't see the `noindex` rule because the `robots.txt` file blocks access.
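For example, a `robots.txt` rule like the following stops compliant crawlers from fetching the page at all, so they never get to see any `noindex` rule defined on it (the path is hypothetical, used only for illustration):

```
# hypothetical path used for illustration
User-agent: *
Disallow: /members-area/
```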
To check if this is happening on Google, you can look at the "Page indexing" section on the URL Inspection page in Google Search Console. If the page is blocked by `robots.txt`, it will display an "Indexed, though blocked by robots.txt" warning.
On this page, you can also see where Google found the URL and verify whether indexing is allowed despite the page being disallowed by `robots.txt`. For example, check the "Page indexing" section for details like:

```
Referring page: https://www.example.com/some-referrer
...
Crawling allowed? No: Blocked by robots.txt file
Page fetch: Error: Blocked by robots.txt file
Indexing allowed? Yes
```
How to Fix the Issue?
To prevent this from happening, you need to make sure of the following:
- Page Has `noindex` Rule

  Apply the `noindex` rule to tell search engines (like Google) not to index the page's content. You can do this in two ways:

  - Using a `<meta>` Tag

    If you only have access to the frontend of your page, you can simply add the following `<meta>` tag in the `<head>` of your page:

    ```html
    <meta name="robots" content="noindex" />
    ```

    This applies the `noindex` rule to all crawlers that honor it. To also tell crawlers not to follow the links on the page, you can add the `nofollow` rule:

    ```html
    <meta name="robots" content="noindex,nofollow" />
    ```
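    For instance, a minimal page with the rule in place might look like the following sketch (the title and content are placeholders):

    ```html
    <!DOCTYPE html>
    <html>
      <head>
        <!-- tells compliant crawlers not to index this page or follow its links -->
        <meta name="robots" content="noindex,nofollow" />
        <!-- placeholder title -->
        <title>Members Area</title>
      </head>
      <body>
        <!-- placeholder for content that shouldn't be indexed -->
      </body>
    </html>
    ```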
  - Using HTTP Response Header

    If you have access to the backend of your page, you can send the following `X-Robots-Tag` response header to the client:

    ```http
    X-Robots-Tag: noindex
    ```

    This response header applies the `noindex` rule to all crawlers that honor it. To also tell crawlers not to follow the links on the page, you can add the `nofollow` rule:

    ```http
    X-Robots-Tag: noindex, nofollow
    ```
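    How you send this header depends on your stack; as one example, on an Apache server with `mod_headers` enabled, a snippet like the following in your `.htaccess` file should work (the filename is hypothetical; adjust it to match your page):

    ```apacheconf
    # hypothetical filename used for illustration
    <Files "members-area.html">
      Header set X-Robots-Tag "noindex, nofollow"
    </Files>
    ```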
- Page Isn't Blocked By `robots.txt` File

  You must ensure that the `robots.txt` file doesn't block the page; otherwise, crawlers can't see the `noindex` rule. You can do so by removing the `Disallow` directives for the page from your `robots.txt` file.
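  For example, if your `robots.txt` file contains a rule like the following (using the same hypothetical path as before):

  ```
  User-agent: *
  # removing (or narrowing) this line lets crawlers fetch the page
  # and see its noindex rule
  Disallow: /members-area/
  ```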
Following these steps ensures that crawlers can actually see the `noindex` rule, and instructs search engines, like Google, not to display the page in search results.