Webmasters Asked on December 29, 2021
I am trying to learn SEO, and I am honestly confused. I have the following 8 questions.
1. Do I tell a bot not to visit a certain link through X-Robots-Tag, through the robots meta tag, or through robots.txt?
2. Is it OK to include all 3 (robots.txt, robots meta tag, and X-Robots-Tag header), or should I always provide only 1?
3. Do I get penalized if I show the same info in X-Robots-Tag, in the robots meta tag, and in robots.txt?
4. Let's say for /test1 my robots.txt says Disallow, but my robots meta tag says index, follow and my X-Robots-Tag says index, nofollow, noarchive. Do I get penalized if those values are different?
5. Let's say for /test1 my robots.txt says Disallow, but my robots meta tag says index, follow and my X-Robots-Tag says index,nofollow,noarchive. Which rule will be followed by the bot? Which one takes precedence?
6. Let's say my robots.txt has a rule saying Disallow: / and Allow: /link_one/link_two, and my X-Robots-Tag and robots meta tag for every link except /link_one/link_two say nofollow,noindex,noarchive. From what I understand, the bot will never get to /link_one/link_two since I prevented it from crawling at the root level. Now if I provide a sitemap.xml in the robots.txt that lists /link_one/link_two, will it actually end up being crawled?
7. Will a bot crawl into the directory provided by sitemap.(xml/txt) even though it is not accessible through the home page or any pages following the home page?
8. Overall, I would appreciate some clarification on the difference between robots.txt, X-Robots-Tag, the robots meta tag, and sitemap.(xml/txt). To me they seem like they do the exact same thing.
I already saw that there are some questions that answer a small subset of what I asked. But I want the whole big explanation.
While X-Robots-Tag and meta robots are equivalent, robots.txt is different. The first two control indexing, while robots.txt controls crawling/visiting.
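To make the equivalence concrete, here is a minimal Python sketch of how a crawler might collect robots directives from either place once it has fetched a page. The function name robots_directives and the sample header and HTML are hypothetical, made up for illustration. It is a simplification: it ignores user-agent-specific forms such as "googlebot: noindex", and it simply merges whatever it finds rather than modelling how any particular search engine resolves conflicts.

```python
from html.parser import HTMLParser


def _split(value):
    """Split a comma-separated directive string into a lowercase set."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class _MetaRobotsParser(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            self.directives |= _split(attr.get("content") or "")


def robots_directives(headers, html):
    """Return the directives found in the HTTP headers and in the HTML.

    headers: dict of response header names to values (hypothetical input).
    html:    the response body as a string.
    """
    found = set()
    # HTTP level: the X-Robots-Tag response header.
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            found |= _split(value)
    # HTML level: the robots meta tag carries the same kind of directives.
    parser = _MetaRobotsParser()
    parser.feed(html)
    return found | parser.directives


if __name__ == "__main__":
    headers = {"X-Robots-Tag": "noindex, noarchive"}
    page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
    print(sorted(robots_directives(headers, page)))
```

Both sources are only available after the page has been requested, which is why a URL blocked in robots.txt never gets this far.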
1. Tell bots not to visit a URL by using robots.txt.
2. Use only one of the three for each URL. Using both X-Robots-Tag and meta robots on a URL is redundant because they are equivalent. Using robots.txt together with either of the others can cause issues: robots.txt blocks crawling, and a bot must crawl the page to even see the other two, since they are document-level directives.
3. You don't get penalized, but it doesn't do anything more than just listing the page in robots.txt. As I said in 2, robots.txt blocks a page from being crawled, but a bot would need to crawl the page to see either of the other two, so it can't see them.
4. robots.txt prevents the bot from crawling and finding the other two.
5. robots.txt prevents the bot from crawling and finding the other two.
6. At least for Google, "the most specific rule based on the length of the [path] entry trumps the less specific (shorter) rule." This means that your robots.txt file will allow crawling of /link_one/link_two, even though you disallow /.
7. See The Sitemap Paradox. The short answer is that if your site can't be properly crawled without the sitemap, it may run into SEO issues anyway. In other words, that could cause a problem.
8. X-Robots-Tag and meta robots are exactly equivalent; one does it at the HTTP level, and the other does the same thing at the HTML level. They prevent indexing; they don't prevent crawling. Use them for pages you don't want in search results. In contrast, robots.txt prevents crawling, but not indexing. Use robots.txt for pages that you don't want bots to waste time crawling, but whose appearance in search results wouldn't be catastrophic (Google is known to index URLs without crawling them if it considers them important enough). If you use both robots.txt and one of the others, robots.txt will prevent the bot from visiting the page and ever seeing the others, rendering them useless.
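The Google precedence rule quoted above (the longest matching path wins) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's implementation: the function name is_allowed and the (kind, prefix) rule format are hypothetical, wildcards are ignored, and ties between equally long rules are resolved in favour of allow, which is how I understand Google's documented behaviour. Note that Python's built-in urllib.robotparser does not apply this longest-match rule, so it is deliberately not used here.

```python
def is_allowed(path, rules):
    """Decide whether `path` may be crawled under longest-match precedence.

    rules: list of (kind, prefix) tuples, where kind is "allow" or
           "disallow" and prefix is the path from the robots.txt line.
           This format is made up for the sketch.
    """
    best = None  # the most specific rule matching so far
    for kind, prefix in rules:
        # An empty prefix (e.g. "Disallow:") matches nothing.
        if not prefix or not path.startswith(prefix):
            continue
        longer = best is None or len(prefix) > len(best[1])
        # Assumption: on a tie, the less restrictive rule (allow) wins.
        tie_allow = (
            best is not None
            and len(prefix) == len(best[1])
            and kind == "allow"
        )
        if longer or tie_allow:
            best = (kind, prefix)
    # No matching rule means the path is crawlable by default.
    return best is None or best[0] == "allow"


if __name__ == "__main__":
    # The rules from question 6: Disallow: / and Allow: /link_one/link_two
    rules = [("disallow", "/"), ("allow", "/link_one/link_two")]
    for path in ("/", "/link_one", "/link_one/link_two", "/link_one/link_two/page"):
        print(path, is_allowed(path, rules))
```

With the rules from the question, only /link_one/link_two and the paths beneath it are crawlable; everything else falls under the shorter Disallow: / rule.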
Answered by Maximillian Laumeister on December 29, 2021