If you’re in SEO these days, however suspicious of Google you may be, you still have to keep up with what the company tells the webmaster community so that you can give the best recommendations to your clients. For me, one of the main places I go for this information is Barry Schwartz’s site, Search Engine Roundtable. During one of my reading sessions last week, I saw Barry’s article about John Mueller’s Google Webmaster Central Hangout, and I bookmarked it to watch later because the video was almost an hour long. Yesterday, I finally got around to watching it, and I noticed a couple of apparent contradictions between what John said and what his colleague Gary Illyes said a month earlier here.
Google Robots.txt Contradiction #1
At around the 28:25 mark, John says specifically that the rule “Allow: .js” espoused by Gary would only affect URLs that begin with .js (for example, /.jsSampleURL). To unblock .js files, you’d need to include the wildcard operator (*), making the rule “Allow: *.js”. Interestingly, Gary does mention using the wildcard later in his Stack Overflow comment, but he omits it from his “catch-all” rule for some reason.
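To make the two versions concrete, here’s how each would look in an actual robots.txt file (the User-agent line is just boilerplate I’ve added for context):

```
User-agent: Googlebot
# Gary's catch-all version, without the wildcard:
Allow: .js

# John's version, with the wildcard operator in front:
Allow: *.js
```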
Which format is correct in this case?
Testing This Using Search Console
Luckily for us mere mortals who don’t have regular access to Google’s engineers, Google’s Search Console includes a robots.txt Tester, where you can make mock edits to your current robots.txt file and test them against specific URLs to see how your changes would affect the site in question.
For this test, I used two example URLs: http://www.example.com/.jsSampleURL and http://www.example.com/sample.js.
Test #1 Rule
In this test, Gary seems to believe that the wildcard-free rule will block http://www.example.com/sample.js, while John thinks you need the wildcard (*) before .js to have that effect. For his part, John said that this rule should block http://www.example.com/.jsSampleURL.
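To reproduce the test in the tool, a mock robots.txt along these lines does the job (the User-agent line is boilerplate, and the pattern is applied as a Disallow so the tester returns a clear blocked/allowed verdict):

```
User-agent: *
# The wildcard-free pattern under test:
Disallow: .js
```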
Test #1 Winner – NEITHER!
The testing tool showed that *neither* URL was blocked, so both John AND Gary are partially wrong here. The likely explanation is that the tool treats the rule’s value as a literal path prefix; since every URL path begins with a forward slash, a pattern that starts with .js can never match anything. Very interesting.
But what happens when you add the wildcard like John said?
Test #2 Rule
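Here’s the same mock file with the wildcard operator added to the front of the pattern:

```
User-agent: *
# The same pattern, now with the wildcard in front:
Disallow: *.js
```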
Test #2 Winner – John Mueller!
Both URLs are now blocked! The wildcard makes all the difference, just like John said.
Contradiction #2 – Not Really a Contradiction
The second point concerns how “Allow” rules interact with “Disallow” rules on subfolders when you’re trying to unblock resources like .js files, and here John and Gary turn out to agree with each other and with Google’s guidelines.
Testing Google’s Guidelines Again
Test #1 Rules
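A rule set along these lines reproduces the first test: a blanket Disallow on a subfolder paired with a site-wide Allow for .js files (the /deep/ folder name is just a placeholder I’m using for illustration):

```
User-agent: *
# Block an entire subfolder (placeholder name):
Disallow: /deep/
# Site-wide Allow for .js files:
Allow: /*.js
```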
Test #2 Rules
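For the second test, the subfolder is added to the Allow rule itself. Since Google honors the longest, most specific matching rule, this longer Allow should now win out over the Disallow for a resource like /deep/sample.js:

```
User-agent: *
Disallow: /deep/
# Adding the subfolder makes the Allow rule the longer,
# more specific match for .js files in that folder:
Allow: /deep/*.js
```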
Testing Outcome – Google’s Guidelines Confirmed!
As John and Gary said, the testing tool showed that the resource would not be crawlable in Test #1 due to the first rule. However, when we add the subfolder to the second (Allow) rule, the tool reports the resource as crawlable: the longer, more specific Allow now outranks the Disallow.
There are a few things I learned from this experience:
- Make sure to include wildcards when using the method Gary Illyes described.
- Make sure to include separate “Allow” rules for each subfolder that contains the files you are trying to unblock.
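Putting those two lessons together in a single file (with the same placeholder folder name) looks something like this:

```
User-agent: *
# Block the subfolder, but let its .js files through (Lesson 2):
Disallow: /deep/
Allow: /deep/*.js
# Catch-all for .js files elsewhere, wildcard included (Lesson 1):
Allow: /*.js
```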
Last, and most importantly, robots.txt is a complicated business. Make sure that you have an experienced SEO or web developer looking at these issues for you if you can’t get Google to crawl your site.