Did you know that if you don’t have an actual robots.txt
file, WordPress will create a virtual one for you? For example, I have not created a robots file for trepmal.com, yet you can see one at https://trepmal.com/robots.txt.
So either you’ve created your own file, or you’re relying on the virtual one. Unless you’ve explicitly disabled WordPress’s virtual robots.txt file, you’ll have something at yoursite.com/robots.txt.
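If you’re curious which one you’re serving, you can just request the file. On a stock install, the virtual file typically looks something like this (the exact contents vary by WordPress version and privacy settings):

$ curl -s https://yoursite.com/robots.txt
User-agent: *
Disallow: /wp-admin/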
However, if you (1) use nginx, (2) followed certain popular guidelines* for configuring your site, and (3) are relying on the virtual file, you might discover that you get a 404 when you try to view your robots file.
The troubling part of the nginx configuration looks like this:
location = /robots.txt {
    access_log off;
    log_not_found off;
}
What’s inside isn’t the problem. It’s pretty basic: whether the request is found (access_log) or not (log_not_found), don’t log it (off;).
The problem is with this: location = /robots.txt
That equals sign means that when the request matches exactly, nginx applies only these rules and nothing else. When you have a physical robots file, there’s no problem – it just gets served up, plain and simple. But if you don’t, there’s nothing in that block that lets WordPress handle the request, so you get a 404.
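For context, those same guides send everything else through a catch-all along these lines (a sketch of the relevant piece, not a full config):

location / {
    try_files $uri $uri/ /index.php?$args;
}

An exact-match location always wins, so a request for /robots.txt never reaches that catch-all, and WordPress never gets a chance to answer.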
Here’s some irony for you: The block that prevents logging a 404 will actually cause it.
There are two easy options for fixing this.
1. Pass the request on to WordPress
(2014/7/18: Updated because If is Evil. Thanks Mike Little!)
location = /robots.txt {
    # check for a physical robots.txt first; if there isn't one,
    # fall back to index.php so WordPress can serve the virtual file
    try_files $uri $uri/ /index.php?$args;
    access_log off;
    log_not_found off;
}
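The order of try_files is what makes this work either way: $uri serves a physical robots.txt if one exists, and the /index.php fallback hands the request to WordPress so it can generate the virtual one.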
2. Remove it altogether
This will let the regular rules take care of the request, but your access.log might get bloated with hits from bots.
And don’t forget to reload nginx when you’re done making any changes to the conf file.
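The exact commands depend on your distro, but on a systemd-based setup it’s usually something like this:

$ sudo nginx -t
$ sudo systemctl reload nginx

nginx -t checks the configuration for syntax errors before you reload, and a quick curl -I yoursite.com/robots.txt afterward should come back 200 instead of 404.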
When faced with the 404 on a single-install site, it is perfectly fine to just create the robots.txt file. But on a multisite, where each subsite serves its own virtual robots.txt, a single physical file is a less acceptable solution, and really WordPress should be handling the request.
I use a slightly better version than an if test (see http://wiki.nginx.org/IfIsEvil)
Thanks! I’ve updated the post accordingly.
Thanks for the solutions!
Tested here
Pardon the most basic question of all: In what location and file do I place this fix? BTW, it’s exactly what I’m needing. Thank you for your great post!
This will depend on the configuration of your server. On CentOS, there’s likely to be a site conf file in /etc/nginx/conf.d/; on Ubuntu, it might be in /etc/nginx/sites-available/.
Great post! This problem was driving me nuts. I had the
location = /robots.txt {
    access_log off;
    log_not_found off;
}
working fine on a single-site WordPress installation somehow. On my second installation – a multisite – it always led to the 404. Thanks to the hint to include the
try_files $uri $uri/ /index.php?$args;
it now finally works! I have no idea why it worked on the single-site installation, though.
Thank you!
Many thanks for this article!
In the past I used to use:
location = /robots.txt {
    allow all;
    log_not_found off;
    access_log off;
}
Everything went wrong when it came to https://xetaitot.com/robots.txt: it returned a 404 error. This was happening and I didn’t know it until I happened to log into Google’s webmaster tools!