Nginx, robots.txt, and Copy-Pasta

Did you know that if you don’t have an actual robots.txt file, WordPress will create a virtual one for you? For example, I have not created a robots file for trepmal.com, yet you can see one at https://trepmal.com/robots.txt.

So either you’ve created your own file, or you’re relying on the virtual one. Either way, unless you’ve explicitly disabled WordPress’s virtual robots.txt, you’ll have something at yoursite.com/robots.txt.
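For reference, the virtual file WordPress generates is minimal. The default output looks something like this (the exact contents vary by WordPress version and your privacy settings):

User-agent: *
Disallow: /wp-admin/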

However, if you (1) use nginx, (2) followed certain popular guidelines* for configuring your site, and (3) are relying on the virtual file, you might discover that you get a 404 when you try to view your robots file.

The troubling part of the nginx configuration looks like this:

location = /robots.txt {
    access_log off;
    log_not_found off;
}

What’s inside the block isn’t the problem. It’s pretty basic: whether the file is found (access_log) or not (log_not_found), don’t log the request (off).

The problem is with this: location = /robots.txt

That equals sign makes this an exact-match location: when the request matches, nginx applies only this block and nothing else. When you have a real robots.txt file, there’s no problem – nginx serves it straight from disk, plain and simple. But if you don’t, there’s nothing in that block (no try_files, no fastcgi_pass) that lets WordPress handle the request, so you get a 404.
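For comparison, here’s the kind of catch-all block those same guides use for everything else – your configuration may vary, but some equivalent of it is what normally routes requests to WordPress:

location / {
    try_files $uri $uri/ /index.php?$args;
}

Because location = /robots.txt is an exact match, nginx prefers it over this block, and the request never reaches index.php.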

Here’s some irony for you: The block that prevents logging a 404 will actually cause it.

There are two easy options for fixing this.

1. Pass the request on to WordPress
(2014/7/18: Updated because If is Evil. Thanks Mike Little!)

location = /robots.txt {
    try_files $uri $uri/ /index.php?$args;
    access_log off;
    log_not_found off;
}
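With try_files in place, nginx first checks for a real robots.txt on disk ($uri), then for a directory by that name ($uri/), and only then hands the request off to /index.php, where WordPress can generate the virtual file. You keep the quiet logs either way.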

2. Remove it altogether

This will let the regular rules take care of the request, but your access.log might get bloated with hits from bots.

And don’t forget to reload nginx when you’re done making any changes to the conf file.
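Assuming you have shell access, something like this will do it (the exact commands vary by distribution):

# check the config for syntax errors, then reload
sudo nginx -t && sudo nginx -s reload

# or, on systemd-based systems:
sudo systemctl reload nginx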

When faced with the 404 on a single-install site, it’s perfectly fine to just create a real robots.txt file. But on multisite, that solution is less acceptable – every site in the network would get the same static file – and really, WordPress should be handling the request.

* Such as this or this or this or this or this

7 thoughts on “Nginx, robots.txt, and Copy-Pasta”

1. Pardon the most basic question of all: what location and file do I place this fix in? BTW, it’s exactly what I’m needing, and thank you for your great post!

1. This will depend on the configuration of your server. On CentOS, there’s likely to be a site conf file in /etc/nginx/conf.d/; on Ubuntu, it might be in /etc/nginx/sites-available/.

  2. Great post! This problem was driving me nuts. I had the

    location = /robots.txt {
    access_log off;
    log_not_found off;
    }

working fine on a single-site WordPress installation somehow. On my second installation – a multisite – it always led to the 404. Thanks to the hint to include the

    try_files $uri $uri/ /index.php?$args;

it now finally works! I have no idea why it worked on the single-site installation, though.
