robots.txt management
With the plainfiles module in place, adding robots.txt was almost a natural next step. There are a few benefits to running this text file through the framework instead of keeping it as a plain file on the server.
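As a very rough sketch of the idea (the actual plainfiles module works differently, and the paths and rules below are made up), serving robots.txt through the framework boils down to catching the request in the front controller and answering it with the right content type:

<?php
// Hypothetical front-controller snippet: instead of letting the web server
// deliver a static robots.txt, the request is routed to the framework and
// answered here.
$requestPath = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

if ($requestPath === '/robots.txt') {
    header('Content-Type: text/plain; charset=utf-8');

    // The body could come from a template, a configuration file or the
    // database; here it is simply built in place.
    echo "User-agent: *\n";
    echo "Disallow: /admin/\n";
    exit;
}

// ... otherwise continue with the normal framework dispatching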
It is just there
There is no need to worry about a robots.txt file. If I set up a new environment for a client, the file is there automatically; nothing to upload or copy and paste. It is simply there.
Sitemap.xml coverage
Because the framework also creates a sitemap.xml file (and has done so almost from the get-go), it is automatically referenced in the robots.txt file.
Again, there is no need to worry about anything like submitting the sitemap to search engines manually. Once a search engine picks up a website and its robots.txt, it also gets the sitemap.xml location.
Unless there is a search engine that does not understand the Sitemap directive, but that is not a problem we can solve here.
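For completeness, sitemap coverage is just one extra line in the generated robots.txt body, using the standard Sitemap directive; a small sketch, where $baseUrl and the Disallow rule are placeholders:

<?php
// Sketch: appending the Sitemap directive to the robots.txt body.
// In practice $baseUrl would come from the framework's configuration.
$baseUrl = 'https://www.example.com';

$robotsTxt  = "User-agent: *\n";
$robotsTxt .= "Disallow: /admin/\n";
$robotsTxt .= 'Sitemap: ' . $baseUrl . "/sitemap.xml\n";

header('Content-Type: text/plain; charset=utf-8');
echo $robotsTxt;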
Logging
I can run all robots.txt requests through the logging class. Actually, the plainfiles module already does this without any special handling. Sweet!
With the logging I can not only see when the spiders pick up the file but, more importantly, who picks it up. Because I can filter well-known addresses out of the logs, I can spot any suspicious requests.
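Roughly, the logging could look like the following with Zend_Log; the log file path and the message format are my own choices for this sketch, not what the module actually uses:

<?php
// Sketch: logging who requested robots.txt, using Zend_Log.
require_once 'Zend/Log.php';
require_once 'Zend/Log/Writer/Stream.php';

$writer = new Zend_Log_Writer_Stream('/var/log/app/robots.log');
$logger = new Zend_Log($writer);

$userAgent = isset($_SERVER['HTTP_USER_AGENT'])
    ? $_SERVER['HTTP_USER_AGENT']
    : 'unknown agent';

$logger->info(sprintf(
    'robots.txt requested by %s (%s)',
    $_SERVER['REMOTE_ADDR'],
    $userAgent
));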
IP Blocking
Since the plainfiles module also verifies the IP address, i.e. checks whether the request comes from a known malicious address, such events are not only logged appropriately but also passed to the same process as all other malicious requests.
Until now all such requests were handled by the .htaccess file, which did basically nothing besides pointing the request at the local robots.txt. Now, without a local robots.txt, these requests, like almost every other request, are passed to index.php and the framework.
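A much simplified sketch of that flow; the blocklist source and the handleMaliciousRequest() helper are invented for illustration:

<?php
// Sketch: checking the client IP before serving robots.txt.
function handleMaliciousRequest($ip)
{
    // Placeholder for the shared handling of malicious requests
    // (logging, banning, returning a 403, ...).
    header('HTTP/1.1 403 Forbidden');
    echo 'Forbidden';
}

$blocklist = array('203.0.113.15', '198.51.100.27'); // example addresses only
$clientIp  = $_SERVER['REMOTE_ADDR'];

if (in_array($clientIp, $blocklist, true)) {
    // Hand the request over to the same code path as every other
    // malicious request instead of serving the file.
    handleMaliciousRequest($clientIp);
    exit;
}

// ... otherwise serve the robots.txt content as usual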
Management
Last but not least, management is easier because I store the information in an ini configuration file, which can be easily manipulated with Zend_Config. With a small admin form the information can be written to the configuration.
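A sketch of reading and writing such a configuration with Zend_Config_Ini and Zend_Config_Writer_Ini; the file path, section name and option keys are placeholders:

<?php
// Sketch: reading and updating the robots.txt settings stored in an ini file.
require_once 'Zend/Config/Ini.php';
require_once 'Zend/Config/Writer/Ini.php';

$iniFile = '/path/to/application/configs/robots.ini';

$config = new Zend_Config_Ini($iniFile, 'production', array(
    'allowModifications' => true,
));

// Values as they might come in from a small admin form.
$config->disallow = '/admin/';
$config->sitemap  = 'https://www.example.com/sitemap.xml';

$writer = new Zend_Config_Writer_Ini(array(
    'config'   => $config,
    'filename' => $iniFile,
));
$writer->write();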