Should I Be Using A Robots.txt File

by admin on 2009/03/13

I’m writing this article for two reasons: I was asked a good question about the robots.txt file, and I wanted to provide some up-to-date content on the subject, since a Google search mostly turned up articles that are a few years old.

The two main topics for this article are “what is a robots.txt file” and “should I be using a robots.txt file”. If you have read this article and haven’t found the answer to your question, please leave a comment and I’ll update the article. That also applies to all my other posts.

First we’ll cover “what is a robots.txt file”, as this will give you a good understanding of what the file is and does. You should then be able to answer “should I be using a robots.txt file” for yourself.

A robots.txt file is simply a text file that tells the robots crawling your site what to do. If you don’t have one, the robots will just crawl some or all of your pages. I was recently asked: why does the robots.txt file show up as a 404 on the stats page, and is having this as a 404 hurting my site?

There are hundreds of robots crawling the internet every minute, and when they visit a site the first thing they look for is a robots.txt file in the root folder. If they find one, they follow the instructions in the file. The standard procedure for a robot is to request domain.com/robots.txt; if you don’t have the file, that request gets a 404 response, which can then show up in your AWStats. If you don’t want this 404 error in your stats, you can simply create a blank robots.txt file. This won’t hurt your site in any way, but it will stop those 404s from showing up in AWStats.
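To make the lookup concrete, here is a minimal Python sketch of what a well-behaved bot does. It uses the standard library's urllib.robotparser; the helper names robots_url and parse_robots are my own, and the page addresses are just placeholders. It is an illustration under those assumptions, not how any particular search engine is written.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the file always sits in the root folder.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def parse_robots(text):
    """Build a parser from the text of a robots.txt file (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# A blank file contains no rules, so no page is off limits to any bot.
blank = parse_robots("")
print(robots_url("http://domain.com/blog/some-post.html"))
print(blank.can_fetch("Googlebot", "http://domain.com/any-page.html"))
```

In other words, a blank file changes nothing about what gets crawled; it only means the request for /robots.txt finds a file instead of a 404.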

Each bot is identified by a name: Google’s bot is called Googlebot, Yahoo’s is known as Yahoo Slurp, and so on. There are hundreds of different bots out there, and as your website gets more popular you’ll notice bots coming to your site that you didn’t know existed. The robots.txt file gives you some control over the bots that visit your site: you can specify which pages they should ignore and not index.

A lot of sites have a robots.txt file because they carry duplicate content that is meant for their members to view, not for the search engines to pick up and display, so they use the file to tell the search engines not to index certain pages. Duplicate content can hurt your website’s potential and can damage PageRank, SERPs, etc., so if you use it I strongly suggest having a robots.txt file. Be aware that a robots.txt file can’t always stop bots from accessing pages or directories, since following it is voluntary, so you might want to add something to your .htaccess file as well.
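Here is a rough sketch of what such rules look like and how a bot that respects them would read them. The folder names /members/ and /print/ and the bot name SomeOtherBot are made up for illustration, and the check uses Python's standard urllib.robotparser rather than any search engine's own code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the folder names are made up for illustration.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /members/

User-agent: *
Disallow: /members/
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(bot_name, url):
    """Return True if the rules above let bot_name fetch url."""
    return parser.can_fetch(bot_name, url)


# Googlebot has its own group, so only the /members/ rule applies to it.
print(allowed("Googlebot", "http://domain.com/members/profile.html"))
print(allowed("Googlebot", "http://domain.com/print/page.html"))
# Any bot without its own group falls back to the "*" rules.
print(allowed("SomeOtherBot", "http://domain.com/print/page.html"))
print(allowed("SomeOtherBot", "http://domain.com/index.html"))
```

Note that a bot with its own User-agent group follows only that group, which is why the members folder is listed twice. And this only describes what polite bots do; a bot that ignores the file can still request any page.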

I would personally use a robots.txt file on any type of website, whether it is big or small. If search engines like Google look for this file, then it’s worth having even if it is blank.

Remember that if Google sees 404 errors or broken links, it may assume your website is still under construction and not give it any serious weight, so I would also suggest going through your AWStats and fixing the cause of as many of the 404 errors as you can so they don’t happen again.
