12th June 2007

Robots Exclusion Standard

A robots.txt file (often mistakenly called robot.txt) is a plain text file, such as one saved in Notepad's ANSI format, so any simple text editor like Notepad can be used to create it. It tells search engine crawlers (robots) which parts of your website they may visit, and can be used to control how certain areas of your site are crawled or to give instructions to specific search engines.

The file must be placed in the root directory of your website, where your index.html or home page resides, so that it can be reached at an address such as http://www.example.com/robots.txt. Even if you do not need to exclude spiders from any area of your site, you should still create the file, because all the major search engines now request it when they visit.
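Because crawlers request the file from the root of the host, its address can be worked out from any page URL on the site. Below is a minimal Python sketch using the standard library's urllib.parse; the function name robots_url and the example.com addresses are illustrative only:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url.

    Crawlers look for the file at the root of the host, so the path,
    query and fragment of the page URL are discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/blog/2007/06/index.html?p=1"))
# http://www.example.com/robots.txt
```

This is also why a copy placed in a subdirectory, such as /blog/robots.txt, is not consulted: crawlers only ever ask for the root path.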

Some reasons you may need to exclude spiders from your site include:
1. There are some private directories or information that you do not want to be crawled.
2. You’re still fixing parts of the site and some areas may contain error pages.
3. You have optimized certain pages for specific search engines and want to exclude other search engine spiders from indexing them.
4. You want to keep some search engine robots, or email-harvesting bots ("bad bots"), from crawling your pages at all. Bear in mind that the file is a request, not a lock: badly behaved bots can simply ignore it.
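To see how reasons 1 and 4 translate into rules, here is a minimal Python sketch that feeds a hypothetical robots.txt to the standard library's urllib.robotparser and asks which URLs a crawler may fetch. The /private/ directory, the bot name BadBot and the example.com URLs are assumptions for illustration; each record names a spider with User-agent, and * matches any spider not named elsewhere:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep a bot called "BadBot" off the whole site,
# and keep every other crawler out of the /private/ directory.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

for agent, url in [
    ("Googlebot", "http://www.example.com/index.html"),         # allowed
    ("Googlebot", "http://www.example.com/private/notes.html"),  # blocked
    ("BadBot", "http://www.example.com/index.html"),             # blocked
]:
    print(agent, url, parser.can_fetch(agent, url))
```

In this parser, a crawler that matches a named record follows that record only, and * is the fallback for everyone else. can_fetch only reports what the file asks for; a bad bot that never reads robots.txt is not stopped by it.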

Syntax For File Creation
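A basic record consists of two lines: the name of the spider the rule applies to (or * for all spiders), followed by the path of the file or directory to exclude, starting with a slash.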

User-agent: Spider Name
Disallow: File/Directory Name
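The two-line record above can be checked before uploading it. This minimal Python sketch again relies on the standard library's urllib.robotparser; the helper name allowed, the /cgi-bin/ path and the example.com host are hypothetical. It shows that a Disallow value is a path prefix and that an empty Disallow line excludes nothing:

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, agent: str, path: str) -> bool:
    """Ask the standard-library parser whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # Hypothetical host; only the path part is compared against the rules.
    return parser.can_fetch(agent, "http://www.example.com" + path)


# One record: the spider name on the first line, the path on the second.
# Disallow values are path prefixes, so "/cgi-bin/" covers everything below it.
basic = "User-agent: *\nDisallow: /cgi-bin/\n"
print(allowed(basic, "Googlebot", "/cgi-bin/form.pl"))  # False
print(allowed(basic, "Googlebot", "/index.html"))       # True

# An empty Disallow line excludes nothing, giving a permissive file.
permissive = "User-agent: *\nDisallow:\n"
print(allowed(permissive, "Googlebot", "/cgi-bin/form.pl"))  # True
```

The permissive version is a reasonable starting point for a site that has nothing to hide but still wants crawlers to find a robots.txt file.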

Get More Details at: GoArticles.com

This entry was posted on Tuesday, June 12th, 2007 at 3:38 am and is filed under SEO/Search Engine News.
