IT Masala

A Tech Curry with a Pinch of Indian Spice

31st January 2007

How to Control Search Engine Access to and Indexing of Your Website

Posted in Bloggers, Google, Webmaster

Can publishers specify that some parts of the site should be private and non-searchable? The good news is that those who publish on the web have a lot of control over which pages should appear in search results.

The key is a simple text file called robots.txt, placed at the root of your site.

A simple example
Here is a simple example of a robots.txt file.

User-Agent: Googlebot
Disallow: /logs

The User-Agent line specifies that the section that follows is a set of instructions just for Googlebot. All the major search engines read and obey the instructions you put in robots.txt, and you can specify different rules for different search engines if you want to. The Disallow line tells Googlebot not to access any URL whose path begins with /logs, which covers the files in the logs sub-directory of your site. Because Googlebot does not fetch those pages, their contents will not show up in Google search results.
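To see how a crawler might interpret these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the www.example.com host, the file names and the SomeOtherBot user agent are illustrative assumptions, not part of the original example. Python's parser is only one implementation, and real search engines may differ in details.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, one directive per line.
ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow: /logs
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules.

    Hypothetical helper for illustration; it parses the rules from a
    string instead of downloading robots.txt from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "http://www.example.com"  # assumed host, for illustration only
    # Googlebot is kept out of anything whose path starts with /logs ...
    print(is_allowed(ROBOTS_TXT, "Googlebot", base + "/logs/access.html"))
    # ... but may still fetch the rest of the site.
    print(is_allowed(ROBOTS_TXT, "Googlebot", base + "/index.html"))
    # A crawler with no matching section is not restricted by these rules.
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", base + "/logs/access.html"))
```

Running a check like this before publishing a new robots.txt is a cheap way to confirm that a rule blocks what you intended and nothing more.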

Preventing access to a file
If you have a news article on your site that is only accessible to registered users, you'll want it excluded from Google's results. To do this, simply add a META tag to the HTML file, so that it starts something like:

<html>
<head>
<meta name="googlebot" content="noindex">
…

This stops Google from indexing this file. META tags are particularly useful if you have permission to edit the individual files but not the site-wide robots.txt.
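If you maintain many pages, you may want to confirm that the tag is actually present. The following is a rough sketch using Python's standard html.parser module. The class name RobotsMetaParser and the function is_noindex are hypothetical names chosen for this example, and it only looks at META tags whose name matches the bot you pass in.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="..."> tags aimed at one bot."""

    def __init__(self, bot_name: str = "googlebot"):
        super().__init__()
        self.bot_name = bot_name.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name != self.bot_name:
            return
        # The content attribute may hold several comma-separated directives.
        content = attr.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def is_noindex(html: str, bot_name: str = "googlebot") -> bool:
    """Return True if the page carries a noindex META tag for bot_name."""
    parser = RobotsMetaParser(bot_name)
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html> <head> <meta name="googlebot" content="noindex"> </head> </html>'
    print(is_noindex(page))
```

Matching is case-insensitive here as a convenience for hand-edited pages; that is a choice made for this sketch, not a statement about how Google reads the tag.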

Learn more
You can find out more about robots.txt at http://www.robotstxt.org and at Google's Webmaster help center, which contains lots of helpful information.

Here is a useful list of the bots used by the major search engines: http://www.robotstxt.org/wc/active/html/index.html

via [Google blog]
