The importance of robots.txt
Although the robots.txt file matters if you want to have a good ranking
on search engines, many Web sites don't offer this file.

If your Web site doesn't have a robots.txt file yet, read on to learn how to
create one. If you already have a robots.txt file, read our tips to make sure
that it doesn't contain errors.



What is robots.txt?

When a search engine crawler comes to your site, it will look for a special
file called robots.txt. That file tells the search engine spider which Web
pages of your site should be indexed and which Web pages should be
ignored.

The robots.txt file is a simple text file (no HTML) that must be placed in
your root directory, for example:

http://www.yourwebsite.com/robots.txt
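
Because the file always sits in the root directory, its address can be derived
from any page address on the same site. The following is a minimal Python
sketch of that rule; the function name robots_url is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.yourwebsite.com/support/index.html"))
# http://www.yourwebsite.com/robots.txt
```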



How do I create a robots.txt file?

As mentioned above, the robots.txt file is a simple text file. Open a simple
text editor to create it. The content of a robots.txt file consists of so-called
"records".

A record contains the information for a specific search engine. Each
record consists of two kinds of lines: one User-agent line and one or more
Disallow lines. Here's an example:

User-agent: googlebot
Disallow: /cgi-bin/

This robots.txt file would allow the "googlebot", which is the search engine
spider of Google, to retrieve every page from your site except for files
from the "cgi-bin" directory. All files in the "cgi-bin" directory will be
ignored by googlebot.
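
If you want to check a record before you upload it, Python's standard
urllib.robotparser module can read the same text. This is only a sketch: the
page addresses and the "otherbot" name are invented for the example, and a
real search engine spider may interpret a file differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# googlebot is kept out of cgi-bin but may fetch everything else.
in_cgi_bin = parser.can_fetch("googlebot", "http://www.yourwebsite.com/cgi-bin/form.pl")
home_page = parser.can_fetch("googlebot", "http://www.yourwebsite.com/index.html")
# The record names only googlebot, so other spiders are not restricted.
other_bot = parser.can_fetch("otherbot", "http://www.yourwebsite.com/cgi-bin/form.pl")

print(in_cgi_bin, home_page, other_bot)  # False True True
```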

The Disallow command matches by prefix: it blocks every path that begins
with the text you enter. If you enter

User-agent: googlebot
Disallow: /support

both "/support-desk/index.html" and "/support/index.html" as well as all
other files in the "support" directory would not be indexed by
googlebot.
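
The prefix behavior can be tried out with Python's urllib.robotparser module.
The helper name is_blocked and the "/sales/" path are assumptions made for this
sketch. It also shows that adding a trailing slash narrows the rule to the
directory itself.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(disallow_path, url_path, agent="googlebot"):
    """Return True if a single Disallow rule blocks url_path for agent."""
    parser = RobotFileParser()
    parser.parse([f"User-agent: {agent}", f"Disallow: {disallow_path}"])
    return not parser.can_fetch(agent, url_path)


# "/support" blocks everything that starts with those characters.
print(is_blocked("/support", "/support-desk/index.html"))   # True
print(is_blocked("/support", "/support/index.html"))        # True
print(is_blocked("/support", "/sales/index.html"))          # False

# "/support/" only blocks the directory itself.
print(is_blocked("/support/", "/support-desk/index.html"))  # False
```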

If you leave the Disallow line blank, you're telling the search engine that
all files may be indexed. In any case, you must enter a Disallow line for
every User-agent record.

If you want to give all search engine spiders the same rights, use the
following robots.txt content:

User-agent: *
Disallow: /cgi-bin/



Where can I find user agent names?

You can find user agent names in your log files by checking for requests
to robots.txt. Most often, all search engine spiders should be given the
same rights. In that case, use "User-agent: *" as mentioned above.
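
Checking the log files can be automated. The sketch below assumes that your
server writes the common "combined" log format, in which the user agent is the
last quoted field. Your server may use a different format. The sample log lines
and the bot names in them are invented for illustration.

```python
import re
from collections import Counter

# Assumed layout: ... "METHOD /path PROTOCOL" status size "referrer" "user agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_txt_agents(log_lines):
    """Count the user agents that requested /robots.txt."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("path").split("?")[0] == "/robots.txt":
            counts[match.group("agent")] += 1
    return counts


# Invented sample lines, not taken from a real log file.
SAMPLE_LOG = [
    '192.0.2.1 - - [01/Jan/2024:00:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 42 "-" "ExampleBot/1.0"',
    '192.0.2.2 - - [01/Jan/2024:00:00:02 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "SomeBrowser/5.0"',
    '192.0.2.3 - - [01/Jan/2024:00:00:03 +0000] "GET /robots.txt HTTP/1.1" 200 42 "-" "OtherBot/2.3"',
]

for agent, hits in robots_txt_agents(SAMPLE_LOG).items():
    print(agent, hits)
```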



Things you should avoid

If you don't format your robots.txt file properly, some or all files of your
Web site might not get indexed by search engines. To avoid this, do the
following:

Don't use comments in the robots.txt file

Although comments are allowed in a robots.txt file, they might confuse
some search engine spiders.

"Disallow: /support # Don't index the support directory" might be
misinterpreted as "Disallow: /support#Don't index the support directory".



Don't use white space at the beginning of a line. For example, don't write

                
User-agent: *
     Disallow: /support

but

User-agent: *
Disallow: /support



Don't change the order of the commands. The User-agent line must come
before the Disallow lines of its record. Don't write

Disallow: /support
User-agent: *

but

User-agent: *
Disallow: /support



Don't use more than one directory in a Disallow line. Do not use the
following:

User-agent: *
Disallow: /support /cgi-bin/ /images/

Search engine spiders cannot understand that format. The correct syntax
for this is

User-agent: *
Disallow: /support
Disallow: /cgi-bin/
Disallow: /images/
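
The formatting rules so far (no comments, no white space at the beginning of
a line, User-agent before Disallow, one directory per Disallow line) can be
checked automatically. The following Python sketch is one possible way to do
that. The function name lint_robots_txt and the wording of the messages are
made up for this example, and the check is not a complete validator.

```python
def lint_robots_txt(text):
    """Return a list of (line_number, message) for common formatting pitfalls."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            # A blank line ends the current record.
            seen_user_agent = False
            continue
        if "#" in raw:
            problems.append((number, "comment found"))
        if raw[0] in " \t":
            problems.append((number, "leading white space"))
        # Ignore the comment part when looking at the command itself.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        field, _, value = line.partition(":")
        field = field.strip().lower()
        value = value.strip()
        if field == "user-agent":
            seen_user_agent = True
        elif field == "disallow":
            if not seen_user_agent:
                problems.append((number, "Disallow before User-agent"))
            if len(value.split()) > 1:
                problems.append((number, "more than one path in Disallow"))
    return problems


BAD = "Disallow: /support\nUser-agent: *\n  Disallow: /support /cgi-bin/ /images/\n"
GOOD = "User-agent: *\nDisallow: /support\nDisallow: /cgi-bin/\nDisallow: /images/\n"

for number, message in lint_robots_txt(BAD):
    print(f"line {number}: {message}")
print(lint_robots_txt(GOOD))  # []
```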



Be sure to use the right case. Paths in the robots.txt file are case
sensitive. If the name of your directory is "Support", don't write "support" in
the robots.txt file.
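
The effect of a wrong case can be seen with Python's urllib.robotparser
module. In this sketch the rule is written in lower case, so it does not cover
a directory that is really named "Support". The "anybot" name is invented, and
other parsers may behave differently.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /support"])

# The lower-case path matches the rule and is blocked.
lower = parser.can_fetch("anybot", "/support/index.html")
# The real directory "Support" does not match and stays open to spiders.
upper = parser.can_fetch("anybot", "/Support/index.html")

print(lower, upper)  # False True
```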



Don't list all files. If you want a search engine spider to ignore all files in a
particular directory, you don't have to list all files. For example:

User-agent: *
Disallow: /support/orders.html
Disallow: /support/technical.html
Disallow: /support/helpdesk.html
Disallow: /support/index.html

You can replace this with

User-agent: *
Disallow: /support



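Don't rely on the "Allow" command

The original robots.txt standard defines only the Disallow command. Newer
search engine spiders, including Google's, also understand an "Allow"
command, but older spiders may ignore it. To be safe, only mention files
and directories that you don't want to be indexed. All other files can be
indexed automatically if they are linked on your site.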


Tips and tricks:

1. How to allow all search engine spiders to index all files

Use the following content for your robots.txt file if you want to allow all
search engine spiders to index all files of your Web site:

User-agent: *
Disallow:

2. How to keep all spiders from indexing any file

If you don't want search engines to index any file of your Web site, use
the following:

User-agent: *
Disallow: /

3. Where to find more complex examples

If you want to see more complex examples of robots.txt files, view the
robots.txt files of big Web sites:

http://www.cnn.com/robots.txt
http://www.nytimes.com/robots.txt
http://www.spiegel.com/robots.txt
http://www.ebay.com/robots.txt
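
The two files from tips 1 and 2 can be compared with Python's
urllib.robotparser module before you upload one of them. This is a sketch
under the assumption that a spider reads the file the way Python's parser
does. The helper name may_fetch and the "anybot" name are invented.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def may_fetch(robots_txt, url_path, agent="anybot"):
    """Return True if agent may fetch url_path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url_path)


print(may_fetch(ALLOW_ALL, "/support/index.html"))  # True
print(may_fetch(BLOCK_ALL, "/support/index.html"))  # False
```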

Your Web site should have a proper robots.txt file if you want to have
good rankings on search engines. Search engines can only give you a good
ranking if they know what to do with your pages.