• Understanding robots.txt in 5 minutes

    Posted on March 1st, 2009 Webmaster

    Many people are unsure what the robots.txt file does and how to configure it correctly.
    In a few words, I will explain what it is really for and how to configure it, covering just the basics.

    The robots.txt file always needs to be in the root (/) of the domain, for example www.gwebtools.com/robots.txt. This is the standard location where search engines look for it.
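
    As a small illustration of the "always in the root" rule, here is a sketch that derives the robots.txt address from any page address on a site. The function name robots_txt_url and the page path /tools/page.php are hypothetical, chosen only for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host: robots.txt sits at the root path,
    # no matter how deep the page itself is.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The page path below is made up for the example.
print(robots_txt_url("http://www.gwebtools.com/tools/page.php?id=1"))
# prints http://www.gwebtools.com/robots.txt
```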

    Good search engines like Google, Yahoo, Live, Ask and many others will respect the options that you configure in your robots.txt. Bad crawlers, such as those that scan sites for exploits, will not respect them.

    The robots.txt file is a simple text file that helps search engines index only the relevant content of your website.

    Configuring robots.txt is very easy. The options are:

    User-agent: The name of the crawler, such as Googlebot, or * for all crawlers. The rules that follow apply only to the crawlers named here.
    Disallow: The folders and pages that you don't want the crawler to access and index.
    Allow: The folders and pages that the crawler can access and index (by default, everything is allowed).
    Sitemap: The URL of your sitemap.
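
    To see how a program reads these options, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the /private/ folder and the example.com addresses are all assumptions made up for the illustration, and site_maps() needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the folder and sitemap URL are made up.
SAMPLE = """\
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())  # parse from text instead of fetching a URL

# The "*" group applies to every crawler name, including Googlebot.
blocked = parser.can_fetch("Googlebot", "https://www.example.com/private/report.html")
open_page = parser.can_fetch("Googlebot", "https://www.example.com/index.html")
sitemaps = parser.site_maps()  # list of Sitemap URLs, or None if there are none

print(blocked, open_page, sitemaps)
# prints False True ['https://www.example.com/sitemap.xml']
```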

    –  robots.txt example 01 begin –
    User-agent: *
    Disallow: /
    Allow: /list-of-pages.php
    Allow: /contact.php
    – end of robots.txt example 01 –

    Explanation: In this example the rules apply to all crawlers, and only two pages can be indexed: list-of-pages.php and contact.php.
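
    The explanation above relies on the more specific Allow lines winning over the general Disallow: / line. The sketch below checks that reading, assuming the longest-match precedence described in RFC 9309 and used by Google (the longest matching path wins, and Allow wins a tie). Parsers that stop at the first matching line would read this file differently. The function is_allowed and the path /about.php are hypothetical, and wildcards are not handled.

```python
def is_allowed(rules, path):
    """Decide whether path may be crawled under one User-agent group.

    rules is a list of (directive, prefix) pairs, e.g. ("Disallow", "/").
    """
    best_len = -1
    allowed = True  # with no matching rule, crawling is allowed
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            is_allow = directive.lower() == "allow"
            # A longer prefix is more specific; on equal length, Allow wins.
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len = len(prefix)
                allowed = is_allow
    return allowed


# The rules of example 01, in the same order as in the file.
example_01 = [
    ("Disallow", "/"),
    ("Allow", "/list-of-pages.php"),
    ("Allow", "/contact.php"),
]

print(is_allowed(example_01, "/contact.php"))        # prints True
print(is_allowed(example_01, "/list-of-pages.php"))  # prints True
print(is_allowed(example_01, "/about.php"))          # prints False (made-up page)
```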

    –  robots.txt example 02 begin –
    User-agent: Googlebot
    Allow: /
    Disallow: /downloads
    Allow: /downloads/signup.php
    – end of robots.txt example 02 –

    Explanation: In this example the rules apply only to Googlebot. All URLs can be indexed except those under /downloads, but the page /downloads/signup.php can still be indexed.
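
    To show that the rules in example 02 apply to Googlebot only, here is a rough sketch that groups the lines by User-agent and then checks a few paths. It is a simplification under stated assumptions: crawler names are matched exactly (ignoring case), the most specific rule wins with Allow winning a tie, and wildcards are ignored. The names parse_groups, can_crawl, OtherBot and the path /downloads/file.zip are hypothetical.

```python
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /
Disallow: /downloads
Allow: /downloads/signup.php
"""


def parse_groups(text):
    """Map each lowercased User-agent name to its (allow, prefix) rules."""
    groups = {}
    current = []      # crawler names that the next rules belong to
    in_rules = False  # True once the current group has seen a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and current:
            in_rules = True
            if value:  # an empty value matches nothing, so skip it
                for name in current:
                    groups[name].append((field == "allow", value))
    return groups


def can_crawl(groups, crawler, path):
    """Use the crawler's own group if present, else the '*' group, else allow."""
    rules = groups.get(crawler.lower(), groups.get("*", []))
    matches = [(len(prefix), allow) for allow, prefix in rules
               if path.startswith(prefix)]
    # max() picks the longest prefix; True > False breaks ties toward Allow.
    return max(matches)[1] if matches else True


groups = parse_groups(ROBOTS_TXT)
print(can_crawl(groups, "Googlebot", "/downloads/file.zip"))    # prints False
print(can_crawl(groups, "Googlebot", "/downloads/signup.php"))  # prints True
print(can_crawl(groups, "OtherBot", "/downloads/file.zip"))     # prints True
```

    The last line prints True because the file has no group for any crawler other than Googlebot, so nothing is disallowed for them.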

    Easy. If you have any doubts, leave a comment.