Not everyone has access to their web server, but many site owners still want control over how crawlers behave on their web site. If you’re one of them, you can make your preferences known to the crawlers with a robots meta tag instead of a robots.txt file.
The robots meta tag is a small piece of HTML code that is inserted into the <head> section of a page on your web site, and it works in much the same way as the robots.txt file. You include your instructions for crawlers in the tag’s content attribute. The following example shows you how your robots meta tag might look:
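A tag of this kind typically looks like the sketch below, placed in the head of the page. It assumes the standard noindex and nofollow values, and the page title is only a placeholder.

```html
<head>
  <title>Example page</title>
  <!-- Ask compliant crawlers not to index this page or follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
```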
This bit of HTML tells crawlers not to index the content on the page and not to follow the links on the page. Because the tag applies only to the page that contains it, you need to add it to every page you want to control. Of course, not indexing and not following might not be exactly what you had in mind. You can also use several other robots meta tags for combinations of following, not following, indexing, and not indexing:
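The four combinations are commonly written as shown in the sketch below, which assumes the standard index, noindex, follow, and nofollow values. A given page would carry only one of these tags.

```html
<!-- Index the page and follow its links (generally what crawlers do when no tag is present) -->
<meta name="robots" content="index, follow">

<!-- Index the page, but do not follow its links -->
<meta name="robots" content="index, nofollow">

<!-- Do not index the page, but follow its links -->
<meta name="robots" content="noindex, follow">

<!-- Do not index the page and do not follow its links -->
<meta name="robots" content="noindex, nofollow">
```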
The major difference between robots.txt and the robots meta tag is that a tag named robots addresses every crawler at once. It’s an all-or-nothing tag, so you either command all of the crawlers to behave in a certain way, or you command none of them. Some search engines also honor a tag that uses their crawler’s name in place of robots (Google documents googlebot, for example), but that support varies from one engine to the next. The meta tag is not as precise as robots.txt, but if you don’t have access to your web server, it’s a good alternative.
Unfortunately, not all search engines recognize the robots.txt file or the robots meta tags. So in some cases, you have no control at all over what the crawler examines on your site. However, more search engines seem to be honoring these commands, which helps them classify the Web more efficiently.
Search engine crawlers can help your site get indexed so that it appears in search results. But they can also cause problems with your site if they don’t follow the guidelines outlined in the Robots Exclusion Standard or if your site is not stable enough to support the way the crawler examines it.
Knowing how to control the way that search engines crawl your site can help ensure that your site is always at its shiny best (or at least appears that way to the search crawler). It won’t necessarily give you complete control of all the crawlers on the Web, but it will help with some of them.