
Crawler

A crawler (also called a bot or spider), such as Googlebot, automatically searches the web, follows links, and adds pages to the search engine index. For SEO, it is important that the bot can reach the content; areas can be restricted using robots.txt or nofollow.

To capture the sheer amount of content on the World Wide Web, a program is needed that searches the internet automatically: a so-called crawler, also known as a bot or spider. Google has such a web crawler too, namely Googlebot. The search engine bot automatically searches the internet for websites and adds them to the index so that the pages can be ranked in the SERPs. It also records updates and changes to a page. Crawlers are not used exclusively by search engines, however; among other things, they also collect data and information such as email addresses. Their use is therefore not necessarily limited to the World Wide Web.

How does a crawler work? How does it read a page?

The first web crawler, the World Wide Web Wanderer, was sent on its "journey" in 1993. A web crawler follows the same principle as a person surfing the internet: it navigates from hyperlink to hyperlink through various pages and could, in theory, traverse the entire internet this way. In practice, many websites can only be accessed with login credentials, so a large part of the internet stays out of a bot's reach. During a crawl, the crawler works through the rules its programmer has defined. It performs these tasks automatically and repeats them continuously, which means that the internet is constantly being scoured by such programs. When a page changes, the crawler detects the change on a later visit.
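The link-to-link principle can be illustrated with a minimal sketch. It is not how Googlebot or any real crawler is implemented; it simply walks a small, hypothetical in-memory "web" (the PAGES dictionary and the example.com URLs are assumptions for illustration) breadth-first, so it runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical stand-in for the web: URL -> HTML body.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "https://example.com/b": '<a href="/">Home</a>',
    "https://example.com/c": "No links here.",
    "https://example.com/orphan": "Nothing links to this page.",
}


def crawl(start_url, fetch=PAGES.get):
    """Breadth-first traversal from start_url; returns URLs in visit order."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be retrieved
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited
```

Calling crawl("https://example.com/") visits the four linked pages but never the orphan page, because no hyperlink leads to it. A real crawler would replace the fetch argument with an HTTP request and add politeness rules such as rate limits.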

These rules also determine under which categories or terms the crawler files a website in the index. After the crawl, the website's content is listed in the index accordingly and can then be retrieved for specific search terms.
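How crawled content becomes retrievable by search term can be sketched with a simple inverted index. This is a deliberately simplified illustration, not the way a production search engine classifies pages; the CRAWLED data and the function names build_index and search are assumptions made for this example.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical text a crawler might have collected: URL -> page text.
CRAWLED = {
    "https://example.com/shoes": "Running shoes for trail and road",
    "https://example.com/boots": "Hiking boots for trail use",
}
INDEX = build_index(CRAWLED)
```

With this index, search(INDEX, "trail") returns both pages, while search(INDEX, "trail shoes") returns only the shoes page. Ranking, which decides the order of results in the SERPs, is left out of the sketch entirely.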

Googlebot and Co.: Crawlers in Search Engine Optimization

The crawler is highly significant for search engine optimization because it is what makes websites and their content discoverable in search engines like Google. Since search engines are still the most important source of traffic for websites, this has high priority for every page on the internet. Google's bot is probably the best-known representative here.

On the one hand, you must make the website accessible to the web crawler so that it can index the information. Among other things, this means you should not exclude the bot with a rule in the robots.txt file. On the other hand, the more extensive the backlink structure is, the more likely it is that the crawler will visit the website more frequently and therefore index it more completely. This is because the bot reaches new URLs during the crawl via hyperlinks. If pages or subsections of a site are poorly linked or not linked at all, the bot may not be able to reach that content.
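Whether a robots.txt rule locks a bot out can be checked with Python's standard urllib.robotparser module. The sketch below parses a hypothetical robots.txt from a string instead of downloading it; the rule set, the example.com URLs, and the helper name is_allowed are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every bot is kept out of /internal/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /internal/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Here is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/blog/") is True, while the same call for "https://example.com/internal/login" is False. An overly broad Disallow rule would hide content from the crawler in the same way, which is the mistake the paragraph above warns about.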

Setting nofollow links: When the crawler should stay away

Even if the goal is to have a website found as easily as possible in search results, certain areas of a website can be kept away from a bot. This makes sense, for example, for unimportant subpages such as a login page for internal use of the web service. In such cases, a link can be marked with rel="nofollow" to signal to a search engine crawler that it should not follow this link. This only affects the marked link itself: the target page can still be reached through other links that point to it.
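From the crawler's side, respecting nofollow means sorting a page's links into those it may follow and those it should skip. The sketch below shows one way a crawler could read the rel attribute of a link using Python's standard html.parser module; the class name FollowableLinks, the helper split_links, and the sample HTML are assumptions for illustration, and real search engines may handle the attribute differently.

```python
from html.parser import HTMLParser


class FollowableLinks(HTMLParser):
    """Separates links a crawler may follow from links marked nofollow."""

    def __init__(self):
        super().__init__()
        self.follow = []
        self.skipped = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.skipped.append(href)
        else:
            self.follow.append(href)


def split_links(html):
    """Return (links to follow, links skipped because of nofollow)."""
    parser = FollowableLinks()
    parser.feed(html)
    return parser.follow, parser.skipped


# Hypothetical page with one normal link and one nofollow link.
SAMPLE_HTML = '<a href="/blog">Blog</a> <a href="/login" rel="nofollow">Login</a>'
```

For SAMPLE_HTML, split_links returns the blog link as followable and the login link as skipped.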