Search Engine Spiders, Bots & Crawlers

What are Spiders, Bots & Crawlers?


Spiders, bots, and web crawlers are automated software programs that travel the Web, locating web pages so that search engines can index them. They are called spiders and crawlers because they crawl all over the Web, moving from page to page.

The search engine spider has one basic job: to crawl website content, capture information, and take it back to the associated search engine. When a bot arrives at your website, it starts reading the text in the body of each web page. It also reads the HTML (source code) and discovers links to other web pages.
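To make the link-discovery step concrete, here is a minimal sketch using Python's standard-library html.parser module. It only collects the href of each <a> tag and resolves relative links against the page's own URL. The function name extract_links and the example URLs are hypothetical, and real search engine spiders do far more than this (fetching pages over HTTP, honoring robots.txt, handling redirects and broken markup).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/about" into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    """Return every link found in the HTML, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = '<p>Read <a href="/about">about us</a> or <a href="https://example.org/">this</a>.</p>'
print(extract_links(page, "https://example.com/index.html"))
# ['https://example.com/about', 'https://example.org/']
```

Anchors without an href (such as <a name="top">) are ignored, since there is nothing for the spider to follow.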

Search engine spiders don't rank web pages. They simply go out and get copies of them, which they forward to a search engine so the pages can be included in the search engine's database. The search engine then uses powerful algorithms to analyze the information gathered by the bots and ranks web pages based on that analysis.
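The fetch-and-forward process described above can be pictured as a breadth-first walk over a site's links. The sketch below runs against a small, hypothetical in-memory site (the SITE dictionary and the crawl function are illustrative only, not any search engine's actual code) so that it needs no network access. Note that the page /results, which no other page links to, is never collected, just as a page reachable only through a search box would be missed.

```python
from collections import deque

# Hypothetical site: each URL maps to (page text, outgoing links).
# "/results" is only reachable through a search box, so no page links to it.
SITE = {
    "/": ("Home page", ["/products", "/about"]),
    "/products": ("Our products", ["/"]),
    "/about": ("About us", ["/", "/contact"]),
    "/contact": ("Contact form", []),
    "/results": ("Search results page", []),
}


def crawl(start: str, site: dict) -> dict:
    """Breadth-first crawl: copy each reachable page, queue the links it contains."""
    copies = {}  # stands in for the copies sent back to the search engine
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in copies or url not in site:
            continue  # already fetched, or a broken link
        text, links = site[url]
        copies[url] = text  # the crawler only stores a copy; ranking happens elsewhere
        queue.extend(links)
    return copies


collected = crawl("/", SITE)
print(sorted(collected))
# ['/', '/about', '/contact', '/products']
```

Breadth-first order is simply a convenient choice for a sketch; real crawlers prioritize and schedule URLs in more sophisticated ways.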

How often do Search Engine Spiders visit websites?

Once a website is in a search engine's database, the bots keep visiting it regularly. Each time they crawl a website, they check for changes made to it. If they find changes, the crawlers make a note and come back a little sooner next time. The best way to keep them coming back often is to focus on fresh content: add new pages or other useful information to your website on a consistent basis.

Think Like a Search Engine Spider

Before making major changes to your website, take a minute to consider how it looks to a search engine spider. Search engine spiders can't see colors, so they can't appreciate the colorful spider image below on the left. They can't even see the black-and-white one with the word "Google" above it, or the image on the right. They only know what's in an image when the web page designer adds ALT text to it (the image's alt attribute, often called an ALT tag). (NOTE: ALT text is not normally displayed on the web page; it appears in the page's source code.)
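A rough way to approximate what a spider "sees" is to strip a page down to its visible text plus any ALT text on its images. The sketch below does this with Python's html.parser; the SpiderView class and spider_view function are hypothetical helpers, and the approximation is deliberately simplified (it ignores CSS, layout, and anything generated by JavaScript). The image with no ALT text simply disappears from the output.

```python
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Rough text-only view of a page: visible text plus image ALT text."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(f"[image: {alt}]")
            # An <img> without ALT text contributes nothing to this view.

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def spider_view(html: str) -> str:
    parser = SpiderView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page_html = """
<h1>Blue Widgets</h1>
<img src="logo.png">
<img src="widget.jpg" alt="Blue widget, side view">
<script>var tracking = 1;</script>
<p>Our widgets ship worldwide.</p>
"""
print(spider_view(page_html))
# Blue Widgets [image: Blue widget, side view] Our widgets ship worldwide.
```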

How Search Engine Spiders, Bots and Crawlers see Web Pages

Search engine spiders don't care about fancy web design.
A search engine spider only sees text and HTML code.
Search engine bots cannot see text in an image.
If your site consists mostly of images with little text, it is unlikely to do well in search results.
Do you have a slow-loading web page?
How quickly your pages load is a major factor in determining how much of your site gets crawled.
The content at the top of a web page is most important.
The search engine spider reads the content in the order in which it appears in the page, from top to bottom. Search engines are generally thought to give the most ranking weight to the information at the top of the page.
Robots don't use search forms.
Search engine spiders don't perform searches to find content. Do not make a search box your only means of navigation on your website: spiders cannot use it, so pages reachable only through search results may never be crawled.
Is your website crawlable?
"Crawlable" means the links to and within your website can be followed by a search engine crawler. The crawler reads the text on a web page and records any hyperlinks it finds. It then follows these URLs, crawls those pages, and collects the data. If a search engine spider cannot follow any link to a page, that page will not be included in the search engine's database.
Limit the number of links on a web page.
Search engines will only crawl so many links on a given page. Pages with hundreds of links risk not having all of those links crawled and indexed. It is best to link only to pages of primary importance from the home page. Do not link every page on your website to every other page.
Content-to-code ratio.
The content-to-code ratio compares how much content (visible text) your page has with how much source code (non-visible information) it has. A good content-to-code ratio is often cited as anywhere from 25 to 70 percent. A code-heavy page with a poor content-to-code ratio is often said to suffer from "code bloat." Instead, provide quality content with clean code to make it easier for search engine bots to crawl your site.
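The ratio is not defined precisely above, so this sketch assumes one plausible definition: ratio = (characters of visible text / total characters of HTML) x 100. It uses Python's html.parser, skips script and style contents, and collapses whitespace so indentation is not counted as content. The function name content_to_code_ratio is hypothetical, and SEO tools may measure the ratio differently (for example by bytes rather than characters).

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._in_skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._in_skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._in_skip:
            self._in_skip -= 1

    def handle_data(self, data):
        if not self._in_skip:
            self.chunks.append(data)


def content_to_code_ratio(html: str) -> float:
    """Percentage of the page's characters that are visible text (assumed definition)."""
    if not html:
        return 0.0
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Collapse runs of whitespace so indentation and newlines are not counted as content.
    text = " ".join("".join(collector.chunks).split())
    return 100 * len(text) / len(html)


print(content_to_code_ratio("<b>abc</b>"))  # 30.0

bloated = '<div class="wrapper"><div class="inner"><span>Hi</span></div></div>'
print(content_to_code_ratio(bloated))  # well under 25: mostly markup, almost no text
```

Under this definition, a page whose result falls between 25 and 70 would be within the range quoted above.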