The Evolving Spiderbot and Valuable Content
Nine tenths of people who go online use search engines, and more than half of them use one at least once a day [1]. Google has even become part of everyday speech, and yet few of the people using it have any idea how it finds the web pages that hold the information they are looking for. Search engines have evolved symbiotically alongside the internet, from simple systems that required people to submit their URLs manually to current search engines that can return relevant results even for misspelled search terms. One common thread runs through the development of these online indexing services: relevance.
To understand how search engines evolve, you have to know a little about how web pages work. When a web page is published, its content has a theme or topic that is defined by a set of keywords. Keywords are simply the words in the text that tell the reader what the content is about, and they almost always include one or two of the words in the title. They identify the subject of a post or page, and you can also declare them in the page’s HTML code so that the page can be easily indexed. In effect, they act as a tag that files the page in a cataloged library of pages. Web directories are catalogs that work like this, but search engines have evolved past them to return more valuable content. They do so by actively searching the web and indexing it with programs called crawlers, or spiderbots.
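As a rough illustration of keywords declared in a page's HTML, the sketch below reads the title and the `<meta name="keywords">` tag from a made-up page using Python's standard `html.parser` module. The sample page and the `MetaKeywordParser` class are assumptions invented for this example; they are not how any particular search engine reads a page.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the title text and the contents of <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            # The keywords are conventionally a comma-separated list.
            content = attrs.get("content") or ""
            self.keywords = [k.strip().lower() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page used only for this example.
SAMPLE_PAGE = """<html><head>
<title>Growing Tomatoes at Home</title>
<meta name="keywords" content="tomatoes, gardening, home garden">
</head><body><p>Tomatoes need sun.</p></body></html>"""

parser = MetaKeywordParser()
parser.feed(SAMPLE_PAGE)
print(parser.title)
print(parser.keywords)
```

Note that one of the declared keywords ("tomatoes") also appears in the title, which matches the pattern described above.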
Crawlers work by following links to pages and then, after reading the keyword tags, searching through the content to determine how relevant it is to those keywords. These programs are often called spiderbots because they travel along the links between pages, indexing them according to their built-in algorithm and transmitting the results back to home base. Initially, relevance was judged by scanning the text for repetition of the keywords, but this tended to encourage spam pages stuffed with enough keywords to trick the spiderbots. Search engines have therefore steadily moved towards other indicators of relevance to weed low-quality pages out of the Search Engine Results Pages (SERPs). The other major indicator of the quality of a page’s content is the presence of links that point to it from other sites on the web, each of which indicates that another user found it valuable enough to link to from their own page. Refinements such as judging the anchor text of a link and determining whether the link is placed in context within the surrounding text have also come to influence a page’s place in the SERPs.
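The difference between the two signals can be sketched with a toy crawler. The example below walks a tiny in-memory "web" (the `PAGES` dictionary, its URLs, and the `crawl` function are all hypothetical, and no real network requests are made), counting keyword repetitions and inbound links for each page it reaches. It is a simplified illustration of the idea, not the algorithm any real search engine uses.

```python
from collections import deque

# Hypothetical miniature web: URL -> (page text, outgoing links).
PAGES = {
    "a.example/home": ("tomatoes gardening tips for growing tomatoes",
                       ["b.example/soil", "c.example/spam"]),
    "b.example/soil": ("soil advice for tomatoes", ["a.example/home"]),
    "c.example/spam": ("tomatoes tomatoes tomatoes tomatoes", []),
    "d.example/blog": ("my garden diary", ["a.example/home"]),
}


def crawl(seeds, pages, keyword):
    """Follow links breadth-first from the seed URLs.

    Returns two dicts keyed by URL: how often the keyword appears in the
    page text, and how many links from crawled pages point at the page.
    """
    seen = set(seeds)
    queue = deque(seeds)
    hits = {}
    inbound = {}
    while queue:
        url = queue.popleft()
        text, links = pages[url]
        hits[url] = text.lower().split().count(keyword)
        inbound.setdefault(url, 0)
        for target in links:
            if target not in pages:
                continue  # skip links that lead outside the toy web
            inbound[target] = inbound.get(target, 0) + 1
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return hits, inbound


hits, inbound = crawl(["d.example/blog"], PAGES, "tomatoes")
best_by_keyword = max(hits, key=hits.get)
best_by_links = max(inbound, key=inbound.get)
print("Top page by keyword repetition:", best_by_keyword)
print("Top page by inbound links:", best_by_links)
```

Ranked by keyword repetition alone, the keyword-stuffed page comes out on top; ranked by inbound links, the page that other pages actually link to wins instead.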
Google makes hundreds of changes to its crawler algorithms every year, and the most significant change announced so far this year is a move towards semantic search capabilities. This evolution is a big step towards greater relevance in the search results, as the crawlers will now take more of the context of a page’s content into account when they rank it in the SERPs. They will be able to evaluate synonyms for the keywords, as well as misspelled variations, to return more results that fit the user’s intentions. This depth of inspection should also push more spam pages out of the top ranks and, in theory at least, promote the publication of more valuable content as a means of building good SEO.
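To give a feel for what handling synonyms and misspellings might involve, here is a deliberately simple sketch that maps a search term onto a known keyword. It uses a hand-written synonym table and fuzzy string matching from Python's standard `difflib` module. The vocabulary, the synonym table, and the `normalize_term` function are assumptions made up for this example; real semantic search is far more sophisticated, and nothing here reflects Google's actual implementation.

```python
from difflib import get_close_matches

# Hypothetical vocabulary of indexed keywords.
VOCABULARY = ["tomato", "garden", "soil", "fertilizer"]

# Hypothetical synonym table: canonical keyword -> alternative words.
SYNONYMS = {
    "garden": {"yard", "allotment"},
    "fertilizer": {"compost", "manure"},
}


def normalize_term(term):
    """Map a search term to a known keyword where possible.

    Synonyms are checked first, then close spellings; a term that
    matches nothing is returned unchanged (lowercased).
    """
    term = term.lower()
    for canonical, alternatives in SYNONYMS.items():
        if term in alternatives:
            return canonical
    # cutoff=0.8 is an arbitrary similarity threshold chosen for this sketch.
    matches = get_close_matches(term, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else term


for query in ["tomatoe", "gardn", "compost", "bicycle"]:
    print(query, "->", normalize_term(query))
```

With this table, a misspelling such as "tomatoe" resolves to "tomato" and a synonym such as "compost" resolves to "fertilizer", while an unrelated word is left alone.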
1. Search Engine Use 2012, Pew Research Center