Nov 29, 2012

Twitter and SEO: The Saga Continues

Quite recently, Twitter did something few in the industry expected: it changed its robots.txt file, with a dramatic impact on search results [1]. The robots.txt file tells search engine crawlers which pages they may crawl and, conversely, which pages they should leave alone. Twitter has long had SEO problems, most notably the numerous links leading to a single account, which disperse the PageRank of profile pages. So why is a change to the robots.txt file such a big deal?

In 2009, the golden age of real-time search, Google and Twitter struck a deal under which Twitter would supply real-time data from its feeds so that users could search tweets. The deal was in turn a result of the website's own poor search function, which does not let users find tweets more than seven days old. At the same time, Twitter entered into similar agreements with other search engines, namely Yandex and Bing. However, later, in 2011, Google was the only one that could not renew its agreement [2], giving Yandex [3] and Bing [4] a notable advantage. One could argue that the collapse of those negotiations effectively brought about the end of Google Real Time Search.

The change to the robots.txt file allows crawling of “/search”, that is, the search results pages for tweets and hashtags. Unfortunately, “/search/users” (the search function for user profiles) and “/search/*/grid” (the search function for photos and videos) are still blocked from crawling. Although a large number of tweets therefore remain unaccounted for, Google can nevertheless produce surprisingly interesting results, such as a page related to the Olympics [5].
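To make these crawl rules concrete, here is a minimal Python sketch of how a crawler might evaluate them. The rule set is a hypothetical reconstruction of the three paths mentioned above, not Twitter's actual file, and the matching follows Google's documented conventions (`*` wildcard, `$` end anchor, longest matching pattern wins, `Allow` wins ties); other crawlers may behave differently. Python's standard `urllib.robotparser` is not used here because it does not understand a `*` in the middle of a path.

```python
import re

# Hypothetical rules modelled on the change described above.
# This is NOT a copy of Twitter's actual robots.txt file.
ROBOTS_TXT = """\
User-agent: *
Allow: /search
Disallow: /search/users
Disallow: /search/*/grid
"""


def parse_rules(text):
    """Collect (directive, pattern) pairs, assuming a single 'User-agent: *' group."""
    rules = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field in ("allow", "disallow") and value:
            rules.append((field, value))
    return rules


def pattern_to_regex(pattern):
    """Translate a path pattern ('*' wildcard, optional '$' end anchor) into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Longest matching pattern wins; on a tie, Allow beats Disallow."""
    best_length, allowed = -1, True  # no matching rule means the path may be crawled
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):  # match() anchors at the start
            if len(pattern) > best_length or (
                len(pattern) == best_length and directive == "allow"
            ):
                best_length, allowed = len(pattern), directive == "allow"
    return allowed


rules = parse_rules(ROBOTS_TXT)
for path in ["/search?q=olympics", "/search/users?q=seo", "/search/photos/grid"]:
    print(path, "->", "allowed" if is_allowed(path, rules) else "blocked")
```

Running it prints `allowed` for the tweet search path and `blocked` for the other two. The example paths, including `/search/photos/grid`, are invented for illustration; the point is that the longer, more specific `Disallow` patterns override the broader `Allow: /search`.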

However, it is questionable whether letting search engines crawl and index its search pages is a positive move. According to Google’s webmaster guidelines, websites should keep search engines from indexing dynamically generated pages that add little value for users, such as search results or catalogue pages [6]. And indeed, searching Google for Twitter’s search pages [7] turns up millions of them (124 million, to be precise [7]).

It could be argued that having Twitter’s search history in the index is hardly something to celebrate; in fact, it could pollute the SERPs with irrelevant results. Twitter results pages change their composition in less than an hour: 17% of the top 1,000 queries change from one hour to the next [8], so their relevance is short-lived. Twitter simply churns out too much.
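The churn figure can be read as a simple set comparison between two hourly lists. The sketch below is one plausible way to measure it, assuming the metric is the share of this hour's top queries that did not appear in the previous hour's list; the definition used in the cited study [8] may differ, and the query lists are toy data.

```python
def hourly_churn(previous_top, current_top):
    """Share of this hour's top queries that were not in the previous hour's list."""
    previous = set(previous_top)
    current = list(dict.fromkeys(current_top))  # de-duplicate while keeping order
    if not current:
        return 0.0
    newcomers = [query for query in current if query not in previous]
    return len(newcomers) / len(current)


# Toy data: two hypothetical top-1,000 lists in which 170 queries are new.
hour_1 = [f"query-{i}" for i in range(1000)]
hour_2 = [f"query-{i}" for i in range(170, 1170)]
print(f"{hourly_churn(hour_1, hour_2):.0%}")  # 17%
```

With two top-1,000 lists, a churn of 17% means 170 queries entered the list within a single hour, which is why an indexed results page can be out of date almost as soon as it is crawled.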

Thankfully, the social media giant is making progress on letting users export their tweets and on searching reliably within tweet archives [9]. The company has started building a tool that incorporates these functions. However, it warns that with 400 million tweets per day, providing results that span longer time periods will be a herculean, if not impossible, task. A practical application of this concept can be seen in Topsy [10], one of the last real-time search engines. It can find and deliver tweets from earlier dates, though with questionable accuracy. Why not try it out and see for yourself how many of your past tweets it detects?
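A quick back-of-the-envelope calculation shows the scale involved. This sketch assumes the 400 million figure stays constant over time, which ignores growth, so the totals are only rough; `tweets_in` is a hypothetical helper introduced for illustration.

```python
TWEETS_PER_DAY = 400_000_000  # the figure Twitter cites above
SECONDS_PER_DAY = 24 * 60 * 60


def tweets_in(days, per_day=TWEETS_PER_DAY):
    """Total tweets over a period, assuming a constant daily rate (a simplification)."""
    return days * per_day


per_second = TWEETS_PER_DAY / SECONDS_PER_DAY
print(f"per second: {per_second:,.0f}")   # per second: 4,630
print(f"per week:   {tweets_in(7):,}")    # per week:   2,800,000,000
print(f"per year:   {tweets_in(365):,}")  # per year:   146,000,000,000
```

Even the seven-day window that Twitter's own search covers amounts to roughly 2.8 billion tweets; a full year is more than fifty times that.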

On the topic of Twitter search, the company has also recently upgraded its search functions with auto-complete and new filters. Auto-complete suggests currently topical queries, so you know exactly what is trending at that particular moment. The new filter, “People you follow,” helps you home in on tweets you might have seen recently or expect to see. Twitter can also now correct your spelling mistakes.