Technical Problems You Should Not Put Off Fixing
Websites are created with the best of intentions, and attention is usually paid to even the tiniest details. Yet over time they accumulate content, get updated haphazardly, and sometimes pass through several developers' hands. After a few years, the unnoticed errors start compounding and undermining the whole site. It makes sense, then, to detect and deal with them as soon as possible.
Websites built on .NET sometimes serve duplicate copies of pages at lowercase and uppercase URLs. Although search engines have algorithms for picking the right version, webmasters should not leave it entirely up to them. The URL Rewrite module for Internet Information Services (IIS) has an option to enforce the lowercase version, permanently solving the problem.
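As a sketch, a lowercase-enforcing rule for the URL Rewrite module can be declared in web.config along these lines (the rule name is arbitrary, and you may need to exclude paths that are legitimately case-sensitive):

```xml
<system.webServer>
  <rewrite>
    <rules>
      <!-- Any URL containing an uppercase letter is permanently
           redirected (301) to its lowercase equivalent. -->
      <rule name="Enforce lowercase URLs" stopProcessing="true">
        <match url="[A-Z]" ignoreCase="false" />
        <action type="Redirect" url="{ToLower:{URL}}" redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```

The `Permanent` redirect type matters here: a 301 tells search engines to consolidate the two URL variants rather than treat them as separate pages.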
Another server-related issue is misconfiguration. Its symptoms range from the disastrous to the seemingly harmless, and it is the latter that tend to go unnoticed. Sudden drops in traffic to particular pages, rankings lost for no apparent reason, unpredictable site layout, and poorly executed commands can all signal failing code and a need for debugging.
Some pages don’t need to be crawled or indexed, but denying Googlebot access to them presents another opportunity for problems. When writing robots.txt, remember that Googlebot obeys only the most specific user-agent group that matches it, so catch-all directives must be repeated in any Googlebot-specific group; otherwise certain combinations will produce unexpected results. Another viable way of checking for inconsistencies is Google’s Webmaster Tools, which can simulate a crawl. Keep in mind that hidden characters can unintentionally find their way into the file, so a check on the command line is always worthwhile.
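To make the user-agent behavior concrete, here is an illustrative robots.txt (the paths are placeholders). Because Googlebot follows only the most specific group that matches it, the catch-all rules must be copied into the Googlebot group or they will be ignored:

```
User-agent: *
Disallow: /internal/

# Googlebot reads ONLY this group, not the catch-all above,
# so /internal/ must be repeated here.
User-agent: Googlebot
Disallow: /internal/
Disallow: /search/
```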
Some platforms, including .NET, sometimes create duplicates of website pages. These have almost identical URLs to their original counterparts except for the ending: “.html,” “.aspx,” and so on. They effectively split the original PageRank and dilute the authority of your website, so removing them is a must. Guessing will turn up a few, but for a comprehensive analysis, run a third-party crawl of the website and identify every variant of the homepage. Once you’ve identified them, either apply a 301 redirect to each duplicate or place a rel="canonical" tag on it. Furthermore, you could alter all internal links so that they point to the right page; this last option avoids the link equity lost through a 301 redirect.
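The canonical tag itself is a single line in the duplicate page’s head; for instance, a duplicate homepage variant would point search engines at the preferred URL (example.com stands in for your domain):

```html
<!-- Placed in the <head> of the duplicate (e.g. /default.aspx),
     telling search engines to credit the preferred URL instead. -->
<link rel="canonical" href="https://www.example.com/" />
```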
Unfortunately, many e-commerce websites, which are typically database-driven, generate dynamic links that point to the same content. The number of such URLs multiplies with the number of products on offer without presenting anything new: when visitors browse by category, parameters such as color, size, or price are appended to the URL in varying order. These variants deplete your Googlebot crawl budget, which is the number of pages Google will crawl and index on your website, depending on its PageRank. It makes sense, then, to spend that budget wisely by selecting which pages need indexing. Note, however, that blocking URLs in robots.txt only keeps uncrawled pages out; pages that are already indexed should instead carry a rel="canonical" tag until a permanent solution is found, since robots.txt alone will not remove them from Google’s index.
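A common permanent fix is to normalize faceted URLs server-side so every parameter ordering maps to one canonical address. The sketch below illustrates the idea in Python; the parameter names and the `allowed_params` whitelist are hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def canonical_url(url, allowed_params=("page",)):
    """Map a faceted URL to its canonical form by dropping filter
    parameters (color, size, price, ...) and sorting what remains."""
    parts = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(parts.query) if k in allowed_params]
    params.sort()
    # Rebuild the URL with the cleaned query string and no fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(params), ""))

print(canonical_url("https://shop.example.com/shirts?color=red&size=m&page=2"))
# → https://shop.example.com/shirts?page=2
```

The same canonical form can then be emitted in each variant’s rel="canonical" tag, so crawlers consolidate every filter combination onto one URL.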
Another unsightly problem is soft 404 errors, which are easy to miss. The user sees a normal 404 error message, but the search engine receives status code 200, which means a perfectly healthy page. The easiest way to deal with this is to fix the code so the page returns a proper 404 status (and ask why you have a non-working page in the first place). Additionally, Google’s Webmaster Tools give accurate information on broken links and error codes. Status codes can further be checked with third-party tools like Web Sniffer.
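When auditing a site yourself, a crude way to flag soft 404s is to compare the returned status code against what the page body actually says. The heuristic below, including its marker strings, is only an illustration:

```python
def is_soft_404(status_code, body):
    """Flag a soft 404: the body reads like an error page,
    but the server reported a healthy 200 status."""
    markers = ("page not found", "does not exist", "404")
    looks_like_error = any(m in body.lower() for m in markers)
    return status_code == 200 and looks_like_error

# A page that says "not found" yet returns 200 is the problem case;
# a real 404 status or a normal page is fine.
```

In a real audit you would fetch each URL, record its status code and body, and run every pair through a check like this, then fix the pages it flags so they emit a genuine 404 (or 410) status.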