Getting Technical with rel="canonical"

Webmasters have been able to specify a preferred URL for their content using the rel="canonical" link tag since 2009, and many have rightly done so to avoid duplicate content issues and the ranking penalties that can follow. Since June 17, 2011, Google has also supported a rel="canonical" HTTP header [1]. The header extends canonicalization to non-HTML content such as PDF, RAR, and XLS files, and even across different file types, none of which could be covered by the link tag because it must sit inside an HTML document.

The rel="canonical" HTTP header is especially useful when a website offers the same content in several formats, for example a catalogue in both HTML and PDF form. These versions usually share the same name, so it pays to tell the search engine which one is the preferred copy. It can also be a competitive advantage, because most SEO work focuses only on text or HTML content. There is a reason for that focus: despite their greater flexibility, HTTP headers are much harder to incorporate into dynamic pages. Implementing them may require changes to the site structure, such as a member access system or additional programs or servers, and it carries a greater chance of error (admittedly only when the process is not done properly). If your website has enough non-HTML content to benefit, you should consider canonical HTTP headers.

Broadly speaking, rel="canonical" HTTP headers can be implemented in three ways, listed here in order of increasing difficulty: 1) by using PHP for text and HTML-based pages, 2) by using .htaccess for files with no text or HTML content, and 3) by generating the HTTP header dynamically. The first method requires PHP support on the server, because it relies on the header() function. Google itself offers advice on the subject [2]. The code should resemble: header('Link: <your webpage>; rel="canonical"'). The function adds the Link header to the other response headers, and it must be called before any HTML output is sent. For experienced professionals this syntax should look familiar, as it resembles the traditional link tag.
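As a minimal sketch of the first method, the header value can be built in one place and sent before any output. The helper name canonical_link_header() is hypothetical (it is not a PHP built-in), and the URL is the placeholder used elsewhere in this post.

```php
<?php
// Build the value of a rel="canonical" Link header for a given URL.
// canonical_link_header() is a hypothetical helper, not part of PHP itself.
function canonical_link_header(string $url): string
{
    // Angle brackets delimit the target URL, so strip any stray ones,
    // along with line breaks that could inject extra headers.
    $clean = str_replace(['<', '>', "\r", "\n"], '', $url);
    return 'Link: <' . $clean . '>; rel="canonical"';
}

// header() only works before any output has been sent, so call it early.
if (!headers_sent()) {
    header(canonical_link_header('http://www.siteurl.com/desiredpage.html'));
}
```

Keeping the string construction in a separate function makes it easy to reuse the same format on every page that needs the header.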

Sites with only a few files can use an .htaccess file to add the header (this relies on Apache's mod_headers module). The directive is similar to the PHP version, except that the name and type of each file must be specified, which makes it impractical for larger content systems. The pro forma code is as follows:

<Files "File.doc">
  Header add Link "<http://www.siteurl.com/desiredpage.html>; rel=\"canonical\""
</Files>

Two tricks can make file selection easier: "?" stands for any single character, while "*" stands for any string of characters, so one block can select several files. The directive adds an HTTP header that tells search engines which HTML page is the canonical version; it does not redirect visitors.
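To preview what a wildcard would select before editing .htaccess, the same "?" and "*" conventions can be tried with PHP's fnmatch(). This is only an approximation: Apache's own matching may differ in edge cases, and preview_matches() and the file names are made up for illustration.

```php
<?php
// Hypothetical helper: list which file names a shell-style wildcard selects.
// fnmatch() follows the "?" (one character) and "*" (any string) conventions.
function preview_matches(string $pattern, array $fileNames): array
{
    return array_values(array_filter(
        $fileNames,
        fn(string $name): bool => fnmatch($pattern, $name)
    ));
}

$files = ['File.doc', 'File1.doc', 'File2.doc', 'Report.pdf'];
print_r(preview_matches('File?.doc', $files)); // File1.doc and File2.doc
print_r(preview_matches('*.doc', $files));     // all three .doc files
```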

The final technique involves writing a PHP file that is invoked whenever somebody requests your content. The easiest way is to rewrite the matching URLs with mod_rewrite:

RewriteEngine On
RewriteRule ^(.+)\.pdf$ /pdf.php?file=$1 [L]

The example pdf.php file will be used whenever a PDF stored on the website is requested. It is wise to add a conditional check that the file exists and to return an appropriate response if it does not. Thus the following:

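<?php
// pdf.php: serve the requested PDF along with any extra headers.
// The rewrite rule passes the requested name (without ".pdf") as "file".
$root = realpath($_SERVER['DOCUMENT_ROOT']);
$file = $_GET['file'] ?? '';
$path = realpath($root . '/' . $file . '.pdf');

// realpath() returns false for a missing file; the prefix check rejects
// "../" tricks that would point outside the document root.
if ($path !== false && strpos($path, $root . '/') === 0) {
    // Depending on your own conditions, invoke the header() function here,
    // for example to send the rel="canonical" Link header.
    header('Content-Type: application/pdf');
    header('Content-Length: ' . filesize($path));
    readfile($path);
} else {
    http_response_code(404);
    include $root . '/404.php';
}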
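One way to decide which canonical URL to send from pdf.php is sketched below. It assumes that the HTML version shares the PDF's base name and lives under the same site; canonical_url_for_pdf() and the base URL are hypothetical and would need adapting to a real site layout.

```php
<?php
// Hypothetical helper: map the "file" value passed to pdf.php onto the URL of
// the HTML page to be treated as canonical. Assumes the HTML version has the
// same base name as the PDF and sits under $baseUrl.
function canonical_url_for_pdf(string $file, string $baseUrl): string
{
    // Encode each path segment separately so the "/" separators survive.
    $segments = array_map('rawurlencode', explode('/', trim($file, '/')));
    return rtrim($baseUrl, '/') . '/' . implode('/', $segments) . '.html';
}

// Inside pdf.php, before readfile(), this could be used as:
// header('Link: <' . canonical_url_for_pdf($_GET['file'], 'http://www.siteurl.com') . '>; rel="canonical"');
```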

Naturally, you should check that everything has been implemented properly so that unpleasant surprises do not spring up later. Mozilla Firefox has a trustworthy add-on to help inspect HTTP headers [3]. It is advisable to roll out the HTTP headers in phases, so that you can tell where debugging is needed.
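The check can also be scripted. The sketch below scans raw response header lines, in the format returned by PHP's get_headers(), for a rel="canonical" target. find_canonical() is a hypothetical helper and deliberately simple: it expects rel to follow the URL directly and ignores other Link parameters.

```php
<?php
// Hypothetical helper: return the rel="canonical" target found in a list of
// raw response header lines, or null if no such Link header is present.
function find_canonical(array $headerLines): ?string
{
    foreach ($headerLines as $line) {
        // Only Link headers are of interest; header names are case-insensitive.
        if (stripos($line, 'Link:') !== 0) {
            continue;
        }
        // Match <url>; rel="canonical" (quotes around the rel value optional).
        if (preg_match('/<([^>]+)>\s*;\s*rel\s*=\s*"?canonical"?/i', $line, $m)) {
            return $m[1];
        }
    }
    return null;
}

// Example against a live URL (needs network access, so left commented out):
// var_dump(find_canonical(get_headers('http://www.siteurl.com/File.doc')));
```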

References:

  1. http://googlewebmastercentral.blogspot.co.uk/2011/06/supporting-relcanonical-http-headers.html
  2. http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
  3. https://addons.mozilla.org/en-us/firefox/addon/live-http-headers/?src=ss
