Welcome to Search Kingdom


Castles, keeps, moats... No, sadly we haven't got any of those, but we do have all the first hand knowledge you need to help your website to rank well in search engine results. No hype, no false promises, just clear advice, training or direct assistance to get your website found.

Canonical URLs and Duplicate Content

As we know, there is no such thing as a duplicate content penalty. However, duplicate content is a real issue both for search engines and for anyone who wants to rank well in them.

The big three (Google, Yahoo and Microsoft) rarely get together on general standards, but when they do it is worth taking note. All three have a problem cutting through the mess that is the world-wide web and trying to archive it into something that is usable for us all. A big issue in this respect is the amount of duplication the web throws up. This sits in two main areas…

  1. The plagiarism that inherently exists on the web e.g. “ooh, that is a good article, I will use that on my site”, etc.
  2. Websites that duplicate information either intentionally or unintentionally.

The latter category is the one that the big three (mainly Google) have been trying to mitigate for a long while. Fortunately, the “pet insurance london”, “pet insurance cardiff”, etc. problem, where the pages were all basically the same but with different Meta and H1 tags, has pretty much been dealt with (some are still out there though!). Also, on-page data sorting (e.g. sort by price), which generates lots of very similar pages all with different URLs (usually via a “?sort=” parameter), is slowly being tackled. Yahoo also has a parameter-based URL removal tool in its Site Explorer suite.
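To make the sorting problem concrete, here is a minimal Python sketch of how several sorted views collapse to one page once the presentation-only parameters are ignored. The parameter names in PRESENTATION_PARAMS and the example.com URLs are assumptions for illustration; your own site will have its own list.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: these parameters only change how a page is presented,
# not what content it shows.
PRESENTATION_PARAMS = {"sort", "order"}

def strip_presentation_params(url):
    """Return url with presentation-only query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in PRESENTATION_PARAMS
    ]
    # Rebuild the URL from the remaining parameters, dropping any fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# Hypothetical URLs that all serve the same content.
urls = [
    "http://www.example.com/pet-insurance?sort=price",
    "http://www.example.com/pet-insurance?sort=name",
    "http://www.example.com/pet-insurance",
]
unique_pages = {strip_presentation_params(u) for u in urls}
print(unique_pages)
```

Three URLs go in and one page comes out, which is roughly the judgement a search engine has to make for itself when nobody tells it which version is the main one.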

The recent announcement does nothing for issue 1 (you can’t use this tag to point at pages on an external domain), but it does help with issue 2. It gives search engines and webmasters a way to make sure that the real (and main) version of a web page is treated as the canonical (main and sort of only) one.

The addition is a tag for the head of your page which is…

<link rel="canonical" href="The URL you want to be the main one for this web page" />

As a rule, it is not a bad idea to include this in all of the pages that you create manually (including ones you create in WordPress, etc.; don’t worry, there are some plugins available already). This means that if these pages get tagged with different URLs in one way or another, at least the search engines know which one you meant to be the main one. Also, if you create intentional duplicate content (landing pages, etc.) then you can use this tag to avoid confusing the search engines.

The plagiarism issue is still something that the search engines will have to deal with themselves. Also, if you are creating multiple pages (with different URLs) from a database source (feed, CMS, etc.) then this will only help you if you can incorporate a dynamic element into the rel=“canonical” tag that picks the canonical URL for you. This will work as a good alternative to the ‘noindex’/‘nofollow’ approach, which was my favoured method.
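As a rough illustration of that dynamic element, here is a minimal Python sketch that builds the tag from whatever URL was requested. It assumes that the path alone identifies the content (so the whole query string can be dropped) and that www.example.com over http is the preferred host and scheme. The function name canonical_link_tag and both constants are hypothetical, not part of any CMS or plugin.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumptions: the scheme and host you want search engines to index.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.example.com"

def canonical_link_tag(requested_url):
    """Build a canonical link element for the page behind requested_url.

    Assumes the path alone identifies the content, so the query string
    and fragment are discarded.
    """
    parts = urlsplit(requested_url)
    path = parts.path or "/"
    canonical = urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, "", ""))
    # Escape the URL so it is safe inside a double-quoted HTML attribute.
    return '<link rel="canonical" href="%s" />' % escape(canonical, quote=True)

print(canonical_link_tag("http://example.com/pet-insurance?sort=price&utm_source=feed"))
# prints: <link rel="canonical" href="http://www.example.com/pet-insurance" />
```

If some of your query parameters do select different content (a product id, say), you would keep those and drop only the rest, so treat the "drop everything" rule here as a starting point rather than a recommendation.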

It is worth remembering that this is not a panacea. You should still look to rationalise your URLs and ensure that duplicates are not created and indexed. Remember, you are still passing PageRank (well, through the links you don’t ‘nofollow’ anyway) to any page you link to on your site (even the ‘noindex’ pages), so make sure you are not giving away credit to meaningless pages.

However, overall this is a really good addition, especially since it creates a ‘301 redirect’ environment (so the good stuff gets passed back to the original page too), which makes it better than ‘noindex’ in many ways. Remember that this will only work within your own domain and on links between those pages (i.e. you can’t use the tag to point at external domains from the domain you are working on).
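Since the tag is only meant to point at pages on the same domain, a quick check on your own templates can catch mistakes. The following is a minimal Python sketch using only the standard library. The class and function names are made up for this example, and the rule "exactly one canonical tag, on the same host" is my own assumption about what a sensible page looks like, not something taken from the announcement.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> found in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and attr.get("href"):
            self.hrefs.append(attr["href"])

def canonical_on_same_host(page_url, html_text):
    """True if the page declares exactly one canonical URL on its own host."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    if len(finder.hrefs) != 1:
        return False
    # Resolve relative hrefs against the page URL before comparing hosts.
    target = urljoin(page_url, finder.hrefs[0])
    return urlsplit(target).hostname == urlsplit(page_url).hostname

# Hypothetical page markup for illustration.
page = (
    '<html><head>'
    '<link rel="canonical" href="http://www.example.com/pet-insurance" />'
    '</head><body></body></html>'
)
print(canonical_on_same_host("http://www.example.com/pet-insurance?sort=price", page))
print(canonical_on_same_host("http://www.someproxy.example/pet-insurance", page))
```

The first call reports a match and the second does not, because the same markup served from a different host now points at a domain other than the one it was fetched from.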

If used well this is an excellent addition to your SEM efforts.


One Response to “Canonical URLs and Duplicate Content”

  1. Rob Andrews says:

    Does this also take care of the URL hijacking problem (more info on this blog post here)? This fix is said to work only on the site it is on and not on external URLs. So if the URL starts ‘www.someproxy.blah…’, should this be viewed as an external URL even if the end result is the spoofed page? Will the canonical tag work? I would guess not. Comments and opinions are very welcome.
