You can use Google's cache of a page to check whether Google is handling your parameters, tag pages, etc., without creating problems. Open the URL, click "Cached Snapshot" in the Google Toolbar, and see whether the URL displayed in the cache matches the original URL. If it does not match, the cache is displaying the canonical URL, and you know that Google has recognized your duplicate content and mapped the pages together. This technique is particularly useful for any site that uses tracking parameters, controls display through the URL, or has other canonicalization concerns.
I discovered this trick the other day while investigating Semtech.com, the site of a semiconductor company that makes ICs for circuit protection and ESD protection. Semtech has complex needs because engineers want to locate products with specific parameters. Semtech meets this requirement with a very powerful CMS, but one that raises some SEO concerns.
The user navigation portion of the site has a hierarchical navigation system that puts products in categories with subcategories, while the parametric search calls products by part number in the /products/ root directory. External links point to both versions, depending on the path the user took to find the product. This canonicalization issue has largely been addressed by blocking the parametric search path via robots.txt, but the products continue to exist at two different URLs: the canonical URL determined by the category hierarchy, and a shorter URL directly in the /products/ directory. For example, one of their power management products has a canonical URL of http://www.semtech.com/products/power_management/switching_regulators/sc2440/ but also exists at http://www.semtech.com/products/sc2440/.
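To see how a robots.txt block of this kind behaves, here is a minimal Python sketch using the standard library's robots.txt parser. The rule, the /parametric-search/ path, and the example.com URLs are all hypothetical, since the actual path Semtech blocks is not given here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the blocked path is an assumption for
# illustration, not the rule Semtech actually uses.
ROBOTS_TXT = """\
User-agent: *
Disallow: /parametric-search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, user_agent="Googlebot"):
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


# A product page outside the blocked path stays crawlable,
# while anything under the blocked search path does not.
print(is_crawlable("http://www.example.com/products/sc2440/"))
print(is_crawlable("http://www.example.com/parametric-search/?part=sc2440"))
```

Note that a rule like this only stops crawling of the search path. It does nothing to merge two crawlable URLs that serve the same product page, which is why the cache check is still worth doing.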
If you look at the cache for the non-canonical URL, http://www.semtech.com/products/sc2440/, you can see that Google shows the canonical URL in the cache.
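If you are checking more than a handful of pages, the comparison itself is easy to script. The sketch below assumes you have already noted, by hand, the URL that the cache displays for each page; it does not fetch anything from Google, and the function names are my own.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to (host, path, query), ignoring host case and a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"
    return host, path, parts.query


def cache_shows_other_url(requested_url, cached_url):
    """Return True when the URL shown in the cache differs from the one requested."""
    return normalize(requested_url) != normalize(cached_url)


requested = "http://www.semtech.com/products/sc2440/"
# Copied by hand from the cache page for the URL above.
cached = "http://www.semtech.com/products/power_management/switching_regulators/sc2440/"

if cache_shows_other_url(requested, cached):
    print("Cache shows a different URL, so the pages appear to be mapped together.")
else:
    print("Cache shows the same URL that was requested.")
```

The query string is kept in the comparison on purpose, so a URL with a tracking parameter whose cache shows the clean URL also counts as a mismatch.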