The duplicate-content problem
A single web page is almost never reachable through a single URL. The same article might live at http://site.com/post, https://site.com/post, https://www.site.com/post, https://site.com/post/, and https://site.com/post?utm_source=newsletter. Every one of those strings is, to a search engine, a distinct address. When a crawler finds the same content at five addresses, it has to guess which one you actually want ranked, and the links, relevance, and authority that should have accumulated on one page get split across all five.
This is duplicate content, and it rarely triggers a penalty. The real cost is quieter: diluted ranking signals, wasted crawl budget, and the wrong URL showing up in results. The rel="canonical" tag is the standard fix. You can build a correct one in seconds with the Canonical Tag Generator, but it’s worth understanding what the tag does before you scatter it across a site.
What rel=canonical actually does
A canonical tag is a single line in the <head> of an HTML document:
<link rel="canonical" href="https://site.com/post">
It tells search engines: “Several URLs may show this content, but the master copy lives here.” Crawlers then treat the canonical URL as the one to index and consolidate the ranking signals from the duplicates onto it. It behaves a little like a 301 redirect for ranking purposes — equity flows to the target — except the duplicate URLs stay live and accessible to users. That’s the key difference: a redirect moves people, a canonical only moves ranking signals.
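Here is a deliberately simple Python model of what "consolidate the ranking signals" means. Each crawled URL has a count of inbound links, and each count is credited to whichever URL that page declares as canonical. The crawl data and link counts are invented for illustration. No search engine weighs links this simply.

```python
from collections import Counter

# Hypothetical crawl data: URL -> (declared canonical, inbound link count).
# The numbers are made up purely for illustration.
crawl = {
    "https://site.com/post": ("https://site.com/post", 40),
    "https://site.com/post?utm_source=newsletter": ("https://site.com/post", 12),
    "https://www.site.com/post": ("https://site.com/post", 8),
}


def consolidate(pages: dict[str, tuple[str, int]]) -> dict[str, int]:
    """Toy model: credit each URL's inbound links to its declared canonical."""
    totals = Counter()
    for canonical, links in pages.values():
        totals[canonical] += links
    return dict(totals)


print(consolidate(crawl))  # {'https://site.com/post': 60}
```

Without the canonical tags, those same 60 links would be spread across three separate URLs as 40, 12, and 8.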
Importantly, a canonical is a hint, not a directive. Google usually honours it, but if your other signals contradict the tag — internal links, sitemaps, and redirects all pointing elsewhere — it may pick a different canonical. Consistency across signals is what makes the tag stick.
Self-referencing canonicals
The most common and most useful pattern is the self-referencing canonical: every page points its canonical at its own clean URL. At first this sounds pointless — why would a page declare itself the master of itself? The answer is pre-emption. The moment a marketing campaign appends ?utm_source=twitter, or a session ID slips into the URL, or someone links to the http version, you suddenly have a duplicate. A self-referencing canonical means the clean URL is already declared as the master, so those variants resolve correctly without any extra work.
Set this once, sitewide, as part of your template, and most duplicate-content issues never appear. The Canonical Tag Generator outputs the exact normalised URL you should self-reference — with HTTPS forced, the domain lowercased, and tracking parameters stripped.
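Below is a minimal Python sketch of that normalisation. It assumes the only tracking parameters on the site are the utm_* family. The names normalise and canonical_tag are hypothetical and are not part of the generator. The sketch deliberately leaves the path's case, the www prefix, and any trailing slash untouched, because those are per-site conventions rather than universal rules.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only utm_* parameters are tracking-only on this site.
TRACKING_PREFIXES = ("utm_",)


def normalise(url: str) -> str:
    """Force HTTPS, lowercase the host, and drop tracking parameters."""
    parts = urlsplit(url)
    # Hostnames are case-insensitive but paths are not, so only the host is
    # lowered. (Assumes the URL has no user:password@ section.)
    host = parts.netloc.lower()
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # Fragments (#...) are never sent to the server, so they are dropped too.
    return urlunsplit(("https", host, parts.path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """Render a self-referencing canonical tag for a page's own URL."""
    return f'<link rel="canonical" href="{html.escape(normalise(url), quote=True)}">'


print(canonical_tag("http://Site.com/post?utm_source=twitter"))
```

Both http://Site.com/post and https://site.com/post?utm_source=twitter normalise to https://site.com/post. That is the value a page template would self-reference. html.escape turns any & in a remaining query string into &amp;, which is what the href attribute requires.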
When to use a canonical (and when not to)
Reach for a canonical when the duplicates should stay live but only one should rank:
- Tracking-parameter URLs — campaign links with UTM tags pointing back to the clean page.
- Faceted or sorted listings — /shoes?sort=price canonicalising to /shoes.
- Printer-friendly or AMP variants — alternate renderings of the same article.
- Syndicated content — a guest post canonicalising to the original on your own domain.
- Cross-domain duplicates — the same product description on several regional stores.
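A common approach for faceted listings is an allowlist. You keep only the query parameters that change which items are listed and strip everything else. The Python sketch below assumes a hypothetical CONTENT_PARAMS set containing category and page. Your own site's parameters will differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allowlist: parameters that change *which* items are listed.
# Anything else (sort order, view mode, utm_* tags) is treated as presentation.
CONTENT_PARAMS = {"category", "page"}


def listing_canonical(url: str) -> str:
    """Canonical URL for a faceted listing: keep content params, drop the rest."""
    parts = urlsplit(url)
    # Sorting gives one stable order, so ?page=2&category=boots and
    # ?category=boots&page=2 share a canonical.
    kept = sorted(
        (key, value) for key, value in parse_qsl(parts.query) if key in CONTENT_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(listing_canonical("https://site.com/shoes?sort=price"))
```

An allowlist fails safe for sort orders and campaign tags you haven't anticipated. The trade-off is that it will also strip a genuine filter you forgot to list, collapsing distinct pages onto one canonical. Review the allowlist whenever a listing gains a new filter.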
Use a different tool when the goal is different. If a page should be removed from search entirely, use noindex. If a URL has permanently moved, use a 301 redirect. Canonical is specifically for “keep both, rank one.”
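The three cases reduce to a small decision rule. This Python sketch simply encodes that paragraph. The Fix enum and the choose_fix function are hypothetical names.

```python
from enum import Enum


class Fix(Enum):
    CANONICAL = "rel=canonical"
    NOINDEX = "noindex"
    REDIRECT_301 = "301 redirect"


def choose_fix(moved_permanently: bool, remove_from_search: bool) -> Fix:
    """Pick the tool for a duplicate or unwanted URL."""
    if moved_permanently:
        # The old URL should stop existing for users as well as crawlers.
        return Fix.REDIRECT_301
    if remove_from_search:
        # Keep serving the page, but keep it out of the index.
        return Fix.NOINDEX
    # Keep both live, rank one.
    return Fix.CANONICAL
```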
The mistakes that hurt
Canonical tags are easy to get subtly wrong, and the failures are silent — you won’t see an error, you’ll just see the wrong page (or no page) ranking.
- Canonicalising everything to the homepage. This is a frequent bug in automated SEO plugins. It tells Google your 500 articles are all duplicates of your homepage, and they drop out of results.
- Relative URLs. href="/post" is technically valid but fragile; a parsing quirk can resolve it against the wrong base. Always use an absolute URL with the scheme and full domain.
- Canonical to a redirected or 404 URL. The target must be a live, indexable page that returns 200. Pointing at a redirect creates a chain crawlers may ignore.
- Multiple canonical tags on one page. Two conflicting tags cause Google to ignore both.
- Mismatched signals. A canonical saying one thing while your sitemap, internal links, and hreflang say another. Make them agree.
- Combining canonical with noindex. Contradictory instructions — one says “rank this other URL,” the other says “don’t rank this at all.” Pick one.
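Several of these mistakes can be caught by scanning a page's HTML. The Python sketch below uses the standard library's html.parser to flag four problems: multiple canonical tags, relative hrefs, inner pages canonicalised to the homepage, and a canonical combined with a robots noindex meta tag. It is a rough helper built on assumptions, not a full validator. Checking for redirected or 404 targets, or for mismatched sitemap and internal-link signals, needs HTTP requests and crawl data that it does not have. HeadScanner and audit are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HeadScanner(HTMLParser):
    """Collects rel=canonical hrefs and robots meta content from a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []
        self.robots = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values may be None.
        a = {key: (value or "") for key, value in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonicals.append(a.get("href", ""))
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots.append(a.get("content", "").lower())


def audit(page_url: str, page_html: str) -> list[str]:
    scanner = HeadScanner()
    scanner.feed(page_html)
    problems = []
    if len(scanner.canonicals) > 1:
        problems.append("multiple canonical tags")
    for href in scanner.canonicals:
        target = urlsplit(href)
        if not (target.scheme and target.netloc):
            problems.append(f"relative canonical: {href}")
        elif target.path in ("", "/") and urlsplit(page_url).path not in ("", "/"):
            problems.append(f"inner page canonicalised to homepage: {href}")
    if scanner.canonicals and any("noindex" in r for r in scanner.robots):
        problems.append("canonical combined with noindex")
    return problems
```

A clean page such as one containing only `<link rel="canonical" href="https://site.com/post">` returns an empty list.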
The trailing slash deserves special mention. /page and /page/ are different URLs to a server, and if both resolve you’ve created a duplicate. Decide on one convention for your whole site and make the canonical enforce it — that’s exactly what the trailing-slash option in the Canonical Tag Generator is for.
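Here is a Python sketch of enforcing one trailing-slash convention. The function name is hypothetical. It assumes that any final path segment containing a dot, such as whitepaper.pdf, is a file that should be left alone. That heuristic will misfire on slugs like /v1.2-release.

```python
from urllib.parse import urlsplit, urlunsplit


def apply_slash_convention(url: str, trailing_slash: bool) -> str:
    """Rewrite a URL's path to follow one sitewide trailing-slash rule."""
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    # Assumption: a dot in the last segment means a file (e.g. /whitepaper.pdf),
    # which keeps its exact path. The root "/" is also left as-is.
    if path != "/" and "." not in last_segment:
        path = path.rstrip("/") + ("/" if trailing_slash else "")
    return urlunsplit(parts._replace(path=path))
```

In practice you would also 301 the non-preferred form to this function's result, so that the redirect and the canonical agree.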
Canonical via HTTP header
HTML pages get the <link> tag, but non-HTML resources — PDFs, images, downloadable files — have no <head> to put it in. For those you send the canonical as an HTTP response header instead:
Link: <https://site.com/whitepaper.pdf>; rel="canonical"
This is how you stop a PDF that’s reachable from several paths from competing with itself. The generator produces this header alongside the HTML tag so you can hand it straight to your server config.
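If your files are served by a Python application rather than through web-server configuration, you can attach the header in code. This is a minimal WSGI sketch. The CANONICAL_FILES mapping and the placeholder response body are hypothetical.

```python
# Hypothetical mapping: every path that serves the PDF -> its one canonical URL.
CANONICAL_FILES = {
    "/whitepaper.pdf": "https://site.com/whitepaper.pdf",
    "/downloads/whitepaper.pdf": "https://site.com/whitepaper.pdf",
}


def app(environ, start_response):
    """Minimal WSGI app that attaches a canonical Link header to file responses."""
    canonical = CANONICAL_FILES.get(environ.get("PATH_INFO", ""))
    if canonical is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    start_response(
        "200 OK",
        [
            ("Content-Type", "application/pdf"),
            # Same syntax as the Link header shown above.
            ("Link", f'<{canonical}>; rel="canonical"'),
        ],
    )
    return [b"%PDF-1.4 placeholder"]  # stand-in for the real file bytes
```

For local testing, wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve the app. In production, the same Link header usually belongs in your web server's configuration instead.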
A quick workflow
- Decide your sitewide URL conventions: HTTPS only, www or bare domain, trailing slash or not.
- Add a self-referencing canonical to your page template using those conventions.
- For known duplicates (campaign URLs, faceted pages), canonicalise them to the clean version.
- Verify in Google Search Console's URL Inspection tool that Google's chosen canonical matches yours.
- Keep your sitemap, internal links, and redirects consistent with the canonical.
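The last step can be partly automated. This Python sketch assumes you already have a mapping from each crawled URL to the canonical it declares (the hypothetical declared dict below, gathered for example by scanning each page's HTML). It lists sitemap entries whose pages point their canonical somewhere else. Those URLs send a mixed signal and usually should not be in the sitemap.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_mismatches(sitemap_xml: str, declared: dict[str, str]) -> list[str]:
    """Return sitemap URLs whose page declares a *different* canonical."""
    root = ET.fromstring(sitemap_xml)
    mismatched = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        canonical = declared.get(url)  # None if the URL was never crawled
        if canonical is not None and canonical != url:
            mismatched.append(url)
    return mismatched
```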
Once your conventions are set, generating the tags themselves is mechanical — paste a URL into the Canonical Tag Generator, copy the output, and move on. Canonicalisation also pairs naturally with how you structure outbound links; if you manage marketing URLs, the guide to cloaking and tidying affiliate links covers the parameter-handling side of the same problem.
The takeaway
Canonical tags solve one specific problem well: telling search engines which URL is the original when content is reachable through many. Use absolute URLs, prefer self-referencing canonicals, keep your signals consistent, and never point everything at the homepage. Get those right and you stop leaking ranking signals to duplicates you didn’t even know you had.