Duplicate content is exactly what you would think it is: content duplicated across a website or elsewhere on the Internet. Even if you are careful to create new and unique content, your website might be suffering from duplicate content without you even knowing it. This article will show you how to identify and remedy duplicate content so you can avoid its potential negative impact on search rankings.
What is duplicate content?
Google defines duplicate content as "substantive blocks of content within or across domains that either completely match other content or are appreciably similar." Content can be duplicated maliciously, but most often it is done without deceit in mind.
Why does duplicate content matter?
To understand why duplicate content is bad for your website, it helps to first look at the situation from the point of view of Google and other search engines. Their goal is to serve the most relevant content to their users; duplicate content reduces relevancy, and relevancy is key when it comes to search results.
For example, if 10 URLs repeat the same information, a search engine like Google will decide which one it thinks is the originator of the content. It can then push the other “copies” into the supplemental index that no one sees.
Alternatively, search engines can lower a website's position on the results page or deindex the content altogether. These penalties make it harder for searchers to find your site's content, which can ultimately cost you business. It is important to note that you really have to abuse duplicate content (we are talking hundreds of pages here) before the serious penalties go into effect.
How do I identify duplicate content?
Duplicate content can take several forms. It can be the actual content on the page, or it can be things like duplicate page titles and duplicate meta descriptions. Here are some of the best free tools out there for identifying duplicate content on your site.
Siteliner
You might be surprised at how many people simply copy and paste blocks of text across their site. Siteliner checks a website against itself to make sure you have not duplicated your own content. After you enter a URL, Siteliner displays a results page listing all of the pages it scanned. Clicking any of the page URLs takes you to that page, color coded according to the content it shares with other pages.
Copyscape
Copyscape is owned by the same company as Siteliner, but instead of looking for duplicate content within a single site, it checks the rest of the internet for plagiarism. After you enter the site you want to check, Copyscape shows you a results page of the pages (if any) that have copied your content.
Screaming Frog
Screaming Frog is a great tool that crawls a website much as search engines do, records key on-site SEO elements, and displays them in a table exportable to Excel. You can filter the table by many categories, including URL, status code (404, 301, 302, etc.), H1 tags, page titles, and meta descriptions.
For example, if you filter the results by page titles, Screaming Frog organizes the rows of data so that you can see the title of each crawled page, which URLs share duplicate title tags, and which URLs are missing title tags entirely.
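The same duplicate-title check can be sketched in a few lines of Python against a crawl export. The column names below ("Address" and "Title 1") are assumptions modeled on a Screaming Frog CSV export; adjust them to whatever your crawler actually emits.

```python
from collections import defaultdict

def find_title_issues(rows):
    """Group crawled URLs by page title to surface duplicates and gaps.

    `rows` is an iterable of dicts with "Address" (URL) and "Title 1"
    (page title) keys, mirroring a crawler's CSV export columns.
    Returns (duplicates, missing): titles shared by 2+ URLs, and URLs
    with no title at all.
    """
    by_title = defaultdict(list)
    missing = []
    for row in rows:
        title = (row.get("Title 1") or "").strip()
        if title:
            by_title[title].append(row["Address"])
        else:
            missing.append(row["Address"])
    duplicates = {t: urls for t, urls in by_title.items() if len(urls) > 1}
    return duplicates, missing

# Inline sample data; in practice, load rows with csv.DictReader
# from the exported crawl file.
crawl = [
    {"Address": "https://example.com/", "Title 1": "Soccer Gear"},
    {"Address": "https://example.com/balls", "Title 1": "Soccer Gear"},
    {"Address": "https://example.com/contact", "Title 1": ""},
]
duplicates, missing = find_title_issues(crawl)
```

Here `duplicates` flags the two URLs sharing the "Soccer Gear" title, and `missing` flags the page with no title tag at all.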
How do I fix duplicate content?
All content should have a meaningful purpose for the user. To create content right the first time, avoid reusing content from other parts of a site just to fill space, and realize that giving different pages the same page titles and meta descriptions does nothing to differentiate the pages you have worked so hard to make accessible to users. Here are a few additional tips for solving duplicate content on your website.
Configure the Canonical Tag
One cause of duplicate content arises when all associated URLs fail to point to the same site (think www.example1.com and example1.com being counted as two different sites just because the latter does not start with "www"). Use the “rel=canonical” tag to tell the search engine, “Even though you see example1.com as a different page than www.example1.com, it's really just a copy, so send all of its ranking power to www.example1.com.”
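In practice, the canonical tag is a single line placed in the HTML head of the duplicate URL, pointing at the preferred version. The scheme below is illustrative:

```html
<!-- In the <head> of the non-www duplicate (example1.com),
     pointing search engines at the preferred www version: -->
<link rel="canonical" href="https://www.example1.com/" />
```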
Use 301 Redirects
A second way to resolve duplicate content issues is to create 301 redirects. A 301 redirect tells search engines that page X was permanently moved to page Y, so when someone clicks on a link to page X, they will arrive at page Y. A 301 redirect also transfers all of page X’s ranking power to page Y.
Why would you do this? Think of a site about soccer with one page dedicated to soccer balls and another dedicated to Adidas soccer balls. If the soccer ball page only features content about Adidas soccer balls, there is really no point in having two pages. In this case, best practice is to 301 redirect whichever page has less traffic to the page with more traffic, transferring the redirected page’s ranking power and authority to the remaining page and ultimately boosting its organic search rankings.
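One common way to set up such a redirect on an Apache server is a rule in the site's .htaccess file. The paths and domain below are hypothetical stand-ins for the soccer example, and nginx and most CMSs offer equivalents:

```apache
# 301-redirect the lower-traffic Adidas page to the main soccer
# ball page, passing its ranking power along with its visitors.
Redirect 301 /adidas-soccer-balls https://www.example.com/soccer-balls
```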
Use the Noindex, Follow Meta Tag
A third way to resolve duplicate content issues is to use the “noindex, follow” meta tag. This tag lets search engine robots crawling a site follow the links on a page without including that page in their indexes. The technique works well for pages with pagination issues, such as www.example1.com/blog/page2 (essentially a directory of your blog posts) ranking higher than an individual blog post.
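The tag itself is one line in the page's HTML head; here it is shown on a hypothetical paginated archive page:

```html
<!-- In the <head> of www.example1.com/blog/page2: tell robots not
     to index this page, but still follow the links it contains. -->
<meta name="robots" content="noindex, follow" />
```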
Rewrite Your Content
The fourth option is to simply rewrite your content. While this might be the most time-consuming option, it could also reap the most reward. If you rewrite your content to not only get rid of the duplication but also make it more relevant to your users, a boost in organic search rankings could follow.
A good tool to help when rewriting is nTopic, which analyzes the content on a page and suggests long-tail keywords that can increase the topical relevance of your page and ultimately help it build more authority on the subject. nTopic is also great for improving evergreen content.
For additional reading on duplicate content, check out The Illustrated Guide to Duplicate Content in Search Engines on Moz Blog.
Matt is an observer. His keen attention to detail and knack for noticing the little differences set him apart. Coupled with his ability to think laterally, this skill creates an opportunity for him to think of innovative and unique ideas designed to help clients meet their goals. At Knowmad, Matt focuses on SEO and PPC services.