Between a quarter and nearly a third of the content on the World Wide Web repeats itself. According to Google's head of search spam, Matt Cutts, around 25-30% of web content is duplicate. Your website is also likely to have duplicate content, even if it follows web content writing rules. In this post, we will touch upon the reasons for and risks of duplication, and review useful modules that help fix duplicate content in Drupal.
What is duplicate content?
Duplicate content is identical or substantially similar content found at more than one web address (URL). These URLs can be within the same domain or across different domains.
Common reasons for duplicate content
Reasons for duplicate content in Drupal are pretty much the same as in other CMSs. They vary from unintentional to malicious, and from purely technical to human-created. Among the most common ones are:
- scraped content (copied without permission)
- syndicated content (shared by agreement)
- HTTP and HTTPS versions of pages
- WWW and non-WWW versions of pages
- printer-friendly versions of pages
- different user session IDs generating different URLs
- almost identical product descriptions in e-commerce stores (for example, Drupal Commerce)
- identical pieces of site-wide content (for example, in the footer)
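Several of the technical causes above (HTTP vs. HTTPS, WWW vs. non-WWW, printer-friendly copies, session IDs) produce many URLs for one page. The minimal Python sketch below collapses such variants into a single canonical URL. It assumes HTTPS and the non-WWW host are preferred, and the session parameter names (`sid`, `sessionid`, `phpsessid`) and the `/print` prefix are hypothetical placeholders; adjust them to how your own site builds URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that carry session IDs; adjust for your site.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}
# Hypothetical path prefix used for printer-friendly copies of pages.
PRINT_PREFIX = "/print"


def canonicalize(url: str) -> str:
    """Collapse common duplicate-URL variants into one canonical form."""
    parts = urlsplit(url)
    # Assumption: HTTPS is the preferred scheme.
    scheme = "https"
    # Drop the "www." prefix so WWW and non-WWW hosts map to the same URL.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Map printer-friendly paths back to the regular page.
    path = parts.path or "/"
    if path == PRINT_PREFIX or path.startswith(PRINT_PREFIX + "/"):
        path = path[len(PRINT_PREFIX):] or "/"
    # Remove session-ID parameters and sort the rest into a stable order.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in SESSION_PARAMS
    )
    # The fragment is dropped: it never reaches the server anyway.
    return urlunsplit((scheme, host, path, urlencode(query), ""))
```

For example, `http://www.Example.com/print/about?sid=abc123` and `https://example.com/about` both come out as `https://example.com/about`. On a real site, this decision is expressed through 301 redirects or canonical URLs rather than computed in application code.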
What are the risks of duplicate content?
Duplicate content may cost you your Google rankings. Google doesn't show every duplicate in its results; it shows only one page from each "cluster" of duplicates, so the version it picks may not be the one you want ranked.
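Google does not publish how it builds these clusters, so the following Python snippet is only a toy illustration of the idea, not Google's algorithm. It groups URLs whose text is identical after lowercasing and collapsing whitespace, then picks one representative per group using a made-up "shortest URL wins" rule. It catches exact copies only; near-duplicates need a similarity measure instead of a hash.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing it and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cluster_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose content is identical after normalization."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        clusters[fingerprint(text)].append(url)
    # Keep only groups with more than one URL, sorted for stable output.
    return sorted(sorted(urls) for urls in clusters.values() if len(urls) > 1)


def pick_representative(cluster: list[str]) -> str:
    """Hypothetical rule: prefer the shortest URL, then alphabetical order."""
    return min(cluster, key=lambda u: (len(u), u))
```

Only one URL per cluster "survives" here, which mirrors why the duplicate you care about may not be the one that gets shown.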
In addition, Google rewards uniqueness as added value. Sites with no unique content will find it harder to get good rankings.
SEO experts take care to keep their websites from being demoted algorithmically by Google for low-quality content, including copied content. That's why they work to keep their sites unique.
Google states that it may apply strict penalties when a website's duplicate content appears to be deliberately manipulative.
Modules that deal with duplicate content in Drupal
Getting rid of duplicates completely is neither possible nor necessary. However, there are contributed modules that deal with duplicate content in Drupal and can largely clean your website of copies.
The Taxonomy Unique module
By default, Drupal allows you to create taxonomy terms with the same name within the same vocabulary. The Taxonomy Unique module, available for Drupal 7 and 8, can prohibit that.
Whenever someone tries to create a term that already exists, Drupal shows an error message saying the term name already exists. The feature can be individually enabled for each vocabulary. The error message is also customizable.
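The behavior described above (a per-vocabulary switch plus a customizable error message) can be sketched in Python as follows. This is a hypothetical model, not the module's PHP code: `Vocabulary`, `add_term`, and `DuplicateTermError` are illustrative names, and the case-insensitive comparison is an assumption rather than documented module behavior.

```python
from dataclasses import dataclass, field


class DuplicateTermError(ValueError):
    """Raised when a term name already exists in a unique-terms vocabulary."""


@dataclass
class Vocabulary:
    """Hypothetical stand-in for a Drupal vocabulary."""
    name: str
    unique_terms: bool = False  # the per-vocabulary on/off switch
    error_message: str = 'The term name "{term}" already exists in this vocabulary.'
    terms: list[str] = field(default_factory=list)


def add_term(vocab: Vocabulary, term: str) -> None:
    """Add a term, rejecting duplicates only when the vocabulary requires it."""
    name = term.strip()
    # Assumption: names are compared case-insensitively.
    if vocab.unique_terms and any(t.lower() == name.lower() for t in vocab.terms):
        raise DuplicateTermError(vocab.error_message.format(term=name))
    vocab.terms.append(name)
```

With `unique_terms=False`, the vocabulary keeps Drupal's default behavior and happily stores two terms with the same name.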
The Unique Content Title module
Thanks to the Unique Content Title module, you can require unique titles for each node of a content type. You will need to check the “Unique title” option in the “Submission form settings” of a content type. The module is only available for Drupal 7.
The Suggest Similar Titles module
Here is another module that helps avoid duplicate node titles. It is available for Drupal 7, with a pre-release version for Drupal 8.
During node creation, the Suggest Similar Titles module compares the new node's title to existing ones and tells you if they match. The module settings allow you to:
- choose which content types use the feature
- add keywords to ignore
- specify the allowed percentage of text matching
- enable a node permissions check
- limit the number of titles to show in suggestions
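To make settings like "keywords to ignore", "allowed percentage of matching", and "number of titles to show" concrete, here is a small Python sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure; the module's own comparison method may differ, and the function names and default values are hypothetical. The permissions check is left out.

```python
from collections.abc import Collection
from difflib import SequenceMatcher


def normalize(title: str, ignored: Collection[str]) -> str:
    """Lowercase the title and drop keywords that should not count toward a match."""
    return " ".join(w for w in title.lower().split() if w not in ignored)


def similar_titles(
    new_title: str,
    existing: list[str],
    threshold: int = 75,                    # allowed matching percentage
    ignored: Collection[str] = frozenset(),  # keywords to ignore
    limit: int = 5,                          # max suggestions to show
) -> list[tuple[int, str]]:
    """Return up to `limit` (percent, title) pairs at or above `threshold` percent."""
    target = normalize(new_title, ignored)
    matches = []
    for title in existing:
        ratio = SequenceMatcher(None, target, normalize(title, ignored)).ratio()
        percent = round(ratio * 100)
        if percent >= threshold:
            matches.append((percent, title))
    # Most similar first; ties broken alphabetically for stable output.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return matches[:limit]
```

For instance, `similar_titles("How to fix duplicate content in Drupal 8", existing, ignored={"8"})` would flag an existing "How to fix duplicate content in Drupal" as a 100% match.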
The Copy Prevention module
The Copy Prevention module helps you protect your texts and images from being copied on the Internet. Among the ways to do it are:
- disabling text selection
- disabling copy to clipboard
- disabling the right-click context menu on all content or on images
- placing a transparent image over your images
- hiding your images from indexing
The Intelligent Content Tools module
Here is an interesting module to fix duplicate content in Drupal, although it is only available as a development version.
The Intelligent Content Tools module consists of three submodules. It offers auto tagging, text summarization, and duplicate content identification for your website.
The module notifies you when it finds duplicates. It relies on natural language processing (NLP).
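The description above doesn't say which NLP technique the module uses, so the following Python sketch swaps in a classic, simple one: TF-IDF vectors compared with cosine similarity. It is an assumption-based illustration of near-duplicate detection, not the module's implementation; the 0.8 threshold and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build a TF-IDF weight vector (term -> weight) for every document."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts.values() for term in c)
    vectors = {}
    for name, c in counts.items():
        total = sum(c.values())
        vectors[name] = {
            # Smoothed IDF keeps terms found in every document from dropping to zero.
            term: (k / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, k in c.items()
        }
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def find_near_duplicates(docs: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return every document pair whose similarity reaches the threshold."""
    vectors = tfidf_vectors(docs)
    names = sorted(vectors)
    pairs = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            score = cosine(vectors[x], vectors[y])
            if score >= threshold:
                pairs.append((x, y, round(score, 3)))
    return pairs
```

Unlike hashing, this approach also flags texts that differ by a word or two, which is closer to the "identical or similar" definition of duplicate content.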
The Redirect module
Since 301 redirects and canonical URLs are among the main ways to fix duplicates, the Redirect module is very helpful here. It lets you:
- manually create redirects
- automatically redirect to canonical URLs (taking care of trailing slashes, language prefixes, and so on)
- automatically create redirects when a URL alias changes
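The three features listed above can be modeled with a tiny in-memory redirect table. This Python sketch is a hypothetical illustration, not the Redirect module's API: `RedirectMap` and its methods are made-up names, and treating `/about/` and `/about` as the same path is an assumed trailing-slash policy.

```python
class RedirectMap:
    """Hypothetical in-memory redirect table (source path -> target path)."""

    def __init__(self) -> None:
        self.redirects: dict[str, str] = {}

    @staticmethod
    def normalize(path: str) -> str:
        """Treat /about and /about/ as the same path (the site root stays "/")."""
        return path.rstrip("/") or "/"

    def add(self, source: str, target: str) -> None:
        """Manually create a 301 redirect."""
        self.redirects[self.normalize(source)] = self.normalize(target)

    def alias_changed(self, old_alias: str, new_alias: str) -> None:
        """Automatically record a redirect when a URL alias changes."""
        old, new = self.normalize(old_alias), self.normalize(new_alias)
        if old != new:
            self.redirects.pop(new, None)  # the new alias must not redirect away
            self.add(old, new)

    def resolve(self, path: str) -> tuple[int, str]:
        """Return (status, final path), following redirect chains."""
        current = self.normalize(path)
        seen = {current}
        while current in self.redirects:
            current = self.redirects[current]
            if current in seen:
                break  # redirect loop: stop instead of spinning forever
            seen.add(current)
        # 301 whenever the visitor should end up at a different URL.
        return (301 if current != path else 200), current
```

After two title changes, the oldest alias still resolves to the newest one, and a trailing-slash variant is redirected to the canonical path. Following chains at request time keeps the sketch short; in practice it is better to point old URLs straight at the final target.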
The Redirect module for Drupal 8 comes with two submodules: Redirect 404 and Redirect Domain. They help you, respectively:
- track and fix 404 errors
- create cross-domain redirects
Several other modules are recommended for use together with Redirect:
- Multi-path autocomplete to help with entering paths
- Pathologic for transforming relative links in content to absolute URLs
- Match Redirect for redirecting according to path patterns with wildcards
- Pathauto for automatic generation of path aliases, which we will now look at more closely
The Pathauto module
The purpose of the Pathauto module is to generate human-readable, SEO-friendly URL aliases according to chosen patterns. By default, Drupal node URLs look like this: /node/123. The Pathauto module can automatically replace them with aliases like /category/my-node-title.
The module plays an important part in cleaning a website of duplicates. Together with the Redirect module, which creates redirects when aliases change, Pathauto moves URLs to the new pattern-based paths quietly and reliably, with no confusion for search engines and no broken links for users.
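The pattern-to-alias step can be sketched in a few lines of Python. Real Pathauto patterns use Token module syntax such as `[node:title]`; this sketch simplifies that to hypothetical `[category]` and `[title]` placeholders, and the `-0`, `-1` suffix used to keep aliases unique is an assumed scheme, not necessarily the module's exact behavior.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Lowercase, strip accents, and join words with hyphens."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def build_alias(pattern: str, tokens: dict[str, str], existing: set[str]) -> str:
    """Fill a pattern such as '/[category]/[title]' and make the alias unique."""
    # Replace each [placeholder] with the slugified value from `tokens`.
    alias = re.sub(r"\[(\w+)\]", lambda m: slugify(tokens[m.group(1)]), pattern)
    candidate, n = alias, 0
    # Hypothetical uniqueness scheme: append -0, -1, ... until the alias is free.
    while candidate in existing:
        candidate = f"{alias}-{n}"
        n += 1
    existing.add(candidate)
    return candidate
```

Calling `build_alias("/[category]/[title]", {"category": "News", "title": "My Node Title!"}, set())` yields `/news/my-node-title`, matching the /category/my-node-title shape described above.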
Let’s fix duplicate content on your Drupal website
Each of the above modules will add a helpful touch to fixing duplicate content in Drupal. If you need help using them, or doing a comprehensive website check and clean-up with a variety of tools, feel free to contact our Drupal team. Stay unique: both search engines and users will appreciate it!