When you're putting together a web site, the content for that site often presents one of the greatest challenges, especially if it's a site that includes hundreds of pages. Many people opt to purchase bits of content, or even scrape content from other web sites to help populate their own. These shortcuts can cause real issues with search engines.
Say your web site is about some form of marketing. It's very easy to surf around the Web and find hundreds (or even thousands) of web sites from which you can pull free, permission-granted content to include on your web site. The problem is that every other person or company creating a web site could be doing the same thing. And the result? A single article on a topic appears on hundreds of web sites, and users aren't finding anything new if they search for the topic and every site has the same article.
To help combat this type of content generation, some search engines now include, as part of their search algorithm, a method to measure how fresh (that is, how original) a site's content is. If the crawler examines your site and finds that much of your content is also on hundreds of other web sites, you run the risk of either ranking low or being removed from the search engine's index.
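Search engines do not publish how they make this measurement, so the following is only a minimal sketch of the idea, not any engine's actual method. It assumes that a page counts as duplicated when its whitespace- and case-normalized text matches a page seen elsewhere; the function names `fingerprint` and `duplicate_ratio` are hypothetical and chosen for illustration.

```python
import hashlib


def fingerprint(text):
    """Hash the text after lowercasing and collapsing whitespace.

    Assumption: two pages are 'the same' if their normalized text is
    identical. Real crawlers are likely far more tolerant than this.
    """
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def duplicate_ratio(site_pages, other_pages):
    """Return the share of site_pages whose text also appears in other_pages."""
    if not site_pages:
        return 0.0
    seen_elsewhere = {fingerprint(page) for page in other_pages}
    duplicates = sum(1 for page in site_pages if fingerprint(page) in seen_elsewhere)
    return duplicates / len(site_pages)


# Illustrative data only: one of the two pages was copied from elsewhere.
my_site = ["Ten   free marketing tips.", "Our own guide to local marketing."]
rest_of_web = ["ten free marketing TIPS.", "An unrelated article."]
ratio = duplicate_ratio(my_site, rest_of_web)
```

With the sample data above, half of the site's pages match content found elsewhere. A site where that share is high is the kind of site the paragraph above describes as being at risk.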
Some search engines now look for four types of duplicate content:
Highly distributed articles.
These are the free articles that seem to appear on every single web site about a given topic. This content has usually been provided by a marketing-savvy entrepreneur as a way to gain attention for his or her project or passion. But no matter how valuable the information, if it appears on hundreds of sites, it will be deemed duplicate and that will reduce your chances of being listed high in the search result rankings.
Product descriptions for e-commerce stores.
Product descriptions on most e-commerce pages are not included in search engine results. Product descriptions can be very short and, depending on how many products you're offering, there could be thousands of them. Crawlers are designed to skip over most product descriptions. Otherwise, a crawler might never be able to work completely through your site.
Duplicate web pages.
It does no good whatsoever for a user to click through a search result only to find that your web pages have been shared with everyone else. These duplicate pages gum up the works and lower the position at which your pages appear in the search results.
Content that has been scraped from numerous other sites.
Content scraping is the practice of pulling content from other web sites and repackaging it so that it looks like your own content. Although scraped content may look different from the original, it is still duplicate content, and many search engines will leave you completely out of the search index and the search results.
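To illustrate why scraped content that "looks different" can still be recognized as duplicate, here is a minimal sketch of one well-known near-duplicate technique: comparing sets of overlapping word sequences (shingles) with Jaccard similarity. This is an assumption about how such detection could work, not a description of what any particular search engine does. The function names and the shingle size of three words are illustrative choices.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than k words is treated as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(text_a, text_b, k=3):
    """Similarity between two texts, from 0.0 (no overlap) to 1.0 (identical)."""
    return jaccard(shingles(text_a, k), shingles(text_b, k))


# Illustrative data only: the second text swaps a single word.
original = "Content scraping is the practice of pulling content from other web sites"
scraped = "Content scraping is the practice of pulling articles from other web sites"
score = similarity(original, scraped)
```

Here changing one word out of twelve still leaves 7 of the 13 distinct shingles in common, a score of about 0.54, while two unrelated texts would score near zero. Light rewording therefore does little to hide where the content came from.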