WordPress Duplicate Content

One of the problems with WordPress is that it creates a lot of duplicate content, and there has been a lot of discussion about whether search engines, especially Google, penalize you for it. It appears to be a filter rather than a penalty.

If you create a standard WordPress site, the same post will be displayed on its own post page and again under Archives, Categories, Feeds and Trackbacks. The Archives and Categories can each create several duplicates on their own, depending on the site settings.

When a crawler visits the site it must decide which of these pages is most relevant, and the majority of the time it does not pick the one you would. I have tested this on several sites and found that the Category pages usually get listed ahead of, or in place of, the actual post.

There are two ways to cure this. One is a robots.txt file and the other is an IF statement in the theme code. The downside of relying on robots.txt alone is that it throws a blanket over a specific problem, and sometimes you can trap a good page by accident.

A robots.txt file should be used to block some of the core files and directories in WordPress, as follows.

User-agent: *
Disallow: /category/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /about/trackback/
Disallow: /wp-register.php
Disallow: /wp-login.php
Disallow: /trackback/
Disallow: /feed/
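Before uploading rules like these, it can help to check which URLs they actually block. The following is a minimal sketch, assuming Python's standard-library robots.txt parser is a reasonable stand-in for a crawler's prefix matching (real search engines may interpret rules somewhat differently). The sample paths and the helper names build_parser and is_allowed are hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules from the listing above (duplicate line omitted).
ROBOTS_TXT = """\
User-agent: *
Disallow: /category/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /about/trackback/
Disallow: /wp-register.php
Disallow: /wp-login.php
Disallow: /trackback/
Disallow: /feed/
"""

# Hypothetical paths, chosen only to exercise the rules.
SAMPLE_PATHS = [
    "/2008/01/sample-post/",  # an ordinary post
    "/category/news/",        # a category listing
    "/wp-login.php",          # a core WordPress file
    "/wp-tips/",              # a page whose slug happens to start with "wp-"
]


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, path, agent="*"):
    """Return True if the given user agent may fetch the path under the parsed rules."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for path in SAMPLE_PATHS:
        status = "allowed" if is_allowed(parser, path) else "blocked"
        print(f"{status:8} {path}")
```

In this sketch the broad "/wp-" rule also blocks the hypothetical "/wp-tips/" page, which is the kind of accidental trap described earlier.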

Now, to stop all the other duplicate content, place the following statement in the theme's header file (header.php), right before the first occurrence of a meta tag:

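<?php
// Index only the home page, single posts and static pages.
// Use straight quotes; curly quotes pasted from a web page cause a parse error.
if (is_home() || is_single() || is_page()) {
    echo '<meta name="robots" content="index,follow">';
} else {
    echo '<meta name="robots" content="noindex,follow">';
}
?>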

This makes every page that is not the home page, a single post page or a static page tell robots NOT to index it, but still to follow all of its links.

With these changes, the site should get a much cleaner and more accurate index listing.


10 Responses

  1. Forest Parks Says:

    Thanks very very much, i’ll try and implement this tomorrow.

  2. Melina Says:

    very interesting. i’m adding in RSS Reader

  3. BlogRambler Says:

    The character should be a “single quote”, but it is being converted to an “apostrophe”

    Hope that gets it right.

  4. shambhavi Says:

    Hi there,

    Could you identify the characters you are referring to by name here:

    For some reason the character ‘ is being changed to `.

    Replace ` with ‘ and it should work fine.

    Is this a comma being replaced by a single quote? Hard to see!

  5. BlogRambler Says:

    For some reason the character ‘ is being changed to `.

    Replace ` with ‘ and it should work fine. I have seen this problem before when copying and pasting things from web pages.

  6. Rourke McNamara Says:

    Alpesh:

    I got the same error as you with the code snippet above. The following code does the same thing and did work for me:

    I’m using that on my site now.

  7. Alpesh Nakar Says:

    Parse error: syntax error, unexpected T_STRING, expecting ‘,’ or ‘;’

    Any idea why this is happening?
    Cheers!
    Alpesh

  8. Another way to prevent Google from spidering duplicate content: excerpts » wordpressgarage.com Says:

    […] in a previous post, Google penalizes sites for duplicate content. One way of solving this is modifying the robots.txt file. Daily Blog Tips describes another way to prevent Google from finding duplicate content on your […]

  9. Set up robots.txt to prevent Google from finding duplicate content » wordpressgarage.com Says:

    […] Read the solution at WordPress Duplicate Content on Blog Rambler>> […]

  10. Using robots.txt to Stop Google From Finding Duplicate Content | David Paul Robinson Says:

    […] Rambler has a good post explaining how to setup your robots.txt file so that all the duplicate links that Wordpress generates (to your Archives, etc) don’t […]
