Large websites tend to have many useless pages associated with them. They may be caused by any of the following:

poorly structured or poorly formatted user generated content
content duplication due to content management issues
canonical related issues
dangling nodes which act as PageRank sinks
heavily duplicated navigational pages which soak up link authority without providing a clean site structure

I have recently had a couple of SEOs show me various navigational techniques which created thousands upon thousands of somewhat similar mid-level navigational pages.

Some pages make sense to be indexed and provide a great user experience if searchers land on them. Others provide a poor user experience.

Search engines do not like indexing search results from other engines, so if your navigational scheme has an element which acts like an internal search engine, you probably do not want all those search pages indexed if they are heavy duplicates of one another.
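One common way to keep internal search pages out of the index is a robots.txt rule. Here is a minimal sketch, using Python's standard urllib.robotparser, of checking which URLs a given set of rules would block. The rules, the /search and /tag/ paths, and the example.com URLs are made up for illustration and are not from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: block internal search and tag listing pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /tag/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, agent="*"):
    """Return True if the rules above allow the given agent to fetch the URL."""
    return parser.can_fetch(agent, url)


for url in (
    "http://example.com/products/blue-widget",
    "http://example.com/search?q=widget",
    "http://example.com/tag/widgets",
):
    print(url, "->", "crawlable" if is_crawlable(url) else "blocked")
```

Keep in mind that robots.txt only controls crawling. Whether a blocked URL still shows up in an index depends on the search engine.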

I was talking to Stuntdubl the other day, and he said one of the main things he looks at to get a general indication of a site's health is the ratio of quality pages indexed to total pages indexed from the site.

If lots of your indexed pages are heavily duplicated and/or of low value, that may cause search engines to crawl or index your site less deeply and fail to index all of your individual product-level pages.
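As a rough illustration of that ratio, here is a small sketch. It assumes you already have a list of your indexed URLs from somewhere, and it uses a made-up rule for what counts as a "quality" page. The function names, the URL patterns, and the sample URLs are all hypothetical; what counts as quality is a judgment call for each site.

```python
def indexed_quality_ratio(indexed_urls, is_quality):
    """Return the share of indexed URLs that pass the quality check."""
    urls = list(indexed_urls)
    if not urls:
        return 0.0
    good = sum(1 for url in urls if is_quality(url))
    return good / len(urls)


def looks_like_quality(url):
    # Hypothetical rule: treat internal search and tag listing pages as filler.
    return "/search" not in url and "/tag/" not in url


indexed = [
    "http://example.com/products/blue-widget",
    "http://example.com/products/red-widget",
    "http://example.com/search?q=widget",
    "http://example.com/tag/widgets/page/7",
]

ratio = indexed_quality_ratio(indexed, looks_like_quality)
print(f"{ratio:.0%} of indexed pages look like quality pages")
```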

Published: August 11, 2006 by Aaron Wall


Comments

Cary

August 11, 2006 - 6:42pm

Hehe... that's one of the tricks I've discovered this year. On my more important sites I block the SEs from indexing anything that isn't THE EXACT PAGE that I want people to find.

While some might worry about not having as many pages as possible indexed, my traffic has only gone up since implementing this.

It takes some brain work, but by using robots.txt, and also Meta Tags, you can pretty much get Google to only index the good stuff.

In fact my main site has a perfect ratio of indexed pages - zero filler, 100% tasty :)

blake

August 11, 2006 - 6:44pm

The main functional portion of my site is dynamic so that I can allow the content archives to be updated by only changing my XML file. This is user generated content based on the month that the user wants to view. Because of my concern for the search engines indexing dynamic content, I have created mirror pages made up of static html of the months that you can get to by the sitemap. Obviously I can only do this when a month is over.

Is this considered duplicate content in a bad way? Does the search engine index my XML file as well and look at this content as 3 times in repetition?

James

August 11, 2006 - 7:56pm

Cary - that sounds interesting - I use html sitemaps to ensure most of my site is indexed, however I find many visitors arrive on those sitemap pages rather than the pages with content they are looking for, and then leave.

Is there any way of ensuring the pages I'm pointing to are indexed by google without having these sitemap pages in the index?

Nancy

August 11, 2006 - 8:57pm

How would you recommend handling spidering for something like a blog, where the content on the site index changes frequently, and search engine listings often include content that is no longer on that page? People end up clicking through to the site index instead of the permalink page and being unable to find the content they are looking for.

But not letting engines index your site index would be a really bad idea, wouldn't it?

blake

August 11, 2006 - 9:33pm

I noticed that one of my main competitors, which has a very large market share and a high PageRank, has included the meta "noindex, nofollow" tag on their main landing page. Why is that a good idea and how is that successful?

Jarrod Hunt

August 11, 2006 - 9:34pm

Rand,

If you're reading this post:

This might be an interesting item to include on your page strength tool.

Ronny

August 12, 2006 - 10:11am

James: You can always use meta tags (I assume you're not talking about Google Sitemaps), setting the robots value to "noindex, follow".

This way Google will follow all links of that page but it won't index the page itself.
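To check which pages actually carry that tag, something like the sketch below could work. It uses Python's standard html.parser to pull the directives out of a robots meta tag. The class name, function name, and sample page are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Split "noindex, follow" into individual lowercase directives.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
directives = robots_directives(page)
print("noindex" in directives, "follow" in directives)
```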

Nancy: Good question. I'd like to get more info on this topic, too. If you have found out anything, please let me know!

Aaron Wall

August 12, 2006 - 1:13pm

Hi Blake
Not sure if you need to make a mirror html file if you already have a php one or something like that. Many sites have feeds and XML files, so that probably shouldn't be too big of a problem since your link structure will most likely reinforce your home page or main blog page as being more authoritative than your feed, and many feeds are not full text feeds in nature (using only snippets).

One note of caution with feeds is that you want to make sure you host your feed on your own site. If you host it on an external, more authoritative domain, that can cause subscriber transfer problems if you ever leave them, AND I have seen some search engines rank those feeds on more authoritative domains (like Feedburner) above the original domain the content came from.

Hi James
You don't need sitemaps if your internal site structure has clean logical hierarchical links and clean URLs. Sitemaps are nice for giving you another means to get indexed, a way to focus / alter / change internal anchor text, and a way to redistribute your PageRank / link authority, but if you have a small site that is manually crafted or a rather advanced content management system you should be able to integrate some of those ideas directly into your site without using a sitemap.

Hi Nancy
I think if your site gets enough authority (via legitimate links), and if the search engine is good, then eventually it should get better at indexing your site and sending people to the correct page.

I definitely would want to get my main page indexed, but you also want to ensure that your sub-pages have clean URLs and are only getting indexed at one location.

Hi Blake
I can't really state why anyone would use a technique and be certain, plus their use of a technique may help or hinder them.

Larry

August 13, 2006 - 9:39am

Aaron/Others: Can you elaborate on bullet point # 4 above, namely "dangling nodes that act as PageRank sinks." I'm wrestling with a Drupal site at the moment that creates all kinds of nodes. Wondering if you've unravelled a part of the mystery?

Aaron Wall

August 13, 2006 - 10:30am

A dangling node is a page which has inbound links, but does not filter the link authority back to other pages within the site.
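A rough sketch of spotting such pages, assuming you have a crawl of your own site stored as a map from each page to the internal pages it links out to. The function name and the sample link graph are made up, and "no outbound internal links at all" is used here as a simple stand-in for "does not pass link authority back".

```python
def find_dangling_nodes(links):
    """Return pages that receive internal links but link out to no other page.

    `links` maps each page to the list of internal pages it links to.
    """
    linked_to = {target for targets in links.values() for target in targets}
    all_pages = set(links) | linked_to
    return sorted(
        page for page in all_pages
        if page in linked_to and not links.get(page)
    )


# Hypothetical crawl data: the PDF gets a link but links to nothing.
site_links = {
    "/": ["/products", "/report.pdf"],
    "/products": ["/", "/products/widget"],
    "/products/widget": ["/products"],
}

dangling = find_dangling_nodes(site_links)
print(dangling)
```

Files like PDFs, or pages stripped of navigation, are typical candidates to turn up in a list like this.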

Greg

August 14, 2006 - 3:35pm

Cary, that sounds interesting to me, but it seems like a lot of trouble. Were you having issues with the SEs sending users to the wrong pages, who then balked and moved on? I always sort of figured that I'd make content as best as I could and hope that the search engine sent the right people to the right places. Otherwise it seems like I'd only be getting traffic for one term or concept vs. the many that a whole site typically contains. That may just be fancy talk, though, to mask my laziness. Either way, kudos.

Aaron, I appreciate the def for dangling nodes. I'd never really thought about it before.

Indexed Page Quality Ratio
