If Links Didn't Matter...

David Berkowitz recently wrote an article asking what would happen if links lost their value. Over the past year real editorial links have only increased in value, as Google has more aggressively required some minimum PageRank threshold just to index a page.

Many types of links have lost value as Google has gotten better at filtering link quality, but will editorial links ever lose their value? To answer that you have to realize that links have value because they are typically a proxy for trust based on social relationships or human judgment.

But links are openly gamed today and there are an increasing number of affordable marketing techniques that allow virtually any site to garner hundreds or thousands of quality links.

One day Google might come up with better ways to determine what to trust, but if they do, it is going to be based on who humans trust more, and who amongst those trusted sources does the best job of providing editorial value and noise filtering on their site. And this internal site filtering will become even more important as many hub sites leverage their brand and allow communities to contribute content to their sites.

There is one part of David's article that I think is off though, and that is the part on keyword density:

Keyword density, the imperfect science of including just enough of the most important keywords on any given page without spamming the search engines, becomes more important than ever.

I don't think keyword density will be the answer to anything. I think a more appropriate phrase might be linguistic and attention based profiling.

Attention Profiling:

If links (and link acquisition rate) are a sign of quality, then likely so are RSS subscribers and RSS readers, as well as brand related search queries, custom search engine entries, instant message mentions, email mentions, and repeat visitors. Those are a few examples of attention based profiling.
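
As a rough illustration of what attention based profiling might look like, here is a minimal sketch that folds a few of those signals into a single score. The signal names, weights, and normalization are purely illustrative assumptions on my part, not anything a search engine has published.

# Illustrative sketch only: combine several attention signals into one score.
# Signals and weights are invented for the example.
def attention_score(signals, weights):
    return sum(weights.get(name, 0.0) * value for name, value in signals.items())

site_signals = {
    "rss_subscribers": 3200,
    "brand_searches_per_month": 5400,
    "repeat_visitor_ratio": 0.42,
    "email_and_im_mentions": 150,
}
weights = {
    "rss_subscribers": 0.5,
    "brand_searches_per_month": 1.0,
    "repeat_visitor_ratio": 2000.0,
    "email_and_im_mentions": 2.0,
}
print(attention_score(site_signals, weights))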

Linguistic Profiling:

If you are the person that people are talking about then you are also going to help shape your topic's language. You may make up many of the new words used in your industry and your name may even be a core keyword in your industry.

You are not going to match the market's language better than your competitors by obsessing over keyword density. The way to beat them is to earn more market attention and work your business and name into the industry language.

View All Your Google Supplemental Index Results

[Update: use this supplemental ratio calculator. Google is selfish and greedy with their data, and broke ALL of the below listed methods because they wanted to make it hard for you to figure out which pages of your site they don't care for.]

A person going by the nickname DigitalAngle left the following tip in a recent comment:

If you want to view ONLY your supplemental results you can use this command site:www.yoursite.com *** -sljktf

Why Are Supplemental Results Important?

Pages in the supplemental index are placed there because they are trusted less. Since they are crawled less frequently and have fewer resources diverted toward them, it makes sense that Google does not typically rank these pages as high as pages in the regular search index.

Just as cache date can be used to gauge the relative health of a page or site, the percentage of your site stuck in the supplemental results, and the types of pages stuck there, can tell you a lot about information architecture and link equity related issues.

Calculate Your Supplemental Index Ratio:

To get your percentage of supplemental results, divide your number of supplemental results by your total result count:

site:www.yoursite.com *** -sljktf
site:www.yoursite.com
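
If you want to automate the arithmetic, a minimal sketch is below. You would paste in the two result counts Google reports for the queries above; the helper function is hypothetical, not an API call.

# Sketch: supplemental ratio from the two site: query result counts.
def supplemental_ratio(supplemental_count, total_count):
    if total_count == 0:
        return 0.0
    return supplemental_count / float(total_count)

# e.g. 1,800 results for site:www.yoursite.com *** -sljktf
# and 6,000 results for site:www.yoursite.com
print("%.0f%% supplemental" % (100 * supplemental_ratio(1800, 6000)))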

What Does My Supplemental Ratio Mean?

The size of the supplemental index and the pages included in it change as the web grows and Google changes their crawling priorities. It is a moving target, but one that still gives you a clue to the current relative health of your site.

If none of your pages are supplemental then likely you have good information architecture, and can put up many more profitable pages for your given link equity. If some of your pages are supplemental that might be fine as long as those are pages that duplicate other content and/or are generally of lower importance. If many of your key pages are supplemental you may need to look at improving your internal site architecture and/or marketing your site to improve your link equity.

Comparing the size of your site and your supplemental ratio to similar sites in your industry may give you a good grasp on the upside potential of fixing common information architecture related issues on your site, what sites are wasting significant potential, and how much more competitive your marketplace may get if competitors fix their sites.

Google Using Search Engine Scrapers to Improve Search Engine Relevancy

If something ranks and it shouldn't, why not come up with a natural and easy way to demote it? What if Google could come up with a way to allow scrapers to actually improve the quality of the search results? I think they can, and here is how. Non-authoritative content tends to get very few natural links. This means that if it ranks well for competitive queries where bots scrape the search results it will get many links with the exact same anchor text. Real resources that rank well will tend to get some number of self reinforcing unique links with DIFFERENT MIXED anchor text.

If a page ranks for a query because it is closely aligned with a keyword phrase that appears in the page title, in the internal link structure, and heavily on the page itself, then every scraper link repeating that same anchor text pushes the page closer to the threshold of looking spammy, especially if it is not picking up any natural linkage.
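
One rough way to check whether your own inbound links look scraper-heavy is to measure how concentrated your anchor text is. A minimal sketch, assuming you have already exported a list of anchor text strings from whatever link data you keep:

# Sketch: measure how concentrated a page's inbound anchor text is.
# A single exact phrase dominating the profile, with few unique variations,
# is the kind of pattern scraper links tend to create.
from collections import Counter

def anchor_concentration(anchors):
    counts = Counter(a.strip().lower() for a in anchors)
    total = sum(counts.values())
    top_phrase, top_count = counts.most_common(1)[0]
    return top_phrase, top_count / float(total), len(counts)

anchors = ["cheap blue widgets"] * 180 + ["Acme widget review", "widgets", "www.example.com"]
phrase, share, unique_anchors = anchor_concentration(anchors)
print(phrase, round(share, 2), unique_anchors)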

How to Protect Yourself:

  • If you tend to get featured on many scraper sites, make sure you change the page titles occasionally on your most important and highest paying pages (a simple title rotation sketch follows this list).

  • Write naturally, for humans, and not exclusively for search bots. If you are creating backfill content that leverages a domain's authority score, try to write articles like a newspaper; if you are not sure what that means, look at some newspapers. Rather than paying people to write articles optimized for a topic, pay someone who does not know much about SEO to write them, and make sure they do not use the same templates for the page titles, meta descriptions, and page headings.
  • Use variation in your headings, page titles, and meta description tags.
  • Filters are applied at different levels depending on domain authority and page level PageRank scores. Gaining more domain authority should help your site bypass some filters, but it may also cause your site to be looked at with more scrutiny by other types of filters.
  • Make elements of your site modular so you can quickly react to changes. For example, many of my sites use server side includes for the navigation, which allows me to make the navigation more or less aggressive depending on the current search algorithms. Get away with what you can, and if they clamp down on you ease off the position.
  • Get some editorial deep links with mixed anchor text to your most profitable or most important interior pages, especially if they rank well and do not get many natural editorial votes on their own.
  • Be actively involved in participating in your community. If the topical language changes without you then it is hard to stay relevant. If you have some input in how the market is changing that helps keep your mindshare and helps ensure you match your topical language as it shifts.
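
As promised in the first point above, here is a minimal sketch of rotating among a few hand-written title variants for an important page. The variants and the weekly rotation schedule are illustrative choices, not a recommendation of any particular cadence.

# Sketch: rotate among several hand-written title variants so scraper copies
# do not all repeat one exact string. Keying the choice to the ISO week keeps
# the title stable between crawls while still changing it over time.
import datetime

TITLE_VARIANTS = [
    "Blue Widgets - Reviews, Prices & Buying Guide",
    "Blue Widget Buying Guide: Reviews and Prices",
    "Compare Blue Widgets: Prices, Reviews, Advice",
]

def current_title(variants=TITLE_VARIANTS):
    week = datetime.date.today().isocalendar()[1]
    return variants[week % len(variants)]

print(current_title())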

New Directory, URL, & Keyword Phrase Based Google Filters & Penalties

WebmasterWorld has been running a series of threads about various penalties and filters aligned with specific URLs, keyword phrases, and in some cases maybe even entire directories.


There is a lot of noise in those threads, but you can put some pieces together from them. One of the best comments is from Joe Sinkwitz:

1. Phrase-based penalties & URL-based penalties; I'm seeing both.
2. On phrase-based penalties, I can look at the allinanchor: for that KW phrase, find several *.blogspot.com sites, run a copyscape on the site with the phrase-based penalty, and will see these same *.blogspot.com sites listed...scraping my and some of my competitors' content.
3. On URL-based penalties allinanchor: is useless because it seems to practically dump the entire site down to the dregs of the SERPs. Copyscape will still show a large amount of *.blogspot.com scraping though.

Joe has a similar post on his blog, and I covered a similar situation on September 1st of last year in Rotating Page Titles for Anchor Text Variation.

You see a lot more of the auto-generated spam in competitive verticals, and having a few sites that compete for those types of queries helps you see the new penalties, filters, and re-ranked results as they are rolled out.

Google Patents:

Google filed a patent application for Agent Rank, which is aimed at allowing them to associate portions of page content, site content, and cross-site content with individuals of varying degrees of trust. I doubt they have used this much yet, but the fact that they are even considering such a thing should indicate that many other types of penalties, filters, and re-ranking algorithms are already at play.

thegypsy has pointed out some Google patent applications related to phrases.

Bill Slawski has a great overview post touching on these patent applications.

Phrase Based Penalties:

Many types of automated and other low quality content creation produce pages that are barely semantically related to the topic's language, while other types of spam generation produce pages that are too heavily aligned with it. Real content tends to fall within a range of semantic coverage.

Cheap or automated content typically tends to look unnatural, especially when you move beyond comparing words to looking at related phrases.

If a document is too far off in either direction (not enough OR too many related phrases) it could be deemed as not relevant enough to rank, or a potential spam page. Once a document is flagged for one term it could also be flagged for other related terms. If enough pages from a site are flagged a section of the site or a whole site can be flagged for manual review.
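
To make the idea concrete, here is a very rough sketch of checking whether a document falls inside an expected band of related-phrase coverage. The phrase list and thresholds are invented for illustration; the actual patent applications describe far more involved scoring.

# Sketch: count how many "related phrases" for a topic appear in a document
# and flag pages that fall outside an expected band. Phrase list and
# thresholds are illustrative assumptions only.
RELATED_PHRASES = ["interest rate", "down payment", "credit score",
                   "closing costs", "fixed rate", "refinance", "lender"]

def phrase_coverage(text, phrases=RELATED_PHRASES):
    text = text.lower()
    return sum(1 for p in phrases if p in text)

def looks_unnatural(text, low=2, high=6):
    hits = phrase_coverage(text)
    return hits < low or hits > high  # too few OR suspiciously complete

sample = "Our lender offers a great interest rate and low closing costs on a fixed rate loan."
print(phrase_coverage(sample), looks_unnatural(sample))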

URL and Directory Based Penalties:

Would it make sense to prevent a spam page on a good domain from ranking for anything? Would it make sense for some penalties to be directory wide? Absolutely. Many types of cross site scripting errors and authority domain abuses (think rented advertisement folders or other ways to gain access to a trusted site) occur at a directory or subdomain level and have a common URL footprint. Cheaply produced content also tends to have section wide footprints, where only a few words change in the page titles across an entire section of a site.

I recently saw an exploit on the W3C. Many other types of automated, templated spam leave directory wide footprints, and as Google places more weight on authoritative domains they need to get better at filtering out abuse of that authority. Google would love to be able to penalize things in a specific subdomain or folder without having to nuke the entire domain, so in some cases they probably do, and these filters or penalties probably affect both new domains and more established authoritative domains.

How Do You Know When You Are Hit?

If you had a page which typically ranked well for a competitive keyword phrase and you see that page drop like a rock, you might have a problem. Another indication of trouble is an inferior page ranking where your more authoritative page ranked in the past. For example, let's say you have a single mother home loan page ranking for a query where your home loan page ranked, but no longer does.
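
A minimal sketch of flagging that pattern from your own rank tracking data follows. The data structure and numbers are hypothetical; you would fill it from whatever rank history you already keep.

# Sketch: flag queries where the page that used to rank was replaced by a
# weaker internal page, or where the position dropped sharply.
rank_history = {
    # query: (previous_url, previous_position, current_url, current_position)
    "home loans": ("/home-loans/", 4, "/single-mother-home-loans/", 38),
    "refinance rates": ("/refinance/", 6, "/refinance/", 7),
}

for query, (old_url, old_pos, new_url, new_pos) in rank_history.items():
    replaced = old_url != new_url
    big_drop = new_pos - old_pos >= 20
    if replaced or big_drop:
        print("possible penalty or filter on %r: %s (#%d) -> %s (#%d)"
              % (query, old_url, old_pos, new_url, new_pos))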

Textual Community:

Just as link profiles create communities, so do the type and variety of text on a page.

Search results tend to sample from a variety of interests. With any search query there are assumed common ideas that may be answered by a Google OneBox, by related phrase suggestions, or by the mixture of the types of sites shown in the organic search results. For example:

  • how do I _____

  • where do I buy a ____
  • what is like a _____
  • what is the history of ______
  • consumer warnings about ____
  • ______ reviews
  • ______ news
  • can I build a ___
  • etc etc etc

TheWhippinpost had a brilliant comment in a WMW thread:

  • The proximity, ie... the "distance", between each of those technical words, are most likely to be far closer together on the merchants page too (think product specification lists etc...).

  • Tutorial pages will have a higher incidence of "how" and "why" types of words and phrases.
  • Reviews will have more qualitative and experiential types of words ('... I found this to be robust and durable and was pleasantly surprised...').
  • Sales pages similarly have their own (obvious) characteristics.
  • Mass-generated spammy pages that rely on scraping and mashing-up content to avoid dupe filters whilst seeding in the all-important link-text (with "buy" words) etc... should, in theory, stand-out amongst the above, since the spam will likely draw from a mixture of all the above, in the wrong proportions.

Don't forget that Google Base recently changed to require certain fields so they can help further standardize that commercial language the same way they standardized search ads to have 95 characters. Google is also scanning millions of books to learn more about how we use language in different fields.

Historical Search Spam Patterns and Link Reciprocation

Some people are wildly speculating that Google and other engines may create historical databases of SEOs and site relationships to identify spam. I have no doubt that some sites that go way too far stay penalized for a long time, and that some penalties may flag related sites for review, but I think search engines have enough data and most people leave enough footprints that search engines do not have to dig too deep into history to connect the dots. And there is little upside in them connecting the dots.

If they did connect the dots manually it would take a long time to do broadly, and if they did it automatically they would run into problems with false relationships. Some sites I once owned were sold to people who do not use them to spam. If ownership relationships took sites out by proxy, I could just create spam sites using a competitor's details in the Whois data, or heavily link to their sites from the spam sites.

Where people run into problems with spamming is scalability. If you scale out owning many similar domains you are probably going to leave some sort of footprint: cross linking, affiliate ID codes, AdSense account numbers, analytics tracking scripts, a weird page code, similar site size, similar inlink or outlink ratios, similar page size, or maybe some other footprint that you forgot to think of.
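
Here is a hedged sketch of the kind of footprint check you could run across your own network (or that an engine could run at scale), looking only at two of the signals above: shared AdSense publisher IDs and shared Google Analytics account IDs. The regular expressions match the publicly visible ID formats, and the page HTML is assumed to already be fetched.

# Sketch: group already-fetched pages by shared AdSense publisher IDs and
# Google Analytics account IDs, two common cross-site footprints.
import re
from collections import defaultdict

ADSENSE_RE = re.compile(r"pub-\d{10,16}")
ANALYTICS_RE = re.compile(r"UA-\d{4,10}-\d{1,3}")

def shared_footprints(pages):  # pages: {domain: html_text}
    by_id = defaultdict(set)
    for domain, html in pages.items():
        for match in ADSENSE_RE.findall(html) + ANALYTICS_RE.findall(html):
            by_id[match].add(domain)
    return {fid: domains for fid, domains in by_id.items() if len(domains) > 1}

pages = {
    "site-a.com": "google_ad_client = 'pub-1234567890123456';",
    "site-b.com": "google_ad_client = 'pub-1234567890123456';",
    "site-c.com": "no shared footprint here",
}
print(shared_footprints(pages))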

Many of those things can be spoofed too (what is to prevent me from using your AdSense ID on spam pages?), so in many cases there has to be a hybrid of automated filtering, flagging, and manual review.

And even if you are pretty good at keeping your sites unique on your end, if you outsource anything the provider is going to have a limited network size and likely a routine procedure with footprints, and if their prices are low they are probably going to be forced to create many obvious footprints to stay profitable. And if you use reciprocal or triangular links associated with those large distributed link farms, that puts you in those communities far more than some potential historical relationship would. By linking to it you confirm the relationship.

Search engines do not want to ban false positives, so many spammy link related penalties just suppress rankings until the signs of spam go away. Remove the outbound reciprocal link page that associates you with a bad community, get a few quality links, and watch the rankings shoot up. The thing is, once a site gets to be fairly aggressively spammy it rarely becomes less spammy. If it was created without passion it likely dies and then turns into a PPC domainer page with footprints. Hiding low value pages deep in the index until the problem goes away is a fairly safe approach for search engineers, because after a domain has been burned it rarely shifts toward quality unless someone else buys it.

Hidden Content Costs & Understanding Your Profit Potential Per Page

Ever since Google got more selective about what they will index, the model for profitable SEO has changed from throwing up pages and hoping some of them are profitable to putting more strategy into what you are willing to publish.

The Supplemental Index Hates Parasitic SEO:

Each site will only get so many pages indexed given a certain link authority. And each of those pages will rank based on the domain's authority score, and the authority of the individual page, but each page needs a minimum authority score to get indexed and stay out of the supplemental results - this is how Google is trying to fight off parasitic SEO.

Given that many people are leveraging trusted domains, if you have one it makes sense to leverage it thoughtfully. CNN will rank for a lot of queries, but it does not make sense for Google to return nothing but CNN. It is good for the health of Google to have some variety in their search results. This is why smaller sites can still compete with the bigger ones: Google needs the smaller sites to have variety and to have leverage over the larger sites, to keep the larger sites honest if they are too aggressive in leveraging their authority or have holes that others are exploiting.

Extending a Profitable Website:

If you have a 100 page niche website you may be able to expand it out to 500 pages without seeing too much of a drop in revenue on those first 100 pages, but eventually you will see some drop off, where the cost of additional content (via the link authority it pulls from other pages on your site) nearly matches the revenue potential of the new pages. At some point you may even see revenues drop as you add more pages, especially if you are not doing good keyword research, have bad information architecture, create pages that compete with other pages on your site, are not actively participating in your market (gaining links and mindshare), or are expanding from a higher margin keyword set to a lower margin one.

The solution is to build editorial linkage data and stop adding pages unless they have net positive profit potential.
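
To make that trade-off concrete, here is a toy model with invented numbers. The revenue and dilution figures are pure assumptions; the only point is that each new page has both an expected revenue and a link equity cost to the rest of the site.

# Toy model: only add a page if its expected revenue beats the revenue the
# rest of the site is expected to lose by diluting link equity toward it.
# All numbers are invented for illustration.
def worth_publishing(expected_page_revenue, equity_cost_share, current_site_revenue):
    dilution_cost = equity_cost_share * current_site_revenue
    return expected_page_revenue > dilution_cost

print(worth_publishing(40.0, 0.002, 10000.0))  # True: $40 vs ~$20 of dilution
print(worth_publishing(5.0, 0.002, 10000.0))   # False: $5 vs ~$20 of dilution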

What are the costs of content?

  • the time and money that went into creating it

  • link equity (and the potential to be indexed) that the page takes from other pages
  • the mindshare and effort that could have been used doing something potentially more productive
  • the time it takes to maintain the content
  • if it is bad or off topic content, anything that causes people to unsubscribe, hurts conversion rates, or lowers your perceived value is a cost

How can a Page Create Profit?

  • anything that leads people toward telling others about you (links or other word of mouth marketing) is a form of profit

  • anything that makes more people pay attention to you or boosts the credibility of your site is a form of profit
  • anything that thickens your margins, increases conversion rates, or increases lifetime value of a customer creates profit
  • anything that reduces the amount of bad customers you have to deal with is a form of profit

Mixing Up Quality for Profit Potential:

I am still a firm believer in creating content of various quality levels and cost levels, using the authoritative content to get the lower quality content indexed, and using the lower quality content earnings to finance the higher quality ideas, but rather than thinking of each page as another chance to profit it helps to weigh the risks and rewards when mapping out a site and site structure.

Increasing Profit:

Rather than covering many fields broadly consider going deeper into the most profitable areas by

  • creating more pages in the expensive niches

  • making articles about the most profitable topics semantically correct with lots of variation and rich unique content
  • highly representing the most valuable content in your navigational scheme and internal link structure
  • creating self reinforcing authority pages in the most profitable verticals
  • requesting visitors add content to the most valuable sections or give you feedback on what content ideas they would like to see covered in your most valuable sections
  • if your site has more authority than you know what to do with, consider adding a user generated content area to your site

Take Out the Trash:

If Google is only indexing a portion of your site make sure you make it easy for them to index your most important content. If you have an under-performing section on your site consider:

  • deweighting its integration in the site's navigational scheme and link structure

  • placing more internal and external link weight on the higher performing sections
  • if it does not have much profit potential and nobody is linking to it, you may want to temporarily block Googlebot from crawling that section using robots.txt (a sample exclusion follows this list), or remove the weak content until you have more link authority and/or a better way to monetize it
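
For example, a robots.txt exclusion along these lines keeps Googlebot out of a weak section while you build more link authority (the directory name is just a placeholder):

User-agent: Googlebot
Disallow: /weak-section/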

User Generated Tags Are Useless Noise

A current dumb, but popular, trend is to get users to tag pages.

How valuable is a Technorati tag page to a Google user? Probably just about worthless, IMHO. The only reason those pages exist is that they give bloggers crumbs of exposure in exchange for their link equity, and they give Technorati a way to build authority and pass an automated scraper off as real content. Other large sites have started following this tag example, and allow users to apply non-descriptive labels like 2000, hip, and cool to their content. As if this tag noise were not bad enough for people trying to look past the clutter and actually find something, some of these sites use the tags to create additional content pages.

What are these pages? They are a perfect example of low information quality. Some dumb content management systems and blog plug-ins take the noise one step further by cross referencing the tags, producing a virtually infinite set of tag pages that keep generating more cross referenced tag pages until search engines get sick of wasting their bandwidth and your link equity indexing the low value garbage.

A set of loosely defined tag pages is no better than a low quality search result page. Search engines decided long ago that they generally do not want to index other search engines' result pages. When too many of their own results are these noisy tag pages, eventually they are going to turn against tags...maybe not via any official statements, but some sites will just not rank as well.

Search engines react to the noise in the marketplace, then the marketplace creates new types of noise to pollute the SERPs, and the cycle repeats. Tags are noise, and they will have their day.

Why would you want to let users outside of your business interests control your information architecture and internal link structure when Google is getting picky about what they are willing to index? Why waste your link equity and bandwidth?

Wasting Link Authority on Ineffective Internal Link Structure

If you put one form of internal navigation in parallel with another you are essentially telling search engines that both paths and both subset pages are of the same significance. Many websites likely lose 20% or more of their potential traffic due to sloppy information architecture that does not consider search engines.

Many people believe that having more pages is always better, but ever since Google got more aggressive with duplicate content filters and started using minimum PageRank thresholds to set index inclusion priorities, that couldn't be further from the truth. Shoemoney increased his Google search traffic 1400% this past month by PREVENTING some of his pages from being indexed. Some types of filtering are good for humans while being wasteful for search engines. For example, some people may like to sort through products by price level or look at different sizes and colors, but pages that differ only in price point, size, model number, or item color create near duplicate content that search engines do not want to index.

If you waste link equity getting low value, noisy pages indexed, your high value pages will not rank as well as they could. In some cases getting many noisy navigational pages indexed could put your site on a reduced crawling status (a shallower or less frequent crawl) that may prevent some of your higher value, long tail, brand specific pages from getting indexed.
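
To see how this plays out, here is a rough sketch of a simplified PageRank-style calculation over a tiny internal link graph. The graph, page names, damping factor, and iteration count are illustrative; the point is only that equity spread across noisy tag-style pages is equity not flowing to the money pages.

# Sketch: simplified PageRank over a toy internal link graph, showing how
# linking content pages to near-duplicate tag pages spreads equity away from
# the pages you actually want to rank. Graph and names are illustrative.
def simple_pagerank(graph, damping=0.85, iterations=50):
    ranks = {page: 1.0 / len(graph) for page in graph}
    for _ in range(iterations):
        new_ranks = {}
        for page in graph:
            incoming = sum(ranks[p] / len(graph[p]) for p in graph if page in graph[p])
            new_ranks[page] = (1 - damping) / len(graph) + damping * incoming
        ranks = new_ranks
    return ranks

# a toy site where the category and product pages also link out to two tag pages
graph = {
    "home": ["category", "tag-1", "tag-2"],
    "category": ["home", "product", "tag-1", "tag-2"],
    "product": ["home", "category", "tag-1", "tag-2"],
    "tag-1": ["home"],
    "tag-2": ["home"],
}
for page, rank in sorted(simple_pagerank(graph).items(), key=lambda x: -x[1]):
    print(page, round(rank, 3))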

More commonly, searches that include some sort of filter are associated with specific brands rather than with how we sort through those brands via price points. Plus the ads for those terms tend to be more expensive as well.

The reason brands exist is that they are points of differentiation that allow us to charge non commodity prices for commodities. That associated profit margin and marketing driven demand is why there is typically so much more money in branded terms than in other non-brand related filters.

When designing your site's internal link structure make sure that you are not placing noisy low value pages and paths in parallel or above higher value paths and pages.

Search Personalization Will Not Kill SEO

Many people are syndicating the story that search personalization will kill SEO. Nothing could be further from the truth. Each time search engines add variables to their ranking algorithms they create opportunity. Plus as the field gets muddier those who understand how communities interact with search will have more relative influence over the marketplace. Quality SEO is not based on using a rank checker for arbitrary terms and cranking out meaningless ranking reports. It is based on measuring traffic streams and conversions. If the search engines are sending you more leads or better converting leads then your SEO is working.

Customers worth having don't care about ranking reports. They care about conversions. Other than to create a straw man scenario for self promotional stories, why have rank checkers suddenly become important?

Collateral Damage in Killing GoogleBombs
