Increasing SEO Complexity Lowers Result Diversity

Changing the Cost-benefit Analysis

In the last post I mentioned how the US government tried to change the cost-benefit analysis for some sleazy executives at pharmaceutical corporations, which continue to operate as criminal enterprises that simply view repeated fines as a calculable cost of doing business.

If you think about what Google's Panda update did, it largely changed the cost-benefit analysis of many online publishing business models. Some will be frozen with fear, others will desperately throw money at folks who may or may not have solutions, while others who gained will buy additional marketshare for pennies on the dollar.

"We actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side." - Matt Cutts

Now that Google is picking winners and losers the gap between winners & losers rapidly grows as the winners reinvest.

And that word invest is key to understanding the ecosystem.

Beware of Scrapers

To those who are not yet successful with search, the idea of investing heavily in a strategy becomes a bit more risky when you see companies like Demand Media spend hundreds of millions of dollars growing an empire, only to watch 40% of the market value evaporate in a couple weeks due to a single Google update. There are literally thousands of webmasters furiously filing DMCA reports with Google after Panda, because Google decided that content quality was fine when it appeared on a scraper site, but the exact same content lacked quality on the original source site.

And even some sites that were not hit by Panda (even some which have thousands of inbound links) are still getting outranked by mirroring scrapers. Geordie spent hours sharing tips on how to boost lifetime customer value. For his efforts, Google decided to rank a couple scrapers as the original source & filter out PPCBlog as duplicate content, in spite of one of the scrapers even linking to the source site.

Outstanding work Google! Killer algo :D

Even if the thinking is misguided or the headline is out of context, Reuters articles like Is SEO DOA as a core marketing strategy? do nothing to build confidence in making large investments in the search channel. Which only further aids people trying to do it on the cheap. Which gets harder to do as SEO grows more complex. Which only further aids the market-for-lemons effect.

Market Domination

At the opposite end of the spectrum, there are currently some search results which look like this:

All of the colored boxes are the same company. You need quite a large monitor to get any level of result diversity above the fold. The company that was on the right side of the classifier can keep investing to build a nearly impenetrable moat, while others who fell back will have a hard time justifying the investment. Who wants to scale up on costs while revenues are down & the odds of success are lower? Few will. But the company with the top 3 (or top 6) results is collecting the data, refining their pitch, and re-investing into locking down the market.

Much like the Gini coefficient shows increasing wealth consolidation in the United States, search results where winners and losers are chosen by search engines create a divide where doing x will be very profitable for company A, while doing the exact same thing will be a sure money loser for company B.

Thin Arbitrary Lines in the Sand

The lines between optimization & spam blur as some trusted sites are able to rank a doorway page or a recycled tweet. Once site owners know they are trusted, you can count on them green-lighting endless content production.

Scraping the Scrape of the Scrape

Many mainstream media websites have topics subdomains where they use services like DayLife or Truveo to auto-generate a near endless number of "content pages." To appreciate how circular it all is, consider the following:

  • a reporter makes a minimally informative Tweet
  • Huffington Post scrapes that 3rd party Tweet and ranks it as a page
  • I write a blog post about how outrageous that Huffington Post "page" was
  • SFGate.com has an auto-generated "Huffington Post" topics page (topics.sfgate.com/topics/The_Huffington_Post) which highlighted my blog post
  • some of the newspaper scraper pages rank in the search results for keywords
  • sites like Mahalo scrape the scrape of the scrape
  • etc.

At some point I am pretty certain some of these loops start feeding back into themselves & create a near-infinite cycle :D

An Endless Sea of "Trustworthy" Content

The OPA mentioned a billion dollar shift in revenues which favors large newspapers. But those "pure" old-school media sites now use services like DayLife or Truveo to auto-generate content pages. And it is fine when they do it.

...but...

The newspapers call others scammy agents of piracy and copyright violators for doing far less at a lower scale, all while wanting to still be ranked highly (even as they put their own original content behind paywalls), and then they go out and do the exact same scraping they complain about others doing. It is the tragedy of the commons played out on an infinite web, where the cost of an additional page is under a cent & everyone is farming for attention.

And the piece of pie everyone is farming for is shrinking as:

Brands Becoming the Media

Rather than subsidizing the media with ads, brands are becoming the media:

Aware that consumers spend someplace between eight and 10 hours researching cars before they contact a dealer, auto makers and dealers are vectoring ever-greater portions of their marketing budgets into intercepting consumers online.

As but one example, Ford is so keen about capturing online tire-kickers that its website gives side-by-side comparisons between its Fiesta and competing brands. While you are on the Ford site, you can price the car of your dreams, investigate financing options, estimate your payment, view local dealer inventories and request a quote from a dealer.

Search Ads Replacing the Organic Search Results

AdWords is eating up more of the value chain by pushing big brands

  • comparison ads = the same brands that were in AdWords appearing again
  • bigger AdWords ads with more extensions = less diversity above the fold
  • additional AdWords ad formats (like product ads) = less diversity (most of the advertisers who first tried it were big box stores, and since it is priced on a CPA profit-share basis, the biggest brands, which typically have more pricing power with manufacturers, win)

Other search services like Ask.com and Yahoo! Search are even more aggressive with nepotistic self promotion.

Small Businesses Walking a Tightrope (or, the Plank)

Not only are big brands being propped up with larger ad units (and algorithmically promoted in the organic search results) but the unstable nature of Google's results further favors big business at the expense of small businesses via the following:

  • more verticals & more ad formats = show the same sources multiple times over
  • less stability = more opportunities for spammers (they typically have high margins & lots of test projects in the works... when one site drops, another is ready to pop into the game... really easy for scrapers to do... just grab content & wait for the original source to be penalized, or scrape from a source which is already penalized)
  • less stability = small businesses have to fire employees when it gets hard to make payroll
  • less stability = lowers multiples on site sales, making it easier for folks like WebMD, Quinstreet, BankRate, and Monster.com to buy out secondary & tertiary competing sites

If you are a small business primarily driven by organic search you either need to have a big brand, a big ego, big balls, or a lack of common sense to stay in the market in the years to come, as the market keeps getting consolidated. ;)

Google Wants to Act Like a Start Up

I just saw this Google snippet while trying to find one of our old posts and it was *so* awful that I had to share it.

This is an area where Bing was out in front of Google, having used a more refined strategy for years before Google started playing catch-up last fall.

Google ignored our page title, ignored our on-page header, and then used the 'comments' count as the lead of the clickable link, following it with the site's homepage title. The problem here is that if the eye is scanning the results for a discriminating factor to re-locate a vital piece of information, there is none; nothing memorable stands out. Luckily we are not using breadcrumbs & that post at least had a somewhat memorable page URL, otherwise I would not have been able to find it.

For what it is worth, the search I was doing didn't have the word 'comments' in it & Google just flat out missed on this one. Given that some huge percentage of the web's pages have the word "comments" on them (judging by the number of search results returned, "comments" is about 1/6th as popular online as the word "the"), one might think that they could have programmed their page-title modification feature to never select 'comments' as the lead.
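
That sort of filter is trivial to sketch. Here is a hypothetical version, purely for illustration (the word list, candidate format & function are inventions of mine, not anything Google has documented):

# Hypothetical sketch of the fix suggested above: when picking an
# alternate lead for a result title, skip candidates that open with
# ultra-common boilerplate words.
BOILERPLATE_LEADS = {"comments", "home", "share", "login", "menu"}

def pick_title_lead(candidates, fallback_title):
    for text in candidates:
        words = text.split()
        if words and words[0].lower().strip(".,:()") not in BOILERPLATE_LEADS:
            return text
    return fallback_title  # e.g. the page's own title tag

# pick_title_lead(["Comments (12)", "How Panda Works"], "SEO Book")
# returns "How Panda Works"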

Google has also sometimes been using link anchor text with this new feature, so it may be a brutal way to Google-bomb someone. It sure will be fun when the political bloggers give it a play. ;)

But just like the relevancy algorithms these days, it seems like this is one more feature where Google ships & then leaves it up to the SEOs to tell them what they did wrong. ;)

Google Throws the Book at Competitors

You can learn a lot about how search has improved over the years by reading Matt Cutts. Recently he highlighted how search was irrelevant in the past due to a lack of diversity:

Seven of the top 10 results all came from one domain, and the urls look a little… well, let’s say fishy. In 1999 and early 2000, search engines would often return 50 results from the same domain in the search results. One nice change that Google introduced in February 2000 was “host crowding,” which only showed two results from each hostname. ... Suddenly, Google’s search results were much cleaner and more diverse! It was a really nice win–we even got email fan letters.

Thanks to those kinds of improvements, in 2011 we never have to look at search results like this.*

* And by never, I mean, unless the results are linking to fraternal Google pages, in which case, game on!

Why should Google result crowding not apply to Google.com? Sure they can say those books are from different authors, but many websites are run by organizations with multiple authors. Some websites are even built through partnerships between multiple different business organizations. Who knows, maybe some searchers are uncomfortable with every other listing being an out-of-context book highlight.
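
For reference, the host-crowding rule Cutts describes is simple enough to sketch in a few lines (a toy illustration; the list-of-URLs input format is my assumption):

# A toy sketch of "host crowding": show at most two results
# from any one hostname.
from urllib.parse import urlparse

def host_crowd(result_urls, per_host=2):
    counts, kept = {}, []
    for url in result_urls:
        host = urlparse(url).hostname
        if counts.get(host, 0) < per_host:
            kept.append(url)
            counts[host] = counts.get(host, 0) + 1
    return kept

# Seven listings from one domain collapse to two:
# host_crowd(["http://a.com/%d" % i for i in range(7)] + ["http://b.com/"])

By that standard, five book listings in one result set should never survive the filter.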

In the past I have been called cynical for highlighting stuff like the following image

I saw it as part of a trend toward home cooking promotions. And I still view it that way. The above books promotion is simply further proof of concept.

Outside of...

  • Youtube
  • other Google owned and operated sites
  • a branded website ranking for its own brand

Can you show me *any* occurrence of a result where a site is listed 5 times in the search results? Bonus points if you can find one where the 5 listings are not grouped into 1 bunch via result crowding.

Other than a home cooking override, how is it possible that this problem, fixed years ago, suddenly re-appears?

As a thought experiment, ask yourself if that Google ranking accident would happen if the content archive being served up was promoting media hosted on Microsoft servers.

A friend of mine summed it up nicely with:

well, it's not everyday you see that kind of power and the fact that other sites aren't afforded the same opportunity makes me think that they are being anti-competitive. Google literally wrote the book (ok scraped it) on anti-competitive practices.

Google Panda Coming to a Market Near You

If you live outside the United States and were unscathed by the Panda Update, a world of hurt may await soon. Or you may be in for a pleasant surprise. It is hard to say where the chips may fall for you without looking.

Some people just had their businesses destroyed, whereas the Online Publishers Association sees a $1 billion windfall to the winning publishers.

Due to Google having multiple algorithms running right now, you can get a peek at the types of sites that were hit, and if your site is in English you can see if it would have been hit by comparing your Google.com rankings in the United States against those in foreign markets using the Google AdWords ad preview tool.

In most foreign markets Google is not likely to be as aggressive with this type of algorithm as they are in the United States (because foreign ad markets are less liquid and there is less of a critical mass of content in some foreign markets), but I would be willing to bet that Google will be pretty aggressive with it in the UK when it rolls out.

The keywords where you will see the most significant ranking changes will be those where there is a lot of competition, as keywords with less competition generally do not have as many sites ready to replace those that get whacked (since fewer people were competing for the keyword). Another way to get a glimpse of the aggregate data is to look at your Google Analytics search traffic from the US and see how it has changed relative to seasonal norms. Here is a "look out below" example, highlighting how Google traffic dropped. ;)
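
If you want to run that comparison yourself, here is a rough sketch (assuming a daily CSV export from Google Analytics with date & visits columns; the file and column names are mine, not a Google API):

# Rough sketch of the seasonal-norm check described above: compare
# 2011 US Google organic visits to the average of prior years,
# week by week.
import pandas as pd

df = pd.read_csv("us_google_organic.csv", parse_dates=["date"])
year = df["date"].dt.year
week = df["date"].dt.isocalendar().week
weekly = df.groupby([year, week])["visits"].sum()
weekly.index.names = ["year", "week"]

baseline = weekly.drop(2011, level="year").groupby(level="week").mean()
deviation = weekly.loc[2011] / baseline - 1  # negative = below seasonal norm
print(deviation.round(2))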

What is worse is that on most impacted sites revenue declined faster than traffic, because search traffic monetizes so well & the US ad market is so much deeper than most foreign markets. Thus a site that had 50% profit margins might have just gone to break-even or losing money after this update. :D

When Google updates the US content farmer algorithm again (likely soon, since it has already been over a month since the update happened) it will likely roll out to other large global markets, because Google does not like running (and maintaining) 2 sets of ranking algorithms for an extended period of time: it is more cost intensive and it helps people reverse engineer the algorithm.

Some sites that get hit may be able to quickly bounce back *if* they own a well-read tech blog and have an appropriate in with Google engineers, however most will not unless they drastically change their strategy. Almost nobody has recovered and it has been over a month since the algorithm went live. So your best bet is to plan ahead. When the tide goes out you don't want to be swimming naked. :)

Google Shows True Colors With BeatThatQuote Spam

Guidelines are pushed as though they are commandments from a religious tome, but they are indeed a set of arbitrary devices used to hold down those who don't have an in with Google.

When Google nuked BeatThatQuote I guessed that the slap on the wrist would last a month & give BTQ time to clean up their mess.

As it turns out, I was wrong on both counts.

Beat That Quote is already ranking again. They rank better than ever & after only 2 weeks!

And the spam clean up? Google did NOTHING of the sort.

Every single example (of Google spamming Google) that was highlighted is still live.

Now Google can claim they handled the spam on their end / discounted it behind the scenes, but such claims fall short when compared to the standards Google holds other companies to.

  • Most sites that get manually whacked for link-based penalties are penalized for much longer than 2 weeks.
  • Remember the brand damage Google did to companies like JC Penney & Overstock.com by talking to the press about those penalties? In spite of THOUSANDS of media outlets writing about Google's BTQ acquisition, The Register was the most mainstream publication discussing Google's penalization of BeatThatQuote, and there were no quotes from Google in it.
  • When asking for forgiveness for such moral violations, you are supposed to grovel before Google, admitting all past sins & conceding their omniscient ability to know everything. This can lead one to over-react and actually make things even worse than the penalty itself!
  • In an attempt to clean up their spam penalties (or at least to show they were making an effort) JC Penney sent a bulk email to sites linking to them, stating that the links were unauthorized and asking that they be removed. So JC Penney not only had to spend effort dropping any ill-gotten link equity, but also lost tons of organic links in the process.

Time to coin a new SEO phrase: token penalty.

token penalty: an arbitrary short-term editorial action by Google to deflect public relations blowback that could ultimately lead to review of anti-competitive monopolistic behaviors by a search engine with monopoly marketshare which doesn't bother to follow its own guidelines.

Your faith in your favorite politician should be challenged after you see him out on the town snorting coke and renting hookers. The same is true for Googlers preaching their guidelines as though they were law while Google is out buying links (and the sites that buy them).

You won't read about this in the mainstream press because they are scared of Google's monopolistic business practices. Luckily there are blogs. And Cyndi Lauper. ;)

Update: after reading this blog post, Google engineers once again penalized BeatThatQuote!

Google's Cat & Mouse SEO Game

This infographic highlights how Google's cat and mouse approach to SEO has evolved over the past decade.

One of the best ways to understand where Google is headed is to look at where they have been and how they have changed.

Click on it for the ginormous version.

Google's Collateral Damage Infographic.

If you would like us to make more of them then please spread this one. We listen to the market & invest in what it values ;)

Feel free to leave comments below if you have any suggestions or feedback on it :)

How Google Destroyed the Value of Google Site Search

Do You Really Want That Indexed?

On-demand indexing was a great value-added feature for Google Site Search, but now it carries more risks than ever. Why? Google decides how many documents make it into their primary index. And if too many of your documents are arbitrarily considered "low quality" then you get hit with a sitewide penalty. You did nothing but decide to trust Google & use Google products. In response Google goes out of its way to destroy your business. Awesome!

Keep in mind that Google was directly responsible for the creation of AdSense farms. And rather than addressing them head on, Google had to roll everything through an arbitrary algorithmic approach.

<meta name="googlebot" content="noindex" />

Part of the prescribed solution to the Panda Update is to noindex content that Google deems to be of low quality. But if you are telling Googlebot to noindex some of your content while also using Google for site search, you destroy the usability of their site search feature by making that content effectively invisible to your customers. For Google Site Search customers this algorithmic change is even more value-destructive than the arbitrary price jack Google Site Search recently did.
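
To make the conflict concrete, here is a minimal sketch, under the (undocumented, see below) assumption that Google Site Search crawls with the same googlebot user-agent as web search:

# Minimal sketch of the conflict, assuming Google Site Search shares
# the googlebot user-agent with web search (Google does not document
# this either way).
def robots_meta(page_deemed_low_quality):
    if page_deemed_low_quality:
        # Pulls the page out of Google web search (the prescribed Panda
        # fix), but if site search uses the same crawler, the page also
        # vanishes from your own on-site search results.
        return '<meta name="googlebot" content="noindex" />'
    return '<meta name="robots" content="index, follow" />'

If that assumption holds, there is no value of the tag that removes a page from Google's web index while keeping it visible to your own Google-powered site search.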

We currently use Google Site Search on our site here, but given Google's arbitrary switcheroo-styled stuff, I would be the first person to dump it if they hit our site with their stupid "low quality" stuff that somehow missed eHow & sites which wrap repurposed tweets in a page. :D

Cloaking vs rel=noindex, rel=canonical, etc. etc. etc.

Google tells us that cloaking is bad & that we should build our sites for users instead of search engines, but now Google's algorithms are so complex that you literally have to break some of Google's products to be able to work with other Google products. How stupid! But a healthy reminder for those considering deeply integrating Google into their on-site customer experience. Who knows when their model will arbitrarily change again? But we do know that when it does they won't warn partners in advance. ;)

I could be wrong in the above, but if I am, it is not easy to find any helpful Google documentation. There is no site-search bot on their list of crawlers, questions about whether they share the same user agent have gone unanswered, and even a blog post like this probably won't get a response.

That is only one more layer of hypocrisy: Google states that if you don't provide great customer service then your business is awful, while going to the dentist is more fun than trying to get any customer service from Google. :D

I was talking to a friend about this stuff and I think he summed it up perfectly: "The layers of complexity make everyone a spammer since they ultimately conflict, giving them the ability to boot anyone at will."

Is the Huffington Post Google's Favorite Content Farm?

I was looking for information about the nuclear reactor issue in Japan and am glad it did not turn out as bad as it first looked!

But in that process of searching for information I kept stumbling into garbage hollow websites. I was cautious not to click on the malware results, but of the mainstream sites covering the issue, one of the most flagrant efforts was from the Huffington Post.

AOL recently announced that they were firing 15% to 20% of their staff. No need for original stories or even staff writers when you can literally grab a third-party tweet, wrap it in your site design, and rank it in Google. In line with that spirit, I took a screenshot. Rather than calling it the Huffington Post I decided a more fitting title would be plundering host. :D

plundering host.

We were told that the content farm update was to get rid of low-quality web pages & yet that information-less page was ranking at the top of Google's search results, when it was nothing but a 3rd party tweet wrapped in brand and ads.

How does Huffington Post get away with that?

You can imagine in a hyperspace a bunch of points, some points are red, some points are green, and in others there’s some mixture. Your job is to find a plane which says that most things on this side of the plane are red, and most of the things on that side of the plane are the opposite of red. - Google's Amit Singhal
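
That "plane" is just a linear classifier. A toy sketch of the idea (the features, labels & model here are invented for illustration; this is obviously not Google's actual system):

# Toy illustration of the separating plane Singhal describes, with
# two invented page features: [original-content ratio, ad density].
import numpy as np
from sklearn.svm import LinearSVC

X = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.1],   # the "IRS / Wikipedia / NYT" side
              [0.1, 0.8], [0.2, 0.9], [0.1, 0.7]])  # the "low-quality" side
y = np.array([1, 1, 1, 0, 0, 0])                    # the two colors of points

clf = LinearSVC().fit(X, y)       # find a plane w.x + b = 0 between them
print(clf.coef_, clf.intercept_)
print(clf.predict([[0.5, 0.5]]))  # a page "in the mixture" still lands on one side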

If you make it past Google's arbitrary line in the sand there is no limit to how much spamming and jamming you can do.

we actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side. - Matt Cutts

(G)arbitrage never really goes away, it just becomes more corporate.

The problem with Google arbitrarily picking winners and losers is the winners will mass produce doorway pages. With much of the competition (including many of the original content creators) removed from the search results, this sort of activity is simply printing money.

As bad as that sounds, it is actually even worse than that. Today Google Alerts showed our brand being mentioned on a group-piracy website built around a subscription model of selling 3rd party content without permission! As annoying as that feels, of course there are going to be some dirtbags along the way that you have to deal with from time to time. But now that the content farm update has gone through, some of the original content producers are no longer ranking for their own titles, whereas piracy sites that stole their content are now the canonical top-ranked sources!

Google never used to put piracy sites on the first page of results for my books, this is a new feature on their part, and I think it goes a long way to show that their problem is cultural rather than technical. Google seems to have reached the conclusion that since many of their users are looking for pirated eBooks, quality search results means providing them with the best directory of copyright infringements available. And since Google streamlined their DMCA process with online forms, I couldn’t discover a method of telling them to remove a result like this from their search results, though I tried anyway.
... I feel like the guy who was walking across the street when Google dropped a 1000 pound bomb to take out a cockroach - Morris Rosenthal

Way to go Google! +1 +1

Too clever by half.

Google's Matt Cutts Talks Down Keyword Domain Names

I have long documented Google's preference toward brands, while Google has always stated that they don't really think about brands.

While not thinking of brands, someone on the Google UI team later added navigational aids to the search results promoting popular brands - highlighting the list of brands with the label "brands" before the list of links.

Take a look at what Matt Cutts shares in the following video, where he tries to compare brand domain names vs keyword domain names. He highlights brand over and over again, and then when he talks about exact match domains getting a bonus or benefit, he highlights that Google may well dial that down soon.

Now if you are still on the fence, let me just give you a bit of color: we have looked at the rankings and the weights that we give to keyword domains, & some people have complained that we are giving a little too much weight for keywords in domains. So we have been thinking about adjusting that mix a bit and sort of turning the knob down within the algorithm, so that given 2 different domains it wouldn't necessarily help you as much to have a domain name with a bunch of keywords in it. - Matt Cutts

For years the Google algorithm moved in one direction, and that was placing increased emphasis on brand and domain authority. That created the content farm problem, but with the content farm update they figured out how to dial down a lot of junk hollow authority sites. They were able to replace "on-topic-ness" with "good-ness," according to the search quality engineer who goes by the nickname moultano. As part of that content farm update, they dialed up brands to the point where now doorway pages are ranking well (so long as they are hosted on brand websites).

Google keeps creating more signals from social media and how people interact with the search results. A lot of those types of signals are going to end up favoring established brands which have large labor forces & offline marketing + distribution channels. Google owns about 97% of the mobile search market, so more and more of that signal will eventually end up bleeding into the online world.

In addition to learning from the firehose of mobile search data, Google is also talking about selling hotel ads on a price-per-booking basis. Google can get a taste of any transaction simply by offering free traffic in exchange for the data needed to make a marketplace & then requiring access to the best deals & discounts:

It is believed that Google requires participating hotels to provide Google Maps with the lowest publicly available rates, for stays of one to seven nights, double occupancy, with arrival days up to 90 days ahead.

In a world where Google has business volume data, clientele demographics, pricing data, and customer satisfaction data for most offline businesses they don't really need to place too much weight on links or domain names. Businesses can be seen as being great simply by being great.*

(*and encouraging people to stuff the ballot box for them with discounts :D)

Classical SEO signals (on-page optimization, link anchor text, domain names, etc.) have value up to a point, but if Google is going to keep mixing in more and more signals from other data sources then the value of any single signal drops. I haven't bought any great domain names in a while, and with Google's continued brand push and Google coming over the top with more ad units (in markets like credit cards and mortgage) I am seeing more and more reason to think harder about brand. It seems that is where Google is headed. The link graph is rotted out by nepotism & paid links. Domain names are seen as a tool for speculation & a shortcut. It is not surprising Google is looking for more signals.

How have you adjusted your strategies of late? What happens to the value of domain names if EMD bonus goes away & Google keeps adding other data sources?

A Thought Experiment on Google Whitelisting Websites

Google has long maintained that "the algorithm" is what controls rankings, except for sites which are manually demoted for spamming, getting hacked, delivering spyware, and so on.

At the SMX conference it was revealed that Google uses whitelisting:

Google and Bing admitted publicly to having ‘exception lists’ for sites that were hit by algorithms that should not have been hit. Matt Cutts explained that there is no global whitelist but for some algorithms that have a negative impact on a site in Google’s search results, Google may make an exception for individual sites.
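
Mechanically that is easy to picture; here is a toy sketch of per-algorithm exception lists as described above (the names, structure & numbers are all invented):

# Toy sketch of per-algorithm "exception lists": an algorithmic
# demotion applies unless the site is excepted for that algorithm.
EXCEPTIONS = {"panda": {"trusted-example.com"}}  # hypothetical data

def apply_demotion(site, algo, score, penalty=0.5):
    if site in EXCEPTIONS.get(algo, set()):
        return score        # excepted for this algorithm; no global whitelist
    return score * penalty  # the demotion hits everyone else

Note how invisible this is from the outside: two sites doing the same thing see different results, and only Google knows why.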

The idea that "sites rank where they deserve, with the exception of spammers" has long been pushed to help indemnify Google from potential anti-competitive behavior. Google's marketing has further leveraged the phrase "unique democratic nature of the web" to highlight how PageRank originally worked.

But why don't we conduct a thought experiment to think through the difference between how Google actually behaves and how Google wants to be perceived as behaving.

Let's cover the negative view first. The negative view is that either Google has a competing product, or a Google engineer goes out of his way to torch your stuff simply because you are you and he dislikes you & is holding onto a grudge. Given Google's current monopoly-level marketshare in most countries, such behavior would be seen as unacceptable if Google were just picking winners and losers based on its business interests.

The positive view is that "the algorithm handles almost everything, except some edge cases of spam." Let's break down that positive view a bit.

  • Off the start, consider that Google engineers write the algorithms with set goals and objectives in mind.
    • Google only launched universal search after Google bought Youtube. Coincidence? Not likely. If Google had rolled out universal search before buying Youtube, the rollout likely would have driven up the price of Youtube by 30% to 50%.
    • Likewise, Google trains some of their algorithms with human raters. Google seeds certain questions & desired goals in the minds of raters & then uses their input to help craft an algorithm that matches their goals. (This is like me telling you I can't say the number 3, but I can ask you to add 1 and 2 then repeat whatever you say :D)
  • At some point Google rolls out a brand-filter (or other arbitrary algorithm) which allows certain favored sites to rank based on criteria that other sites simply cannot match. It allows some sites to rank with junk doorway pages while demoting other websites.
  • To try to compete with that, some sites are forced to either live in obscurity & consistently shed marketshare, or be aggressive and operate outside the guidelines (at least in spirit, if not on a technical basis).
  • If the site operates outside the guidelines there is potential that they can go unpenalized, get a short-term slap on the wrist, or get a long-term hand-issued penalty that can literally last for up to 3 years!
  • Now here is where it gets interesting...
    • Google can roll out an automated algorithm that is overly punitive and has a significant number of false positives.
    • Then Google can follow up by allowing nepotistic businesses & those that fit certain criteria to quickly rank again via whitelisting.
    • Sites doing the same things as the whitelisted sites might be crushed for the exact same behavior & get a cold shoulder upon review.

You can see that even though it is claimed "TheAlgorithm" handles almost everything, they can easily inject their personal biases to decide who ranks and who does not. "TheAlgorithm" is first and foremost a legal shield. Beyond that it is a marketing tool. Relevancy is likely third in line in terms of importance (how else could one explain the content farm issue getting so out of hand for so many years before Google did something about it?).
