Google SEO Correlation Analysis

I have never been a huge fan of correlation analysis. The reason is that how things behave in aggregate may not have anything to do with how they would behave in your market for your keywords on your website.
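As a minimal sketch of that gap, the Python below pools two made-up markets. All of the numbers and the names `market_a`, `market_b` and `pearson` are hypothetical and only there for illustration. Within each market more links go with a lower ranking score, yet the pooled data shows a positive correlation, because one market simply has more links and higher scores overall.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: (link counts, ranking scores where higher is better).
market_a = ([1, 2, 3, 4], [8, 7, 6, 5])
market_b = ([6, 7, 8, 9], [13, 12, 11, 10])

r_market_a = pearson(*market_a)
r_market_b = pearson(*market_b)

# Pool both markets, the way an aggregate study would.
pooled_links = market_a[0] + market_b[0]
pooled_scores = market_a[1] + market_b[1]
r_pooled = pearson(pooled_links, pooled_scores)

print(f"market A: {r_market_a:+.2f}")
print(f"market B: {r_market_b:+.2f}")
print(f"pooled:   {r_pooled:+.2f}")
```

The aggregate number is positive while both markets point the other way, so the pooled figure alone says little about what would happen on any one site.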

Harmful High Quality Links?

A fairly new website ranked amazingly quickly for a highly competitive keyword. It wasn't on the first page, but it ranked about #20 for a keyword that is probably one of the 100 most profitable keywords online (presuming you could get to a #1 ranking above a billion-dollar corporation). The site did a promotion that was particularly well received by bloggers and a few bigger websites in the UK press, and at first rankings improved everywhere. Then one day, while looking at its rankings using rank checker, I saw the site had simply fallen off the map. It was nowhere. I then jumped into web analytics and saw search traffic was up. What happened was that Google took the site as being from the UK, so its rankings went to page 1 in the UK while the site disappeared from the global results.

In aggregate we know that more links are better & links from highly trusted domains are always worth getting. And yet in the above situation the site was set back by great links. Of course we can set the geographic market inside Google Webmaster Tools to the United States, but how long will it take Google to respond? How many other local signals will need to be fixed to pull the site out of the UK?

Over time those links will be a net positive for the site, but it still needs to develop more US signals. And beyond those sorts of weird things (like links actually hurting your site) the algorithms can look for other signals to feed into geotargeting. Things like Twitter mentions, where things are searched for, how language is used on your website, and perhaps even your site's audience composition may influence localization. What is worse about some of these other signals is that they may mirror media coverage. If you get coverage in The Guardian a lot of people from the UK will see it, and so you might get a lot of Tweets mentioning your website that are from the UK as well. In such a way, many of the signals can be self-reinforcing even when incorrect.

Measuring The Wrong Thing

Another area where correlation analysis falls short is when one page ranks based on the criteria earned by another. Such signal bleeding means that if you are looking at things in aggregate you are often analyzing data which is irrelevant.

Sampling Bias

Correlation analysis also has an issue of sampling bias. People tend to stick with defaults until they learn enough to change. Unfortunately most CMS tools are set up in sub-optimal ways. If you look at the top ranked results, some of the sub-optimal setups will be over-represented in the "what works" category simply because most websites are somewhat broken. The web is a fuzz test.

Of course the opposite of the above is also true: some of the best strategies remain hidden in plain sight simply due to sheer numbers of people doing x poorly.
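One way to make the sampling issue concrete is to compare how common a setup is among top results with how common it is across all sites. The sketch below is only an illustration: the shares are invented and `representation_lift` is a hypothetical helper, not a standard metric from any tool.

```python
def representation_lift(share_in_top_results, share_of_all_sites):
    """Ratio of a setup's share among top results to its share of all sites.

    Above 1.0: more common among top results than its base rate suggests.
    Below 1.0: less common among top results than its base rate suggests.
    """
    return share_in_top_results / share_of_all_sites

# Hypothetical shares for one niche (not measured data).
default_setup = {"top_results": 0.65, "all_sites": 0.80}
custom_setup = {"top_results": 0.35, "all_sites": 0.20}

default_lift = representation_lift(
    default_setup["top_results"], default_setup["all_sites"]
)
custom_lift = representation_lift(
    custom_setup["top_results"], custom_setup["all_sites"]
)

print(f"default setup lift: {default_lift:.2f}")
print(f"custom setup lift:  {custom_lift:.2f}")
```

In these made-up numbers the default setup still accounts for most of the top results, so a study of "what top pages have in common" would credit it, even though it ranks less often than its sheer numbers would predict.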

Analyzing Data Pairs Rather Than Individual Signals

Another way signals have blurred is how Google uses page titles in the search results. The clickable link generally used to be just the page title. But more recently they started mixing in other sources:

  • using an on-page heading rather than the page title (when they feel the on-page heading is more relevant)
  • adding link anchor text into the title (in some cases)
  • adding the homepage's title at the end of sub-page titles (when sub-page titles are short)

As Google adds more signals & changes how they account for signals it makes analyzing what they are doing much harder. You not only need to understand how the signals are used, but how they interact in pairs or groups. When Google uses the H1 heading on a page to display in the search results are they still putting a lot of weight on the page title? Does the weighting on the H1 change depending on whether Google is displaying it or not?

Analysis is Still Valuable, but...

I am not saying that analysis is a waste of time, but rather that when you do it lots of do's and don'ts become far less concrete. The fact is that there are always edge cases that disprove any rule of thumb. Rather than looking for general rules one needs to balance things like:

  • risk vs reward
  • yield vs effort
  • focus vs diversity
  • investment vs opportunity cost

First Mover Advantage

Along the same lines, any given snapshot of search is nowhere near as interesting as understanding historical trends and big shifts. If you are one of the first people to notice something there is far more profit potential than being late to the party. Every easily discernible signal Google creates eventually gets priced close to (or sometimes above) true market value. Whereas if you are one of the first people to highlight a change you will often be called ignorant for doing so. :D

Consensus is the opposite of opportunity.

When you do correlation analysis you are finding out when the market has conformed to what Google trusts & desires. Exact match domains were not well ranked across a wide array of keywords until after Google started putting more weight on them & people realized it. But if there is significant weight on them today & their prices are sky high then knowing that they carry some weight might not offer real profit potential in your market. It might even be a distraction or a dead end. Imagine being the person who bets (literally) a million dollars that Google will place weight on them only to find out that Google changes their algorithmic approach & weighting, or makes a special exception just for your site (as they can & have done). That day would require some tequila.

As a marketing approach becomes more mainstream, not only does the cost rise, but so does the risk of change. As people complain about domain names (or any other signal or technique) it makes Google more likely to act to curb the trend and/or lower its weighting & value. To see an extreme version of such, consider that the past year has seen lots of complaints about content farms. A beautiful quote:

Searching Google is now like asking a question in a crowded flea market of hungry, desperate, sleazy salesmen who all claim to have the answer to every question you ask.

And so Google promises action. Don't make Google look stupid!

History Holds the Key for Success

The only way to profitably predict the future is to accurately understand history.

  • "Our ignorance of history makes us libel to our own times. People have always been like this." - Gustave Flaubert
  • "History repeats itself, first as tragedy, second as farce." - Karl Marx
  • "We are the prisoners of history. Or are we?" - Robert Penn Warren, Segregation

Published: January 27, 2011 by Aaron Wall in seo tips


January 27, 2011 - 3:28pm

I think your warnings about the limitations of correlational work are absolutely true, but I also firmly believe we have to pursue all avenues of inquiry. Medicine is a great example. I used to work in medical research a bit, and to be honest, it's a mess. Doctors are sometimes bad researchers, there's a lot of bias in the industry (Big Pharma being the glaring example), and there are many situations where you're just forced to use large-scale correlational data. You can't make 1,000 pregnant mothers smoke and see what happens.

Worse yet, medicine does exactly what scientists are NOT supposed to do. It takes generalizations about a population and then tries to apply that to a sample of one. That's why any doctor worth his/her salt also takes into account patient history and tries to see the big picture.

In the end, though, we still need the data, best practices and even the correlational work for medicine to move forward. I think the same is true in SEO. SEO is a young industry and the "science" of SEO is in its infancy at best. We're going to screw it up, and no single piece of analysis should ever be taken at face value. Still, it's important to take those steps.

Just a disclaimer: Obviously, I'm connected with SEOmoz and they do a lot of correlational work. I'm also a frequent SEO Book reader, though, a member of your sister site (PPC Blog) and generally admire your work. I think it's important that these debates not become factional - we've all got to keep each other honest.

January 27, 2011 - 4:18pm

I totally agree that there is baseline value. People still screw up even having page titles, let alone relevant ones, so there are tons of mileage to advance on that front. :D

Correlation data can be used to help get a person into the game, but generally won't be what offers their competitive edge that gets them out in front of competitors.

My point mostly was from the profit perspective that most of it is available to those who go against what is commonly known & invest in new ideas, new platforms, new techniques, etc. (Sorta "contrarian investor meets web marketer")

One sorta SEO example on this was that a few years ago it was common knowledge that you need to own the .com domain & some other extensions were viewed as second rate & not worthwhile. In some cases I have seen a .org go for a couple thousand dollars where the equivalent .com name would cost 7 figures. I think it was $2,300 & that was after someone paid over a million dollars for the .com. Since then a lot of the best domains in the .org and .net TLDs eventually got bid up to market rate (and in many cases well beyond their intrinsic value IMHO). But some of the people who got the secondary TLDs early potentially did well on the profit front (even if the correlation data would say that you should go with the .com).

And at the same time some of the new techniques fall flat on their face. Sticking with domain names, my mom set up a .us name for a medical condition my grandmother had & that domain sorta didn't go anywhere because I don't think .us got any bonus. Also the project wasn't heavily invested in because I think my grandma recovered & I sorta put up a holder site theme for a while & given my grandmother's recovery + the lack of momentum we didn't decide to invest further.

The same thing is true with media formats as well. Were infographics a source of great profit for some? Absolutely. But now that they have become so commonplace they lose impact unless they are far more remarkable in terms of quality.

In such a way, we are forced to continually reinvent ourselves. To keep growing a website & stay competitive + profitable it often requires doing what others are not doing & being the first to break new ground.

I agree with you on medicine being hosed. I think a large part of it is that we lead physically unhealthy lives in the pursuit of material wealth built on Ponzi finance. It is like health is an afterthought for most folks, coming after paying debt interest. To make up for that group of choices we create drugs that treat symptoms (rather than problems) so that people will stay prescribed to them, increasing lifetime customer values. To make up for the toxic nature of daily life we intake other toxins to make us feel more "normal" even if we are not (or even if being "normal" is actually a bad thing). ;)

Taking the absurd nature of health full-circle, I just read that Health Canada warned that some of their herbal remedies are DANGEROUS because they contain ... wait for it ... some of the prescription drugs they allegedly replace. :D

The problem started to crop up about five years ago. Products billed as natural remedies to treat conditions like impotency and high cholesterol were being found to contain a surprise, and hidden, ingredient – actual prescription drugs. Now Health Canada is reporting a “jump” in the number of such products.

January 27, 2011 - 4:35pm

I think there's definitely a difference between trying to build best practices and tools for the masses and trying to excel as a consultant or agency or find the next opportunity. Almost by definition, if you want to stand out from the crowd, you can't just do what everyone else keeps telling you to do.

January 27, 2011 - 5:06pm

Thanks for the great comments here (and the ones you leave inside the member's area on PPCBlog) :)

January 27, 2011 - 5:40pm

From my perspective, doing a correlation test INTENTIONALLY is not always very productive. There are so many variables that it is almost impossible to isolate anything without allocating a sufficient amount of time and resources. Why not use that time actually doing SEO and working with clients, while industry leaders such as SEOmoz do the heavy lifting?

I understand Aaron’s point that gaining the knowledge before the crowd can be a huge opportunity. However, I feel that by actually doing the work and being productive you can come to conclusions that are rather anecdotal but still very valuable.

Really, how many people actually KNOW SEO and are willing to do the work?

On a side note, I have always tried to stay away from calling SEO a science. Wouldn’t that make Google the logical/philosophical equivalent of God? I am not ready to make that leap quite yet…

January 27, 2011 - 6:06pm

It is not often, but we still do take on select client work. And I would comfortably state that we know SEO pretty well. :D

And, to clarify, I am not saying that correlation analysis is never useful, but rather that it is understanding where and when and how your situation is unique that gives you a lasting competitive advantage.

January 27, 2011 - 6:22pm

I did not mean anyone specifically. I just meant that, in general, the benefit of doing a correlation analysis for the sole purpose of doing the analysis does not typically outweigh the opportunity cost of not doing actual work (whatever that work may be).

The most valuable information I have discovered on my own is almost always by accident.

January 27, 2011 - 6:48pm

The most valuable information I have discovered on my own is almost always by accident.

If they are being honest, then I think that is true for most professional SEOs, especially if they run their own websites. Client sites are not great for the OTJ training strategy (or quick and dirty work, which leads to errors), but anyone running a variety of their own sites has certainly screwed up a lot. The key is to have a range of experiences and data points to extrapolate ideas from, and then figure out how to apply those ideas to other markets & scenarios. Some of it comes by accident, like accidentally blocking your website from being indexed or messing up a link's anchor text or doing something wonky with the onsite CMS, etc., but how you use it is more important than how you learn it!

January 29, 2011 - 6:50pm

I was at the SEOmoz meetup in San Diego, and the presenter of a "study" was talking about how PPC converts 4 times better than organic traffic.

[as a clarification... it was not an seomoz employee, but a local firm that seomoz has a relationship with.]

This felt like a very strange statement to me. I didn't see why an ad would convert 400% better than the same organic keyword. Maybe 10% or even 50% better...but 400% seemed silly high.

So I spoke with the speaker afterwards, and I asked him how they did the comparison. And it turned out that they did this:

conversion rate for organic traffic = # of conversions/ # of people coming from google in total

So they compared relatively carefully selected PPC (with nice landing pages), with random long tail keywords, mid-tail, and head traffic (though they did remove branded queries from the SEO traffic).

OF COURSE THAT WILL CONVERT LOADS BETTER. But the speaker didn't mention any of these details.

I think a much more useful study would have been to compare organic traffic for a keyword going to a given landing page with PPC traffic for the same keyword going to the same landing page. Or at least baskets of organic keywords vs. baskets of PPC keywords. Or even PPC traffic to a page vs. organic traffic to the same page.
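To show how much the choice of denominator matters, here is a rough sketch in Python. Every figure in it is invented for illustration, and the segment names and the `conversion_rate` helper are hypothetical. It compares PPC against all organic traffic lumped together, then against only the organic visits for the same keywords.

```python
def conversion_rate(conversions, visits):
    """Conversions divided by visits."""
    return conversions / visits

# Hypothetical traffic figures (not from the study being discussed).
ppc = {"visits": 1000, "conversions": 40}
organic_segments = {
    "same keywords as PPC": {"visits": 500, "conversions": 18},
    "mid-tail": {"visits": 2500, "conversions": 25},
    "long tail": {"visits": 7000, "conversions": 57},
}

ppc_rate = conversion_rate(ppc["conversions"], ppc["visits"])

# Blended: every organic visit from the engine in one bucket.
organic_visits = sum(s["visits"] for s in organic_segments.values())
organic_conversions = sum(s["conversions"] for s in organic_segments.values())
blended_rate = conversion_rate(organic_conversions, organic_visits)

# Matched: only organic visits for the keywords the ads also target.
matched = organic_segments["same keywords as PPC"]
matched_rate = conversion_rate(matched["conversions"], matched["visits"])

print(f"PPC vs blended organic: {ppc_rate / blended_rate:.1f}x")
print(f"PPC vs matched organic: {ppc_rate / matched_rate:.1f}x")
```

With these made-up numbers the headline gap comes almost entirely from mixing long tail visits into the organic denominator, not from ads converting better on the same query.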

But even then you have to control for if your ad is showing above the fold for a keyword and you rank organically for the keyword. Because people in the seobook forum seem to say that "double listing" helps with conversion rates.

That pissed me off so bad that they had a room full of newb internet marketers, who just believe that PPC converts 400% better...the academic in me is so pissed off. But the business man in me knows that these studies will help render competition in most verticals flaccid, and many companies will continue to be fed relatively garbage information by guru internet marketers SEO social medial ROI analytics guaranteed traffic 100 links a day :D

Can you tell I miss the forums, Aaron? Take your time though!

January 29, 2011 - 7:22pm

See, to me that is part of the hard part with having hard numbers. Unless you have huge data sets (say like the search engines or Efficient Frontier) and/or you are super careful with controlling the study, it can be easy to hunt for data that supports a desired conclusion & trim data that does not.

To be fair, I think SEOmoz is *far* better at trying to build real data sets & reach the natural conclusions than the people who are doing it the other way around. If you asked me to bet if I thought the above referenced study was from an employee or an outside consultant I would put money on the latter.

January 29, 2011 - 11:20pm

Sorry, I didn't make that clear. It was NOT an SEOmoz employee that presented that bit.

It was a local San Diego firm that sells some sort of PPC software I think.

January 30, 2011 - 7:43am

I wasn't trying to go after anyone in particular & the last thing I want to do is harm someone else's reputation when they didn't deserve it. If I feel people deserve to be flamed I will highlight it, but I hate to have our site falsely attribute something like that.

January 30, 2011 - 4:34pm

If you like, you can edit my previous comment to say "it was not an seomoz employee, but a local firm that seomoz has a relationship with"

February 8, 2011 - 10:40am

Hi Aaron, nice post - finally something worth reading. I'd like to ask you if some evidence exists for this claim:
"using an on-page heading rather than the page title (when they feel the on-page heading is more relevant)"

Did you see this happen? An example would be wonderful.

Thank you in advance,

February 8, 2011 - 3:38pm

The link in the post which uses "more recently" as the anchor text does show one such example. There are MILLIONS of other such examples though. Currently (though algorithms could change in the future of course) Google is going so far as using H3 headers in some cases. Search Google for "Geordie Carswell" (our partner on PPC Blog) and you will see the page ranking using his name as the link in the search results, and his name is not in the page title, but is in an H3 heading on the page.

February 9, 2011 - 1:30pm

Thank you very much, I suppose that I was reading too fast. What makes Google "feel" like they should use a heading instead of the title? The visual appearance? I think that they have a visual scanner which tries to match the HTML code. I suppose it was primarily meant to find cloaking issues and is now used for content verification. Could this really be? I mean theoretically it's possible, but is it useful to them? That may mean that the element which is visually crawled should be in the "right places" in the HTML code, with no "manipulation" for SEO texts possible. What do you think?

February 9, 2011 - 3:27pm

They want their search results to look as relevant as possible. Try searching Google for some of our post titles (or closely related phrases) and then try searching Google for some of our page titles (or closely related phrases) ... they are trying to make the results appear as relevant and seamless as possible.

Google was years ahead of competition by storing on-page content in RAM so they could generate the most relevant snippet description possible (matching the user search query as closely as possible). But when it came to adjusting using on-page information over the page title I think both Yahoo! and Microsoft's Bing beat Google to the punch on that. But now Google is just as sophisticated on that front & perhaps is even more aggressive than Bing at using on page information for the clickable link in the search results.

February 14, 2011 - 11:53am

I know what they want, but I still don't have a logical explanation of how they do that. They aren't so good at discovering cloaking issues: just look at the title in the SERPs and the title on the landing page. It's cloaked. Please share your thoughts about the "visual scanner". Their wishes are familiar to me, but as I see it they are still far away from those wishes.

March 21, 2011 - 8:48am

The Google content farm update added another layer of abstraction over the results, which makes the aggregate analysis that much harder to do (or at least that much harder to do well and with both purpose & valid results).
