Google SERP-rot, Paid Links, & Spam Classification

I talked to a search engineer a few months back and he mentioned that he thought one of my sites and one of the promotions associated with it were both spammy. This month I came across a random blog comment where a person talked about how great that search company was for showing them that same site! The only problem was that, since the site was new and still needed more links, we had to pay Google for those clicks.

Meanwhile a network of older, established, poorly designed English-as-a-third-language sites dominates Google's organic search results and keeps getting self-reinforcing links, which makes it virtually impossible to compete with them without buying links. But the AdWords ads and viral marketing we did led to some exposure where editors from other companies got to evaluate our site.

  • A number of mainstream media companies (newspapers and radio shows) mentioned us on their site.
  • A leading search company featured a link to our site aggressively in their portal (sorry I can't say more than that or a partner would kill me for doing so).
  • Mahalo listed our site with a cool rating and listed many deep links from our site on their overview page.
  • The Yahoo! Directory listed our site for free.

Had we not paid Google thousands of dollars, the organic links we got never would have existed, and our site might never rank. Most other search-related companies generally love our site. But because I am associated with the site and I am an aggressive marketer, search engineers at Google see the site in a different light, in spite of it providing a better user experience than the outdated garbage Google currently ranks (as indicated by searchers and by the editorial judgement of human reviewers at other search companies).

I am not complaining here, as we are on page 2 and getting close to page 1, but most content producers are not as aggressive at marketing as we have been, and some of the best content might take many years to rank, if it ever does. The bigger issues at hand are:

  • Most English-speaking webmasters with trusted sites use Google. If something is not in Google, it is hard for it to get the quality links needed to rank unless the webmaster buys AdWords or spends a lot on public relations.
  • Many employees of other search companies are likely using Google search.
  • Any warp in Google's view of the web (like SERP staleness and bias toward huge media companies) creates an opportunity for another search company to be born, and to some extent validates arbitrage plays by companies like Mahalo.

By relying on old websites to clog up the search results Google virtually guarantees that you need to buy links to rank a new site. The only question is who is getting paid!

Google to Police 'The Truth'

Recently a fake story was highlighted in the mainstream media, and the SEO behind it also mentioned it on their site. The SEO space as a whole began debating the legitimacy of such tactics, and Matt Cutts even commented on the issue:

My quick take is that Google’s webmaster guidelines allow for cases such as this:

“Google may respond negatively to other misleading practices not listed here (e.g. tricking users by registering misspellings of well-known websites). It’s not safe to assume that just because a specific deceptive technique isn’t included on this page, Google approves of it.”

There’s not much more deceptive or misleading than a fake story without any disclosure that the story is hoax.

The irony of this statement, as Nick Wilsdon pointed out, was that not only did Fox News syndicate the fake story, but they got in trouble in the past for attributing fake quotes to John Kerry. A person coming up with a clever story to get a few inbound links is nowhere near as sleazy as lying to try to sway the public vote for the presidency...but it is much easier for Matt to police the small and weak webmasters while turning a blind eye to similar (but worse) offenses from larger players.

Morals of the story:

  • If you talk about exceptionally effective SEO strategies expect them to lose their effectiveness (search engineers are active in public discourse because it is easier to control people through fear than it is to write a better relevancy algorithm).
  • If your technique works so well that it is featured on many SEO blogs and/or draws a specific public comment from Matt Cutts, you have gone too far (sheep must be slaughtered to control the herd).
  • If you are going to lie do it in a way that builds a fan base. If you have such a large fan base that most of your traffic comes from channels other than Google it is virtually impossible for Google to block you (unless you use hate speech that extends beyond the lies and spin that are typical on networks like Fox News).

If you want to understand how the mainstream media works I highly recommend investing 5 hours and $50 into the following 3 DVDs. As more time passes Google's ad fueled business model will lead to them essentially replicating the flaws and biases of the mainstream media.

  • Manufacturing Consent - Noam Chomsky talks about how the media operates to shape public opinion and policy.
  • Outfoxed - how Fox News spins the news to fuel their desired political agendas.
  • The Fog of War - in this DVD Robert S. McNamara talks about how he used spin and media control to try to minimize blowback from the Vietnam War.

A New Kind of Duplicate Content - GoogleBot Random Form Crawl

Michael VanDeMar highlights how a website lost an important page to duplication with a new, not-so-important page, which Google added to its index by filling out forms.

If you have limited PageRank and a Google-accessible form or search box, you may want to keep Google from indexing the output URLs via a robots noindex meta tag or your robots.txt file.
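
Here is a rough sketch of how you might sanity-check the robots.txt side of this before relying on it. It uses Python's standard urllib.robotparser module. The /search path and the example.com URLs are made-up placeholders (swap in wherever your own form output lives), and the is_crawlable helper is just an illustrative name. The meta tag alternative is a line like <meta name="robots" content="noindex"> in the head of the output pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/search" stands in for the path
# where your own form or site-search results are served.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Form output URL (placeholder) vs. a normal content page (placeholder).
    print(is_crawlable(ROBOTS_TXT, "https://example.com/search?q=widgets"))
    print(is_crawlable(ROBOTS_TXT, "https://example.com/about"))
```

Keep in mind this only tells you what a well-behaved crawler is allowed to fetch under those rules. It says nothing about URLs that are already in the index.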

Why is Google Buying Links From SEMPO?

Google, which has arbitrarily forced nofollow on the web (and declared link buyers and sellers who do not use the tag to be spammers), is buying a PageRank 7 link from SEMPO.org.

You would think that if Google wants to set new proprietary standards they would follow them as well. And what better spot to start following them than with a trade organization promoting search engine marketing?

How Much is a #1 Google Ranking Worth?

I just wrote a ~15 page article aimed at helping SEOs estimate how much a top rank in Google is worth.

I would appreciate any feedback you have on making it better. If you like it please hook me up with a Del.icio.us or Stumble. Any and all mentions are appreciated. :)

Will Your Website Pass a Google Review?

Welcome to GoogleNet!

Hitwise recently mentioned that Google controls over 1/3 of UK web traffic.
[Chart: upstream UK internet traffic from Google properties to other websites in the UK, 2007 vs. 2008]
With that much usage data, if you were Google, would you use usage data in your relevancy algorithms?

An Army of Google Search Editors

They could easily use algorithms to detect

  • sites that they send a lot of traffic to relative to those sites' total traffic (comparing ratios between toolbar data and search traffic)
  • sites which have seen a rapid spike in traffic from Google
  • sites which people quickly bounce away from (and do not later return to)
  • sites which get a lot of traffic from Google but get few navigational queries

and flag anything out of the ordinary for human review. Marissa Mayer stated they have 10,000 reviewers.
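
To make that concrete, here is a toy sketch of what such a flagging pass could look like. This is purely my guess at the shape of it, not anything Google has published: the SiteStats fields, the flag names, and every threshold below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    search_visits: int         # visits referred by the search engine this period
    prior_search_visits: int   # the same figure for the previous period
    total_visits: int          # all visits this period (e.g. from toolbar-style data)
    bounce_rate: float         # share of search visitors who leave fast and never return
    navigational_queries: int  # searches for the site by name


def review_flags(s: SiteStats) -> list[str]:
    """Return the reasons (if any) a site might be queued for human review.

    All thresholds are hypothetical placeholders.
    """
    flags = []
    if s.total_visits and s.search_visits / s.total_visits > 0.9:
        flags.append("search-dependent")   # nearly all traffic comes from search
    if s.prior_search_visits and s.search_visits / s.prior_search_visits > 5:
        flags.append("traffic-spike")      # rapid jump in search referrals
    if s.bounce_rate > 0.8:
        flags.append("high-bounce")        # people leave quickly and stay gone
    if s.search_visits > 10_000 and s.navigational_queries < 10:
        flags.append("no-brand-demand")    # lots of traffic, nobody asks for it by name
    return flags


if __name__ == "__main__":
    thin_affiliate = SiteStats(50_000, 2_000, 52_000, 0.85, 3)
    known_brand = SiteStats(20_000, 18_000, 100_000, 0.40, 5_000)
    print(review_flags(thin_affiliate))
    print(review_flags(known_brand))
```

A site that trips several of these at once is the kind of thing I would expect to land in front of a reviewer, while a site with a diverse traffic profile trips none of them.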

Does Your Site Look Good to Google's Relevancy Algorithm?

As the web keeps getting richer and deeper, and Google increasingly uses human review for demoting spam, all the aesthetic things matter:

  • domain name
  • site design
  • content formatting
  • branding and public relations

As search evolves so too will spam. Some spam sites will LOOK and FEEL better than most non-spam sites. And so the remote quality raters will be given more data to look at - perhaps eventually even a sample of backlinks or other related data.

False positives will occur - sites and careers built around Google without proper support stilts will crumble. Unless your site is of social significance (you are a big corporation, a non-profit organization, a government institution, an educational institution, a top blogger, an official Google partner, or Youtube/Google house content) then part of the optimization process revolves around not only creating sites that pass a hand review, but also trying to create sites that do not get flagged for review - especially if you are a thin affiliate site.

How do you not get flagged for review?

  • Build enough quality signals and direct traffic that your site looks like a real part of the web.
  • Build something people keep coming back to.
  • Do not make drastic changes to your site unless you are comfortable with it going under review.

How do you pass a review?

Short term I think the aesthetic things matter a lot. Longer term it is best if your site satisfies a few criteria:

  • exclusive content that people value and keep coming back to (Google loses if they remove the best content from their index)
  • a brand that people care about and search for (Google looks dumb if they do not rank your site)
  • a meaningful and reliable traffic stream outside of Google (many quality signals may stem from this exposure, which will help keep your overall profile more organic)
  • you could cause public relations harm to Google and diminish their brand value in the eyes of thousands of people (removing your site has real opportunity cost)

Usage Data for Algorithmic Site Promotion

Creating Fake User Accounts is Harder Than it Sounds

If usage data were ever used to promote sites, Google could look at regional data and help promote sites based on what is popular locally. Searchers reveal their location by IP address and the queries they search for.

The Trusted Few

Google could use a subset of their users when using usage data to affect relevancy (perhaps users with 6 months account history, credit card on file via Google Checkout, and a normal email profile).
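
As a back-of-the-napkin illustration of that kind of filter, here is a minimal sketch. The Account fields and the 180-day cutoff are my own stand-ins for the criteria guessed at above, not a description of any real Google system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Account:
    created: date               # when the account was opened
    card_on_file: bool          # e.g. a credit card stored via a checkout product
    normal_email_profile: bool  # assumed output of some separate "looks human" check


def is_trusted(a: Account, today: date, min_age_days: int = 180) -> bool:
    """Hypothetical trust test: old enough, has a card, and looks like a real person."""
    old_enough = (today - a.created).days >= min_age_days
    return old_enough and a.card_on_file and a.normal_email_profile


if __name__ == "__main__":
    today = date(2008, 4, 1)
    veteran = Account(date(2006, 1, 15), True, True)
    fresh_fake = Account(date(2008, 3, 20), False, True)
    print(is_trusted(veteran, today), is_trusted(fresh_fake, today))
```

The point is that each extra requirement raises the cost of faking an account, which is why creating fake users is harder than it sounds.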

Why Usage Data is Tricky

Much of the signal from usage data is likely mirrored by PageRank, so the lift might not be that great until they really refine the technology.

Some tricky parts with promoting sites based on usage data are:

  • usage data is quite noisy, and
  • it may not favor informational sites over commercial intent the way that PageRank does. That informational bias to the organic search results is a large part of why AdWords is so profitable.

Microsoft recently presented a paper on finding authority pages based on browsing habits.

Google is Quietly Consuming the Internet

TechRepublic asks "Will the Google revolution engulf IT departments?" Each time I write a newsletter, about 80% of the items are about Google. They keep innovating faster than other companies their size. Here are some examples of things they have done over the last ~ 2 months.

  • Changed organic search results based on the prior search query.
  • Added a search box for site search inside the search results, giving Google a second taste at displaying ads even on navigational queries for a specific website.
  • Started crawling site search forms on trusted sites, which (along with sitelinks, universal search, Youtube, and branded video ads) distributes more traffic to large trusted sites and business partners, with less traffic going to smaller websites (search keeps getting more editorial).
  • Offered App Engine, which provides free hosting to developers (in exchange for being stuck on their network and letting them spy on your usage data and growth).
  • Created a marketplace for people building on the Google network.
  • Began policing widgets not on their network, a topic that deserves its own post.

Not only are dumb companies buying into the everything Google strategy, but even some semi-intelligent ones are. After logging into Dreamhost recently I was shocked to see them integrating Google apps and email on all customer domains. What happens if/when Google buys GoDaddy? How does Dreamhost compete when Google gives away hosting as a loss leader?

There is big risk to Google consuming the web. The issue is not only information diversity and innovation, but what happens when your Google account gets hacked? I regret my reliance on Gmail, but am unsure how to fix it.

Nationwide Google Wireless ISP Plan, Try #2

After they bid low and lost the C block of wireless spectrum Google has started talking to the media about using unlicensed whitespace. From the WSJ:

Google said that the white space, located between channels 2 and 51 on TV that aren't hooked up to satellite or cable, offer a "once-in-a-lifetime opportunity to provide ubiquitous wireless broadband access to all Americans." In addition, opening up the spectrum would "enable much-needed competition to the incumbent broadband service providers," Mr. Whitt wrote. Google has done its own white-space testing and submitted its results to the FCC in December. Philips also submitted a testing device to the agency last year, which returned satisfactory results.

Cheaper (or free) nationwide connectivity = more web users. More web users = more searches.

The other (big) piece of this is that if Google works this deal, they will likely end up with a lot more usage data, and a strong starting point to triangulate other usage data against. With links becoming a commodity, how hard would it be for Google to find a better signal? In 5 years will they still rely on links and have 10,000 people rating content? What if they could somehow get everyone to start rating content (through usage data), and place more trust in natural-looking Google user accounts with years of a natural usage profile? If they slowly mixed it into the relevancy algorithms over time, who would even know they did it?

If Google does set up a free ISP think how much usage data they would have.

  • Google ISP (usage data, geo-specific relevancy)
  • Google Android (more geo-data)
  • Google accounts (which users can we really trust, what do they buy, etc.)
  • Google toolbar
  • Google search
  • social applications (Gmail, Google Talk, Orkut, Google Gadgets)
  • Google AdWords
  • Google Checkout (track sales volume, return requests, etc.)
  • Google AdSense
  • DoubleClick (thanks for the reminder Dan)
  • Google Analytics
  • Google Feedburner feed distribution
  • Google reader
  • iGoogle homepage (along with Google Gadgets)
  • Google YouTube (embeds, views, subscribers, etc.)

In that type of market, effective SEO morphs into marketing. Until that day comes keep link spamming building!

Does Google Spy on its Customers?

Sometimes people think I am a cynic when I mention things like "avoid Google Analytics," but you never really understand how Google perceives the web until they choose to try to wipe you out. Jay Weintraub recently posted about how he was permanently banned from AdWords because one of his employees accessed his company account AND their personal account from the same IP address.

A person who has access to the company's AdWords accounts has their own AdWords account. They are a good employee and don't work on their personal project at the office, but as a good employee they do work on your business while at home. By accessing both AdWords accounts on the same machine, Google decides both accounts are the same person despite their being different. Worst case, the employee breaks the rules with their personal account. The employer finds their campaigns stopped and can't get them back online.

There was a point in time when some people who practiced PPC claimed that it was safer than SEO, but in the face of

it is certainly a bit harder to claim that PPC is a safe and effective long-term marketing strategy.

Worse yet though, if Google is willing to ban people paying them millions of dollars, what happens to those who publish AdSense ads and are dependent on Google for revenue as well? What happens to those who are dominating the organic rankings without paying their Google tax?

If Google connects up all that data to use against their advertisers, surely they are using the same data to hand out punishment to other parties as well. Just by using AdSense you make your business more reliant on Google (and eventually more likely to be penalized by Google). Just by using Google Analytics you are leveling the playing field for everyone except yourself. And the problem there is that you can't get away with many of the things that your competitors do.

How many emails like this could I send out before my site would get banned?

My threshold and the threshold for Sallie Mae are two different numbers. If I offered PageRank 6 (and above) bloggers a free membership to my site in exchange for a link (like Demand Media does), I wonder whether I would be deemed a spammer.

As Google's stranglehold on the web grows (Google just closed the DoubleClick deal - giving them access to a lot more affiliate data) the solution to remove yourself from risks associated with Google's influence is to create a business that is not reliant on Google...a brand and a destination. But to do that you really need to ignore Google's advice.

And if you are an end consumer and searcher, you are hosed already. Ads already track you and know who you are, and Google has patents to target ads to leverage and exploit your mental weaknesses:

Examples of information that could be useful, particularly in massive multiplayer online RPG’s, may be the specific dialogue entered by the users while chatting or interacting with other players/characters within the game. For example, the dialogue could indicate that the player is aggressive, profane, polite, literate, illiterate, influenced by current culture or subculture, etc. Also decisions made by the players may provide more information such as whether the player is a risk taker, risk averse, aggressive, passive, intelligent, follower, leader, etc. This information may be used and analyzed in order to help select and deliver more relevant ads to users.

Hat tip to Andy for the link to Jay's post.

Spying on Google: What is Spam? What is Relevant? Read This to Find Out

You can read a lot about what search engineers want by looking at how the search results change. You can learn a bit more by listening to how they try to guide / influence / manipulate the market while engaging in discourse. And you can learn a lot more by reading their guidelines for how they expect people to rate search quality.

The reasons that the internal communication documents are so powerful are

  • they do not discuss search from an "in an ideal world" approach, but cover the current marketplace from a pragmatic standpoint, solving real issues
  • the documents may display algorithmic holes that require manual intervention
  • the documents may show clues as to the hints search engineers give raters to quickly infer quality and relevancy
  • the documents show issues or relevancy infractions that merit a lower relevancy rating
  • the documents show how ratings change based on the quality and availability of information on the topic
  • the documents show how something that is considered spam in some instances is considered fine if it is associated with a large well-known brand
  • the documents show how things that are relevant in some verticals are irrelevant in others if Google runs a competing offering
  • the current documents are the result of years of back and forth communication between quality raters and search engineers

For organic search junkies the Google Gods have tossed us another gift. An SEO Black Hat member discovered an April 2007 Google Evaluation Guidelines document, referenced here.

In April 2007 Yahoo! Music did offer lyrics, but the official Google query evaluation guidelines from that time-frame stated

Exceptions (Scraped Content that is not Spam) Lyrics, poems, ringtones (that the user programs rather than downloads), quotes, and proverbs have no central authority. When you see pages with this content, you cannot judge it to have been copied, and the pages should not be assigned a Spam label. Unfortunately, some content is written specifically for Spam pages and you will not find it on another source.

Although you may be convinced that the intent is to deceive, if the content makes sense and appears original, you will not be able to label such pages Spam.

In a sense, if a spammer or copyright violator is the only person providing the information online for free it is not considered spam, even if it would have been deemed spam by the traditional guidelines. The same is likely true if Google is trying to work on business negotiations to own that content directly (how could they state there are no central authority sites for music lyrics when sites like Yahoo! Music offer them?).

Because Google has not partnered up with the record labels to create a Google database of lyrics, somehow those copyright violations are deemed acceptable even if they would have been judged as spam under Google's typical guidelines. And, of course, after Google creates a relationship to get those lyrics hosted on Google.com, many of those lyrics sites will indeed be deemed spammers.

In other words, spam is only spam if it does not help Google achieve its business objectives. Who cares about the laws. Good to know.

You can compare the current query evaluation and rater document to the 2003 versions I referenced here and here. And the 2007 document has been leaked online.
