Why Mahalo (and Other Content Scrapers) Render Google's Spam Team Flaccid

I was talking to a friend yesterday who was at a conference where Demand Media's CEO spoke, and my friend said that nobody asked the big question: "What if Google decides they don't like you anymore?"

Then I got thinking about how Google torched Squidoo after Jason Calacanis went on his public campaign to rebrand it as spam. But today under the same level of scrutiny, how is Mahalo (which scrapes millions of 3rd party content listings *without any editorial filter*) not spam? Squidoo at least donates $10,000 a month to charity. Mahalo just "borrows" your content without permission and keeps all the cash.

In the past Google hated content scrapers pretty bad. How bad? Well a guy named Teeceo used to make scraper sites, and here is how Matt Cutts described his work:

"In the chat room, I said hello to teeceo, but I know the stuff that he was doing and it’s shoot-on-sight. I think anyone who is blackhat knows (or should know) that I’m happy to talk to anyone, but that we’ll still take action on the spam we find."

Imagine taking that approach to hunting search spam all day long, and then ignoring the *fact* that Mahalo is scraping millions of third party listings and using them as content with no editorial filtering.

Then I started thinking about why the Google spam team could ignore something as outrageous as Mahalo, especially when it was built by a guy who was a false anti-spam evangelist. Is it because Jason is a good guy? No. Is it because there is some actual editorial vetting of the content? No. Is it because Google is getting a cut of the AdSense revenues? Google doesn't need the short term cash flow (look at all the affiliate AdWords advertisers they just torched), so that is too cynical of a view.

Yes, Google wants display inventory (their biggest opportunity for 2010 according to the quarterly call), and these "content" websites have already given themselves over to Google as inventory. But it must be something deeper than that. So I started thinking about it from a long-term strategic level...

Google won't penalize sites like Mahalo (even though they blatantly violate Google's guidelines) because Google *wants* to use the works of companies like Mahalo, Demand Media, and Aol to lower the value of other content and bankrupt a lot of the traditional media companies.

Why would Google want to do that?

There is excessive duplication in the marketplace. The faster that duplication is driven out of the marketplace the more desperate companies will be to cut deals with Google. And while there is a down market Google can drive companies out of the market and just claim that it was the economy that did it (much like how Mahalo used the down economy as an excuse to fire most of their editorial staff and replace them with content scraping robots).

Once a lot of media companies are bankrupted, the market is far more efficient and there are fewer mouths to feed. That means Google can squeeze greater profit margins out of the media ecosystem by getting a fatter cut of the ad revenue.

Currently this shift is risk free because almost nobody understands how the marketplace works. Sure Paul Kedrosky and Mike Arrington blogged about the search results getting spammier, but until you frequently read the above listed sequence on sites outside of the SEO industry there is no damage to the Google brand in them turning the internet into a cesspool.

Once it starts harming the Google brand I expect them to act quickly and decisively, and sites like Mahalo will see a sharp drop in traffic. Jason better milk it while he can. The clock is ticking.

Published: January 29, 2010 by Aaron Wall in publishing & media

Comments

gentlesavage
January 29, 2010 - 4:37pm

I hope the clock is ticking. Most of the time I can't stand searching on Google anymore. The first page of Google on most searches now returns all those crap websites -- eHow, about.com, ezinearticles, mahalo, etc. It's getting harder every day to find a real website with quality info in Google.

WaveShoppe
January 29, 2010 - 6:01pm

Aloha Aaron, I just signed up so I could comment on Mahalo’s scraping. Recently I put in a lot of time and research to create a Blog post that addressed one of our industry's (apparel) biggest misconceptions and posted it on our company Blog. I thought it was quite informative and felt it would be quite helpful to the reader. The problem is that no one will immediately get the opportunity to read it because as soon as I posted it, Mahalo scraped it. We all know what happens when a big site victimizes the little guy's content.

I don’t know this Jason guy, but it's pretty disappointing to put in all that effort just to see that someone stole your thunder, as well as apparently making money off my time and research.

Mahalo means “thank you.” How ironic.

servercraft
January 29, 2010 - 6:32pm

Is it technically impossible to prevent them from scraping your site?

January 29, 2010 - 6:48pm

Yes, because they do not scrape directly. They scrape through Google...so to block them from scraping your content you have to block Google from indexing it.

So to turn off their scraping you need to not get any search traffic from Google. FAIL!

WaveShoppe
January 29, 2010 - 7:36pm

But in support of Google and all the other search engines, I don’t ever recall a customer mentioning that they couldn’t find one of our products or store via a search engine. Though it would be nice to see the reach of Local extended to customers in the outlying counties. E.g. Riverside and San Bernardino. But as a small business owner I have to be appreciative of the fact that another business even sends prospective customers to the showroom.

The recession is beating the daylights out of small businesses so we need all the help we can get.

January 29, 2010 - 7:46pm

How is a search engine paying another company to steal your content beneficial to your business?

Google would still make billions of Dollars of profit even if they didn't pay companies to steal your content.

WaveShoppe
January 29, 2010 - 8:12pm

Aaron, for the record, I am completely against rewarding someone for bad behavior. But honestly what can someone like myself do about it? I mean at my core I am an apparel maker with a simple website and a physical store. While I am flattered that someone liked my work enough to scrape it, I also admit that it’s a bit of a kick in the teeth to know that they make money off it and without giving me proper credit, so I honestly don’t know how to best address this.

I am having a hard time ripping on the search engines, mainly because I don’t feel they have an obligation to do anything for my business, yet they seemingly do manage to give back. Whereas a content burglar like Mahalo is on the other side of the game, profiting without paying any dividends or perks to the true content owners. For me utilizing a search engine is like having a driver's license: it’s a privilege per se and not a right.

Martypants
January 29, 2010 - 8:14pm

You wrote: I am flattered that someone liked my work enough to scrape it
This is not the case. They scraped your work because you finished it.

January 29, 2010 - 8:23pm

There was no emotion as an input. The scraping was automatic. A machine did it...not a person.

And as to this "I am having a hard time ripping on the search engines, mainly because I don’t feel they have an obligation to do anything for my business" well the reason they collect snippets is to have content to sell ads against. When they are doing it on their own site that is a somewhat fair tradeoff, but when 3rd party sites use that snippet as page content to steal your work and rank against you then it is not legitimate.

WaveShoppe
January 29, 2010 - 8:39pm

"when 3rd party sites use that snippet as page content to steal your work and rank against you then it is not legitimate"

I completely agree. By the way Aaron, thanks for letting me post and vent a little. I have been hanging around since the Threadwatch days but I usually don’t have much to say. This particular scraping incident just got under my skin a bit more than others.

eonian
January 30, 2010 - 7:26pm

Thank you for informing us about that.

Until Google reacts, I was wondering if there is a way to block Mahalo's robots with a disallow rule, or with a PHP script linked to a database that the community keeps constantly updated, for example?

You said that they scraped the content through google and it is impossible to block them, but don't they need at some point to go on the website to steal the articles?

If it's possible, I'm sure some good dev will be able to create a plugin for WordPress and other CMSs to make it easy to use.

Again, if it is possible, it could be a great start to show google that the blogger community has the ability to react when they don't.

Thanks for your input.

January 30, 2010 - 7:26pm

Since Mahalo is scraping data from Google you can't block Mahalo without blocking Google.

eonian
January 30, 2010 - 7:28pm

You said that they scraped the content through google and it is impossible to block them, but don't they need at some point to go on the website to steal the articles?

January 30, 2010 - 7:29pm

No...they scrape your abstract and page titles from Google and then turn that into "content" on their page.

eonian
January 30, 2010 - 7:49pm

I guess we can ban the IP of their website but it will penalize the web surfer not him...

Do you think that a major campaign relayed by all the SEO websites like yours, explaining how to identify the IP of the Mahalo website in order to redirect to a special page, would be an effective answer?

It could be a simple page explaining to the visitor what the problem is about (protecting quality and original content) and explaining how to ban the Mahalo website from the Google search results page?

Just wondering, but perhaps a Firefox plugin could be put in place that will silently censor those kinds of scraping websites. The redirect page would then invite the visitor to install it.

January 30, 2010 - 7:40pm

Doubtful. Either Google cares about content theft or it does not. All we can do is make the issue known publicly.

eonian
January 30, 2010 - 9:14pm

I don't think we should look at Google expecting them to solve all the issues, no?

Firefox and Chrome (I think) are able to tell you when you reach a website they have flagged as potentially harmful.
Plugins like SEOquake are able to intercept the Google results page and modify it.
Why wouldn't it be possible to put in place a plugin that intercepts the Google results page and just removes the Mahalo website, if it is that harmful to the authors of original content? Just to send a clear message to him and to Google at the same time (if you don't do it, we will do it ourselves)?

January 30, 2010 - 9:01pm
  • Google is scraping the content
  • Google is allowing the syndication of the scraped content
  • Google is wrapping the scraped content in their ads
  • Google is ranking the scraped pages in their search results

It is 100% Google's fault.

In response to Google's actions even their own CEO calls the web a cesspool. That is pretty bad!

eonian
January 30, 2010 - 9:12pm

I understand, but I was asking more about the different possible ways to react to it. Thanks for taking the time to answer anyway.

tom888
January 30, 2010 - 9:17pm

"dog eat dog" or "a scraper [google] scraped by a scraper [mahalo]".

It would be best if it were possible to block Google from displaying any abstract under search results. But they won't introduce a new meta tag because they are a giant parasite.

MyContent
February 1, 2010 - 4:47pm

[site] is another mass-scraper in action that currently is ranking quite well in Google. They are basically a lyrics site, which, as everyone knows, are a dime a dozen on the web.

To boost their rankings and size, they have been using automated methods to MASS-scrape review sites, and it's working. They are ranking very well for many relevant review searches.

Because they are so big, they are getting away with it. Meanwhile, smaller sites and blogs are at best being outranked by this site. At worst, they are being completely booted by Google's duplicate content filter/penalty. Google is benefiting from this theft, too. They are using Google Ads on the site.

Clearly, the lesson here is that the key to ranking in Google is NOT to create unique, original content. Rather, it is to create an enormous site using content culled from everywhere else and slap some Google ads on it.

Isn't there supposed to be some sort of review of sites participating in Google's ad network? How does a site like this (or Mahalo) get a good quality score for regurgitating information that already is available?

February 1, 2010 - 8:55pm

Years ago Google even had the web's #1 warez site in their ad network :D

AJ_Kohn
February 5, 2010 - 11:56pm

Let's face it, many of the duplicate content tools Google has in place aren't very sophisticated. You could argue that's on purpose, but I think it's more to do with the fact that some at Google felt that scraper intensive sites wouldn't be valuable and the rest of the algorithmic signals would take care of these bad actors.

That hasn't worked.

The problem is twofold. First, people actually eat these aggregator pages up, whether it's a short attention span or just that they don't know better. Sites like Mahalo or Demand Media create McDonald's content - and it is fulfilling many of these queries. The new short clicks versus long clicks measurement probably shows that McDonald's content is delivering (some) value to users.

Second, the only reason they're really getting any type of algorithmic exposure is 'trust and authority', which is being derived from links. Mahalo and Demand Media (and others) have invested in link building efforts. The revenue share model - coupled with instructions on how to link build - provides a huge incentive to build links.

Could Google simply decide to put the smack down on these sites - the ones we've all identified? Sure. But then every other slippery slope practice will be pointed at. Why haven't you smacked these guys? Or them over there?

Suddenly, Google's making it personal - and I doubt anyone wants that.

No, Google will want to fix this algorithmically. That means a better duplicate content mechanism (which I don't think will happen) or a change in how they calculate trust and authority (which I think may happen).

How much they change the trust and authority signal will be an indication of the direction of Google search quality. Are they willing to accept an overwhelming amount of McDonald's content through mass-marketing exposure (e.g. embedded link networks), or will they even the playing field and let Gourmet content compete side by side with McDonald's - may the best content win?

February 6, 2010 - 12:51am

"Suddenly, Google's making it personal - and I doubt anyone wants that."

Considering how Google profiles top SEOs and hunts them while ignoring the bad actors in plain daylight, I am not sure how I could agree with that sentence.

With Google (or at least since Matt Cutts started the spam team) it has always been personal. Check out this video if you think otherwise.

When Rand outed my affiliate program for using 301 redirects (after reading the tip in our member's area) Matt promptly burned our affiliate program to the ground. Since then, Google engineers have publicly stated that 301 redirecting affiliate links is legitimate (and they do pass link juice for hundreds to thousands of corporations...just not my site), and there was even an SEOmoz blog post about how they plan on taking their affiliate program in house to get the 301 benefits.

With the spam team it has always been personal.

eHow was a domain bought for its link equity, then they poured low end content into it, and now they are incentivizing people to build more links into the trash. Mahalo was hyped as the anti-SEO, is now a glorified scraper site, and sells corporate SEO services.
