Are Content Mills the Future of Online Publishing? What Comes Next?

Aaron discussed content mills in his interview with Tedster yesterday.

What is a content mill?

A content mill is a site that publishes cheap content. The content is either user-contributed, paid, or a mix of the two. The term content mill is obviously pejorative, the implication being that the content is only published to pump content into search engines, and is typically of low value in terms of quality.

The problem is that some sites that publish cheap content may well provide value, but it depends on who is reading it. For example, a forum might be considered a content mill, as it contains cheap, user-generated content of little value to an uninterested visitor. Or it might be a valuable, regularly updated resource provided by a community of enthusiasts!

Depends who you ask.

As Aaron says, content mills are all the rage in 2010. Let's take a closer look.

Why Are SEOs Interested In Content Mills?

This idea is nothing new. It's actually a white-hat SEO strategy, and has been used for years.

  • Research keywords
  • Write content about those keywords
  • Publish content and attempt to rank that content in search engine results
  • Repeat

If you can publish a page at a lower cost than your advertising return, then you simply repeat the process over and over, and you're golden. Think AdSense, affiliate programs, and similar ways to monetize pages. Take a look at Demand Media.
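To make the arithmetic concrete, here is a rough sketch in Python of the cost-versus-return test. Every figure in it (page cost, monthly visits, revenue per thousand visits, page lifetime) is a made-up assumption for illustration, not data from Demand Media or anyone else, and the function names are hypothetical.

```python
def page_profit(cost, monthly_visits, revenue_per_1000_visits, months):
    """Expected profit from one page over its assumed lifetime."""
    revenue = monthly_visits * months * revenue_per_1000_visits / 1000
    return revenue - cost


def months_to_break_even(cost, monthly_visits, revenue_per_1000_visits):
    """Months until ad revenue covers the production cost, or None if it never does."""
    monthly_revenue = monthly_visits * revenue_per_1000_visits / 1000
    if monthly_revenue <= 0:
        return None
    return cost / monthly_revenue


# Hypothetical page: $15 to produce, 200 visits a month, $10 earned per 1,000 visits.
profit = page_profit(cost=15.0, monthly_visits=200,
                     revenue_per_1000_visits=10.0, months=24)
payback = months_to_break_even(cost=15.0, monthly_visits=200,
                               revenue_per_1000_visits=10.0)
print(profit)   # 33.0
print(payback)  # 7.5
```

If the profit comes out positive, the page is worth producing under these assumptions, and the same test gets repeated for the next keyword. Pushing the cost figure down is exactly where the pressure toward cheaper content comes from.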

The Problem With Content Mills

One of the problems with content mills is that in an attempt to drive the production cost of content below the predicted return, some site owners are producing garbage content, usually by facilitating free contributions from users.

At the low end, Q&A sites proliferate wherein people ask questions and a community of people with opinions, informed or otherwise, provide their two cents' worth. Unfortunately, many of the answers are worth somewhat less than two cents, resulting in pages of little or no value to an end reader. I'm sure you've seen such pages, as they often rank well in search engines if they are published on a domain with sufficient authority.

Some sites, like Mahalo, not only automate their page creation, but then use that automated page to automatically generate related question pages as well. The rabbit hole has no bottom!

At the other end of the spectrum, we have sites that publish higher-cost, well-researched content sourced from paid writers. A traditional publishing model, in other words. Generally speaking, such pages are of higher value to the end user, but the problem is that the search engines don't appear able to tell the difference between these pages and the junk opinion pages. If the content mill has sufficient authority, then the junk gets promoted.

And there are many examples in between, of course.

As Tedster mentioned, "the problem here is that every provider of freelance content is NOT providing junk - though some are. As far as I know, there is no current semantic processing that can sort out the two. It's tough to see how this could be quickly and effectively reined in, at least not by algorithm. I assume that this kind of empty filler content is not very useful for visitors - it certainly isn't for me. So I also assume it must be on Google's radar."

The Future Of Content Mills

I think Tedster is right - such sites will surely appear on Google's radar, because junk, low value content doesn't help their end users.

It must be a difficult problem to solve, else Google would have done so by now, but I think it's reasonable to assume Google will try to relegate the lowest of the low-value content sites at some point. If you are following a content mill strategy, or considering starting one, it's reasonable to prepare for such an eventuality.

The future, I suspect, is not to be a content mill, in the pejorative sense of the word. Aim for quality.

Arbitrary definitions of quality are difficult enough, as we've discussed above. Objective measurement is impossible, because what is relevant to one person may be irrelevant to the next. The field of IQ (information quality) may provide us with some clues regarding Google's approach. IQ is a form of research in information systems management that deals specifically with information quality.

Here are some of the metrics they use:

  • Authority - Authority refers to the expertise or recognized official status of a source. Consider the reputation of the author and publisher. When working with legal or government information, consider whether the source is the official provider of the information.
  • Scope of coverage - Scope of coverage refers to the extent to which a source explores a topic. Consider time periods, geography or jurisdiction and coverage of related or narrower topics.
  • Composition and Organization - Composition and Organization has to do with the ability of the information source to present its particular message in a coherent, logically sequential manner.
  • Objectivity - Objectivity is the bias or opinion expressed when a writer interprets or analyzes facts. Consider the use of persuasive language, the source’s presentation of other viewpoints, its reason for providing the information and advertising.
  • Validity - Validity of some information has to do with the degree of obvious truthfulness which the information carries.
  • Uniqueness - As much as ‘uniqueness’ of a given piece of information is intuitive in meaning, it also significantly implies not only the originating point of the information but also the manner in which it is presented and thus the perception which it conjures. The essence of any piece of information we process consists to a large extent of those two elements.
  • Timeliness - Timeliness refers to information that is current at the time of publication. Consider publication, creation and revision dates.
  • Reproducibility

Any of this sound familiar? It should, as the search landscape is rife with this terminology. This is not to say Google look at all these aspects, but they have used similar concepts, starting with PageRank.

As conventional SEO wisdom goes, Google may have tried to solve the relevancy problem partly by focusing on authority, on the premise that a trusted authority must publish trusted content, so the pages of a domain with a high degree of authority receive a boost over those with lower authority levels. But this situation may not last, as some trusted sources, in terms of having authority, do, at times, publish auto-gen garbage content. Google may well start looking at composition metrics, if they aren't doing so already.

This is speculation, of course.

I think a good rule of thumb, for the time being, should be "Will this page pass human inspection?" If it looks like junk to a human reviewer in terms of organization, and reads like junk in terms of composition, it probably is junk, and Google will likely feed such information back into their algorithms. Check out Google's Quality Rater Document from 2007, which should give you a feel for Google's editorial policy.

Published: April 7, 2010 by A Reader in publishing & media

Comments

hugoguzman
April 7, 2010 - 8:38pm

I would argue that it's a matter of when, not if. In fact, I think we're already there.

There will still be room for high quality and hyper niche/local publishing, but mills will make up the bulk (as they do in other verticals like farming, shopping, etc.).

Martypants
April 7, 2010 - 9:06pm

I think the bullets you list as "quality" indicators are very well done, and very likely to be in the mix.
It may be speculation, but it is pretty darned insightful.

PeterD
April 8, 2010 - 1:30am

They're from Wikipedia. I'll add the link.

http://en.wikipedia.org/wiki/Information_quality

halfacat
April 8, 2010 - 3:11am

Great article, definitely getting my brain in a mess here. It will be interesting to see how the Engine handles Content Spam that is good enough for the majority of the population. How do they prevent it from filtering someone with bad spelling or grammar? The quality of these Mills in terms of replicating intelligence is only going to get better, and considering the reading capability of your average American, that should result in some interesting problems for Google and the like.

Do we also throw Wolfram in here?
Should we be happy that they make more sense than most any conversation on Twitter?

April 8, 2010 - 9:03am

Well even search engines are getting in on the thin content aggregation & publishing model (and trying to rank such pages in other search engines)... Hitwise shows the most recent monthly growth of Ask.com's search traffic at 21% ... in a month!

http://community.seobook.com/attachments/seo-blog/361d1270717160-ask-com...

CureDream
April 8, 2010 - 4:59pm

is that Google's not good at question answering. "Content mills" are trying to work their way into the gaps.

I see "web blight" from the viewpoint of users. If Mahalo or somebody else can autogenerate pages that satisfy user needs well, I think that's fine. The trouble I've got with Mad Lib pages like

"Where can I find a gay jewish brazillian proctologist who plays the sitar, accepts medicaid, and likes dogs in Lincoln, NB?"

is that usually the page doesn't answer the questions.

Myself, I'm working on a "content mill" that profitably solves a problem where conventional I.R. systems fail. Behind it all is a commitment to quality which is better than the status quo. Now yes, I'm using S.E.O. tactics to convince Google to serve my results, but that's just because there's no better channel to serve user needs out there.

April 8, 2010 - 6:42pm

I loved your content example there CureDream. It is equally ridiculous AND representative :D

Internet Market...
April 10, 2010 - 10:49pm

Having appropriate and timely content is so important to achieving a high page rank and gaining website traffic. It is interesting that although the content mills can be helpful for SEO, Google can also disregard these searches for being spam. Thanks for the insightful information, I'm excited to read more!

rp_joe
April 12, 2010 - 10:48pm

The problem with creating content is that you never know if it will be read. As Patrick McKenzie said recently, you can make changes based on what you think is good but AB testing may show otherwise.

If you place cheap content on a site and watch the stats, you can then rewrite the pages that are getting viewers. It's a broad shotgun approach. I believe Google even likes this because I have seen on my sites that they move the sites up the SERP because it's fresh content.

April 13, 2010 - 4:21pm

I am not against cheaper content existing. Quite to the contrary in fact :D

But the point you made about incremental improvements...that isn't really part of the Demand Media model :D
