From the above data (and the aggressive promotion of YouTube content after the rollout of universal search) it is fair to state that house content is favored by the Google algorithm.
Another Knol Test
Maybe we are being a bit biased and/or are rushing to judgement? Maybe a more scientific effort would compare how Knol content ranks against other content when it is essentially duplicate content? I did not want to mention that I was testing that when I created my SEO Basics Knol, but the content was essentially a duplicate of my Work.com Guide to Learning SEO (that was also syndicated to Business.com). Even Google shows this directly on the Knol page:
Google Knows It's Duplicate Content
Is Google the Most Authoritative Publisher?
Given that Google knows that Business.com is a high-authority directory that has been around for many years, and that the Business.com page with my content on it is a PageRank 5, which does Google prefer to rank? Searching for a string of text from the page, I found that the Knol page ranks in the search results.
If I override some of Google's duplicate content filters (by adding &filter=0 to the search string) then I see that 2 copies of the Knol page outrank the Business.com page that was filtered out earlier.
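For anyone who wants to repeat the comparison, here is a minimal sketch of building both versions of the query. The only detail taken from the test above is the `filter=0` parameter; the base URL, the `q` parameter, the exact-phrase quoting, and the function name `build_search_url` are my own assumptions for illustration, and there is no guarantee Google keeps honoring `filter=0` the same way.

```python
from urllib.parse import urlencode

# Assumed base URL for a Google web search (not taken from the post).
SEARCH_BASE = "https://www.google.com/search"


def build_search_url(phrase, unfiltered=False):
    """Build a search URL for an exact-phrase query.

    When unfiltered is True, append filter=0, the parameter the post
    describes as overriding some of the duplicate content filters.
    """
    # Wrap the phrase in double quotes so it is searched as an exact string.
    params = {"q": f'"{phrase}"'}
    if unfiltered:
        params["filter"] = "0"
    return SEARCH_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    phrase = "a sentence copied from the page being tested"
    print(build_search_url(phrase))
    print(build_search_url(phrase, unfiltered=True))
```

Comparing the two result pages by hand shows which copies appear by default and which only show up once the filter is turned off.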
Some may call this the Query Deserves Freshness algorithm, but one might equally decide to call it the copyrighted work deserves to be stolen algorithm. Google knows the content is duplicate (as proven by the notification they put on their page), and yet they prefer to rank their own house content over the originally published source.
Hijacking Your Rankings via Knol - Google Knoljacking
Where this becomes a big issue is if a person...
posts your content to Knol
and buys/rents/begs/steals/spams/borrows a couple decent inbound links
they can get you filtered out of the search results - even if your site is an authority site. Bad news for anyone publishing copyrighted work online.
Google Knol Undermines the Creative Commons Spirit
Some new publishers decide to license their work via Creative Commons (hoping to be paid back based on the links economy), but Google wants no part in that! All outbound links on Knol are nofollow, so even if a person wants to give you credit for your work, Google makes it impossible to do so.
Google Voids YOUR Copyright
Why do I get enraged by this sort of activity? I remember when one of my sites was voted against, and Google paid someone to steal it and wrap it in AdSense. The person who stole my content outranked me for my own content because a Google engineer thought that was reasonable and fair.
www.seobook.com very famous book from Aaron Wall its really good but paying $79 its really sucks so yesterday, I think why not to share this book to my friends etc openly in text by decompling Acrobat files
Can a casual mention get it removed? Nope. Can flagging it as spam and highlighting that it is stolen copyrighted content get it removed? Nope. I need to file a DMCA request to get it removed. (Or maybe they will remove it out of embarrassment after I hit publish on this post...we shall see!)
When we consume media, one of the biases we often overlook is our own. When NPR created their Budget Hero, commenters quickly stated things like "it's too liberal" and "they used right wing think tank as a *credible* source." Such statements reveal as much or more about the reader as they do about the media.
When you know a field better than most people producing media in your space, it is easy to denounce everyone who knows less than you as being full of crap. Dr. E. Garcia, a brilliant information retrieval scientist, makes a habit of roasting me because my experience in the space is more practical and less academic.
While he feels my work is not up to his standards, the work he denounces helps people gain top rankings in Google and is getting free inbound links. Even better, I syndicated some free Creative Commons licensed content on latent semantic indexing called Patterns in Unstructured Data. Dr. Garcia thinks I know nothing about the topic, but when the original source went offline I started gaining citations as the source for that work too! Am I a leading expert on academic information retrieval? No. I read some of Gerard Salton's work, but my experience is better aligned with finding the criteria necessary to rank in Google.
Web Designer Wall recently published an SEO guide for designers. In it they stated "Most people aim for a keyword density of 2%." I am not sure where they got that stat from, but generally the document was fairly well done and I am glad they cited me as a resource. I could be envious of the exposure their article got and try to rip it to shreds, but where is the benefit? Dr. E. Garcia flaming me generally does nothing but flow PageRank my way. So be it...you know you are doing something right when people hate you. ;)
But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. ...
There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
Mahalo offers virtually nothing original or of value, but it is worth more than most websites because Jason was good at making people angry. There is greater value in evoking emotions than in being the person whose chain is jerked by people writing with the express intent of making you angry.
Most popular free online content contains factual errors, but it is still popular due to an affinity readers have for the author, and/or the ease of understanding what they are writing.
The more you know, the easier it is for you to denounce someone who knows less than you in your field, though doing so will rarely build brand loyalty, and often attracts the wrong kinds of customers. Call this phenomenon the Threadwatch effect...good for attention, but bad for monetization. This is especially true since people new to the market are willing to spend money to build their businesses, but more established market players are more ad blind and more cynical about most commercial offers.
If you are selling stuff online, you are not your own target audience. Every field has far more novices than experts, and experts rarely buy because they feel they already know everything and have been burned so many times in the past.
Most online content is recycled. Local substitution is a fact of life, and probably has been for thousands of years, only now it is faster and cheaper. Unless you add pretty pictures, write for novices, and aggressively market your best content at launch, someone is going to recycle it (with errors) and get credit for your work. Competing publishers can polish up posts you wrote *years* ago and be called visionaries for doing so! If you are not making your work accessible to novices then you lose.
The more mindshare you have in your space the easier it is to get weak references from people outside your space who occasionally graze upon your topic. When people who know little about your topic look at your field they care more about format than accuracy because they typically do not realize when they are reading factual errors.
From a business perspective, one of my bigger errors with this site is that I tend to write more for the cynical person who loves SEO than for people newer to the field who are more likely to buy. That is not to say that we do not have people sign up every day, but that we are only targeting a fraction of the customers that we could.
The hard part about changing is that I typically write about what interests me the most, using my own interests as a filter. Dumbing things down would be walking / swimming in uncharted territories, and I don't think I would enjoy it all that much.
Tim O'Reilly thinks the web is much bigger than search, and actually likes the Yahoo! deal. I think it is easy to overlook how Google is quietly winning market share in many non-search markets - and how they can easily build such positions using their brand recognition & distributed ad system. A throwaway quote from Tim's post is the title of this post:
At O'Reilly, we always say "Create more value than you capture." All successful companies do this. Once they start capturing more value than they create, their market position erodes, and someone displaces them. It may take a while but it happens eventually.
Three of the easiest ways to ensure short-term growth are to
undermonetize to ensure you have a better user experience worth talking about
create some content that is easy to monetize aggressively, and leave most content clean and pure while only monetizing the most profitable content
use ad units that do not look or feel like ads, and/or ads that add value to the experience
Amazon just created MP3 clip widgets that pay out 10% on MP3 downloads. You can create a list of your favorites as content, while displaying ads. Those who run large communities may be able to make a decent income selling culture.
The downfall of most automated content solutions is the perception that because it is automated it is spammy. But that perception may have changed recently, when the NYT published an article about Philip M. Parker. Mr. Parker created a sophisticated set of algorithms that has allowed him to automatically generate over 200,000 books.
He points out that once he has trained the computer to take data about past sales and make complex calculations to project future sales, each new book costs him about 12 cents in electricity. Since these books are print-on-demand or delivered electronically, he is ahead after the first sale, he said.
This video explains a bit more of the process
And when it comes down to content quality, a person who reviewed one of the books on Amazon.com stated:
“The book is more of a template for ‘generic health researching’ than anything specific to rosacea. The information is of such a generic level that a sourcebook on the next medical topic is just a search and replace away.” ... Mr. Parker was willing to concede much of what Mr. Pascoe argued. “If you are good at the Internet, this book is useless,” he said, adding that Mr. Pascoe simply should not have bought it.
So this is a case of self-proclaimed substandard production, and because he is first to market it is fine. But the profit margins are probably bigger than Google's. The commercial web is just over a decade old and this sort of technology already exists. Where will automated content generation be in 5 years? In 10 years?
"We're heading down a path where it no longer suits our business needs to work with ad networks," said Eric Johnson, executive vp, multimedia sales, ESPN Customer Marketing and Sales. Sources say that ESPN would like to rally support from other publishers behind this move and ultimately tamp down ad networks' growth. Turner's digital ad sales wing is rumored to be considering a similar move, though officials said no decisions are imminent.
The logical options from there are
set a floor price on house content and show fewer ads to offer a better user experience
look at currently hot stories, key markets in the weeks and months ahead, and market positions where you are close to leading but do not yet dominate and advertise your own products and services
add interactive features to your own site which increase brand loyalty and reduce content creation costs...which end up making the ad networks a more viable offering for back-fill content
If the ad networks are too cheap, buy out inventory on competing sites to further distance yourself from them as the market leader.
All of those strategies allow you to buy market share in your vertical on the cheap. The more of your market you own, the better the rates you will be able to sell ads for. If ESPN were 60% of the sports market, Nike would be required to buy ads with them, largely based on ESPN's terms. Part of being remarkable is about creating featured content, but an equally important piece is making sure you are branded as the leading source. There is no better place to market your content and ideas than your own site.
Everyone who is popular gains detractors along the way. And detractors tend to flock together and vote for other people who share their opinions. That trend virtually guarantees any valuable brand will have dirt ranking somewhere in the search results. The more valuable the brand gets, the more people will be gunning to unearth the dirt.
With so much competition for attention, many publishers believe they need to offer bold predictions quickly in order to be remarkable. And when those predictions go wrong, people create documentaries about how wrong you are. Jim Cramer recently said that Bear Stearns was fine and talked about how unsophisticated the naysayers were (and how they never did their homework).
Days after Jim said Bear Stearns was fine, they were bought out for pennies on the dollar. Not only does Comedy Central offer their take, but other mini-documentaries and flames have appeared.
If you are a publisher and your business model requires you to find new customers every day, then you need to keep competing for attention. In many markets that will put you in a Jim Cramer-like position where you end up making some bad calls that cost you a lot of money in the long run.
Final Notes on Spam
When trying to decide if a page is Spam, it is helpful to ask yourself this question: if I remove the scraped (copied) content, the ads, and the links to other pages, is there anything of value left? If the answer is no, the page is probably Spam.
Let's take a look at a typical Mahalo page
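That question can be written down as a rough check. The sketch below is only my reading of the guideline, not anything Google has published as code: it assumes a page has already been split into labeled blocks, and the `Block` class, the label names, and the function names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical labels for the three things the guideline says to strip out:
# scraped (copied) content, ads, and links to other pages.
REMOVABLE_KINDS = {"scraped", "ad", "link"}


@dataclass
class Block:
    kind: str  # e.g. "scraped", "ad", "link", or "original"
    text: str


def remaining_content(blocks):
    """Return the blocks left after removing scraped content, ads, and links."""
    return [
        block
        for block in blocks
        if block.kind not in REMOVABLE_KINDS and block.text.strip()
    ]


def is_probably_spam(blocks):
    """Apply the test: if nothing is left after the removal, probably spam."""
    return not remaining_content(blocks)


if __name__ == "__main__":
    page = [
        Block("scraped", "Snippet copied from another site"),
        Block("ad", "Sponsored listing"),
        Block("link", "Link to another page"),
    ]
    print(is_probably_spam(page))
```

Deciding what counts as "of value" is the human part of the test; the code only captures the removal step and the empty-or-not check.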
That page has a #1 ranking in Google with 0 unique content and 0 value to the searcher (according to Google's above guidelines).
How can Jason Calacanis create a site that poor while slagging off everyone else as a spammer? *None* of my sites fit Google's internal webspam guidelines anywhere near as closely as Jason's site does here. Will Google engineers make the right call on this spam site? Only time will tell. And the results will be quite telling, especially when inline affiliate ads further pollute this page. The Jason Calacanis spam legacy continues.