Getting Granular With User Generated Content

Apr 24th

The stock market had a flash crash today after someone hacked the AP account & made a fake announcement about bombs going off at the White House. Recently Twitter's search functionality has grown so inundated with spam that I don't even look at the brand related searches much anymore. While you can block individual users, it doesn't block them from showing up in search results, so there are various affiliate bots that spam just about any semi-branded search.

Of course, as spammy as the service is now, it was worse during the explosive growth period, when Twitter had fewer than 10 employees fighting spam:

Twitter says its "spammy" tweet rate of 1.5% in 2010 was down from 11% in 2009.

If you want to show growth by any means necessary, engagement by a spam bot is still engagement & still lifts the valuation of the company.

Many of the social sites make no effort to police spam & only combat it after users flag it. Consider Eric Schmidt's interview with Julian Assange, where Eric Schmidt stated:

  • "We [YouTube] can't review every submission, so basically the crowd marks it if it is a problem post publication."
  • "You have a different model, right. You require human editors." on Wikileaks vs YouTube

We would post editorial content more often, but we are sort of debating opening up a social platform so that we can focus on the user without having to bear any editorial costs until after the fact. Profit margins are apparently better that way.

As Google drives smaller sites out of the index & ranks junk content based on no factor other than it being on a trusted site, they create the incentive for spammers to ride on the social platforms.

All aboard. And try not to step on any toes!

When I do some product related searches (eg: brand name & shoe model) almost the whole result set for the first 5 or 10 pages is garbage.

  • Blogspot.com subdomains
  • Appspot.com subdomains
  • YouTube accounts
  • Google+ accounts
  • sites.google.com
  • Wordpress.com subdomains
  • Facebook Notes & pages
  • Tweets
  • Slideshare
  • LinkedIn
  • blog.yahoo.com
  • subdomains off of various other free hosts

It comes as no surprise that Eric Schmidt fundamentally believes that "disinformation becomes so easy to generate because of, because complexity overwhelms knowledge, that it is in the people's interest, if you will over the next decade, to build disinformation generating systems, this is true for corporations, for marketing, for governments and so on."

Of course he made no mention of Google's role in the above problem. When they are not issuing threats & penalties to smaller independent webmasters, they are just a passive omniscient observer.

With all these business models, there is a core model of building up a solid stream of usage data & then tricking users or looking the other way when things get out of hand. Consider Google's Lane Shackleton's tips on YouTube:

  • "Search is a way for a user to explicitly call out the content that they want. If a friend told me about an Audi ad, then I might go seek that out through search. It’s a strong signal of intent, and it’s a strong signal that someone found out about that content in some way."
  • "you blur the lines between advertising and content. That’s really what we’ve been advocating our advertisers to do."
  • "you’re making thoughtful content for a purpose. So if you want something to get shared a lot, you may skew towards doing something like a prank"

Harlem Shake & Idiocracy: the innovative way forward to improve humanity.

Life is a prank.

This "spam is fine, so long as it is user generated" stuff has gotten so out of hand that Google is now implementing granular page-level penalties. When those granular penalties hit major sites Google suggests that those sites may receive clear advice on what to fix, just by contacting Google:

Hubert said that if people file a reconsideration request, they should “get a clear answer” about what’s wrong. There’s a bit of a Catch-22 there. How can you file a reconsideration request showing you’ve removed the bad stuff, if the only way you can get a clear answer about the bad stuff to remove is to file a reconsideration request?

The answer is that technically, you can request reconsideration without removing anything. The form doesn’t actually require you to remove bad stuff. That’s just the general advice you’ll often hear Google say, when it comes to making such a request. That’s also good advice if you do know what’s wrong.

But if you’re confused and need more advice, you can file the form asking for specifics about what needs to be removed. Then have patience

In the past I referenced that there is no difference between a formal white list & overly-aggressive penalties coupled with loose exemptions for select parties.

The moral of the story is that if you are going to spam, you should make it look like a user of your site did it, that way you

  • are above judgement
  • receive only a limited granular penalty
  • get explicit & direct feedback on what to fix

What Types of Sites Actually Remove Links?

Apr 9th

Since the disavow tool came out, SEOs have been sending thousands of "remove my link" requests daily. Some of them come off as polite, some lie & claim that the person linking is at huge risk of having their own rankings tank, some lie with faux legal risks, some come with "extortionisty" threats that if the link is not removed the sender will report the site to Google or try to get the web host to take down the site, and some come with payment/bribery offers.

If you want results from Google's jackassery game you either pay heavily with your time, pay with cash, or risk your reputation by threatening or lying broadly to others.

At the same time, Google has suggested that anyone who would want payment to remove links is operating below board. But if you receive these inbound emails (often from anonymous Gmail accounts) you not only have to account for the time it would take to find the links & edit your HTML, but you also have to determine if the person sending the link removal request represents the actual site, or if it is someone trying to screw over one of their competitors. Then, if you confirm that the request is legitimate, you either need to further expand your page's content to make up for the loss of that resource or find a suitable replacement for the link that was removed. All this takes time. And if that time is from an employee that means money.

There have been hints that if a website is disavowed some number of times that data can be used to further go out & manually penalize more websites, or create link classifications for spam.

... oh no ...

Social engineering is the most profitable form of engineering going on in the 'Plex.

The last rub is this: if you do value your own life at nothing in a misguided effort to help third parties (who may have spammed up your site for links & then often follow it up with lying to you to achieve their own selfish goals), how does that reflect on your priorities and the (lack of) quality in your website?

If you contacted the large branded websites that Google is biasing their algorithms toward promoting, do you think those websites would actually waste their time & resources removing links to third party websites? For free?

Color me skeptical.

As a thought experiment, look through your backlinks for a few spam links that you know are hosted by Google (eg: Google Groups, YouTube, Blogspot, etc.) and try to get Google's webmaster to help remove those links for you & let us know how well that works out for you.

Some of the larger monopolies & oligopolies don't offer particularly useful customer service to their paying customers. For example, track how long it takes you to get a person on the other end of the phone with a telecom giant, a cable company, or a mega bank. Better yet, look at how long it took AdWords to openly offer phone support & the non-support they offer AdSense publishers (remember the bit about Larry Page believing that "the whole idea of customer support was ridiculous"?).

For the non-customer Google may simply recommend that the best strategy is to "start over."

When Google aggregates Webmaster Tools link data from penalized websites they can easily make 2 lists:

  • sites frequently disavowed
  • sites with links frequently removed

If both lists are equally bad, then you are best off ignoring the removal requests & spending your time & resources improving your site.

If I had to guess, I would imagine that being on the list of "these are the spam links I was able to remove" is worse than being on the list of "these are the links I am unsure about & want to disavow just in case."

What say you?

Why is Great SEO so Expensive?

Mar 21st

The below image has a somewhat small font size on it. You can see the full sized version by clicking here.

Why SEO is Expensive.


Don't Buy Link Rich Advertorials (Unless You're Google)

Feb 23rd

I understand Google's desire to have a clean editorial signal & not wanting people to manipulate the web graph.

But Google once again isn't following the best practices they dish out for others.

Both of the following are not one-off articles, but are part of a "series" of advertorials for various Google products with direct followed links to AdWords, Google Analytics, Chromebook, & Hangouts.

Check the date on this next one: February 19th, the same day Interflora was penalized by Google. This is something that is an ongoing practice for Google, while they penalize others for doing the same thing.

Is using payment to influence search results unethical unless the check has Google on it?

None of those links in the content use nofollow, in spite of many of them having Google Analytics tracking URLs on them.

And I literally spent less than 10 minutes finding the above examples & writing this article. Surely Google insiders know more about Google's internal marketing campaigns than I do. Which leads one to ask the obvious (but uncomfortable) question: why doesn't Google police themselves when they are policing others? If their algorithmic ideals are true, shouldn't they apply to Google as well?

Clearly Google takes paid links that pass pagerank seriously, as acknowledged by their repeated use of them.
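For anyone who wants to repeat the nofollow check on an advertorial of their own choosing, here is a minimal sketch using only Python's standard library. It flags links that carry campaign tracking parameters but lack rel="nofollow". The `utm_` prefix test, the helper names and the example.com sample markup are assumptions made for illustration; none of it is taken from the pages discussed above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class LinkAudit(HTMLParser):
    """Collect every <a href> along with its nofollow and tracking status."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow noopener"
        rel_tokens = (attr.get("rel") or "").lower().split()
        query = parse_qs(urlparse(href).query)
        self.links.append({
            "href": href,
            "nofollow": "nofollow" in rel_tokens,
            # Assumption: campaign tracking shows up as utm_* query parameters
            "tracked": any(key.startswith("utm_") for key in query),
        })


def followed_tracked_links(html):
    """Return hrefs that carry utm_ parameters but lack rel="nofollow"."""
    parser = LinkAudit()
    parser.feed(html)
    return [link["href"] for link in parser.links
            if link["tracked"] and not link["nofollow"]]


# Hypothetical advertorial markup, for illustration only
sample = """
<p>Grow with <a href="https://example.com/adwords?utm_source=advertorial">AdWords</a>
and <a rel="nofollow" href="https://example.com/analytics?utm_source=advertorial">Analytics</a>
or read <a href="https://example.com/about">about us</a>.</p>
"""

print(followed_tracked_links(sample))
```

A link that is both tracked and followed is not proof of payment, but it is the combination worth a second look: someone cared enough to measure the clicks and still let the link pass PageRank.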

We're Going Google...

Feb 21st

In the search ecosystem Google controls the relevancy algorithms (& the biases baked into those) as well as the display of advertisements and the presentation of content. They also control (or restrict) the flow of marketable data.

For example, a publisher might not get keyword referral data on organic search, but Google passes that data on via advertisements & passes a large amount of data on through their ad network to other ad networks. Consider this:

a DoubleClick tag on the site sent data to two other companies that collect it for various purposes -- Rubicon and Casale Media, representing a "hop." In a subsequent hop, Casale transferred the IMDB data to BlueKai, Optimax and Brandscreen, while Rubicon pushed it to TargusInfo, RocketFuel, Platform 161, Efficient Frontier and the AMP Platform. AMP then sent the data on to AppNexus and back to DoubleClick.
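To get a feel for how far the data travels, the hops in the quote above can be sketched as a small directed graph. The edges below are only the ones named in the quote; the function name and the breadth-first walk are my own illustration, not anything the ad networks publish.

```python
from collections import deque

# Sender -> receivers, exactly as listed in the quoted passage
HOPS = {
    "DoubleClick": ["Rubicon", "Casale Media"],
    "Casale Media": ["BlueKai", "Optimax", "Brandscreen"],
    "Rubicon": ["TargusInfo", "RocketFuel", "Platform 161",
                "Efficient Frontier", "AMP Platform"],
    "AMP Platform": ["AppNexus", "DoubleClick"],
}


def hop_distances(graph, start):
    """Breadth-first walk: fewest hops needed for data to reach each company."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        sender = queue.popleft()
        for receiver in graph.get(sender, []):
            if receiver not in distances:  # skip loops back to earlier holders
                distances[receiver] = distances[sender] + 1
                queue.append(receiver)
    return distances


distances = hop_distances(HOPS, "DoubleClick")
recipients = {name: hops for name, hops in distances.items() if hops > 0}

for name, hops in sorted(recipients.items(), key=lambda item: item[1]):
    print(f"{hops} hop(s): {name}")
print(len(recipients), "companies beyond DoubleClick received the data")
```

One tag on one page, and within three hops the data sits with eleven other companies before looping back to where it started.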

For about a decade being relevant & focused created efficiencies that more than offset any "size = quality" biases that the Google engineers created. However across many verticals that window is closing & it is never a good idea to wait until it is fully closed to adjust. ;)

This shift from relevancy to "size = quality" can be seen in the stock performance of mid-market companies like BankRate & Quinstreet.

Those companies were laser focused on the markets that have significant consumer intent & traffic value, but Google has eroded the affiliate base & ad networks of many of the direct marketing plays for a couple years straight now.

If Google's algorithmic biases are strong enough to literally move the market on companies worth hundreds of millions to billions of dollars, one is naive to swim against the tide. The market is becoming more bifurcated.

This is why it is so hard to find a great SEO to recommend for small businesses. If that SEO really knows what they are doing & understands the market dynamics, then they probably won't serve the small business end of the market very long, or if they do, they will do so in a way where their continued flow of payments is not tied to performance. It is hard to have a sustainable business operating in a closed ecosystem if you are swimming in the opposite direction of that ecosystem.

In terms of our membership site here, a good slice of our customer base is the expert end of the market.


It is a tiny sliver of the market, but it is a segment that is somewhat well aligned with independent affiliate types & the sort of direct marketing relevancy-minded folks that Google has spent a couple years trying to marginalize as they cater to branded advertisers. We could try to shift our site to make it more mass market, but I prefer to run a site where we both learn & teach, and fear that moving to lower the barrier to entry and push more mass market will destroy what makes the membership site unique & valuable in the first place.

In early Google research they warned about relevancy shifting toward the interest of advertisers.

Currently, the predominant business model for commercial search engines is advertising. The goals of the advertising business model do not always correspond to providing quality search to users. For example, in our prototype search engine one of the top results for cellular phone is "The Effect of Cellular Phone Use Upon Driver Attention", a study which explains in great detail the distractions and risk associated with conversing on a cell phone while driving. This search result came up first because of its high importance as judged by the PageRank algorithm, an approximation of citation importance on the web [Page, 98]. It is clear that a search engine which was taking money for showing cellular phone ads would have difficulty justifying the page that our system returned to its paying advertisers. For this type of reason and historical experience with other media [Bagdikian 83], we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.

Perform that same cellular phone search today & that original cited page is nowhere to be found. Today that same search includes Wal-Mart, T-mobile, Samsung, Amazon.com, Best Buy & other well known brands. Search for the more common phrase cell phones & you get the same brands plus local results and shopping results. Awareness is replacing precision.

I think Gabe Newell described it best:

Closed platforms increase the chunk size of competition & increase the cost of market entry, so people who have good ideas, it is a lot more expensive for their productivity to be monetized. They also don't like standardization ... it looks like rent seeking behaviors on top of friction

As Google makes search more complex & mixes in more signals, it is becoming harder to win at the game if your operation is singularly focused on SEO & it is becoming easier to win if your business already has a strong footprint in many other channels which bleeds into your search profile. The following chart is conceptual, but it aims to get the issue across.

If one company is spending significant capital & effort trying to combat the Panda algorithm & another company automatically sees a ranking boost from Panda, then the company with the boost is typically going to see greater ROI from any further investments in SEO.

Having spilled all the above digital ink, back in 2007 we decided to shift away from an ebook model to run a membership site. On and off over the years we have done a bit of consulting outside of running this site, but haven't put significant emphasis on it over the past couple years as we were pushing hard to keep up with the algorithms & keep this site growing. With all the above shifts in place we recently decided to offer SEO consulting again.

Some FAQs on that front...

  • If we work with you, who will be working on our project? The same people who write on the blog & run the community: Peter Da Vanzo, Eric Covino & Aaron Wall.
  • How many clients will you work with? Just a handful at any given time. We prefer to have a deep integration with a few clients rather than a bulk model.
  • Who are ideal clients? Those who know the value of search traffic & already have some general awareness & momentum in the marketplace. Examples of companies we have worked with in the past include: large ecommerce companies, tier 1 web portals, strong start ups & hedge funds invested in the web. Many of these clients already had an in-house SEO team & some were just actively beginning to leverage search.
  • I have a tiny company with a small budget. Could I still work with you? In some cases there might be a fit, but if you feel our consulting is beyond your budget you can of course still join our membership website. Consulting is for those who want a deeper engagement than we can provide through our current membership site model.
  • Can you name some past clients? For the most part, no. Our consulting projects typically come with nondisclosure agreements.
  • Can you fill out an RFP? Most likely not. If you are still shopping around for an SEO, we are probably not going to be a great fit. But if you have known of us for years & know you want to work with us, do get in touch.

Growing the Search Pie

Feb 19th

Growing search marketshare is hard work. At a recent investor conference Marissa Mayer stated that: "The key pieces are around the underpinnings of the alliance themselves. The point is, we collectively want to grow share, rather than trading share with each other."

Part of the reason Yahoo! & Bing struggle to gain marketshare is Google's default search placement payments to Mozilla and Apple. If the associated browsers have nearly 1/3 the market & Chrome is another 1/3 of the market then it requires Yahoo! or Bing to be vastly better than Google to break the Google habit + default search placement purchases.

Danny reported some interesting comments from Nikesh Arora:

  • half of those billions of queries it handles comes from Google partners, rather than searches at Google directly.
  • Arora also said that he expects about 50% of advertising to move online in the next three to five years.
  • he just said ad team looks at ways to make ads not look like ads. I think he meant that positively, like content you want.

A friend sent me a screenshot where he was surprised how similar the results looked between Bing & Google.

If Bing looks too different it feels out of place; if it looks too similar it doesn't feel memorable. And if Google is optimized for revenue generation then Bing is going to have a fairly similar look & feel to their results if they want to earn enough to bid on partnerships.

Another factor helping Google maintain their dominance in search marketshare is the shift of search query mix to mobile, where Google has a 95.8% marketshare.

Mobile search has a significantly higher CTR than desktop search, due in large part to there being less screen real estate. By the end of this year tablets will likely account for 20% of Google's search ad clicks & drive $5 billion in ad revenues. Add in mobile phones with tablets & mobile search will drive 1/3 of paid search clicks by the end of this year.

With mobile becoming such a huge share of search clicks Google is forcing advertisers into buying all platforms with their ad purchase via their enhanced AdWords campaigns. Google builds off that sort of dominance & Yahoo! is only making about $125 million a year in total revenue from Yahoo!'s mobile traffic.

In spite of losing share on browser defaults & mobile, Yahoo! managed to grow their search ad clicks 11% year over year. How was Yahoo! able to do that? In part by quietly dialing up on search arbitrage. They have long had a "trending now" box on their homepage, but over the past year they have dialed up on ads in their news, finance & sports sections that are linked to search queries. Some of these ad units are in the sidebar & some are inline with the articles.

Yahoo! also buys ads on some smaller ad networks & sends those through to a search result with almost no organic results.

Yahoo! has had a long history of search arbitrage, but they typically did it through a partner network which lowered click value. That was part of what lowered their click prices & made them sign the deal with Microsoft (you couldn't even opt out of Yahoo!'s partner syndication until after they signed the deal with Microsoft).

I recently saw the above ad for Bing which highlighted how they want to work with brands, but Bing still has a number of issues they are dealing with on the monetization front: tighter broad matching, smaller ad ecosystem, regional issues with ad targeting, and no serious effort to develop a contextual ad program open to the long tail of publishers. In spite of those issues, the Yahoo! / Bing ad network was finally starting to build a critical mass & Yahoo! responded by signing a deal to carry Google's contextual AdSense ads.

As Google continues to layer contextual search layers into mobile devices, launch their own physical stores, layer their social network into the search ecosystem, expand their venture investments, insert themselves at an ISP level, shape the news, control a greater share of ad budget with programmatic bidding, control measurements of success, redefine words, scrape-n-displace publishers with the knowledge graph, de-fund competitors, & hyper-target ads at users, their leverage & market dominance will only grow.

Google is great at growing the search pie.

Yahoo!, not so much. ;)

How Rich Will Listings Get?

Feb 19th

As Google has gone from ad platform for illicit content (both ways) to host of illicit content & reseller of legit content, they have cracked down on competitors & are now trying to police the ability of other sites to accept payment:

The web search giant, which is embroiled in a long-running row over the way it deals with pirated material, is considering the radical measure so that it can get rid of the root cause instead of having to change its own search results.

Executives want to stop websites more or less dedicated to offering links to pirated films, music and books from making money out of the illegal material. The plans, still in discussion, would also block funding to websites that do not respond to legal challenges, for example because they are offshore.

While Google is partnering with big media (that has long had a multi-polar approach to copyright) Google continues to gain in a game of inches.

Last month Google announced a new format for their image search results, where they pull the image inline without sending the visitor onto the publisher website. At the same time they referenced some "phantom visitor" complaint from publishers to justify keeping the visitor on Google & highlighted how there were now more links to the image source. If publishers were concerned about the "phantom visitor problem" we wouldn't see so many crappy slideshow pageviews.

Google's leaked remote rater guidelines do mention something about rating an image lower in certain situations, such as where the author might want attribution for work that they are routinely disintermediated from.

On Twitter a former Googler named Miguel Silvar wrote: "If you do SEO and decide to block Image Search just because it's bringing less traffic, you can stop calling yourself an SEO expert."

Many "experts" would claim that any exposure is good, even if you don't get credit for it. Many clients of said "experts" will end up bankrupt! Experts who suggest it is reasonable for content creators to be stripped of payment, traffic & attribution are at best conflicted.

One of the fears of microformats was that as you add incremental cost to structure your data, the search engines may leverage your extra effort to further displace you. That fear turned out to be valid, as in the background Google was offering vertical review sites the "let us scrape you, or block Googlebot" ultimatum.

Google Shopping has shifted to paid inclusion & Google has made further acquisitions in the space, yet people still recommend that ecommerce sites get ahead by marking up their pages with microformats.

As Google continues to win the game of inches of displacing the original sources, they don't even need you to mark up your content for them to extract their knowledge graph. Bill Slawski shared a video of Google's Andrew Hogue describing their mass data extraction effort: "It's never going to be 100% accurate. We're not even going to claim that it is 100% accurate. We are going to be lucky if we get 70% accuracy ... we are going to provide users with tools to correct the data."

If you as a publisher chose to auto-generate content at a 70% accuracy, pumped it up to first page rankings & then said "if people care they will fix it" Google would rightfully call you a spammer. If they do the same, it is knowledge baby.

Eric Schmidt recently indicated that Google was willing to sacrifice relevancy to collect identity information. Their over-promotion of Google+ has become more refined over time, but it hasn't gone away.

Google pays for default placement in Safari & Firefox. Former Google executives head AOL & Yahoo!. Google can thus push for new cultural norms that make Microsoft look like an oddball or outsider if they don't play the same game.

Google isn't the only company playing the scrape-n-displace game.

"The innovation in search is really going to be on the user interface level" - Marissa Mayer



It's worth keeping an eye on Yahoo! (the above types of scraped rich listings, lead generation forms in the organic search results, contextual ad partnership with Google) to see where Google will head next.

Identity vs Irrelevance

Feb 19th

“Within search results, information tied to verified online profiles will be ranked higher than content without such verification, which will result in most users naturally clicking on the top (verified) results. The true cost of remaining anonymous, then, might be irrelevance.” - Eric Schmidt

Authoritarian Regimes & Data Mining

One wonders how Mr. Schmidt can square the above statement with his warnings about authoritarian governments.

And the risks from such data mining operations are not just in "those countries over there." The ad networks that hire lobbyists to change foreign privacy laws do so such that they can better track people the globe over and deliver higher paying ads. (No problem so long as they don't catch you on a day you are down and push ads for a mind numbing psychotropic drug with suicidal or homicidal side effects.)

And defense contractors are fast following with mining these social networks. (No problem so long as your name doesn't match someone else's that is on some terrorist list or such.)

Large & Anonymous

What's crazy is when we get to the other end of the spectrum. Want to know if your hamburger has pink slime in it? Best of luck with that.

Then you get the mainstream media sites that get a free pass (size = trust) and it doesn't matter if their content is created through...

  • a syndicated partnership with eHow-styled content (Demand Media)
  • a syndicated partnership of scraped/compiled data (FindTheBest)
  • auto-generated content from a bot (Narrative Science)
  • scrape + outsourcing + plagiarism + fake bylines (Journatic)
  • top 10 ways to regurgitate top 10 lists from 10 different angles (BuzzFeed)
  • hatchet job that was written before manufacturing the "conforming" experience (example)
  • factually incorrect hate bait irrelevant article with no author name, wrapped in ads for get rich quick scams (example)

... no matter how it is created, it is fine, so long as you have political influence. Not only will it rank, but it will be given a ranking boost based on being part of a large site, even if it is carpet bombed with irrelevant ads.

Coin Operated Ideals

But then the companies that claim this transparency is vital for society pull a George Costanza & "Do The Opposite" with their own approach.

Whenever they manipulate markets to their own benefit they claim the need for secrecy to stop spammers or protect privacy. But then they collect the same data & pass it along without consent to those who pay for the data.

When Google was caught vandalizing OpenStreetMap or lying to businesses listed in Mocality, those were the acts of anonymous contractors. When Google got caught in a sting operation pushing ads for illegal steroids from Mexico they would claim that behavior didn't reflect their current policies and that we need to move on.

Then of course there are the half dozen (or more) times that Google has violated their own search quality guidelines. So often that is due yet again to "outsourcing" or a partner of some sort. And they do that in spite of the ability to arbitrarily hardcode themselves in the result set.

If we don't examine the faux ideals being pushed to shift cultural norms, we will end up with a crappier world to live in. Some Googlers (or Google fanbois) who read this will claim I am a broken record stuck in the past on this stuff. But those same people will be surprised x years down the road when something bizarre surfaces from an old deranged contact or prior life.

Anyone who has done anything meaningful has also done some things that are idiotic.

Is that sort of stuff always forever relevant or does it make sense at some point to move on?

When that person is Eric Schmidt, the people he pontificates to are blackballed for following his ideals.

After all, his ideals don't actually apply to him.

No Effort Longtail SEO Revenues, from FindTheBest

In our infographic about the sausage factory that is online journalism, we had a throwaway line about how companies were partnering with FindTheBest to auto-generate subdomains full of recycled content. Apparently, a person named Brandon who claims to work for FindTheBest didn't think our information was accurate:

Hi Aaron,
My name is Brandon. I have been with FindTheBest since 2010 (right after our launch), and I am really bummed you posted this Infographic without reaching out to our team. We don't scrape data. We have a 40 person+ product team that works very closely with manufacturers, companies, and professionals to create useful information in a free and fair playing field. We some times use whole government databases, but it takes hundreds-of-thousands of hours to produce this content. We have a product manager that owns up to all the content in their vertical and takes the creation and maintenance very seriously. If you have any questions for them about how a piece of content was created, you should go to our team page and shoot them a email. Users can edit almost any listing, and we spend a ton of time approving or rejecting those edits. We do work with large publishers (something I am really proud of), but we certainly do not publish the same exact content. We allow the publishers to customize and edit the data presentation (look, style, feel) but since the majority of the content we produce is the factual data, it probably does look a little similar. Should we change the data? Should we not share our awesome content with as many users as possible? Not sure I can trust the rest of your "facts", but great graphics!

I thought it was only fair that we aired his view on the main blog.

...but then that got me into doing a bit of research about FindTheBest...

In the past when searching for an issue related to our TV I saw a SERP that looked like this

Those mashed sites were subdomains on trusted sites like VentureBeat & TechCrunch.

Graphically the comparison pages appear appealing, but how strong is the editorial?

How does Find The Best describe their offering?

In a VentureBeat post (a FindTheBest content syndication partner) FTB's CEO Kevin O’Connor was quoted as saying: “‘Human’ is dirty — it’s not scalable.”

Hmm. Is that a counter view to the above claimed 40 person editorial research team? Let's dig in.

Looking at the top listed categories on the homepage of Find The Best I counted 497 different verticals. So at 40 people on the editorial team that would mean that each person managed a dozen different verticals (if one doesn't count all the outreach and partnership building as part of editorial & one ignores the parallel sites for death records, grave locations, find the coupons, find the company & find the listing).

Google shows that they have indexed 35,000,000 pages from FindTheBest.com, so this would mean each employee has "curated" about 875,000 pages (which is at least 200,000 pages a year over the past 4 years). Assuming they work 200 days a year that means they ensure curation of at least 1,000 "high quality" pages per day (and this is just the stuff in Google's index on the main site...not including the stuff that is yet to be indexed, stuff indexed on 3rd party websites, or stuff indexed on FindTheCompanies.com, FindTheCoupons.com, FindTheListing, FindTheBest.es, FindTheBest.or.kr, or the death records or grave location sites).

Maybe I am still wrong to consider it a bulk scrape job. After all, it is not unreasonable to expect that a single person can edit 1,000 pages of high quality content daily.

Errr....then again...how many pages can you edit in a day?
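For anyone who wants to check the back-of-the-envelope math above, here is a minimal sketch. The inputs are the figures quoted in this post (35,000,000 indexed pages, a 40 person team, 497 verticals); the 4 years and 200 working days a year are my own rough assumptions, and the function name `workload` is just something made up for this illustration.

```python
# Rough workload math using the figures quoted in the post.
INDEXED_PAGES = 35_000_000    # pages Google showed as indexed on the main site
TEAM_SIZE = 40                # product team size claimed in Brandon's email
VERTICALS = 497               # verticals counted on the homepage
YEARS = 4                     # assumption: years the site has been running
WORK_DAYS_PER_YEAR = 200      # assumption: working days per person per year


def workload(pages, team, verticals, years, days_per_year):
    """Return the per-person share of verticals and pages."""
    pages_per_person = pages / team
    pages_per_year = pages_per_person / years
    return {
        "verticals_per_person": verticals / team,
        "pages_per_person": pages_per_person,
        "pages_per_person_per_year": pages_per_year,
        "pages_per_person_per_day": pages_per_year / days_per_year,
    }


if __name__ == "__main__":
    result = workload(INDEXED_PAGES, TEAM_SIZE, VERTICALS, YEARS, WORK_DAYS_PER_YEAR)
    for name, value in result.items():
        print(f"{name}: {value:,.1f}")
```

Change any of the assumptions (more staff, more working days) and the daily number still stays far above what a person could actually review by hand.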

Where they lost me though was with the "facts" angle. Speaking of not trusting the rest of the "facts" ... how crappy is the business information for SEO Book on FindTheBest, which says that our site launched in 2011, that we have $58,000 in sales, and that we are a book wholesaler?

I realize I am afforded the opportunity to work for free to fix the errors of the scrape job, but if a page is full of automated incorrect trash then maybe it shouldn't exist in the first place.

I am not saying that all pages on these sites are trash (some may be genuinely helpful), but I know if I automated content to the extent FTB does & then mass emailed other sites for syndication partnerships on the duplicate content (often full of incorrect information) that Google would have burned it to the ground already. They likely benefit from their CEO having sold DoubleClick to Google in the past & are exempt from the guidelines & editorial discrimination that the independent webmaster must deal with.

One of the ways you can tell if a company really cares about their product is by seeing if they dogfood it themselves.

Out of curiosity, I looked up FindTheBest on their FindTheCompany site.

They double-list themselves and neither profile is filled out.

That is like having 2 sentences of text on your "about us" page surrounded by 3 AdSense blocks. :D

I think they should worry about fixing the grotesque errors before worrying about "sharing with as many people as possible" but maybe I am just old fashioned.

Certainly they took a different approach ... one that I am sure would get me burned if I tried it. An example sampling of some partner sites...

  • analytics-software.businessknowhow.com "BusinessKnowHow ended the relationship with find the best as soon as we realized how spammy they were." - Janet Attard
  • accountants.entrepreneur.com
  • acronyms.sciencedaily.com
  • alternative-fuel.cleantechnica.com
  • antivirus.betanews.com
  • apps.edudemic.com
  • atvs.agriculture.com
  • autopedia.com/TireSchool/
  • autos.nydailynews.com
  • backup-software.venturebeat.com
  • bags.golfdigest.com
  • beer.womenshealthmag.com
  • best-run-states.247wallst.com
  • bestcolleges.collegenews.com
  • bikes.cxmagazine.com
  • bikes.triathlete.com
  • birds.findthelisting.com
  • birth-control.shape.com
  • brands.goodguide.com
  • breast-pumps.parenting.com
  • broker-dealers.minyanville.com
  • businessschools.college-scholarships.com
  • camcorders.techcrunch.com
  • cars.pricequotes.com
  • cats.petharbor.com
  • catskiing.tetongravity.com
  • chemical-elements.sciencedaily.com
  • comets-astroids.sciencedaily.com
  • companies.findthecompany.com
  • companies.goodguide.com
  • compare-video-editing-software.burnworld.com
  • compare.consumerbell.com
  • compare.guns.com
  • compare.roadcyclinguk.com
  • comparemotorbikes.motorbike-search-engine.co.uk
  • congressional-lookup.nationaljournal.com
  • courses.golfdigest.com
  • crm.venturebeat.com
  • cyclocross-bikes.cyclingdirt.org
  • dealers.gundigest.com
  • death-record.com
  • debt.humanevents.com
  • design-software.underworldmagazines.com
  • destination-finder.fishtrack.com
  • diet-programs.shape.com
  • digital-cameras.techcrunch.com
  • dinosaurs.sciencedaily.com
  • dirt-bikes.cycleworld.com
  • dogbreeds.petmd.com
  • dogs.petharbor.com
  • donors.csmonitor.com
  • e-readers.techcrunch.com
  • earmarks.humanevents.com
  • earthquakes.sciencedaily.com
  • ehr-software.technewsworld.com
  • fallacies.sciencedaily.com
  • fec-candidates.theblaze.com
  • fec-committees.theblaze.com
  • federal-debt.nationaljournal.com
  • fha-condos.realtor.org
  • fha.nuwireinvestor.com
  • financial-advisors.minyanville.com
  • findthebest.com
  • findthebest.motorcycleshows.com
  • findthecoupons.com
  • findthedata.com
  • firms.privateequity.com
  • franchises.fastfood.com
  • ftb.cebotics.com
  • game-consoles.tecca.com
  • game-consoles.venturebeat.com
  • gin.drinkhacker.com
  • golf-courses.bunkershot.com
  • gps-navigation.techcrunch.com
  • gps-navigation.venturebeat.com
  • green-cars.cleantechnica.com
  • guns.dailycaller.com
  • ham-radio.radiotower.com
  • hdtv.techcrunch.com
  • hdtv.venturebeat.com
  • headphones.techcrunch.com
  • headphones.venturebeat.com
  • high-chairs.parenting.com
  • highest-mountains.sciencedaily.com
  • hiv-stats.realclearworld.com
  • horsebreeds.petmd.com
  • hospital-ratings.lifescript.com
  • hr-jobs.findthelistings.com
  • inventors.sciencedaily.com
  • investment-advisors.minyanville.com
  • investment-banks.minyanville.com
  • iv-housing.dailynexus.com
  • laptops.mobiletechreview.com
  • laptops.techcrunch.com
  • laptops.venturebeat.com
  • lawschool.lawschoolexpert.com
  • locategrave.org
  • mammography-screening-centers.lifescript.com
  • mba-programs.dealbreaker.com
  • medigap-policies.findthedata.org
  • military-branches.nationaljournal.com
  • motorcycles.cycleworld.com
  • mountain-bikes.outsideonline.com
  • nannies.com
  • nobel-prize-winners.sciencedaily.com
  • nursing-homes.caregiverlist.com
  • nursing-homes.silvercensus.com
  • onlinecolleges.collegenews.com
  • phones.androidauthority.com
  • pickups.agriculture.com
  • planets.realclearscience.com
  • planets.sciencedaily.com
  • plants.backyardgardener.com
  • presidential-candidates.theblaze.com
  • presidents.nationaljournal.com
  • privateschools.parentinginformed.com
  • processors.betanews.com
  • project-management-software.venturebeat.com
  • projectors.techcrunch.com
  • pushcarts.golfdigest.com
  • recovery-and-reinvestment-act.theblaze.com
  • religions.theblaze.com
  • reviews.creditcardadvice.com
  • saving-accounts.bankingadvice.com
  • sb-marinas.noozhawk.com
  • sb-nonprofits.noozhawk.com
  • scheduling-software.venturebeat.com
  • scholarships.savingforcollege.com
  • schools.nycprivateschoolsblog.com
  • scooters.cycleworld.com
  • smartphones.techcrunch.com
  • smartphones.venturebeat.com
  • solarpanels.motherearthnews.com
  • sports-drinks.flotrack.org
  • stables.thehorse.com
  • state-economic-facts.nationaljournal.com
  • steppers.shape.com
  • strollers.parenting.com
  • supplements.womenshealthmag.com
  • tablets.androidauthority.com
  • tablets.techcrunch.com
  • tablets.venturebeat.com
  • tabletsandstuff.com/tablet-comparison-chart
  • tallest-buildings.sciencedaily.com
  • technology.searchenginewatch.com
  • telescopes.universetoday.com
  • tequila.proof66.com
  • texas-golf-courses.texasoutside.com
  • tires.agriculture.com
  • tractors.agriculture.com
  • tsunamies.sciencedaily.com
  • us-hurricanes.sciencedaily.com
  • video-cameras.venturebeat.com
  • volcanic-eruptions.com
  • waterheaters.motherearthnews.com
  • wetsuits.swellinfo.com
  • whiskey.cocktailenthusiast.com
  • whiskey.drinkoftheweek.com
  • white-house-visitors.theblaze.com
  • wineries.womenshealthmag.com



we have seen search results where a search engine didn't robots.txt something out, or somebody takes a cookie cutter affiliate feed, they just warm it up and slap it out, there is no value add, there is no original content there and they say search results or some comparison shopping sites don't put a lot of work into making it a useful site. They don't add value. - Matt Cutts

That syndication partnership network also explains part of how FTB is able to get so many pages indexed by Google, as each of those syndication sources is linking back to FTB on (what I believe to be) every single page of the subdomains, and many of these subdomains are linked to from sitewide sidebar or footer links on the PR7 & PR8 tech blogs.

And so the PageRank shall flow ;)

Hundreds of thousands of hours (eg 200,000+) for 40 people is 5,000 hours per person. Considering that there are an average of 2,000 hours per work year, this would imply each employee spent 2.5 full years of work on this single aspect of the job. And that is if one ignores the (hundreds of?) millions of content pages on other sites.
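The same sort of sanity check works for the hours claim. This is only a sketch: 200,000 is the low end of "hundreds of thousands", 2,000 hours is the rough work year used above, and `years_of_work` is a made-up helper name.

```python
def years_of_work(total_hours, team_size, hours_per_work_year=2_000):
    """Return (hours per person, full work years per person)."""
    hours_per_person = total_hours / team_size
    return hours_per_person, hours_per_person / hours_per_work_year


if __name__ == "__main__":
    hours, years = years_of_work(total_hours=200_000, team_size=40)
    print(f"{hours:,.0f} hours per person, or {years:.1f} full work years each")
```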

How does TechCrunch describe the FTB partnership?

Here’s one reason to be excited: In its own small way, it combats the recent flood of crappy infographics. Most TechCrunch writers hate the infographics that show up in our inboxes— not because infographics have to be terrible, but because they’re often created by firms that are biased, have little expertise in the subject of the infographic, or both, so they pull random data from random sources to make their point.

Get that folks? TechCrunch hosting automated subdomains of syndicated content means fewer bad infographics. And more cat lives saved. Or something like that.

How does FTB describe this opportunity for publishers?

The gadget comparisons we built for TechCrunch are sticky and interactive resources comprised of thousands of SEO optimized pages. They help over 1 million visitors per month make informed decisions by providing accurate, clear and useful data.

SEO optimized pages? Hmm.

Your comparisons will include thousands of long-tail keywords and question/answer pages to ensure traffic is driven by a number of different search queries. Our proprietary Data Content Platform uses a mesh linking structure that maximizes the amount of pages indexed by search engines. Each month—mainly through organic search—our comparisons add millions of unique visitors to our partner’s websites.

Thousands of long-tail keyword & QnA pages? Mesh linking structure? Hmm.

If we expand the "view more" section at the footer of the page, what do we find?

Holy Batman.

Sorry that font is so small, the text needed to be reduced multiple sizes in order to fit on my extra large monitor, and then reduced again to fit the width of our blog.

Each listing in a comparison has a number of associated questions created around the data we collect.

For example, we collect data on the battery life of the Apple iPad.

An algorithm creates the question “How long does the Apple iPad tablet battery last?” and answers it

So now we have bots asking themselves questions that they answer themselves & then stuffing that in the index as content?

Yeah, sounds like human-driven editorial.

After all, it's not like there are placeholder tokens on the auto-generated stuff

{parent_field}

Ooops.

Looks like I was wrong on that.
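To make it concrete how this sort of thing happens, here is a minimal sketch of template-driven question generation. It is not FTB's actual system (I have no access to their code). The templates, field names and the battery value are all hypothetical, made up purely to show how a token like {parent_field} ends up on a live page when the data behind it is missing.

```python
import re


class LeaveMissing(dict):
    """Dict that leaves unknown template tokens in place instead of raising."""

    def __missing__(self, key):
        return "{" + key + "}"


# Hypothetical templates, loosely modeled on the iPad example quoted above.
QUESTION_TEMPLATE = "How long does the {brand} {product} {category} battery last?"
ANSWER_TEMPLATE = "The {brand} {product} battery lasts up to {battery_life_hours} hours."

UNFILLED_TOKEN = re.compile(r"\{[a-z_]+\}")


def render(template, record):
    """Fill a template from a data record; missing fields stay as raw tokens."""
    return template.format_map(LeaveMissing(record))


def generate_qa(record):
    """Build one question/answer pair from a single data record."""
    return render(QUESTION_TEMPLATE, record), render(ANSWER_TEMPLATE, record)


def has_unfilled_tokens(text):
    """True if a rendered page still contains a raw {token}."""
    return bool(UNFILLED_TOKEN.search(text))


if __name__ == "__main__":
    # battery_life_hours is an illustrative placeholder, not a measured figure.
    record = {"brand": "Apple", "product": "iPad",
              "category": "tablet", "battery_life_hours": 10}
    question, answer = generate_qa(record)
    print(question)
    print(answer)

    # The record has no "parent_field", so the raw token leaks into the output.
    leaky = render("More {parent_field} comparisons", record)
    print(leaky, has_unfilled_tokens(leaky))
```

A check like `has_unfilled_tokens` before publishing would catch the leak, which is the sort of step a human editor would not need an algorithm for.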

And automated "popular searches" pages? Nice!

As outrageous as the above is, they include undisclosed affiliate links in the content, and provide badge-based "awards" for things like the best casual dating sites, to help build links into their site.

That in turn led to them getting a bunch of porn backlinks.

If you submit an article to an article directory and someone else picks it up & posts it to a sketchy site you are a link spammer responsible for the actions of a third party.

But if you rate the best casual dating sites and get spammy porn links you are wonderful.

Content farming never really goes away. It only becomes more corporate.

Introduction Thread #6

Feb 1st

Welcome to our sixth welcome thread (prior ones here, here, here, here & here).

If you are new to the site, please say hi and introduce yourself. :)
