New Directory, URL, & Keyword Phrase Based Google Filters & Penalties

WebmasterWorld has been running a series of threads about various penalties and filters aligned with specific URLs, keyword phrases, and in some cases maybe even entire directories.

Some Threads:

There is a lot of noise in those threads, but you can put some pieces together from them. One of the best comments is from Joe Sinkwitz:

1. Phrase-based penalties & URL-based penalties; I'm seeing both.
2. On phrase-based penalties, I can look at the allinanchor: for that KW phrase, find several *.blogspot.com sites, run a copyscape on the site with the phrase-based penalty, and will see these same *.blogspot.com sites listed...scraping my and some of my competitors' content.
3. On URL-based penalties allinanchor: is useless because it seems to practically dump the entire site down to the dregs of the SERPs. Copyscape will still show a large amount of *.blogspot.com scraping though.

Joe has a similar post on his blog, and I covered a similar situation on September 1st of last year in Rotating Page Titles for Anchor Text Variation.

You see a lot more of the auto-gen spam in competitive verticals, and having a few sites that compete for those types of queries helps you see the new penalties, filters, and re-ranked results as they are rolled in.

Google Patents:

Google filed a patent application for Agent Rank, which is aimed at allowing them to associate portions of page content, site content, and cross-site content with individuals of varying degrees of trust. I doubt they have used this much yet, but the fact that they are even considering such a thing should indicate that many other types of penalties, filters, and re-ranking algorithms are already at play.

Some Google patents related to phrases, as pointed out by thegypsy here:

Bill Slawski has a great overview post touching on these patent applications.

Phrase Based Penalties:

Many types of automated and other low quality content creation cause the low quality pages to barely be semantically related to the local language, while other types of spam generation cause low quality pages to be too heavily aligned to the local language. Real content tends to fall within a range of semantic coverage.

Cheap or automated content typically tends to look unnatural, especially when you move beyond comparing words to looking at related phrases.

If a document is too far off in either direction (not enough OR too many related phrases) it could be deemed as not relevant enough to rank, or a potential spam page. Once a document is flagged for one term it could also be flagged for other related terms. If enough pages from a site are flagged a section of the site or a whole site can be flagged for manual review.
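
To make the range idea concrete, here is a minimal Python sketch. It is only an illustration, not how Google scores pages: the related phrase list, the plain substring matching, and the 0.1 and 0.8 thresholds are all assumptions I made up for the example.

```python
def related_phrase_ratio(text, related_phrases):
    """Share of the related phrases that appear at least once in the text."""
    if not related_phrases:
        return 0.0
    lowered = text.lower()
    found = sum(1 for phrase in related_phrases if phrase.lower() in lowered)
    return found / len(related_phrases)


def classify_coverage(ratio, low=0.1, high=0.8):
    """Label a page by where its coverage falls. Thresholds are hypothetical."""
    if ratio < low:
        return "too few related phrases"
    if ratio > high:
        return "too many related phrases"
    return "within the natural range"


# Hypothetical related phrases for a "home loan" query.
related = ["mortgage rate", "down payment", "closing costs", "credit score", "refinance"]
page = "Compare mortgage rate offers and learn how a down payment affects closing costs."

ratio = related_phrase_ratio(page, related)
print(ratio, classify_coverage(ratio))
```

A page that mentions none of the phrases, or one stuffed with every one of them, would land outside the range and could be treated as a candidate for review.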

URL and Directory Based Penalties:

Would it make sense to prevent a spam page on a good domain from ranking for anything? Would it make sense for some penalties to be directory wide? Absolutely. Many types of cross site scripting errors and authority domain abuses (think rented advertisement folder or other ways to gain access to a trusted site) occur at a directory or subdomain level, and have a common URL footprint. And cheaply produced content also tends to have section wide footprints where only a few words are changed in the page titles across an entire section of a site.
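
A section wide title footprint is easy to picture in code. The Python sketch below groups pages by their first folder and flags folders where the titles are nearly identical. The function names, the example.com URLs, and the 0.7 similarity threshold are hypothetical, and this is my guess at the general shape of such a check rather than anything a search engine has confirmed.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from urllib.parse import urlparse


def directory_of(url):
    """Return host plus the first folder, e.g. example.com/loans."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    first = segments[0] if len(segments) > 1 else ""
    return f"{parsed.netloc}/{first}"


def average_title_similarity(titles):
    """Mean pairwise similarity (0 to 1) between the titles."""
    pairs = list(combinations(titles, 2))
    if not pairs:
        return 0.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


def templated_directories(pages, threshold=0.7, min_pages=3):
    """Flag directories whose page titles differ by only a few words."""
    by_directory = defaultdict(list)
    for url, title in pages:
        by_directory[directory_of(url)].append(title)
    flagged = {}
    for directory, titles in by_directory.items():
        if len(titles) < min_pages:
            continue
        score = average_title_similarity(titles)
        if score >= threshold:
            flagged[directory] = round(score, 2)
    return flagged


# Made-up sample pages: one templated section, one hand-written section.
pages = [
    ("http://example.com/loans/austin.html", "Cheap Home Loans in Austin"),
    ("http://example.com/loans/dallas.html", "Cheap Home Loans in Dallas"),
    ("http://example.com/loans/denver.html", "Cheap Home Loans in Denver"),
    ("http://example.com/blog/index.html", "How We Rebuilt Our Search Index"),
    ("http://example.com/blog/notes.html", "Notes From a Conference"),
    ("http://example.com/blog/anchors.html", "Why Anchor Text Variation Matters"),
]
print(templated_directories(pages))
```

With this sample data only the loans folder is flagged, which matches the idea that a penalty could hit one section without touching the rest of the domain.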

I recently saw an exploit on the W3C. Many other types of automated templated spam leave directory wide footprints, and as Google places more weight on authoritative domains they need to get better at filtering out abuse of that authority. Google would love to be able to penalize things in a specific subdomain or folder without having to nuke that entire domain, so in some cases they probably do, and these filters or penalties probably affect both new domains and more established authoritative domains.

How do You Know When You are Hit?

If you had a page which typically ranked well for a competitive keyword phrase, and you saw that page drop like a rock, you might have a problem. Another indication of a problem is an inferior page ranking where your more authoritative page ranked in the past. For example, let's say you have a single mother home loan page ranking for a query where your home loan page used to rank, but no longer does.

Textual Community:

Just like link profiles create communities, so does the type and variety of text on a page.

Search results tend to sample from a variety of interests. With any search query there are assumed common ideas that may be answered by a Google OneBox, related phrase suggestions, or answered based on the mixture of the types of sites shown in the organic search results. For example:

  • how do I _____
  • where do I buy a ____
  • what is like a _____
  • what is the history of ______
  • consumer warnings about ____
  • ______ reviews
  • ______ news
  • can I build a ___
  • etc etc etc

TheWhippinpost had a brilliant comment in a WMW thread:

  • The proximity, ie... the "distance", between each of those technical words, are most likely to be far closer together on the merchants page too (think product specification lists etc...).
  • Tutorial pages will have a higher incidence of "how" and "why" types of words and phrases.
  • Reviews will have more qualitative and experiential types of words ('... I found this to be robust and durable and was pleasantly surprised...').
  • Sales pages similarly have their own (obvious) characteristics.
  • Mass-generated spammy pages that rely on scraping and mashing-up content to avoid dupe filters whilst seeding in the all-important link-text (with "buy" words) etc... should, in theory, stand-out amongst the above, since the spam will likely draw from a mixture of all the above, in the wrong proportions.

Don't forget that Google Base recently changed to require certain fields so they can help further standardize that commercial language the same way they standardized search ads to have 95 characters. Google is also scanning millions of books to learn more about how we use language in different fields.

Sending Bad Customers to Competitors

One of my friends thought that a good keyword to rank for was cheap widgets. Now on the receiving end of those customers, my friend regrets ranking #1 for cheap widgets. Has anyone ever mentioned poisoning competing business models by sending them floods of low quality leads? If someone helped you rank for junk, and you figured it out, how would you counter? Alter the topic of the page? Remove the page from your site if it was of low value? Change the purpose of the page to harvest and distribute link equity? Point a few links at authoritative websites (like newspapers)? Edit the Wikipedia to put a few extra words in an article? Create parasitic pages on authoritative sites that outrank your site? Recommend a competitor's services to all your bad customers? .htaccess redirect to a page full of ads or a competitor based on referral string? Buy the associated ads for a competitor? Get a competitor links and help them outrank you?

The web is a fairly anonymous place in many ways, and as long as a technique is (remotely close to) legal people will do it. Not saying that I advocate it, but it is good to think about what you would do if any important variables in your business changed (like lead quality, competition in the marketplace, changing technology, etc.)

Targeted Marketing vs Spam Marketing

Almost any marketing method can deliver good or bad messages, be tied to good or bad causes, or be of value or negative value. I think whether marketing is targeted and effective is much more important than the delivery method. SEO gets a bum rap for a variety of reasons, but one thing about good SEO is that it is targeted. Most marketing is not.

SEOs Are Scum:

A person who sold text links for scuba blackjack is considered credible when calling most SEOs scum? By who? And why?

Banks:

I pay my credit card bill and get ads for stamps, soccer, and health insurance. And the envelope contains coupons which, if redeemed, enroll me in worthless programs that cost 10x the value of the coupon. Banks the size of Chase have to do stuff like that to be profitable?

Ad Networks:

Now ad networks are writing things on people's foreheads to get buzz and attention. If the only way you can get people to talk about you is to create controversy or do stupid things that associate you with BumFights is there any satisfaction in that model? And then on the back of that you have your PR firm emailing an owner of a competing network, alerting them to the latest inside scoops and strategy? And then send that same person email spam pitching the SEO value of your wares without my name in it and the email titled "strategic partnership". Where is the relevancy?

Directories:

In spite of already writing the most popular Work.com guide I get emails inviting me to see what Work.com is all about. Why?

Search Engines:

Google is now pushing off topic branded advertising and continues to sell ads on sites they banned for spamming. Google sells AdWords ads for software that they specifically say not to use in their webmaster guidelines. Why?

Yahoo! is so desperate that they are reduced to marketing via phone spam. They call that innovation?

The Truth:

But everyone is fighting to say they have the best ad targeting, while the goal of many quality updates is to drive up ad costs, even if that precludes quality or relevant ads. But in some cases targeting is what will make the ad network more efficient. Let me run through an example...

Imagine that you use Google Checkout and one of your customers bought your product and uses Gmail. Now imagine I am a competitor who bids on your brand. Do you think Google may show my ad in your customer's email? Why wouldn't they?

But most people can't serve ads with the precision Google can. And at some point, even if you are targeted, you still have to do some amount of push marketing to get seen. Look how much push marketing and public relations work Google still does even after they are worth over $100 billion. You don't get to be a market maker without first being a market manipulator.

Be a Relevant & Profitable Marketer:

I think whether marketing is targeted and effective is much more important than the delivery method. If you are lacking on scale or budget you can always make up for it using creativity and targeting. Here are a few targeting methods I find exceptionally effective:

  • Frequently sharing my thoughts.
  • Asking for feedback.
  • Answering emails.
  • Participating in forums.
  • Bidding on new buzzwords before others.
  • Linking to a site I want to be seen on. (Bonus points if I write a bunch specifically about them).
  • Legitimate blog comments.
  • Interviews.
  • Reviewing other well known products in the vertical.
  • Going to conferences.
  • Syndicating articles to well read sites.
  • Buying site targeted AdSense ads.
  • I have tried buying ReviewMe ads on sites that decided they did not want to accept money to review my stuff, but decided to review it anyway. When they reviewed it I left a comment on their blog. Another well known blogger then linked to me based on that comment.
  • Personalized emails.

Marketing doesn't have to be expensive if it is targeted, especially if what you are marketing is of real value and you are good at conveying the value.

Talk Talk Talk

Based in part on Calacanis's recent tirades, Scott Karp recently published a great post about SEO from an outsider's perspective. In his post he runs through how and why some people are biased against SEO. I think a couple big reasons that few people talk about are misdirection and outsourcing faults onto others.

In other news, what is going on with Goog on Google Finance? People are talking about Cramer. Some are talking about how intelligent he is while others are saying his packaging is bad and he is an idiot. Both are probably increasing his brand value though.

I have to agree. I find Cramer a bit of an idiot. I mean, apparently he has done well in stocks, I am not disputing that, but for an investment adviser he is someone I find....well, comical. I've watched his show, and to me he's like the circus; something to see and laugh at when it comes to town, but not something to take too seriously. His biggest fault, and this is ironically the draw of his show, is how he preys on and encourages the emotions of his followers. Now, he may say that it's best not to invest with emotion, but watching him run around on tv with his sleeves rolled up, yelling like some motivational speaker selling a new brand of energy drink, sure sends a different message. In fact, the high quality of his marketing skill, and the poor quality of his advice, kind of reminds me of the Motley Fool.....

In every market people who evoke emotional responses win. Even if they are wrong, you will see them referenced often just because they are good at marketing and preying on human emotions.

Many popular people create far more controversy than value, but links and trust follow conversation. And so do ad dollars. If people are talking about you, you win, even if you are wrong.

Popular and correct are two different things, and the only way to know who you should trust is to test and then aim.

With the advancement of modern technology, people do not even vote on the content, just the headlines, which somewhat feeds into my belief that search personalization + using links as a proxy for value are going to create a polarized biased web full of recycled garbage. Everything is recycled.

Is it unfair to throw any of that blame toward search engines, or is it just default human nature to outsource our own faults and want to split things up to identify with things that are false but look good at a glance? Are our egos so broken that we have to be part of some minority or fighting for one to feel we have purpose? Must we have outspoken leaders to follow? Do the leaders believe their own words, or is it just self-serving marketing?

As more forms of vertical search come about, subscribing and publishing get easier, and more people vote without reading, you can bet that packaging will become more important than information quality...at least until people get sick of it.

I saw two popular pieces about saving money that explicitly gave money saving tips opposite of each other, both published by a friend, who recently talked up the value of his content. Some days sites like Motley Fool will tell you why a stock is a must buy and then have another article dissing the stock the same day. I think they even have a column based on biased polarized advice called Dueling Fools.

Big claims are remarkable, and worthy of a link. In a sea of rushed judgements and meaningless votes sounding convincing is more important than being correct. The perception of value and actually being of value are two different things. For the next couple years it will be far cheaper and more profitable to cater biased marketing to the ignorant rather than to create meaning with a bit of touch and originality. Or am I wrong?

Yahoo! Pipes Are Cool

Yahoo! Pipes is a visual RSS slicing, dicing, and meshing tool. Basically you can take any feeds you like, add them together, and apply a bit of filtering. It is fairly intuitive and a lot of fun for a wannabe programmer like me. And then when you create something, someone else can clone your pipes and add more stuff to it.

Here are some cool ways to use Yahoo! Pipes:

  • track the latest news in your industry (filtering by sources, keywords, or both)
  • track domain and marketplace offers like Shoemoney does here
  • track inbound links or mentions by syncing up blog search tools
  • see how fast ideas are spreading by tracking all the major social sites and blog search engines at the same time
  • watch eBay price trends (and just about any other trend which you can subscribe to)
  • see which Yahoo! Pipes are spreading, think of how you can improve those ideas or apply them to other markets
  • if you are dirty or aggressive about monetization ;) create a Pipe that is a core to many other pipes that pulls data from major websites like eBay while using your affiliate links in your Pipes
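
For anyone who would rather see the merge and filter idea as code than as a visual pipe, here is a rough Python sketch. It does not use Yahoo! Pipes or fetch any real feeds: the feed items are made-up dictionaries, and the function names and field names (title, link, published) are my own assumptions.

```python
from datetime import datetime


def merge_feeds(*feeds):
    """Combine several lists of feed items, newest first, dropping duplicate links."""
    all_items = [item for feed in feeds for item in feed]
    all_items.sort(key=lambda item: item["published"], reverse=True)
    seen = set()
    merged = []
    for item in all_items:
        if item["link"] not in seen:
            seen.add(item["link"])
            merged.append(item)
    return merged


def filter_by_keywords(items, keywords):
    """Keep items whose title mentions any of the keywords (case-insensitive)."""
    wanted = [keyword.lower() for keyword in keywords]
    return [item for item in items if any(k in item["title"].lower() for k in wanted)]


# Hypothetical items standing in for two parsed RSS feeds.
feed_a = [
    {"title": "Google updates webmaster guidelines", "link": "http://a.example/1",
     "published": datetime(2007, 2, 10)},
    {"title": "Weekend photo roundup", "link": "http://a.example/2",
     "published": datetime(2007, 2, 11)},
]
feed_b = [
    {"title": "New Yahoo! search feature", "link": "http://b.example/1",
     "published": datetime(2007, 2, 12)},
    {"title": "Google updates webmaster guidelines", "link": "http://a.example/1",
     "published": datetime(2007, 2, 10)},
]

industry_news = filter_by_keywords(merge_feeds(feed_a, feed_b), ["google", "yahoo"])
for item in industry_news:
    print(item["published"].date(), item["title"])
```

Filtering by source instead of keyword would just mean checking the link field rather than the title.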

Tim O'Reilly has a great post about how the concept of Pipes could be highly valuable.

Historical Search Spam Patterns and Link Reciprocation

Some people are wildly speculating that Google and other engines may create historical databases of SEOs and site relationships to identify spam. I have no doubt that some sites that go way too far stay penalized for a long time, and that some penalties may flag related sites for review, but I think search engines have enough data and most people leave enough footprints that search engines do not have to dig too deep into history to connect the dots. And there is little upside in them connecting the dots.

If they connected the dots manually it would take a long time to do it broadly, and if they did it automatically they would run into problems with false relationships. Some sites I once owned were sold to people who do not use them to spam. If ownership relationships took sites out by proxy I could just create spam sites using a competitor's details in the Whois data, or heavily link to their sites from the spam sites.

Where people run into problems with spamming is scalability. If you scale out owning many similar domains you are probably going to leave some sort of footprint: cross linking, affiliate ID codes, AdSense account numbers, analytics tracking scripts, a weird page code, similar site size, similar inlink or outlink ratios, similar page size, or maybe some other footprint that you forgot to think of.
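
Here is a small Python sketch of how shared footprints could connect sites. The domains, the ID values, and the function name are all invented for illustration, and I am assuming the footprints have already been extracted from each site.

```python
from collections import defaultdict


def shared_footprints(sites):
    """Map each footprint value to the domains carrying it, keeping only shared ones."""
    owners = defaultdict(set)
    for domain, footprints in sites.items():
        for kind, value in footprints.items():
            owners[(kind, value)].add(domain)
    return {key: sorted(domains) for key, domains in owners.items() if len(domains) > 1}


# Hypothetical footprint data for three sites.
sites = {
    "widgets-one.example": {"adsense_id": "pub-0000000000000001", "analytics_id": "UA-000001-1"},
    "widgets-two.example": {"adsense_id": "pub-0000000000000001", "analytics_id": "UA-000002-1"},
    "unrelated.example": {"adsense_id": "pub-0000000000000002", "analytics_id": "UA-000003-1"},
}

for (kind, value), domains in shared_footprints(sites).items():
    print(kind, value, domains)
```

A match like this is only a hint that two sites might be related, since anyone can copy an ID onto a site they do not own.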

Many of those things can be spoofed too, (what is to prevent me from using your AdSense ID on spam?), so in many cases there has to be a hybrid of automated filtering and flagging and manual review.

And even if you are pretty good at keeping your sites unique on your end, if you outsource anything they are going to have a limited network size, likely a routine procedure with footprints, and if their prices are low they are probably going to be forced to create many obvious footprints to stay profitable. And if you use reciprocal or triangular links associated with those large distributed link farms that puts you in those communities far more than some potential historical relationship of some sort. By linking to it you confirm the relationship.

Search engines do not want to ban false positives, so many spammy link related penalties just suppress rankings until the signs of spam go away. Remove the outbound reciprocal link page that associates you with a bad community, get a few quality links, and watch the rankings shoot up. The thing is, once a site gets to be fairly aggressively spammy it rarely becomes less spammy. If it was created without passion it likely dies then turns into a PPC domainer page with footprints. Hiding low value pages deep in the index until the problem goes away is a fairly safe idea for search engineers, because after a domain has been burned it rarely shifts toward quality unless someone else buys it.

Adjusting Your Marketing

Some marketing fails because it does not use market feedback to help improve the ROI on the next generation of marketing. For example, if I make a couple sites and then take what I learned from making those and apply that to making more sites I will probably be more efficient than if I try to make many sites in parallel without collecting feedback. Many of the best marketers do absolutely stupid stuff that destroys the value of their work, other than what they have learned from testing the boundaries. But after you test them you learn and then you can incorporate that into your next round of marketing. It doesn't matter if you screw up as long as you keep learning from it, and adjusting to the market.

Before making a large commitment see if there are ways you can test the market and gain quicker and cheaper feedback. Build some content and links and see if it ranks. If it ranks build more content and links.

It is smart to emotionally invest into some of your most important projects, but it is a bad call to be so invested into the idea that if that idea doesn't work you keep pushing it against the will of the market until you go bankrupt, especially since there are so many market opportunities out there if you are willing to use market feedback to tweak your ideas to make them more profitable.

Ready. Fire. Aim.
Ready. Fire. Aim.
etc etc etc

Everyone is a Hypocrite and a Spammer

One hates to give Jason Calacanis any additional exposure, but how can a person be so against SEO while selling text links for scuba blackjack online? Is he ahead of the market on global warming?

Grow up.. the only thing you're ever going to prove by trying to game my SeRP is that you're low-class idiots.

True, or maybe we are looking for scuba blackjack customers, and knew that you publish high quality original content and ads for that market.

[Video] What is a Self Reinforcing Authority (and a Self Reinforcing Market Position)?


Video Summary:

Some documents and websites build self reinforcing authority that make them hard to beat for their targeted search terms. This video explains how that works and gives examples of some self reinforcing market authorities, as well as tips on how to make these types of sites and pages.

Resources Mentioned in the Video:

Examples of Self Reinforcing Authorities From This Video:

  • us debt clock
  • xe currency converter
  • search engine history
  • search engine ranking factors
  • black hat seo
  • seo code of ethics
  • seo today / search engine watch

Things I Should Have Mentioned That I Forgot:

  • Your title is important because most people will reference your document by its title.
  • Statistics, standards, and scientific sounding things can easily become self reinforcing powerhouses, especially if they feed into the ego of the target audience.
  • If you get large media coverage of your idea leverage it to get more coverage. Show it off to seem exceptionally legitimate and trustworthy.
  • US News & World Report ranks colleges, and is a great example of a self reinforcing authority.
  • Common ways to undermine authority that may prevent a site or article from becoming authoritative.
  • If someone has an authoritative idea in another market, but nobody has applied it to your market, that may present an easy opportunity.

Signs of a Low Quality Website

WebmasterWorld recently had a good thread about signs of low quality websites. The less a person knows about your topic the more likely they are to rely on general signs of quality (or lack of) when considering whether they should link to your site or not.

Common Quality Questions:

Is the design clean? Is the content well organized? Do they have major misspellings on their homepage? Who is behind the site? Is it easy to contact them? Are they referenced by any other credible sources? How unique and useful is the content? How aggressively are ads blended into the content? etc. etc. etc.

Why Proxies for Quality Are Important:

Recently someone spread a God hates fags song website. Friends were instant messaging me about whether it was real or not. Some journalists guessed it wrong. People are getting better at creating fakes. The easier we make it for people to trust us in a snap judgement the more people will trust us (and link to our sites).

These proxies for trust are important, especially when you are new to an established industry, are in a new industry with a small community of support, are in a rapidly growing industry that the media is having a feeding frenzy over, or are the seedy arm of a larger industry.

Example of the Importance of Outside Perception:

If an industry is new, the early leaders of that industry might be determined by mainstream media perception (or other perception outside of that industry). Using blogs as an example, if the media did not constantly pump up the Weblogs Inc. story that company still might be unprofitable today. That media exposure led to more media exposure, gave the sites the link juice to help them rank, and gave them brand exposure that brought in advertisements.

Relating This to the SEO Industry:

With SEO it is easier to be seen as an SEO expert if you are first seen as an expert on search. It is easier to be trusted as an expert on any topic if your site does not flag common signals of crap.

I just got a link from the WSJ to my keyword research tool, but if I had scored lower on the proxies for value maybe they never would have linked. And when you get that type of link you can leverage it as an additional signal of trust that makes it easier for others to link to you.

With BlackHatSEO.com, I mentioned as seen in Clickz and Search Engine Watch, but what I didn't mention was that both mentions were brief and in the same syndicated article. When the London Times interviewed me about that site I quickly put up another as seen in at the top of the home page, which will make it easier to get more exposure. You want your press coverage to lead to more press coverage, because those are some of the most trusted links and links that money alone usually can't buy.

But I am Already Doing Well:

Many people who buy consultations are already doing far better than I would expect them to do given some of the obvious flaws I see with their site structure and marketing methods. Some state that they are already doing well. The point of these sorts of signs of crap is not that you need to fill all the holes to do well, or that you can't do well if you do not fill them, but imagine how much better a site can do after it fixes obvious errors if it was already doing well when it had many errors that undermined its credibility and linkability.
