Fake Blogs: Is it a Big Deal, or are Bloggers Naive & Easy to Game?

Blog Anchor Text:
While the author of asbestos.stinkmachine.com is not interested in the subject, he is acquiring some good linkage data as the blog community states how fascinated they are with him making money from AdSense.

At Google's investor day a slide showed Weblogs Inc makes over $600 a day from AdSense. It seems there are a ton of AdSense sites, but if you lack topical interest, is it a sustainable business model? Other than the guy getting the free links from the oohs and ahhs, most people are going to need to spend some $ to build linkage data.

There probably is still some easy money on the table, but as time passes surely that market will get much much more saturated and competitive.

On a side note, if bloggers are so smart and well connected, how are they so far behind the curve on AdSense?

Fake Weblogs:
So evil they may as well be terists...

What is funny is that

  • blogs will link to other blogs just because they are profitable and being made for AdSense.

  • and criticise other blogs for being fake

Beyond intent what really matters?

My weblog is fake. If the fake blog wiki would ever go back online I would add my blog.

Naive & Manipulated:
perhaps worse than being fake?

Yahoo! Firefox Toolbar, Google Investor Day, & Cold Calling

Yahoo! Firefox Toolbar:
released, reviewed by Danny Sullivan.

Wonder when Google will do the same. They are likely missing out on some amazingly valuable data by not having an official one.

Newsburst:
Cnet creates an online aggregator that will compete with Bloglines & Rojo

Brand Pyramid:
Good post about building products and services at various price points. Also, Rob Frankle on how CEOs screw up branding.

Google Investor Day:
was yesterday. Slides and audio if you're into that sort of stuff. Some of the slides:

  • 60: their revenue was split nearly 50 / 50 between Google.com and their partner network.

  • 72: the top 20 markets have a 17.6% web penetration.
  • 73: shows 66% of revenue was domestic.

This guy thinks they will be worth a trillion dollars in 20 years.

In 20 years nobody will remember his prediction, but Battelle gave him a link yesterday and I gave him one today. Random arbitrary predictions are a good way to gain free links ;)

Atomz:
site search provider gets snarfed up by WebSideStory. More people should use the word snarf.

Google's Search Results are Crap?
Danny says people should not be so quick to discount the opinions of SEOs.

Hiring:
Ammon Johns is hiring.

Cold Calling:
is evil. Nick W has some tips on how to cold call. My personal goal when people cold call me is to ensure I drastically increase the likelihood they will have a bad day, and to hopefully lead to eventual attrition at their work place.

The Hypocrisy of Google

Recently Google updated their index and relevancy algorithm with Update Allegra.

The update was believed to be related to latent semantic indexing.

Being that my own rankings just dipped, it would be easy for me to take things overly personally and perhaps be a bit biased about the situation. Then again, some of my other sites are now ranking way better than they were, and I also pointed out this problem before it ever had any significant effect on me.

In doing this update the search results are in many areas less than stellar. It is understandable that shuffles will occur, as they must to consistently improve relevancy, but on more than one occasion Google has seemed to lose focus on their official mission statement.

Google's mission is to organize the world's information and make it universally accessible and useful.

They tell you to design content for the user. Link to quality resources. Act as if search engines are not even there. Generally this is good advice for many webmasters.

What they do not tell you is that they do not follow their own guidance.

Sure trying to rank for a term like "SEO" or other generic terms may be a bit unrealistic for many and only a few sites can rank well for such a term. I am not particularly saying that I believe I deserve to rank #1 in Google for "SEO" because it is a generic term and they owe me nothing.

On another front there are brand names that people work long and hard to build. Sure the search results are just informational pages about a topic, and maybe Google doesn't give a shit about my brand, and that is fine too.

Where the real problem exists though is that since I have worked so hard to build that brand it gets a ton of traffic and people expect to see my site there.

When people search for stuff like "seobook" and 10 out of 10 of the front page results reference me but I am not listed, that provides a poor user experience for Google's users.

To try to prevent their results from being manipulated they have often thrown the baby out with the bathwater. But maybe in the hopes of achieving their longterm goals Google realizes they have to take short term hits.

What if Google is wrong in their desires though? What if their desire to fight off commercial manipulation is so great that they fail to accept commerce as part of the web, and too often show informational results when people want to shop? Would that eventually cause people to stop using Google? Would accepting markets for more of what they are without trying to bias them away from marketing and toward aged sites or information dense pages potentially create a more efficient market?

New Google Maps, Yahoo! & RackSpace, About.com for Sale

Google Maps:
New version out. Danny Sullivan has more on the offering.

RackSpace:
May not play well with Yahoo!

About.com for Sale:
Bidding ends today. Price between $350 and $500 million. Bidders: Google, Yahoo!, Ask Jeeves, AOL & New York Times.

TrustRank Algorithm

A buddy of mine pointed me to a white paper by Zoltan Gyongyi, Hector Garcia-Molina, & Jan Pedersen about a concept called TrustRank (PDF).

Human editors help search engines combat search engine spam, but reviewing all content is impractical. TrustRank places a core vote of trust on a seed set of reviewed sites to help search engines distinguish pages that would be considered useful from pages that would be considered spam. This trust is attenuated as it passes to other sites through links from the seed sites.

TrustRank can be used to

  • automatically boost pages that have a high probability of being good, as well as demote the rankings of pages that have a high probability of being bad.

  • help search engines identify what pages should be good candidates for quality review

Some common ideas that TrustRank is based upon:

  • Good pages rarely link to bad ones. Bad pages often link to good ones in an attempt to improve hub scores.

  • The care with which people add links to a page is often inversely proportional to the number of links on the page.
  • Trust score is attenuated as it passes from site to site.

To select seed sites they looked for sites which link to many other sites. DMOZ clones and other similar sites created many non useful seed sites.

Sites which were not listed in any of the major directories were removed from the seed set. Of the remaining sites, only those backed by government, educational, or corporate bodies were accepted as seed sites.

When deciding what sites to review it is most important to identify high PR spam sites, since they will be more likely to show in the results and because it would be too expensive to closely monitor the tail.

TrustRank can be bolted onto PageRank to significantly improve search relevancy.
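
To make the trust attenuation idea a bit more concrete, here is a minimal sketch of how trust might flow out from a seed set. It assumes a biased PageRank style formulation, where each page splits its trust evenly among its outlinks and a fixed share is re-injected at the seeds every round. The function name, the damping value of 0.85, the iteration count, and the example domains are all my own illustrative assumptions, not values taken from the paper.

```python
def trust_rank(links, seeds, damping=0.85, iterations=20):
    """Propagate trust from reviewed seed pages along outgoing links.

    links: dict mapping each page to the list of pages it links to.
    seeds: set of hand-reviewed pages that hold the initial trust.
    damping and iterations are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    # The initial trust is split evenly among the seed pages.
    seed_score = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_score)
    for _ in range(iterations):
        # Each round a (1 - damping) share of trust is re-injected at the seeds.
        nxt = {p: (1.0 - damping) * seed_score[p] for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            # A page splits its trust among its outlinks, so a page with
            # many links passes less trust through each one.
            share = damping * trust[page] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust


# Hypothetical toy web graph: one reviewed seed and a two-page spam ring.
links = {
    "university.example": ["library.example", "news.example"],
    "library.example": ["news.example"],
    "news.example": ["blog.example"],
    "blog.example": [],
    "spam.example": ["university.example", "spam2.example"],
    "spam2.example": ["spam.example"],
}
scores = trust_rank(links, seeds={"university.example"})
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page:22s} {score:.4f}")
```

In this toy graph the spam pages link to the trusted seed, but nothing trusted links back to them, so they end up with no trust at all. That is the "good pages rarely link to bad ones" idea at work.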

Own any domain exploit, no defense exists

Just a friendly exploit reminder as phishing will surely take off in a hurry with this one.

Vulnerable browsers include (but are not limited to):

Most mozilla-based browsers (Firefox 1.0, Camino .8.5, Mozilla 1.6, etc)
Safari 1.2.5
Opera 7.54
Omniweb 5

SEO Marketplace Question ;)

So a friend of mine is building a tool which will likely be publicly available for usage and it may run a thousand or few thousand queries a day. This tool may query some of the major search engines and may need to use some open HTTP proxies.

Does anyone know how he can gain access to reliable open HTTP proxies and what costs, if any, would be involved?

Feel free to email me at seobook aT gmail DoT com if you do not want to post anything in the comments.

If and when he completes the tool I will mention its launch on this site :)

Writing Tips, How to be a Consultant, IR Books, Ask Buys Bloglines

Writing:
Everything You Need to Know About Writing Successfully: in Ten Minutes

How to Be a Consultant:
Create The Warm Fuzzy Feeling™. Reading it certainly takes much longer than 10 minutes, but it is well worth it if you are considering becoming a consultant.

The list is great, but on the web / marketing front I would also add create affiliate and content sites to help build a stable income stream when down periods occur.

Even when you have few clients you help shore up your technical understanding by creating things. If you create great sites then they will make money and you will be able to better filter what work you are willing to take on. If you create lousy sites then they will make for great research and will help you identify symptoms of a lousy site when prospective customers contact you.

As stated in that article, it can't be stressed enough

  • how important it is to be easily available; &

  • how amazingly well syndicated articles act as sophisticated salesmen

found on SearchEngineBlog

Information Retrieval Books:
A while ago I read A Theory of Indexing by Gerard Salton. I also have heard good things about Information Retrieval by C. J. "Keith" van Rijsbergen, and Modern Information Retrieval by Ricardo Baeza-Yates & Berthier Ribeiro-Neto. What information retrieval technology books have you read and liked? Wonder if guys like GoogleGuy have a favorite IR book :)

Ask Jeeves Buys Bloglines?
it is what people are saying...

Trademark Laws:
Deregulating Relevancy in Internet Trademark Law

Would You Name Your PPC?
RipUsOff.com...just randomly came across it, and after seeing so many articles about click fraud it would appear as though that name could be taken the wrong way.

My Favorite Muppet:
Flying Gonzo, though the Cookie Monster is also cool.

Google Semantically Related Words & Latent Semantic Indexing Technology

Many people have been noticing a wide shuffle in search relevancy scores recently. Some of those well in the know attribute this to latent semantic indexing. Even if they are not using LSI, Google has likely been using other word relationship technologies for a while, but recently increased its weighting.

How Does Latent Semantic Indexing Work?
Latent semantic indexing allows a search engine to determine what a page is about outside of specifically matching search query text.

A page about Apple computers will likely naturally have terms such as iMac or iPod on it.

Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn't understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent. source

By placing additional weight on related words in content, or words in similar positions in other related documents, LSI has a net effect of lowering the value of pages which only match the specific term and do not back it up with related terms.
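
To see the "many words in common" idea in action, here is a small sketch that scores documents by the words they share. Fair warning: this is plain cosine similarity over word counts, not real LSI, which goes further and applies a matrix decomposition to the whole term-document matrix. The helper names and the three sample documents are made up for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase word occurrences in a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical documents: two about Apple computers, one about apple pie.
docs = {
    "apple_review": "apple imac review the new imac and ipod from apple",
    "ipod_guide": "ipod buying guide compare the apple ipod and imac accessories",
    "fruit_recipe": "apple pie recipe with fresh fruit and cinnamon",
}
vectors = {name: term_counts(text) for name, text in docs.items()}

sim_computers = cosine_similarity(vectors["apple_review"], vectors["ipod_guide"])
sim_fruit = cosine_similarity(vectors["apple_review"], vectors["fruit_recipe"])
print(f"review vs ipod guide:   {sim_computers:.3f}")
print(f"review vs fruit recipe: {sim_fruit:.3f}")
```

All three documents contain the word apple, but the two computer pages also share imac and ipod, so they score as much closer to each other than either does to the recipe.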

LSI vs Semantically Related Words:
After being roasted by a few IR students and scientists I realized that many SEOs (like me) blended the concepts of semantically related words with latent semantic indexing, and due to constraints of the web it is highly unlikely that large scale search engines are using LSI on their main search indexes.

Nonetheless, it is overtly obvious to anyone who studies search relevancy algorithms by watching the results and ranking pages that the following are true for Google:

  • search engines such as Google do try to figure out phrase relationships when processing queries, improving the rankings of pages with related phrases even if those pages are not focused on the target term

  • pages that are too focused on one phrase tend to rank worse than one would expect (sometimes even being filtered out for what some SEOs call being over-optimized)
  • pages that are focused on a wider net of related keywords tend to have more stable rankings for the core keyword and rank for a wider net of keywords

Given the above, here are tips to help increase your page relevancy scores and make your rankings far more stable...

Mix Your Anchor Text!
Latent semantic indexing (or similar technologies) can also be used to look at the link profile of your website. If all your links are heavy in a few particular phrases and light on other similar phrases then your site may not rank as well.

Example Related Terms:
Many of my links to this site say "SEO Book" but I also used various other anchor text combinations to make the linkage data appear less manipulative.

Instead of using SEO in all the links some of them may use phrases like
search engine optimization
search engine marketing
search engine placement
search engine positioning
search engine promotion
search engine ranking
etc.

Instead of using book in all the links some other good common words might be
ebook
manual
guide
tips
report
tutorial
etc.
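
If you keep a list of the anchor text used in links pointing at your site, a quick way to check how mixed it is might look like the sketch below. The function names, the sample link list, and the 50% cutoff are purely hypothetical. Nobody outside the engines knows what share (if any) trips a filter, so treat the threshold as a placeholder.

```python
from collections import Counter


def anchor_text_shares(anchors):
    """Return each distinct anchor text with its share of all links."""
    counts = Counter(a.strip().lower() for a in anchors)
    total = sum(counts.values())
    return {text: n / total for text, n in counts.most_common()}


def overused_anchors(anchors, max_share=0.5):
    """List anchor texts that exceed max_share of the link profile.

    max_share is an arbitrary illustrative cutoff, not a known threshold.
    """
    shares = anchor_text_shares(anchors)
    return [text for text, share in shares.items() if share > max_share]


# Hypothetical link profile: 7 of 10 links use the exact same phrase.
anchors = ["SEO Book"] * 7 + [
    "search engine optimization guide",
    "SEO tips",
    "search engine ranking ebook",
]
shares = anchor_text_shares(anchors)
for text, share in shares.items():
    print(f"{share:5.0%}  {text}")
print("Overused:", overused_anchors(anchors))
```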

How do I Know What Words are Related?
There are a variety of options to know what words are related to one another.

  • Search Google for search results with related terms using a ~. For example, Google Search: ~seo will return pages with terms matching or related to seo and will highlight some of the related words in the search results.

  • Use a lexical database
  • Look at variations of keywords suggested by various keyword suggestion tools.
  • write a page and use the Google AdSense sandbox to see what type of ads they would try to deliver to that page.
  • Read the page copy and analyze the backlinks of high ranking pages.

Google Sandbox and Semantic Relationships:
The concept of "Google Sandbox" has become synonymous with "the damn thing won't rank" or whatever. The Sandbox idea is based upon sites with inadequate perceived trust taking longer to rank well.

Understanding the semantic relationships of words is just another piece of the relevancy algorithms, though many sites will significantly shift in rankings due to it. The Google sandbox theory typically has more to do with people getting the wrong kinds of links or not getting enough links than it does with semantic relationships. Some sites and pages are hurt though by being too focused on a particular keyword or phrase.

Where do I learn more about Latent Semantic Indexing?
A while ago I read Patterns in Unstructured Data and found it was written in plain, easy-to-understand English.

Brian Turner also listed a good number of research papers in this thread.

Forum Coverage:

Selected Forum Quotes:
BakedJake

I'm not about to go post my research and examples on a public forum. But, I'll warn you now - if you're not varying your anchor text, and you're not writing pages synonymous with your term that don't contain the term you're targetting, you're going to be in a world of hurt within the next 90 days.

We've been tracking this update for the last 6 months. I was surprised to see it happen now - I honestly didn't expect it until next month or March, but it's here.

BakedJake

I have a page about "baby clothes". I link to my site 100 times with the anchor text "baby clothes"

I now pull out the words "baby clothes" and all the links pointing to my site with the words "baby clothes"

Do I still have footing to rank for that term "baby clothes" after you've run some sort of semantic analysis on it?

That's my simplistic explanation. I think they're doing something very similar, but taking links into account like that and maybe even devaluing some links on the "main" term...

valeyard

Well, if it hasn't changed by Monday I'm going out to buy a black hat.

If irrelevant junk is what Google wants then irrelevant junk is what it's gonna get. :-(

dataguy

Man I'm glad I diversified my sites. I think I will work on diverifying some more...

andy_boyd

Google Inc. is all about money. And IMHO ... so are Yahoo Inc. and Microsft Corp.. As webmasters we are the people who build sites and depend on these money hungry companies, who at the heels of the hunt, put their interests miles ahead of ours.

Chico_Loco

My main concern with this new update is that if you search for my brand name (and there are quite a few that do based on referrals), then right now my site does not even rank. Our brand name is perhaps the best in my industry, and Google are, in my opinion, diluting my brand name and causing my company money. The first result for my brand name is a spammy page which is a "scraper site" which is actually SERP's page from somewhere - so that's basically useless.

The Hidden or Not so Hidden Messages:

  • If you are entirely dependent on any single network and a single site for the bulk of your income then you are taking a big risk. Most webmasters would be best off to have at least a couple of income streams to shield themselves from algorithm changes.

  • If you are new to SEO you are best off optimizing your site for MSN and Yahoo! from the start and then hoping to later rank well in Google.
  • Make sure you mix your anchor text to minimize your risk profile. Even if you are generally just using your site name as your anchor text eventually that too can hurt you.
  • Search algorithms and SEO will continue to get more complicated. But that makes for many fun posts ;)

Update: a few additional tools recommended in our comments and the comments at ThreadWatch

Google AdWords / AdSense Shakeup, Free Link Renting Guide, Ask Jeeves Blog

AdSense and AdWords shakeup:

found on ThreadWatch

SearchGuild birthday awards:
fun stuff

I was nominated but was beaten out by Orion. A real shame that I do not know more about fractal spam and semantic co-occurrence...

Free Link Renting Guide:
Patrick Gavin offers free link renting tips (PDF link)

Complacency:
Tim Converse (from Yahoo!) calls out Marissa Mayer (from Google). I am sure there are lots of fun dialogs between the various engines' employees.

Ask Jeeves:
creates their obligatory blog.
