Link Harvester Updated

I think I have updated Link Harvester twice since I last posted new source code. It now allows you to grab link data via Yahoo! or MSN.

On top of allowing you to search for links to a specific page or links to anywhere in a domain, it also has a third function called deep links, which lets you get a sample of deep link data without grabbing links pointing at the home page. The theory is that many good sites get deep links. Looking through the deep links may give you a better view of how they were acquired, or whether they are all garbage scraper links, etc.

By looking through the deep links you can:

  • check the quality of links pointing at inner pages.
  • know which URLs you really need to redirect if you are changing your content management system.
  • know which URLs are important to redirect if you buy a site and want to modify the content or gut pieces that were causing duplicate content or other problems.
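To make the CMS-migration point concrete, here is a minimal Python sketch of turning a harvested list of deep-link URLs into redirect pairs. The URL patterns (`/article.php?id=N`, `/articles/N/`) and file names are invented for illustration, not taken from any real CMS.

```python
# Hypothetical sketch: turn a harvested list of deep-link URLs into
# old -> new redirect pairs for a CMS migration. The URL patterns here
# are invented for illustration.

def build_redirects(deep_links, new_path_for):
    """Map each old URL that needs redirecting to its new location."""
    redirects = {}
    for url in deep_links:
        new_url = new_path_for(url)
        if new_url and new_url != url:
            redirects[url] = new_url
    return redirects

def new_path_for(url):
    # Pretend the old CMS used /article.php?id=N and the new one /articles/N/
    if "/article.php?id=" in url:
        article_id = url.split("id=")[-1]
        return f"/articles/{article_id}/"
    return None  # unchanged or unknown URL; no redirect needed

old_links = [
    "/article.php?id=42",
    "/article.php?id=7",
    "/about.html",  # not moving, so no redirect is generated
]
print(build_redirects(old_links, new_path_for))
# {'/article.php?id=42': '/articles/42/', '/article.php?id=7': '/articles/7/'}
```

Each resulting pair would become a 301 redirect rule in the server configuration, so the link popularity pointing at the old deep URLs is preserved.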

Deep link profiles are also useful as a benchmark: looking at the links pointing at sites that were not actively marketed via SEO techniques can help you see what natural link profiles look like.

MSN sometimes gives weird numbers in its backlink counts and typically shows fewer backlinks than Yahoo!, so by default when Link Harvester reports link counts like

Showing 421 unique domains from the first 250 results of 1129 total results

it means that between Yahoo! and MSN there were 421 unique domains in the results returned by the query. The "first 250" means that the link search depth was set to 250 per engine. The 1129 is the number of links in Yahoo!'s database (they don't return 100% of what they know of, but they return most of it). If Yahoo! is turned off, the third number comes from MSN's database.
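This is not Link Harvester's actual source, but a rough Python sketch of the idea behind that first number: merge the backlink results from two engines and count the unique linking domains. The URLs are invented sample data.

```python
# Illustrative sketch: merge backlink results from two engines and
# report unique linking domains. The sample URLs are invented.
from urllib.parse import urlparse

def unique_domains(*result_sets):
    """Collect the unique linking domains across several engines' results."""
    domains = set()
    for results in result_sets:
        for url in results:
            domains.add(urlparse(url).netloc)
    return domains

yahoo_results = ["http://example.com/page1", "http://blog.example.org/post"]
msn_results = ["http://example.com/other", "http://news.example.net/story"]

found = unique_domains(yahoo_results, msn_results)
print(f"Showing {len(found)} unique domains")  # Showing 3 unique domains
```

Note that example.com is counted once even though both engines returned links from it, which is why the unique-domain figure can be smaller than the combined result count.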

The Perfect Page Title for SEO and Users

Brett Tabke recently had a supporters thread about the page title element as part of his 101 Signals of Quality series.

In the post he:

  • talks about how some people overdo or underdo the title, making it less appealing to potential site visitors and search algorithms alike.
  • gives examples not only of what pulled best from the search results, but also of what words could be added to forum thread titles to keep conversations going.
  • talks about keyword value vs. phrase length.

One thing he did not talk about much is the effect titles have on viral link baiting. I think NickW is probably the best at titling link bait of anyone I have ever seen. The title not only acts as an ad to be clicked on, but also as an ad to be part of a story worth spreading. If you can be early with stories or put interesting twists on them, those skills can make it really easy to build link popularity.

All in all, Brett's post kicked ass. Given how much information he put into that one post, I will be interested to see how he creates 101 tip posts like it.

Search is About Communication

Making Untrustworthy Data Trustworthy:

In social networks there tends to be an echo chamber effect. Stories grow broader, wider, and more important as people share them. Tagging and blog citation are inevitably going to help push some stories where they don't belong. Spam will also push other stories.

RSS, Wikipedia, government content, press releases, and artful content remixing mean automated content generation is easy. Some people are going so far as to try to automate ad generation, while everyone and their dog wants to leverage a publishing network.

What is considered worthwhile data will change over time. When search engines rely too heavily on any one data source it gets abused, so they have to look for other data sources.

Search Engines Use Human Reviewers:

When John Battelle wrote The Search he stated:

Yahoo is far more willing to have overt editorial and commercial agendas, and to let humans intervene in search results so as to create media that supports those agendas…. Google sees the problem as one that can be solved mainly through technology–clever algorithms and sheer computational horsepower will prevail. Humans enter the search picture only when algorithms fail–and then only grudgingly.

Matt Cutts reviewed the book, stating:

A couple years ago I might have agreed with that, but now I think Google is more open to approaches that are scalable and robust if they make our results more relevant. Maybe I’ll talk about that in a future post.

Matt also states that humans review sites for spam:

If there’s an algorithmic reason why your site isn’t doing well, you can definitely still come back if you change the underlying cause. If a site has been manually reviewed and has been penalized, those penalties do time out eventually, but the time-out period can be very long. It doesn’t hurt your site to do a reinclusion request if you’re not sure what’s wrong or if you’ve checked carefully and can’t find anything wrong.

and recently it has become well known that they outsource bits of the random query evaluation and spam recognition process.

Other search engines have long used human editors. When Ask originally came out it tried to pair questions with editorial answers.

Yahoo! has been using editors for a long time. Sometimes in your server logs you may see referrers like http://corp.yahoo.com/project/health-blogs/keepers. Some of the engines Yahoo! bought out were also well known to use editors.

Editors don't scale as well as technology though, so eventually search engines will place more reliance upon how we consume and share data.

Ultimately Search is About Communication:

Many of the major search and internet related companies are looking toward communication to help solve their problems. They make bank off the network effect by being the network or being able to leverage network knowledge better than the other companies.

  • eBay
    • has user feedback ratings
    • has product reviews (reviews.ebay.com)
    • bought Shopping.com
    • bought PayPal
    • bought Skype
  • Yahoo!
    • partnered with DSL providers
    • bought Konfabulator
    • bought Flickr
    • My Yahoo! lets users save or block sites & subscribe to RSS feeds
    • offers social search, allowing users to share their tagged sites
    • bought Del.icio.us
    • has Yahoo! 360 blog network
    • has an instant messenger
    • has Yahoo! groups
    • offers email
    • has a bunch of APIs
    • has a ton of content they can use for improved behavioral targeting
    • pushes their toolbar hard
  • Google
    • may be looking to build a Wifi network
    • has toolbars on millions of desktops and partners with software and hardware companies for further distribution
    • bought Blogger & Picasa
    • alters search results based on search history
    • allows users to block pages or sites
    • has Orkut
    • has an instant messenger with voice
    • has Google groups
    • Google Base
    • offers email
    • AdWords / AdSense / Urchin allows Google to track even more visitors than the Google Toolbar alone allows
    • Google wallet payment system to come
    • has a bunch of APIs allowing others to search
    • search history allows tagging
  • MSN
    • operating system
    • browser with integrated search coming soon
    • may have been looking to buy a part of AOL
    • offers email
    • has an instant messenger
    • Start.com RSS aggregation
    • starting own paid search and contextual ad program based on user demographics
    • has a bunch of APIs
  • AOL
    • AIM
    • AOL Hot 100 searches
    • leverage their equity to partner with Google for further distribution
  • Ask
    • My Ask
    • Bloglines
  • Amazon
    • collects user feedback
    • offers a recommending engine
    • allows people to create & share lists of related products
    • lists friend network
    • finds statistically improbable phrases from a huge corpus of text
    • allows users to tag A9 search results & save stuff with their search history

Even if search engines do not directly use any of the information from the social sharing and tagging networks, the fact that people are sharing and recommending certain sites will carry over into the other communication mechanisms that the search engines do track.

Things Hurting Boring Static Sites Without Personality:

What happens when Google has millions of books in its digital library, and has enough coverage and publisher participation to prominently place the books in the search results? Will obscure static websites even get found amongst the billions of pages of additional content?

What happens when somebody comment spams (or does some other type of spam) in your name to try to destroy your site's rankings? If people do not know and trust you, it is going to be a lot harder to get back into the search indexes. Some will go so far as to create hate sites or blog spam key people.

What happens when automated content reads well enough to pass the Turing test? Will people become more skeptical about what they purchase? Will they be more cautious with what they are willing to link at? Will search engines have to rely more on how ideas are spreading to determine what content they can trust?

Marginalizing Effects on Static Content Production:

As the web userbase expands, more people publish (even my mom is a blogger), and ad networks become more efficient, people will be able to make a living off smaller and smaller niche topics.

As duplicate content filters improve, search engines gather more user feedback, and more quality content is created, boring static merchant sites will be forced out of the search results. Those who get others talking about them by giving away information will be better able to sell products and information.

Good content that nobody cares about is, to a search engine, simply not good content.

[Image: marginalizing effects on the profitability of publishing boring static sites.]

Moving from Trusting Age to Trusting Newsworthiness:

Most static sites like boring general directories or other sites that are not so amazing that people are willing to cite them will lose market share and profitability as search engines learn how to trust new feedback mechanisms more.

Currently you can buy old sites with great authority scores and leverage that authority right to the top of Google's search results. Eventually it will not be that easy.

The trust mechanisms the search engines use are easy to defeat, and they matter less if your site has direct type-in traffic and subscribers, and if people frequently talk about you.

Cite this Post or Subscribe to this Site:

Some people believe that every post needs to earn new links or new subscribers. I think posting explicitly with that intent may create a bit of an artificial channel, but it is a good baseline for the types of posts that work well.

If you have enough interesting posts that people like enough to reference, you can mix in a few posts that are not as great but are more profit oriented. The key is to typically post things that add value to the feed for many subscribers, or things that interest you.

Many times, just by posting something original you can end up starting a great conversation. I recently started posting Q and As on my blog. I thought I was maybe adding noise to my channel, but my sales have doubled, a bunch of sites linked to my Q and As, and I have gotten nothing but positive feedback. So don't be afraid to test stuff.

You wouldn't believe how many people posted about Andy Hagans' post about making the SEO B list. Why was that post citation-worthy? It was original, and people love to read about themselves.

At the end of the day it is all about how many legitimate reasons you can create for a person to subscribe to your site or recommend it to a friend.

Man vs Machine:

For most webmasters, the search algorithms will inevitably evolve to the point where it is easier and cheaper to manipulate human emotion than to manipulate the algorithms directly. Using a dynamic publishing format that reminds people to come back and read again makes it easier to build the relationships necessary to succeed. To quote a friend:

This is what I think, SEO is all about emotions, all about human interaction.

People, search engineers even, try and force it into a numbers box. Numbers, math and formulas are for people not smart enough to think in concepts.

Disclaimer:

All articles are written to express an opinion or prove a point (or to give the writer an excuse to try to make money - although saying that SEO is becoming more about traditional public relations probably does not help me sell more SEO Books).

In some less competitive industries dynamic sites may not be necessary, but if you want to do well on the web long term, it helps to have at least one dynamic site where you can converse and show your human nature.

Earlier articles in this series:

Trending and Tracking the Blogosphere and Newsosphere

Feedback Loops:

Most searches occur at the main search sites and portals (Google, Yahoo!, MSN, AOL, etc.), but some people also search for temporal information, looking to find what is hot right now, or seeing how ideas spread. Not everyone can afford WebFountain, but we can all track what people are searching for or how stories are spreading using:

Feed Readers:
Subscribe to your favorite channels (or topical RSS feeds from news sites)

Blog Search:
search for recent news posted on blogs

Blog Buzz Index:
search for stories rapidly propagating through blogs

General Buzz & Search Volume:

Product Feedback:

News Search:

Test Ad Accounts & Test Media:

  • Google AdWords
  • Yahoo! Search Marketing
  • write press releases and submit them cheaply, using sites like PR Web or PR Leap, to see how much buzz & news search volume there is around a topic
  • post on a topic
    • see if it spreads
    • check referrer data
    • Sometimes stories emerge out of the comments. The Save Jeeves meme originated around the time the person who created that story commented on my post about Jeeves getting axed.
    • Don't forget to have friends tag your story on Del.icio.us and submit it to Digg.
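Checking referrer data can be partly automated. Here is a rough Python sketch that tallies which sites are sending visitors to a post; the referrer URLs below are made up, and real server log formats vary.

```python
# Rough sketch: tally referring sites from a list of referrer URLs pulled
# out of server logs. The sample referrers below are made up.
from collections import Counter
from urllib.parse import urlparse

def referring_sites(referrer_urls):
    """Count how many visits each referring host sent."""
    counts = Counter()
    for ref in referrer_urls:
        host = urlparse(ref).netloc
        if host:  # skip direct visits, which have an empty referrer
            counts[host] += 1
    return counts

refs = [
    "http://www.digg.com/story/123",
    "http://del.icio.us/popular/",
    "http://www.digg.com/story/123",
    "",  # direct visit
]
print(referring_sites(refs).most_common())
# [('www.digg.com', 2), ('del.icio.us', 1)]
```

Running a tally like this daily makes it easy to spot when a story starts spreading through a new site.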

Tagging:
Some are busy tagging what information they think is useful.

  • Delicious - personal bookmark manager.
  • Wink - tag search
  • Flickr - image tagging & hottest tags
  • Tag Cloud - shows graphic version of hot tags
  • Furl
  • Technorati Tags
  • Digg Top Stories
  • Reddit
  • Ning
  • Squidoo
  • My Yahoo!
  • Google Search History (you can't see what others are tagging, but I bet it eventually will influence the search results - Google is already allowing people to share feeds they read)
  • more tagging sites come out daily...lots of others exist, like Edgio, StumbleUpon, Shadows, Kaboodle, etc.
  • also look at the stuff listed in Google Base...there may or may not be much competition there, and Google Base is going to be huge.

Track Individual Stories and Conversations & Trends of a Blog:

Bloggers typically cite the original source OR the person who does the most complete follow up.

Blog Trends:
See if a blog is gaining or losing marketshare and compare blogs to one another

Overall Most Popular Blogs and Stories:

Did I miss anything? I am sure I did. Please comment below.

Here are earlier stories from this series:

Syndication and How News Spreads

A while ago I started publishing bits of an article that I intended to finish quickly, but life slowed me down. Here were the first parts:

Why Bloggers Hate SEOs
Why SEOs Should Love Bloggers
Dynamic Sites and Social Feedback
Controlling Data and Helping Consumers Make it Smarter
Small vs Big and Voice in Brand

I am going to see if I can finish up the article today. Here is the next piece:

How News Spreads:

News has to start somewhere. It doesn't really matter if it comes from blogs or traditional media. A few things are important with both publishing formats:

  • both have incentives to get the scoop or report on stories early
  • both have audiences who can further spread your message
  • both are fairly viral
  • both have lots of legit link popularity
  • getting viral marketing via blogs or news coverage is something that most people will not be able to replicate

Eventually if the story spreads the feedback network becomes the next round of news. If one or two well known reporters write your story other journalists and bloggers may feel like they are missing out if they do not cover it.

The story about me getting sued was picked up by another blogger, then BusinessWeek, then the WSJ. A few hundred blog citations followed. Sometimes news that goes national comes back local, and even then you get bonus links. A Pittsburgh paper mentioned I was sued. That story was syndicated in a Detroit paper, and even got a mention in the local paper's blog.

Newspapers love to syndicate content from each other to lower costs. Sometimes they even syndicate things that don't make sense because they need filler to surround their ads. I have even seen an Arizona column featuring local Rhode Island bloggers.

Google Adlinks: Selling User Trust on Bait and Switch Targeting

I have not tested Google Adlinks much on my own sites, but they run on other sites with threads that talk about me.

I was sorta curious how Google picked "Aaron Wall SEO Book" as a link topic and wanted to see what ads they display for it.

I have so many relevancy points for those types of searches:

  • I rank #1 for either phrase alone or both together.
  • My conversion rate for those searches is amazing, and I have Google Analytics enabled, so they know how amazingly high the conversion rates are.
  • If you search for either of those phrases (or both together) I am at worst #2 on AdWords (typically #1).
  • I have AdSense content ads enabled with a wide variety of those types of terms, and I am nowhere near my daily budget on that campaign.

and yet when I clicked that Aaron Wall SEO Book adlink I did not appear in the ad search results. I also clicked the SEO Inc. adlink and their ad was #9 for their own trademark name.

I realize that to Google it is all just math, numbers, money in the bank, yada, yada. But if it is wrong for competitors to use trademark terms in ad copy, how wrong (and perhaps even a bit unethical, since search engines want to push the bullshit ethics angle) is it for Google to create adlink searches driven by trademarked terms and then potentially not list the trademark sites or any editorial search results?

Is that legitimate comparative advertising? What would Sony think if Google delivered PlayStation adlinks that showed ads for nothing but Xbox games? What happens if Yahoo! sells a link named Google that leads to transsexual porn ads? Where is the line drawn in the sand?

Is everyone who develops a legitimate brand forced into paying Google through the nose for Adlinks on their products or brands so that Google does not flush their brand equity down the toilet?

I bet if you search around there are some interesting adlinks that are a complete bait and switch on trademark terms. Are there any cases associated with the liability of doing that? Should there be?

I am a big believer in aggressive advertising, but is it deceptive for Google to use a trademark term in a link to drive a query to a bunch of ads that may have nothing to do with that topic?

How is this Google adlinks technique any better than typosquatting?

Google Movies OneBox

Not sure if this has been mentioned anywhere yet, but Google created a movies OneBox.

[Image: Google Movies OneBox.]

Philipp also noticed them testing drop down home page navigation.

Johnny Cash Biography AdSense Targeting on Amazon.com

Recently there was a biopic called Walk the Line about Johnny Cash's life. The movie is so popular that the soundtrack and 2 of Johnny Cash's CDs are on the Amazon top 25 CDs list.

Johnny Cash's autobiography is a top 500 selling book on Amazon and yet the page had the following ads:

[Image: off-target AdSense ads.]

I thought AdSense was better at targeting than that, especially for topics that are so commercially popular and in the news. Hmm.

That certainly shows Google still has a long way to go to improve their targeting and profits from contextual ads.

As a special bonus, if you like music you really ought to watch this video.

PPC Spam Eating Soul of Web Content

"I sold my soul for a quarter a click"
- a closet millionaire

Brett Tabke recently posted in his blog a definition of web spam:

So much graphical and textual noise that you can't determine whether you are clicking on a paid advertisement or an actual old-fashioned honest link. When ads are so thick, that you must study the page carefully to determine where the content is at.

That is probably a good secret to highly profitable affiliate marketing or contextual marketing of any type: put the ads where people think they are going to find content. That is what Google teaches people to do. It makes more money. Who can fault us for doing it?

Eventually web users may adjust, but there is some serious CPC to be made until they do.

Brett also mentions that building better authority allows you to get away with being even spammier:

There is a point where ads become so pervasive, that they over power the content and hurt the credibility of a site. If you have a authoritarian site, then that point is much higher than most would believe. I know of one site that has over 25 ads on the page right now and is still considered a top site in it's field.

Which is a great reason why it is worth buying older highly trusted sites, or being lily white from the start. Get the trust. Then get the money.

A while ago I posted that there was a noticeable trend of a shift away from content optimization toward content creation. It seems many sites are founded on the principle that the only purpose of content is to get ads indexed.

It is amazing how much control search engines have over the viability of many publishing business models. As long as I still have at least one or two high quality channels, I don't think I will feel guilty creating a good number of low quality spamesque ones. If Google wants to fund content pollution, does it make sense to buy a hybrid car? ;)

How do Flash Sites Rank Well?

SEO Question: As a long time SEO myself, there is one thing that has me mystified. If you do a search in Google under "chocolate", Godiva comes up #1, Hershey comes up #2. Yet, if you look at their home pages, they have almost no text there. In fact, Godiva has no real text at all. Yes, they have PR6, but still, how is it that these "big boys" come up on top with a home page devoid of any SEO or real text? Is it all links?

SEO Answer: For competitive queries Google's relevancy algorithms are probably about 99% linkage data. Those brands are so strong that their linkage data means they do not need page copy to rank for general relevant terms. Should Starbucks rank for coffee? Few sites are more relevant.

Google does not aim to show the most optimized content. They want to list the most relevant content.

By having limited page copy they may miss out on ranking for longer related queries, since it is hard for search engines to score documents as relevant for long multi-word phrases that occur nowhere in the anchor text or page copy, but for general queries they can still do great.

This was not the brightest move, but about a year ago I switched hosts for one of my sites before fully uploading the site to the new location. The files were rather slow to upload, Google cached the home page while the site was not there, and the site still ranked #6 for search engine marketing.

Sometimes you will hear some SEOs whine about the updates while others claim their techniques are more effective because their clients see more stable results. In hyper competitive markets, the result stability of a particular site often has as much to do with client selection as with the skill level of the SEO. Result stability in competitive markets has a lot to do with how strong a company's brand and traditional marketing are. Ultimately the search engines aim to emulate end users, so brands with significant mindshare in the real world should rank well in the search results too, unless the relevancy algorithms are crap.

A few tips for using Flash (if you must use it):

  • Create descriptive, useful page titles and meta descriptions.
  • Embed the Flash into HTML pages and use regular text links on the page if possible.
  • If it does not screw up the design too badly, add HTML text to the page.
  • Create textual representations of what is in the Flash using noembed tags.
  • Instead of including everything in one Flash file, it may make sense to break the content into different Flash files so you can create different HTML pages around the different ideas contained in it.
  • Macromedia has a search engine SDK, although I think most sites are still best off using textual representations of the Flash files in the HTML content of pages.
  • Mike Knott also recommended this JavaScript plugin for Flash detection. It is XHTML compliant, and, so long as you use it properly, it is better than the noembed tag.
