I can plumb around Google blocking it, but there are a limited number of types of webmaster tools that interface with search engines that can be provided to the general public without either being cloned by the search engine or having the search engine serve you some type of retribution for creating them.
Editorial judgements are rarely equitable, and nobody wants to have sitelinks only to have them appear at the top of the 5th page of the search results for their own brand.
New Media is a Key to Growth (ish)
I have never created a Facebook application and have no intention of doing so, because if I am successful they would likely steal my idea and find a way to ban or silence me and/or halt and clone my project. Which is sorta what Kevin Rose did to a Digg member who created an unofficial Digg group on Facebook.
The Transition From Open to Closed
Sure, the Google Maps API is open today, and so are many other data sources, but after they buy enough market share, look for that to change. The big networks are only open in markets they are losing. What did they do to their SOAP search API after they had enough market leverage? They killed it.
Relying on APIs or scraping data from someone else's platform only has value if you can aggregate it from many sources, do it in a way that is hard to block, add substantial value, have alternative data sources, and you are creating something that you know the data sources you are relying on will not clone for a strategic reason.
Wanted: Writer, Editor, & Marketer...Pay: $0
All these networks pretend that they care about you, but they are vultures. Their data is their data. Their ideas are their ideas....and so are your ideas, unfortunately. If you find yourself becoming someone else's user generated content, or your business can be described as a feature on someone else's product, you are wasting your time.
Joost created an SEO link analysis extension for Firefox that shows link anchor text and PageRank on Yahoo! Site Explorer, Google Webmaster Central, and Microsoft's webmaster portal. I also updated SEO for Firefox to fix a Yahoo! Search error, but to get it to update you have to uninstall and reinstall it because I did not update the versioning data and my programmer is a bit backed up at the moment.
The Website Health Check tool aims to provide a simple and intuitive interface for seeing if your site has any major SEO issues. The site queries Google to grab the pages you have indexed in Google, and looks for issues amongst the first 1,000 results.
If your site is exceptionally large, you can use the date based filters to view a sample of recently indexed pages in Google to see if there are any duplication issues amongst those pages.
Questions Answered by the Website Health Check Tool
Is Google indexing your site? Are they quickly indexing your new pages?
Do you have duplicate content pages getting indexed in Google?
Do you have canonical URL issues?
Are any of your pages in Google missing page titles?
Does your server send correct error messages?
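To make a few of these questions concrete, here is a minimal sketch of how missing titles, duplicate titles, and canonical URL variants could be flagged once you have a list of indexed URLs and their titles. It is not the tool's actual code. The function names, the input format of (url, title) pairs, and the normalization rules (dropping www, trailing slashes, and common index file names) are all assumptions made for illustration, and it does not cover the server error message check.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def canonical_key(url):
    """Collapse common URL variants into one key (assumed rules, not the tool's)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path
    # Treat /index.html style URLs as the directory they live in.
    for index_name in ("index.html", "index.htm", "index.php"):
        if path.endswith("/" + index_name):
            path = path[: -len(index_name)]
            break
    path = path.rstrip("/") or "/"
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def health_report(pages):
    """pages: iterable of (url, title) pairs, e.g. collected from a site: query."""
    missing_titles = []
    by_title = defaultdict(list)
    by_key = defaultdict(list)
    for url, title in pages:
        clean = (title or "").strip()
        if not clean:
            missing_titles.append(url)
        else:
            by_title[clean.lower()].append(url)
        by_key[canonical_key(url)].append(url)
    return {
        "missing_titles": missing_titles,
        # The same title on more than one URL hints at duplicate content.
        "duplicate_titles": {t: u for t, u in by_title.items() if len(u) > 1},
        # Several indexed URLs collapsing to one key hints at a canonical issue.
        "canonical_issues": {k: u for k, u in by_key.items() if len(u) > 1},
    }
```

Feeding it a handful of results where both http://example.com/ and http://www.example.com/index.html are indexed would list that pair under canonical_issues, and under duplicate_titles if they share a title.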
This tool is in beta. Please leave feedback below.
I sent the programmer this URL and he would love to get your feedback on what you think of it. We are looking to have version two out before the end of the month.
Features We Are Looking to Add
Allow you to search for not just a site, but a site and a keyword, like [seobook.com seo]
Add indexed page counts from all major global search engines (Google, Yahoo, Microsoft, Ask)
Allow webmasters to grab results from any of the above 4 engines, or mix and match
Make each data point we collect link to the source
What other features would you like to see?
Video About How to Use the Website Health Check Tool
Michael Jenson from Solo SEO recently emailed me about a cool new free SEO tool he created called Index Rank. After seeing my post about Google date based filters, Michael created the Index Rank tool, which allows you to see the growth of a site's profile in Google based on the number of pages indexed over different periods of time. The tool also allows you to compare multiple sites against each other.
Why is this data useful?
Since Google removed the supplemental results label, the next best thing we have to test site trust for lower end longtail pages is how quickly new pages are getting indexed.
If you see a rapid increase in indexing you know that is caused by an increase in domain trust due to better inlinks, an increase in content creation that leveraged unused authority the site was sitting on, solving a crawling issue, improving internal site architecture, or some technical issue that might be associated with creating duplicate content pages.
If everything you create is getting indexed you may consider creating content at a faster rate, perhaps using sub-brands off subdomains.
If you keep pumping out content but are not seeing your indexing stats go up, that is a cue to build links.
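The rules of thumb above can be sketched in a few lines. This is only an illustration: the function names and the 0.9 and 0.5 cutoffs are assumptions I picked for the example, not numbers from Index Rank or from Google, and you would supply the counts yourself (pages you published in a period and pages Google shows as indexed for that period).

```python
def recent_share(indexed_recent, indexed_total):
    """Fraction of a site's indexed pages that were indexed in the recent window."""
    if indexed_total <= 0:
        return 0.0
    return indexed_recent / indexed_total


def suggest_next_step(published_recent, indexed_recent, healthy=0.9, weak=0.5):
    """Turn publishing and indexing counts into a rough next step.

    healthy and weak are made-up cutoffs; tune them for your own site.
    """
    if published_recent <= 0:
        return "no new content to evaluate"
    ratio = indexed_recent / published_recent
    if ratio > 1.0:
        # More pages indexed than published can point to duplicate content.
        return "check for duplicate content pages"
    if ratio >= healthy:
        return "create content at a faster rate"
    if ratio < weak:
        return "build links"
    return "keep watching"
```

For example, suggest_next_step(40, 38) lands in the "create content at a faster rate" bucket, while suggest_next_step(40, 10) comes back as "build links". Comparing recent_share across a few competing sites gives a rough read on whose index footprint is growing fastest.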
The people from SEO Digger recently put together some research on search spam. Some of the terminology they use (like using the word illicit) is inaccurate, but the trends they discovered align well with what one would expect.
In high money niches, spam sites tended to dominate longer search queries while having less exposure in search results for shorter queries. View the below graph with adult, pills, dating, cars, gifts, and casinos. It shows the normalized density of spam sites ranking in Google by 1, 2, and 3 word queries.
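If you want to run a similar comparison on your own niche, here is a rough sketch of one way to compute it. I do not know exactly how SEO Digger normalized their numbers, so this assumes the simplest reading: the share of ranking results labeled spam at each query length, then scaled so the shares for a niche sum to 1. The function names and the (query, is_spam) input format are made up for the example.

```python
from collections import defaultdict


def spam_density_by_length(labeled_results):
    """labeled_results: iterable of (query, is_spam) pairs, one per ranking result.

    Returns {word_count: share of results at that query length labeled spam}.
    """
    totals = defaultdict(int)
    spam = defaultdict(int)
    for query, is_spam in labeled_results:
        words = len(query.split())
        totals[words] += 1
        if is_spam:
            spam[words] += 1
    return {n: spam[n] / totals[n] for n in sorted(totals)}


def normalize(density):
    """Scale the densities so they sum to 1 (an assumed normalization)."""
    total = sum(density.values())
    if total == 0:
        return {n: 0.0 for n in density}
    return {n: value / total for n, value in density.items()}
```

A niche that follows the trend in the research would show the normalized value climbing as you move from 1 word to 3 word queries.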
Why is Casino an Anomaly?
I believe the reasons casinos appear so tight-knit are:
US advertising laws and gaming laws prohibit some of the common spam related revenue streams
leading online gaming sites have heavily embraced both offline advertising and SEO
people who gamble tend to be quite passionate about gambling
That passion means gamblers are more likely to participate in community sites in that niche, which further consolidates traffic streams due to network effects and creates a lot of free on-topic content for some of the major community-driven sites.
Effective Search Spamming Business Models
Given this research, if you were to create a business model revolving around spamming, it makes sense to focus on the long tail of search. Get enough PageRank to get your pages indexed, but do not worry about accumulating enough PageRank to try to rank for core keywords in the spammy niches. Plus, staying away from the core keywords makes your sites less likely to get booted after a manual review and/or a competitor snitching on you.
Spam & Ranking Low Trust New Sites
The exact same trend that is seen between real sites vs spam sites is paralleled when considering new websites vs older websites.
Older websites that are heavily linked to and heavily trusted dominate the core category related keywords.
Longer search queries have fewer matches in the search database, and are thus more reliant on the on-page aspects of SEO.
Older sites cannot possibly cover all the related longtail search phrases adequately, so newer sites with less authority rank for many of the more accessible longtail keywords.
If you create a new site you can set your goals on ranking for core category keywords, but realize that longtail traffic will come first. If Google lets entire categories get dominated by spam pages then there has to be an associated opportunity to rank real pages.
I just updated SEO for Firefox to include Compete.com website rank and Compete.com monthly uniques. If you leave Compete.com in on demand mode it tends to work quite well. I am also going to ping the guys at Compete.com to ensure the automatic mode gets to be pretty reliable too. Compete.com data is far better than Alexa because it has less of a webmaster bias.
Justin Laing recently emailed me to let me know about his SEO sitefinder tool, which uses the ODP and the Internet Archive to find DMOZ listed websites that have not been updated in a while.
Domain Tools also allows you to find expiring domains that will be up at auction soon. You can view their top picks or use the right rail filters on that page to search for DMOZ and Yahoo! Directory listed domains.
Free tools such as DropScout allow you to find expiring high PageRank domains.
You can also look at TDNam for expiring domains, and either use software to filter through those OR sort the results by bids and prices. Some of the domains with many bidders are pure play domainers, but others are old trustworthy sites in need of a good loving owner.
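If you export a list of expiring domains, the filtering and sorting described above takes only a few lines. This is a sketch under assumptions: the field names (name, bids, price, in_directory) are hypothetical and do not match any particular export from TDNam or Domain Tools, so rename them to fit whatever data you actually have.

```python
def shortlist(domains, min_bids=1, require_directory_listing=False):
    """Filter expiring domains, then sort by most bids first and lowest price second.

    domains: list of dicts with hypothetical keys
             "name", "bids", "price", and "in_directory".
    """
    picked = [
        d for d in domains
        if d["bids"] >= min_bids
        and (d["in_directory"] or not require_directory_listing)
    ]
    return sorted(picked, key=lambda d: (-d["bids"], d["price"]))
```

Sorting only surfaces candidates. You still need to look at each site's history by hand to tell a pure play domainer name from an old trusted site.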