Majestic SEO - Interview of Alex Chudnovsky


I have been fond of the depth of Majestic SEO's data and the speed with which you can download millions of backlinks for a website. While not as hyped as similar offerings, Majestic SEO is a cool SEO service worth trying out, and their credit based system allows you to try it out pretty cheaply (unless you are trying to get all the backlinks for a site as big as Wikipedia)...as the credits depend on the number of inbound linking domains.

They give you data on your own domain for free, and share a nice amount of data about third party sites for free. For instance, anyone can look up the most well linked to pages on SEO Book free of charge.

What made you decide to create Majestic SEO?

We arrived at it naturally - our main aim with the Majestic-12 Distributed Search Engine project is to create a viable competitor to Google. We use volunteers around the world to help us crawl and index web data. The project was started in late 2004, and about 2 years later it became clear that we need to be as relevant as Google, and in order to do that we have to master the power of backlinks and anchor text. As time went on and many hundreds of terabytes of data were crawled, it also became clear that we need to earn money as well in order to sustain our project. It took well over a year to reach a level we felt confident enough in to release it publicly in early 2008.

What were the hardest parts about getting it up and running?

The most difficult part was to avoid the temptation to simplify the problem and focus on a small subset of data that is much smaller than that indexed by Google. It was felt that it would be a mistake as you can't really be sure that you have the same view of the web unless you are close to Google's scale.

Once it was decided to follow the hard path, a lot of technical scalability problems had to be solved as well, and then we had to deal with the financial aspect of storing an insane amount of data using a sane amount of hardware.

You use a distributed crawl, much like Grub did. What were some of the key points to get people to want to contribute to the project? How many servers are you running?

The people that joined our project did so because they felt that Google was quickly becoming a monopoly (this was back in 2004) and a viable alternative was necessary. We have over 100 regulars in our project who run the distributed crawler and analyser on well over 150 distributed clients: all this allows us to crawl at a sustained rate of around 500 Mbits.

Since we recently moved closer to the commercial world with Majestic-SEO, it was decided that our project participants will benefit from our success by virtue of share ownership - essentially project members are partners. It needs to be stressed here that our members did not join the project for financial reasons.

How often do you crawl? How often do you crawl pages that have not been updated recently?

We crawl around 200 million URLs every day. At the moment our main focus is to grow our database in order to catch up with Google (see analysis here); however, we have dedicated some of our capacity to recrawls. In fact, in February we should release a new version of automatic recrawls of important pages (those with a high ACRank), which will make it possible to see competitor backlink building activity pretty quickly. Our beta daily updates feature shows new backlinks found in the previous day for registered or purchased domains, which gives a chance to see new backlinks before we do a full index update (around every 2 months).

What is AC Rank? How does it compare to Google's PageRank?

ACRank is a very simple measure of how important a web page is, based on the number of unique domains linking to it. More information can be found here: http://www.majesticseo.com/glossary.php#ACRank

This measure is not as good as PageRank because it does not yet "flow" between pages. We are going to release a much improved version of ACRank soon.
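The exact ACRank formula is not given in this interview, but the underlying idea of counting unique linking domains per page can be sketched roughly as follows. The function name and the (source, target) record layout are assumptions made for illustration, and the hostname is used as a simple stand-in for the linking domain.

```python
from collections import defaultdict
from urllib.parse import urlparse


def referring_domain_counts(backlinks):
    """Count unique linking hostnames for each target URL.

    `backlinks` is an iterable of (source_url, target_url) pairs.
    This layout is hypothetical; it is not Majestic's data format.
    """
    hosts_by_target = defaultdict(set)
    for source_url, target_url in backlinks:
        host = urlparse(source_url).hostname
        if host:
            # Many links from one host still count as a single vote.
            hosts_by_target[target_url].add(host.lower())
    return {target: len(hosts) for target, hosts in hosts_by_target.items()}


sample = [
    ("http://a.example/page1", "http://site.example/"),
    ("http://a.example/page2", "http://site.example/"),
    ("http://b.example/", "http://site.example/"),
    ("http://b.example/", "http://site.example/about"),
]
counts = referring_domain_counts(sample)
```

Because the count is taken per target page and nothing is passed along from one page to the next, a measure like this does not "flow" the way PageRank does.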

Do you have any new features planned?

Can't stop thinking about them ;)

You allow people to export an amazing amount of data, but mostly in a spreadsheet basis on a per site basis. Have you thought about creating a web based or desktop interface where people can do advanced analysis?

We offer a web based interface to all this data with ability to quickly export it using CSV format.

For example, what if I wanted to know pages (or sites) that were linking to SearchEngineWatch.com AND SearchEngineLand.com but NOT linking to SeoBook.com AND have a minimum AC rank of 3 AND are not using nofollow. Doing something like that would be quite powerful, and given that you have already done the complex crawl I imagine adding a couple more filters on top should be doable. Another feature that would be cool would be an Optilink-like anchor text analysis feature which allows users to break down the anchor text percentages.

We do have powerful options that enable our customers to slice and dice data in many ways, such as excluding backlinks marked as nofollow or only showing such backlinks. For now this applies to single domain analysis only, but something like what you describe in your example of interdomain linking will be possible soon.
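Until such a feature exists, the query described in the question could be approximated on exported data. The sketch below is only an illustration: the dictionary keys ('source', 'target_domain', 'acrank', 'nofollow') are hypothetical and do not reflect the actual Majestic SEO export format.

```python
def filter_linking_pages(links, must_link_to, must_not_link_to, min_acrank=0):
    """Return source pages that link (followed) to every domain in
    `must_link_to`, link to none in `must_not_link_to`, and meet the
    minimum ACRank. Each link is a dict with hypothetical keys:
    'source', 'target_domain', 'acrank', 'nofollow'.
    """
    followed = {}   # source -> target domains reached via followed links
    any_link = {}   # source -> all target domains, nofollow included
    acrank = {}     # source -> ACRank of the linking page
    for link in links:
        src = link["source"]
        acrank[src] = link["acrank"]
        any_link.setdefault(src, set()).add(link["target_domain"])
        if not link["nofollow"]:
            followed.setdefault(src, set()).add(link["target_domain"])

    required = set(must_link_to)
    excluded = set(must_not_link_to)
    return sorted(
        src
        for src, targets in followed.items()
        if required <= targets                 # links to ALL required domains
        and not (excluded & any_link[src])     # links to NONE of the excluded
        and acrank[src] >= min_acrank
    )


SEW, SEL, SB = "searchenginewatch.com", "searchengineland.com", "seobook.com"
links = [
    {"source": "http://blog.example/a", "target_domain": SEW, "acrank": 4, "nofollow": False},
    {"source": "http://blog.example/a", "target_domain": SEL, "acrank": 4, "nofollow": False},
    {"source": "http://blog.example/b", "target_domain": SEW, "acrank": 5, "nofollow": False},
    {"source": "http://blog.example/b", "target_domain": SEL, "acrank": 5, "nofollow": False},
    {"source": "http://blog.example/b", "target_domain": SB, "acrank": 5, "nofollow": False},
    {"source": "http://blog.example/c", "target_domain": SEW, "acrank": 2, "nofollow": False},
    {"source": "http://blog.example/c", "target_domain": SEL, "acrank": 2, "nofollow": False},
    {"source": "http://blog.example/d", "target_domain": SEW, "acrank": 6, "nofollow": True},
    {"source": "http://blog.example/d", "target_domain": SEL, "acrank": 6, "nofollow": False},
]
matches = filter_linking_pages(links, [SEW, SEL], [SB], min_acrank=3)
```

In this sample only page "a" qualifies: "b" also links to SeoBook.com, "c" falls below the ACRank threshold, and "d" reaches SearchEngineWatch.com only through a nofollow link.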

Have customers shared with you creative ways to use Majestic SEO that you have not yet thought of?

We get good customer feedback and often implement customer requested features to make data analysis easier. As for new creative ways, our customers prefer to keep them to themselves, but once you look at the data you might see one or two good strategies on how to use it. ;)

How big of an issue is duplicate content while crawling the web? How do you detect and filter duplicate content?

It is a very big issue for search engines (and thus us) as many pages are duplicate or near duplicate of each other, with very small changes that make it hard to detect them. We do not currently detect such pages (we crawl pretty much everything) though we have a good idea how to do it and will implement it soon. Our reports tend to show data ordered by importance of the backlink, so often it is not an issue though it depends on backlinking profile of a particular site.

A lot of links are rented/bought, and many of these sources get filtered by Google. Does your link graph take into account any such editorial actions? If not, what advice would you give Majestic SEO users when describing desirable links vs undesirable ones?

At the moment our tools report factual information on where backlinks were found; we do not currently flag links as paid or not. This is something that humans are good with and computer algorithms ain't - that's why Google hates such paid links so much. We do have some ideas however on how to detect topically relevant backlinks (paid would usually come from irrelevant sites) - it's coming soon and might actually turn out to be a ground breaking feature!

Microsoft has done research on BrowseRank, which is a system of using usage data to augment or replace link data. Do you feel such a system is viable? If search engines incorporate usage data will links still be the backbone of relevancy algorithms?

BrowseRank is a very interesting concept, though we are yet to see a practical implementation on a large scale web engine. I don't think such a system obsoletes link data at all. In fact it is based on link data just like PageRank, only it makes it possible to detect the most relevant outgoing links on a page; essentially, such votes should be given more weight in PageRank-like analysis. For example, imagine that this very interview page is analysed using BrowseRank and it finds that the following cleverly crafted link to the Majestic-SEO homepage is clicked a lot; then such a link could be judged as the real vote that this page gives out!

This approach would help identify the more important parts of on page content as well, so that keyword matches within such a content block could get a higher score in ranking algorithms. So I actually think there is a lot of mileage in the BrowseRank concept, but it would be a mistake to think that it will completely replace the need for link data analysis. I am pretty sure Google uses something like this already - Google Toolbar stats would give them all they need to know.

The great irony in my view is that Microsoft lacks good web graph data to apply their browsing concept. This is probably why they are so desperate to buy Yahoo's search operations, which are much better when it comes to backlink analysis, though Google are the real masters. Majestic-SEO is trying to slot itself just behind Google, and who knows what happens after that ;)
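The idea in this answer, that heavily clicked links should count as stronger votes in a PageRank-like analysis, can be illustrated with a small sketch. This is not Microsoft's BrowseRank algorithm and not anything Majestic has described implementing; it is a plain power iteration in which each page splits its vote in proportion to assumed click counts. The function name, the input layout, and the sample numbers are all made up for the example.

```python
def click_weighted_rank(clicks, damping=0.85, iterations=50):
    """PageRank-style scores where each outgoing link is weighted by clicks.

    `clicks` maps source_page -> {target_page: click_count}. The layout
    and the click data are hypothetical.
    """
    pages = set(clicks)
    for targets in clicks.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for source in pages:
            targets = clicks.get(source, {})
            total = sum(targets.values())
            if total == 0:
                # Page with no clicked links: spread its vote evenly.
                for page in pages:
                    new_rank[page] += damping * rank[source] / n
                continue
            for target, count in targets.items():
                # A link's share of the vote equals its share of the clicks.
                new_rank[target] += damping * rank[source] * count / total
        rank = new_rank
    return rank


sample_clicks = {
    "interview": {"majestic": 90, "other": 10},
    "majestic": {"interview": 1},
    "other": {"interview": 1},
}
ranks = click_weighted_rank(sample_clicks)
```

With these made-up numbers, the page receiving 90% of the clicks from the interview page ends up with a higher score than the one receiving 10%, even though each has exactly one inbound link.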

I look up a competing site and see that a competitor has 150,000 more links than I do and feel that it would take years to catch up. Would you suggest I look into other keywords & markets, or what tricks and ideas do you have for how to compete using fewer links, or what strategies do you find effective for building bulk links?

First of all: don't panic! :)

Secondly, use the SEO Toolbar, which will query our Majestic-SEO database to show the number of referring domains - it may well be very few.

Thirdly, consider investing in the detailed stats we have on this domain: these will tell you the anchor text used and the actual backlinks, which you can analyse by their importance (we measure it using ACRank). Once you see real data a lot of things can become clear: for example, you can see whether your competitor has lots of backlinks pointing just to the homepage or spread around the site. Seeing actual anchor text is really an eye opener - it can show which keywords the site was optimised for, and this will allow you to make a good decision on whether you can catch up or not. Chances are you may find that your competitor is weak for some keywords; this is where a keyword research tool like Wordtracker is invaluable.

And finally consider that a few good relevant backlinks are likely to be worth more than many irrelevant ones: it is those backlinks that you want to get and knowing where your competitor got them should help you create a well targeted strategy.
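As a rough illustration of the anchor text analysis mentioned in this answer, the share of each anchor text in a backlink export could be computed along these lines. The function name and the sample anchors are assumptions for the example, not part of any Majestic SEO tool.

```python
from collections import Counter


def anchor_text_percentages(anchors):
    """Return {anchor_text: percent_of_links}, most common first.

    Anchors are compared case-insensitively and empty anchors are
    skipped. `anchors` is a plain list of strings (a hypothetical input).
    """
    normalized = [a.strip().lower() for a in anchors if a.strip()]
    total = len(normalized)
    if total == 0:
        return {}
    counts = Counter(normalized)
    return {
        text: round(100.0 * count / total, 1)
        for text, count in counts.most_common()
    }


sample_anchors = ["SEO Book", "seo book", "SEO tools", "click here", "  "]
breakdown = anchor_text_percentages(sample_anchors)
```

A breakdown dominated by one or two phrases suggests which keywords a competitor was optimised for, while phrases that are absent hint at where they may be weak.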

You allow people to download a free link report for their site. How does this report compare to other link sources (Yahoo! Site Explorer, Alexa links, Google link:, and links in Google Webmaster Central)?

We give free access to verified websites. This is a great way to try our system, and you might see backlinks that you won't find elsewhere, because our index is so large and we show you all the backlinks we have (rather than the top 1000). This will include backlinks from "bad neighbourhoods" that you may not be shown in other sources (these are not yet automatically marked by our system, but visual human analysis wins the day here).

We believe that our free access reports are the best in class. Since it's free, why not find out for yourself?

For analyzing third party sites you have a credit based system. How much does it cost to analyze an average site?

The price depends on how large (in terms of external referring domains) a particular website is. We have some sites that have hundreds of millions of backlinks, so the average would be very different depending on what you are really after. The best option is just to run searches on our website for the domains that you are interested in; this will give you very interesting free information as well as the price for full data access.

For a domain like Wikipedia I might only want the links to a specific page. Are you thinking about offering page level reports?

Yep, I am thinking of it - I have actually had requests like this, funnily enough with Wikipedia being the main object of interest.

What is the maximum number of links we can download in 1 report?

Our web reporting system tries to focus on the most valuable backlinks to avoid information overload; however, we allow a complete dataset download that will include all backlinks - some of our clients have retrieved data on domains with well over 100 million backlinks! Using our powerful analysis options you can focus on backlinks for particular URLs coming from particular pages and retrieve all qualifying data.

------

Thanks Alex. For more information on Majestic SEO please visit their site and look up your domain.

Barack Obama Earns the 2009 Domainer of the Year Award

It is no secret that Obama is great at public speaking, building a fan base, working the press, and leveraging new distribution channels, but one of his most overlooked marketing achievements is...domain name selection. Anyone who has watched Idiocracy should appreciate the domain names that government programs are now launched under.

Before Obama was sworn in, he launched Change.gov under The Office of The President-elect. Change is an easy concept to grasp after 8 years of crony capitalism and fraudulent wars built on lies from international war criminals. But "change" in and of itself is a tool, not a destination...where is it going?

While launching a plan to increase government spending by nearly a trillion dollars to "create" millions of jobs, upgrade infrastructure, and computerize the national health care system (what hidden costs might come to individuals from that "change"?), Barack announced that his plan to spend this money will be tracked under Recovery.gov.

Part of Obama's spending plan involves "expanding broadband access to millions of Americans so businesses can compete on a level playing field, wherever they are located." But the playing field has never been level (just look at how the bankers rewrote the consumer bankruptcy laws to shaft consumers a few years before the banks were begging the government for trillions of dollars of handouts).

Investments in the web will increase the value of web assets, but the increased competition will make it harder to gain attention and exposure unless you have capital to invest, and invest it wisely. Just this week a corporation worth tens of billions of dollars put one of our core keywords in the page title of their home page! We still outrank them, but for how long?

I thought it would be at least a decade before the United States government started domaining. Many large corporations are sure to catch on soon, increasing domain prices and closing off a great investment opportunity for smaller players. I have been busy over at BuyDomains looking for good names to hoard and build, picking up another 3 yesterday. SEO Book members have access to a coupon code to get 15% off BuyDomains domain names on our member discounts & coupons page.

Spying on Customers & SEO Data Aggregation

We Do Not Spy on Our Customers

I have had a very well known SEO company dust one of my best link building strategies (outing it directly to a Google engineer) because I was trusting enough to mention how effective it was inside our training program, thinking that a competitor would not out it, but I was wrong! At least I know what to expect, and can use that knowledge to mitigate future risks.

One of the common concerns about the SEO Toolbar is something along the lines of "does it phone home" or "are you spying on us" or "what data is it sending you". Some SEO companies offer a huge EULA and do spy on the people who use their toolbars, but we do not do that for a number of reasons:

  • I felt rather angry when that well known SEO company outed my site (and haven't really trusted them since then)
  • I never really liked the idea of spying on customers, and going down that path could harm our perceived brand value
  • knowing that information is kept private adds value and builds trust
  • we are already under-staffed (running quite lean) and have more projects to work on than time, so we are not in need of new projects
  • With all the great competitive research tools available now (like Microsoft Ad Intelligence, Google Search-based Keyword Tool, Compete.com, SEM Rush, and many others) it is easy to get a lot of keyword data quickly, and I see little value add in spying on our users.

Why Give Away so Much Value?

It is pretty obvious that the trend in software (since the day I got on the web) is that open source software is commoditizing the value of most software products and tools. Providing tools that require limited maintenance costs and provide access to a best of breed collection of SEO tools makes it easy for us to evolve with the space and help our customers do so, without building up a huge cost sink that requires raising capital and having to listen to some icky investors. :)

The reason we can (and do) provide so many free SEO tools is because I feel doing so...

  • makes the web a better place (Tim O'Reilly says you should create more value than you capture)
  • offers value to the community
  • extends opportunity to more people around the globe (anyone who is just fresh starting out like I was ~6 years ago could use the help)
  • commoditizes the value of some bloated all-in-one SEO software (many of those products generally lack value and misguide people)
  • makes it hard for con-artists to sell hyped up junk (by commoditizing the value of their offerings to all but the most desperate of get rich quick folks)
  • helps to educate potential future customers (when we did a survey recently about 80% of our customers have been practicing SEO for over a year)
  • is an affordable distribution strategy for brand awareness
  • builds trust by delivering value for free (rather than trying to squeeze every penny out of potential customers)
  • is a big differentiator between us and most SEO websites

In addition to all the above points, most of the tools we create are tools I want to use. So the cost of building them would still be there even if we did not share them. Sharing them gets us lots of great user feedback to improve them, and does not cost us much relative to the potential upside.

Small Industry, Lightweight Strategy

Rather than centralizing things, we like to rely on a distributed software strategy which has a much lower cost structure.

That strategy allows this site (with a popular blog, an array of tools, some videos, training modules, and an active community) to run on 1 server. We find the Plenty of Fish story inspiring, though doubt we will need his distributed computing skills anytime soon given how small our industry is. After 5 years we are still millions of visitors and over a billion monthly pageviews behind Plenty of Fish :)

Though we are doing ok in our little corner of the web :)

We have analytics on our website to help us see where we are getting coverage, and to measure and improve conversions (an area ripe for opportunity given our brand exposure and site traffic). We may add relevant affiliate links and offers to some of our SEO tools to help pay for the 10s or 100s of thousands of dollars we spent developing our various tools (for example, see how we integrated a link to our Wordtracker keyword guide and the Wordtracker keyword research service in our keyword tool). But we have no need or desire to spy on users who download our tools. Spying and outing are poor strategies for professional SEOs to employ....they erode trust and value.

Advanced SEO Toolbar Functions

I thought it would be worth highlighting a few of the advanced features in the SEO Toolbar. Some of the highest value ideas do not consist of looking at one data point, or boiling things down to 1 arbitrary and meaningless number (like many "professional" SEO tools do), but consist of looking at many data points across multiple sites, and hunting for inconsistencies that help you build new profitable traffic streams. Along those lines, I thought I would run through a few ideas to get your juices flowing...there are dozens more like these :)

The advanced tips are here.

How Twitter Can be Corrosive to Online Marketing

In the past when you did something quite cool and attention-worthy people would reference it on their blogs. But now in the age of Twitter, many people mention your stuff on Twitter. This can be good if they have thousands of Twitter followers, but if most of the people mentioning a topic are all in the same small tight knit space then you are only reaching a fraction of a fraction of the potential distribution you would have had before the age of Twitter.

  • How many people read every Twitter update from the people they subscribe to? Very few. Since you are in a high volume aggregator the loyalty is nowhere near where it is with traditional blog subscribers.
  • Exciting news quickly falls into the archives due to the rapid nature and high volume of Tweets.
  • If you dominate a channel and keep reaching the same people over and over again that does help provide social proof of value, but after seeing the same message 5 or 10 times it becomes noise.

Worse yet, even though Twitter mentions are organic links and recommendations by highly trusted topical experts, those don't show up on the broader web graph since Google pressured Twitter into adopting nofollow EVERYWHERE, even for user profiles.

And unlike Delicious...

  • most people do not have automated mechanisms to dump their daily Tweets / Tweet links into their blog to provide trusted direct links
  • people rarely use Twitter as a bookmarking service, so it is rarely worth searching into yesterday's content. The Twitter content is very zen-like...here today, gone today.

Multiple people asked me to add their RSS feeds to the default set in the SEO Toolbar, which was soon to be downloaded by over 10,000 webmasters. And for wanting all that exposure (and future exposure) they didn't even post about it on their blog. They mentioned it on Twitter...where the same 3,000 people saw the message 20 times each. No value add whatsoever.

Out of over 21 pages of Tweets (300+ Tweets) mentioning "SEO Toolbar" in the last 3 days, Yahoo! is showing less than 10 inbound links to the SEO Toolbar page that came from sources other than direct friend requests, social news sites, or automated links brought on by that exposure. Twitter is pretty worthless as a link building strategy, even if you are giving away something that is both free and better than similar tools selling for hundreds of dollars.

Even if you have a strong launch and a product far superior to related products, the exposure you get may not matter if your coverage is stuck on Twitter. It is a connecting medium, but it doesn't make money:

Venture Beat says that Twitter made Dell a million dollars. That's nuts. Did the phone company make Dell a billion dollars? Just because people used the phone to order their Dell doesn't mean that the phone was a marketing medium. It was a connecting medium. Big difference.

Is Twitter a nice complementary channel that can add exposure to your launch? Absolutely. But if the conversation does not leave Twitter.com then it has quite limited value in a search-driven Google-centric web. And that limited value is even less if you don't already have thousands of Twitter followers.

The "make money on Twitter" ebooks will be coming out soon, but other than the ebook authors, I doubt anyone will make much money from it (unless customer feedback helps them create new product lines).

Thanks for the Feedback & All You Need is Love

Early feedback on the SEO Toolbar has been quite positive.

You know people like a Firefox extension when an official Microsoft blog makes an entry promoting it!

Have a good weekend everyone!

Actually I just wanted an excuse to embed this Beatles video in a blog post :)

Time to Update the SEO Toolbar

Earlier today I called an older version of the SEO Toolbar that does not have the update option built in it. If you downloaded it earlier today, please download again from
http://tools.seobook.com/seo-toolbar/

The current version should be 1.0.1 (rather than 0.1). Sorry about the error on the updating part...but you won't have to download it again after this time...the update feature will work, and it is safe to just download it now as it will write over the earlier version of the extension.

The SEO Toolbar

What would happen if you smooshed together many of the best parts of Rank Checker, SEO for Firefox, the best keyword research tools across the web, a feed reader (pre-populated with many SEO feeds), a ton of competitive research tools, the ability to compare up to 5 competing sites against each other, easy data export, and boatloads of other features into 1 handy Firefox extension? Well, you would have the SEO Toolbar.

Around the Web in 20 Links

Loren Baker is holding a 3 day spring break SEO get together in Deerfield Beach, Florida. There are only 200 tickets cheaply priced at $500 each (especially when you consider that there will only be 200 people attending and you have people like Loren Baker, Chris Winfield, Todd Malicoat, Brent Csutoras, Rae Hoffman, and more speaking).

David Harry has been publishing brilliant content, including finding Yahoo!'s patent for automating SEO and a wonderful post on SEO higher learning.

Todd Malicoat highlights some great social media marketing tips.

Conversion Rate Experts shared some tips on why to get obsessed with conversion rate optimization.

Fantomaster highlights how behavioral metrics would create many surfbot nets, and offers an insightful cynical comment about the dangers of trusting data mining outfits:

Conclusion #1: The more they know about you as an individual, the more likely they will be to try and track and, as required, exploit or manipulate you - be it as a consumer, as a citizen i.e. a polity member, as a (perceived) health hazard, as a (perceived) sociopath, as a (perceived) security risk, etc. etc.

Conclusion #2: The better they are able to categorize you (aka slap some generalized "profile" of theirs onto you), the easier it will be for the process to become self-perpetuating and auto-referential: anything you may do or avoid doing (as tracked and monitored by them) will actually only reinforce their hold on you - both as an individual and as a member of whichever societal group or subgroup you may belong to.

SugarRae highlights how you can rank well quickly, without focusing on SEO.

Dazzlin Donna is offering a mini-stimulus jump start plan.

Andrew Goodman and Aaron Goldman highlighted some shortfalls behind bid management software - namely that some of the rules are too concrete, give credit to the wrong spots, and don't provide a huge competitive advantage since there are sooo many services and in house technologies being built that commoditize most of the offerings. Time naturally commoditizes most software, especially in saturated fields. Just look at how Google Analytics ate the analytics market.

Michael VanDeMar shares some tips on how to find images.

And if you are into economics it sure looks ugly. The market casino is rough, debt to GDP is huge (and like bank credit is STILL GROWING). The ending isn't going to be pretty. I am trying to ignore the market and spend more time and effort investing in myself, but the carnage keeps attracting attention! I have lost faith in the US government and US dollar, but ponder where to invest.

What else of interest have I recently missed?

Google: Closing the Loop on Content, Advertising, & Commerce

Every listing site or review site has to start off from scratch at some point. Over the past 3 or 4 years it has got much harder to rank thin affiliate database sites, and now that is only going to get harder, with Matt Cutts asking for spam reports on empty review sites.

Of course if Amazon.com or TripAdvisor or Craigslist open new sections they can probably get away with using duplicate or thin content based on the strength of their brands. Branded networks can always throw out a new related niche site and have it be seen as being above board:

The internet is fast becoming a "cesspool" where false information thrives, Google CEO Eric Schmidt said yesterday. "Brands are how you sort out the cesspool."

But new competitors are going to have a hard time building the budget and funding the brand exposure needed to rank because SEO is getting more complex, and if you don't have enough brand or enough AdWords spend you pretty much are not going to get the exposure needed to get consumer reviews and rank organically, unless you license/steal/borrow/mix/re-mix content to build an opening "reviews" database. Some software tools, like Web Data Parser, make the process easier, but you still need to wrap everything in some type of value add (good design, mash ups, etc.). Or have great public relations. Or start your site off as an editorial only play, where you review what interests you, and then move the brand into the reviews space after you get some momentum and an organic traffic flow.

Matt Cutts explained how thin listing pages may be against their guidelines:

Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don’t add much value for users coming from search engines.
….
Don’t create multiple pages, subdomains, or domains with substantially duplicate content.
….
Avoid “doorway” pages created just for search engines, or other “cookie cutter” approaches…

and yet Google crawls form boxes to generate new URLs.
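As a small illustration of the robots.txt guideline quoted above, the sketch below defines a rule that blocks a site's internal search results pages and checks it with Python's standard `urllib.robotparser`. The `/search` path and the example.com URLs are assumptions for the example; the path a real site uses for its search results will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep crawlers out of internal search results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Auto-generated search results page: should be disallowed.
search_ok = parser.can_fetch("*", "http://www.example.com/search?q=widgets")

# Ordinary content page: should remain crawlable.
review_ok = parser.can_fetch("*", "http://www.example.com/reviews/widget-x")
```

Checking rules this way before deploying them helps avoid accidentally blocking the pages that are meant to rank.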

Search is growing more subjective, becoming more about competition and expanding the ad channel. Think like a black hat. You have to stay ahead of Google's internal products & services if you want to avoid the spam label.

The shopping search engines/price comparison sites spend enough on AdWords to be considered a value add user experience (they give AdWords a broad backfill baseline inventory which other merchants have to compete against), but if Google can evolve their Product Search into a revenue stream and encourage reviews then many shopping search engines will soon run out of steam.

A Microsoft engineer notes:

I believe that the locus of advertising will gradually shift towards the creation of valuable and compelling content. There is, however, a relative dearth of professionals or companies that can provide such content creation services. Perhaps advertising agencies might evolve in this direction, or perhaps this may be an opportunity for forward-thinking individuals?

Eventually Google will need to become more of a content play if they want to keep growing revenues. This is why...

And if Google co-opts the media, that makes it hard to give them serious negative press. Eric Schmidt thinks the press needs to be more tightly integrated into Google:

I think the solution is tighter integration. In other words, we can do this without making an acquisition. The term I've been using is 'merge without merging.' The Web allows you to do that, where you can get the Web systems of both organizations fairly well integrated, and you don't have to do it on exclusive basis.

Google's growing depth gives it a huge network advantage. More advertisers = more relevant ads = higher monetization with better user experience & more user loyalty. Microsoft is trying to buy marketshare and will likely push search harder in Windows 7, but it might be too little too late.

Yahoo! screwed up their US advertiser terms of service AND gave up on their international contextual ad service, giving Google yet another competitive advantage.

After reading John Andrews write a great review of Affiliate Summit I got thinking about some of Google's potential moves...

  • give consumers discounts for reviewing merchants and products to quickly build up a leading reviews database
  • broaden the AdWords ad system to allow room for more CPA deals / lead gen inside the SERP
  • offer free hosting and CMS for Google AdWords customers (& track inventory)
  • offer credit cards, or perhaps their own “goog” currency system, pegged to a basket of currencies
  • start buying out leading players in large verticals (Expedia - $2.5B, Bankrate - $600M, Monster.com - $1.2B, and/or WebMD - $1.2B) to strengthen their network advantage
