Jon Glick Interview

Jon Glick is one of the leading experts on search, having literally written the code at leading search engines and later become an SEO professional. I remember speaking with him in 2004 at the Ghost Bar in Las Vegas, and it was perhaps the most fascinating conversation about search I have ever been part of. I have wanted to interview him for years & just recently was able to. :)

In some past interviews (like this one) you have highlighted how Google's key strength is perhaps brand rather than relevancy. After seeing Yahoo! bow out of the search game do you still hold that same opinion? What do you think of the Bing brand?

Brand is still Google’s strongest competitive asset in search. It means that to get someone to switch you have to be significantly better than they are, which is a tall order. Bing is the first search offering from MSFT that is in the same league as Google, so at this point it’s more about branding and positioning than objective quality. If Bing were a standalone brand it wouldn’t have a chance, but it has the advantage of default positioning in IE, so for now it just has to be close enough that people won’t swap it out. Over time Bing may develop some interesting differentiation from Google, but that’s not really the case right now (though it does seem to be pressuring Google to experiment and innovate a bit more). It’s been quite a while since using an MSFT product was “cool,” and Bing has that drag on its brand.

Some of the new upstarts entering the search game seem to believe that the thinning of the herd is creating an entry opportunity. Have you checked out Blekko yet? Do any other new general search projects interest you?

Google rose to prominence during the dot-com bust, when the existing players were quite uninterested in search, since at the time (pre-PPC) it was a money loser. Search is so ridiculously lucrative right now that any promising technology that starts to get traction or buzz is likely to be quickly acquired by one of the major players as a blocking measure. Google’s rumored attempt to acquire Cuil for $80MM pre-launch is an example. There is an opportunity, but it’s more about getting bought out for a sweet price than taking down the SEs.

There is also so much manual tuning in search these days that even a great system will take a lot of effort to return great results. “Plumber OR Pipefitter” is a Boolean query; “Portland OR Plumber” is not (it is someone looking for a plumber in Portland, Oregon), and someone’s got to build code to recognize that. This is where the existing players have a huge legacy advantage.
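To make the “Portland OR Plumber” point concrete, here is a toy sketch of the kind of hand-written rule someone has to build. The city lexicon, the function name, and the three-token restriction are all hypothetical; a real engine would lean on query logs and far richer signals, and this is not how any particular engine does it.

```python
# Hypothetical lexicon: city names that are often followed by "OR" (the Oregon abbreviation).
OREGON_CITIES = {"portland", "salem", "eugene", "bend"}


def interpret_or_query(query: str) -> tuple[str, list[str]]:
    """Classify a three-token 'X OR Y' query as Boolean or as a location phrase.

    Returns ("OR", [x, y]) for a Boolean disjunction, or ("AND", terms) when
    "OR" reads better as the Oregon state abbreviation.
    """
    tokens = query.split()
    if len(tokens) == 3 and tokens[1] == "OR":
        left, right = tokens[0], tokens[2]
        if left.lower() in OREGON_CITIES:
            # "Portland OR" means "Portland, Oregon", so keep it together as one location term.
            return ("AND", [f"{left} OR", right])
        return ("OR", [left, right])
    # Anything else is treated as a plain conjunction of terms.
    return ("AND", tokens)


print(interpret_or_query("Plumber OR Pipefitter"))
print(interpret_or_query("Portland OR Plumber"))
```

Every exception like this is one more rule to write and maintain, which is the legacy advantage described above.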

Looking at new search technologies, I’m very cautious about those that ask users to do more work in return for better results. Search is a low-intensity activity that people don’t really want to learn or spend time on. This is where an approach like WA (Wolfram Alpha, which Bing is also aiming toward) looks interesting. We’d all like search to be like the computer from Star Trek that gives you back exactly the answer or data you ask for. The complication, beyond the technical issues, is what benefit it offers webmasters (i.e., why should I let you crawl and index my site?). Current SEs take your data for their use but provide traffic in return, which an answering system would not.

You are one of the few guys who literally wrote the relevancy algorithms & then later worked in the SEO space. Do you consider the roles to be primarily complementary or adversarial?

So is SEO good or bad for SEs? On the whole I think it’s a benefit for them. From an algo perspective it’s a lot easier to determine the intent of a well SEO’d page. The SEs give webmasters a lot of tools and encourage them to use them because it makes search better: 301 your pages so we know where the content went, let us know what parameters don’t impact page content so we don’t get caught in robot traps, tell us what language your page is in using the metatags so we don’t have to guess, etc. If one of these tools ends up being a net negative, SEs can always change how they treat it (NoFollow) or just start ignoring it altogether (Keywords MetaTag). This is not to say that a lot of work doesn’t have to go into removing spam and factoring out overly aggressive optimization, but it’s a lot less than what they’d need to do if no one SEO’d.

Given your experience on both sides of the table, do you feel that ranking great in other search engines is like stealing candy from a baby, or is it still hard? What aspects of the SEO process do you find most challenging?

For SEO-ing established businesses it’s not a slam dunk, but it is still possible to generate very strong returns. Where I work, we have dozens of people working on SEO in a very organized manner, and the payback on invested effort is better than almost any other aspect of our business. The challenging part is the innate volatility of SEO and the fact that ultimately the SEs control our destiny. You can put together a great growth plan and then watch an algo update like MayDay shred it.

For the spammers, it’s like stealing candy from a sleeping Doberman. It’s easy until the Doberman wakes up.

Does your experience allow you to just look at a search result and almost instantly know why something is ranked? If so, what are the key things SEOs should study / work on to help gain that level of understanding?

I wish. There is always some pattern recognition that comes from experience (e.g., this is a collage site), but there are so many nuances in the code and off-page factors that it’s not always instant; you just get better at knowing what to look for. The real learning comes from looking at pages that are ranking well for no obvious reason and seeing what they are doing. It’s no secret why Apple is #1 for “ipod nano,” but what is that site I haven’t heard of doing right to get the #5 position? Also, if we see a competitor get a sudden step-function traffic lift, we look at what they changed or added that the SEs seem to like.

Back in 2006 you highlighted the rise of some of the MFA collage websites. In 2010 content mills are featured in the press almost every week. Are you surprised how far it has gone & how long it has lasted?

I think Google actually likes folks like Demand Media. What they are doing is seeing where GG’s users are looking for something and not finding it, then plugging that hole. It may not be Pulitzer Prize-winning content, but it lets users find something and thus makes Google more useful and universal. When better content comes along those pages will slip down, but they serve a purpose in Google’s ecosystem.

Collage websites (stitch sites in Yahoo! parlance) are another story entirely. They add virtually no value and are pretty much spam IMO. The difficulty is in detecting and eradicating them as fast as they can be robo-created.

You mentioned looking at the aboutness of a site when judging links. Do you think broad general search engines care about link relevancy?

Personally, I have not seen it have much of an impact, which is a shame. I think the main reason is that it is quite difficult for general SEs to judge which site relationships are meaningful, and which are not. For example, a golf course might get links from a real estate site; golf and real estate might be classified as very different verticals, but the links are quite relevant because the real estate agent is pointing out one of the benefits of the community. As a result link relevancy has become more about avoiding bad neighborhoods (3Ps, link farms, etc.) than finding good ones.

How important do you think temporal analysis is in judging the quality and authenticity of a link profile?

It’s certainly a red flag if a site gains too many links too quickly. The same is true if the profile of the links looks unnatural. If all your new links are coming from PR3-PR4 blog sites, something’s off. If bloggers are suddenly that interested in you, wouldn’t there also be a lot of PR0 comments, FB mentions, tweets, and a few higher-PR press mentions? At Yahoo!, sites that got a sudden upsurge in inlinks were classed as “spike” sites. Legit spike sites (e.g., the website of some unknown who wins an Olympic medal) have typical hallmarks, like temporally linked mentions on media sites that you can’t buy access to (AP, NYT, Time, etc.). The blackhatted spikes look totally different.
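The “too many links too quickly” test can be sketched as a simple rate check. This assumes you have weekly counts of newly discovered inlinks, and the rule (flag a week that exceeds `factor` times the trailing `window`-week average) plus both default thresholds are made up for illustration. The interview does not describe how Yahoo!’s spike classification actually worked, and a real system would also weigh where the links come from.

```python
from statistics import mean


def spike_weeks(weekly_new_links: list[int], window: int = 4, factor: float = 5.0) -> list[int]:
    """Return indexes of weeks whose new-inlink count jumps well above the recent baseline.

    Hypothetical rule: week i is a spike if its count exceeds `factor` times the
    mean of the previous `window` weeks (with the baseline floored at 1 so that
    a site going from 0 to a handful of links is not flagged automatically).
    """
    spikes = []
    for i in range(window, len(weekly_new_links)):
        baseline = mean(weekly_new_links[i - window:i])
        if weekly_new_links[i] > factor * max(baseline, 1.0):
            spikes.append(i)
    return spikes


print(spike_weeks([10, 12, 9, 11, 200, 15]))
```

A flagged week is only a starting point: telling the Olympic medalist apart from a bought link burst still depends on checking for the hallmarks above, such as coverage from AP, NYT, or Time in the same period.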

In an interview a couple years ago Priyank Garg mentioned Yahoo! looked at the link's location on a page. Do you feel other search engines take this into account?

All of the major SEs have been doing boilerplate stripping for a while. They recognize footers, rail nav., etc. and look at those links differently. Also, SEs will only follow a limited number of links per page. They typically collect all the links, remove the checksum dups (note: if your links vary by even one parameter they will not be deduped at this phase), and follow the first N links from the code. None of the SEs will say exactly what N is, but it’s probably somewhere between 75 and 300 links (Google recommends you have <100). Put your important links high up in the code and save the header/footer stuff for further down.
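The link-selection step described above (collect the links, drop exact duplicates by checksum, follow the first N in code order) might look roughly like this. The use of MD5, the function name, and the default `max_links=100` are assumptions for illustration; none of the engines publish their actual N or hashing scheme.

```python
import hashlib


def links_to_follow(hrefs: list[str], max_links: int = 100) -> list[str]:
    """Keep the first `max_links` unique links, in the order they appear in the page source.

    Deduplication hashes the exact URL string, so URLs that differ by even one
    parameter (e.g. "/a" vs "/a?x=1") produce different checksums and both survive.
    """
    seen: set[str] = set()
    followed: list[str] = []
    for href in hrefs:
        digest = hashlib.md5(href.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of a link already collected
        seen.add(digest)
        followed.append(href)
        if len(followed) == max_links:
            break  # links past the cutoff are never followed
    return followed


print(links_to_follow(["/a", "/b", "/a", "/a?x=1"], max_links=2))
```

Because the cutoff applies in source order, links that appear late in the HTML are the first to fall off, which is why the advice is to put important links high up in the code.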

What are some of the biggest advantages vertical search engines have over general search engines? As Google adds verticals, will they be able to build meaningful services that people prefer to use over leading vertical plays?

The big advantage of being a vertical search engine is the ability to limit the scope of the problem we’re trying to tackle. You can use a more focused taxonomy to provide a better experience, and present data in a way that is much more relevant than the 10 blue links. Sidestep is going to help me find the plane flight I want a lot more easily than a Google search. The challenge is that the experience you offer has to be dramatically better than Google’s. Google is easy, people know how to use it, and it works for almost everything. Being 5% better at one thing won’t get anyone to switch behavior.

As Google adds verticals, it’s ironic that they now hold a position in the browser similar to the one I associate with Microsoft historically on the desktop (link and leverage): they don’t need to be the best to win; they win by being the default. Google Product Search doesn’t have to provide a better user experience than the established shopping search sites; it will get used because it gets placed prominently on the Google SERP.

At the upcoming SES you are speaking about meaningful SEO metrics. What are some of the least valuable metrics people still track heavily?

The one that jumps to mind is pages indexed. Depending on which GG servers you are hitting, that number is going to fluctuate, and I see people stress over those fluctuations when there is often no actual change. Also, getting indexed is virtually worthless; it’s getting ranked that’s valuable. It’s easy to get your “iPod” page indexed; getting a top-10 ranking is another story. What’s the point of having 300,000 pages indexed if all your traffic is coming from the 30 that have decent rankings? If you have pages that are indexed but not ranking, either do some SEO for those pages (internal links, extra content, etc.) or NoIndex them and take them out of your sitemaps so other pages on your site get a chance.
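One way to act on this advice is to compare indexed URLs against the organic traffic they actually earn. The sketch below assumes you can export a per-URL count of organic visits for your indexed pages (a hypothetical input; the interview doesn’t name any tooling), and the `min_visits` threshold and function names are illustrative.

```python
def triage_indexed_pages(visits_by_url: dict[str, int], min_visits: int = 1) -> tuple[list[str], list[str]]:
    """Split indexed URLs into pages earning organic traffic and idle pages.

    Idle pages are candidates for extra SEO work (internal links, more content)
    or for NoIndex plus removal from the sitemap.
    """
    earning = sorted(url for url, visits in visits_by_url.items() if visits >= min_visits)
    idle = sorted(url for url, visits in visits_by_url.items() if visits < min_visits)
    return earning, idle


def traffic_share_of_top_pages(visits_by_url: dict[str, int], top_n: int) -> float:
    """Fraction of all organic visits that come from the `top_n` busiest pages."""
    total = sum(visits_by_url.values())
    if total == 0:
        return 0.0
    top = sorted(visits_by_url.values(), reverse=True)[:top_n]
    return sum(top) / total


visits = {"/ipod": 90, "/ipod-nano": 10, "/old-tag-page": 0, "/empty-category": 0}
print(triage_indexed_pages(visits))
print(traffic_share_of_top_pages(visits, top_n=1))
```

A high share from a small top-N is the “300,000 indexed, 30 ranking” pattern in miniature.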

Another is page load time. Google has mentioned this as a ranking factor, but we really have not seen an impact. We focus on reducing latency and loading search-relevant content first (vs. headers or banner media), but that’s because it reduces the abandonment rate, not because it helps SEO.

What are some of the most valuable metrics which are not generally appreciated enough in the market?

The big one is revenue. Everything else is a means to this end; never lose sight of that.

The other is crawl rate (esp. from Google). This is a great leading indicator.


Thanks Jon! To hear more of Jon's insights on search check out his panel at San Francisco's SES conference next week.

Published: August 12, 2010 by Aaron Wall in interviews


August 12, 2010 - 5:43pm

1. checksum dups
2. GG
3. collage website

I can find out what the first one is, but can I please have the definitions of the other two? I've never heard either of the phrases before.

August 12, 2010 - 6:36pm

GG = Google

collage websites = sorta thin, rehashed content: the type of sites that scrape content from many sources and then just combine it without adding any unique editorial or value to it (basically imagine an autogenerated search result within a search result, like so)

August 12, 2010 - 6:46pm

Thanks again.

August 12, 2010 - 6:16pm

Great questions and answers. 100% agree that G's big advantage is their brand. Yes, they typically have the best search results, but most of the people I interact with wouldn't know the difference.

Decent (and good) point about Demand Media -

Kris Day
August 12, 2010 - 6:30pm

I always learn a little something here. Thanks,

August 12, 2010 - 7:19pm

You're bringing us the advice of someone who chose not to be the next Google but instead be an SEO professional? IDK, Aaron... ;) just kidding u guys

August 12, 2010 - 7:31pm

"The real learning comes from looking at pages that are ranking well for no obvious reason and seeing what they are doing. It’s no secret why apple is #1 for “ipod nano,” but what is that site I haven’t heard of doing right to get the #5 position? Also if we see a competitor suddenly see a step-function traffic lift we look to see what they changed/added that the SEs seem to be liking."

Damn - excellent insight!

August 13, 2010 - 2:55am

Aaron, I wish you made it easy for me to share this post. At least a tweet button, yeah? Ok, I'll go tweet it anyway cuz it's that good, but I'd luv u lots more if it was easy. :) But yeah, well worth sharing. Good one!

August 13, 2010 - 9:21pm

From Business Insider:
Demand Media’s real advantage may be their article quality. They’ve set prices at a point that makes it uneconomic to write a lengthy, well-researched article; better to write something quick that covers the topic without ultimately answering the question. And that is the perfect way to create an article that is less compelling than the ads. Right now, I can see that Demand Media is offering $16.00 to write “How to Build an Acorn Skiff”. I could write 500 words on the subject, but if someone read what I had to say, and saw ads for an “Acorn Skiff Kit,” or “Acorn Skiff Instructions,” they’d probably opt for the ad.


August 13, 2010 - 10:13pm

Brilliant citation, Jonah. The "leave something missing" piece is exactly how a person can create an AdSense page that yields a 10%, 20%, 30%, 40%, or even 50% clickthrough rate.
