I recently changed one of my robots.txt files to prune duplicate content pages, so that more of the internal PageRank would flow to the higher quality and better earning pages. In the process, I forgot that one of the best linked-to pages on the site had a URL similar to those of the noisy pages. About a week ago the site's search traffic halved (right after Google was unable to crawl and index the powerful URL). I fixed the error pretty quickly, but the site now has hundreds of pages stuck in Google's supplemental index, and I am out about $10,000 in profit for that one line of code! Both Google and Yahoo support wildcards, but you really have to be careful when changing a robots.txt file because a line like this
also blocks a file like this from being indexed in Google
Unless you are thinking of that in advance it is easy to make a mistake.
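As a rough illustration of how a wildcard rule can catch more than intended, here is a minimal sketch. It assumes Google-style matching, where * stands for any run of characters and a trailing $ anchors the end of the URL. The rule, the paths, and the helper names are all hypothetical and made up for this example; they are not the ones from my site.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a Google-style robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(pattern, path):
    """Return True if a Disallow rule with this pattern would match the path."""
    return robots_pattern_to_regex(pattern).match(path) is not None

# Hypothetical rule meant to block printer-friendly duplicates.
rule = "/*print"

for path in ["/articles/print/123", "/blueprints.html", "/articles/123"]:
    print(path, "blocked" if is_blocked(rule, path) else "allowed")
```

In this sketch the rule blocks the printer-friendly page as intended, but it also blocks /blueprints.html, because that URL happens to contain the same string.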
If you are trying to prune duplicate content for Google and are fine with it ranking in other search engines, you may want to make those directives specific to Googlebot. If you make a directive for a specific robot, that bot will ignore your general robots directives in favor of following the more specific directives you created for it.
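Here is a small sketch of that behavior using the robots.txt parser in Python's standard library. The robots.txt content and the example.com URLs are hypothetical. Note that this parser only does simple prefix matching and does not understand wildcards, so treat it as a way to see how user-agent groups are chosen, not as a stand-in for Google's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one general group and one Googlebot-specific group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent, path):
    """Ask the parser whether the given robot may fetch the given path."""
    return parser.can_fetch(agent, "http://www.example.com" + path)

for agent in ["Googlebot", "SomeOtherBot"]:
    for path in ["/private/page.html", "/print/page.html"]:
        print(agent, path, allowed(agent, path))
```

Because Googlebot has its own group, it follows only that group. In this sketch it is kept out of /print/ but is free to crawl /private/, which the general group blocks for every other robot. If you want Googlebot to obey the general rules too, you have to repeat them in its group.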
Google also offers a free robots.txt test tool, which allows you to see how robots will respond to your robots.txt file, notifying you of any files that are blocked.
You can use Xenu's Link Sleuth to generate a list of URLs from your site. Upload that URL list to the Google robots.txt test tool (currently in 5,000 character chunks, an arbitrary limit I am sure they will eventually lift).
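Splitting a long URL list into pieces that fit under that limit is easy to script. The following is a minimal sketch, assuming one URL per line and a 5,000 character cap; the function name and the sample URLs are made up for illustration.

```python
def chunk_urls(urls, max_chars=5000):
    """Group URLs into newline-joined chunks of at most max_chars characters.

    A single URL longer than max_chars still gets its own (oversized) chunk.
    """
    chunks, current, size = [], [], 0
    for url in urls:
        # Each URL after the first in a chunk costs one extra newline character.
        cost = len(url) + (1 if current else 0)
        if current and size + cost > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
            cost = len(url)
        current.append(url)
        size += cost
    if current:
        chunks.append("\n".join(current))
    return chunks

# Hypothetical crawl export.
urls = ["http://www.example.com/page-%d.html" % i for i in range(1000)]
chunks = chunk_urls(urls)
print(len(chunks), "chunks, longest is", max(len(c) for c in chunks), "characters")
```

Each chunk can then be pasted into the test tool in turn.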
Inside the webmaster console Google will also show you what pages are currently blocked by your robots.txt file, and let you view when Google tried to crawl the page and noticed it was blocked. Google also shows you what pages are 404 errors, which might be a good way to see if you have any internal broken links or external links pointing at pages that no longer exist.
Jim Boykin recently offered tips to help webmasters understand how to audit a site to see what pages are the most link rich, how internal link equity flows around websites, and how to optimize your internal link architecture. In addition to Jim's tips, you can also improve your internal link structure by using some of the following tips.
Create Promotional Content Sections
The following ideas display social acceptance (which helps improve conversion) while also funneling PageRank to important pages without looking spammy.
heavily promote seasonal stuff in advance (internally and externally)
use sales data or other metrics to create a "what's hot in this category" and a "what's hot on our site" section to flow more link equity to best sellers (these can be called anything like what's hot, top rated, etc)
create pages high in the site structure to support high value keywords that were only tangentially covered on lower level nodes
over-represent new content in your link structure to help it get indexed quickly, see how well it will rank, and learn how profitable it is
Internal to External Link Ratio
Doing these sorts of things will still give you all the good karma and benefit that linking out does, while minimizing any downside caused by funneling a significant portion of your PageRank out of your site.
if you have a blog cross reference old posts where and when it makes sense
if you link out heavily on a page ensure you also place numerous internal links on the page
use breadcrumb navigation or other navigational schemes to help structure the site and improve the internal to external link ratio
if you have a ton of outbound site-wide links, move some of them so they are listed only on a single page or section of your site
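If you want a rough number for the internal to external link ratio on a page, it can be counted with a few lines of code. This is a minimal sketch using only Python's standard library; the class name, the sample HTML, and the example.com URLs are hypothetical. It treats a link as internal only when the hostname matches exactly, so www and non-www versions of a site would count as different hosts.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCounter(HTMLParser):
    """Count internal and external <a href> links on one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc.lower()
        self.internal = 0
        self.external = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL before comparing hosts.
        target = urlparse(urljoin(self.page_url, href))
        if target.scheme not in ("http", "https"):
            return  # skip mailto:, javascript:, and similar links
        if target.netloc.lower() == self.host:
            self.internal += 1
        else:
            self.external += 1

# Hypothetical page markup.
html = """
<a href="/about.html">About</a>
<a href="older-post.html">An older post</a>
<a href="http://other.example.org/">A site I like</a>
<a href="mailto:me@example.com">Email</a>
"""

counter = LinkCounter("http://www.example.com/blog/")
counter.feed(html)
print("internal:", counter.internal, "external:", counter.external)
```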
Keep the Noise Out of the Index
demote an entire section of the site in the link structure if it has a lower ROI than other sections
use robots.txt and meta robots exclusion tags to prevent duplicate content and other low information or noisy pages from getting indexed
instead of using pagination try to display more content on each page
check your server logs for 404 errors, fix any broken links, and redirect old linked-to pages to their new locations
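Checking the logs for 404 errors can be done with a short script. The sketch below assumes the server writes the common or combined access log format; the log lines are invented sample data, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the request and status fields of a common/combined format log line.
LOG_LINE = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')

def count_404s(lines):
    """Return a Counter of requested paths that returned a 404 status."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            hits[match.group("path")] += 1
    return hits

# Invented sample log lines.
sample = [
    '1.2.3.4 - - [10/Jun/2007:10:00:00 +0000] "GET /old-page.html HTTP/1.1" 404 512 "http://www.example.com/" "Mozilla/5.0"',
    '1.2.3.5 - - [10/Jun/2007:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '1.2.3.6 - - [10/Jun/2007:10:01:00 +0000] "GET /old-page.html HTTP/1.1" 404 512 "-" "Googlebot/2.1"',
    '1.2.3.7 - - [10/Jun/2007:10:02:00 +0000] "GET /missing.gif HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
]

for path, count in count_404s(sample).most_common():
    print(count, path)
```

The paths with the most hits are the first candidates for a fixed link or a redirect.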
This idea may sound a bit complex until you visualize it as a keyword chart with an x and y axis.
Imagine that a, b, c, ... z are all good keywords.
Imagine that 1, 2, 3, ... 10 are all good keywords.
If you have a page on each subject, consider placing the navigation for a through z in the sidebar while using links and brief descriptions for 1 through 10 as the content of the page. If people search for a 7 or b 9 that cross referencing page will be relevant for it, and if it is done well it does not look too spammy.
Since these types of pages can spread link equity across so many pages of different categories, make sure they are linked to well high up in the site's structure. These pages work especially well for categorized content cross referenced by locations.
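To make the keyword chart idea concrete, here is a minimal sketch of one cross referencing page and the two-word phrases it could be relevant for. The categories, locations, and function names are hypothetical examples, and the sketch says nothing about how well any given page would actually rank.

```python
from itertools import product

def build_cross_reference_page(sidebar_keywords, body_keywords):
    """Describe a page with one keyword set in the sidebar and the other in the body."""
    return {
        "sidebar_links": list(sidebar_keywords),
        "body_entries": [
            {"link_text": kw, "description": f"Brief description of {kw}"}
            for kw in body_keywords
        ],
    }

def covered_phrases(page):
    """Every sidebar keyword paired with every body keyword on the page."""
    sidebar = page["sidebar_links"]
    body = [entry["link_text"] for entry in page["body_entries"]]
    return {f"{x} {y}" for x, y in product(sidebar, body)}

# Hypothetical x axis (categories) and y axis (locations).
categories = ["plumbers", "electricians", "roofers"]
locations = ["austin", "boston", "chicago", "denver"]

page = build_cross_reference_page(categories, locations)
phrases = covered_phrases(page)
print(len(phrases), "phrase combinations on one page")
print("roofers denver" in phrases)
```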
SEO Question: Do domain names play a role in SEO? Do search engines understand that the words are in the URL even if they are run together without hyphens in between them? What techniques are best for registering a domain name that search engines like Google will like?
Answer: Over time the role of the domain name as an SEO tool has changed, but currently I think domain names carry a lot of weight for the associated exact match search. Depending on how they are leveraged going forward they may or may not continue to be a strong signal of quality to search engines.
Domain Names & Link Anchor Text
When I first got in the SEO game a good domain name was valuable because if you got the exact keywords you wanted to rank for in your name it made it easier to get anchor text related to what you wanted to rank for. For example, being seobook.com made it easier for me to rank for seo book and seo.
That link still exists, but nowhere near as strongly or broadly as it once did.
The Fall of Anchor Text & the Rise of Filters
Anchor text as an SEO technique is no secret. To make up for the long ongoing abuse of it, Google started placing less weight on anchor text AND creating more aggressive filters to screen out sites whose link profiles look too spammy, with too many inbound links containing the exact same anchor text. If everyone who links to me uses seo book as the anchor text, it is much harder to consistently rank for that term than it would be with a more natural mixture. A natural mix would have some of the following
Natural link profiles also contain deep links to internal pages, whereas spammy sites tend to point almost all of their links at their home page.
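One way to get a feel for your own anchor text mix is to measure what share of your inbound links use each phrase. This is a minimal sketch; the anchor lists are invented, and the 60% cutoff is an arbitrary number picked for illustration, since nobody outside Google knows what thresholds their filters use.

```python
from collections import Counter

def anchor_text_shares(anchors):
    """Return each distinct anchor text's share of all links, largest first."""
    normalized = [a.strip().lower() for a in anchors if a.strip()]
    counts = Counter(normalized)
    total = sum(counts.values())
    if not total:
        return {}
    return {text: n / total for text, n in counts.most_common()}

def looks_unnatural(anchors, max_share=0.6):
    """Flag a profile where one anchor text exceeds max_share (an arbitrary cutoff)."""
    shares = anchor_text_shares(anchors)
    return bool(shares) and max(shares.values()) > max_share

# Invented link profiles.
repetitive = ["seo book"] * 9 + ["SEOBook.com"]
mixed = ["seo book", "SEOBook.com", "Aaron Wall", "click here", "seo book"]

print(anchor_text_shares(mixed))
print(looks_unnatural(repetitive), looks_unnatural(mixed))
```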
Domain Names in Action
As Google started getting more aggressive at filtering anchor text, they started placing more weight on the domain name if the domain name exactly matched the keyword search query. They had to do this because they were filtering out too many brands for the search query attached to their brand. Some examples of how this works:
At one point, about 2 years back, SeoBook.com stopped ranking for seo book due to a wonky filter that also prevented PayPal from ranking for their own name for a little bit.
A friend recently 301 redirected an education site on a bad URL to a stronger domain name. The site's ranking for the exact phrase went from 100+ to top 20 in Google. But, it still is a long way from #1, and it still is at 100+ for the singular version. In competitive industries you need a lot of links to compete, and the redirect also caused the site to slip a bit for some of the other target keyword phrases that the site used to rank for.
When you launch a new site on a domain name like mykeywordphrase.com and get it a few trusted links it should almost immediately rank for mykeywordphrase. A friend launched a 3-word education site about a week ago. That site ranks #1 in Google right now for those keywords run together. That site also just ranked #118 in Google for the phrase with the words spread apart. As the site ages and gets more links it should be easier to rank for that exact phrase (but that domain probably wouldn't help its rankings much for stuff like the root sub-phrase).
My domain name Search Engine History.com ranked better than it should have for the query search engine history when its only real signs of trust were age and domain name. It was nowhere in the rankings for just about any other query.
Things Will Change Over Time
A few other caveats worth noting
From my experience this exact match domain bonus works with all domain extensions (even .info), but that could change over time. And if the content isn't any good it is still going to be hard to get traction in any market worth developing content for. This exact match domain bonus also works well in local markets for regional domains like .ca.
This post is about the current market, and is highly focused on Google's relevancy algorithms (rather than other search engines). I expect the weight on domain names to be lowered significantly (especially for competitive queries) as Google moves toward incorporating more usage data into their relevancy algorithms. This is especially true if many domainers put up low quality to average quality websites on premium domain names. Moves like creating 100,000 keyword laden sites in one massive push (as Marchex recently did) don't bode well for the future of domain names as a signal of quality.
The search traffic trends are moving toward consolidating traffic onto the largest high authority sites, so it probably is not a good idea to have 100 deep niche domain names like OnlineHealthcareDegrees.org, OnlineNurseDegrees.net, OnlineNursingSchools.com, OnlineLawDegrees.com, OnlineParalegalDegrees.net etc when you can cover a lot of those topics with a single broad domain like OnlineDegrees.org.
Any advantage exact match domains seem to have for ranking is much smaller for related phrases that do not exactly match the keyword string or phrases within the anchor text of most of the inbound links.
For local businesses a keyword matching domain might be a way around paying to list in all the regional directories and other related arbitrage plays.
Domains that use familiar language and sound credible also have a resonance that helps build trust. They make the information seem more credible and make the site easier to link to, easier to syndicate, and easier to do business with. It is hard to estimate the value of that since much of it is indirect, and few have measured the effect of a domain name on the linkability or clickability of a listing outside of paid search arbitrage.
As SEOs, one of our primary goals is to get more search traffic for targeted search terms. Search traffic is typically far more valuable than other traffic sources because it is so targeted. But non-search traffic is perhaps the single most reliable sign of quality. As Google controls a larger portion of the overall traffic flow across the web, they risk creating self-fulfilling prophecies where low quality sites continue to rank only because they already rank.
If you were Google, and discovered that 98% of a site's traffic comes from Google.com might you want to give that site a bit less exposure? I would. Maybe those algorithms do not exist now, but eventually they could.
If you have a site that earns far beyond your living costs, and it is almost entirely reliant on search for income, then one of the best moves you can make for the sustainability of that site is to lower the percentage of traffic that comes from search by creating other traffic sources. The other traffic sources may not be as profitable on a CPM basis, but as you diversify you lower your risks. It doesn't matter how the algorithms shift if your site is strong in every signal of trust they could possibly measure.
Frank mentioned this NYP article about how some companies are buying sites outright rather than increasing their AdWords bid prices. I expect this to be a large and growing trend for at least a couple years. As Google gets more efficient at pricing the ads they increase the value of the top ranked sites that sit alongside those ads. Internet Search Metrics, quoted in the NYP article as Internet Search Management, is providing audits on the competitive landscape of search
ISM's audits track the top 4.5 million search phrases on Google and Yahoo!, a total of 7.3 billion searches a month, to determine which companies across 50 business sectors pop up most frequently in the top three or four positions in natural search. ...
The ISM audits, to be released in London, break down which of 50 business sectors are locked up - that is, have large chunks of natural search dominated by a handful of companies - and which are wide open.
I have not yet seen any of the reports, but the network is still young. If you love marketing, are in tune with web trends, and are well funded I am guessing that many of the markets that appear locked up are still wide open.
Creating Shadow Brands & Buying Top Ranked Competing Sites
While small businesses are worried about the risks of buying or renting a few links, some large corporations are launching shadow brands or buying out competing domains en masse. There are thousands or millions of other examples, so it is unfair for me to point any out, but here are a few for the sake of argument.
How many different verticals does Yahoo! cover the Nintendo Wii in? Off the top of my head, at least 9: their brand universe, yahoo tech, yahoo shopping, yahoo news, yahoo directory, ask yahoo, yahoo answers, videogames.yahoo, games.yahoo, etc. (and that doesn't even count geolocal subdomains for answers, shopping, etc.)
What happened to result diversity? When and why did Google stop caring about that?
Why is buying links bad, when using infinite domains or buying a bunch of sites are both legitimate? Why is it ok for the WSJ to publish this type of content, but wrong for me to do whatever necessary to compete in a marketplace cluttered with that information pollution?
The point here is not to say that big businesses are bad or doing anything wrong, but to show the stupidity Google is relying on when they scaremonger newer and smaller webmasters about the risks of buying a link here or there. The big businesses do all of the above, gain more organic links by being well known, and still buy links because the techniques work. Whatever Google ranks is what people will create more of, so long as it is profitable to do so.
If you create a real brand you can buy more links and be far spammier with your optimization with a lower risk profile, because Google has to rank your site or they lose marketshare. Create something that is best of breed and then market the hell out of it. If marketing requires buying a few links then open up the wallet and get ready to rank.
What happens to the value of your content when search engines get better at providing answers directly in the search results? Is your site the type of site they would like to cite, or does it fall further down the list on another category of queries? What can you do to make them more likely to want to source your site? Does your site have enough perceived trust and value to draw clicks after they put your content directly in the search results?
As search engines work harder at things like universal search, search personalization, and Cyc, any sites which are only facts and filler won't get much exposure.
Some top ranking sites do not deserve to rank. When we are lucky enough to be in such a situation, it allows us to get away with being lazy, because a site does not have to be too efficient to make money if it is well represented for targeted search queries that send free traffic. But every website has upside potential, even if it already ranks #1.
Improve Internal Navigation & Usability
One client of mine ranks below only the official manufacturer for the manufacturer's name. His site had inadequate internal navigation. I took a day to improve the navigation, and the result was a 150% increase in sales. The last 8 days of last month sold nearly as much as the rest of the month combined. His business model looked like it was about to die, but that one day of work made it functional for at least a couple more years.
High Profit Parallel Markets
I had a site which made a couple hundred dollars a day that was well established in its market, but did not dominate it. Taking the path of lowest resistance, I branched the site into two parallel markets of greater commercial interest where the competition was weaker. On an investment which is less than what the site earns in a month now I was able to increase its income 5x, without even doing much link building.
An Undeserved Ranking
One of my friends is in a high profit market where the competition is absolutely clueless. Basic SEO brought that site to a #1 ranking in Google. The site is highly conversion oriented and makes great income, but now that it already ranks it probably makes sense to reinvest some of the profit into improving content quality and reinforcing that market position. Businesses that do not reinvest eventually fall, especially if they are winning only because the competition is clueless. After spending a couple thousand dollars a day on AdWords eventually the competitors will start to look into SEO.
The Value of Branding
If a site ranks #1, and is monetized via PPC ads, it still might only make a portion of what it should because AdSense is not as efficient as some people would lead you to believe. If a site is strong enough to attract brand advertisers they will pay a premium just for getting their brand seen. Scraper sites and thin content sites don't attract brand advertisers, even if they convert. I have seen a site that was making $80 a month on AdSense make over $10,000 a month selling brand advertisements.
By the time people are looking to automate a no cost SEO technique, it is already dead as a competitive strategy. Blog spamming was once highly effective, but by the time commercial blog comment spam software was available the practice had already stopped working in Google.
Automated Article Submission Software
At SMX Advanced a Yahoo! engineer noted that if they detect content as duplicate they are less likely to trust it to seed crawling other documents. People are pushing article submission software to submit articles to article directories. But if most of the content on an article directory site is duplicate, marketers are pushing automated systems for spamming it, and the content networks accept automated submissions, then obviously this is not going to be a clean and trusted part of the web that you can go back to again and again. Maybe it is good to try here or there for a bit of anchor text or other market testing, but it is probably not worth automating and doing on a mass scale, especially if the site lacks important signs of quality.
Hundreds of Engineers Work to Kill Spamming Techniques
The spam detection and anti-spam algorithms are driven by people. If something is commonplace in a market then the search engines try their best to stop it. If they can automate it they will. If they have to demote it manually they will.
In the second video here Matt Cutts talked about how spam prevention methods may be different based on language, country, or even market...noting that many real estate sites rely too much on reciprocal link spam.
The less your site's marketing methods look like spam and the harder it is to duplicate what you have done the less likely you are to get hurt by the next update. By the time there is a mass market automated spamming solution the technique is already dead.
[Tim] Mayer reminded that what's relevant for a query can often change over time. Google's Udi Manber, vice president of engineering, made similar remarks when I spoke with him about human-crafted results when I was visiting at Google yesterday.
One example he pointed out was how Google's human quality reviewers -- people that Google pays to provide a human double-check on the quality of its results, so they can then better tune the search algorithm -- started to downgrade results for [cars] when information about the movie Cars started turning up. The algorithm had picked up that the movie was important to that term before some of the human reviewers were aware of it.
Obviously human review is used at all major search engines, but human reviewers have limits even when the reviews are outsourced, just as humans do when producing content. Even if Google has 10,000 quality raters, those people can only be trained to find and rate certain things.