Subscription Fatigue

Subscription Management

I have active subscriptions with about a half-dozen news & finance sites and another half-dozen software tools, but using a VPN or web proxy across different web browsers makes logging in to all of them & clearing cookies for some paywall sites a real pain.

If you don't subscribe to any outlets then subscribing to an aggregator like Apple News+ can make a lot of sense, but it is very easy to end up with dozens of forgotten subscriptions.

Winner-take-most Market Stratification

The news business is coming to resemble other tech-enabled businesses where a winner takes most. The New York Times stock, for instance, is trading at 15-year highs & they recently announced they are raising subscription prices:

The New York Times is raising the price of its digital subscription for the first time, from $15 every four weeks to $17 — from about $195 to $221 a year.

With a Trump re-election all but assured after the Russia, Russia, Russia garbage, the party-line impeachment (less private equity plunderer Mitt Romney) & the ridiculous Iowa primary, many NYT readers will pledge their #NeverTrumpTwice dollars with the New York Times.

If you think politics looks ridiculous today, wait until you see some of the China-related ads in a half-year as the 2019 novel coronavirus spreads around the world.

Arresting a doctor who warned about the outbreak doesn't have good optics, particularly after hundreds of other deaths piled up from it & when he later died from the virus.

The optics keep getting worse.

How does a broad-based news site compete with the user generated Tweets in such a zone?

And any widely known individual journalist who builds a large audience might get disappeared.

Twitter recently surpassed $1 billion in quarterly revenues, but time spent on Twitter is time not spent on other news websites.

McClatchy filed for chapter 11 bankruptcy. Outside of a few core winners, the news business online has been so brutal that even Warren Buffett is now a seller. As the economics get uglier news sites get more extreme with ad placements, user data sales, and pushing subscriptions. Some of these aggressive monetization efforts make otherwise respectable news outlets look like part of a very downmarket subset of the web.

Users Fight Back

Users have thus taken to blocking ads & are also starting to ramp up blocking paywall notifications.

Each additional layer of technological complexity is another cost center publishers have to fund, often through making the user experience of their sites worse, which in turn makes their own sites less differentiated & inferior to the copies they have left across the web (via AMP, via Facebook Instant Articles, syndication in Apple News or on various portal sites like MSN or Yahoo!).

A Web Browser For Every Season

Google Chrome is spyware, so I won't recommend installing that.

Here is Google's official guide on how to remove the spyware.

The easiest & most basic solution, which works across many sites using metered paywalls, is to have multiple web browsers installed on your computer. Keep a couple of browsers which are used exclusively for reading news articles that won't open in your main browser & set those browsers to delete cookies on close. Or open a browser in private mode and search Google for the URL of the page to see if that grants access.

  • If you like Firefox there are iterations from other players built on its core, like Pale Moon, Comodo IceDragon or Waterfox.
  • If you like Google Chrome then Chromium is the parallel version of it without the spyware baked in. The Chromium project is also the underlying source used to build about a dozen other web browsers including: Opera, Vivaldi, Brave, Cliqz, Blisk, Comodo Dragon, SRWare Iron, Yandex Browser & many others. Even Microsoft recently switched their Edge browser to being powered by the Chromium project. Browsers based on Chromium allow you to install extensions from the Chrome Web Store.
  • Some web browsers monetize users by setting affiliate links on the home screen and/or by selling the default search engine recommendation. You can change those once and they'll typically stick with whatever settings you use.
  • For browsers I use for regular day-to-day web use, I set them up to continue the session on restart and use a session manager plugin like this one for Firefox or this one for Chromium-based browsers. Browsers which are used exclusively for reading paywall-blocked articles I set up to clear cookies on restart.
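
The dedicated-browser routine above can also be scripted. Here is a minimal Python sketch, assuming Firefox and a Chromium build are on the PATH; the profile name (`news-reader`) and the temp directory are placeholders to adapt to your own setup:

```python
import subprocess

def browser_command(browser: str, url: str) -> list[str]:
    """Build the argv for opening a URL in a throwaway, private session."""
    if browser == "firefox":
        # -P selects a named profile; -private-window avoids persistent cookies
        return ["firefox", "-P", "news-reader", "-private-window", url]
    if browser == "chromium":
        # a separate --user-data-dir keeps cookies isolated from the main profile
        return ["chromium", "--user-data-dir=/tmp/news-profile", "--incognito", url]
    raise ValueError(f"unknown browser: {browser}")

# Example (not executed here):
# subprocess.Popen(browser_command("firefox", "https://example.com/article"))
```

The point of the separate profile/data directory is that nothing the paywalled site sets ever touches your main browsing identity.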

Bypassing Paywalls

Many web browsers have a "reader" mode available which bypasses some JavaScript overlays that obfuscate content.

Here is a picture of the Mozilla Firefox reading mode button and how a website appears after clicking it.

There are also a couple solid web browser plugins built specifically for bypassing paywalls.

Academic Journals

Unpaywall is an open database of around 25,000,000 free scholarly articles. They provide extensions for Firefox and Chromium based web browsers on their website.

News Articles

There is also a plugin for news publications called Bypass Paywalls.

  • Mozilla Firefox: To install the Firefox version go here.
  • Chrome-like web browsers: To install the Chrome version of the extension in Opera, Chromium or Microsoft Edge, download the extension here, enable developer mode inside the extensions area of your web browser & install the extension. To turn developer mode on, open the browser's drop-down menu, click on extensions to go to the extension management area, then slide the "Developer mode" toggle to the right so it turns blue.

Regional Blocking

If you travel internationally some websites like YouTube or Twitter or news sites will have portions of their content restricted to only showing in some geographic regions. This can be especially true for new sports content and some music.

These can be bypassed by using a VPN service like NordVPN, ExpressVPN, Witopia or IPVanish. Some VPN providers also sell pre-configured routers. If you buy a pre-configured router you can use an ethernet switch or wifi to switch back and forth between the regular router and the VPN router.

You can also buy web proxies & enter them into the FoxyProxy web browser extension (Firefox or Chromium-compatible), with different browsers set to default to different country locations, making it easy to quickly see what the search results show in different countries & cities.

If you use a variety of web proxies you can configure some of them to work automatically in an open source rank tracking tool like Serposcope.
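
Proxies can also be wired up outside the browser. As a minimal sketch using only the Python standard library (the proxy URLs below are placeholders for whatever proxies you actually rent):

```python
import urllib.request

# Hypothetical per-country proxies -- substitute your own.
COUNTRY_PROXIES = {
    "us": "http://us.proxy.example:8080",
    "de": "http://de.proxy.example:8080",
}

def opener_for_country(country: str) -> urllib.request.OpenerDirector:
    """Build a urllib opener that routes traffic through that country's proxy."""
    proxy = COUNTRY_PROXIES[country]
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)

# Example (not executed here): fetch a page as seen from Germany
# html = opener_for_country("de").open("https://example.com/").read()
```

Keeping one opener per country mirrors the one-browser-per-location setup described above, just in script form.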

The Future of Journalism

I think the future of news is going to be a lot more sites like Ben Thompson's Stratechery or Jessica Lessin's TheInformation & far fewer broad/horizontal news organizations. Things are moving toward the 1,000 true fans or perhaps 100 true fans model:

This represents a move away from the traditional donation model—in which users pay to benefit the creator—to a value model, in which users are willing to pay more for something that benefits themselves. What was traditionally dubbed “self-help” now exists under the umbrella of “wellness.” People are willing to pay more for exclusive, ROI-positive services that are constructive in their lives, whether it’s related to health, finances, education, or work. In the offline world, people are accustomed to hiring experts across verticals

A friend of mine named Terry Godier launched a conversion-oriented email newsletter named Conversion Gold which has done quite well right out of the gate, leading him to launch IndieMailer, a community for paid newsletter creators.

The model which seems to be working well for those sorts of news sites is...

  • stick to a tight topic range
  • publish regularly at a somewhat decent frequency like daily or weekly, though with a strong preference for quality & originality over quantity
  • have a single author or a small core team which does most of the writing, and expand editorial hiring slowly
  • offer original insights & much more depth of coverage than you would typically find in the mainstream news
  • rely on WordPress or a low-cost CMS & billing technology partner like Substack or Memberful, sell on a marketplace like Udemy, Podia or Teachable, or, with a bit more technical chops, install aMember on their own server. One of the biggest mistakes I made when I opened up a membership site about a decade back was hand-rolling custom code for membership management. At one point we shut down the membership site for a while in order to rip out all that custom code & replace it with aMember.
  • accept user comments on pieces or integrate a user forum using something like Discourse on a subdomain or a custom Slack channel. Highlight or feature the best comments. Update readers on new features via email.
  • invest much more into obtaining unique data & sources to deliver new insights, without spending aggressively to syndicate onto other platforms using graphical content layouts which would require significant design, maintenance & updating expenses
  • heavily differentiate your perspective from other sources
  • maintain a low technological maintenance overhead
  • charge a low monthly subscription price with a solid discount for annual pre-payment
  • instead of using a metered paywall, set some content to require payment to read & periodically publish full-feature free content (perhaps weekly) to keep up awareness of the offering in the broader public & help offset churn

Some also work across multiple formats with complementary offerings. The Ringer has done well with podcasts & Stratechery also has the Exponent podcast.

There are a number of other successful online-only news subscription sites like TheAthletic & Bill Bishop's Sinocism newsletter about China, but I haven't subscribed to them yet. Many people support a wide range of projects on platforms like Patreon, & sites like MasterClass with an all-you-can-eat subscription will also make paying for online content far more common.

Favicon SEO

Google recently copied their mobile result layout over to desktop search results. The three big pieces which changed as part of that update were:

  • URLs: In many cases Google will now show breadcrumbs in the search results rather than showing the full URL. The layout no longer differentiates between HTTP and HTTPS. And the URLs shifted from an easily visible green color to a much easier to miss black.
  • Favicons: All listings now show a favicon next to them.
  • Ad labeling: ad labeling is in the same spot as favicons are for organic search results, but the ad labels are a black that sort of blends into the URL line. Over time expect the black ad label to become a lighter color in a way that parallels how Google made ad background colors lighter over time.

One could expect this change to boost the CTR on ads while lowering the CTR on organic search results, at least up until users get used to seeing favicons and not thinking of them as being ads.

The Verge panned the SERP layout update. Some folks on Reddit hate this new layout as it is visually distracting, the contrast on the URLs is worse, and many people think the organic results are ads.

I suspect a lot of phishing sites will use subdomains patterned off the brand they are arbitraging coupled with bogus favicons to try to look authentic. I wouldn't reconstruct an existing site's structure based on the current search result layout, but if I were building a brand new site I might prefer to put it at the root instead of on www so the words were that much closer to the logo.

Google provides the following guidelines for favicons:

  • Both the favicon file and the home page must be crawlable by Google (that is, they cannot be blocked to Google).
  • Your favicon should be a visual representation of your website's brand, in order to help users quickly identify your site when they scan through search results.
  • Your favicon should be a multiple of 48px square, for example: 48x48px, 96x96px, 144x144px and so on. SVG files, of course, do not have a specific size. Any valid favicon format is supported. Google will rescale your image to 16x16px for use in search results, so make sure that it looks good at that resolution. Note: do not provide a 16x16px favicon.
  • The favicon URL should be stable (don’t change the URL frequently).
  • Google will not show any favicon that it deems inappropriate, including pornography or hate symbols (for example, swastikas). If this type of imagery is discovered within a favicon, Google will replace it with a default icon.
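
Google's sizing rule above is easy to encode. A minimal sketch (SVG favicons, which have no fixed pixel size, fall outside this check):

```python
def is_valid_favicon_size(width: int, height: int) -> bool:
    """Check a raster favicon against Google's stated sizing rule:
    square and a positive multiple of 48px (48x48, 96x96, 144x144, ...).
    Note this automatically rejects the discouraged 16x16 size."""
    return (
        width == height      # must be square
        and width > 0
        and width % 48 == 0  # multiples of 48px only
    )
```

Remember that regardless of the size you supply, Google rescales the image to 16x16px in the results, so the design still has to survive that shrink.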

In addition to the above, I thought it would make sense to provide a few other tips for optimizing favicons.

  • Keep your favicons consistent across sections of your site if you are trying to offer a consistent brand perception.
  • In general, less is more. 16x16 is a tiny space, so if you try to convey a lot of information inside of it, you'll likely end up creating a blob that almost nobody but you recognizes.
  • It can make sense to include the first letter from a site's name or a simplified logo widget as the favicon, but it is hard to include both in a single favicon without it looking overdone & cluttered.
  • A colored favicon on a white background generally looks better than a white icon on a colored background, as having a colored background means you are eating into some of the scarce pixel space for a border.
  • Using a square shape versus a circle gives you more surface area to work with.
  • Even if your logo has italics on it, it might make sense to avoid using italics in the favicon to make the letter look cleaner.

Here are a few favicons I like & why I like them:

  • Citigroup - manages to get the word Citi in there while looking memorable & distinctive without looking overly cluttered
  • Nerdwallet - the N makes a great use of space, the colors are sharp, and it almost feels like an arrow that is pointing right
  • Inc - the bold I with a period is strong.
  • LinkedIn - very memorable using a small part of the word from their logo & good color usage.

Some of the other memorable ones that I like include: Twitter, Amazon, eBay, Paypal, Google Play & CNBC.

Here are a few favicons I dislike & why:

  • Wikipedia - the W is hard to read.
  • USAA - they included both the logo widget and the 4 letters in a tiny space.
  • Yahoo! - they used inconsistent favicons across their sites & use italics on them. Some of the favicons have the whole word Yahoo in them while the others are the Y! in italics.

If you do not have a favicon Google will show a dull globe next to your listing. Real Favicon Generator is a good tool for creating favicons in various sizes.

What favicons do you really like? Which big sites do you see that are doing it wrong?

Brands vs Ads

Brand, Brand, Brand

About 7 years ago I wrote about how the search relevancy algorithms were placing heavy weighting on brand-related signals after Vince & Panda, on the (half correct!) presumption that this would lead to excessive industry consolidation which in turn would force Google to turn the dials in the other direction.

My thesis was Google would need to increasingly promote some smaller niche sites to make general web search differentiated from other web channels & minimize the market power of vertical leading providers.

The reason my thesis was only half correct (and ultimately led to the absolutely wrong conclusion) is that Google has the ability to provide the illusion of diversity while using eye-candy displacement efforts to shift an increasing share of searches from organic to paid results.

Shallow Verticals With a Shill Bid

As long as any market has at least 2 competitors in it Google can create a "me too" offering that they hard code front & center and force the other 2 players (along with other players along the value chain) to bid for marketshare. If competitors are likely to complain about the thinness of the me too offering & it being built upon scraping other websites, Google can buy out a brand like Zagat or a data supplier like ITA Software to undermine criticism until the artificially promoted vertical service has enough usage that it is nearly on par with other players in the ecosystem.

Google need not win every market. They only need to ensure there are at least 2 competing bids left in the marketplace while dialing back SEO exposure. They can then run other services to redirect user flow and force the ad buy. They can insert their own bid as a sort of shill floor bid in their auction. If you bid below that amount they'll collect the profit through serving the customer directly, if you bid above that they'll let you buy the customer vs doing a direct booking.

Adding Volatility to Economies of Scale

Where this gets more than a bit tricky is if you are a supplier of third party goods & services where you buy in bulk to get preferential pricing for resale. If you buy 100 rooms a night from a particular hotel based on the presumption of prior market performance & certain channels effectively disappear you have to bid above market to sell some portion of the rooms because getting anything for them is better than leaving them unsold.

"Well I am not in hotels, so thankfully this won't impact me" is an incomplete thought. Google Ads now offer a lead generation extension.

Dipping a bit back into history here, but after Groupon said no to Google's acquisition offer Google promptly partnered with players 2 through n to ensure Groupon did not have a lasting competitive advantage. In the fullness of time most of those companies died, LivingSocial was acquired by Groupon for nothing & Groupon is today worth less than the amount they raised in VC & IPO funding.

Markets Naturally Evolve Toward Promoting Brands

When a vertical is new a player can compete just by showing up. Then over time as the verticals become established consumers develop habits, brands beat out generics & the markets get consolidated down to being heavily influenced & controlled by a couple strong players.

In the offline world of atoms there are real world costs tied to local regulations, shipping, sourcing, supply chains, inventory management, etc. The structure of the web & the lack of marginal distribution cost causes online markets to be even more consolidated than their offline analogs.

When Travelocity outsourced their backend infrastructure to Expedia most people visiting their website were unaware of the change. After Expedia acquired the site, longtime Travelocity customers likely remained unaware. In some businesses the only significant difference in the user experience is the logo at the top of the page.

Most large markets will ultimately consolidate down to a couple players (e.g. Booking vs Expedia) while smaller players lack the scale needed to have the economic leverage to pay Google's increasing rents.

This sort of consolidation was happening even when the search results were mostly organic & relevancy was driven primarily by links. As Google has folded in usage data & increased ad load on the search results it becomes harder for a generically descriptive domain name to build brand-related signals.

Re-sorting the Markets Once More

It is not only generically descriptive sorts of sites that have faded though. Many brand investments turned out to be money losers after the search result set was displaced by more ads (& many brand-related search result pages also carry ads above the organic results).

The ill informed might write something like this:

Since the Motorola debacle, it was Google's largest acquisition after the $676 million purchase of ITA Software, which became Google Flights. (Uh, remember that? Does anyone use that instead of Travelocity or one of the many others? Neither do I.)

The reality is brands lose value as the organic result set is displaced. To make the margins work they might desperately outsource just about everything but marketing to a competitor / partner, which will then later acquire them for a song.

Travelocity had roughly 3,000 people on the payroll globally as recently as a couple of years ago, but the Travelocity workforce has been whittled to around 50 employees in North America with many based in the Dallas area.

The best relevancy algorithm in the world is trumped by preferential placement of inferior results which bypasses the algorithm. If inferior results are hard coded in placements which violate net neutrality for an extended period of time, they can starve other players in the market from the vital user data & revenues needed to reinvest into growth and differentiation.

Value plays see their stocks crash as growth slows or goes in reverse. With the exception of startups funded by Softbank, growth plays are locked out of receiving further investment rounds as their growth rate slides.

Startups like Hipmunk disappear. Even an Orbitz or Travelocity become bolt on acquisitions.

The viability of TripAdvisor as a stand alone business becomes questioned, leading them to partner with Ctrip.

TripAdvisor has one of the best link profiles of any commercially oriented website outside of perhaps Amazon.com. But ranking #1 doesn't count for much if that #1 ranking is below the fold. Or, even worse, if Google literally hides the organic search results.

TripAdvisor shifted their business model to allow direct booking to better monetize mobile web users, but as Google has eaten screen real estate and grown Google Travel into a $100 billion business other players have seen their stocks sag.

Top of The Funnel

Google sits at the top of the funnel & all other parts of the value chain are complements to be commoditized.

  • Buy premium domain names? Google's SERPs test replacing domain names with words & make the words associated with the domain name gray.
  • Improve conversion rates? Your competitor almost certainly did as well, now you both can bid more & hand over an increasing economic rent to Google.
  • Invest in brand awareness? Google shows ads for competitors on your brand terms, forcing you to buy to protect the brand equity you paid to build.

Search Metrics mentioned Hotels.com was one of the biggest losers during the recent algorithm updates: "I’m going to keep on this same theme there, and I’m not going to say overall numbers, the biggest loser, but for my loser I’m going to pick Hotels.com, because they were literally like neck and neck, like one and two with Booking, as far as how close together they were, and the last four weeks, they’ve really increased that separation."

As Google ate the travel category the value of hotel-related domain names has fallen through the floor.

Most of the top selling hotel-related domain names were sold about a decade ago:

On August 8th HongKongHotels.com sold for $4,038. A decade ago that name likely would have sold for around $100,000.

And the new buyer may have overpaid for it!

Growing Faster Than the Market

Google consistently grows their ad revenues 20% a year in a global economy growing at under 4%.

There are only about 6 ways they can do that:

  • growth of web usage (though many of those who are getting online today have a far lower disposable income than those who got on a decade or two ago did)
  • gain marketshare (very hard in search, given that they effectively are the market in most markets outside of a few countries like China & Russia)
  • create new inventory (new ad types on image search results, Google Maps & YouTube)
  • charge more for clicks
  • improve at targeting through better surveillance of web users (getting harder after GDPR & similar efforts from some states in the next year or two)
  • shift click streams away from organic toward paid channels (through larger ads, more interactive ad units, less appealing organic result formatting, pushing organic results below the fold, hiding organic results, etc.)

Six of One, Half-dozen of the Other

Wednesday both Expedia and TripAdvisor reported earnings after hours & both fell off a cliff: "Both Okerstrom and Kaufer complained that their organic, or free, links are ending up further down the page in Google search results as Google prioritizes its own travel businesses."

Losing 20% to 25% of your market cap in a single day is an extreme move for a company worth billions of dollars.

Thursday Google hit fresh all time highs.

"Google’s old motto was ‘Don’t Be Evil’, but you can’t be this big and profitable and not be evil. Evil and all-time highs pretty much go hand in hand." - Howard Lindzon

Booking held up much better than TripAdvisor & Expedia as they have a bigger footprint in Europe (where antitrust is a thing) and they have a higher reliance on paid search versus organic.

Frozen in Fear vs Fearless

The broader SEO industry is to some degree frozen by fear. Roughly half of SEOs claim to have not bought *ANY* links in a half-decade.

Long after most of the industry has stopped buying links some people still run the "paid links are a potential FTC guideline violation" line as though it is insightful and/or useful.

Ask the people carrying Google's water what they think of the official FTC guidance on poor ad labeling in search results and you will hear the beautiful sound of crickets chirping.

Where is the ad labeling in this unit?

Does small gray text in the upper right corner stating "about these results" count as legitimate ad labeling?

And then when you hover over that gray text and click on it you get "Some of these hotel search results may be personalized based on your browsing activity and recent searches on Google, as well as travel confirmations sent to your Gmail. Hotel prices come from Google's partners."

Ads, Scroll, Ads, Scroll, Ads...

Zooming out a bit further on the above ad unit to look at the entire search result page, we can now see the following:

  • 4 text ad units above the map
  • huge map which segments demand by price tier, current sales, luxury, average review, geographic location
  • organic results below the above wall of ads, and the number of organic search results has been reduced from 10 to 7

How many scrolls does one need to do to get past the above wall of ads?

If one clicks on one of the hotel prices the follow up page is ... more ads.

Check out how the ad label is visually overwhelmed by a bright blue pop over.

Defund

It is worth noting Google Chrome has a built-in ad blocking feature which allows them to strip all ads from third party websites that follow the ad-heavy layout Google itself uses in the search results.

You won't see ads on websites that have poor ad experiences, like:

  • Too many ads
  • Annoying ads with flashing graphics or autoplaying audio
  • Ad walls before you can see content

When these ads are blocked, you'll see an "Intrusive ads blocked" message. Intrusive ads will be removed from the page.

And, as a bonus: to some, paid links are a crime, but Google can sponsor academic conferences for market regulators while requesting the payments not be disclosed.

Excessive Profits = Spam

Hotels have been at the forefront of SEO for many years. They drive massive revenues & were perhaps the only vertical ever referenced in the Google rater guidelines which explicitly stated all affiliate sites should be labeled as spam even if they are helpful to users.

Google has won most of the profits in the travel market & so they'll need to eat other markets to continue their 20% annual growth.

As they grow, other markets disappear.

"It's a bug that you could rank highly in Google without buying ads, and Google is trying to fix the bug." - Googler John Rockway, January 31, 2012

Some people who market themselves as SEO experts not only recognize this trend but even encourage this sort of behavior:

Zoopla, Rightmove and On The Market are all dominant players in the industry, and many of their house and apartment listings are duplicated across the different property portals. This represents a very real reason for Google to step in and create a more streamlined service that will help users make a more informed decision. ... The launch of Google Jobs should not have come as a surprise to anyone, and neither should its potential foray into real estate. Google will want to diversify its revenue channels as much as possible, and any market that allows it to do so will be in its sights. It is no longer a matter of if they succeed, but when.

If nobody is serving a market that is justification for entering it. If a market has many diverse players that is justification for entering it. If a market is dominated by a few strong players that is justification for entering it. All roads lead to the pile of money. :)

Extracting information from the ecosystem & diverting attention from other players while charging rising rents does not make the ecosystem stronger. Doing so does not help users make a more informed decision.

Information as a Vertical

The dominance Google has in core profitable vertical markets also exists in the news & general publishing categories. Some publishers get more traffic from Google Discover than from Google search. Publishers which try to turn off Google's programmatic ads find their display ad revenues fall off a cliff:

"Nexstar Media Group Inc., the largest local news company in the U.S., recently tested what would happen if it stopped using Google’s technology to place ads on its websites. Over several days, the company’s video ad sales plummeted. “That’s a huge revenue hit,” said Tony Katsur, senior vice president at Nexstar. After its brief test, Nexstar switched back to Google." ... "Regulators who approved that $3.1 billion deal warned they would step in if the company tied together its offerings in anticompetitive ways. In interviews, dozens of publishing and advertising executives said Google is doing just that with an array of interwoven products."

News is operating like many other (broken) markets. The Salt Lake Tribune converted to a nonprofit organization.

Many local markets have been consolidated down to ownership by a couple of private equity roll-ups looking to further consolidate the market. GateHouse Media acquired Gannett & has a $1.8 billion mountain of debt to pay off.

McClatchy - the second largest domestic newspaper chain - may soon file for bankruptcy:

there’s some nuance in this new drama — one of many to come from the past decade’s conversion of news companies into financial instruments stripped of civic responsibility by waves of outside money men. After all, when we talk about newspaper companies, we typically use their corporate names — Gannett, GateHouse, McClatchy, MNG, Lee. But it’s at least as appropriate to use the names of the hedge funds, private equity companies, and other investment vehicles that own and control them.

The Washington Post - owned by Amazon's Jeff Bezos - is creating an ad tech stack which serves other publishers & brands, though they also believe a reliance on advertiser & subscription revenue is unsustainable: “We are too beholden to just advertiser and subscriber revenue, and we’re completely out of our minds if we think that’s what’s going to be what carries us through the next generation of publishing. That’s very clear.”

Future Prospects

We are nearing inflection points in many markets where markets that seemed somewhat disconnected from search will still end up being dominated by Google. Gmail, Android, Web Analytics, Play Store, YouTube, Maps, Waze ... are all additional points of leverage beyond the core search & ads products.

If all roads lead to money one can't skip healthcare - now roughly 20% of the United States GDP.

Google scrubbed many alternative health sites from the search results. Some of them may have deserved it. Others were perhaps false positives.

Google wants to get into the healthcare market in a meaningful way. Google bought Fitbit and partnered with Ascension on a secret project gathering health information on over 50 million Americans.

Google is investing heavily in quantum computing. Google Fiber was a nothingburger to force competing ISPs into accelerating expensive network upgrades, but beaming in internet services from satellites will allow Google to bypass local politics, local regulations & heavy network infrastructure construction costs. A startup named Kepler recently provided high-bandwidth connectivity to the Arctic. When Google launches a free ISP there will be many knock-on effects causing partners to long for the days when Google was only as predatory as it is today.

"Capitalism is an efficient system for surfacing and addressing the needs of consumers. But once it veers toward control over markets by a single entity, those benefits disappear." - Seth Godin

Internet Wayback Machine Adds Historical TextDiff

The Wayback Machine has a cool new feature for looking at the historical changes of a web page.

The color scale shows how much a page has changed since it was last cached.

You can then select any two snapshots to see a side-by-side comparison of how the page has changed over time.

That quickly gives you an at-a-glance view of how they've changed their:

  • web design
  • on-page SEO strategy
  • marketing copy & sales strategy
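For anyone who wants the raw change history rather than the visual diff, the Wayback Machine also exposes a public CDX server that lists every capture of a URL. Here is a minimal Python sketch; the parameter names follow the public CDX server docs, but treat the exact field list as an assumption:

```python
from urllib.parse import urlencode

# The Wayback Machine's CDX server lists captures of a URL. Comparing
# the digest column across rows shows when a page actually changed.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def snapshot_query(url, start_year, end_year, limit=50):
    """Build a CDX query URL listing captures of `url` between two years."""
    params = {
        "url": url,
        "from": str(start_year),    # timestamps can be YYYY or YYYYMMDD
        "to": str(end_year),
        "output": "json",           # rows come back as a JSON array
        "fl": "timestamp,digest",   # digest changes mark page revisions
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urlencode(params)

# Fetch the resulting URL with any HTTP client; consecutive rows with
# differing digests bracket a change worth diffing in the Wayback viewer.
print(snapshot_query("example.com", 2015, 2020))
```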

For sites that conduct seasonal sales & rely heavily on holiday-themed ads you can also look up the new & historical ad copy used by large advertisers using tools like Moat, WhatRunsWhere & Adbeat.

Dofollow, Nofollow, Sponsored, UGC

A Change to Nofollow

Last month Google announced they were going to change how they treated nofollow, moving it from a directive toward a hint. As part of that they also announced the release of parallel attributes rel="sponsored" for sponsored links & rel="ugc" for user generated content in areas like forums & blog comments.

Why not completely ignore such links, as had been the case with nofollow? Links contain valuable information that can help us improve search, such as how the words within links describe content they point at. Looking at all the links we encounter can also help us better understand unnatural linking patterns. By shifting to a hint model, we no longer lose this important information, while still allowing site owners to indicate that some links shouldn’t be given the weight of a first-party endorsement.
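The three attributes are just values of the rel attribute on anchor tags. A rough sketch of how a crawler might bucket links under them (the precedence order when multiple rel values appear is my assumption; Google has not published one):

```python
from html.parser import HTMLParser

# Bucket outbound links by the expanded rel attributes. Per Google's
# announcement, sponsored/ugc/nofollow are all hints, not directives.
class LinkClassifier(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []  # (href, bucket) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel = set((attrs.get("rel") or "").split())
        if "sponsored" in rel:
            bucket = "sponsored"   # paid / advertising links
        elif "ugc" in rel:
            bucket = "ugc"         # forum posts, blog comments
        elif "nofollow" in rel:
            bucket = "nofollow"    # generic "no endorsement" hint
        else:
            bucket = "follow"      # first-party endorsement
        self.links.append((attrs.get("href"), bucket))

parser = LinkClassifier()
parser.feed('<a href="/a" rel="sponsored">ad</a>'
            '<a href="/b" rel="ugc nofollow">comment</a>'
            '<a href="/c">editorial</a>')
print(parser.links)
```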

In many emerging markets the mobile web is effectively the entire web. Few people create HTML links on the mobile web outside of social networks, where links are typically nofollow by default. That leaves Google to recover the lost signal by tracking what people do directly and/or by shifting how the nofollow attribute is treated.

Google shifting how nofollow is treated is a blanket admission that Penguin & other elements of "the war on links" were perhaps a bit too effective and have started to take valuable signals away from Google.

Google has suggested the shift in how nofollow is treated will not lead to any additional blog comment spam. When they announced nofollow they suggested it would lower blog comment spam. Blog comment spam remains a growth market long after the gravity of the web has shifted away from blogs onto social networks.

Changing how nofollow is treated only makes any sort of external link analysis that much harder. Those who specialize in link audits (yuck!) have historically ignored nofollow links, but now that is one more set of things to look through. The good news for professional link auditors is that this increases the effective fee they can charge clients for the service.

Some nefarious types will notice when competitors get penalized & then fire up XRumer to help promote the penalized site, ensuring that the link auditor bankrupts the competing business even faster than Google.

Links, Engagement, or Something Else...

When Google was launched they didn't own Chrome or Android. They were not yet pervasively spying on billions of people:

If, like most people, you thought Google stopped tracking your location once you turned off Location History in your account settings, you were wrong. According to an AP investigation published Monday, even if you disable Location History, the search giant still tracks you every time you open Google Maps, get certain automatic weather updates, or search for things in your browser.

Thus Google had to rely on external signals as their primary ranking factor:

The reason that PageRank is interesting is that there are many cases where simple citation counting does not correspond to our common sense notion of importance. For example, if a web page has a link on the Yahoo home page, it may be just one link but it is a very important one. This page should be ranked higher than many pages with more links but from obscure places. PageRank is an attempt to see how good an approximation to "importance" can be obtained just from the link structure. ... The definition of PageRank above has another intuitive basis in random walks on graphs. The simplified version corresponds to the standing probability distribution of a random walk on the graph of the Web. Intuitively, this can be thought of as modeling the behavior of a "random surfer".
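The random surfer model quoted above can be sketched in a few lines of Python. This is the simplified textbook power-iteration formulation, not whatever Google runs today:

```python
# Minimal PageRank power iteration: rank flows along links, with a
# damping factor modeling the surfer's occasional random jump.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                # dangling page: spread evenly
                for p in pages:
                    new[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
        rank = new
    return rank

# One link from a heavily linked hub can outweigh more links from
# obscure pages: "a" has a single inbound link (from the hub), "b" has
# two inbound links from obscure pages, yet "a" ranks higher.
graph = {"hub": ["a"], "x": ["hub", "b"], "y": ["hub", "b"], "a": [], "b": []}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```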

Google's reliance on links turned links into a commodity, which led to all sorts of fearmongering, manual penalties, nofollow and the Penguin update.

As Google collected more usage data those who overly focused on links often ended up scoring an own goal, creating sites which would not rank.

Google no longer invests heavily in fearmongering because it is no longer needed. Search is so complex most people can't figure it out.

Many SEOs have reduced their link building efforts as Google dialed up weighting on user engagement metrics, though it appears the tide may now be heading in the other direction. Some sites which had decent engagement metrics but little in the way of link building slid on the update late last month.

As much as Google desires relevancy in the short term, they also prefer a system opaque enough that, to external onlookers, reverse engineering feels impossible. If they discourage investment in SEO they increase AdWords growth while gaining greater control over algorithmic relevancy.

Google will soon collect even more usage data by routing Chrome users through their DNS service: "Google isn't actually forcing Chrome users to only use Google's DNS service, and so it is not centralizing the data. Google is instead configuring Chrome to use DoH connections by default if a user's DNS service supports it."

If traffic is routed through Google that is akin to them hosting the page in terms of being able to track many aspects of user behavior. It is akin to AMP or YouTube in terms of being able to track users and normalize relative engagement metrics.

Once Google is hosting the end-to-end user experience they can create a near infinite number of ranking signals given their advancement in computing power: "We developed a new 54-qubit processor, named “Sycamore”, that is comprised of fast, high-fidelity quantum logic gates, in order to perform the benchmark testing. Our machine performed the target computation in 200 seconds, and from measurements in our experiment we determined that it would take the world’s fastest supercomputer 10,000 years to produce a similar output."

Relying on "one simple trick to..." sorts of approaches is frequently going to come up empty.

EMDs Kicked Once Again

I was one of the early promoters of exact match domains when the broader industry did not believe in them. I was also quick to mention when I felt the algorithms had moved in the other direction.

Google's mobile layout, which they are now testing on desktop computers as well, replaces green domain names with gray words which are easy to miss. And the favicons sort of make the organic results look like ads. Any boost a domain name like CreditCards.ext might have garnered in the past due to matching the keyword has certainly gone away with this new layout that further depreciates the impact of exact-match domain names.

At one point in time CreditCards.com was viewed as a consumer destination. It is now viewed ... below the fold.

If you have a memorable brand-oriented domain name the favicon can help offset the above impact somewhat, but matching keywords is becoming a much more precarious approach to sustaining rankings as the weight on brand awareness, user engagement & authority increase relative to the weight on anchor text.

New Keyword Tool

Our keyword tool is updated periodically. We recently updated it once more.

For comparison's sake, the old keyword tool looked like this:

Whereas the new keyword tool looks like this:

The upsides of the new keyword tool are:

  • fresher data from this year
  • more granular data on ad bids vs click prices
  • lists ad clickthrough rate
  • more granular estimates of Google AdWords advertiser ad bids
  • more emphasis on commercially oriented keywords

With the new columns of [ad spend] and [traffic value], here is how we estimate those:

  • paid search ad spend: search ad clicks * CPC
  • organic search traffic value: ad impressions * 0.5 * (100% - ad CTR) * CPC

The first of those two is rather self-explanatory. The second is a bit more complex. It starts with the assumption that about half of all searches do not get any clicks, then it subtracts the paid clicks from the remaining pool of clicks & multiplies that by the cost per click.
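For clarity, here is a minimal sketch of those two estimates in Python (function names & the sample numbers are mine; the inputs are whatever impression, CTR & CPC figures the tool reports for a keyword):

```python
# The two estimates above, as stated.
def paid_search_ad_spend(ad_clicks, cpc):
    """paid search ad spend = search ad clicks * CPC"""
    return ad_clicks * cpc

def organic_traffic_value(ad_impressions, ad_ctr, cpc):
    """organic value = ad impressions * 0.5 * (100% - ad CTR) * CPC

    Roughly half of searches get no click at all; of the remaining
    pool, the share not captured by ads is valued at the click price.
    """
    return ad_impressions * 0.5 * (1.0 - ad_ctr) * cpc

# e.g. 10,000 monthly impressions, 4% ad CTR, $2.50 CPC:
print(round(paid_search_ad_spend(10_000 * 0.04, 2.50), 2))   # → 1000.0
print(round(organic_traffic_value(10_000, 0.04, 2.50), 2))   # → 12000.0
```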

The new data also has some drawbacks:

  • Rather than listing search counts specifically it lists relative ranges like low, very high, etc.
  • Since it tends to tilt more toward keywords with ad impressions, it may not have coverage for some longer tail informational keywords.

For any keyword where there is insufficient coverage we re-query the old keyword database & merge the data across. You will know the data came from the new database if the first column says something like low or high, & from the older database if there are specific search counts in the first column.

For a limited time we are still allowing access to both keyword tools, though we anticipate removing access to the old keyword tool in the future once we have collected plenty of feedback on the new keyword tool. Please feel free to leave your feedback in the below comments.

One of the cool features of the new keyword tool worth highlighting further is the difference between estimated bid prices & estimated click prices. In the following screenshot you can see how Amazon is estimated as having a much higher bid price than actual click price, largely because, due to low keyword relevancy, entities other than the official brand must bid much higher to appear on popular trademark terms being arbitraged by Google.

Historically, this difference between bid price & click price was a big source of noise on lists of the most valuable keywords.

Recently some advertisers have started complaining about the "Google shakedown" from how many brand-driven searches are simply leaving the .com part off of a web address in Chrome & then being forced to pay Google for their own pre-existing brand equity.

AMP'd Up for Recaptcha

Beyond search Google controls the leading distributed ad network, the leading mobile OS, the leading web browser, the leading email client, the leading web analytics platform, the leading mapping platform & the leading free video hosting site.

They win a lot.

And they take winnings from one market & leverage them into manipulating adjacent markets.

Embrace. Extend. Extinguish.

AMP is an utterly unnecessary invention designed to further shift power to Google while disenfranchising publishers. From the very start it had many issues with basic things like supporting JavaScript, double counting unique users (no reason to fix broken stats if they drive adoption!), not supporting third party ad networks, not showing publisher domain names, and just generally being a useless layer of sunk cost technical overhead that provides literally no real value.

Over time they have corrected some of these catastrophic deficiencies, but if it provided real value, they wouldn't have needed to force adoption with preferential placement in their search results. They force the bundling because AMP sucks.

Absurdity knows no bounds. Googlers suggest: "AMP isn’t another “channel” or “format” that’s somehow not the web. It’s not a SEO thing. It’s not a replacement for HTML. It’s a web component framework that can power your whole site. ... We, the AMP team, want AMP to become a natural choice for modern web development of content websites, and for you to choose AMP as framework because it genuinely makes you more productive."

Meanwhile some newspapers have about a dozen employees who work on re-formatting content for AMP:

The AMP development team now keeps track of whether AMP traffic drops suddenly, which might indicate pages are invalid, and it can react quickly.

All this adds expense, though. There are setup, development and maintenance costs associated with AMP, mostly in the form of time. After implementing AMP, the Guardian realized the project needed dedicated staff, so it created an 11-person team that works on AMP and other aspects of the site, drawing mostly from existing staff.

Feeeeeel the productivity!

Some content types (particularly user generated content) can be unpredictable & circuitous. For many years forum websites would use keywords embedded in the search referral to highlight relevant parts of the page. Keyword (not provided) largely destroyed that & then it became a competitive feature for AMP: "If the Featured Snippet links to an AMP article, Google will sometimes automatically scroll users to that section and highlight the answer in orange."

That would perhaps be a single area where AMP was more efficient than the alternative. But it is only so because Google destroyed the alternative by stripping keyword referrers from search queries.

The power dynamics of AMP are ugly:

"I see them as part of the effort to normalise the use of the AMP Carousel, which is an anti-competitive land-grab for the web by an organisation that seems to have an insatiable appetite for consuming the web, probably ultimately to it’s own detriment. ... This enables Google to continue to exist after the destination site (eg the New York Times) has been navigated to. Essentially it flips the parent-child relationship to be the other way around. ... As soon as a publisher blesses a piece of content by packaging it (they have to opt in to this, but see coercion below), they totally lose control of its distribution. ... I’m not that smart, so it’s surely possible to figure out other ways of making a preload possible without cutting off the content creator from the people consuming their content. ... The web is open and decentralised. We spend a lot of time valuing the first of these concepts, but almost none trying to defend the second. Google knows, perhaps better than anyone, how being in control of the user is the most monetisable position, and having the deepest pockets and the most powerful platform to do so, they have very successfully inserted themselves into my relationship with millions of other websites. ... In AMP, the support for paywalls is based on a recommendation that the premium content be included in the source of the page regardless of the user’s authorisation state. ... These policies demonstrate contempt for others’ right to freely operate their businesses."

After enough publishers adopted AMP Google was able to turn their mobile app's homepage into an interactive news feed below the search box. And inside that news feed Google gets to distribute MOAR ads while 0% of the revenue from those ads finds its way to the publishers whose content is used to make up the feed.

Appropriate appropriation. :D

Thank you for your content!!!

The mainstream media is waking up to AMP being a trap, but their neck is already in it:

European and American tech, media and publishing companies, including some that originally embraced AMP, are complaining that the Google-backed technology, which loads article pages in the blink of an eye on smartphones, is cementing the search giant's dominance on the mobile web.

Each additional layer of technical cruft is another cost center. Things that sound appealing at first blush may not be:

The way you verify your identity to Let's Encrypt is the same as with other certificate authorities: you don't really. You place a file somewhere on your website, and they access that file over plain HTTP to verify that you own the website. The one attack that signed certificates are meant to prevent is a man-in-the-middle attack. But if someone is able to perform a man-in-the-middle attack against your website, then he can intercept the certificate verification, too. In other words, Let's Encrypt certificates don't stop the one thing they're supposed to stop. And, as always with the certificate authorities, a thousand murderous theocracies, advertising companies, and international spy organizations are allowed to impersonate you by design.

Anything that is easy to implement & widely marketed often has costs added to it in the future as the entity moves to monetize the service.

This is a private equity firm buying up multiple hosting control panels & then adjusting prices.

This is Google Maps drastically changing their API terms.

This is Facebook charging you for likes to build an audience, giving your competitors access to those likes as an addressable audience to advertise against, and then charging you once more to boost the reach of your posts.

This is Grubhub creating shadow websites on your behalf and charging you for every transaction created by the gravity of your brand.

Shivane believes GrubHub purchased her restaurant’s web domain to prevent her from building her own online presence. She also believes the company may have had a special interest in owning her name because she processes a high volume of orders. ... it appears GrubHub has set up several generic, templated pages that look like real restaurant websites but in fact link only to GrubHub. These pages also display phone numbers that GrubHub controls. The calls are forwarded to the restaurant, but the platform records each one and charges the restaurant a commission fee for every order

Settling for the easiest option drives a lack of differentiation, embeds additional risk & once the dominant player has enough marketshare they'll change the terms on you.

Small gains in short term margins for massive increases in fragility.

"Closed platforms increase the chunk size of competition & increase the cost of market entry, so people who have good ideas, it is a lot more expensive for their productivity to be monetized. They also don't like standardization ... it looks like rent seeking behaviors on top of friction" - Gabe Newell

The other big issue is platforms that run out of growth space in their core market may break integrations with adjacent service providers as each wants to grow by eating the other's market.

Those who look at SaaS business models through the eyes of a seasoned investor will better understand how markets are likely to change:

"I’d argue that many of today’s anointed tech “disruptors” are doing little in the way of true disruption. ... When investors used to get excited about a SAAS company, they typically would be describing a hosted multi-tenant subscription-billed piece of software that was replacing a ‘legacy’ on-premise perpetual license solution in the same target market (i.e. ERP, HCM, CRM, etc.). Today, the terms SAAS and Cloud essentially describe the business models of every single public software company."

Most platform companies are initially required to operate at low margins in order to buy growth of their category & own their category. Then when they are valued on that, they quickly need to jump across to adjacent markets to grow into the valuation:

"Twilio has no choice but to climb up the application stack. This is a company whose ‘disruption’ is essentially great API documentation and gangbuster SEO spend built on top of a highly commoditized telephony aggregation API. They have won by marketing to DevOps engineers. With all the hype around them, you’d think Twilio invented the telephony API, when in reality what they did was turn it into a product company. Nobody had thought of doing this let alone that this could turn into a $17 billion company because simply put the economics don’t work. And to be clear they still don’t. But Twilio’s genius CEO clearly gets this. If the market is going to value robocalls, emergency sms notifications, on-call pages, and carrier fee passed through related revenue growth in the same way it does ‘subscription’ revenue from Atlassian or ServiceNow, then take advantage of it while it lasts."

Large platforms offering temporary subsidies to ensure they dominate their categories & companies like SoftBank spraying capital across the markets is causing massive shifts in valuations:

"I also think if you look closely at what is celebrated today as innovation you often find models built on hidden subsidies. ... I’d argue the very distributed nature of microservices architecture and API-first product companies means addressable market sizes and unit economics assumptions should be even more carefully scrutinized. ... How hard would it be to create an Alibaba today if someone like SoftBank was raining money into such a greenfield space? Excess capital would lead to destruction and likely subpar returns. If capital was the solution, the 1.5 trillion that went into telcos in late '90s wouldn’t have led to a massive bust. Would a Netflix be what it is today if a SoftBank was pouring billions into streaming content startups right as the experiment was starting? Obviously not. Scarcity of capital is another often underappreciated part of the disruption equation. Knowing resources are finite leads to more robust models. ... This convergence is starting to manifest itself in performance. Disney is up 30% over the last 12 months while Netflix is basically flat. This may not feel like a bubble sign to most investors, but from my standpoint, it’s a clear evidence of the fact that we are approaching a something has got to give moment for the way certain businesses are valued."

Circling back to Google's AMP, it has a cousin called Recaptcha.

Recaptcha is another AMP-like trojan horse:

According to tech statistics website Built With, more than 650,000 websites are already using reCaptcha v3; overall, there are at least 4.5 million websites use reCaptcha, including 25% of the top 10,000 sites. Google is also now testing an enterprise version of reCaptcha v3, where Google creates a customized reCaptcha for enterprises that are looking for more granular data about users’ risk levels to protect their site algorithms from malicious users and bots. ... According to two security researchers who’ve studied reCaptcha, one of the ways that Google determines whether you’re a malicious user or not is whether you already have a Google cookie installed on your browser. ... To make this risk-score system work accurately, website administrators are supposed to embed reCaptcha v3 code on all of the pages of their website, not just on forms or log-in pages.
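reCaptcha v3 hands each request a score from 0.0 (likely bot) to 1.0 (likely human) instead of serving a challenge, and leaves the cutoffs to the site owner. A minimal sketch of the resulting decision logic (the thresholds below are illustrative assumptions, not Google's recommendations):

```python
# Decide how to treat a request from its reCaptcha v3 risk score.
# Threshold values are illustrative assumptions only.
def handle_risk_score(score: float) -> str:
    if score >= 0.7:
        return "allow"        # treat as a normal visitor
    if score >= 0.3:
        return "challenge"    # e.g. step-up verification via email/SMS
    return "block"            # treat as automated traffic

print(handle_risk_score(0.9))  # → allow
print(handle_risk_score(0.2))  # → block
```

Sloppy implementations tend to skip the middle tier entirely, which is how a logged-in repeat customer with JavaScript disabled ends up hard-blocked at checkout.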

About a month ago when logging into Bing Ads I saw Recaptcha on the login page & couldn't believe they'd give Google control at that access point. I think they got rid of it, but lots of companies are perhaps shooting themselves in the foot through a combination of over-reliance on Google infrastructure AND sloppy implementation.

Today when making a purchase on Fiverr, after converting, I got some of this action:

Hmm. Maybe I will enable JavaScript and try again.

Oooops.

That is called snatching defeat from the jaws of victory.

My account is many years old. My payment type on record has been used for years. I have ordered from the particular seller about a dozen times over the years. And suddenly because my web browser had JavaScript turned off I was deemed a security risk of some sort for making an utterly ordinary transaction I have already completed about a dozen times.

On AMP JavaScript was the devil. And on desktop the absence of JavaScript was the devil.

Pro tip: Ecommerce websites that see substandard conversion rates from using Recaptcha can boost their overall ecommerce revenue by buying more Google AdWords ads.

---

As more of the infrastructure stack is driven by AI software there is going to be a very real opportunity for many people to become deplatformed across the web on an utterly arbitrary basis. That tech companies like Facebook also want to create digital currencies on top of the leverage they already have only makes the proposition that much scarier.

If the tech platforms host copies of our sites, process the transactions & even create their own currencies, how will we know what level of value they are adding versus what they are extracting?

Who measures the measurer?

And when the economics turn negative, what will we do if we are hooked into an ecosystem we can't spend additional capital to get out of when things head south?

The Fractured Web

Anyone can argue about the intent of a particular action & the outcome derived from it. But when the outcome is known, flows from a source of power & doesn't change, at some point the intent is inferred.

Or, put another way, if a powerful entity (government, corporation, other organization) disliked an outcome which appeared to benefit them in the short term at great lasting cost to others, they could spend resources to adjust the system.

If they don't spend those resources (or, rather, spend them on lobbying rather than improving the ecosystem) then there is no desired change. The outcome is as desired. Change is unwanted.

News is a stock vs flow market where the flow of recent events drives most of the traffic to articles. News that is more than a couple days old is no longer news. A news site which stops publishing news stops becoming a habit & quickly loses relevancy. Algorithmically an abandoned archive of old news articles doesn't look much different than eHow, in spite of having a much higher cost structure.

According to SEMrush's traffic rank, ampproject.org gets more monthly visits than Yahoo.com.

Traffic Ranks.

That actually understates the prevalence of AMP because AMP is generally designed for mobile AND not all AMP-formatted content is displayed on ampproject.org.

Part of how AMP was able to get widespread adoption was because in the news vertical the organic search result set was displaced by an AMP block. If you were a news site either you were so differentiated that readers would scroll past the AMP block in the search results to look for you specifically, or you adopted AMP, or you were doomed.

Some news organizations like The Guardian have a team of about a dozen people reformatting their content to the duplicative & proprietary AMP format. That's wasteful, but necessary: "In theory, adoption of AMP is voluntary. In reality, publishers that don’t want to see their search traffic evaporate have little choice. New data from publisher analytics firm Chartbeat shows just how much leverage Google has over publishers thanks to its dominant search engine."

It seems more than a bit backward that low margin publishers are doing duplicative work to distance themselves from their own readers while improving the profit margins of monopolies. But it is what it is. And that no doubt drew the ire of many publishers across the EU.

And now there are AMP Stories to eat up even more visual real estate.

If you spent a bunch of money to create a highly differentiated piece of content, why would you prefer that high spend flagship content appear on a third party website rather than your own?

Google & Facebook have done such a fantastic job of eating the entire pie that some are celebrating Amazon as a prospective savior to the publishing industry. That view - IMHO - is rather suspect.

Where any of the tech monopolies dominate they cram down on partners. The New York Times acquired The Wirecutter in Q4 of 2016. In Q1 of 2017 Amazon adjusted their affiliate fee schedule.

Amazon generally treats consumers well, but they have been much harder on business partners:

  • tough pricing negotiations
  • counterfeit protections
  • forced ad buying to have a high enough product rank to be able to rank organically
  • ad displacement of their organic search results below the fold (even for branded search queries)
  • learning from suppliers & cutting out the partners
  • private label products patterned after top sellers
  • in some cases, running pop over ads for the private label products on product-level pages where brands already spent money to drive traffic to the page

They've made things tougher for their partners in a way that mirrors the impact Facebook & Google have had on online publishers:

"Boyce’s experience on Amazon largely echoed what happens in the offline world: competitors entered the market, pushing down prices and making it harder to make a profit. So Boyce adapted. He stopped selling basketball hoops and developed his own line of foosball tables, air hockey tables, bocce ball sets and exercise equipment. The best way to make a decent profit on Amazon was to sell something no one else had and create your own brand. ... Amazon also started selling bocce ball sets that cost $15 less than Boyce’s. He says his products are higher quality, but Amazon gives prominent page space to its generic version and wins the cost-conscious shopper."

Google claims they have no idea how happy content publishers are with the trade-off between themselves & the search engine, but every quarter Alphabet publishes the share of ad spend occurring on owned & operated sites versus the share spent across the broader publisher network. And in almost every quarter for over a decade straight that ratio has grown worse for publishers.

The aggregate numbers for news publishers are worse than shown above as Google is ramping up ads in video games quite hard. They've partnered with Unity & promptly took away the ability to block ads from appearing in video games using the googleadsenseformobileapps.com exclusion (hello flat thumb misclicks, my name is budget & I am gone!).

They will also track video game player behavior & alter game play to maximize revenues based on machine learning tied to surveillance of the user's account: "We’re bringing a new approach to monetization that combines ads and in-app purchases in one automated solution. Available today, new smart segmentation features in Google AdMob use machine learning to segment your players based on their likelihood to spend on in-app purchases. Ad units with smart segmentation will show ads only to users who are predicted not to spend on in-app purchases. Players who are predicted to spend will see no ads, and can simply continue playing."

And how does the growth of ampproject.org square against the following wisdom?

Literally only yesterday did Google begin supporting instant loading of self-hosted AMP pages.

China has a different set of tech leaders than the United States: Baidu, Alibaba & Tencent (BAT) instead of Facebook, Amazon, Apple, Netflix & Google (FANG). China's tech companies may have won their domestic markets in part based on superior technology or better knowledge of the local culture, though those same companies have largely gone nowhere fast in most foreign markets. A big part of winning was governmental assistance in putting a foot on the scales.

Part of the US-China trade war is about who controls the virtual "seas" upon which value flows:

it can easily be argued that the last 60 years were above all the era of the container-ship (with container-ships getting ever bigger). But will the coming decades still be the age of the container-ship? Possibly not, for the simple reason that things that have value increasingly no longer travel by ship, but instead by fiberoptic cables! ... you could almost argue that ZTE and Huawei have been the “East India Company” of the current imperial cycle. Unsurprisingly, it is these very companies, charged with laying out the “new roads” along which “tomorrow’s value” will flow, that find themselves at the center of the US backlash. ... if the symbol of British domination was the steamship, and the symbol of American strength was the Boeing 747, it seems increasingly clear that the question of the future will be whether tomorrow’s telecom switches and routers are produced by Huawei or Cisco. ... US attempts to take down Huawei and ZTE can be seen as the existing empire’s attempt to prevent the ascent of a new imperial power. With this in mind, I could go a step further and suggest that perhaps the Huawei crisis is this century’s version of Suez crisis. No wonder markets have been falling ever since the arrest of the Huawei CFO. In time, the Suez Crisis was brought to a halt by US threats to destroy the value of sterling. Could we now witness the same for the US dollar?

China maintains Huawei is an employee-owned company. But that proposition is suspect. Broadly stealing technology is vital to the growth of the Chinese economy & they have no incentive to stop unless their leading companies pay a direct cost. Meanwhile, China is investigating Ericsson over licensing technology.

Amazon will soon discontinue selling physical retail products in China: "Amazon shoppers in China will no longer be able to buy goods from third-party merchants in the country, but they still will be able to order from the United States, Britain, Germany and Japan via the firm’s global store. Amazon expects to close fulfillment centers and wind down support for domestic-selling merchants in China in the next 90 days."

India has taken notice of the success of Chinese tech companies & thus began to promote "national champion" policies. That, in turn, has also meant adopting some Chinese-style rules: data localization requirements, antitrust inquiries, foreign ownership restrictions, requirements that platforms not sell their own goods, proposed limits on data encryption, etc.

The secretary of India’s Telecommunications Department, Aruna Sundararajan, last week told a gathering of Indian startups in a closed-door meeting in the tech hub of Bangalore that the government will introduce a “national champion” policy “very soon” to encourage the rise of Indian companies, according to a person familiar with the matter. She said Indian policy makers had noted the success of China’s internet giants, Alibaba Group Holding Ltd. and Tencent Holdings Ltd. ... Tensions began rising last year, when New Delhi decided to create a clearer set of rules for e-commerce and convened a group of local players to solicit suggestions. Amazon and Flipkart, even though they make up more than half the market, weren’t invited, according to people familiar with the matter.

Amazon vowed to invest $5 billion in India & they have done some remarkable work on logistics there. Walmart acquired Flipkart for $16 billion.

Other emerging markets also have many local ecommerce leaders like Jumia, MercadoLibre, OLX, Gumtree, Takealot, Konga, Kilimall, BidOrBuy, Tokopedia, Bukalapak, Shopee, Lazada. If you live in the US you may have never heard of *any* of those companies. And if you live in an emerging market you may have never interacted with Amazon or eBay.

It makes sense that ecommerce leadership would be more localized since it requires moving things in the physical economy, dealing with local currencies, managing inventory, shipping goods, etc. whereas information flows are just bits floating on a fiber optic cable.

If the Internet is primarily seen as a communications platform it is easy for people in some emerging markets to think Facebook is the Internet. Free communication with friends and family members is a compelling offer & as the cost of data drops web usage increases.

At the same time, the web is incredibly deflationary. Every hour spent on a free form of entertainment is an hour not spent consuming something else.

Add technological disruption to the wealth polarization that happened in the wake of the great recession, combine that with algorithms that promote extremist views, & the result is clearly increasing conflict.

If you are a parent and you think your child has no shot at a brighter future than your own life, it is easy to be full of rage.

Empathy can radicalize otherwise normal people by giving them a more polarized view of the world:

Starting around 2000, the line starts to slide. More students say it's not their problem to help people in trouble, not their job to see the world from someone else's perspective. By 2009, on all the standard measures, Konrath found, young people on average measure 40 percent less empathetic than my own generation ... The new rule for empathy seems to be: reserve it, not for your "enemies," but for the people you believe are hurt, or you have decided need it the most. Empathy, but just for your own team. And empathizing with the other team? That's practically a taboo.

A complete lack of empathy could allow a psychopath to commit extreme crimes while feeling no guilt, shame or remorse. Extreme empathy can have the same sort of outcome:

"Sometimes we commit atrocities not out of a failure of empathy but rather as a direct consequence of successful, even overly successful, empathy. ... They emphasized that students would learn both sides, and the atrocities committed by one side or the other were always put into context. Students learned this curriculum, but follow-up studies showed that this new generation was more polarized than the one before. ... [Empathy] can be good when it leads to good action, but it can have downsides. For example, if you want the victims to say 'thank you.' You may even want to keep the people you help in that position of inferior victim because it can sustain your feeling of being a hero." - Fritz Breithaupt

News feeds will be read. Villages will be razed. Lynch mobs will become commonplace.

Many people will end up murdered by algorithmically generated empathy.

As technology increases absentee ownership & financial leverage, a society led by morally agnostic algorithms is not going to become more egalitarian.

When politicians throw fuel on the fire it only gets worse:

It’s particularly odd that the government is demanding “accountability and responsibility” from a phone app when some ruling party politicians are busy spreading divisive fake news. How can the government ask WhatsApp to control mobs when those convicted of lynching Muslims have been greeted, garlanded and fed sweets by some of the most progressive and cosmopolitan members of Modi’s council of ministers?

Mark Zuckerberg won't get caught downstream from platform blowback as he spends $20 million a year on his security.

The web is a mirror. Engagement-based algorithms reinforce our perceptions & identities.

And every important story has at least 2 sides!

Some may "learn" vaccines don't work. Others may learn the vaccines their own children took did not work, as they failed to protect them from the measles & Medieval diseases spread by people who absorbed the antivax content promoted via Facebook & Google.

Passion drives engagement, which drives algorithmic distribution: "There’s an asymmetry of passion at work. Which is to say, there’s very little counter-content to surface because it simply doesn’t occur to regular people (or, in this case, actual medical experts) that there’s a need to produce counter-content."

As the costs of "free" become harder to hide, social media companies which currently sell emerging markets as their next big growth area will find those markets carry embedded regulatory compliance costs exceeding any revenue they could hope to generate there.

The Pinterest S1 shows almost all their growth is in emerging markets, yet almost all their revenue is inside the United States.

As governments around the world see the real-world cost of the foreign tech companies & view some of them as piggy banks, eventually the likes of Facebook or Google will pull out of a variety of markets they no longer feel worth serving. It will be like Google did in mainland China with search after discovering pervasive hacking of activist Gmail accounts.

Lower friction & lower cost information markets will face more junk fees, hurdles & even some legitimate regulations. Information markets will start to behave more like physical goods markets.

The tech companies presume they will be able to use satellites, drones & balloons to beam in Internet while avoiding messy local issues tied to real world infrastructure, but when a local wealthy player is betting against them they'll probably end up losing those markets: "One of the biggest cheerleaders for the new rules was Reliance Jio, a fast-growing mobile phone company controlled by Mukesh Ambani, India’s richest industrialist. Mr. Ambani, an ally of Mr. Modi, has made no secret of his plans to turn Reliance Jio into an all-purpose information service that offers streaming video and music, messaging, money transfer, online shopping, and home broadband services."

Publishers do not have "their mojo back" because the tech companies have been so good to them, but rather because the tech companies have been so aggressive that they've earned enormous blowback. That blowback will lead publishers to opt out of future deals, which will eventually lead more people back to the trusted brands of yesterday.

Publishers feeling guilty about taking advertorial money from the tech companies to spread their propaganda will offset its publication with opinion pieces pointing in the other direction: "This is a lobbying campaign in which buying the good opinion of news brands is clearly important. If it was about reaching a target audience, there are plenty of metrics to suggest his words would reach further – at no cost – on Facebook. Similarly, Google is upping its presence in a less obvious manner via assorted media initiatives on both sides of the Atlantic. Its more direct approach to funding journalism seems to have the desired effect of making all media organisations (and indeed many academic institutions) touched by its money slightly less questioning and critical of its motives."

When Facebook goes down direct visits to leading news brand sites go up.

When Google penalizes a no-name me-too site almost nobody realizes it is missing. But if a big publisher opts out of the ecosystem people will notice.

The reliance on the tech platforms is largely a mirage. If enough key players were to opt out at the same time people would quickly reorient their information consumption habits.

If the platforms can change their focus overnight then why can't publishers band together & choose to dump them?

In Europe there is GDPR, which aimed to protect user privacy, but ultimately acted as a tax on innovation by local startups while being a subsidy to the big online ad networks. They also have Article 11 & Article 13, which passed in spite of Google's best efforts on the scaremongering anti-SERP tests, lobbying & propaganda fronts: "Google has sparked criticism by encouraging news publishers participating in its Digital News Initiative to lobby against proposed changes to EU copyright law at a time when the beleaguered sector is increasingly turning to the search giant for help."

Remember the Eric Schmidt comment about how brands are how you sort out (the non-YouTube portion of) the cesspool? As it turns out, he was allegedly wrong as Google claims they have been fighting for the little guy the whole time:

Article 11 could change that principle and require online services to strike commercial deals with publishers to show hyperlinks and short snippets of news. This means that search engines, news aggregators, apps, and platforms would have to put commercial licences in place, and make decisions about which content to include on the basis of those licensing agreements and which to leave out. Effectively, companies like Google will be put in the position of picking winners and losers. ... Why are large influential companies constraining how new and small publishers operate? ... The proposed rules will undoubtedly hurt diversity of voices, with large publishers setting business models for the whole industry. This will not benefit all equally. ... We believe the information we show should be based on quality, not on payment.

Facebook claims there is a local news problem: "Facebook Inc. has been looking to boost its local-news offerings since a 2017 survey showed most of its users were clamoring for more. It has run into a problem: There simply isn’t enough local news in vast swaths of the country. ... more than one in five newspapers have closed in the past decade and a half, leaving half the counties in the nation with just one newspaper, and 200 counties with no newspaper at all."

Google is so for the little guy that for their local news experiments they've partnered with a private equity backed newspaper roll up firm & another newspaper chain which did overpriced acquisitions & is trying to act like a PE firm (trying to not get eaten by the PE firm).

Does the above stock chart look in any way healthy?

Does it give off the scent of a firm that understood the impact of digital & rode it to new heights?

If you want good market-based outcomes, why not partner with journalists directly versus operating through PE chop shops?

If Patch is profitable & Google were a neutral ranking system based on quality, couldn't Google partner with journalists directly?

Throwing a few dollars at a PE firm in some nebulous partnership sure beats the sort of regulations coming out of the EU. And the EU's regulations (and prior link tax attempts) are in addition to the three multi-billion-Euro fines the European Union has levied against Alphabet for shopping search, Android & AdSense.

Google was also fined in Russia over Android bundling. The fine was tiny, but after consumers gained a search engine choice screen (much like Google pushed for in Europe on Microsoft years ago) Yandex's share of mobile search grew quickly.

The UK recently published a white paper on online harms. In some ways it is a regulation just like the tech companies might offer to participants in their ecosystems:

Companies will have to fulfil their new legal duties or face the consequences and “will still need to be compliant with the overarching duty of care even where a specific code does not exist, for example assessing and responding to the risk associated with emerging harms or technology”.

If web publishers should monitor inbound links to look for anything suspicious then the big platforms sure as hell have the resources & profit margins to monitor behavior on their own websites.

Australia passed the Sharing of Abhorrent Violent Material bill which requires platforms to expeditiously remove violent videos & notify the Australian police about them.

There are other layers of fracturing going on in the web as well.

Programmatic advertising shifted revenue from publishers to adtech companies & the largest ad sellers. Ad blockers further lower the ad revenues of many publishers. If you routinely use an ad blocker, try surfing the web for a while without one & you will notice overlay welcome AdSense ads on sites as you browse the web - the very type of ad they were allegedly against when promoting AMP.

Tracking protection in browsers & ad blocking features built directly into browsers leave publishers more uncertain. And who even knows who visited an AMP page hosted on a third party server, particularly when things like GDPR are mixed in? Those who lack first party data may end up having to make large acquisitions to stay relevant.

Voice search & personal assistants are now ad channels.

App stores are removing VPNs in China, removing Tiktok in India, and keeping female tracking apps in Saudi Arabia. App stores are centralized chokepoints for governments. Every centralized service is at risk of censorship. Web browsers from key state-connected players can also censor messages spread by developers on platforms like GitHub.

Microsoft's newest Edge web browser is based on Chromium, the source of Google Chrome. While Mozilla Firefox gets most of their revenue from a search deal with Google, Google has still gone out of its way to use its services both to promote Chrome with pop-overs AND to break those services in competing web browsers:

"All of this is stuff you're allowed to do to compete, of course. But we were still a search partner, so we'd say 'hey what gives?' And every time, they'd say, 'oops. That was accidental. We'll fix it in the next push in 2 weeks.' Over and over. Oops. Another accident. We'll fix it soon. We want the same things. We're on the same team. There were dozens of oopses. Hundreds maybe?" - former Firefox VP Jonathan Nightingale

As phone sales fall & app downloads stall a hardware company like Apple is pushing hard into services while quietly raking in utterly fantastic ad revenues from search & ads in their app store.

Part of the reason people are downloading fewer apps is so many apps require registration as soon as they are opened, or only let a user engage with them for seconds before pushing aggressive upsells. And then many apps which were formerly one-off purchases are becoming subscription plays. As traffic acquisition costs have jumped, many apps must engage in sleight of hand behaviors (free but not really, we are collecting data totally unrelated to the purpose of our app & oops we sold your data, etc.) in order to get the numbers to back out. This in turn causes app stores to slow down app reviews.

Apple acquired the news subscription service Texture & turned it into Apple News Plus. Not only is Apple keeping half the subscription revenues, but soon the service will only work for people using Apple devices, leaving nearly 100,000 other subscribers out in the cold: "if you’re part of the 30% who used Texture to get your favorite magazines digitally on Android or Windows devices, you will soon be out of luck. Only Apple iOS devices will be able to access the 300 magazines available from publishers. At the time of the sale in March 2018 to Apple, Texture had about 240,000 subscribers."

Apple is also going to spend over half a billion dollars exclusively licensing independently developed games:

Several people involved in the project’s development say Apple is spending several million dollars each on most of the more than 100 games that have been selected to launch on Arcade, with its total budget likely to exceed $500m. The games service is expected to launch later this year. ... Apple is offering developers an extra incentive if they agree for their game to only be available on Arcade, withholding their release on Google’s Play app store for Android smartphones or other subscription gaming bundles such as Microsoft’s Xbox game pass.

Verizon wants to launch a video game streaming service. It will probably be almost as successful as their Go90 OTT service was. Microsoft is pushing to make Xbox games work on Android devices. Amazon is developing a game streaming service to complement Twitch.

The hosts on Twitch, some of whom sign up exclusively with the platform in order to gain access to its moneymaking tools, are rewarded for their ability to make a connection with viewers as much as they are for their gaming prowess. Viewers who pay $4.99 a month for a basic subscription — the money is split evenly between the streamers and Twitch — are looking for immediacy and intimacy. While some hosts at YouTube Gaming offer a similar experience, they have struggled to build audiences as large, and as dedicated, as those on Twitch. ... While YouTube has made millionaires out of the creators of popular videos through its advertising program, Twitch’s hosts make money primarily from subscribers and one-off donations or tips. YouTube Gaming has made it possible for viewers to support hosts this way, but paying audiences haven’t materialized at the scale they have on Twitch.

Google, having a bit of Twitch envy, is also launching a video game streaming service which will be deeply integrated into YouTube: "With Stadia, YouTube watchers can press “Play now” at the end of a video, and be brought into the game within 5 seconds. The service provides “instant access” via button or link, just like any other piece of content on the web."

Google will also launch their own game studio making exclusive games for their platform.

When consoles don't use discs or cartridges so they can sell a subscription access to their software library it is hard to be a game retailer! GameStop's stock has been performing like an ICO. And these sorts of announcements from the tech companies have been hitting stock prices for companies like Nintendo & Sony: “There is no doubt this service makes life even more difficult for established platforms,” Amir Anvarzadeh, a market strategist at Asymmetric Advisors Pte, said in a note to clients. “Google will help further fragment the gaming market which is already coming under pressure by big games which have adopted the mobile gaming business model of giving the titles away for free in hope of generating in-game content sales.”

The big tech companies which promoted everything in adjacent markets being free are now erecting paywalls for themselves, balkanizing the web by paying for exclusives to drive their bundled subscriptions.

How many paid movie streaming services will the web have by the end of next year? 20? 50? Does anybody know?

Disney alone will operate Disney+, ESPN+ as well as Hulu.

And then the tech companies are not only licensing exclusives to drive their subscription-based services, but we're going to see more exclusionary policies like YouTube not working on Amazon Echo, Netflix dumping support for Apple's Airplay, or Amazon refusing to sell devices like Chromecast or Apple TV.

The good news in a fractured web is a broader publishing industry that contains many micro markets will have many opportunities embedded in it. A Facebook pivot away from games toward news, or a pivot away from news toward video won't kill third party publishers who have a more diverse traffic profile and more direct revenues. And a regional law blocking porn or gambling websites might lead to an increase in demand for VPNs or free to play points-based games with paid upgrades. Even the rise of metered paywalls will lead to people using more web browsers & more VPNs. Each fracture (good or bad) will create more market edges & ultimately more opportunities. Chinese enforcement of their gambling laws created a real estate boom in Manila.

So long as there are 4 or 5 game stores, 4 or 5 movie streaming sites, etc. ... they have to compete on merit or use money to try to buy exclusives. Either way is better than the old monopoly strategy of take it or leave it ultimatums.

The publisher wins because there is a competitive bid. There won't be an arbitrary 30% tax on everything. So long as there is competition from the open web there will be means to bypass the junk fees & the most successful companies that do so might create their own stores with a lower rate: "Mr. Schachter estimates that Apple and Google could see a hit of about 14% to pretax earnings if they reduced their own app commissions to match Epic’s take."

As the big media companies & big tech companies race to create subscription products they'll spend many billions on exclusives. And they will be training consumers that there's nothing wrong with paying for content. This will eventually lead to hundreds of thousands or even millions of successful niche publications which have incentives better aligned than all the issues the ad supported web has faced.

Added: Facebook pushing privacy & groups is both an attempt to thwart regulation risk while also making their services more relevant to a web that fractures away from a monolithic thing into more niche communities.

One way of looking at Facebook in this moment is as an unstoppable behemoth that bends reality to its will, no matter the consequences. (This is how many journalists tend to see it.) Another way of looking at the company is from the perspective of its fundamental weakness — as a slave to ever-shifting consumer behavior. (This is how employees are more likely to look at it.) ... Zuckerberg’s vision for a new Facebook is perhaps best represented by a coming redesign of the flagship app and desktop site that will emphasize events and groups, at the expense of the News Feed. Collectively, the design changes will push people toward smaller group conversations and real-world meetups — and away from public posts.

Keyword Not Provided, But it Just Clicks

When SEO Was Easy

When I got started on the web over 15 years ago I created an overly broad & shallow website that had little chance of making money because it was utterly undifferentiated and crappy. In spite of my best (worst?) efforts while being a complete newbie, sometimes I would go to the mailbox and see a check for a couple hundred or a couple thousand dollars come in. My old roommate & I went to Coachella & when the trip was over I returned to a bunch of mail to catch up on & realized I had made way more while not working than what I spent on that trip.

What was the secret to a total newbie making decent income by accident?

Horrible spelling.

Back then search engines were not as sophisticated with their spelling correction features & I was one of 3 or 4 people in the search index that misspelled the name of an online casino the same way many searchers did.

The high minded excuse for why I did not scale that would be claiming I knew it was a temporary trick that was somehow beneath me. The more accurate reason would be thinking in part it was a lucky fluke rather than thinking in systems. If I were clever at the time I would have created the misspeller's guide to online gambling, though I think I was just so excited to make anything from the web that I perhaps lacked the ambition & foresight to scale things back then.

In the decade that followed I had a number of other lucky breaks like that. One time one of the original internet bubble companies that managed to stay around put up a sitewide footer link targeting the concept that one of my sites made decent money from. This was just before the great recession, before Panda existed. The concept they targeted had 3 or 4 ways to describe it. 2 of them were very profitable & if they targeted either of the most profitable versions with that page the targeting would have sort of carried over to both. They would have outranked me if they targeted the correct version, but they didn't so their mistargeting was a huge win for me.

Search Gets Complex

Search today is much more complex. In the years since those easy-n-cheesy wins, Google has rolled out many updates which aim to feature sought-after destination sites while diminishing the sites which rely on "one simple trick" to rank.

Arguably the quality of the search results has improved significantly as search has become more powerful, more feature rich & has layered in more relevancy signals.

Many quality small web publishers have gone away due to some combination of increased competition, algorithmic shifts & uncertainty, and reduced monetization as more ad spend was redirected toward Google & Facebook. But the impact as felt by any given publisher is not the impact as felt by the ecosystem as a whole. Many terrible websites have also gone away, while some formerly obscure though higher-quality sites rose to prominence.

There was the Vince update in 2009, which boosted the rankings of many branded websites.

Then in 2011 there was Panda as an extension of Vince, which tanked the rankings of many sites that published hundreds of thousands or millions of thin content pages while boosting the rankings of trusted branded destinations.

Then there was Penguin, a penalty that hit many websites with heavily manipulated or otherwise aggressive-looking link profiles. Google felt there was a lot of noise in the link graph, which was their justification for Penguin.

There were updates which lowered the rankings of many exact match domains. And then increased ad load in the search results along with the other above ranking shifts further lowered the ability to rank keyword-driven domain names. If your domain is generically descriptive then there is a limit to how differentiated & memorable you can make it if you are targeting the core market the keywords are aligned with.

There is a reason eBay is more popular than auction.com, Google is more popular than search.com, Yahoo is more popular than portal.com & Amazon is more popular than a store.com or a shop.com. When that winner-take-most impact of many online markets is coupled with the move away from using classic relevancy signals, the economics shift to where it makes a lot more sense to carry the heavy overhead of establishing a strong brand.

Branded and navigational search queries could be used in the relevancy algorithm stack to confirm the quality of a site & verify (or dispute) the veracity of other signals.

Historically relevant algo shortcuts become less appealing as they become less relevant to the current ecosystem & even less aligned with the future trends of the market. Add in negative incentives for pushing on a string (penalties on top of wasting the capital outlay) and a more holistic approach certainly makes sense.

Modeling Web Users & Modeling Language

PageRank was an attempt to model the random surfer.
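As a refresher, the random-surfer model can be sketched with a few lines of power iteration. The toy graph below is invented for illustration; the 0.85 damping factor matches the value used in the original PageRank paper.

```python
# Minimal PageRank power iteration over a toy link graph.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # The (1 - damping) term models the surfer randomly jumping anywhere.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                # Dangling node: spread its rank evenly across all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
# Ranks sum to ~1.0; "c", with two inbound links, accumulates the most rank.
```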

When Google is pervasively monitoring most users across the web they can shift to directly measuring their behaviors instead of using indirect signals.

Years ago Bill Slawski wrote about the long click, opening by quoting Steven Levy's In the Plex: How Google Thinks, Works, and Shapes Our Lives:

"On the most basic level, Google could see how satisfied users were. To paraphrase Tolstoy, happy users were all the same. The best sign of their happiness was the "Long Click" — This occurred when someone went to a search result, ideally the top one, and did not return. That meant Google has successfully fulfilled the query."

Of course, there's a patent for that. In Modifying search result ranking based on implicit user feedback they state:

user reactions to particular search results or search result lists may be gauged, so that results on which users often click will receive a higher ranking. The general assumption under such an approach is that searching users are often the best judges of relevance, so that if they select a particular search result, it is likely to be relevant, or at least more relevant than the presented alternatives.

If you are a known brand you are more likely to get clicked on than a random unknown entity in the same market.

And if you are something people are specifically seeking out, they are likely to stay on your website for an extended period of time.

One aspect of the subject matter described in this specification can be embodied in a computer-implemented method that includes determining a measure of relevance for a document result within a context of a search query for which the document result is returned, the determining being based on a first number in relation to a second number, the first number corresponding to longer views of the document result, and the second number corresponding to at least shorter views of the document result; and outputting the measure of relevance to a ranking engine for ranking of search results, including the document result, for a new search corresponding to the search query. The first number can include a number of the longer views of the document result, the second number can include a total number of views of the document result, and the determining can include dividing the number of longer views by the total number of views.
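The patent's ratio of longer views to total views can be sketched directly. The 30-second dwell threshold below is an assumption for illustration; the patent does not fix a specific cutoff.

```python
# Sketch of the patent's relevance measure: the share of "long" dwells
# among all views of a document for a given query. The 30-second cutoff
# separating long from short views is an illustrative assumption.

LONG_CLICK_SECONDS = 30

def relevance_measure(dwell_times):
    """dwell_times: seconds each user spent on the result before returning."""
    if not dwell_times:
        return 0.0
    long_views = sum(1 for t in dwell_times if t >= LONG_CLICK_SECONDS)
    return long_views / len(dwell_times)

# A branded destination users sought out vs. a page they bounce off:
brand = relevance_measure([120, 95, 300, 45, 10])   # 4 of 5 long views
random_site = relevance_measure([3, 5, 8, 40])      # 1 of 4 long views
```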

Attempts to manipulate such data may not work:

safeguards against spammers (users who generate fraudulent clicks in an attempt to boost certain search results) can be taken to help ensure that the user selection data is meaningful, even when very little data is available for a given (rare) query. These safeguards can include employing a user model that describes how a user should behave over time, and if a user doesn't conform to this model, their click data can be disregarded. The safeguards can be designed to accomplish two main objectives: (1) ensure democracy in the votes (e.g., one single vote per cookie and/or IP for a given query-URL pair), and (2) entirely remove the information coming from cookies or IP addresses that do not look natural in their browsing behavior (e.g., abnormal distribution of click positions, click durations, clicks_per_minute/hour/day, etc.). Suspicious clicks can be removed, and the click signals for queries that appear to be spammed need not be used (e.g., queries for which the clicks feature a distribution of user agents, cookie ages, etc. that do not look normal).
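The "one single vote per cookie and/or IP for a given query-URL pair" safeguard amounts to deduplication before counting. A toy sketch of that idea (not Google's actual pipeline):

```python
from collections import defaultdict

def democratic_votes(click_log):
    """Count at most one vote per (cookie, query, url) triple, so a
    spammer repeatedly clicking the same result adds nothing.
    click_log is an iterable of (cookie_id, query, url) tuples."""
    seen = set()
    votes = defaultdict(int)
    for cookie, query, url in click_log:
        key = (cookie, query, url)
        if key not in seen:
            seen.add(key)
            votes[(query, url)] += 1
    return dict(votes)

log = [
    ("cookie1", "tennis court hire", "example-a.com"),
    ("cookie1", "tennis court hire", "example-a.com"),  # repeat click: ignored
    ("cookie2", "tennis court hire", "example-a.com"),
]
votes = democratic_votes(log)  # {("tennis court hire", "example-a.com"): 2}
```

The patent's second objective, discarding cookies and IPs whose overall behavior looks unnatural, would sit in front of this counting step as a filter on the log itself.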

And just like Google can make a matrix of documents & queries, they could also choose to put more weight on search accounts associated with topical expert users based on their historical click patterns.

Moreover, the weighting can be adjusted based on the determined type of the user both in terms of how click duration is translated into good clicks versus not-so-good clicks, and in terms of how much weight to give to the good clicks from a particular user group versus another user group. Some user's implicit feedback may be more valuable than other users due to the details of a user's review process. For example, a user that almost always clicks on the highest ranked result can have his good clicks assigned lower weights than a user who more often clicks results lower in the ranking first (since the second user is likely more discriminating in his assessment of what constitutes a good result). In addition, a user can be classified based on his or her query stream. Users that issue many queries on (or related to) a given topic T (e.g., queries related to law) can be presumed to have a high degree of expertise with respect to the given topic T, and their click data can be weighted accordingly for other queries by them on (or related to) the given topic T.
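One way to read that passage is as a per-user multiplier applied to each good click. A hypothetical weighting; the factors and their scales are illustrative, not from the patent:

```python
def click_weight(is_good_click: bool, topic_expertise: float,
                 discrimination: float) -> float:
    """Weight a click by who made it. Both factors are assumed to be
    precomputed scores in [0, 1]: topic_expertise derived from the
    user's query stream, discrimination from how often they skip the
    top-ranked result in favor of a lower one."""
    if not is_good_click:
        return 0.0
    return (1.0 + topic_expertise) * (1.0 + discrimination)

# A good click from a discriminating topical expert counts ~4x a baseline click:
expert = click_weight(True, topic_expertise=1.0, discrimination=1.0)  # 4.0
novice = click_weight(True, topic_expertise=0.0, discrimination=0.0)  # 1.0
```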

Google was using click data to drive their search rankings as far back as 2009. David Naylor was perhaps the first person to publicly spot this. Google was ranking Australian websites for [tennis court hire] in the UK & Ireland, in part because most of the click signal came from Australia, where that phrase was most widely searched for. In the years since, Google has done a better job of geographically isolating clicks to prevent problems like the one David Naylor noticed, where almost all search results in one geographic region came from a different country.

Whenever SEOs mention using click data to search engineers, the search engineers are quick to respond that while they might consider any signal, clicks would be a noisy one. But if a signal has noise, an engineer would work around the noise by finding ways to filter it out or by combining multiple signals. To this day Google states they are still working to filter noise from the link graph: "We continued to protect the value of authoritative and relevant links as an important ranking signal for Search."

The site with millions of inbound links, few intentional visits & those who do visit quickly click the back button (due to a heavy ad load, poor user experience, low quality content, shallow content, outdated content, or some other bait-n-switch approach)...that's an outlier. Preventing those sorts of sites from ranking well would be another way of protecting the value of authoritative & relevant links.

Best Practices Vary Across Time & By Market + Category

Along the way, concurrent with the above sorts of updates, Google also improved their spelling auto-correct features, auto-completed search queries for many years through a feature called Google Instant (though they later undid forced query auto-completion while retaining automated search suggestions), and then they rolled out a few other algorithms that further allowed them to model language & user behavior.

Today it would be much harder to get paid above median wages explicitly for sucking at basic spelling or scaling some other individual shortcut to the moon, like pouring millions of low quality articles into a (formerly!) trusted domain.

Nearly a decade after Panda, eHow's rankings still haven't recovered.

Back when I got started with SEO, the phrase "Indian SEO company" was associated with cut-rate work where people were buying exclusively based on price. Sort of a "I have a $500 budget for link building, but cannot under any circumstance invest more than $5 in any individual link" approach. Part of how my wife met me was she hired a hack SEO from San Diego who outsourced all the work to India and marked the price up about 100-fold while claiming it was all done in the United States. He created reciprocal links pages that got her site penalized & it didn't rank until after she took her reciprocal links page down.

With that sort of behavior widespread (a hack US firm teaching people working in an emerging market poor practices), many SEO "best practices" learned in an emerging market (particularly one where the web was also underdeveloped) would be more inclined to be spammy. Considering how far ahead many Western markets were on the early Internet, how India has so many languages, & how most web usage in India happens on mobile devices where it is hard for users to create links, it only makes sense that Google would want to place more weight on end user data in such a market.

If you set your computer location to India, Bing's search box lists 9 different languages to choose from.

The above is not to state anything derogatory about any emerging market, but rather that various signals are stronger in some markets than others. And competition is stronger in some markets than others.

Search engines can only rank what exists.

"In a lot of Eastern European - but not just Eastern European markets - I think it is an issue for the majority of the [bream? muffled] countries, for the Arabic-speaking world, there just isn't enough content as compared to the percentage of the Internet population that those regions represent. I don't have up to date data, I know that a couple years ago we looked at Arabic for example and then the disparity was enormous. so if I'm not mistaken the Arabic speaking population of the world is maybe 5 to 6%, maybe more, correct me if I am wrong. But very definitely the amount of Arabic content in our index is several orders below that. So that means we do not have enough Arabic content to give to our Arabic users even if we wanted to. And you can exploit that amazingly easily and if you create a bit of content in Arabic, whatever it looks like we're gonna go you know we don't have anything else to serve this and it ends up being horrible. and people will say you know this works. I keyword stuffed the hell out of this page, bought some links, and there it is number one. There is nothing else to show, so yeah you're number one. the moment somebody actually goes out and creates high quality content that's there for the long haul, you'll be out and that there will be one." - Andrey Lipattsev – Search Quality Senior Strategist at Google Ireland, on Mar 23, 2016

Impacting the Economics of Publishing

Now search engines can certainly influence the economics of various types of media. At one point some otherwise credible media outlets were pitching the Demand Media IPO narrative that Demand Media was the publisher of the future & what other media outlets would come to look like. Years later, after heavily squeezing the partner network & promoting programmatic advertising that reduces CPMs by the day, Google is funding partnerships with multiple news publishers like McClatchy & Gatehouse to try to revive the news dead zones even Facebook is struggling with.

"Facebook Inc. has been looking to boost its local-news offerings since a 2017 survey showed most of its users were clamoring for more. It has run into a problem: There simply isn’t enough local news in vast swaths of the country. ... more than one in five newspapers have closed in the past decade and a half, leaving half the counties in the nation with just one newspaper, and 200 counties with no newspaper at all."

As mainstream newspapers continue laying off journalists, Facebook's news efforts are likely to continue failing unless they include direct economic incentives, as Google's programmatic ad push broke the banner ad:

"Thanks to the convoluted machinery of Internet advertising, the advertising world went from being about content publishers and advertising context—The Times unilaterally declaring, via its ‘rate card’, that ads in the Times Style section cost $30 per thousand impressions—to the users themselves and the data that targets them—Zappo’s saying it wants to show this specific shoe ad to this specific user (or type of user), regardless of publisher context. Flipping the script from a historically publisher-controlled mediascape to an advertiser (and advertiser intermediary) controlled one was really Google’s doing. Facebook merely rode the now-cresting wave, borrowing outside media’s content via its own users’ sharing, while undermining media’s ability to monetize via Facebook’s own user-data-centric advertising machinery. Conventional media lost both distribution and monetization at once, a mortal blow."

Google is offering news publishers audience development & business development tools.

Heavy Investment in Emerging Markets Quickly Evolves the Markets

As the web grows rapidly in India, they'll have a thousand flowers bloom. In 5 years the competition in India & other emerging markets will be much tougher as those markets continue to grow rapidly. Media is much cheaper to produce in India than it is in the United States. Labor costs are lower & they never had the economic albatross that is the ACA adversely impact their economy. At some point the level of investment & increased competition will mean early techniques stop having as much efficacy. Chinese companies are aggressively investing in India.

“If you break India into a pyramid, the top 100 million (urban) consumers who think and behave more like Americans are well-served,” says Amit Jangir, who leads India investments at 01VC, a Chinese venture capital firm based in Shanghai. The early stage venture firm has invested in micro-lending firms FlashCash and SmartCoin based in India. The new target is the next 200 million to 600 million consumers, who do not have a go-to entertainment, payment or ecommerce platform yet— and there is gonna be a unicorn in each of these verticals, says Jangir, adding that it will be not be as easy for a player to win this market considering the diversity and low ticket sizes.

RankBrain

RankBrain appears to be based on using user clickpaths on head keywords to help bleed rankings across into related searches which are searched less frequently. A Googler didn't state this specifically, but it is how they would be able to use models of searcher behavior to refine search results for keywords which are rarely searched for.

In a recent interview in Scientific American a Google engineer stated: "By design, search engines have learned to associate short queries with the targets of those searches by tracking pages that are visited as a result of the query, making the results returned both faster and more accurate than they otherwise would have been."
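Under that reading, RankBrain-style generalization can be sketched as mapping a rarely-seen query onto its most similar head query, so the head query's accumulated click data can inform the tail query's rankings. A toy version using cosine similarity over made-up embedding vectors (real systems would use learned embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest_head_query(rare_query_vec, head_queries):
    """Return the head query whose embedding is closest to the rare
    query's embedding. head_queries maps query text to a vector."""
    return max(head_queries, key=lambda q: cosine(rare_query_vec, head_queries[q]))

head = {"cheap flights": [0.9, 0.1], "tennis shoes": [0.1, 0.9]}
best = nearest_head_query([0.8, 0.2], head)  # "cheap flights"
```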

Now a person might go out and try to search for something a bunch of times or pay other people to search for a topic and click a specific listing, but some of the related Google patents on using click data (which keep getting updated) mentioned how they can discount or turn off the signal if there is an unnatural spike of traffic on a specific keyword, or if there is an unnatural spike of traffic heading to a particular website or web page.

And, since Google is tracking the behavior of end users on their own website, anomalous behavior is easier to spot there than it is across the broader web, where signals are more indirect. Google can take advantage of the wide distribution of Chrome & Android, where users are regularly logged into Google & pervasively tracked, to place more weight on users who have credit card data on file, a long account history with regular normal search behavior, heavy Gmail usage, etc.

Plus there is a huge gap between the cost of traffic & the ability to monetize it. You might have to pay someone a dime or a quarter to search for something, & there is no guarantee it will work on a sustainable basis even if you paid hundreds or thousands of people to do it. Those experimental searches will have no lasting value unless they influence rank, and even if they do influence rankings it might only last temporarily. If you bought a bunch of traffic into something genuine Google searchers didn't like, then even if it started to rank better temporarily the rankings would quickly fall back if real end user searchers disliked the site relative to other sites which already rank.

This is part of the reason why so many SEO blogs mention brand, brand, brand. If people are specifically looking for you in volume & Google can see that thousands or millions of people specifically want to access your site then that can impact how you rank elsewhere.

Even how long someone looks at a listing within the search results (dwell time), or how quickly they scroll past it to view results deeper on the page, can be a ranking signal. Some Google patents mention how they can use mouse pointer location on desktop or scroll data from the viewport on mobile devices as a quality signal.
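A toy classifier for the behaviors described above; the thresholds are invented for illustration and are certainly not Google's:

```python
def engagement_signal(dwell_seconds: float, scroll_depth: float) -> str:
    """Classify a visit from dwell time and scroll depth (0.0-1.0).
    A fast bounce with no scrolling looks like a dissatisfied click;
    a long stay or a deep scroll looks like genuine engagement."""
    if dwell_seconds < 5 and scroll_depth < 0.1:
        return "bounce"
    if dwell_seconds > 30 or scroll_depth > 0.6:
        return "engaged"
    return "neutral"

print(engagement_signal(2, 0.05))   # bounce
print(engagement_signal(90, 0.8))   # engaged
```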

Neural Matching

Last year Danny Sullivan mentioned how Google rolled out neural matching to better understand the intent behind a search query.

The above Tweets capture what the neural matching technology intends to do. Google also stated:

we’ve now reached the point where neural networks can help us take a major leap forward from understanding words to understanding concepts. Neural embeddings, an approach developed in the field of neural networks, allow us to transform words to fuzzier representations of the underlying concepts, and then match the concepts in the query with the concepts in the document. We call this technique neural matching.

To help people understand the difference between neural matching & RankBrain, Google told SEL: "RankBrain helps Google better relate pages to concepts. Neural matching helps Google better relate words to searches."

There are a couple research papers on neural matching.

The first one was titled A Deep Relevance Matching Model for Ad-hoc Retrieval. It mentioned using Word2vec & here are a few quotes from the research paper:

  • "Successful relevance matching requires proper handling of the exact matching signals, query term importance, and diverse matching requirements."
  • "the interaction-focused model, which first builds local level interactions (i.e., local matching signals) between two pieces of text, and then uses deep neural networks to learn hierarchical interaction patterns for matching."
  • "according to the diverse matching requirement, relevance matching is not position related since it could happen in any position in a long document."
  • "Most NLP tasks concern semantic matching, i.e., identifying the semantic meaning and infer"ring the semantic relations between two pieces of text, while the ad-hoc retrieval task is mainly about relevance matching, i.e., identifying whether a document is relevant to a given query."
  • "Since the ad-hoc retrieval task is fundamentally a ranking problem, we employ a pairwise ranking loss such as hinge loss to train our deep relevance matching model."

The paper mentions how semantic matching falls down when compared against relevancy matching because:

  • semantic matching relies on similarity matching signals (some words or phrases with the same meaning might be semantically distant), compositional meanings (matching sentences more than meaning) & a global matching requirement (comparing things in their entirety instead of looking at the best matching part of a longer document); whereas,
  • relevance matching can put significant weight on exact matching signals (weighting an exact match higher than a near match), adjust weighting based on query term importance (one word or phrase in a search query might have a far higher discrimination value & deserve far more weight than the next), & leverage diverse matching requirements (allowing relevancy matching to happen in any part of a longer document).
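The contrast in the second bullet can be made concrete. A minimal relevance-matching scorer that rewards exact term hits anywhere in the document, weighted by per-term importance (the weights here are hypothetical stand-ins for something like inverse document frequency):

```python
def exact_match_score(query_terms, doc_terms, term_weights):
    """Sum the importance weights of query terms that appear verbatim
    anywhere in the document -- position does not matter, and an exact
    hit on a discriminating term dominates the score."""
    doc_set = set(doc_terms)
    return sum(term_weights.get(t, 1.0) for t in query_terms if t in doc_set)

weights = {"tennis": 1.5, "hire": 0.3}  # "tennis" is the more discriminating term
doc = ["tennis", "court", "hire", "rates", "per", "hour"]
score = exact_match_score(["tennis", "hire"], doc, weights)  # 1.8
```

A semantic-matching model would instead compare dense representations of the whole query and whole document, which is exactly where the paper says it falls down for ad-hoc retrieval.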

Here are a couple images from the above research paper

The second research paper is Deep Relevancy Ranking Using Enhanced Document-Query Interactions:

"interaction-based models are less efficient, since one cannot index a document representation independently of the query. This is less important, though, when relevancy ranking methods rerank the top documents returned by a conventional IR engine, which is the scenario we consider here."

That same sort of re-ranking concept is being better understood across the industry. There are ranking signals that earn some base level ranking, and then results get re-ranked based on other factors like how well a result matches the user intent.

Here are a couple images from the above research paper.

For those who hate the idea of reading research papers or patent applications, Martinibuster also wrote about the technology here. About the only part of his post I would debate is this one:

"Does this mean publishers should use more synonyms? Adding synonyms has always seemed to me to be a variation of keyword spamming. I have always considered it a naive suggestion. The purpose of Google understanding synonyms is simply to understand the context and meaning of a page. Communicating clearly and consistently is, in my opinion, more important than spamming a page with keywords and synonyms."

I think one should always consider user experience over other factors, however a person could still use variations throughout the copy & pick up a bit more traffic without coming across as spammy. Danny Sullivan mentioned the super synonym concept was impacting 30% of search queries, so there are still many queries which may only be accessible to those who use a specific phrase on their page.

Martinibuster also wrote another blog post tying more research papers & patents to the above. You could probably spend a month reading all the related patents & research papers.

The above sort of language modeling & end user click feedback complement links-based ranking signals in a way that makes it much harder to luck one's way into any form of success by being a terrible speller or just bombing away at link manipulation without much concern toward any other aspect of the user experience or the market you operate in.

Pre-penalized Shortcuts

Google was even issued a patent for predicting site quality based upon the N-grams used on the site & comparing those against the N-grams used on other established sites where quality has already been scored via other methods: "The phrase model can be used to predict a site quality score for a new site; in particular, this can be done in the absence of other information. The goal is to predict a score that is comparable to the baseline site quality scores of the previously-scored sites."
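A rough reading of that idea: score a new site by how much its phrase inventory overlaps the phrase inventories of already-scored sites, then average those baseline scores weighted by overlap. This is my own simplification for illustration, not the patent's actual model:

```python
def ngrams(text, n=3):
    """Set of word n-grams ('phrases') appearing in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def predict_site_quality(new_site_text, scored_sites):
    """Overlap-weighted average of baseline quality scores.
    scored_sites is a list of (site_text, baseline_score) pairs."""
    new_grams = ngrams(new_site_text)
    num = den = 0.0
    for text, baseline in scored_sites:
        grams = ngrams(text)
        union = new_grams | grams
        overlap = len(new_grams & grams) / len(union) if union else 0.0
        num += overlap * baseline
        den += overlap
    return num / den if den else 0.0

scored = [
    ("buy cheap pills online now fast", 0.1),           # known low quality
    ("in depth guide to tennis training drills", 0.9),  # known high quality
]
q = predict_site_quality("buy cheap pills online fast shipping", scored)  # ~0.1
```

A site assembled from the same phrase templates as already-penalized sites would inherit a low predicted score before it ever earned a link, which is what makes the shortcut "pre-penalized."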

Have you considered using a PLR package to generate the shell of your site's content? Good luck with that as some sites trying that shortcut might be pre-penalized from birth.

Navigating the Maze

When I started in SEO, one of my friends had a dad who was vastly smarter than I am. He advised me that Google engineers were smarter, had more capital, had more exposure, had more data, etc etc etc ... and thus SEO was ultimately going to be a malinvestment.

Back then he was at least partially wrong because influencing search was so easy.

But in the current market, 16 years later, we are near the inflection point where he would finally be right.

At some point the shortcuts stop working & it makes sense to try a different approach.

The flip side of all the above changes is that as the algorithms have become more complex, they have gone from being a headwind for people ignorant about SEO to being a tailwind for those who do not focus excessively on SEO in isolation.

If one is a dominant voice in a particular market, if they break industry news, if they have key exclusives, if they spot & name the industry trends, if their site becomes a must read & is what amounts to a habit ... then they perhaps become viewed as an entity. Entity-related signals help them, & the very signals that work against people who merely lucked into a bit of success become a tailwind rather than a headwind.

If your work defines your industry, then any efforts to model entities, user behavior or the language of your industry are going to boost your work on a relative basis.

This requires sites to publish frequently enough to be a habit, or publish highly differentiated content which is strong enough that it is worth the wait.

Those which publish frequently without being particularly differentiated are almost guaranteed to eventually walk into a penalty of some sort. And each additional person who reads marginal, undifferentiated content (particularly if it has an ad-heavy layout) brings that site one visitor closer to eventually getting whacked. Success becomes self regulating. Any short-term success becomes self defeating if one has a highly opportunistic short-term focus.

Those who write content that only they could write are more likely to have sustained success.

How The Internet Happened: From Netscape to the iPhone

Brian McCullough, who runs Internet History Podcast, also wrote a book named How The Internet Happened: From Netscape to the iPhone which did a fantastic job of capturing the ethos of the early web and telling the backstory of so many people & projects behind its evolution.

I think the quote which best captures the magic of the early web is:

Jim Clark came from the world of machines and hardware, where development schedules were measured in years—even decades—and where “doing a startup” meant factories, manufacturing, inventory, shipping schedules and the like. But the Mosaic team had stumbled upon something simpler. They had discovered that you could dream up a product, code it, release it to the ether and change the world overnight. Thanks to the Internet, users could download your product, give you feedback on it, and you could release an update, all in the same day. In the web world, development schedules could be measured in weeks.

That passage from the book really captures the magic of the Internet & what pulled so many people toward the early web.

The current web - dominated by never-ending feeds & a variety of closed silos - is a big shift from the early days of web comics & other underground cool stuff people created & shared because they thought it was neat.

Many established players missed the actual direction of the web by trying to create something more akin to the web of today before the infrastructure could support it. Many of the "big things" driving web adoption relied heavily on chance luck - combined with a lot of hard work & a willingness to be responsive to feedback & data.

  • Even when Marc Andreessen moved to the valley he thought he was late and he had "missed the whole thing," but he saw the relentless growth of the web & decided making another web browser was the play that made sense at the time.
  • Tim Berners-Lee was dismayed when Andreessen's web browser enabled embedded image support in web documents.
  • Early Amazon review features were originally for editorial content from Amazon itself. Bezos originally wanted to launch a broad-based Amazon like it is today, but realized it would be too capital intensive & focused on books at the start so he could sell a known commodity with a long tail. Amazon was initially built off leveraging 2 book distributors (Ingram and Baker & Taylor) & R. R. Bowker's Books In Print catalog. They also used clever hacks to meet distributor minimum order requirements, like padding orders with out-of-stock books, so they only had to buy the books customers had actually purchased.
  • eBay began as an /aw/ subfolder on the eBay domain name, which was hosted on a residential internet connection. Pierre Omidyar coded the auction service over Labor Day weekend in 1995. The domain had other sections focused on topics like ebola. It was switched from AuctionWeb to a standalone site only after the ISP started charging for a business line. It had no formal PayPal integration or anything like that; rather, when listings started to carry a commission, merchants would mail in physical checks to pay for the platform's share of their sales. Beanie Babies also helped skyrocket platform usage.
  • The reason AOL carpet bombed the United States with CDs - at their peak half of all CDs produced were AOL CDs - was their initial response rate was around 10%, a crazy number for untargeted direct mail.
  • Priceline was lucky to have survived the bubble as their idea was to spread broadly across other categories beyond travel & they were losing about $30 per airline ticket sold.
  • The broader web bubble left behind valuable infrastructure like unused fiber to fuel continued growth long after the bubble popped. The dot com bubble was possible in part because there was a secular bull market in bonds stemming back to the early 1980s & falling debt service payments increased financial leverage and company valuations.
  • TED members hissed at Bill Gross when he unveiled GoTo.com, which ranked "search" results based on advertiser bids.
  • Excite turned down buying the PageRank technology from the Google founders for $1.6 million, in part because Larry Page insisted to Excite CEO George Bell, "If we come to work for Excite, you need to rip out all the Excite technology and replace it with [our] search." Bell recalled, "ultimately, that's—in my recollection—where the deal fell apart."
  • Steve Jobs initially disliked the multi-touch technology that mobile would rely on, one of the early iPhone prototypes had the iPod clickwheel, and Apple was against offering an app store in any form. Steve Jobs so loathed his interactions with the record labels that he did not want to build a phone & first licensed iTunes to Motorola, where they made the horrible ROKR phone. He only ended up building a phone after Cingular / AT&T begged him to.
  • Wikipedia was originally launched as a back-up feeder site meant to feed content into Nupedia.
  • Even after Facebook had strong traction, Mark Zuckerberg kept working on other projects like a file sharing service. Facebook's news feed was publicly hated based on user complaints, but it almost instantly led to a doubling of usage of the site, so they never dumped it. After spreading from college to college, Facebook struggled to expand to other audiences & opening registration up to all was a Hail Mary move to see if it would rekindle growth instead of selling to Yahoo! for a billion dollars.

The book offers a lot of color to many important web related companies.

And many companies which were only briefly mentioned also ran into the same sort of lucky breaks the above companies did. Paypal was heavily reliant on eBay for initial distribution, but even that was something they initially tried to block until it became so obvious they stopped fighting it:

“At some point I sort of quit trying to stop the EBay users and mostly focused on figuring out how to not lose money,” Levchin recalls. ... In the late 2000s, almost a decade after it first went public, PayPal was drifting toward obsolescence and consistently alienating the small businesses that paid it to handle their online checkout. Much of the company’s code was being written offshore to cut costs, and the best programmers and designers had fled the company. ... PayPal’s conversion rate is lights-out: Eighty-nine percent of the time a customer gets to its checkout page, he makes the purchase. For other online credit and debit card transactions, that number sits at about 50 percent.

Here is a podcast interview of Brian McCullough by Chris Dixon.

How The Internet Happened: From Netscape to the iPhone is a great book well worth a read for anyone interested in the web.
