Google’s search algorithms get a lot of play, but not enough people pay attention to the fact that Google’s contextual ad network still enjoys a technical superiority over its peers that its search relevancy algorithm lost long ago. Simply put, some search competitors are doing a decent job with search relevancy but still seem to be nowhere when it comes to serving relevant ads.
I’d like to share some of my thoughts on the AdSense algorithm, which I will revisit in detail in the future. Given the secrecy of the sauce, I will not try to prove what is and is not the Google AdSense algorithm. Instead I will take the approach that any SEO worth his salt should: speculate as to what the algorithm should be, and what current technology is capable of.
At its simplest level I believe the algorithm should, and likely does, work like this:
- Attempt to determine page context and serve a contextually relevant ad.
- Use clickstream data to determine what the user might be interested in and serve an ad that may not be contextually relevant.
- Use basic demographic data (e.g. geolocation) to attempt to target ad relevance to the user.
The premise is simple: the context of the page is a strong indication of what the user will click on, and it is the first priority of the algorithm. You may know that the user was interested in other, potentially more profitable, subjects, but the fact that the user is on this page now is a fairly good indication of what the user is interested in at that particular moment.
But then again, that isn’t always the case, and clickstream data can help identify what the user is really interested in. For example, the user’s previous searches can indicate what is really meant by the query “apple”, but even more immediately relevant is that Google often knows where you were right before you got to the page. And with increasing frequency, that place was Google itself.
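To make the page-context step concrete, here is a minimal sketch of what "determine context and pick a matching ad" could look like. This is my own illustration, not Google's method: the `ADS` inventory, the stopword list, and the simple keyword-count scoring are all assumptions chosen to keep the example short.

```python
import re
from collections import Counter

# Hypothetical ad inventory: ad id -> keywords the advertiser targets.
ADS = {
    "widget-shop": {"green", "widget", "widgets", "buy"},
    "hotel-deals": {"hotel", "rooms", "booking"},
}

# A tiny, assumed stopword list; a real system would use something far richer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}

def page_terms(text):
    """Count the non-stopword terms on the page."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def best_contextual_ad(text, ads=ADS):
    """Return the ad whose keywords occur most often on the page, or None."""
    terms = page_terms(text)
    scores = {ad: sum(terms[k] for k in kws) for ad, kws in ads.items()}
    ad, score = max(scores.items(), key=lambda item: item[1])
    return ad if score > 0 else None
```

Calling `best_contextual_ad("Green widgets are the best widgets to buy")` picks the hypothetical `widget-shop` ad, while a page with no matching terms returns `None`, which is the cue to fall back on other signals.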
This is the single biggest reason that clickstream data must be a part of the Google algorithm. It’s much easier to determine context from a user-input query. That’s why other search engines are starting to compete with Google in relevance on most queries. If Google knows what the user searched for before clicking on this page they have a variable that rivals page context in relevance to the user. If they know the user searched for “buy a green widget in San Diego” and landed on a general page about green widgets they would be foolish not to use the additional context they know about the user (the location specific subset that they are looking for) in their attempt to serve an ad the user is most likely to click.
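As a rough sketch of how that search string could be recovered, the snippet below pulls the query out of a referrer URL. The function name and the `QUERY_PARAMS` table are my own; the parameter names (`q` for Google, `p` for Yahoo) are assumptions about how each engine encodes the query in its results URL, and nothing here is a claim about what AdSense actually does.

```python
from urllib.parse import urlparse, parse_qs

# Assumption: the query-string parameter each engine uses for the search terms.
QUERY_PARAMS = {"google.com": "q", "yahoo.com": "p"}

def search_query_from_referrer(referrer):
    """Return the search string carried in a referrer URL, or None."""
    parsed = urlparse(referrer)
    host = (parsed.hostname or "").lower()
    for domain, param in QUERY_PARAMS.items():
        # Match the bare domain or any subdomain such as www. or search.
        if host == domain or host.endswith("." + domain):
            values = parse_qs(parsed.query).get(param)
            return values[0] if values else None
    return None
```

For a referrer like `https://www.google.com/search?q=buy+a+green+widget+in+San+Diego`, this hands back the full phrase, location and all, which is exactly the extra context a general green widget page lacks.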
The “session context”, as I, as of a moment ago, like to call it in the clickstream, would be weighted heavily alongside page context in my algo, and historic clickstream data would follow at a distance. If you know they are always looking for certain widgets and you don’t have a great ad for the page or session context, then an ad about what the user has expressed past interest in is the next best thing. Google has a lot of clickstream data from its own web properties, from sites running AdSense itself, and from the free log analytics service it provides to webmasters in exchange for the data. For example, they could know what you searched for on Yahoo when you land on a page with their own ads or log tracking, and it’s precisely such examples that they can use to their benefit. Search history presents the easiest contextualization opportunities because the user has given the context in a string. Other clickstream data requires a lot more guesswork, and for these reasons I think that Google should, and does, focus mainly on search-related clickstream data. Given my read on their corporate culture, I’m not sure if they are doing this outside of their own web properties, as in my Yahoo example, but they should, and I can’t imagine that they don’t for their own search engine.
Lastly, you can throw in anything else you know about the user. You have the IP and can map it to geodata; a simple example is showing the user an ad for a local restaurant. You can even get fancy and use aggregate trends (e.g. people searching in a certain area might be in town for a certain reason, come up with your own specifics) and other logical deductions (i.e. “wild guesses”, like assuming that someone searching in English from Mexico might be interested in a hotel). I think focusing your efforts is a big part of the end result of this kind of work, and I believe that if Google uses any of this fallback data they do it simply. Why spend time on a complicated algorithm to generate poor guesses when you can spend more time nailing the real priorities like page context?
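Here is a minimal sketch of that weighting as I picture it. The weights, the signal names (`page`, `session`, `history`), and the keyword-overlap scoring are all assumptions of mine for illustration; the only idea taken from the reasoning above is that page and session context count heavily while historic clickstream data trails at a distance.

```python
# Assumed weights: page and session context dominate, history trails.
WEIGHTS = {"page": 0.45, "session": 0.45, "history": 0.10}

def overlap(ad_keywords, signal_terms):
    """Number of the ad's keywords that appear in a signal's terms."""
    return len(ad_keywords & signal_terms)

def score_ad(ad_keywords, signals, weights=WEIGHTS):
    """Blend per-signal keyword overlap into one weighted score."""
    return sum(
        weight * overlap(ad_keywords, signals.get(name, set()))
        for name, weight in weights.items()
    )

def pick_ad(ads, signals):
    """Return the highest-scoring ad id, or None if nothing matches."""
    scored = {ad: score_ad(kws, signals) for ad, kws in ads.items()}
    best = max(scored, key=scored.get, default=None)
    return best if best is not None and scored[best] > 0 else None

# Hypothetical inventory and signals for the green widget example.
ads = {
    "generic-widgets": {"green", "widget"},
    "san-diego-widgets": {"green", "widget", "san", "diego"},
    "golf-clubs": {"golf", "clubs"},
}
signals = {
    "page": {"green", "widget", "guide"},
    "session": {"buy", "green", "widget", "san", "diego"},
    "history": {"golf", "clubs"},
}
```

With these made-up inputs, `pick_ad(ads, signals)` favors the San Diego ad because the session query adds matches the page alone cannot supply. Drop the page and session signals and the golf ad from the user's history becomes the next best thing.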
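In keeping with "do it simply", a fallback of this kind could be little more than a table lookup. The sketch below is purely illustrative: `GEO_TABLE` and `LOCAL_ADS` are hypothetical stand-ins (the IP ranges are reserved documentation addresses, not real locations), and a real system would rely on a proper IP-to-location database.

```python
import ipaddress

# Hypothetical lookup table built on reserved documentation ranges.
GEO_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "San Diego",
    ipaddress.ip_network("198.51.100.0/24"): "Cancun",
}

# Hypothetical local ads keyed by city.
LOCAL_ADS = {"San Diego": "local-restaurant", "Cancun": "beach-hotel"}

def city_for_ip(ip):
    """Map an IP address to a city using the toy table, or None."""
    address = ipaddress.ip_address(ip)
    for network, city in GEO_TABLE.items():
        if address in network:
            return city
    return None

def fallback_ad(ip, default="run-of-network"):
    """Pick a local ad when the city is known, else a generic default."""
    return LOCAL_ADS.get(city_for_ip(ip), default)
```

An address inside a known range gets the local ad, and anything else gets the generic default, with no clever guessing involved.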
In another post, I’ll break down the on-page context algorithm possibilities but I’m out of time for today.