Google Business Model

Required reading

Sponsored Search: An Overview of the Concept, History, and Technology, Bernard J. Jansen and Tracy Mullen, International Journal of Electronic Business, 6:2, 114-131.
Internet Advertising and the Generalized Second-Price Auction: Selling Billions of Dollars Worth of Keywords, Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz, American Economic Review, 97:1, March 2007, 242-259. The introductory section and section I (up to the beginning of p. 248) are required; the rest is optional.

More reading

An Empirical Analysis of Search Engine Advertising: Sponsored Search in Electronic Markets, Anindya Ghose and Sha Yang, Management Science, 55:10, 1605.
The Lane's Gifts vs. Google Report, Alexander Tuzhilin.
Online Advertising Fraud, N. Daswani et al., in M. Jakobsson and Z. Ramzan (eds.), Crimeware, Symantec Press, 2008. Linked from Publications --- Neil Daswani.
The Goals and Challenges of Click Fraud Penetration Testing Systems, Kintana et al., International Symposium on Software Reliability Engineering, 2009.

Sponsored links

These are the paid results placed on the search results page. The opposite is "algorithmic" or "organic" results.

Mode of payment

The main modes are CPI (cost per impression), CPC (cost per click), and CPA (cost per action, e.g. a purchase). They differ in risk and (though not mentioned in the articles I've seen) in trust: with CPI, the risk is taken by the sponsor, and the sponsor must trust the search engine. With CPA, the risk is taken by the search engine, and the search engine must trust the sponsor. With CPC, both sides can (to some degree) verify, and the risk is divided. Thus, with CPC, the search engine has a strong direct incentive to place the ads correctly, whereas with CPI it has only a weaker incentive.

Also, cost per action only works for items that people buy over the internet, not for large-ticket items (cars) or small-ticket items (soap, soft drinks).

Bids placed on

Bids are placed on query words, a time period, and a geographical location. Query specification can be somewhat complex; e.g., if you are selling chocolate truffles, you can bid on queries containing "truffle" while excluding queries that contain "truffle" but not "chocolate". The sponsor also specifies a total budget. An auction is carried out on each individual query. (Cached for common queries? Not sure that it would be practical.)

It is not clear to me what the agreed-on matching criterion is: whether it is the generalized Google match, which will presumably work better but is opaque, or precise match, which is transparent. Also, I don't know how it interacts with personalized search.
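As a concrete illustration of what a bid specification might contain, here is a minimal sketch in Python. It is entirely hypothetical: the field names (keywords, negative_keywords, locations, hours, max_cpc, daily_budget) and the exact-word matching rule are my own assumptions for illustration, not Google's actual format or matching criterion.

# Hypothetical sketch of a sponsored-search bid specification and a
# simple exact-word matching rule.  Field names and the matching
# criterion are illustrative assumptions, not Google's actual ones.
from dataclasses import dataclass

@dataclass
class BidSpec:
    keywords: set            # words that must appear in the query
    negative_keywords: set   # words that must NOT appear in the query
    locations: set           # geographical regions the bid applies to
    hours: range             # hours of the day the bid applies to
    max_cpc: float           # maximum cost per click, in dollars
    daily_budget: float      # total budget per day, in dollars

def bid_applies(spec, query, location, hour):
    """Does this bid participate in the auction for this query?"""
    words = set(query.lower().split())
    return (spec.keywords <= words
            and not (spec.negative_keywords & words)
            and location in spec.locations
            and hour in spec.hours)

# Example: a seller of chocolate truffles bidding on "truffle" queries
# while excluding mushroom-truffle queries.
truffle_bid = BidSpec(keywords={"truffle"},
                      negative_keywords={"mushroom"},
                      locations={"US", "CA"},
                      hours=range(0, 24),
                      max_cpc=0.75,
                      daily_budget=200.0)

print(bid_applies(truffle_bid, "chocolate truffle gift box", "US", 14))       # True
print(bid_applies(truffle_bid, "black truffle mushroom risotto", "US", 14))   # False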

AdSense program

Sponsored links on the results page comprise Google's AdWords program. In the AdSense program, Google arranges with the publisher of a web site to present an ad for a sponsor. This has two forms: ads matched to the content of the publisher's page, and a Google search box on the publisher's site whose results carry sponsored links. In both forms, both Google and the publisher are paid CPC.

Form of auction

(Simplified; we will add a further complexity below.) For a detailed description and examples, see the readings. VCG is theoretically better than the generalized second-price (GSP) auction in that truthful bidding is a dominant strategy, but Google etc. still use the second-price auction: advertisers are already familiar with it, switching would be costly and risky, and Edelman et al. argue that in the equilibria they analyze GSP revenue is at least as high as VCG revenue.

Further complexity, at least for Google

It is in Google's interest (a) that ads with a high click-through rate be ranked highly (more total clicks = more revenue); (b) that ads be relevant and sites be of high quality (searcher satisfaction means that the searcher will continue to use Google). Therefore, in running the auction, Google does not use the bid as such; for ranking, it multiplies the bid by a "quality score", some combination of click-through rate and Google's secret formula. What you pay is still the actual next-lowest bid. The downside is that Google is now selling a non-transparent item --- that is, advertisers don't know exactly how their bid is going to work; this regularly leads to discontent. Apparently, this doesn't change the theoretical analysis as much as one might suppose.
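A minimal sketch of the simplified rules as stated above: rank ads by bid times quality score, and charge each slot winner the bid of the advertiser ranked just below it. Google's actual ranking and pricing formulas are not public and differ in detail; the numbers here are made up.

# Generalized second-price auction with quality scores, following the
# simplified rules stated above: rank by bid * quality score; the winner
# of each slot pays the bid of the advertiser ranked just below it.
# (Google's actual formulas are not public and differ in detail.)

def run_auction(bids, num_slots):
    """bids: list of (advertiser, bid, quality_score). Returns (slot, advertiser, price)."""
    ranked = sorted(bids, key=lambda b: b[1] * b[2], reverse=True)
    results = []
    for slot in range(min(num_slots, len(ranked))):
        advertiser, bid, quality = ranked[slot]
        # Pay the next-ranked advertiser's bid (0 if there is none).
        price = ranked[slot + 1][1] if slot + 1 < len(ranked) else 0.0
        results.append((slot + 1, advertiser, price))
    return results

bids = [("A", 3.00, 0.5),   # ranking score 1.50
        ("B", 2.00, 1.0),   # ranking score 2.00 -> wins slot 1 despite the lower bid
        ("C", 1.00, 0.8)]   # ranking score 0.80
for slot, advertiser, price in run_auction(bids, num_slots=2):
    print(f"slot {slot}: {advertiser} pays {price:.2f} per click")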

Click Fraud

In a pay-per-click system, fraudulent clicks are aimed either at draining a competitor's advertising budget or, on AdSense sites, at generating click revenue for the publisher. One estimate: 10-15% of all clicks are fraudulent. Google says that less than 10% of all clicks are invalid. But all such estimates are guesstimates (Tuzhilin).

The problem is that the search engine has no direct incentive to eliminate click fraud; quite the reverse. So sponsors always suspect that the search engines are doing less than they might about it.

(Tuzhilin) If Google and the sponsor could combine information, then click fraud would be easier to detect. But sponsors don't want to tell Google which clicks have led to conversions, and Google does not tell publishers which clicks it considered valid and which it considered invalid. (When Google tags a click as invalid, the sponsor is not billed, but the click goes through to the sponsor's site anyway, (a) so as not to tip off the fraudster that he has been caught and (b) so as not to annoy a legitimate user in the case of a false positive.)

Tuzhilin says, based on confidential information, that there is a sequence of rather simple filters. These eliminate most click spam, because most click spam uses pretty simple-minded techniques. He thinks Google is making a reasonable, good-faith effort.
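To make the idea of "a sequence of rather simple filters" concrete, here is a hypothetical sketch. The particular filters (duplicate clicks from one IP on one ad within a short window, an implausible number of clicks per IP per day) and the thresholds are my own illustrative guesses, not the filters that Google uses or that Tuzhilin describes.

# Hypothetical illustration of simple click-fraud filters: mark a click
# invalid if the same IP clicked the same ad within the last few seconds,
# or if an IP has produced an implausible number of clicks today.
# These rules and thresholds are illustrative assumptions only.
from collections import defaultdict

DUPLICATE_WINDOW_SECONDS = 10
MAX_CLICKS_PER_IP_PER_DAY = 50

def filter_clicks(clicks):
    """clicks: list of (timestamp_seconds, ip, ad_id), in time order.
    Returns a parallel list of booleans: True = counted as valid (billable)."""
    last_click = {}                  # (ip, ad_id) -> timestamp of the last click
    clicks_per_ip = defaultdict(int)
    verdicts = []
    for ts, ip, ad in clicks:
        clicks_per_ip[ip] += 1
        duplicate = ((ip, ad) in last_click
                     and ts - last_click[(ip, ad)] < DUPLICATE_WINDOW_SECONDS)
        excessive = clicks_per_ip[ip] > MAX_CLICKS_PER_IP_PER_DAY
        verdicts.append(not (duplicate or excessive))
        last_click[(ip, ad)] = ts
    return verdicts

clicks = [(0, "1.2.3.4", "ad7"), (3, "1.2.3.4", "ad7"), (500, "5.6.7.8", "ad7")]
print(filter_clicks(clicks))   # [True, False, True]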

Collaborative Filtering

More reading

Advances in Collaborative Filtering, Yehuda Koren and Robert Bell, in F. Ricci et al. (eds.), Recommender Systems Handbook, Springer, 2011.

Evaluating Recommendation Systems, Guy Shani and Asela Gunawardana, in Ricci et al. 2011.

Evaluating Collaborative Filtering Recommender Systems, Jonathan Herlocker, Joseph Konstan, Loren Terveen, and John Riedl, ACM Transactions on Information Systems, vol. 22, no. 1, 2004, pp. 5-53. Out of date in some respects, but still very thorough in terms of the issues.

If you really want to read a lot,

read the other chapters of Recommender Systems Handbook, F. Ricci et al. (eds.), Springer, 2011.

Recommender Systems

Netflix recommends movies to customers; Amazon recommends books. The recommendations are based on the user's own past ratings and purchases and on the ratings and purchases of other, similar users.

Other applications

Neighborhood models

Nearest neighbors algorithm.
User-based vs. Item-based. Start with user-based.

To predict user U's evaluation of item V: find the K users most similar to U (comparing their vectors of ratings) who have rated V, and predict U's rating as the (similarity-weighted) average of their ratings of V.

Note that to recommend, the task is to find the K highest rated items, not to predict the ratings on all movies. This is a somewhat different task.
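Here is a minimal sketch of the user-based prediction step described above, assuming ratings are stored as a dict of per-user rating dicts. The choice of cosine similarity over co-rated items and of K=2 is illustrative; Pearson correlation and larger K are also common.

# User-based nearest-neighbor prediction: to predict U's rating of item V,
# take a similarity-weighted average of the ratings of V by the K users
# most similar to U.  Cosine similarity and K=2 are illustrative choices.
from math import sqrt

def similarity(ru, rw):
    """Cosine similarity between two users' rating dicts, over co-rated items."""
    common = set(ru) & set(rw)
    if not common:
        return 0.0
    num = sum(ru[i] * rw[i] for i in common)
    den = sqrt(sum(ru[i] ** 2 for i in common)) * sqrt(sum(rw[i] ** 2 for i in common))
    return num / den if den else 0.0

def predict(ratings, u, v, k=2):
    """Predict user u's rating of item v from the k most similar users who rated v."""
    neighbors = [(similarity(ratings[u], ratings[w]), ratings[w][v])
                 for w in ratings if w != u and v in ratings[w]]
    neighbors.sort(reverse=True)
    top = neighbors[:k]
    total_sim = sum(s for s, _ in top)
    if total_sim == 0:
        return None               # no usable neighbors
    return sum(s * r for s, r in top) / total_sim

ratings = {"ann":  {"V1": 5, "V2": 3, "V3": 4},
           "bob":  {"V1": 4, "V2": 3, "V3": 5},
           "carl": {"V1": 1, "V2": 5},
           "dora": {"V1": 5, "V2": 2}}
print(predict(ratings, "dora", "V3"))   # weighted average of ann's and bob's ratings of V3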

The item-based method is dual: find the items most similar to V among those that U has rated, and predict U's rating of V from U's ratings of those items.

Tradeoffs. Let N=number of users and M=number of items; presumably M is substantially less than N.

An alternative to nearest neighbors is clustering: cluster users by their item ratings, and predict that U's rating of V will be the average rating of V in U's cluster. Or, dually, cluster items by their users. Or do both:

Cluster both movies and users:

Choose a fixed number #V of item clusters and a fixed number #U of user clusters.
Form initial clusters based on the rating vectors.
repeat {
    for each user UI,
        define a vector UUI[1...#V] s.t.
            UUI[J] = number of movies in the Jth item cluster that UI likes;
    cluster the vectors UUI into #U user clusters;
    for each movie VJ,
        define a vector VVJ[1...#U] s.t.
            VVJ[I] = number of users in the Ith user cluster that like VJ;
    cluster the vectors VVJ into #V item clusters;
} until nothing changes.
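A runnable sketch of the alternating loop above, on a 0/1 "likes" matrix. The use of k-means (scikit-learn's KMeans) as the clustering subroutine and the fixed iteration cap are my own choices; the pseudocode above only says "cluster" and "until nothing changes".

# Runnable sketch of the alternating user/item co-clustering loop above,
# using k-means as the clustering subroutine on a 0/1 "likes" matrix.
# The choice of k-means and the iteration cap are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def co_cluster(likes, n_user_clusters, n_item_clusters, max_iters=20, seed=0):
    """likes: (num_users x num_items) 0/1 matrix; returns (user_labels, item_labels)."""
    # Initial clusters based on the raw rating vectors.
    user_labels = KMeans(n_user_clusters, n_init=10, random_state=seed).fit_predict(likes)
    item_labels = KMeans(n_item_clusters, n_init=10, random_state=seed).fit_predict(likes.T)
    for _ in range(max_iters):
        # For each user: how many items in each item cluster they like.
        user_profiles = np.stack([likes[:, item_labels == j].sum(axis=1)
                                  for j in range(n_item_clusters)], axis=1)
        new_user_labels = KMeans(n_user_clusters, n_init=10,
                                 random_state=seed).fit_predict(user_profiles)
        # For each item: how many users in each user cluster like it.
        item_profiles = np.stack([likes[new_user_labels == i, :].sum(axis=0)
                                  for i in range(n_user_clusters)], axis=1)
        new_item_labels = KMeans(n_item_clusters, n_init=10,
                                 random_state=seed).fit_predict(item_profiles)
        if (new_user_labels == user_labels).all() and (new_item_labels == item_labels).all():
            break                                  # nothing changed
        user_labels, item_labels = new_user_labels, new_item_labels
    return user_labels, item_labels

likes = (np.random.RandomState(1).rand(40, 30) < 0.3).astype(int)
users, items = co_cluster(likes, n_user_clusters=3, n_item_clusters=4)
print(users[:10], items[:10])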

Singular Value Decomposition

Model: There is a set of K item features. Each item is a vector of features; V[F] is the degree to which item V possesses feature F. Each user is a vector of features: U[F] is the degree to which user U values feature F. Assume that U's rating of V is approximately the dot product U · V = ΣF U[F] * V[F]. So the idea is to choose feature vectors for the users and items that fit the known ratings as well as possible, and then to predict the unknown ratings by the corresponding dot products.

Algorithm: Iterative, hill climbing. See reading.
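A minimal sketch of one such iterative fit: stochastic gradient descent on squared prediction error, in the spirit of the methods surveyed by Koren and Bell. The number of features K, the learning rate, the regularization weight, and the number of passes are illustrative choices.

# Fit user and item feature vectors by stochastic gradient descent on the
# squared prediction error (an instance of the iterative hill climbing
# referred to above; see Koren and Bell for the full treatment).
# K, lr, reg, and epochs are illustrative, not tuned values.
import numpy as np

def fit_latent_factors(ratings, num_users, num_items, K=5,
                       lr=0.01, reg=0.05, epochs=500, seed=0):
    """ratings: list of (user_index, item_index, rating).
    Returns (P, Q) with P[u] . Q[v] approximating user u's rating of item v."""
    rng = np.random.RandomState(seed)
    P = 0.1 * rng.randn(num_users, K)     # user feature vectors
    Q = 0.1 * rng.randn(num_items, K)     # item feature vectors
    for _ in range(epochs):
        for u, v, r in ratings:
            err = r - P[u].dot(Q[v])
            pu = P[u].copy()              # use the old P[u] when updating Q[v]
            P[u] += lr * (err * Q[v] - reg * P[u])
            Q[v] += lr * (err * pu - reg * Q[v])
    return P, Q

ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = fit_latent_factors(ratings, num_users=3, num_items=3)
print(round(P[0].dot(Q[0]), 2))   # roughly reproduces the observed rating of 5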

Mathematical connection: if the matrix of ratings were completely known, the best rank-K approximation to it (in the least-squares sense) would be obtained from its singular value decomposition; with most ratings missing, the iterative fit plays the same role, hence the name.

Temporal issues

Recent ratings should count more than older ones: individual tastes change, fashions change, and external events make items popular or important. I will not discuss techniques, but see Koren and Bell.

Evaluation

Offline evaluation

Basic method: Separate out a test set of individual ratings, and see how accurately the CF algorithm predicts them.
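A minimal sketch of that protocol: hold out a random test set of individual ratings, predict them, and measure RMSE and MAE. The predictor used here (each user's mean training rating) is only a placeholder to show the protocol; any CF algorithm can be plugged in.

# Offline evaluation sketch: hold out a test set of individual ratings,
# predict them, and measure the error.  The predictor here (each user's
# mean training rating) is a placeholder, not a real CF algorithm.
import random
from math import sqrt

def split(ratings, test_fraction=0.2, seed=0):
    ratings = ratings[:]
    random.Random(seed).shuffle(ratings)
    cut = int(len(ratings) * test_fraction)
    return ratings[cut:], ratings[:cut]          # train, test

def user_mean_predictor(train):
    sums, counts = {}, {}
    for u, v, r in train:
        sums[u] = sums.get(u, 0) + r
        counts[u] = counts.get(u, 0) + 1
    overall = sum(r for _, _, r in train) / len(train)
    return lambda u, v: sums[u] / counts[u] if u in sums else overall

def evaluate(predict, test):
    errs = [predict(u, v) - r for u, v, r in test]
    rmse = sqrt(sum(e * e for e in errs) / len(errs))
    mae = sum(abs(e) for e in errs) / len(errs)
    return rmse, mae

ratings = [(u, v, (u * v) % 5 + 1) for u in range(20) for v in range(15)]
train, test = split(ratings)
print(evaluate(user_mean_predictor(train), test))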

Problems:

User Studies

Usual pros and cons. Pros: Much more informative data, much more manipulable experiment. Cons: Much higher cost per datum. Also, experimental subjects try to please the experimenter, leading to skewed results.

Online studies

Many of the same difficulties as in offline studies. You don't dare try out a system whose quality you are not very sure of.

Inference from positive data

In many cases the data set contains only "True" and "Null"; e.g., the data set of which customers have bought which items. The inference that a null item is rejected is extremely weak, though non-zero. The CF algorithms actually still work reasonably well, but evaluation becomes very problematic; we can neither assume that a null rating is a negative rating nor evaluate only over positive ratings.

The sparse data problem

Suppose that the users are rating the items on a scale from 1 to 10, so that non-votes and negative votes are no longer confounded. Still, the data is very sparse (most users have not rated most items). This raises its own problems for offline evaluation.

1. The recorded votes are generally a non-representative sample of all potential votes, since users tend to be more interested in rating items they like than items they dislike. If algorithms are evaluated for their accuracy over the votes recorded in the dataset, this creates a bias in favor of algorithms that tend to predict unduly favorable votes. If you try to fix this by treating a non-vote as a somewhat negative vote, then that creates a bias in favor of algorithms that tend to predict somewhat negative votes.

2. There are other, subtler biases. For example, suppose that item I has been evaluated by only one user, U. What is an algorithm to do about recommending I to other users? Some algorithms will recommend it to users who resemble U, some will not, but we have no way of measuring which is the better strategy. If [U,I] is in the training set, then we never test whether [U1,I] is a valid recommendation, because we don't have its value for any U1 != U. And if [U,I] is in the test set, then we have no basis for recommending it to U, because we have no evaluations of I in the training set.

Beyond accuracy

Coverage (a small sketch of the first two measures follows this list):
"Prediction coverage": For what fraction of pairs [U,I] does the system provide a recommendation?
"Catalog coverage": What fraction of items I are recommended to someone?
"Useful coverage" (analogous to recall): What is the likelihood that an item actually useful to a given user will be recommended to him?

Learning rate: How soon can the system start to make recommendations to a user?

Novelty: There is no point in recommending to grocery shoppers that they buy milk, bread, and bananas, or in recommending the Beatles' "White Album" to music shoppers. If the user has rated 6 books by an author highly, it doesn't take a lot of brains to recommend the 7th. Distinguish novelty from serendipity: a recommendation is serendipitous if the user would have been unlikely to find the item otherwise.

Strength: How much does the system think the user will like the item? vs.
Confidence: How sure is the system of its own recommendation?

Trust: The recommender system wants to inspire confidence in itself. For that purpose, it actually pays to occasionally recommend items that the user already knows about and likes; this works against novelty.

User Interface: Additional material about the item (picture, snippet etc.); explanation of the recommendation (similar items bought by user etc.)