Among computer technologies, the Internet gives rise to a remarkable array of ethical, political, legal, and social issues. (There would not be much to say about the politics of compilers.)

The Politics of Search Engines

Defining the Web: The Politics of Search Engines, Lucas Introna and Helen Nissenbaum, Computer, 2000.
The tools that navigate the astronomical number of pages ... favor popular, wealthy, and powerful sites at the expense of others.

Through the Google Goggles: Sociopolitical Bias in Search Engine Design, Alejandro Diaz, in A. Spink and M. Zimmer (eds.) Web Search: Multidisciplinary Perspectives, Springer, 2008.

The web is a public resource, and search engines profit from the use of a resource that they did not pay to build. Moreover, at this point the search engines substantially control the public's access to that resource. (Of course, unlike a mining company on public land, they do not consume the resource or block anyone from finding other forms of access.) Still, the public has a legitimate interest in how they operate.

Suppression of unpopular views

Unpopular views get low rankings, in part because of PageRank: pages expressing unpopular views attract few links.
Crawlers may use PageRank to prioritize crawling, so sites linked only from unpopular sites may not even get indexed (or may get indexed only rarely).
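To see the mechanism concretely, here is a minimal PageRank power-iteration sketch (the toy graph, damping factor, and tolerance are illustrative assumptions, not taken from either paper). A page linked only from an obscure page ends up at the bottom of the ranking, however relevant its content.

def pagerank(links, damping=0.85, iters=100, tol=1e-10):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling page: spread its rank uniformly
                for q in pages:
                    new[q] += damping * rank[p] / n
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank

# Three well-linked "big" sites; "fringe" is linked only from "minor".
toy_web = {
    "big1":   ["big2", "big3"],
    "big2":   ["big1", "big3"],
    "big3":   ["big1"],
    "minor":  ["fringe", "big1"],
    "fringe": ["minor"],
}
for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
    print(f"{page:7s} {score:.3f}")   # the big sites take nearly all the mass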

Bias against new pages: the "entrenchment effect".

Caveat: "Small players still matter", in part because the correlation between rank order and PageRank in any given query is not large (query-specific criteria are dominant).

Advertising

Most users don't distinguish between the sponsored links and the organic results.

Policies on which ads to accept are arbitrary and unfair: wine is OK but beer is not; pornography is OK but guns are not. "When the nonprofit environmental advocacy group Oceana tried to run ads on Google they were rejected because the organization's site was critical of Royal Caribbean Cruise Lines, a Google advertiser."

Search Oligopoly

The eternal answer of doctrinaire free-market enthusiasts ("If Google is doing something wrong, then whoever can do it better can make a lot of money") was largely true in 1998, but it is much less true now that the cost of building a competitive new general search engine is prohibitive.

Web site design

(Discussed by Introna and Nissenbaum, but curiously not by Diaz.) Rich players can hire web site designers to structure their sites so that search engines will rank them higher.

Introna and Nissenbaum proposal

We would demand full and truthful disclosure of the underlying algorithms that govern indexing, searching, and prioritizing, stated in a way meaningful to most Web users. Such information might help spammers, but we argue that this concern is overstated: would not the impact of spammers' unethical practices be severely dampened if both seekers and those wishing to be found became aware of the particular biases inherent in any given search engine?

****************************************************************

Do Web search engines suppress controversy?, Susan Gerhart, First Monday 9:1, 2004.

Five controversial subjects: St. John's Wort, female astronauts, distance learning, Albert Einstein, and Belize.

Will a search engine user who is searching for general information on one of these topics find out about the controversy in the course of searching? Yes for St. John's Wort and female astronauts; no for distance learning, Einstein, and Belize. Explanations: organizational clout, and the sheer volume of the conventional wisdom (particularly in the Einstein case).

Conclusions:

Recommendations:

****************************************************************

Shuffling a stacked deck: the case for partially randomized search engine results, Pandey et al., VLDB 2005.

Page Quality: In Search of an Unbiased Web Ranking, Junghoo Cho, Sourashis Roy, Robert E. Adams, SIGMOD 2005.

Entrenchment effect:

Heavy reliance on a search engine that ranks results according to popularity can delay widespread awareness of a high-quality page by a factor of over 60, compared with a simulated world without a search engine in which pages are accessed through browsing alone.
(Caveat: all results of this kind are based on highly idealized models, since you cannot create an alternate world in which there are no search engines.)
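To see how a delay factor of this kind can be computed, here is a toy model (all numbers below, i.e. the size of the web, the page quality, and the traffic share of a buried page, are illustrative assumptions, not the parameters of Cho et al.):

def rounds_until_aware(p_visit, quality=0.9, target=0.5):
    """Rounds until `target` fraction of the relevant users have seen and
    liked the new page, if in each round each still-unaware user visits
    it with probability p_visit."""
    aware, t = 0.0, 0
    while aware < target:
        aware += (1.0 - aware) * p_visit * quality
        t += 1
    return t

N = 10_000                                # pages competing for attention
browsing = rounds_until_aware(p_visit=1.0 / N)
# Assumption: a new page buried at the bottom of popularity-ranked
# results draws only 1% of the traffic undirected browsing would give it.
ranked = rounds_until_aware(p_visit=0.01 / N)
print("browsing:", browsing, "ranked:", ranked,
      "delay factor:", round(ranked / browsing, 1))

The delay factor that comes out is just the ratio of the assumed traffic shares; the point of the sketch is that the headline number depends entirely on how deeply the ranking buries an unknown page.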

Essentially the same as the cold-start problem in recommender systems, but more acute, because search engines are the main way in which users find web pages, whereas recommender systems are not the main way in which buyers choose items.

Let P be a page. Define:

Relevant users for P, denoted U = the set of users interested in the general topic of P.
Quality of P, denoted Q(P) = fraction of relevant users who would potentially like P (i.e. once they have found it)
Popularity of P = fraction of relevant users who have seen P and liked it.
R(P,T) = PageRank of P at time T.

Dynamic model. Assume that: each page gets a baseline PageRank B independent of its popularity (from the random-jump component of PageRank); relevant users encounter P at a rate proportional to its PageRank above that baseline, R(P,T) - B; a fraction Q(P) of the users who encounter P like it and link to it; and growth slows as the pool of relevant users who have not yet seen P (the factor |U| - D*R(P,T) below) is depleted.

Then the PageRank of P satisfies the differential equation
dR(P,T)/dT = A * (R(P,T)-B) * Q(P) * (|U|-D*R(P,T))
(B is the same constant as above; A and D are two other constants)
which has the solution
R(P,T) = B + E * Q(P) / (1 + F * exp(-T))
(E and F are two more constants.)

As T goes to infinity, this converges to B + E*Q(P); so eventually the PageRank becomes a linear function of the quality, which is what is wanted. As T goes to minus infinity, it converges to B. If R is much smaller than the eventual value B + E*Q(P) at time T0, then R is growing exponentially at time T0. The trajectory is thus a classic sigmoid curve with a sharp ascent: it stays near B for a long time, growing slowly; then climbs rapidly to close to B + E*Q(P); then converges slowly toward B + E*Q(P). It is symmetric around the half-way point.
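A short numerical check of the shape of this curve (the constants B, E, F, and Q below are arbitrary illustrative choices, not values from the paper):

import math

B, E, F, Q = 0.1, 10.0, 1000.0, 0.8   # arbitrary illustrative constants

def R(T):
    """Closed-form trajectory R(P,T) = B + E*Q(P) / (1 + F*exp(-T))."""
    return B + E * Q / (1.0 + F * math.exp(-T))

for T in range(0, 21, 2):
    print(f"T={T:2d}  R={R(T):7.4f}")
# R creeps up from near B = 0.1, climbs steeply around the midpoint
# T = ln(F) ~= 6.9, and saturates near B + E*Q = 8.1.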

Based on this, the two papers above make two proposals for getting the true quality of new pages recognized more quickly.

Proposal 1: Randomized rank promotion. From time to time, put a page with low PageRank but high query relevance at a high rank in the results page. Obviously you can't do a lot of this, because you'll degrade the search results; but a little goes a long way, because of the positive feedback.
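A minimal sketch of what randomized rank promotion could look like (the promotion probability, the slot choice, and the scoring interface are illustrative assumptions; Pandey et al. analyze how much randomization is actually worthwhile):

import random

def results_page(candidates, k=10, p_promote=0.1):
    """candidates: (page, score) pairs; score is the rank score.
    With probability p_promote, lift one page from below the fold into
    the top k, giving it a chance to be seen, clicked, and linked to."""
    ranked = sorted(candidates, key=lambda c: -c[1])
    top, rest = ranked[:k], ranked[k:]
    if rest and random.random() < p_promote:
        lucky = random.choice(rest)        # low-ranked but query-relevant
        slot = random.randrange(k)         # where in the top k to put it
        top = top[:slot] + [lucky] + top[slot:k-1]
    return top

print(results_page([(f"p{i}", 1.0 / (i + 1)) for i in range(50)], k=5))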

Proposal 2: Use the quantity

R(P,T) + C*[dR(P,T)/dT]/R(P,T)

as the measure of query-independent quality. Ideally, this is a linear function of the quality Q(P). Estimate the derivative by tracking the PageRank over time.
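A sketch of the estimator (the constant C, the time step, and the sample histories are illustrative assumptions; the derivative is approximated by a finite difference between successive PageRank snapshots):

def quality_estimate(rank_history, dt=1.0, C=1.0):
    """rank_history: PageRank of one page at evenly spaced times."""
    r_prev, r_now = rank_history[-2], rank_history[-1]
    dR_dT = (r_now - r_prev) / dt          # finite-difference derivative
    return r_now + C * dR_dT / r_now

# A new page whose rank is small but rising fast scores well above its
# raw rank; an entrenched page with a static rank scores only its rank.
print(quality_estimate([0.10, 0.11, 0.14, 0.20]))   # -> 0.5
print(quality_estimate([0.20, 0.20, 0.20, 0.20]))   # -> 0.2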

Proposal 1 degrades individual results pages in the short term, but presumably improves them over the long term. Proposal 2 should improve the value of individual results pages even in the short term, though it probably increases the variance (the probability of including a very low-quality page).


***********************************************************

The Democratizing Effects of Search Engine Use: On Chance Exposures and Organizational Hubs, A. Lev-On, in A. Spink and M. Zimmer (eds.) Web Search: Multidisciplinary Perspectives, Springer, 2008.

Panglossian view. Search engines are democratizing in two ways: they expose users, by chance, to diverse viewpoints and information they were not looking for; and they make organizational hubs for collective political action easy to find.

***********************************************************

Hate speech

Web links and search engine ranking: The case of Google and the query "jew", Judit Bar-Ilan, Journal of Documentation, 2006.

At one point the top Google result for the query "jew" was JewWatch.com, an anti-Semitic site. When this was noticed, it led to intense Google bombing on the part of both camps (the pro-Jewish camp created links to the Wikipedia article on "Jew"). Google posted an explanation of the result.

***********************************************************

The general question

A search engine serves a number of different communities: seekers of information, those who wish to be found, advertisers, and the public at large. What are the ethical and political responsibilities of the search engine toward each of these?