Web Search Engines: Lecture 1

Required Reading

Suggested Further Reading

Structure of a Search Engine

Parallelism in downloading

Courtesy toward server

Robot Exclusion Standard
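
As a rough illustration of how a crawler might consult the Robot Exclusion Standard before fetching a page, the sketch below parses a small robots.txt with Python's standard urllib.robotparser module. The robots.txt contents, the "ExampleBot" user-agent name, and the example.com URLs are all made up for this example. Crawl-delay is a widely used extension rather than part of the original standard.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = make_parser(ROBOTS_TXT)

# "ExampleBot" is an assumed crawler name; the "*" record applies to it.
allowed = parser.can_fetch("ExampleBot", "http://example.com/index.html")
blocked = parser.can_fetch("ExampleBot", "http://example.com/private/data.html")

# Minimum delay in seconds requested by the site, or None if not specified.
delay = parser.crawl_delay("ExampleBot")
```

A crawler would typically cache one such parser per host and call can_fetch before queuing each URL from that host.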

MERCATOR

Early crawler, precursor to AltaVista, publicly available; first clear, detailed write-up.

(RIS = "RewindInputStream"; i.e. an input stream that can be reread arbitrarily often.)
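
The idea of a rewindable input stream can be sketched as follows. This is not Mercator's code (Mercator was written in Java); it is a minimal Python illustration that assumes the whole document fits in memory, and the class and method names are chosen for this example only.

```python
import io


class RewindInputStream:
    """Illustrative stream wrapper that can be reread arbitrarily often.

    Assumption: the whole source is buffered in memory when the wrapper
    is created, so several consumers can each read the content from the
    start without downloading it again.
    """

    def __init__(self, source):
        # Drain the underlying stream once and keep the bytes.
        self._buffer = io.BytesIO(source.read())

    def read(self, size=-1):
        """Read up to size bytes (all remaining bytes if size is -1)."""
        return self._buffer.read(size)

    def rewind(self):
        """Reset the read position to the beginning of the content."""
        self._buffer.seek(0)


# Stand-in for a downloaded page; a real crawler would wrap a network stream.
source = io.BytesIO(b"<html>hello</html>")
ris = RewindInputStream(source)

first = ris.read()   # e.g. consumed by a link extractor
ris.rewind()
second = ris.read()  # e.g. consumed by a second processing module
```

Buffering once and rewinding lets each processing step see the full page while the page is fetched from the server only a single time.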