This article continues last month’s piece on search engines (page 71). There we discussed two issues: first, that even the largest engines can’t keep pace with the stupendous growth of the Web, and second, that today’s one-size-fits-all approach to search must yield to user-adaptive searches, which learn from the past behavior of users.

In this article, we’ll take a look at the technologies behind what can be termed second-generation search engines: engines that exploit the so-called social network of the World Wide Web. These technologies use a combination of statistics, pattern recognition, machine learning, and artificial intelligence to analyze sources of information and extract useful patterns from them.

Filtering information

A common approach to filtering information on the Net uses two strategies: filter by relevance, and filter by quality or popularity. Distilling a general topic to a size that will make sense to a user involves identifying the most definitive or authoritative Web pages on that topic. This is done by locating not only a set of relevant pages, but also those pages that are of the highest quality. Relevance is handled, to an extent, by keyword matching.

Another approach, that of tapping into the social network of the World Wide Web, can also derive notions of authority. The social network can be accessed through hyperlinks, which contain enormous amounts of latent human annotation. Specifically, the creation of a hyperlink by the author of a Web page represents an implicit endorsement of the page it points to. Looking at such endorsements collectively can give a better understanding of the relevance and quality of the page’s contents.
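To make the idea of collective endorsement concrete, here is a minimal Python sketch that counts how many distinct pages link to each page. The function name `count_endorsements` and the toy link graph are hypothetical, chosen only for illustration. A raw link count is the crudest possible reading of "endorsement" and is not how any particular engine scores pages.

```python
from collections import Counter

def count_endorsements(links):
    """Count inbound links per page.

    `links` maps each page to the list of pages it links to.
    Each linking page counts at most once per target, and
    self-links are ignored.
    """
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts

# Hypothetical toy Web graph (illustrative names only).
links = {
    "a.example": ["c.example", "d.example"],
    "b.example": ["c.example"],
    "c.example": ["d.example"],
    "d.example": ["c.example"],
}

# Pages sorted from most-endorsed to least-endorsed.
ranked = count_endorsements(links).most_common()
```

In this toy graph, `c.example` is linked from three pages and `d.example` from two, so `c.example` ranks first. Counting alone treats every endorsement as equally valuable, whatever the standing of the page that gives it.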
Google and Clever are two search engines that use both approaches, filtering and social-network analysis, to search the Internet. Google (www.google.com) was developed at Stanford, while researchers at IBM began the development of the Clever system (www.almaden.ibm.com/cs/k53/clever.html). Google analyzes hyperlinks to uncover the best pages on the Internet on a given topic. Clever goes a step further and generates good starting points for Web navigation on a given topic. Clever returns two types of pages: authorities, which provide the best source of information on a given topic, and hubs, which provide collections of links to authorities. For Google, the measure of authority of a page is proportional to the total authority of all the pages that cite it. We will focus on Clever, since Google can be regarded as a special implementation of Clever.
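The relationship between hubs and authorities can be illustrated with a small Python sketch. It assumes the commonly described mutual-reinforcement scheme: a page's authority score is the sum of the hub scores of the pages linking to it, and a page's hub score is the sum of the authority scores of the pages it links to, with both sets of scores rescaled after every round. The function names, the iteration count, and the toy graph are all assumptions made for illustration. This is not a description of Clever's actual implementation.

```python
import math

def normalize(scores):
    """Scale scores so that their squares sum to 1 (avoids overflow)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}

def hubs_and_authorities(links, iterations=50):
    """Return (hub, authority) score dictionaries for a link graph.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Authority: sum of hub scores of the pages that link to it.
        new_auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in set(targets):
                new_auth[target] += hub[source]

        # Hub: sum of authority scores of the pages it links to.
        new_hub = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in set(targets):
                new_hub[source] += new_auth[target]

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth

# Hypothetical toy graph: three link collections pointing at three pages.
links = {
    "hub1": ["x", "y"],
    "hub2": ["x", "y", "z"],
    "hub3": ["x"],
}

hub, auth = hubs_and_authorities(links)
best_authority = max(auth, key=auth.get)
best_hub = max(hub, key=hub.get)
```

In this toy graph, `x` is cited by all three collections and comes out as the top authority, while `hub2`, which links to every cited page, comes out as the top hub. Pages that nobody links to, such as the three collections themselves, end with an authority score of zero.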