Authorities and hubs

Even though the Web’s link structure can reveal notions of authority, it’s not possible to apply text-based methods to collect many potentially relevant pages, and then comb them for the most authoritative ones. For example, if we were to look for the Web’s main search engines, we would err badly if we searched only for "search engines". Although the set of pages containing this term is enormous, it doesn’t contain most of the natural authorities we would expect to find, such as Yahoo!, Excite, Infoseek and AltaVista. Similarly we can’t expect Honda’s or Toyota’s home pages to contain the words "Japanese automobile manufacturers", nor Microsoft’s or Lotus’ home pages to contain the words "software companies".

This difficulty arises mainly because many links lack semantic content; that is, although most links represent the type of endorsement we seek (for example, a software engineer whose home site links to Microsoft and Lotus), others are created for reasons that have nothing to do with conferring authority. Some links exist purely for navigational purposes: "Click here to return to main menu". Others serve as paid advertisements: "The vacation of your dreams is only a click away".

The question then arises: how do we model the way in which authority is conferred on the Web? Clearly, when commercial or competitive interests are at stake, most organizations will perceive no benefit from linking directly to another one. For example, AltaVista, Excite, and Infoseek may all be authorities for the topic "search engines", but will be unlikely to endorse one another directly.

If, as in the above example, the major search engines don’t explicitly describe themselves as authorities, how can we determine that they are indeed the most authoritative pages on search engines? We can say that they are authorities because many relatively anonymous pages, clearly relevant to "search engines", link to AltaVista, Excite, and Infoseek.
Such "anonymous" pages are hubs that link to a collection of prominent sites on a common topic. Hub pages appear in a variety of forms, ranging from professionally assembled resource lists on commercial sites to lists of recommended links on individual home pages. These pages need not be prominent themselves, or even have any links pointing to them. Their distinguishing feature is that they confer authority on a focused topic. In this way, they actually form a symbiotic relationship with authorities. Thus, we can say that a good authority is a page pointed to by many good hubs, while a good hub is a page that points to many good authorities.

This relationship between authorities and hubs is central to exploring link-based methods of search, automated compilation of high-quality Web resources, and discovery of cohesive Web communities.

HITS: Computing authority and hub scores

The Hyperlink Induced Topic Search (HITS) algorithm by Jon Kleinberg, which is the backbone of Clever search, computes lists of authorities and hubs for Web search topics. Beginning with a search topic specified by one or more query terms, HITS creates a focused sample of several thousand Web pages likely to be rich in relevant authorities, and determines estimated hub and authority weights for the pages in that sample. Such a technique can uncover Web communities, defined by a specific interest, which even a human-assisted search engine like Yahoo! may overlook.
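The mutually reinforcing relationship between hubs and authorities can be illustrated with a small iterative computation. The Python sketch below is only an illustration under stated assumptions, not the Clever implementation: it skips the step of building a focused sample from query terms, the page names in `example_links` are made up, the function names `hits` and `normalize` are hypothetical, and the fixed iteration count is an arbitrary choice. On each pass, a page's authority score becomes the sum of the hub scores of the pages linking to it, and a page's hub score becomes the sum of the authority scores of the pages it links to; both sets of scores are then rescaled so they stay bounded.

```python
from math import sqrt


def normalize(scores):
    """Rescale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = sqrt(sum(value * value for value in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: value / norm for page, value in scores.items()}


def hits(links, iterations=50):
    """Estimate hub and authority scores for a small link graph.

    links maps each page to the list of pages it links to.
    The iteration count is an illustrative assumption.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # A good authority is pointed to by many good hubs.
        new_auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                new_auth[target] += hub[source]

        # A good hub points to many good authorities.
        new_hub = {
            page: sum(new_auth[target] for target in links.get(page, ()))
            for page in pages
        }

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Hypothetical graph: three link lists point at search engines,
# and the search engines never link to one another.
example_links = {
    "resource_list": ["altavista", "excite", "infoseek"],
    "bookmarks_page": ["altavista", "excite"],
    "home_page": ["altavista"],
}

hub, auth = hits(example_links)
for page in sorted(auth, key=auth.get, reverse=True):
    print(f"{page:15s} authority={auth[page]:.3f} hub={hub[page]:.3f}")
```

In this toy graph the search engines receive nonzero authority scores even though none of them links to another, while the link lists receive the hub scores, which mirrors the situation described above.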