Searching and browsing for content are probably the most common tasks performed on the Web. Consequently, search engines and directories have become the most popular Websites today. However, despite the great value these sites add, service providers face two problems.
The first is the rapid growth of the Web, which keeps search engines on their toes as they struggle to scale up hardware and software. According to a recent survey, there are more than one billion Web pages online today. This amounts to more than 10 terabytes of text. However, the majority of traditional search engines can cover only a fraction of these pages. Moreover, an increasing amount of information is hidden behind search forms or stored in databases that search engines cannot access directly.
The other problem, and the more serious one, is the difficulty of matching your needs with the information available on the Web.
In this series of articles, we introduce you to technologies that are making their way into Web-information management products. This article starts with traditional search technologies and moves on to some popular search engines.
Traditional search technologies
Traditionally, information retrieval was restricted to static data held in a table, a file, or at most a collection of data files. Data was nothing more than a collection of records, each associated with a unique key. A search algorithm would accept an argument "a" and try to find a record whose key was "a". Decades of research in information retrieval resulted in stable solutions such as Verity, AltaVista, Fulcrum, ZyLAB, PLS, Open-Text and Lexis-Nexis.
Researchers at McGill University, Montreal, developed a very early Internet search engine, called Archie, in 1990. Archie searches the files on Internet FTP servers. Two more search tools, both for Gopher servers, followed Archie: Veronica in 1992 and Jughead in 1993.
With traditional search technologies, you enter a keyword or a key phrase (keywords combined with Boolean modifiers such as "and", "or", and "not") into a search service, which then scans an index of Web pages for the keywords. To determine the order in which to display pages, the engine uses an algorithm to rank the pages that contain the keyword(s). For example, the engine may count the number of times the keyword appears on a page, or it may look for keywords in meta tags. A meta tag is an HTML tag that provides information about a Web page.
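The keyword lookup and count-based ranking described above can be illustrated with a small Python sketch. It is only a toy model under stated assumptions, not the method of any particular search engine: the function names (tokenize, build_index, search), the parameter names (must, should, must_not) and the sample pages are all hypothetical, and the only ranking signal is the number of times the keywords occur on a page.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an index mapping keyword -> {page_id: occurrence count}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][page_id] = count
    return index


def search(index, must=(), should=(), must_not=()):
    """Boolean keyword search, ranked by total keyword occurrences.

    must     -> "and": every keyword has to appear on the page
    should   -> "or": at least one keyword has to appear (used when must is empty)
    must_not -> "not": pages containing any of these keywords are dropped
    """
    must = [w.lower() for w in must]
    should = [w.lower() for w in should]
    must_not = [w.lower() for w in must_not]

    if must:
        # Intersect the page sets of all required keywords.
        candidates = set(index.get(must[0], {}))
        for word in must[1:]:
            candidates &= set(index.get(word, {}))
    else:
        # No required keywords: take the union over the optional ones.
        candidates = set()
        for word in should:
            candidates |= set(index.get(word, {}))

    for word in must_not:
        candidates -= set(index.get(word, {}))

    def score(page_id):
        # Assumed ranking rule: sum of occurrences of the query keywords.
        return sum(index.get(w, {}).get(page_id, 0) for w in must + should)

    # Highest score first; ties are broken by page id for stable output.
    return sorted(((p, score(p)) for p in candidates),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical sample data, for illustration only.
pages = {
    "a.html": "Search engines index Web pages. Search is popular.",
    "b.html": "Directories list Web pages by topic.",
    "c.html": "A search engine ranks pages; search, search, search.",
}
index = build_index(pages)

print(search(index, must=["search", "pages"]))
print(search(index, must=["pages"], must_not=["search"]))
```

The first query corresponds to "search and pages" and lists c.html ahead of a.html because the keywords occur more often there; the second corresponds to "pages not search". Ranking purely by keyword count is easy to manipulate by repeating words on a page, which is one reason real engines combine it with other signals.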