FAQs
Why should I use more than a single search engine?
No single search engine covers it all. In fact, there are billions of pages of information that remain hidden to search engines. Since search engines differ in their crawling, indexing, and retrieval procedures, their results will vary. While overlap exists, each search engine will contain web pages that have been missed by the others. This means that identical queries made to different search engines will yield different results. For this reason alone, using more than a single search engine is a wise move. If you rely on a single source of information you will get an incomplete picture.
These variations occur even with search engines that use the same database of information. For example, Netscape and AOL use the Google database. However, each of these organizations applies its own method for retrieving information from the Google database, and may add content from its other resources to the search results. This is why identical queries will often return different results.
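The overlap between two engines can be illustrated with a small sketch. The result lists are invented placeholders rather than real search output, and `compare_results` is a hypothetical helper name used only for this example.

```python
def compare_results(results_a, results_b):
    """Split two result lists into shared and engine-specific addresses."""
    set_a, set_b = set(results_a), set(results_b)
    return {
        "shared": sorted(set_a & set_b),   # found by both engines
        "only_a": sorted(set_a - set_b),   # missed by engine B
        "only_b": sorted(set_b - set_a),   # missed by engine A
    }

# Invented data: the same query sent to two hypothetical engines.
engine_a = ["example.org/1", "example.org/2", "example.org/3"]
engine_b = ["example.org/2", "example.org/3", "example.org/4"]

report = compare_results(engine_a, engine_b)
print(report)
```

In this made-up case each engine holds one page the other missed, so a searcher who used only one of them would never see it.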
How 'fresh' or 'current' is the information we get from a search engine?
Search engines must first find, copy, and index a web resource before it is made available for retrieval. This process takes time. Crawlers may return to a page on a daily, weekly, or monthly basis. Additionally, author-submitted web pages might take weeks or months to process. Once a page is in the database, it is only a copy of the original, which may have already changed. The best way to determine the 'currency' of a web page is to examine the original material, looking for a record of when the page was last updated.
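One record of this kind, when a web server supplies it, is the HTTP `Last-Modified` header. The sketch below assumes the header value has already been obtained and only shows how its age might be worked out. `page_age_days` is a hypothetical helper, and the date is an arbitrary example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def page_age_days(last_modified_header, now=None):
    """Return the age in whole days of a Last-Modified value, or None."""
    if not last_modified_header:
        return None  # the server did not report a date
    try:
        modified = parsedate_to_datetime(last_modified_header)
    except (TypeError, ValueError):
        return None  # the value was not a parseable date
    if modified.tzinfo is None:
        # Assumption for this sketch: treat a zone-less date as UTC.
        modified = modified.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (now - modified).days

# Arbitrary example values, fixed so the result is repeatable.
checked_at = datetime(2015, 10, 31, 7, 28, tzinfo=timezone.utc)
print(page_age_days("Wed, 21 Oct 2015 07:28:00 GMT", now=checked_at))
```

Not every server sends this header, and some pages state their update date only in the visible text, so a missing value says nothing about how current the page is.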
How do search engines find web pages?
We have already seen how information discovery software, sometimes called spiders or crawlers, automates the process of finding new web pages. Most search engines also allow authors to submit their pages directly. The author supplies the web address and some information about content. The sites are then crawled, indexed, and made available. Some search engines allow web page authors to buy quick placement in their systems.
How much information on each page is actually indexed?
Some search engines make a copy of the entire web page. Others take a snapshot consisting of essential address information and the first few hundred words of text on the page. There is no guarantee that all of the pages on a site have been indexed. Usually, only the main pages are included in the search engine's database. Additionally, some words are left out of the indexing process. Conjunctions, numbers, and common words like 'web' or 'internet' might be excluded. These 'stop words' are removed to improve system speed and efficiency, but this practice can lead to lost information as well.
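A minimal sketch of how partial indexing and stop-word removal might discard information follows. The stop list, the `index_terms` helper, and the sample sentence are assumptions made for illustration. Real search engines use their own lists and rules.

```python
import re

# Assumed stop list for illustration only.
STOP_WORDS = {"and", "or", "but", "the", "a", "of", "web", "internet"}

def index_terms(text, max_words=None):
    """Return the terms a simplified indexer would keep for one page."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if max_words is not None:
        words = words[:max_words]  # snapshot: keep only the first N words
    # Drop stop words and pure numbers, as described in the answer above.
    return [w for w in words if w not in STOP_WORDS and not w.isdigit()]

page = "The history of the Internet and the Web since 1969"
print(index_terms(page))               # whole page indexed
print(index_terms(page, max_words=4))  # only the first four words indexed
```

With this stop list the year and the words 'Internet' and 'Web' never reach the index, and the four-word snapshot keeps only 'history', so a query for any of the dropped terms could not match this page.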
Are crawlers, spiders, and robotic information discovery the same thing? Do they work the same way?
Crawlers, spiders, and robotic information discovery all describe the process of automatic web page copying. This is an essential first step in the process of building and maintaining a search engine database. Because it is an automated process, crawlers work around the clock to find new sites and recheck sites for changes. Crawlers can be set to investigate a website in depth, visiting and copying every page. Crawlers might also just skim the surface content of a site, leaving a lot of information in the shadows.
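The difference between an in-depth crawl and a surface skim can be sketched with a depth limit. The site map below is invented and held in memory so the example runs without network access. `crawl` and `max_depth` are hypothetical names, and the breadth-first order is one reasonable choice, not a description of any particular search engine's crawler.

```python
from collections import deque

# Invented site map: each page lists the pages it links to.
SITE = {
    "/": ["/about", "/products"],
    "/about": ["/about/team"],
    "/products": ["/products/widget"],
    "/about/team": [],
    "/products/widget": ["/products/widget/specs"],
    "/products/widget/specs": [],
}

def crawl(links, start, max_depth=None):
    """Visit pages breadth-first, following links at most max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        page, depth = queue.popleft()
        visited.append(page)  # stands in for copying the page
        if max_depth is not None and depth >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited

shallow = crawl(SITE, "/", max_depth=1)  # skims the surface
deep = crawl(SITE, "/")                  # visits every reachable page
print(shallow)
print(sorted(set(deep) - set(shallow)))  # pages left in the shadows
```

Here the shallow crawl copies three of the six pages. The remaining three exist on the site but would never appear in that engine's database.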