Full Circle Resource Kits



Searching the Cache

Limiting a search to cached copies of pages, in effect searching the past, offers several advantages. First, the pages retrieved match the snippet descriptions: a cached Web page always contains the keywords promised by its snippet, which is not always the case when going from a snippet to a live page. Second, the page listed will always be retrieved--no more 'Page Not Found' messages. Third, keywords matching the query are highlighted in the cached copy and are easier to spot (BBC example). This can be a real benefit to searchers, including younger students, who find it challenging to scan text for keywords.
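Keyword highlighting of the kind a cached copy provides can be approximated with a few lines of code. The following is only a minimal sketch, not how any particular search engine does it: the function name highlight_keywords and the ** marker are assumptions made for illustration, and matching is limited to whole words, ignoring case.

```python
import re

def highlight_keywords(text, keywords, marker="**"):
    """Wrap each whole-word keyword match in `marker`, ignoring case.

    Hypothetical helper for illustration; real cached pages use
    colored HTML highlighting rather than text markers.
    """
    escaped = [re.escape(k) for k in keywords if k]
    if not escaped:
        return text
    # Longest keywords first, so a longer keyword wins over a shorter
    # one that starts at the same position.
    escaped.sort(key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(escaped) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", text)

sample = "The BBC reported that the cache was updated."
print(highlight_keywords(sample, ["cache", "bbc"]))
```

The original capitalization of each match is preserved, so a query for "bbc" still marks "BBC" as it appears on the page.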

Three ways to search a cache or database


Search Engine -- by far the most efficient and powerful means of finding archived information. Requires the ability to translate a question into a query and select effective keywords and operators. The effectiveness of a search engine is limited by the number of items collected by its crawlers and indexed in the associated database. Looking for information that is indexed in a different database than the one being used involves Deep Web searching. There are many search engines, including Google, Yahoo, Ask, JSTOR and the Wayback Machine.

Subject Directory -- an effective means for finding information that has been categorized by editors. Requires browsing and interpreting keyword categories. Limited to a relatively small number of records selected by directory editors and constrained by the keywords they use. A directory may consist of snippets and/or links and represents a fraction of the contents of the complete database. Subject Directories are available at Google, Yahoo and many other sites.

Browsing -- a manual approach to searching that involves scanning text and clicking relevant links found in a subject directory or live Web page. Effective browsing requires experimentation and prediction: being able to select a link that is likely to advance the search in the direction of its objective. Limited only by the user's knowledge of specific URLs to visit and links connecting one page to another.

 

Problems searching the past

Most of the time, collecting information from the past is not a problem. Unless up-to-the-minute information or statistics are needed, a search engine will return the greatest number of documents and sources. But there are cases in which searching a cache becomes problematic.

  1. Should a Web page's URL change, the live link on its cached copy will no longer work. The "broken" copy is retained in the cache only until the search engine's crawlers try to revisit the page. If they cannot find the page, the cached copy is removed, wiping out all traces of the former information in that database. A different database may still have the information as long as its own crawlers have not tried to revisit the live page. Broad-topic databases like archive.org and subject-specific ones like mathforum.org will retain archived copies long after Google and MSN have deleted theirs.
  2. If a "Page Not Found" is encountered when clicking on a snippet's live title link, go back and click the Cached copy instead, if available.
  3. Not all archived information is accessible with a search engine. In the BBC example above, for instance, Yahoo does not link to its cache for the BBC home page, and neither does Ask. Because results are retrieved based on what is stored in the cache, it is possible to follow a snippet only to find that the live page no longer contains that information. This happens often. Without a link to the cached copy, the only way to find the information cited in the snippet is to search the live site, hoping it has a search engine, subject directory or links to previous versions of the page.
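The fallback described in the list above (try the live page, then a cached or archived copy, and check that the keywords from the snippet are actually present) can be sketched in code. This is an illustration under stated assumptions, not a tool any search engine provides: the function names fetch and first_copy_with_keywords are hypothetical, and the example.org addresses are placeholders standing in for a live page and a cached copy.

```python
from urllib.request import urlopen

def fetch(url, timeout=10):
    """Return the page body as text, or None if it cannot be retrieved."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # Covers "Page Not Found", unreachable hosts and malformed URLs.
        return None

def first_copy_with_keywords(urls, keywords, fetcher=fetch):
    """Return the first URL whose page contains every keyword, else None.

    `urls` is tried in order, e.g. the live link first, then cached copies.
    """
    wanted = [k.lower() for k in keywords]
    for url in urls:
        text = fetcher(url)
        if text is None:
            continue  # page is gone: move on to the next copy
        lowered = text.lower()
        if all(k in lowered for k in wanted):
            return url
    return None

# Offline demonstration with a stand-in fetcher (placeholder URLs and text).
pages = {
    "https://example.org/news": None,  # live page no longer exists
    "https://cache.example.org/news": "Archived story about tidal energy",
}
print(first_copy_with_keywords(list(pages), ["tidal", "energy"], fetcher=pages.get))
```

Passing the fetcher as a parameter keeps the search order separate from how pages are retrieved, so the same loop works with live requests or, as here, with a dictionary of saved pages.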

 
