How Can You Search For Documents In Non-HTML Formats?

Cartoon: Librarian stacks books in a computer

Search engines were first built to index HTML pages for keyword searching. HTML pages are often rich in text and follow common conventions that make them easy to index. At first, search engine crawlers ignored non-HTML files, which were relatively rare and often contained little text. As search technology matured and market pressures created a more competitive environment, the large commercial search engines began to index non-HTML file formats.

Now, search engines are adding PDF, Microsoft Office, image, video, and audio files to their search indexes. Most major search engines make it easy to do simple searches for image files. Additionally, new, specialized search tools have been developed that focus on multimedia formats. Still, the results of your searches will usually be HTML web pages; non-HTML files remain a small subset of the information in most search engine indexes.

Searching for information in alternate file formats can enrich your results, and you can do it in a number of ways. Many search engines now provide specialized search tools for images. Many search engines and meta-search engines also have advanced options that help you locate other file types. Finally, highly specialized search engines that focus on multimedia formats are now available.
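
As one illustration of the advanced options mentioned above, several major search engines accept a "filetype:" operator typed after your keywords (for example, solar energy filetype:pdf). Support and exact syntax vary from one search engine to another, so check the advanced search help of the engine you use. The short Python sketch below only shows how such a query could be assembled. The function names and the search.example.com address are hypothetical placeholders for illustration, not a real service.

```python
from urllib.parse import urlencode


def build_filetype_query(keywords, file_type):
    """Return the keywords followed by a filetype: restriction.

    Assumes the target search engine understands the "filetype:"
    operator; not every engine does.
    """
    # Normalize input such as ".PDF" or " pdf " to "pdf".
    ext = file_type.strip().lstrip(".").lower()
    if not ext.isalnum():
        raise ValueError("file type should be a plain extension such as 'pdf'")
    return f"{keywords.strip()} filetype:{ext}"


def build_search_url(base_url, keywords, file_type):
    """Build a search URL with the query in a 'q' parameter.

    The 'q' parameter name is an assumption; it is common but not universal.
    """
    query = build_filetype_query(keywords, file_type)
    return base_url + "?" + urlencode({"q": query})


if __name__ == "__main__":
    # Hypothetical search address used only for illustration.
    print(build_search_url("https://search.example.com/search",
                           "solar energy", "pdf"))
```

You can also skip the code entirely and type the same query, keywords followed by filetype:pdf, directly into the search box of an engine that supports the operator.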

Authored by Dennis O'Connor