SEARCH HELP FOR WEB ARCHIVE COLLECTIONS
The search tool that provides full-text access to the Library's Web archive collections is powered by Nutch, an open-source search engine. The programming team at the Archive-It project is continually adding features to make searching our collections easier and more streamlined, so this help page will be revised periodically. You are encouraged to return often to see what new features have been added.
Generally, your search results are ranked by relevance according to several factors:
- how often the query terms appear in the page relative to how often they appear throughout the collection
- how often the query terms appear in the page compared to the length of the page
- whether the query terms appear in the URL
- whether the query terms appear in the hostname
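As a rough illustration of how factors like these can combine, here is a minimal sketch in Python. It is not Nutch's actual ranking formula: the function name toy_score, the weighting, and the fixed 0.5 bonuses for URL and hostname matches are all assumptions made for the example.

```python
import math
from urllib.parse import urlparse


def toy_score(query_terms, page_text, page_url, collection_texts):
    """Toy relevance score built from the four factors listed above.

    Hypothetical weighting for illustration only; the real engine's
    formula is not described in this help page.
    """
    words = page_text.lower().split()
    url = page_url.lower()
    host = (urlparse(page_url).hostname or "").lower()
    score = 0.0
    for term in (t.lower() for t in query_terms):
        # Factor 2: occurrences in the page relative to the page length.
        tf = words.count(term) / max(len(words), 1)
        # Factor 1: terms that are rare across the collection weigh more.
        containing = sum(
            1 for text in collection_texts if term in text.lower().split()
        )
        idf = math.log((1 + len(collection_texts)) / (1 + containing)) + 1.0
        score += tf * idf
        # Factors 3 and 4: assumed fixed bonuses for URL and hostname matches.
        if term in url:
            score += 0.5
        if term in host:
            score += 0.5
    return score
```

With a scheme like this, a short page that mentions tax and reform several times, on a host whose name contains one of the terms, would rank above a long page that mentions each term once.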
You can execute advanced searches using some of the following tricks:
- The default Boolean operator is "and".
- For example, if you enter tax reform in the search field, the engine will search for pages containing both tax and reform, not only for tax reform as two adjacent words. Remember that both words must be present in a page for it to appear in your results list.
- If you want results about taxes but none that mention tax reform, or any kind of reform for that matter, put a minus sign immediately before the term you want to exclude (for example, tax -reform).
- If you know that what you're looking for is in a specific type of file, you can limit your search to that format by adding type:[file type] to your search terms.
- For example, a PDF document about tax reform might be found using the following string: tax reform type:pdf
- If you want to find out about a topic discussed specifically on Governor Warner's archived Web site, you can limit your search by adding site:[URL of archived site] to your search terms.
- For example, information about Governor Warner's tax reform policies on his archived site could be found using the search string tax reform site:www.governor.virginia.gov
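The query rules above can be sketched as a small parser and matcher in Python. This is only an illustration of the described behavior, not the search engine's own code; ParsedQuery, parse_query, and matches are hypothetical names, and the page is assumed to be available as a list of words plus a file type and hostname.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParsedQuery:
    required: List[str] = field(default_factory=list)  # joined by implicit "and"
    excluded: List[str] = field(default_factory=list)  # terms prefixed with "-"
    file_type: Optional[str] = None                    # from type:[file type]
    site: Optional[str] = None                         # from site:[host]


def parse_query(query):
    """Split a search string into required terms, exclusions, and filters."""
    parsed = ParsedQuery()
    for token in query.split():
        lowered = token.lower()
        if lowered.startswith("type:") and len(lowered) > 5:
            parsed.file_type = lowered[5:]
        elif lowered.startswith("site:") and len(lowered) > 5:
            parsed.site = lowered[5:]
        elif lowered.startswith("-") and len(lowered) > 1:
            parsed.excluded.append(lowered[1:])
        else:
            parsed.required.append(lowered)
    return parsed


def matches(parsed, page_words, page_type, page_host):
    """Return True if a page satisfies every part of the parsed query."""
    words = {w.lower() for w in page_words}
    # Default "and": every required term must be present.
    if any(term not in words for term in parsed.required):
        return False
    # Minus sign: no excluded term may be present.
    if any(term in words for term in parsed.excluded):
        return False
    if parsed.file_type and parsed.file_type != page_type.lower():
        return False
    if parsed.site and parsed.site != page_host.lower():
        return False
    return True
```

For instance, parse_query("tax -reform") yields one required term (tax) and one excluded term (reform), so a page containing both words would not match.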
Once you have your search results, you can refine them in the following ways:
- In your results list, there will be a link to other versions. This takes you to a list of archived versions for that exact URL that you can browse by capture date.
- If you click the more from... link, you will limit your results to hits from that single host.
- For example, a search for tax reform might return results from several different archived Web sites. To limit your results to Governor Warner's site only, find a search result from his site and click its more from www.governor.virginia.gov link.
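The two refinements above amount to filtering by host and listing the captures of one URL in date order. A minimal sketch in Python, assuming each result or capture is a dictionary with hypothetical "url" and "captured" keys (the real result format is not documented here):

```python
from urllib.parse import urlparse


def more_from(results, host):
    """Keep only the results whose URL is served from the given host."""
    return [r for r in results if urlparse(r["url"]).hostname == host]


def other_versions(captures, url):
    """List capture dates for one exact URL, earliest first.

    Dates are assumed to be ISO strings (YYYY-MM-DD), which sort correctly
    as plain text.
    """
    return sorted(c["captured"] for c in captures if c["url"] == url)
```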
Text search is available only for the Library's archived Web collections. Because the Internet Archive has been archiving Web sites since 1996, archived versions from before September 2005 of many of the sites in the Library's collections may be available through the Internet Archive's general Wayback Machine. The Wayback Machine is not text searchable, however; researchers must know the URL of the site they would like to view. To view earlier archived versions of sites in the Library's collection through the Wayback Machine, click here, or use the link found on the Wayback browse pages of the Library's Web collections.
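Because the Wayback Machine is addressed by URL, a lookup address can be assembled once the site's URL is known. The sketch below assumes the publicly used web.archive.org/web/ address pattern, where an asterisk lists all captures and a timestamp of up to 14 digits (YYYYMMDDhhmmss) requests a capture near that moment; the function name wayback_url is hypothetical, and the pattern should be checked against the Internet Archive's current documentation.

```python
def wayback_url(site_url, timestamp="*"):
    """Build a general Wayback Machine address for a known site URL.

    timestamp: "*" to list all captures (assumed pattern), or digits in
    YYYYMMDDhhmmss form to ask for a capture near that date.
    """
    if timestamp != "*" and not (timestamp.isdigit() and len(timestamp) <= 14):
        raise ValueError("timestamp must be '*' or up to 14 digits")
    return "https://web.archive.org/web/{}/{}".format(timestamp, site_url)
```

For example, wayback_url("www.governor.virginia.gov") produces an address listing every capture the Internet Archive holds for that site.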
If you still have questions about how to refine and improve your search results, please contact Roger Christman, the Library's Web-archiving project manager, at Roger.Christman@lva.virginia.gov.