VIRGINIA WEB ARCHIVE FAQ

What is the Virginia Web Archive?

The Virginia Web Archive is a full-text searchable archive of Web sites made available by the Library of Virginia. It focuses on Virginia government, political, and organizational Web sites and enables Virginia to preserve the Commonwealth's Internet heritage for permanent public access.

How do I search the Virginia Web Archive?

You can search our collections as a whole by using the search box on the Virginia Web Archive page or browse specific archived Web site collections by URL. Please see Search Help for more detailed assistance.

Can I search by URL?

Yes. In the Virginia Web Archive search box, enter url: followed by the domain name of the URL. For example, to search the Virginia state government site, whose URL is http://virginia.gov/, enter url:virginia.gov in the search box.
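
For anyone preparing many such searches, the query can be derived from a full Web address. The following is a minimal sketch, not part of the Virginia Web Archive itself; the helper name build_url_query is hypothetical, and it assumes the search box accepts only the domain name after url:, as in the example above.

```python
from urllib.parse import urlparse


def build_url_query(address: str) -> str:
    """Turn a Web address into a url: search term (hypothetical helper)."""
    # urlparse only recognizes the domain when a scheme or "//" prefix is
    # present, so add one for bare domains such as "virginia.gov".
    if "//" not in address:
        address = "//" + address
    host = urlparse(address).netloc
    return f"url:{host}"


print(build_url_query("http://virginia.gov/"))
```

The same helper accepts a bare domain such as virginia.gov and produces the same search term.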

What types of Web sites does the Library of Virginia archive?

The Virginia Web Archive includes the Web sites of Virginia's state government, members of the Virginia legislative branch, and the Congressional delegation, as well as candidate sites, political sites, and the sites of Virginia-based organizations.

What kind of information can be found in the Virginia Web Archive?

A wide variety of information can be found in the Virginia Web Archive, including documents such as reports, press releases, and meeting minutes, as well as photographs and audio and video files.

What is the historical coverage of the Virginia Web Archive?

In the fall of 2005, the Library of Virginia was one of several cultural heritage institutions that participated in a pilot project with the Internet Archive to develop and refine a tool to collect, preserve, and provide access to Web sites that meet institutional collection policies and are considered to be of enduring value. This tool is called Archive-It.

Spurred on by our mission to preserve and make accessible in perpetuity the Web heritage of each Virginia governor, the Library, as part of its pilot project, archived the Web-based materials for the administration of Governor Mark R. Warner (2002-2006). The Library also expanded its collection parameters for the project to include the campaign Web sites related to Virginia's statewide elections taking place in the fall of 2005. This included the sites of the candidates for Governor, Lieutenant Governor, and Attorney General, as well as related political party sites and several political blogs. The Virginia Web Archive began preserving all state government Web sites in 2006, some political Web sites in 2007, and organizational Web sites in 2008.

How do I access the Web sites for the administration of Governor James S. Gilmore (1998-2002)?

Web sites for the Gilmore Administration are available only through the Wayback Machine. Users must know the exact Web site address. Click here for a list of Gilmore Administration URLs with links to the Wayback Machine. Please note: the Wayback Machine does not have search capabilities.

What are the Virginia Web Archive's collection guidelines?

The Library's collection development policies concerning the selection of Web sites can be found here.

What is a "crawl"?

A web crawler is a type of "bot" or software agent that creates copies of the pages it is assigned to visit. A "crawl" happens each time the web harvesting software is sent out to capture a Web site. Each URL the crawler is directed to visit is called a "seed." The Virginia Web Archive includes several collections made up of multiple seeds. The Library of Virginia has contracted with Archive-It, which uses an open-source, archival quality web crawler called Heritrix. The Heritrix crawler takes periodic snapshots of Web sites of historical and/or research importance as directed by the policies of the Virginia Web Archive. Crawls are made on a specific schedule, often annually, semi-annually, or quarterly. The results of the crawls are made accessible through the Internet Archive's Wayback Machine, as well as the Library's Virginia Web Archive page, where the archived pages are full-text searchable.
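
The idea of a seed defining what belongs to a collection can be illustrated with a small sketch. This is a deliberate simplification and not how Heritrix or Archive-It actually decide scope; the function in_scope and its host-matching rule are assumptions made for illustration only.

```python
from urllib.parse import urlparse


def in_scope(link: str, seeds: list[str]) -> bool:
    """Return True if a link belongs to the host (or a subdomain) of any seed.

    Hypothetical rule for illustration; real crawlers apply richer scoping.
    """
    link_host = urlparse(link).hostname
    if link_host is None:
        return False
    for seed in seeds:
        seed_host = urlparse(seed).hostname
        if seed_host and (
            link_host == seed_host or link_host.endswith("." + seed_host)
        ):
            return True
    return False


seeds = ["http://virginia.gov/"]
print(in_scope("http://www.virginia.gov/about", seeds))
print(in_scope("http://example.org/", seeds))
```

Under this rule, a link on a subdomain of a seed is treated as part of the collection, while a link to an unrelated host is not.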

How can I tell when a Web page was archived?

Archived Web pages look slightly different from "live" sites. Archived pages contain a yellow banner at the top which includes the date that particular version of the site was crawled and archived. The URL of an archived site also looks different because it contains the date of capture.
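
The capture date can also be read from the archived URL programmatically. The sketch below assumes the Wayback Machine convention of a 14-digit timestamp (year, month, day, hour, minute, second) appearing as its own path segment; the sample URL and the function name capture_date are illustrative only.

```python
import re
from datetime import datetime
from typing import Optional

# Assumption: the capture time is a 14-digit YYYYMMDDhhmmss path segment.
TIMESTAMP = re.compile(r"/(\d{14})/")


def capture_date(archived_url: str) -> Optional[datetime]:
    """Extract the capture date from an archived URL, or None if absent."""
    match = TIMESTAMP.search(archived_url)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


# Illustrative URL, not a reference to a specific real capture.
sample = "http://web.archive.org/web/20060115123000/http://virginia.gov/"
print(capture_date(sample))
```

A "live" URL with no timestamp segment simply yields None.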

What is the Wayback Machine?

The Wayback Machine is owned and operated by the Internet Archive. The Wayback Machine allows users to visit archived Web sites that may no longer exist or revisit older versions of current live sites. The Wayback Machine can be accessed through the Internet Archive and searched using URLs and date ranges. Through the use of the Archive-It program, the Virginia Web Archive collections are full-text searchable when accessed through the Library of Virginia Web site.

What is the Internet Archive?

The Internet Archive is a 501(c)(3) non-profit that was founded in San Francisco, California, in 1996. The Internet Archive's purpose is to build an Internet library that offers permanent access for researchers, historians, and scholars to historical collections in digital formats. In addition to archived Web pages, the Internet Archive's collections include texts, audio, moving images, and software. In 2005, the Virginia Web Archive partnered with Archive-It, a more specialized Web crawling service offered by the Internet Archive.

Does the Virginia Web Archive have technical limitations?

The Virginia Web Archive includes Web sites, documents, images, audio and video files, and other items available when domains are crawled. Web crawls do not include certain databases and search-related items that require user input. In some cases, broken pages, missing graphics, and distorted text may exist due to limitations inherent to Web crawling software. Some things that make archiving Web sites difficult are robots.txt files, JavaScript, server-side image maps, and links to external Web sites that are not part of a particular archival collection.
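
To show how a robots.txt file can keep pages out of a crawl, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the crawler name, and the example.org addresses are made up for illustration, and the sketch makes no claim about how the Virginia Web Archive's crawler is configured.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that blocks one directory for all crawlers.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A crawler that honors these rules would skip the first page
# and be free to capture the second.
blocked = parser.can_fetch("example-crawler", "http://example.org/private/report.html")
allowed = parser.can_fetch("example-crawler", "http://example.org/index.html")
print(blocked, allowed)
```

Pages excluded this way never reach the crawler, which is one reason an archived site can appear to have missing sections.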

Does the Virginia Web Archive need permission to crawl and make available certain Web sites?

Since the Library of Virginia serves as the official archives for the Commonwealth of Virginia, the Virginia Web Archive does not need permission to crawl the sites of government and quasi-governmental agencies, boards, and institutions. However, for non-governmental Web sites, the Virginia Web Archive will seek permission prior to making the archived sites available to the public. For non-governmental Web sites, the Library will contact site owners via e-mail. The message will contain a link to the Library of Virginia permission form, where the site owner will have the option to accept or decline the Library's permission request. If the site owner declines, the Web site will not be publicly displayed on our Web Archiving page. However, if the site owner does not reply after at least two attempts, the Web site will be publicly displayed unless the site owner specifically requests that the archived page be removed.