Archive
Current Archive
The overall CHIA archive is distributed, linking a growing number of repositories. Principal applications and incorporated data are held on the CHIA Server (“Poirot”), contributed and maintained by the School of Information Science (SIS) at Pitt, and on the Zadorozhny server in SIS. Computations are performed on Poirot and through the services of the Pittsburgh Supercomputing Center. Additional data files are held on the Dataverse Network at Harvard University. Various other data files associated with CHIA are held on servers of cooperating researchers; these are increasingly linked to Poirot by API.
Future Archive
To achieve global and interactive analysis of historical data, it is not sufficient to assemble a large number of datasets: the data need to be merged into a distributed but uniform data repository. Nor is it possible to create a uniform data repository through automated processing of the existing metadata: the terms are inconsistent and, too often, major pieces of information turn out to be missing. Additional metadata must therefore be created to account for harmonization and linkage of inconsistent local datasets and for aggregation to regional and global levels. The CHIA project aims to address these issues directly through creation of a global historical data resource.
The tasks involved in creating a global archive include:
- Achieving geographical flexibility (geographic units change shape over time), including development of a global gazetteer covering both well-documented and weakly documented places and administrative units
- Achieving temporal flexibility (to address the multiple forms of time reported in data: instants, periods [both open-ended and closed], cycles, and ordinal terms)
- Ingesting data (we rely on a crowd-sourcing system in which data contributors convey data to the archive electronically, but with assurance that archive standards are maintained)
- Documenting data (contributors develop detailed descriptions of newly incorporated data, consistent with and linked to descriptions of previously incorporated archival holdings)
- Developing an ontology (standardized terminology to describe data, data processing, and data analysis)
- Harmonizing data (transforming data to be consistent in language, weights, and measures)
- Integrating data (resolving inconsistencies and overlaps in data)
- Aggregating data by space, time, and topic
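As a rough illustration of the harmonizing and aggregating tasks listed above, the following minimal sketch converts quantities reported in different units to a common unit and then sums them by region and year. It is not CHIA's implementation: the record layout, the place and region names, and the function names `harmonize` and `aggregate` are all hypothetical, chosen only to make the two steps concrete.

```python
from collections import defaultdict

# Conversion factors to kilograms. The set of units is an assumption
# made for this example, not a CHIA standard.
TO_KG = {"kg": 1.0, "lb": 0.45359237, "tonne": 1000.0}

# Hypothetical local records: (place, region, year, quantity, unit).
RECORDS = [
    ("Place A", "Region 1", 1850, 2000.0, "lb"),
    ("Place B", "Region 1", 1850, 1.5, "tonne"),
    ("Place C", "Region 2", 1851, 300.0, "kg"),
]


def harmonize(records):
    """Convert every quantity to kilograms; fail loudly on unknown units."""
    harmonized = []
    for place, region, year, quantity, unit in records:
        if unit not in TO_KG:
            raise ValueError(f"no conversion defined for unit {unit!r}")
        harmonized.append((place, region, year, quantity * TO_KG[unit]))
    return harmonized


def aggregate(harmonized):
    """Sum harmonized quantities (in kg) by (region, year)."""
    totals = defaultdict(float)
    for _place, region, year, kilograms in harmonized:
        totals[(region, year)] += kilograms
    return dict(totals)


totals = aggregate(harmonize(RECORDS))
for (region, year), kilograms in sorted(totals.items()):
    print(f"{region} {year}: {kilograms:.2f} kg")
```

Raising an error on an unrecognized unit, rather than silently skipping the record, reflects the point made above that missing or inconsistent metadata has to be resolved explicitly before data can be aggregated.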