Research Design

Overall Project Design

The global strategy of CHIA requires a consistent global vision combined with intensive interactions at local and intermediate levels. To implement this strategy, we are building a Research Collaborative, a Headquarters (or center of gravity), and intensive interactions between them. The Research Collaborative functions at the levels of research institutions and individual researchers: it establishes close contacts among research institutions in various disciplines and regions so that they can share data and collaborate in research projects. It also facilitates the sharing of data by individual researchers through crowdsourcing. The Collaborative thus exists at multiple levels. The Headquarters houses elements including an archive of global-historical data (reaching across time, space, scales from local to global, and data from various disciplines); a clearing house to facilitate collaborative development of consistent data and metadata; and an intellectual center to develop global connections in social-science theory and to make key decisions in developing the overall project. The Headquarters, too, exists at multiple levels. The interplay of Collaborative and Headquarters may be illustrated through the issue of creating the global archive: are we to construct it incrementally by assembling existing pieces, or should we begin with an all-at-once design? We rely on both the Collaborative and the Headquarters to make the best choice.

The long-term purpose of CHIA, in the time frame of a decade or more, is to facilitate the creation and maintenance of historical data sets from local to global levels and from short term to long term, linking variables on many areas of human experience. The resultant summation of human experience can reveal the varying patterns and dynamics of social change. While past social, economic, and cultural dynamics may not carry automatically into the future, they should not be neglected in our attempts to make plans and form policy. The Collaborative intends to link the social sciences to each other and to the principal problems in human society, at scales from the local to the global, over the past four centuries and into the future. It seeks to encourage a culture of data sharing among social scientists. And it expects to develop a global, integrative repository and analytical framework supporting specific research projects in four domains of social life: human-natural interaction, population change, development of socio-economic inequality, and local and global governance. New knowledge of these past patterns can be expected to inform policy formulation.

In the intermediate term, roughly five years, CHIA intends to develop a strong and expanding research team that will bring in a rapid inflow of historical data to be documented and archived; develop an expanding system of metadata and ontology to describe data and assist in their integration and aggregation; conduct interactive analysis at regional and global levels of variables in social sciences, health, and climate; and develop systems of visualization that will assist in analysis and provide feedback for the collection and definition of data.

Data Collection

For collecting and documenting data we have begun work on a collaborative architecture that will allow us to consolidate heterogeneous historical data sources in a scalable way. We rely on a general architecture that uses collective intelligence to form a global repository of historical data. This architecture efficiently combines methods of crowdsourcing with wrapper/mediator technology. We assume that information providers will submit wrappers that use an application programming interface (API) to extract information from their data sources and to map it to a standard homogeneous representation. If a data set includes information not covered by the target schema, we extend the schema correspondingly. The data submission system allows providers to register their wrappers as part of the data-access layer of the global repository. The system will also support wrapper generation to ease the wrapper development process. The wrappers can be used either to access data remotely or to load or replicate parts of the data at different nodes of the distributed repository (e.g., to optimize data analysis, or to consolidate a repository profile for a specific application domain). Both information providers and consumers will also be able to submit their own subjective assessments of data reliability. These external assessments will be combined with internal reliability assessment protocols based on analysis of data inconsistencies in the integrated repositories. Reliability assessment will take place during data curation and data fusion. A first prototype of the system is expected to be ready for testing within the CHIA group during 2012.
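The wrapper/mediator pattern described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CHIA implementation: the class names (Wrapper, Mediator), the field names, and the sample rows are all hypothetical, and a real wrapper would call a provider's API rather than return an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Set

Record = Dict[str, object]


@dataclass
class Wrapper:
    """Hypothetical provider-supplied adapter for one data source."""
    source_name: str
    extract: Callable[[], Iterable[Record]]  # reads raw rows from the source
    field_map: Dict[str, str]                # raw field name -> target field name

    def records(self) -> Iterator[Record]:
        # Map each raw row onto target-schema field names and tag its origin.
        for raw in self.extract():
            mapped = {self.field_map[k]: v for k, v in raw.items() if k in self.field_map}
            mapped["source"] = self.source_name
            yield mapped


@dataclass
class Mediator:
    """Hypothetical data-access layer holding the registered wrappers."""
    schema: Set[str] = field(default_factory=lambda: {"place", "year", "population", "source"})
    wrappers: List[Wrapper] = field(default_factory=list)

    def register(self, wrapper: Wrapper) -> None:
        # Extend the target schema with any fields the new source introduces.
        self.schema |= set(wrapper.field_map.values())
        self.wrappers.append(wrapper)

    def query(self) -> List[Record]:
        # Return every record in the uniform representation; absent fields are None.
        return [
            {name: rec.get(name) for name in sorted(self.schema)}
            for w in self.wrappers
            for rec in w.records()
        ]


# Invented sample sources, for illustration only.
parish = Wrapper(
    "parish_register",
    lambda: [{"town": "Town A", "yr": 1700, "souls": 1000}],
    {"town": "place", "yr": "year", "souls": "population"},
)
customs = Wrapper(
    "customs_ledger",
    lambda: [{"port": "Town B", "year": 1750, "tonnage": 120}],
    {"port": "place", "year": "year", "tonnage": "trade_tonnage"},
)

mediator = Mediator()
mediator.register(parish)
mediator.register(customs)   # adds "trade_tonnage" to the schema
rows = mediator.query()
```

Registering the second wrapper extends the schema, so the parish record is returned with an empty trade_tonnage field rather than being rejected. The same wrappers could back either remote access or replication, since the mediator only depends on the records() interface.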

Archive

Assembling a large number of datasets is not sufficient to produce global data: the data need to be merged into a single, uniform data repository. Nor is it possible to create a uniform data repository through automated processing of the existing metadata, because the terms are inconsistent and, too often, important information turns out to be missing altogether. Additional metadata must therefore be created to account for harmonization and linkage of inconsistent local datasets and for aggregation to regional and global levels. The CHIA project is to address these issues directly through creation of a global historical data resource.
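To make the role of the additional metadata concrete, the following minimal Python sketch shows harmonization metadata (mapping inconsistent local terms and units to one variable) and linkage metadata (assigning localities to regions) being used to aggregate local records to the regional level. All names, mappings, and figures here are invented for illustration and are not drawn from CHIA data.

```python
from collections import defaultdict

# Hypothetical harmonization metadata: local term -> (canonical variable, unit factor).
HARMONIZATION = {
    "souls": ("population", 1),
    "inhabitants_thousands": ("population", 1000),
}

# Hypothetical linkage metadata: locality -> region.
REGION_OF = {"Town A": "Region X", "Town B": "Region X", "Town C": "Region Y"}


def harmonize(record):
    """Rewrite one local record in canonical variables and units."""
    out = {"place": record["place"], "year": record["year"]}
    for term, value in record.items():
        if term in HARMONIZATION:
            variable, factor = HARMONIZATION[term]
            out[variable] = value * factor
    return out


def aggregate(records, variable="population"):
    """Sum a harmonized variable by (region, year)."""
    totals = defaultdict(int)
    for rec in map(harmonize, records):
        if variable in rec:
            totals[(REGION_OF[rec["place"]], rec["year"])] += rec[variable]
    return dict(totals)


# Invented local datasets that use inconsistent terms and units.
local_records = [
    {"place": "Town A", "year": 1700, "souls": 1000},
    {"place": "Town B", "year": 1700, "inhabitants_thousands": 2},
    {"place": "Town C", "year": 1700, "souls": 500},
]

regional = aggregate(local_records)
# regional == {("Region X", 1700): 3000, ("Region Y", 1700): 500}
```

Neither lookup table can be derived automatically from the local datasets themselves, which is the point of the paragraph above: the mappings are new metadata that someone has to create and curate.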

Analysis and Visualization

Analysis and visualization are to be multidimensional, allowing a focus on the small scale, the large scale, and the links among the various levels of analysis. In the initial stages of global analysis, work will concentrate on linking the various social-science theories through the different ways in which they use population variables. For visualization, the project will explore a wide range of practices in many areas of social science, natural science, and engineering, in order to adopt the latest developments.