Most major scientific journals now require data archiving after publication, and many require it for peer review. Anthropology must therefore work out how to provide access to data responsibly. Luckily, other fields have already invested a lot in figuring this out.
The important thing to realize is that providing access does not mean complete loss of control, either for the communities that provide the data or for researchers. For example, access can be restricted to researchers affiliated with research institutions, and limits can be placed on rights of reuse. But some level of access is necessary, if for no other reason than to allow other scholars to verify our analyses. At the same time, researchers who want access to data must respect that data are political, and that continued collaboration with the communities that provide data requires privacy and limits on reuse.
A standard data sharing memorandum grants all collaborators non-exclusive use of the data for scientific analysis and publication. This means that none of the collaborators can completely deny another access to the data. But individual agreements typically specify several planned publications that define the uses and co-authorship rights.
There are many possible detailed agreements, and these should be constructed through discussion of individual concerns and wishes. But it's useful to outline the kinds of choices that we make when designing data sharing protocols and agreements.
We outline here choices for access and usage. But there are other issues, like redistribution, that may be important in some cases.
We distinguish among complete data, redacted data, and analysis data.
Complete data are the full data from an investigation. They may contain information that cannot be publicly released, such as real names. Complete data are typically never released to people outside a collaboration, but they must be securely archived.
Redacted data are a partial record from which sensitive information has been removed.
Analysis data are the data used to produce a specific analysis. They are a subset of the redacted data. These are the data that publisher and funder rules require to be shared.
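The relationship among the three tiers can be sketched in Python. This is a minimal illustration, not a recommended redaction procedure: it assumes each record is a dictionary and that the sensitive fields are known in advance, and every field name (`name`, `gps_location`, `illness_episodes`, and so on) is hypothetical. Real redaction often needs more than dropping fields (for example, coarsening dates or locations), which this sketch does not attempt.

```python
from typing import Iterable

# Hypothetical direct identifiers. A real project would agree on this list
# with the communities and ethics boards involved.
SENSITIVE_FIELDS = {"name", "gps_location"}


def redact(complete: list[dict]) -> list[dict]:
    """Return redacted data: a copy of the complete data without sensitive fields."""
    return [
        {field: value for field, value in record.items() if field not in SENSITIVE_FIELDS}
        for record in complete
    ]


def analysis_data(redacted: list[dict], variables: Iterable[str]) -> list[dict]:
    """Return analysis data: only the variables used in one specific analysis."""
    keep = set(variables)
    return [
        {field: value for field, value in record.items() if field in keep}
        for record in redacted
    ]


# Hypothetical complete data; "id" is a pseudonymous identifier.
complete = [
    {"id": 1, "name": "Person A", "gps_location": (0.0, 0.0), "age": 34, "illness_episodes": 3},
    {"id": 2, "name": "Person B", "gps_location": (0.1, 0.2), "age": 51, "illness_episodes": 0},
]

redacted = redact(complete)                          # names and locations removed
analysis = analysis_data(redacted, ["id", "age"])    # subset for one analysis
print(analysis)  # [{'id': 1, 'age': 34}, {'id': 2, 'age': 51}]
```

In this sketch only `analysis` would be shared with a publication, `redacted` could be offered under one of the access models, and `complete` would stay securely archived within the collaboration.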
There are three basic access models. In detailed protocols, these models can apply to specific subsets of the data. For example, basic demographic information could be unrestricted, while detailed disease history could be restricted.
(1) Embargoed access: Data are inaccessible outside the research team until a specific date, usually no more than 2 years away, allowing time for a PhD student to finish planned publications. After the embargo ends, the data move to either restricted or unrestricted access.
(2) Restricted access: Only individuals who pass a screening procedure have access to the analysis data or redacted data. As part of screening, users typically must also agree not to attempt to re-identify individuals from the data. This is the model used by many medical and epidemiological data archives. Restricted access often has a moving window. For example, demographic data are often restricted for living people but unrestricted for the deceased (or for people born more than a set number of years ago).
(3) Unrestricted access: Anybody with a web browser can download and use the analysis data or redacted data as they like. This is a common model in the physical and biological sciences, but also in behavioral sciences like psychology, political science, and economics.
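Because these models can apply to specific subsets of the data, it may help to see one subset's rule written out. The Python sketch below assigns an access level to a single demographic record under a hypothetical policy: embargoed until a chosen date, then restricted for living people, with a moving window that opens records of the deceased and of anyone born at least `open_after_years` years earlier. The 100-year default, the dates, and all names are illustrative assumptions, not HBEC policy.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Access(Enum):
    EMBARGOED = "embargoed"
    RESTRICTED = "restricted"
    UNRESTRICTED = "unrestricted"


@dataclass
class Person:
    birth_year: int
    deceased: bool


def demographic_access(person: Person, today: date, embargo_until: date,
                       open_after_years: int = 100) -> Access:
    """Access level for one demographic record under a hypothetical policy."""
    # Embargo: nothing leaves the research team before the embargo date.
    if today < embargo_until:
        return Access.EMBARGOED
    # Moving window: the deceased, and people born long enough ago, become open.
    if person.deceased or today.year - person.birth_year >= open_after_years:
        return Access.UNRESTRICTED
    # Everyone else stays behind the screening procedure.
    return Access.RESTRICTED


embargo_end = date(2026, 1, 1)
print(demographic_access(Person(1980, False), date(2025, 6, 1), embargo_end))  # Access.EMBARGOED
print(demographic_access(Person(1980, False), date(2027, 6, 1), embargo_end))  # Access.RESTRICTED
print(demographic_access(Person(1980, True), date(2027, 6, 1), embargo_end))   # Access.UNRESTRICTED
print(demographic_access(Person(1920, False), date(2027, 6, 1), embargo_end))  # Access.UNRESTRICTED
```

Because the window is computed from `today`, records move from restricted to unrestricted over time without anyone re-labeling them. A more detailed subset, such as disease history, could use a separate function that never returns `Access.UNRESTRICTED`.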
There are three usage models.
(1) Audit use: Accessible data can be used to verify existing results, but not to produce novel analyses without the prior agreement of the original investigator or their representative. This is necessarily a time-limited policy. It should accompany embargoed access, since reviewers need access to the data to complete peer review.
(2) Restricted use: Accessible data can be used for prearranged purposes. Within a collaboration, this usually takes the form of writing specific papers with prearranged lists of co-authors.
(3) Unrestricted use: Anyone with a copy of the data can use it however they like.
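The three usage models can also be read as rules for deciding whether a requested use is allowed. The Python sketch below encodes them under simplifying assumptions: a request is reduced to a purpose label, the hypothetical label `"verify"` stands for re-running a published analysis, and `approved_purposes` stands in for a collaboration's prearranged papers. All names and labels are illustrative.

```python
from enum import Enum


class Usage(Enum):
    AUDIT = "audit"
    RESTRICTED = "restricted"
    UNRESTRICTED = "unrestricted"


def use_permitted(model: Usage, purpose: str, approved_purposes: set[str],
                  investigator_agreed: bool = False) -> bool:
    """Decide whether a requested use of accessible data fits a usage model."""
    if model is Usage.UNRESTRICTED:
        # Anyone with a copy of the data may use it however they like.
        return True
    if model is Usage.RESTRICTED:
        # Only prearranged purposes, such as specific planned papers.
        return purpose in approved_purposes
    # Audit use: verification is always allowed; a novel analysis needs the
    # prior agreement of the original investigator or their representative.
    return purpose == "verify" or investigator_agreed


planned = {"fertility_paper"}  # hypothetical prearranged paper
print(use_permitted(Usage.AUDIT, "verify", planned))                 # True
print(use_permitted(Usage.AUDIT, "new_model", planned))              # False
print(use_permitted(Usage.RESTRICTED, "fertility_paper", planned))   # True
print(use_permitted(Usage.RESTRICTED, "new_model", planned))         # False
```

A real agreement would also record who made the request and when, since audit use is time-limited; this sketch leaves that out.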
In practice, data sharing protocols are highly standardized. Here are three standard examples.
(1) Team Data Protocol: Collaboration members have non-exclusive access to the complete data or redacted data. When a collaboration member plans an analysis, they share the plan with other collaboration members. Co-authorship is negotiated at this stage (if not earlier). All analysis data are published with the reports, or else within 2 years of publication (see embargoed access above).
(2) Open Database Protocol: As (1), but the redacted data are accessible to researchers outside the collaboration. Outside researchers cite a publication that describes the origin of the data. That requirement may be part of the access screening.
(3) Consortium Protocol: Institutions or individual researchers undergo screening or pay a fee for data access. This access does not grant the right to redistribute data, but it does grant the right to publish original analyses. Analysis data cannot be distributed. Results can nonetheless be reproduced and criticized, because other researchers can obtain access to the same data through the consortium, and the instructions for doing so are public.
Our goal at HBEC MPI-EVA Leipzig is to facilitate the broadest possible access to redacted data. This means that individual sets of data (or variables within sets of data) may have individual protocols. Apart from prearranged co-authorships, members of HBEC acquire no special use or redistribution rights as a consequence of the department's hosting or processing of raw data. Other investigators are not required to add HBEC scientists as co-authors as a condition of data access. However, we require a transparent access plan that permits scientific reuse of the data, and this plan may include requirements to cite the original researchers who gathered the data and, where necessary, to obtain new consent.