BioCADDIE (Biomedical and healthCAre Data Discovery and Indexing Ecosystem) - Supplement 2

a MiCDA Research Project Description

Investigators: George C. Alter

Funding (subcontract): National Institute of Allergy and Infectious Diseases, 2016-2017 (3 U24 AI 117966 03 S1)

NIH encourages the use of common data elements (CDEs) in research and health records to improve data quality and increase opportunities for comparison and synthetic research. The National Library of Medicine CDE Portal lists 11 CDE initiatives and 7 tools and resources. Some of these initiatives, such as PhenX and the NINDS CDEs, have identified hundreds of instruments and thousands of variables. As the research and clinical communities adopt the CDE model, the NIH Data Discovery Index (DDI) should be able to help users find data relevant to specific CDEs. Indeed, datasets that include CDEs are already abundant. Hundreds of variables associated with CDEs can be found in publicly available datasets, such as the National Health Interview Survey, the National Survey on Drug Use and Health, and Midlife Development in the United States. This supplement will take the first step towards adding CDEs to the DDI by comparing the metadata used in CDE repositories to the bioCADDIE metadata model.

Since the CDE movement is recent, there is currently no agreement across NIH on standards for describing CDEs. Some CDE efforts, such as NINDS, have developed detailed metadata standards and controlled vocabularies. Others identify only a few fields, or supplement one or two descriptive fields with links to publications or PDF documents. BioCADDIE can influence the development of CDE specifications by providing guidance on metadata standards.

To link CDEs to data, the DDI must include metadata at an appropriately granular level. Datasets such as health interviews can include hundreds or even thousands of CDEs in a single file. This item-level information must be mapped to the bioCADDIE metadata standard to be searchable in the DDI. We will examine two widely used metadata standards for data types that include CDEs (CDISC for clinical data and the Data Documentation Initiative for survey data) to determine how they can be translated to the bioCADDIE standard for the discovery of CDEs.

The key activities will be:
- Map metadata fields used by selected CDE repositories to the bioCADDIE Metadata Standard. Create examples.
- Inventory repositories and databases holding data about CDEs.
- Identify metadata items required for describing CDEs in metadata standards (CDISC, Data Documentation Initiative) used by relevant repositories.
- Evaluate the availability of data and metadata for CDEs.
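
The first activity, mapping repository fields to a target model, amounts to building a crosswalk. The following is a minimal sketch in Python of what such a crosswalk might look like. It assumes hypothetical source and target field names; none of them are taken from an actual CDE repository or from the bioCADDIE specification.

```python
# Hypothetical crosswalk: source field name -> target field name.
# All field names are illustrative placeholders, not real specifications.
CROSSWALK = {
    "question_text": "description",
    "variable_name": "name",
    "permissible_values": "values",
}


def map_record(record, crosswalk):
    """Split a source record into mapped fields and unmapped leftovers."""
    mapped = {}
    unmapped = {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped[field] = value  # no equivalent identified yet
        else:
            mapped[target] = value
    return mapped, unmapped


# Hypothetical item-level record from a survey codebook.
example = {
    "variable_name": "SMOKE_EVER",
    "question_text": "Have you ever smoked cigarettes?",
    "pdf_link": "codebook.pdf",
}
mapped, unmapped = map_record(example, CROSSWALK)
```

In a sketch like this, fields that end up in `unmapped` would flag places where a repository's metadata has no counterpart in the target model and needs review.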

This project will be conducted by staff at the Inter-university Consortium for Political and Social Research (ICPSR). ICPSR, the largest archive of social science data in the world, serves as a data repository for centers in three NIH institutes (NIA, NICHD, NIDA). ICPSR was a founder of the Data Documentation Initiative, an internationally accepted metadata standard for social and behavioral data, and the secretariat of the DDI Alliance (where DDI stands for Data Documentation Initiative) is hosted at ICPSR. Thus, ICPSR has many years of experience designing and applying metadata specifications. ICPSR has also partnered with the Grid-Enabled Measures (GEM) Database, an NCI-sponsored CDE tool, on a pilot project linking measures at GEM to data in the ICPSR repository.

Research Signature Theme:

Survey Measurement and Methods: Archiving