How Does Automated Record Linkage Affect Inferences about Population Health?
Investigators: Martha J. Bailey, Catherine Massey, Eytan Adar
Funding: National Institute on Aging, 2017-2019 (1 R21 AG 056912 01)
This project compares the performance of automated record-linking algorithms, with the goal of improving them. Automated linking methods are required to complete the NSF-funded Longitudinal Intergenerational Family Electronic Micro-dataset (LIFE-M), which will link millions of US vital records to historical decennial census records to create an extensive longitudinal dataset covering individuals born in the US from 1880 to 1930. This analysis grows out of that need.
The project will produce systematic evidence on how the most popular automated linking methods perform in terms of match rates, representativeness of the underlying population, erroneous match rates, and systematic measurement error. It will also examine how phonetic name-cleaning methods affect match quality. Importantly, the project will analyze how match-quality metrics vary across underrepresented subgroups, including women, racial/ethnic minorities, and immigrants, to determine how specific linking methods could differentially affect inferences for different populations. Finally, the project will formulate recommended practices for researchers based on the findings.
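
As a rough illustration of two of the metrics named above, the sketch below computes a match rate and an erroneous match rate separately for each subgroup. It is not the project's method. The function name `linkage_metrics`, the record fields (`group`, `linked_id`, `true_id`), and the toy data are all hypothetical, and it assumes a hand-verified "true" link is available for every record. It also assumes the erroneous match rate is defined as the share of links made that are wrong; other definitions are possible.

```python
from collections import defaultdict


def linkage_metrics(records):
    """Per-subgroup match rate and erroneous match rate.

    Each record is a dict with hypothetical keys:
      'group'     - subgroup label
      'linked_id' - ID chosen by the linking method, or None if unlinked
      'true_id'   - ID of the verified correct match (assumed known)
    """
    totals = defaultdict(lambda: {"n": 0, "linked": 0, "wrong": 0})
    for r in records:
        t = totals[r["group"]]
        t["n"] += 1
        if r["linked_id"] is not None:
            t["linked"] += 1
            if r["linked_id"] != r["true_id"]:
                t["wrong"] += 1

    metrics = {}
    for group, t in totals.items():
        metrics[group] = {
            # Share of records for which the method produced any link.
            "match_rate": t["linked"] / t["n"],
            # Share of produced links that are incorrect (assumed definition).
            "error_rate": t["wrong"] / t["linked"] if t["linked"] else None,
        }
    return metrics


# Made-up toy data, for illustration only.
records = [
    {"group": "men", "linked_id": "c1", "true_id": "c1"},
    {"group": "men", "linked_id": "c2", "true_id": "c2"},
    {"group": "men", "linked_id": None, "true_id": "c3"},
    {"group": "men", "linked_id": "c9", "true_id": "c4"},
    {"group": "women", "linked_id": "c5", "true_id": "c5"},
    {"group": "women", "linked_id": None, "true_id": "c6"},
    {"group": "women", "linked_id": None, "true_id": "c7"},
    {"group": "women", "linked_id": "c7", "true_id": "c8"},
]

metrics = linkage_metrics(records)
for group, m in sorted(metrics.items()):
    print(group, m)
```

In this toy data the two groups differ in both metrics, which is the kind of gap that could bias inferences drawn from the linked sample if it appeared in real data.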