Estimate of disease heritability using 7.4 million familial relationships inferred from electronic health records
Fernanda Polubriaginof, Rami Vanguri, Kayla Quinnies, Gillian Belbin, Alexandre Yahi, Hojjat Salmasian, Tal Lorberbaum, Victor Nwankwo, Li Li, Mark Shervey, Patricia Glowe, Iuliana Ionita-Laza, Mary Simmerling, George Hripcsak, Suzanne Bakken, David Goldstein, Krzysztof Kiryluk, Eimear Kenny, Joel Dudley, David K. Vawdrey, Nicholas P. Tatonetti

Supporting Materials

RIFTEHR (Relationship Inference from the Electronic Health Records) is a tool for mining familial relationships from the emergency contact data that patients provide during inpatient stays. The inferred relationships accurately predict genetic relatedness, with 87% to 99% positive predictive value depending on the relationship type. The preliminary manuscript describing the method, its validation, and the use of EHR-inferred relationships to estimate disease heritability is available as a preprint for the scientific community. The following is a list of code and data referenced by the manuscript. We endeavor to make as much of the data publicly available as possible while still protecting patient privacy.
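As a reminder of what the positive predictive value figures mean, PPV is the fraction of inferred relationships that are confirmed as correct. The short sketch below illustrates the arithmetic only. The function name and the counts are hypothetical and are not taken from the manuscript or from the RIFTEHR code.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Return PPV = TP / (TP + FP) for a set of inferred relationships."""
    predicted = true_positives + false_positives
    if predicted == 0:
        raise ValueError("PPV is undefined when no relationships were inferred")
    return true_positives / predicted


# Illustrative counts only (assumed, not study data): of 100 inferred
# relationships of one type, 87 are confirmed by a reference standard.
example_ppv = positive_predictive_value(true_positives=87, false_positives=13)
print(f"PPV = {example_ppv:.2f}")
```

Because PPV is computed separately for each relationship type, a range such as the one reported above reflects differing confirmation rates across types.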

Heritability measures the proportion of variation in disease risk that can be attributed to genetic factors. Observational Heritability is an estimate of heritability obtained from observational resources in which ascertainment is uncontrolled. The preliminary manuscript referenced above introduces a methodology, called SOLARStrap, for estimating Observational Heritability. Source code and data to run RIFTEHR and SOLARStrap are available on GitHub. Data files for the notebooks and the rhinitis example are also available.

Browse the high-confidence observational heritability estimates. Lower-confidence estimates are available at Mendeley Data, and additional files are in the Supplemental Data Files.

Clinical Data Release

Clinical and familial relationship data from Columbia and Cornell will be made available at the time of publication at Mendeley Data and, additionally, at Release Data. The data will be prepared according to "Section 4. Preparation of clinical data for release" in the supplemental materials. These data will cover approximately 500 traits.

Download the observational heritability estimates for the 500 traits from the table below.


Updates and News

May 17th, 2018

April 20th, 2017

July 29th, 2016

July 27th, 2016
Supporting website created and preprint manuscript deposited at bioRxiv.

June 14th, 2018
Peer-reviewed manuscript available at Cell.