Friday, April 18, 2014

Publicly Available Electronic Health Data

This post lists electronic health data sets that are available in one way or another. Some are freely available, some require fees, and some require special connections.

Data source web site
The University of Oxford maintains a web site, the Interactive Compendium of Health Datasets for Economists, that deserves a mention in this context. It links to a number of health-related datasets for health economics research. A useful search feature lets you filter the listed data sets on several different fields. For example, I found one data source containing individual-level longitudinal primary care data.

Free, publicly available
·         A recent release of health data by the US Centers for Medicare and Medicaid Services made a large splash in the mainstream media. The data does not include patient-level records, but it does provide very granular information about providers. It is split into three groups: Physician and Other Supplier, Inpatient, and Outpatient.
·         The PhysioNet challenge is an annual competition focused on computer analysis in the field of cardiology.  It has been running since 2000, and a few of the competitions have involved electronic medical records.
·         The Heritage Provider Network released some insurance claims data as part of a competition to predict which patients will be admitted to the hospital within the next year. Claims data is what hospitals report to insurance companies, and it is used almost exclusively for billing. Some studies have suggested that it is, in some respects, a poor source for identifying and tracking patient disease. Hospitals certainly have strong financial incentives to shape the picture presented in claims data, short of committing fraud.
·         The Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) database is a collection of data from studies of amyotrophic lateral sclerosis. Clinical trial data is generally more extensive, more complete, and more accurate than typical electronic medical record data. However, patients on trials are closely monitored, and they have to volunteer to join the trial. As a result, trial patients may differ from the general patient population both in how likely they are to stop taking their drugs and in their demographics more broadly.
·         The Agency for Healthcare Research and Quality (AHRQ) has made available a number of data sources associated with its Healthcare Cost and Utilization Project. These include limited information about a large collection of hospital discharges.
·         Every year I2B2 hosts a competition built around natural language processing of electronic health records. This year there are two challenges: one focused on de-identification and another on identifying risk factors for heart disease. You need to register before the contest begins to get access to the data, and you have to agree to the contest rules.

Connections required
·         If you have or can find a research collaborator in Canada, note that the Canadian Institute for Health Information makes most of the hospitalization data from Canadian hospitals available.

Fees required
·         A plan to share British national health data broadly has been put on temporary hold. However, the British National Institute for Health Research does make at least some British health system data available under the name Clinical Practice Research Datalink. I am told that access fees are around $100K/year, but I could not find pricing information online.
·         I examined the New Zealand National Minimum Dataset in a previous article. I have since learned that it is available for a fee based on the number of hours required to pull the data (around $70/hour).

If I find out about any more, I will post them.
