This blog entry lists electronic health datasets that are available in one way or another. Some are freely available, some require fees, and some require special connections.
Data source web site
The Interactive Compendium of Health Datasets for Economists, a web site maintained by the University of Oxford, deserves mention in this context. It links to a number of health-related datasets intended for health economics research, and its search feature lets you filter the catalogued datasets on a number of different fields. For example, I found one data source containing longitudinal primary care data at the level of the individual patient.
Free, publicly available
· A recent release of health data by the US Centers for Medicare and Medicaid Services made a large splash in the mainstream media. The data does not include patient-level records, but it does provide very granular information about providers. It is split into three groups: Physician and Other Supplier, Inpatient, and Outpatient.
· The PhysioNet challenge is an annual competition focused on computer analysis in the field of cardiology. It has been running since 2000, and a few of the competitions have involved electronic medical records.
· The Heritage Provider Network released some insurance claims data as part of a competition to predict which patients will be admitted to the hospital within the next year. Claims data is what hospitals report to insurance companies, and it is used almost exclusively for billing. Some studies have suggested that it is inferior in some respects for identifying and tracking patient disease. Hospitals certainly have strong financial incentives to shape the picture presented in claims data, as long as they stop short of fraud.
· The Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) database is a collection of data from studies of amyotrophic lateral sclerosis (ALS). Clinical trials data is generally more extensive, more complete, and more accurate than typical electronic medical records data. However, patients on trials are closely monitored, and they have to volunteer to join. As a result, trial patients differ from the general patient population, both in how likely they are to stop taking their drugs and in their demographics more broadly.
· The Agency for Healthcare Research and Quality (AHRQ) has made available a number of data sources associated with its Healthcare Cost and Utilization Project. These include limited information about a large collection of hospital discharges.
· Every year, i2b2 hosts a competition built around natural language processing of electronic health records. This year there are two challenges: one focused on de-identification and another focused on identifying risk factors for heart disease. To get access to the data, you need to register before the contest begins and agree to the contest rules.
Connections required
· If you have, or can find, a research collaborator in Canada, the Canadian Institute for Health Information makes most of the hospitalization data from Canadian hospitals available.
Fees required
· A plan to share British national health data broadly has been put on temporary hold. However, the British National Institute for Health Research does make at least some British health system data available under the name Clinical Practice Research Datalink. I am told that access fees for this data are around $100K/year, but I could not find pricing information online.
· I examined the New Zealand National Minimum Dataset in a previous article. I have since learned that it is available for a fee based on the number of hours required to pull the data (around $70/hour).
If I find out about any more datasets, I will post them.