Thursday, April 24, 2014

A rapidly changing landscape is leading to uncertainty and opportunities throughout healthcare

Huge volumes of data about patient health from electronic medical records (EMR), high-throughput molecular data, insurance claims, the “quantified self” movement, and social media are rapidly becoming available.  At the same time, changes in financial incentives such as the utilization of healthcare exchanges, the creation of ACOs (Accountable Care Organizations) and the growth of clinical research networks are driving changes in business models that will have far-reaching consequences.  Currently there is a gap between the huge quantities of health data and the discovery/validation of new approaches to managing the health of patients and patient populations.  There is a tremendous opportunity to develop new statistical methodologies that pull information out of the data and use it to improve the efficiency and effectiveness of healthcare delivery.

Quality improvement by hospital systems.  One of the challenges facing physicians today is deciding which “standard of care” to follow.  In many cases there are numerous therapeutic options for a patient, all of which are acceptable.  Published studies addressing these choices are often sparse, so decisions are commonly made based on marketing materials provided by the pharmaceutical companies themselves.  In addition, in a “fee-for-service” environment there is a perverse financial incentive to choose the most expensive therapeutic.  However, for hospital systems that absorb some of the expense when patients do not respond well to treatment, such as ACOs, the incentives are quite different.  Even for traditional “fee-for-service” institutions, new federal regulations and “meaningful use” criteria are driving a need to identify and impose optimal care.  How should “optimal care” be defined?  How do health systems use patients’ health records to identify the treatment decisions that lead to optimal care?  How can healthcare systems design trials, run from the EMR or other automated data sources, to confirm or refute the hypotheses generated by retrospective analyses?
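To make this concrete, here is a minimal sketch of the kind of retrospective EMR analysis that could generate such hypotheses.  Everything in it is a hypothetical placeholder: the simulated cohort, the column names (treatment, readmit_30d, age, charlson), and the outcome.  A real analysis would have to confront confounding by indication far more seriously (propensity scores, sensitivity analyses, and so on), since treatment was not randomized.

```python
# A hedged sketch: compare two acceptable therapies on a simulated EMR cohort,
# adjusting for a couple of observed confounders. All names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
emr = pd.DataFrame({
    "treatment": rng.choice(["drug_a", "drug_b"], size=n),
    "age":       rng.integers(40, 90, size=n),
    "charlson":  rng.integers(0, 6, size=n),   # comorbidity score
})
# Simulate a 30-day readmission outcome driven by age and comorbidity,
# not by the treatment itself.
p = 1 / (1 + np.exp(-(-5 + 0.04 * emr["age"] + 0.3 * emr["charlson"])))
emr["readmit_30d"] = rng.binomial(1, p)

# Adjusted comparison of the two therapies; a near-zero treatment coefficient
# is the (correct) retrospective signal that the two drugs are equivalent here.
fit = smf.logit("readmit_30d ~ C(treatment) + age + charlson", data=emr).fit()
print(fit.params)
```

A hypothesis that survives this kind of retrospective screen is exactly the sort of thing a health system could then test prospectively in a trial run from the EMR.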

Example.  Our modern healthcare system is fragmented.  Each provider tends to follow only the patient outcomes tied to the diseases for which that provider is responsible.  A cardiologist may prescribe a statin for high cholesterol, but if the patient taking that statin gets muscle aches, they are more likely to go to their family practitioner; the physician who originally prescribed the medication might never even find out about the side effects!  If there is institutional motivation, the health record can be used to track and measure overall health.  The proxy for “overall health” in this scenario may very well be defined as lower utilization of hospital resources; in a perfect world, patients would agree that this is a good proxy.
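As a toy illustration of that proxy, overall utilization can be computed directly from an encounters table, assuming the institution can pool visits across every department.  The table and its columns below are invented for the example.

```python
# A minimal sketch: per-patient hospital utilization as a crude proxy for
# "overall health." The encounters table and its columns are hypothetical.
import pandas as pd

encounters = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 2, 3],
    "department": ["cardiology", "family_practice", "cardiology",
                   "emergency", "emergency", "family_practice"],
    "cost":       [850.0, 120.0, 900.0, 2400.0, 1800.0, 95.0],
})

# One number per patient, regardless of which provider generated the visit;
# this is the cross-silo view a single specialist never sees.
utilization = encounters.groupby("patient_id").agg(
    visits=("department", "size"),
    total_cost=("cost", "sum"),
)
print(utilization.sort_values("total_cost", ascending=False))
```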

Recruitment for clinical studies.  Typical large trials are run at many different clinical sites in order to ensure the accrual of enough patients for the study.  In this setting there are often numerous sites that fail to recruit even a single patient.  The availability of electronic health records creates the opportunity to directly identify the right patients for a new trial and to target recruitment efforts.  This can simultaneously cut down on trial startup expenses and boost recruitment rates.  Networks of hospital systems are already building this capability and will have tremendous advantages when competing to run certain types of clinical studies. However, electronic health records are inherently messy and incomplete.  What is the best way to cut through the noise and identify the right patients?  How early in the course of disease can patient populations be identified?
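For a flavor of what EHR-based screening might look like, here is a hedged sketch of a rule-based eligibility filter.  The criteria, code list, and column names are invented for illustration; real phenotyping algorithms are far more involved and must be validated against chart review.

```python
# A minimal sketch: flag candidates for a hypothetical type 2 diabetes trial
# from messy, incomplete EHR data. All fields and criteria are invented.
import pandas as pd

patients = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "age":        [67, 45, 72, None],               # EHR fields are often missing
    "icd9_codes": [["250.00", "401.9"], ["401.9"], ["250.00"], ["250.00"]],
    "last_hba1c": [8.4, None, 7.1, 9.0],
})

def eligible(row):
    """Apply the (hypothetical) criteria, treating missing data as ineligible."""
    has_dx = "250.00" in row["icd9_codes"]          # diabetes diagnosis code
    age_ok = pd.notna(row["age"]) and 50 <= row["age"] <= 80
    lab_ok = pd.notna(row["last_hba1c"]) and row["last_hba1c"] >= 7.5
    return has_dx and age_ok and lab_ok

candidates = patients[patients.apply(eligible, axis=1)]
print(candidates["patient_id"].tolist())            # -> [1]
```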

Example.  PCORnet is a group of hospital systems that have obtained federal funding to develop an automated system for pooling and sharing the health data of individual patients.  It is designed to automate many of the steps involved in conducting clinical trials.  If you are a fan of NPR, Diane Rehm devoted a show to this concept (and PCORnet specifically); you can listen to it here.

A separate, innovative approach to patient recruitment has been developed through the participation of the patients themselves.  Last year a social networking web site, Patients Like Me, and a clinical research organization, inVentiv Health, formed a partnership to advertise clinical trial recruitment directly to patients.

Preventive medicine.  A systematic approach to preventive care will be important for healthcare systems that are trying to minimize the future disease burdens of their patient populations.  Historical health data, high-throughput molecular data, information from social media, data from “quantified self” devices, and even purchasing data from credit cards can all offer insight into the current and future health of patients.  Which patients within the health system are most susceptible to future disease?  What sources of data are best able to identify those patients?  What interventions are best able to prevent bad outcomes in the long term?  Integrating all of the relevant sources of information (and filtering out the irrelevant ones) in order to build disease-specific models of risk will be critical to identifying patients who are appropriate for preventive medicine efforts.
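One simple way to frame the filtering problem is as penalized regression over features pooled from all of the candidate sources.  The sketch below runs on simulated data because every feature is a hypothetical placeholder; the point is the shape of the pipeline, not these particular variables.

```python
# A hedged sketch: an L1-penalized risk model over features pooled from
# several data sources. All features are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([
    rng.normal(size=n),       # EMR: e.g., baseline blood pressure
    rng.normal(size=n),       # molecular: e.g., a polygenic risk score
    rng.poisson(3, size=n),   # quantified self: e.g., weekly exercise sessions
    rng.poisson(1, size=n),   # purchasing: e.g., tobacco purchases per month
])
# Simulate a future-disease outcome driven by the first and last features only.
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 3] - 1))))

# The L1 penalty shrinks uninformative sources toward zero: one crude way to
# "filter out the irrelevant sources."
model = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y)
print(model.coef_)
```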

Example. Consider the announcement from CVS that they will stop selling cigarettes in order to better position themselves as a healthcare delivery company.  As they begin to provide healthcare services they will accrue health data on their customers, which can presumably (barring legal restrictions) be tied to other purchases.  Purchases of candy bars, shampoo and razors can easily become part of your electronic health record.  If one of the first signs of dementia is neglect of personal hygiene, CVS may be the first to know when grandma is developing Alzheimer’s disease!  CVS is not alone in this new business model; Walmart, Target and Walgreens all have clinics in at least a subset of their stores.

Precision medicine.  Until now, clinical research has favored a “one size fits all” approach to the development of novel therapeutics.  This is driven by a desire to maximize the market share of any new drug; if the drug can only be given to the patient sub-population that passes a companion diagnostic test, then the drug has a smaller market.  However, the cost of development is increasing exponentially and the chance of eventual FDA approval is dropping.  Acceptance of a smaller market share in exchange for an improved chance of FDA approval (and possibly higher market penetration) is driving an increasing willingness in the pharmaceutical industry to develop drugs with companion diagnostics.  Companion diagnostics are often based on high-throughput molecular data such as DNA mutations, RNA expression, metabolomics and proteomics.  What is the best way to integrate high-throughput molecular data with clinical data to identify the optimal subpopulation for a new therapeutic?  Can we make the case for a new therapeutic within the context of the new financial and regulatory incentives faced by healthcare systems?
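A common statistical framing of the subpopulation question is a treatment-by-biomarker interaction test.  Below is a minimal sketch on simulated trial data; the variable names, effect sizes, and design are all hypothetical.

```python
# A hedged sketch: test whether a drug only works in biomarker-positive
# patients. All data are simulated and all names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 400
trial = pd.DataFrame({
    "treated":   rng.integers(0, 2, size=n),
    "biomarker": rng.integers(0, 2, size=n),   # e.g., carries the target mutation
})
# Simulate a response that occurs only when treatment meets the biomarker.
p = 0.2 + 0.4 * trial["treated"] * trial["biomarker"]
trial["response"] = rng.binomial(1, p)

# A significant interaction term is the statistical case for pairing the drug
# with a companion diagnostic that selects biomarker-positive patients.
fit = smf.logit("response ~ treated * biomarker", data=trial).fit()
print(fit.summary())
```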

Example.  The FDA lists 9 different drugs and 19 different approved drug-companion diagnostic combinations.  However, it lists 154 drug-gene pairs for which particular versions of the gene lead to potential adverse events, some of them serious.  For example, some people have a variant in a gene called CYP2D6 that causes codeine to be metabolized into morphine very quickly.  In children, that process can produce lethal doses of morphine.  Unfortunately, identifying genetic variants that lead to serious adverse events does not automatically lead to a requirement that the gene be tested before the drug is given.  It will be up to providers to decide what is best for their patients, and up to payers to decide which tests will be reimbursed.
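As a toy illustration of how such drug-gene pairs could be put to work at the point of prescribing, here is a sketch of a lookup-based alert.  The table holds a single invented entry; a real system would source its pairs from the FDA tables mentioned above.

```python
# A minimal sketch of a point-of-prescribing pharmacogenomic check.
# The single table entry below is illustrative, not authoritative.
DRUG_GENE_ALERTS = {
    ("codeine", "CYP2D6", "ultrarapid_metabolizer"):
        "Rapid conversion to morphine; avoid in children.",
}

def check_prescription(drug, patient_genotypes):
    """Return any alerts triggered by the patient's known gene variants."""
    return [
        message
        for (alert_drug, gene, variant), message in DRUG_GENE_ALERTS.items()
        if alert_drug == drug and patient_genotypes.get(gene) == variant
    ]

print(check_prescription("codeine", {"CYP2D6": "ultrarapid_metabolizer"}))
```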


I have discussed only a few places where federal regulation, changing incentives and “big” data are coming together to transform healthcare as an industry.  Taken together, however, these forces constitute large shifts in business models, with the potential to leave companies that stick to old approaches in the dust.  It is impossible to know where healthcare in America is going, but it is clearly going somewhere.

Friday, April 18, 2014

Publicly Available Electronic Health Data

This entry is a list of electronic health data sets that are available in one way or another: some are freely available, some require fees, and some require special connections.

Data source web site.  There is an Interactive Compendium of Health Datasets for Economists maintained by the University of Oxford that should be mentioned in this context.  It provides links to a number of health-related datasets for the purposes of health economics research.  A nice search feature allows the known data sets to be filtered on a number of different fields.  For example, I found one data source containing longitudinal primary care data at the level of the individual.

Free, publicly available
·         A recent release of health data by the US Centers for Medicare and Medicaid Services made a large splash in the mainstream media.  That data does not provide patient-level records, but it does represent very granular information about providers.  The data is split into three groups: Physician and Other Supplier, Inpatient, and Outpatient.
·         The PhysioNet challenge is an annual competition focused on computer analysis in the field of cardiology.  It has been running since 2000, and a few of the competitions have involved electronic medical records.
·         The Heritage Provider Network released some insurance claims data as part of a competition to predict which patients will be admitted to the hospital within the next year.  Claims data is what hospitals report to insurance companies and is used almost exclusively for billing.  Some studies have suggested that it is inferior in some ways for identifying and tracking patient disease.  There are certainly strong financial incentives for hospitals to distort the picture presented in claims data, so long as they stop short of fraud.
·         The Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) database is a collection of data from studies of amyotrophic lateral sclerosis.  Generally, clinical trials data is more extensive, more complete and more accurate than typical electronic medical records data.  However, trial patients are closely monitored and must volunteer to participate, so they differ from the general patient population both in how likely they are to stop taking their drugs and in broader demographic characteristics.
·         The Agency for Healthcare Research and Quality (AHRQ) has made available a number of data sources associated with its Healthcare Cost and Utilization Project.  These data sources include  limited information about a large collection of hospital discharges.
·         Every year i2b2 hosts a competition designed around natural language processing of electronic health records.  This year there are two challenges: one focused on de-identification and another on identifying risk factors for heart disease.  You need to register before the contest begins in order to get access to the data, and you have to agree to the contest rules.

Connections required
·         If you have or can find a research collaborator in Canada, the Canadian Institute for Health Information makes available most of the hospitalization data from Canadian hospitals.

Fees required
·         A plan to share the British national health data broadly has been put on temporary hold.  However, the British National Institute for Health Research does make at least some of the British health system data available under the name Clinical Practice Research Datalink.  I am told that fees for access to this data are around $100K/year, but I could not find pricing information online.
·         I examined the New Zealand National Minimum Dataset in a previous article.  I have since found out that it is available for a fee that is determined based on the hours required to pull the data (priced at around $70/hour).

If I find out about any more, I will post them.