November 3rd, 2014

A “Big Data” Approach to Phenotype-Based Clustering of Heart Failure Patients

CardioExchange’s Harlan M. Krumholz and John Ryan interview Tariq Ahmad about his research group’s study of clinical phenotypes in patients with chronic systolic heart failure. The article, published in JACC, includes the complete list of assessed variables.

Krumholz and Ryan: Please summarize your findings for our readers.

Ahmad: We performed a cluster analysis of 45 clinical variables from roughly 1600 patients with systolic heart failure who, in the HF-ACTION clinical trial, had been randomly assigned to exercise training or usual care. Using this approach, we identified four phenotypically distinct, clinically meaningful groups of patients:

  • Cluster 1: predominantly elderly Caucasian men with ischemic cardiomyopathy and a high burden of comorbidities, advanced disease, and the highest mortality rate;
  • Cluster 2: the youngest patients, largely African Americans, with nonischemic cardiomyopathy and milder disease overall, but with high hospitalization rates despite lower mortality;
  • Cluster 3: patients with ischemic cardiomyopathy and severe angina symptoms;
  • Cluster 4: primarily Caucasian patients with nonischemic cardiomyopathy and milder disease.

The groups differed in their risk for clinical outcomes and their response to exercise therapy. For example, Clusters 2 and 3 had significant improvement in peak VO2 with exercise, whereas the other clusters did not. These findings highlight significant heterogeneity within the syndrome we call heart failure, as well as the potential for “big data” approaches to improve phenotyping of the syndrome.

Krumholz and Ryan: Was the cluster-analysis software something you had been trained in before? Most of us think of cluster analysis in terms of genetic studies. How difficult was it to perform this in a clinical analysis?

Ahmad: I had not been trained in this approach, but I had experience with it because of my background in “omics” research, where the approach is often used to make sense of large amounts of biological data. I was just completing a project on metabolomics of heart failure when this idea occurred to me. I consulted Dr. Michael Pencina at the Duke Clinical Research Institute, and luckily he had used this approach on clinical data from the Framingham study. He and Dr. Philip Schulte, both statisticians, worked with me on this project, using clustering procedures from SAS/STAT software (PROC CLUSTER). They had extensive training in SAS and were able to implement the programming on clinical variables.

Krumholz and Ryan: Please explain what the software does.

Ahmad: This was an agglomerative, hierarchical clustering of patients. Agglomeration is a “bottom up” approach, which means we start with each patient as his or her own singleton cluster and iteratively combine clusters until all patients belong to one large cluster. The process is hierarchical in that when two clusters are combined, they are never separated or rearranged in later iterations. At any particular iteration in the process, we define a distance between every pair of clusters (using Ward’s minimum variance method), and the pair with the smallest distance is merged to form a new cluster. The iterative process can be stopped according to various criteria, before heterogeneous clusters are merged. The bottom line (in non-mathematical terms) is that cluster analysis groups sets of objects (in this case, patients) in such a way that those in the same group are more similar to one another than they are to those in other groups.
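The procedure Dr. Ahmad describes can be sketched in a few lines. The study itself used SAS/STAT (PROC CLUSTER); the Python/SciPy version below is only an illustrative stand-in, and the synthetic “patients,” the five standardized variables, and the choice of four clusters are assumptions for demonstration, not HF-ACTION data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Synthetic stand-in for a patient-by-variable matrix: 200 "patients"
# described by 5 standardized clinical variables, drawn around 4
# hidden group centers.
centers = rng.normal(scale=3.0, size=(4, 5))
X = np.vstack([c + rng.normal(size=(50, 5)) for c in centers])

# Agglomerative hierarchical clustering with Ward's minimum-variance
# linkage: at each step, merge the pair of clusters whose union gives
# the smallest increase in total within-cluster variance.
Z = linkage(X, method="ward")

# Stop the agglomeration by cutting the tree where it yields 4 clusters,
# before heterogeneous groups would be merged.
labels = fcluster(Z, t=4, criterion="maxclust")
print(np.bincount(labels)[1:])  # number of patients in each cluster
```

In practice one would profile each resulting cluster against the original clinical variables (as the study does with comorbidities, etiology, and outcomes) to judge whether the groups are clinically meaningful.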

Krumholz and Ryan: Cluster 3 patients had a lower mortality risk but perhaps a higher risk for hospitalization, compared with other clusters. Did this surprise you, and what conclusions can you draw from this observation?

Ahmad: This was a very interesting observation. Cluster 3 fit the profile of patients we commonly see in practice — those with ischemic cardiomyopathy and profound angina who have frequent admissions for chest pain and get treated for “acute coronary syndromes.” We found that these patients, even though their overall mortality risk might have been intermediate, had a terrible quality of life and a profound degree of angina. Indeed, if these realities were driving their frequent admissions, a focused approach on improving symptoms might keep them out of the hospital and improve their overall quality of life. The disease process in these patients may differ greatly from that of patients with nonischemic cardiomyopathy or those with multiple comorbidities, such as COPD and renal failure. Nonetheless, we currently apply similar interventions across the board. In the future, perhaps subcategorizing heart failure patients according to data-driven phenotypes rather than current subjective classification systems, and modifying or developing therapies accordingly, might improve patients’ quality of life and clinical outcomes.

Krumholz and Ryan: How do you apply your cluster groups to decision making in clinical practice? What about validation — are your findings replicable?

Ahmad: Dr. Krumholz ably described the clinical implications of this approach in his extremely well-written article on big data in Health Affairs earlier this year. He wrote:

… researchers can use approaches that are designed to reveal clusters of patient groups that might suggest new taxonomies of disease based on how similar they are according to a broad range of characteristics, including outcomes. It may be, for instance, that based on biological, clinical, behavioral, and outcomes data there are many more types of diabetes than previously appreciated. The empirical classification could be shown to have value in selecting treatment strategies and predicting outcomes. This knowledge can be useful even in advance of understanding the underlying mechanisms of disease and response to therapy.

Clinicians have known for some time that heart failure is not a single disease but, rather, a syndrome that comprises several disease subtypes. Historically, we have classified heart failure according to measures (e.g., LVEF and NYHA class) that do not adequately capture its phenotypic variation. However, now that we have access to large amounts of patient data, as well as the computational capability to analyze those data, I suspect we will use similar approaches to redefine the syndrome in a way that is closer to “the truth.” Clusters of heart failure are likely to vary according to the patient population being examined. For example, Yale might have different clusters of patients than Duke, and a risk-factor score or a therapy developed from a large trial might apply differently to those two patient groups. Currently, we do not take this possibility into account when caring for patients. However, this challenge is not too different from what companies such as Target, Netflix, or Amazon face when they consider user- and location-specific information to tailor their interactions with individual customers. Perhaps, in the near future, we will be able to do the same when we care for patients with heart failure.


Share your reactions to Dr. Ahmad’s data-driven analysis of clinical clusters of heart failure patients.

One Response to “A ‘Big Data’ Approach to Phenotype-Based Clustering of Heart Failure Patients”

  1. Moamin Alassouli, M.B.B.CH says:

    Great study. I have always thought of conducting such a study, but practicing medicine in a developing country with limited resources and many restrictions has made it difficult.