August 25th, 2011

“Numbers Traps” in Clinical Practice

As we make clinical decisions every day, we assess probabilities in a subjective fashion. And in doing so, we tend to fall into very predictable traps — traps we can get better at avoiding if we learn about how they ensnare us. That requires familiarizing ourselves with a bit of history.

Several decades ago Casscells and colleagues published the results of an interesting experiment (N Engl J Med 1978; 299:999). They asked 60 Harvard medical students, residents, and attending physicians the following question: “If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming that you know nothing about the person’s symptoms or signs?”

Interestingly, only 11 of 60 (18%) participants gave the correct answer. What is your answer? If you guessed 95%, you are in good company — but wrong. Twenty-seven of the 60 (45%) gave that incorrect response. The correct answer is actually 2%.

How can we explain such poor performance on a question that seems straightforward and very clinically relevant? Cognitive psychologists would say the high error rate was because the question expressed the false-positive rate as a percentage rather than as a natural frequency. Gigerenzer, Cosmides and Tooby, and others say that the mind has evolved to understand natural frequencies and that our intuition frequently fails when probabilities are presented in other formats such as percentages (Med Decis Making 1996; 16:273; Cognition 1996; 58:1).

To test this theory, Cosmides and Tooby performed a second set of experiments. First, they posed Casscells’ original question to a group of Stanford undergraduates and found that this group fell into the same trap as the Harvard group: only 3 of 25 (12%) gave the correct answer. Then they rephrased the question using natural frequencies: “One out of every 1000 Americans has disease x. A test has been developed to detect when a person has disease x. Every time the test is given to a person who has the disease, the test comes out positive. But sometimes the test also comes out positive when it is given to a person who is completely healthy. Specifically, out of every 1000 people who are perfectly healthy, 50 of them test positive for the disease…”

When the question was asked using natural frequencies, 19 of 25 participants (76%) got the correct answer. These and other experiments demonstrate that our intuition can be set up to succeed or to fail by how questions are framed. Simply changing the way numbers are formatted can have a dramatic effect on how well we reason.

I suspect that many subjects in the original experiment did not see a quick answer, so they guessed that the true-positive rate was simply the complement of the 5% false-positive rate. The percentage format encourages this error because it causes people to lose track of what the denominator, or reference class, represents. Probabilities presented as natural frequencies force people to recognize that the false-positive rate and the true-positive rate have different denominators and are not complementary probabilities. It then becomes more obvious that one has to create either a 2×2 table or a branching algorithm to determine the true-positive rate from the prevalence of disease and the false-positive rate.
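
To make that concrete, here is a minimal sketch in Python of that 2×2 branching logic, using the numbers from the Casscells question. The function is my own illustration, not from any of the cited papers, and it assumes a perfectly sensitive test, as Cosmides and Tooby’s natural-frequency restatement does:

```python
# A minimal sketch of the 2x2 / branching logic behind the Casscells question.
# Assumes (as Cosmides and Tooby's restatement does) a perfectly sensitive test.

def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test), via Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Casscells: prevalence 1/1000, false-positive rate 5%, sensitivity assumed 100%
ppv = positive_predictive_value(prevalence=0.001, sensitivity=1.0,
                                false_positive_rate=0.05)
print(f"P(disease | positive) = {ppv:.3f}")  # 0.020, i.e., the 2% answer

# In natural frequencies: of 1000 people, 1 has the disease and tests positive,
# while about 50 of the 999 healthy people also test positive. So only 1 of
# roughly 51 positives is a true positive -- about 2%.
```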

There are other ways our intuition can be fooled by how numbers are framed. For example, expressing treatment effects as relative risk reduction rather than absolute risk reduction or number needed to treat can exaggerate a treatment effect. That type of exaggeration can be used as a sales gimmick by those who are trying to push a product. Any shopper who has seen products priced at $9.99 rather than $10.00 knows that our intuition can be tricked by how numbers are presented.
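
A quick worked example shows how large that framing effect can be. Suppose, hypothetically, that a treatment lowers event risk from 2% to 1%: the relative risk reduction is an impressive-sounding 50%, but the absolute risk reduction is only 1 percentage point, for a number needed to treat of 100. A minimal sketch (all numbers hypothetical):

```python
# Hypothetical numbers showing how relative and absolute framings diverge.

def effect_measures(baseline_risk, treated_risk):
    arr = baseline_risk - treated_risk   # absolute risk reduction
    rrr = arr / baseline_risk            # relative risk reduction
    nnt = 1 / arr                        # number needed to treat
    return rrr, arr, nnt

rrr, arr, nnt = effect_measures(baseline_risk=0.02, treated_risk=0.01)
print(f"RRR = {rrr:.0%}, ARR = {arr:.1%}, NNT = {nnt:.0f}")
# RRR = 50%, ARR = 1.0%, NNT = 100

# The same 50% RRR in a higher-risk population yields a very different NNT:
rrr, arr, nnt = effect_measures(baseline_risk=0.20, treated_risk=0.10)
print(f"RRR = {rrr:.0%}, ARR = {arr:.1%}, NNT = {nnt:.0f}")
# RRR = 50%, ARR = 10.0%, NNT = 10
```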

The Casscells experiment also illustrates a fallacy known as base-rate neglect, which I will discuss in the next post in this blog series. For now, here’s a brief probability problem drawn in part from Reichlin et al. (N Engl J Med 2009; 361:858). See if you can solve it. Choose your answer, and then discuss your thinking about the question, and about probability fallacies in general, in the comments. But please don’t give away the answer to other readers.

An emergency department decides to perform serum troponin testing on all patients with any type of chest complaint. The physicians suspect that the prevalence of documented myocardial infarction in this subgroup is only about 1%, but they are determined not to miss a single MI. They choose a high-sensitivity troponin assay with a sensitivity of 95% and a specificity of 80%. For one of these patients with a positive troponin, what are the odds of having an MI?


11 Responses to ““Numbers Traps” in Clinical Practice”

  1. David Powell, MD, FACC says:

    So… all med students are now familiar with Bayes’ theorem. Nevertheless, intuitive misunderstanding of it is pervasive. In the above “problem,” one must wonder how the stated troponin sensitivity and specificity were initially derived: probably from a group that excluded obvious noncardiac causes (like reproducible musculoskeletal pain). Hence, the appropriate specificity to apply in the problem is likely lower, and the positive predictive value lower, than that calculated by Bayes. This issue comes up whenever the pretest likelihood changes. The optimal operating point on the ROC curve changes with prevalence: the optimal troponin cutoff will rise as the pretest likelihood falls.
    I think the most striking “number traps” can be sprung on patients, in how a benefit is portrayed. For example, for a prophylactic ICD: “If you don’t get this, you will have a 40% higher risk of sudden death.” Or: “I need to put in 11 of these ICDs to save one life over 5 years.” What’s more convincing to a patient? How about: “Your risk of sudden death goes from 7% to 5% per year”?

  2. John E Brush, MD says:

    The values given for troponin sensitivity and specificity are those for the Roche high-sensitivity troponin T assay, as reported in the referenced paper by Reichlin et al.

    You make a very good point about how we should present numbers to patients. When I explain risks to patients – for example the operative risk for a valve replacement operation – I use natural frequencies and try to phrase the explanation something like this: “If I had 100 patients like you, 95 patients would survive the operation and 5 would not.”

  3. David Powell, MD, FACC says:

    Those referenced values were derived from patients with “symptoms suggestive of AMI,” so they really cannot be applied to the hypothetical problem presented.

    There are also principles of behavioral economics at play when physicians frame decision-making. Patients are most likely to act if negative consequences of inaction are presented rather than positive consequences of action.

  4. George Ritter, BSci, MD, FACC, FACP says:

    Interesting

    Competing interests pertaining specifically to this post, comment, or both:
    None

  5. George Ritter, BSci, MD, FACC, FACP says:

    The physician ALWAYS has a bias. Being totally neutral is quite impossible. Patients respond not only to the words but also to the nonverbal cues of the physician and other medical personnel. The prescribing doctor MUST give recommendations in the simplest language; 99% of patients do not understand statistics like percentages. For an ICD, the doctor has to simply say, “I recommend an ICD,” period. The patient may ask for details, but truly full disclosure is quite impossible. Many non-cardiologists do not understand the implications of an ICD, let alone a lay person. If the patient appears uncertain or unclear, I always recommend that they get a second opinion. If things go well, you are a hero; if they go badly, you are a bum. Accept it; that is the care of the sick.

    Competing interests pertaining specifically to this post, comment, or both:
    None

  6. I do not think that the patient WANTS a neutral physician. He or she wants a physician who has his or her best interests in mind. We are treating one single human being at a time, not a member of a population for which guidelines exist.
    The doctor being a human being, it is obvious that at some point his feeling of what is good for the patient will be influenced by many different factors: not only conflicts of interest, but also education, religion, past history, and so on. Obviously, the only unacceptable one is the first, which is unfortunately also the most widespread and powerful…
    In my experience, it is definitely worthwhile to present the figures in different ways. The caricature is the difference between relative and absolute risk reduction. But the age and condition of the patient also matter: stating that taking this pill for ten years carries a 95% likelihood of having no effect and a 5% probability of avoiding a major cardiovascular event does not have the same impact for a 40-year-old as for an 80-year-old.
    And in addition, we have to be informed about what the “major CVEs” in question are, as much too often (as shown recently in the SHARP trial) these CVE reductions are driven by revascularisation.
    And I do not agree that you must choose between being a hero and a bum. In my experience after more than 30 years of practice: if the contact with the patient is based on trust; if the doctor presents his view honestly, giving his or her recommendation but accepting that the patient might decide otherwise, even supporting that decision because it is the patient’s own; then, whatever the results, the trusting relationship can go on. Sometimes it even has to come down to the simple statement: “I do not know.” Not easy to say, but very powerful… It means at least you are not a god!

    Competing interests pertaining specifically to this post, comment, or both:
    None, except the patient’s best interest

  7. Dan Hackam, MD PhD says:

    “They asked 60 Harvard medical students, residents, and attending physicians the following question: “If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming that you know nothing about the person’s symptoms or signs?””

    I think what you are looking at here is the positive predictive value – the likelihood that the person has the disease or event when the test is positive. The PPV is highly dependent on the prevalence of disease. This is why I like the likelihood ratio (LR) so much – it is portable and does not depend on disease prevalence.

    The other example given – relative risk reduction versus absolute risk reduction – is quite telling. RRR is relatively invariant across clinical situations, whereas ARR is highly dependent on baseline risk (ARR equals the RRR multiplied by the baseline absolute risk). Thus, again, I prefer RRR to ARR when trying to calculate the NNT for the patient in front of me (as you know, most trials enroll only the bottom 5-10% of risk).

    Thanks for a great piece, John!

    Competing interests pertaining specifically to this post, comment, or both:
    None

  8. The posttest probability is derived from the formula:
    posttest probability = (pretest probability × Sn) / [pretest probability × Sn + (1 − pretest probability) × (1 − Sp)]
    which equates to:
    (0.01 × 0.95) / [(0.01 × 0.95) + (0.99 × 0.20)] ≈ 0.046.
    The posttest probability of having an MI is about 0.046, or roughly 5%.
    Stated as odds, that is about 1 to 21 in favor of having an MI (about 21 to 1 against).

    Thus, even a positive troponin test in a patient with a very low pretest likelihood of MI (1%) does not mean that the patient is having an MI.
    Conversely, if the pretest likelihood were 80%, a negative troponin test would not rule out an MI (posttest probability of 20%).

    Bottom line, the performance of any diagnostic test will be improved if interpreted in the proper clinical context (pretest likelihood).

    For fans of the LR (which is independent of prevalence), the positive LR is 4.75 (moderately strong) and the negative LR is 0.06 (clinically very helpful). The AUC (another prevalence-independent metric) is 0.95 (highly discriminating).
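
    For anyone who wants to verify these numbers, here is a minimal sketch in Python (the variable names are my own; the inputs are the values given in the post):

    ```python
    # Posttest probability and likelihood ratios for the troponin question:
    # prevalence 1%, sensitivity 95%, specificity 80%.

    prevalence, sensitivity, specificity = 0.01, 0.95, 0.80

    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    posttest_prob = true_pos / (true_pos + false_pos)
    print(f"P(MI | positive troponin) = {posttest_prob:.3f}")  # ~0.046

    lr_pos = sensitivity / (1 - specificity)  # 4.75
    lr_neg = (1 - sensitivity) / specificity  # 0.0625
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")

    # A negative test despite an 80% pretest probability:
    pretest_odds = 0.80 / 0.20
    posttest_odds = pretest_odds * lr_neg
    print(f"P(MI | negative troponin) = "
          f"{posttest_odds / (1 + posttest_odds):.2f}")  # 0.20
    ```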

    Hope this is helpful.

  9. Clinicians above are making attributions as to what patients actually want from their clinicians. I am not sure these attributions are based on strong evidence.

    Most patients (and clinicians) have never experienced a satisfactory shared decision making encounter; this invalidates surveys of patients’ preferences for involvement. In our experience across several randomized trials of decision aids, we have seen over 70% of patients express high levels of satisfaction with involvement. And these patients are not young; the mean age of our participants is 65.

    Hundreds of video recordings of patient-clinician encounters that we have compiled and studied show that, unaided, most clinicians “go through the motions” but rarely express themselves clearly about the issues of importance affecting the available options, or even present all the options (“if they come to see me, it is because they want my opinion, not for me to ask them what they want”). They rarely use legible graphs and almost never express themselves quantitatively.

    Shared decision making requires sharing information, deliberating together, and making a decision. The ideal extent of patient participation (for this patient, considering this decision, at this time) is something that clinicians need to gauge empathically, like a dance, letting patients take over as much of the process as they want and clinicians taking as much as they need. Offering information that is tailored and designed to be understandable and complete can reduce the risk that the clinician will present the information incompletely or in a biased way (though it does not eliminate that risk). Furthermore, presenting information clearly (accessible to patients with limited literacy and numeracy) promotes higher levels of participation in deliberation, asking questions, and expressing preferences. Finally, it shows respect: it shows that you care for the patient.

    That many patients find our favorite treatments unattractive once they understand what is involved suggests that clinicians, unaided, may be overtesting and overtreating patients. This is low-quality practice by definition, as it is not patient-centered.

    Regarding regret (“if the procedure goes well, then I am a hero; if not, I am a bum”): the error there results from conflating the outcome of the decision-making process with the outcome of the decision itself. If you make a decision that reflects the evidence and the preferences of the patient, arrived at through patient involvement, then a patient who suffers an adverse outcome can understand that this was a real possibility, one that could happen to someone like him or her.

    That patients can understand the probabilities of outcomes is something we have demonstrated repeatedly with our tools (http://kercards.e-bm.info). Low numeracy, limited time, and the like are all bad reasons to resort to poor-quality practice that fails to care for, and about, the patient above all else.

    Competing interests pertaining specifically to this post, comment, or both:
    I conduct research into patient-centered care. We do not receive funding from for profit pharma or device companies. We develop decision aids that are freely available for download online and receive no money for their use.

  10. John E Brush, MD says:

    Thanks for your insightful comments. I know that your group has worked on improving shared decision making using decision aids, including pictorial representations of probabilities. This brings up another interesting feature of the Cosmides and Tooby study mentioned above: they also tested the effectiveness of visual frequentist representations. When they asked subjects to construct a pictorial representation of the frequencies, rather than simply stating the frequencies, the proportion of subjects giving the correct answer rose further, from 76% to 92%. Our intuition works best with natural frequencies, and even better when the frequencies are displayed graphically. Putting this knowledge into practice should lead to better clinical decisions and help us do a better job of engaging our patients in shared decision making.

  11. George Ritter, BSci, MD, FACC, FACP says:

    The discussion has not addressed the problem of cultural differences. Different cultures react differently to a doctor’s advice. Ideally, the doctor is of the same cultural background as the patient, but frequently that is not the case, and yet care must still be rendered. This is a huge challenge. Many of my friends have complained of being unable to understand their doctor because of his accent; usually it turns out that the value systems are different. To complicate the situation even more, there are the “new” EMR systems. People have complained to me that the doctor spends more time on the computer than actually talking with the patient: the 15-minute encounter turns into 5 minutes with the patient and 10 minutes with the evil machine. To properly educate every patient about the full implications of their decision is really impossible, particularly in today’s climate. We must compromise, relying on trust and authority.

    Competing interests pertaining specifically to this post, comment, or both:
    no conflicts