By Philip R. Alper
In what seems like another lifetime, one of my fellow interns regularly hustled down to the hospital library to intercept the newly arrived copy of the New England Journal of Medicine (N Eng J Med) in the wee hours of the morning. He wanted to be the first on rounds to cite the latest references.
Chances are this wouldn’t work today. Neither the post office nor the mailroom is that efficient anymore. But more important, the articles are just too difficult to plow through quickly. Of course, this isn’t limited to the N Eng J Med. It’s typical of the entire clinical literature, in which it has become a real chore to see the medical forest through the statistical trees.
I’ve been brooding about this for a while, wondering if it is aging, curmudgeonliness, or some other factor that makes me think this way. Several years ago, spotting the (now-former) editor of the Annals of Internal Medicine (Ann Intern Med) at a meeting, I complained, telling him that the only comprehensible article in the issue that had just come out was a piece of wonderful prose written by Howard M. Spiro, MD, renowned Yale gastroenterologist and humanist physician (See Dec. 2004 issue of IMWR, page 1). “The rest are so full of statistical jargon that I can’t make heads or tails of them,” I said.
He appeared bemused. “Which ones, exactly?” he asked. Guilty of hyperbole (but not of essential untruth), I dropped the subject and continued a love-hate relationship that comes to the surface with each frustrating encounter with my journals. Early in 2004, I selected issues of the Journal of the American Medical Association (JAMA), the N Eng J Med, and the Ann Intern Med that all came during the same week, intending to use them to write this article. I procrastinated because I wanted to be surer of my grounds and also to have something constructive to say. After all, hadn’t I taken statistics in medical school? And isn’t science a laborious search for an often-elusive truth? So what precisely is my complaint?
Let me now take a stab at articulating it. Statistics themselves are suspect. The introduction of penicillin wiped out a host of useless treatments because its benefits were obvious. When things that are not obvious are subject to statistical analysis, the results must be taken on faith—scientific faith, to be sure, but faith nevertheless. And it takes an act of will to overcome background awareness of lay wisdom inherent in the aphorism, “If you want to really lie, use statistics.”
It isn’t reassuring when the letter columns of journals are filled with carping statistical nitpicks. Go, for example, to page 2084 of the November 3, 2004, issue of JAMA and read more than 1.5 pages of commentary (nearly one third of the entire letters section) on an article about androgen suppression plus radiation therapy for prostate cancer. The commentary is essentially a critique of the study design and statistical accuracy of the article. I invite you to see what you can make of the quibbling over 1-sided versus 2-sided P values. Easily lost in translation is useful information for patient care. And this is in JAMA, not an esoteric specialty journal, but one addressed to all physicians. As one colleague put it, “One needn’t lie with statistics. It’s enough to just confound us.”
Confounding happens regularly when articles and subsequent commentary leave the reader unsure of what to think. It isn’t just the absence of time and expertise needed to slog through the statistical minutiae but the unrealistic expectation that significant numbers of physicians have the requisite interest to do so. If one makes it through P values and understands the effect of myriads of confounding factors and the adequacy of attempts made by the authors to control for them, there remains the issue of statistical spin.
Statistics introduce biases of their own. For example, a relative risk reduction of 50% sounds impressive and makes for great press releases and advertising copy addressed to the public, even when the absolute risk falls only from 2% to 1%, a reduction of a single percentage point. The absolute figure is far less likely to impress a clinician once it is made clear. But a commonsense grasp of the issues can be elusive and easily buried in complex statistical discussions.
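To make the arithmetic concrete, here is a minimal sketch in Python. The function name is hypothetical, and the 2% and 1% event rates are simply the illustrative figures used above, not data from any study.

```python
def risk_reductions(control_risk: float, treated_risk: float) -> tuple[float, float]:
    """Return (absolute, relative) risk reduction for two event rates.

    Both arguments are proportions between 0 and 1, e.g. 0.02 for 2%.
    """
    if not 0 < control_risk <= 1 or not 0 <= treated_risk <= 1:
        raise ValueError("risks must be proportions, with control_risk above 0")
    # Absolute risk reduction: the plain difference between the two rates.
    absolute = control_risk - treated_risk
    # Relative risk reduction: that difference as a share of the control rate.
    relative = absolute / control_risk
    return absolute, relative


# Illustrative figures only: risk falls from 2% to 1%.
absolute, relative = risk_reductions(0.02, 0.01)
print(f"Relative risk reduction: {relative:.0%}")
print(f"Absolute risk reduction: {absolute:.1%}")
```

The same pair of event rates yields both numbers. The relative figure is the one that tends to reach the headline, while the absolute figure is the one that describes what changes for the patient in front of you.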
Moreover, the reader is unlikely to improve with practice. This point was made by a medical librarian I know who said, “Over the years, I’ve read thousands of articles in the mainstream journals and not gained any expertise in evaluating their validity.” I suspect that many (if not most) physicians feel the same way. Alienation can readily progress to disinterest. Physicians I’ve discussed this with and who have largely given up on reading the literature say it’s because they “get so little out of it.”
If it seems that I’m being lopsided in my criticism, I haven’t yet addressed the unresolved issues of commercial bias in scientific articles and political bias in policy pieces. The latter is particularly reprehensible because it masquerades under the rubric of virtue. When the Institute of Medicine came out with the report, “To Err Is Human,” and announced that there are 98,000 hospital deaths caused by medical error annually, they hired public relations firms to tout the results lest they go unnoticed and not be acted upon. Of course, later analysis showed the number to be wildly exaggerated. This is hardly an isolated event. The Centers for Disease Control and Prevention have just admitted to drastically overstating the rapidly increasing role of obesity as a cause of death. Nor was it just an oversight or error. Contrary opinion within the research group was ignored (Wall Street Journal, 11/23/04, page 1).
I am not making a plea for quickie science. Neither am I attempting to excuse the faults of the undedicated. The research community, practicing physicians, and the medical literature need to come into better balance. One simple measure would be to present statistical concepts in color and possibly in the form of a series of thermometers that suggest degrees of reliability. The use of detailed appendices after more pointed text would allow clinicians and researchers their respective areas of major interest. Editorial comment could address the gist of all the letters received from readers and not just the few chosen for publication. If the colored thermometers then need adjustment “on second thought,” that could be included. Online publication is another story altogether.
These are only a few preliminary thoughts. If there are other unhappy readers out there, I’d like to hear from you.