Simon Douglas Murray, MD
Editor-in-Chief
Statistical analysis is essential to practicing modern medicine. Properly designed clinical trials are at the heart of evidence-based medicine (EBM). The concept of EBM has evolved over the last 30 years from the work of Archie Cochrane, an epidemiologist, who wrote a series of lectures in 1972 on the efficacy of medical services.
Cochrane argued that medical care at the time was based on intuition, hearsay evidence, and interventions of dubious safety and efficacy, some of which could cause great harm to patients. He believed that the failure to evaluate treatments properly led to the wasteful use of unscientific therapies and to avoidable iatrogenic injury. He argued that treatments should be evaluated using unbiased methods such as the randomized double-blind trial, which has since become widely accepted as the gold standard for evaluating new treatments, diagnostic tests, and health care outcomes. This has evolved into a so-called hierarchy of evidence: a qualitative ranking of the different types of evidence that can support judgments in clinical practice. The clinical trial is at the heart of scientific research and determines which treatments are the most effective, cost-effective, and safe. Because this approach has undoubtedly transformed the way physicians practice, it behooves us to understand some of the basic terminology used in medical statistics.
In a second article, we will discuss some of the pitfalls inherent in practicing evidence-based medicine.
What is meant by statistical significance?
While in everyday English significant means important, in statistics significance means that a result is unlikely to be explained by chance alone. The most commonly used tool to evaluate significance is the P value. Simply put, a P value is the probability of observing a difference between 2 study groups at least as large as the one actually seen if, in reality, there were no true difference between them. A P value of 0.05 therefore means that if the intervention had no effect, a difference this large would still arise by chance about 5% of the time; it does not mean there is a 95% chance that the finding is real. As such, the lower the P value, the harder it is to attribute the results of a given intervention to chance alone. P values are reported in virtually all medical studies and are used to support the results of a given intervention. However, as simple as it seems, the P value is one of the most misunderstood, misinterpreted, and miscalculated indexes. Relying on P values without considering anything else oversimplifies hypothesis testing and ignores error rates, statistical power, and other statistical concepts. Because a P value does not indicate how plausible a conclusion is in light of what is already known or expected, it should not be the sole criterion for deciding what is really true. It’s important to note that a low P value does not guarantee that the results observed are meaningful, and a high P value does not assure that they are not meaningful.
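To make the definition concrete, here is a minimal Python sketch. It assumes a simple two-arm trial with a yes/no outcome, analyzed with a pooled two-proportion z-test (one common test among several). The function names and event counts are illustrative and do not come from any study. The second function simulates trials in which both arms share the same true event rate, so every "significant" result it counts is a false positive.

```python
import math
import random


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided P value from a pooled two-proportion z-test (normal approximation)."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (rate_a - rate_b) / se
    # Probability of a difference at least this extreme, in either direction,
    # if there were truly no difference: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def null_significance_rate(true_rate, n_per_arm, trials, alpha=0.05, seed=1):
    """Fraction of simulated trials with NO real difference that still reach P < alpha."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(trials):
        a = sum(rng.random() < true_rate for _ in range(n_per_arm))
        b = sum(rng.random() < true_rate for _ in range(n_per_arm))
        if two_proportion_p_value(a, n_per_arm, b, n_per_arm) < alpha:
            significant += 1
    return significant / trials


if __name__ == "__main__":
    # Hypothetical trial: 30/200 events on control vs. 15/200 on treatment.
    print(round(two_proportion_p_value(30, 200, 15, 200), 3))
    # Both arms share a 10% true event rate, so any "significant" result is chance.
    print(null_significance_rate(true_rate=0.10, n_per_arm=200, trials=2000))
```

Every simulated trial has identical true event rates, so the fraction returned by `null_significance_rate` should hover around 0.05. Roughly 1 trial in 20 looks "significant" even though nothing works, which is exactly what a threshold of P < 0.05 permits.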
Even if accurately calculated, a P value showing statistical significance does not always translate into clinical significance, because it says nothing about the magnitude of the difference. Second, the study endpoint may not itself be clinically important, as can be seen with surrogate endpoints. Examples include using bone density to predict a reduction in hip fractures with bisphosphonates, or using LDL cholesterol levels to predict mortality rates.
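The following is a minimal sketch of why statistical and clinical significance can diverge. It uses a pooled two-proportion z-test as a simple stand-in for whatever test a real trial would use. The event rates (10.0% vs. 10.3%) and the sample sizes are hypothetical. The absolute difference, which is the quantity that matters clinically, is identical in both runs; only the sample size changes.

```python
import math


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided P value from a pooled two-proportion z-test (normal approximation)."""
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (events_a / n_a - events_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def compare(n_per_arm, rate_a=0.100, rate_b=0.103):
    """Return (absolute difference, P value) for hypothetical event rates in two arms."""
    events_a = round(rate_a * n_per_arm)
    events_b = round(rate_b * n_per_arm)
    diff = events_b / n_per_arm - events_a / n_per_arm
    return diff, two_proportion_p_value(events_a, n_per_arm, events_b, n_per_arm)


for n in (2_000, 200_000):
    diff, p = compare(n)
    print(f"n per arm = {n:>7}: absolute difference = {diff:.2%}, P = {p:.4f}")
```

The 0.3 percentage-point difference is equally small in both runs, but only the larger trial crosses P < 0.05.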
Absolute risk and relative risk
The concept of risk is essential to medical statistics. A risk is simply the chance of something happening. It can be reported in many ways (1/100, 10/1,000, and 1,000/100,000 all describe the same 1% risk), but it’s far easier to describe risk in terms of percentages. The actual risk of something occurring is called the absolute risk. If a given intervention lowers risk, the difference between the absolute risks with and without it is called the absolute risk reduction, which is often expressed as a percentage. Relative risk is the ratio of the absolute risks in 2 groups. Unless the absolute risks are also stated, relative risk alone can overstate the benefits of a given intervention.
Consider a study showing that stomach cancer is 5 times more likely in people who don’t eat spinach than in people who do. To understand the magnitude of benefit a person gets from eating spinach, we need to know both the risk of stomach cancer in spinach eaters and the risk in non-spinach eaters. The absolute risk of stomach cancer in spinach eaters is reported to be 1/10,000 (0.0001), and in non-spinach eaters it’s 5/10,000 (0.0005). The absolute benefit is therefore 4/10,000, hardly worth worrying about. To put it into perspective, out of every 10,000 non-spinach eaters, 9,995 will never get stomach cancer!
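The spinach arithmetic can be reproduced in a few lines of Python. The function name below is illustrative. The sketch computes the relative risk, the absolute risk difference, and the relative risk reduction from the two absolute risks quoted above. It shows how "5 times more likely" and "4 in 10,000" describe the same data.

```python
from fractions import Fraction


def risk_summary(risk_exposed, risk_unexposed):
    """Compare the risk in an exposed group with the risk in a reference group."""
    relative_risk = risk_exposed / risk_unexposed
    absolute_difference = risk_exposed - risk_unexposed
    # Relative reduction achieved by moving from the exposed to the reference group.
    relative_reduction = absolute_difference / risk_exposed
    return relative_risk, absolute_difference, relative_reduction


# Exact fractions avoid floating-point noise with very small risks.
non_eaters = Fraction(5, 10_000)  # risk of stomach cancer without spinach
eaters = Fraction(1, 10_000)      # risk of stomach cancer with spinach

rr, diff, rrr = risk_summary(non_eaters, eaters)
print(f"Relative risk: {rr}")
print(f"Absolute risk difference: {diff * 10_000} per 10,000")
print(f"Relative risk reduction: {float(rrr):.0%}")
```

An 80% relative risk reduction and a 4-in-10,000 absolute difference describe the same data. Quoting only the first makes spinach look far more protective than it is.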
When a risk is very small, it is useful to describe it in relation to other risks that have real-life meaning. Patients frequently cite liver disease as a reason for refusing statins. In reality, the incidence of serious liver injury from statins is about 2.5/100,000 (1 in 40,000). By comparison, the risk of dying in road traffic over 50 years of driving is 1/85. The risk of needing emergency treatment in the next year for an injury from a bed, mattress, or pillow (1/2,000) is also far higher than the risk of statin-induced liver injury. Even the risk of being hit by a falling airplane (1/250,000) is only about 6 times lower.
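Putting the quoted figures on a common denominator makes such comparisons easier to check. This minimal sketch uses only the numbers given in the text. Note that the time frames differ (50 years of driving vs. the next year), so the result is an order-of-magnitude comparison rather than a like-for-like ranking.

```python
from fractions import Fraction

# Risks as quoted in the text. Time frames differ, so compare orders of magnitude only.
risks = {
    "Serious liver injury from a statin": Fraction(25, 1_000_000),  # 2.5/100,000
    "Dying in road traffic over 50 years of driving": Fraction(1, 85),
    "ER visit for bed, mattress, or pillow injury (next year)": Fraction(1, 2_000),
    "Being hit by a falling airplane": Fraction(1, 250_000),
}

statin = risks["Serious liver injury from a statin"]
# List the risks from largest to smallest, each on three common scales.
for label, risk in sorted(risks.items(), key=lambda item: item[1], reverse=True):
    per_100k = float(risk * 100_000)
    one_in = round(1 / risk)
    times_statin = float(risk / statin)
    print(f"{label}: {per_100k:,.1f} per 100,000 (about 1 in {one_in:,}); "
          f"{times_statin:.2f} x the statin risk")
```

On this scale, the bed, mattress, or pillow risk is 20 times the statin figure. The falling-airplane figure (0.4 per 100,000) is somewhat lower than the statin figure.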
Number of patients needed to treat
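The number needed to treat (NNT) is the number of patients who must receive an intervention for one additional patient to benefit. It is the reciprocal of the absolute risk reduction (NNT = 1/ARR); in the spinach example, an absolute risk reduction of 4/10,000 corresponds to an NNT of 2,500. The NNT for preventive interventions will usually be higher than the NNT for treatment interventions. This is because prevention trials include low-risk as well as high-risk patients, which lowers the baseline risk and therefore the absolute risk reduction.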
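The number needed to treat is the reciprocal of the absolute risk reduction, rounded up to a whole patient. This minimal Python sketch assumes a hypothetical drug that cuts risk by the same relative amount (25%) in two populations with different baseline risks. The function name and the numbers are illustrative and do not come from any trial.

```python
import math


def number_needed_to_treat(baseline_risk, relative_risk_reduction):
    """NNT = 1 / absolute risk reduction, rounded up to a whole patient."""
    absolute_risk_reduction = baseline_risk * relative_risk_reduction
    return math.ceil(1 / absolute_risk_reduction)


# Hypothetical event risks; the same 25% relative risk reduction applies in both.
scenarios = {
    "treatment (high-risk patients, 20% baseline risk)": 0.20,
    "prevention (mixed-risk population, 2% baseline risk)": 0.02,
}
for label, baseline in scenarios.items():
    print(f"{label}: NNT = {number_needed_to_treat(baseline, 0.25)}")
```

With identical relative benefit, the lower baseline risk of the prevention population raises the NNT tenfold (20 vs. 200 in this hypothetical).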
The type of study influences the strength of the conclusions that can be drawn. Generally speaking, experimental studies provide stronger evidence than observational studies, and confirmatory studies deliver stronger evidence than exploratory studies.
Sample size and power
Statistically, a larger sample increases the chance of detecting a true difference (the study’s power), because larger samples estimate the true mean more precisely.
However, it’s important to note that too large a sample may be expensive and can make clinically trivial differences statistically significant. To power a study, investigators make assumptions about the expected event rates and the size of the effect, based on prior experience with the treatments being compared. They then enroll enough participants to keep the risks of false positives (type I error) and false negatives (type II error) acceptably low.
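Power calculations turn those assumptions into numbers. The following is a minimal sketch that assumes a two-arm trial with a yes/no outcome. It uses the standard normal-approximation formula for comparing two proportions with unpooled variance; real trial software typically adds refinements such as continuity corrections. The function name and event rates are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(rate_control, rate_treatment, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect rate_control vs. rate_treatment.

    alpha: acceptable two-sided false-positive (type I) error rate.
    power: 1 - beta, the chance of detecting the difference if it is real.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = rate_control * (1 - rate_control) + rate_treatment * (1 - rate_treatment)
    effect = rate_control - rate_treatment
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    # Hypothetical: halving a 10% event rate vs. reducing it by a quarter.
    print(sample_size_per_arm(0.10, 0.05))
    print(sample_size_per_arm(0.10, 0.075))
    print(sample_size_per_arm(0.10, 0.05, power=0.90))
```

Shrinking the hypothetical difference from 5 to 2.5 percentage points more than quadruples the required sample, because the effect size enters the formula squared. Demanding more power also raises the number needed.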
Sensitivity and specificity
One of the biggest issues people have with statistics is that it often seems counterintuitive. Consider the following example. A patient comes to the office for a test that has a true positive rate of 99/100 (sensitivity) and a true negative rate of 99/100 (specificity). In other words, a patient without the disease has a 1/100 chance of a false positive result, and a patient with the disease has a 1/100 chance of a false negative result. The patient tests positive for the disease. What is the probability that the patient actually has the disease?
The answer is that it depends on the prevalence of the disease in the population being tested. If the expected prevalence is 1/200, then out of every 200 patients, we expect one to actually have the disease, and that patient will almost certainly test positive. Of the remaining 199 healthy patients, about 1/100, or roughly 2 in 200, will be incorrectly identified as having the disease (false positives). Therefore, the chance that a patient with a positive result has the disease is only about 1/3 (about 3 positive results in every 200 patients, only one of which is a true positive).
Hence, knowing the sensitivity and specificity is not enough to interpret a positive result; you also need to know how common the disease is in the population. As the prevalence of a disease decreases, the probability that a positive test reflects true disease plummets. That’s precisely why we don’t routinely order stress tests in young women: because coronary disease is uncommon in this group, a positive result would most likely be a false positive.
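The reasoning above is Bayes’ theorem applied to a positive test result. This minimal Python sketch uses an illustrative function name. It computes the probability of disease given a positive result for the 99% sensitive, 99% specific test. It then repeats the calculation as the disease becomes rarer among the people tested (its prevalence).

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a patient with a positive test truly has the disease (Bayes' theorem)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Same test (99% sensitive, 99% specific) applied to populations of decreasing prevalence.
    for prevalence in (1 / 10, 1 / 200, 1 / 2_000, 1 / 20_000):
        ppv = positive_predictive_value(0.99, 0.99, prevalence)
        print(f"prevalence 1 in {round(1 / prevalence):,}: P(disease | positive) = {ppv:.1%}")
```

At the example’s rate of 1 in 200, the sketch gives about 33%, which matches the 1-in-3 estimate. At 1 in 2,000, fewer than 5% of positive results are true positives.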
Statistics are not always logical or straightforward. It has been said that if you ask a mathematician what one plus one is, he will answer, “2”; a philosopher will answer, “It depends”; and a statistician will answer, “What do you want it to be?”
That’s not to say we can ignore statistics in our practices, but we must keep them in their proper context and acknowledge that some studies are better than others. We must keep a critical mind.