The Concept of Statistical Significance

The concept of "statistical significance" is so difficult to understand that misunderstandings are forgivable, suggests Donald Berry, a biostatistician at the University of Texas M.D. Anderson Cancer Center. Carl Bialik of the WSJ asked several statisticians to offer definitions of "statistical significance."

Shane Reese had the briefest one, tailored for a clinical trial for a drug: “It is unlikely that chance alone could have produced the improvement shown in our clinical trial. Because it seems unlikely that chance produced the improvements, we logically conclude that the improvement is due to the drug.” Reese and other statisticians noted that this definition runs backwards from what the calculation actually does: a significance test starts by assuming there is no link, then finds the probability that chance alone could have produced the experimental results seen.
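
To make that direction of reasoning concrete, here is a minimal simulation sketch; all of the counts and rates in it are hypothetical, not taken from any actual trial.

```python
# Minimal sketch of what a significance (p-value) calculation does, using
# hypothetical numbers: assume the drug does nothing, then ask how often chance
# alone produces an improvement at least as large as the one observed.
import numpy as np

rng = np.random.default_rng(0)

n_per_arm = 100              # hypothetical patients in each trial arm
null_response_rate = 0.30    # hypothetical response rate if the drug does nothing
observed_improvement = 0.12  # hypothetical: drug arm responded 12 points better

# Simulate many trials in which both arms share the same true response rate,
# i.e. the "no link" assumption.
sims = 100_000
drug = rng.binomial(n_per_arm, null_response_rate, sims) / n_per_arm
control = rng.binomial(n_per_arm, null_response_rate, sims) / n_per_arm
p_value = np.mean(drug - control >= observed_improvement)

print(f"P(improvement at least this large, given no real effect) ~ {p_value:.3f}")
# Note the direction of the conditioning: this is P(data | no effect),
# not P(no effect | data), which is why the everyday reading is backwards.
```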

Reese and Brad Carlin, who also offered a definition, suggest that Bayesian statistics are a better alternative, because they tackle the probability that the hypothesis is true head-on, and incorporate prior knowledge about the variables involved.
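
For contrast, here is an equally minimal Bayesian sketch, again with hypothetical counts and deliberately uninformative Beta(1, 1) priors, that asks the question head-on: given the trial data, how probable is it that the drug's response rate really exceeds the control's?

```python
# Minimal Bayesian sketch with hypothetical trial counts and flat Beta(1, 1)
# priors: it answers "how probable is it that the drug really is better?"
# directly, instead of computing the chance of the data under "no effect".
import numpy as np

rng = np.random.default_rng(0)

drug_responses, drug_n = 42, 100          # hypothetical counts
control_responses, control_n = 30, 100    # hypothetical counts

# With a Beta prior and binomial data, the posterior is also a Beta distribution.
drug_post = rng.beta(1 + drug_responses, 1 + drug_n - drug_responses, 100_000)
ctrl_post = rng.beta(1 + control_responses, 1 + control_n - control_responses, 100_000)

print(f"P(drug beats control, given the data) ~ {np.mean(drug_post > ctrl_post):.3f}")
# Prior knowledge about the drug or its class can be folded in by replacing the
# flat Beta(1, 1) priors with more informative ones.
```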

There are other problems with "statistical significance." It can be ill-suited to cases where it is unclear if all data is being collected, such as with the reporting of adverse events experienced by users of a drug that is past the clinical-trial stage, or never had to go through clinical trials, and is now on the market. In such a situation, “you have to make a lot of assumptions in order to do any statistical test, and all of those are questionable,” said Susan Ellenberg, a biostatistician at the University of Pennsylvania’s medical school.

“Every statistical test relies on half a dozen assumptions,” echoed Aris Spanos, an economist at Virginia Tech. “Before you use that test, you have to check your assumptions.”

Spanos wishes the Supreme Court had gone further in its recent ruling, in which it determined that a lack of statistical significance didn’t always provide drug companies with enough cover to avoid disclosing reports of adverse events from users of their drugs. Spanos would have liked to see more guidance for how to proceed without relying strictly on statistical significance. “It was a move in the right direction but then you open the system to different kinds of abuses,” Spanos said.

The U.S. FDA also doesn’t use such a black-and-white rule. In January the FDA warned women who have breast implants, or are considering getting them, about an elevated risk of a rare cancer, anaplastic large-cell lymphoma. The FDA did so even though the link wasn’t statistically significant, in part because the agency reasoned that perhaps not all such incidents were reported. “It underscores the importance of not solely relying on a statistical test to tell you there is a public-health issue,” said William Maisel, the chief scientist for the agency’s Center for Devices and Radiological Health.

There are also cases where seemingly impressive statistically significant results are less meaningful than they appear, statisticians say. For example, a very large sample size reduces the effects of statistical noise, so it can yield very high levels of significance for fairly minor relationships: roughly speaking, a large degree of confidence in the existence of a very small effect.
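
A quick simulated illustration of that point (the 0.02-standard-deviation effect size is arbitrary, chosen only for the example):

```python
# Sketch of the large-sample effect: a difference of 0.02 standard deviations
# (chosen arbitrarily) tends to look like noise in a small sample, but becomes
# "highly significant" once the sample is enormous. Data here are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.02  # difference between the two group means, in standard deviations

for n in (100, 10_000, 1_000_000):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(true_effect, 1.0, n)
    result = stats.ttest_ind(a, b)
    print(f"n = {n:>9,}   p-value = {result.pvalue:.2e}")
# The effect never gets bigger; only the sample does. At n = 1,000,000 the
# p-value is tiny, i.e. great confidence in the existence of a very small effect.
```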

Checking for lots of potential effects can also lead to results that appear to be statistically significant, but aren’t. “In the early days of clinical trials, it wasn’t unusual for people to keep looking at data as they go along,” said Ellenberg. “It was a fishing expedition, completely subverting the whole notion of chance findings.”
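
A small simulation sketch of that fishing-expedition problem, with hypothetical settings (20 outcomes, 50 patients per arm, and no real effect anywhere):

```python
# Sketch of the "fishing expedition" problem: examine 20 outcomes per trial when
# no real effect exists anywhere, and count how often at least one outcome comes
# out "significant" at p < 0.05. All settings here are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
trials, outcomes, n_per_arm = 2_000, 20, 50

trials_with_a_hit = 0
for _ in range(trials):
    hits = 0
    for _ in range(outcomes):
        # Both arms come from the same distribution: there is nothing to find.
        a = rng.normal(0.0, 1.0, n_per_arm)
        b = rng.normal(0.0, 1.0, n_per_arm)
        if stats.ttest_ind(a, b).pvalue < 0.05:
            hits += 1
    if hits > 0:
        trials_with_a_hit += 1

print(f"Trials with at least one 'significant' finding: {trials_with_a_hit / trials:.0%}")
# Expect roughly 1 - 0.95**20, or about 64%, even though every hit is pure chance.
```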

Also, a "statistically significant" effect may not matter much in practice. “Statistics and value judgment belong to different domains,” Siu L. Chow, professor emeritus in psychology at the University of Regina in Saskatchewan, said in a written response. “It follows that statistical decision and assessment of substantive impact have their own respective metrics. Hence, it is incorrect to use "statistical significance" or any other statistical indices (e.g., effect size) to index real-life importance.”

Stephen Ziliak, an economist at Roosevelt University in Chicago, and co-author of the 2008 book “The Cult of Statistical Significance” with Deirdre McCloskey, an economist at the University of Illinois at Chicago, said he would like to see large effect sizes reported even when they are not statistically significant. Researchers “probably ought to go ahead and report what happened anyway,” Ziliak said. “There’s probably a lot of stuff out there that didn’t see the light of day.”

Stephen Stigler, a statistician at the University of Chicago, agrees with the general premise that “you can have a real effect which is nonetheless trivial in the practical sense.” He doesn’t think this is widely misunderstood, though: “I don’t think in science we generally sanction the unequivocal acceptance of significance tests.”

Ziliak disagrees. Of the book’s tone, he said: “It is passionate in the sense that we do reveal anger. We had collectively been working on this issue in a calm fashion for 45 years. We deserved to open up the conversation a little more widely in this way.”

http://blogs.wsj.com/numbersguy/a-st...-closeup-1050/

For an amusing take on statistical significance, see:

http://xkcd.com/882/

T-DM1: Antibody-drug conjugates (ADCs)

http://cancerfocus.org/forum/showthread.php?t=3768

I heard about women who were bumped from the T-DM1 clinical trial because of disease progression, meaning their cancer was growing despite the drug. Bumped off the trial because of disease progression? I wonder how many such patients there were.

Response rates (how much a tumor decreased in size) can be inflated by excluding patients during clinical trials and counting only "evaluable" patients. Patients not considered "evaluable" are often those who did not receive the entire treatment plan. The response rate is then calculated after removing the patients who died or were excluded, which inflates it.
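
A back-of-the-envelope sketch of that arithmetic, with counts invented purely for illustration:

```python
# Back-of-the-envelope sketch, with invented counts, of how excluding patients
# inflates a reported response rate.
enrolled = 100     # hypothetical: patients who started the trial
responders = 30    # hypothetical: tumors that shrank enough to count as a response
excluded = 25      # hypothetical: died, progressed, or were otherwise dropped early

intention_to_treat = responders / enrolled           # every enrolled patient counts
evaluable_only = responders / (enrolled - excluded)  # dropouts vanish from the denominator

print(f"Intention-to-treat response rate:   {intention_to_treat:.0%}")  # 30%
print(f"'Evaluable patients' response rate: {evaluable_only:.0%}")      # 40%
# Same 30 responders either way; only the denominator changed.
```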

But clinical oncologists want to publish their papers. They need to report on the outcomes of their experiments, and if they had to wait for survival data, it could take years until all of it was aggregated. That wouldn't bode well for their chances of participating in pharma-sponsored trials in the future.

Response rates give clinical oncologists the opportunity to take a more optimistic look at therapies that have limited success. They can describe results as complete remission, partial remission, or simply clinical improvement.

If they treat all patients for three weeks, they can fairly evaluate the efficacy of a compound, which takes that long (on average) before it can be regarded as effective. If instead they disregard all patients who died or were excluded after the onset of therapy, and count only those treated for three weeks or more, they can improve their data.

To justify their existence, they have to publish papers. That's what they do.