Ah, excellent that you are a biostatistician! I can now make this next point, which I think is crucially important yet usually doesn’t get across when I try to make it. For the most part I’d guess I’m just saying what you know all too well, but possibly the interchange will help get some important points across generally.
I don’t know how it happens, but typically, even in the sciences and in medicine, a person goes through all this education and is never taught what a p value truly means or what conclusions are merited from it.
p = 0.05, for example, as you know, does not, does not, does not mean that there is only a 5% probability that the outcome resulted from chance alone, with a 95% probability of actual causation. Not even close.
In and of itself it says nothing, nothing, nothing about the likelihood of the result being from actual cause rather than chance alone.
The correct interpretation, generally not understood by doctors and most scientists, is this: when there is no actual effect, chance alone will yield such an outcome (or a more extreme one) 5% of the time. Totally different.
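A tiny simulation makes this concrete. This is a minimal sketch in Python (assuming numpy and scipy are available; the two-sample t test and the sample size of 30 are arbitrary illustrative choices, not from any real study). Both groups are drawn from the same distribution, so there is no real effect by construction, yet about 5% of the comparisons still come out “significant”:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_experiments = 10_000

    # Both groups come from the SAME distribution: no real effect exists.
    false_positives = 0
    for _ in range(n_experiments):
        a = rng.normal(0.0, 1.0, size=30)
        b = rng.normal(0.0, 1.0, size=30)
        _, p = stats.ttest_ind(a, b)
        if p <= 0.05:
            false_positives += 1

    # Prints roughly 0.05: one "significant" result in twenty, from pure chance.
    print(false_positives / n_experiments)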
So let’s say we have some folks who need to publish papers. They test random extracts from lawn weeds, from dirt, from chinchilla fur, from all kinds of things with no preceding evidence of likely benefit, things a person skilled in the art would consider unlikely to give any substantial health benefit. For illustration, let’s say they’re all worthless, though in reality one or two might surprise.
Suppose 10,000 such studies are performed over some period of time, just to have a number.
For any one measured parameter, how many studies will “show” an effect of a worthless substance at p <= 0.05? Five hundred!
But it gets worse, because the researchers will probably test for, say, 10 or more possible “benefits,” and any that are “found” will be claimed. Chance alone now hands out about 5,000 spurious “significant” findings across those 100,000 tests, and since a study gets claimed if even one endpoint comes up, roughly 4,000 of the 10,000 studies will “prove benefit” of worthless stuff (the arithmetic is sketched below). Sheesh!
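The arithmetic, using only the illustrative numbers above and assuming, for simplicity, that the 10 endpoints are independent:

    n_studies = 10_000
    alpha = 0.05
    n_endpoints = 10

    # One endpoint: studies expected to "show" an effect of a worthless substance.
    print(n_studies * alpha)                         # 500.0

    # Ten endpoints: total spurious "significant" findings across all studies.
    print(n_studies * n_endpoints * alpha)           # 5000.0

    # Studies with at least one spurious finding (independent endpoints assumed).
    p_at_least_one = 1 - (1 - alpha) ** n_endpoints  # ~0.401
    print(round(n_studies * p_at_least_one))         # ~4013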
I don’t recall the specific paper I read or the exact form of the result, but there is a proof that to evaluate the likelihood that a result reflects a real effect, one takes the estimated prior probability of causation before the experiment, and then combines this in a given way with the p value.
If, for example, something would have been deemed a 1 in 1000 chance of working without the evidence of the experiment (say because experience has shown that approximately 1 in 1000 materials chosen in a particular way actually work when tested sufficiently), then you wouldn’t want to accept p = 0.05 as supporting, let alone showing, likely causation. In that instance, almost all such experiments that “showed” a positive outcome would in fact be the product of chance alone. To actually hold chance-alone positives to the naive 5% standard in this instance, you’d need a p value of something like .00005 or half that, I don’t recall exactly.
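The combination that proof formalizes is just Bayes’ rule. Here’s a hedged sketch of the arithmetic in Python, using the 1-in-1000 prior above; the power value of 0.8 is my assumption for illustration, since nothing above fixes it:

    prior = 1 / 1000   # chance the substance actually works, before the experiment
    alpha = 0.05       # significance threshold
    power = 0.8        # assumed chance of detecting a real effect (illustrative)

    # Of all experiments that come out "positive", the fraction that are real:
    true_pos = prior * power
    false_pos = (1 - prior) * alpha
    print(true_pos / (true_pos + false_pos))   # ~0.016: ~98% of positives are chance

    # To keep chance down to 5% of positives, alpha must scale with the prior:
    alpha_needed = (0.05 / 0.95) * prior * power / (1 - prior)
    print(alpha_needed)                        # ~0.00004, the neighborhood of that
                                               # half-remembered figure

Note how the required threshold scales with the prior: the rarer real effects are among things tested, the smaller the p value must be before a “positive” means much.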
For this reason, and because of the variability in biological experiments, an enormous number of things that are “shown” aren’t shown to any substantial confidence at all.
And then there’s selection bias. Between that and the above, it takes real work to interpret results, and a lot of analysis of mechanism to make even qualitative plausibility estimates.
It could be that your involvement in these many published papers is itself a selection effect, making it appear that more care is taken than is typical!