Error Statistics Philosophy

Frequentists have long been in a kind of exile when it comes to statistical philosophy. The line is—and how many times can one hear it?—that only personalistic Bayesianism had a shot at coming up with respectable philosophical foundations. This may now be changing. Perhaps frequentist foundations, never made fully explicit, but at most lying deep below the ocean floor, are finally being disinterred. Join me, if you will, for a little deep-water drilling, as I cast about on my isle of Elba.

Friday, September 30, 2011

ELBA GREASE

In exile from exile, I sort of miss one of the places my Island friends would insist I accompany them to on Friday nights: a watering hole called the “Elbar Room,” which serves up a wonderful sour drink called “Elbar Grease” (I am serious). It is like drinking straight lemon, which for some reason I’ve always liked (GW says I may be missing a gene). Anyway, it’s some kind of sparkling wine with extremely sour lemon liqueur and nectarines. The shiny military brass barstools alone make the place interesting. Sadly, I don’t know when I can return just yet.

Part 1: Imaginary scientist at an imaginary company, Prionvac, and an imaginary Reformer

Prionvac: Our experiments yield a statistically significant increase in survival among scrapie-infected mice who are given our new vaccine (p = .01) compared to infected mice who are treated with a placebo. The data indicate H: an increased survival time of 9 months, compared to untreated mice.*

Reformer: You are exaggerating what your data show. In fact, there is a fairly high probability, more than .5, that your study would produce a p = .01 difference, even if the actual increased survival were only 1 month! (That is, the power to reject the null and infer H: increase of 9 months, is more than .5.)
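The Reformer’s figure can be checked with a short calculation. What follows is a minimal sketch, not Prionvac’s actual analysis: it assumes a one-sided test based on a normally distributed estimate of the increase in survival, and the standard error of 0.4 months is a purely hypothetical value chosen for illustration (the exchange gives no sample size or variance). The function name power_one_sided is likewise invented here.

```python
from statistics import NormalDist


def power_one_sided(true_increase, std_error, alpha=0.01):
    """Probability of a level-alpha rejection of H0: no increase,
    when the true increase in survival is `true_increase` (months).

    Assumes the estimated increase is normally distributed with the
    given standard error (a hypothetical modelling assumption).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)  # cutoff corresponding to p = alpha
    return 1 - std_normal.cdf(z_alpha - true_increase / std_error)


if __name__ == "__main__":
    SE = 0.4  # hypothetical standard error, in months
    for increase in (0, 1, 9):
        print(increase, round(power_one_sided(increase, SE), 3))
```

Under these assumed numbers, the chance of reaching p = .01 is .01 when there is no increase, a little over .5 when the true increase is 1 month, and essentially 1 when it is 9 months. A different standard error would give different values, so the sketch shows only how such a probability is computed, not what Prionvac’s study would deliver.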

WHIPPING BOYS AND WITCH HUNTERS --comments are now open

In an earlier post I alleged that frequentist hypothesis tests often serve as whipping boys, by which I meant “scapegoats”, for the well-known misuses, abuses, and flagrant misinterpretations of tests (both simple Fisherian significance tests and Neyman-Pearson tests, although in different ways). Checking the history of this term, however, I find a certain disanalogy with at least the original meaning of “whipping boy,” namely, an innocent boy who was punished when a medieval prince misbehaved and was in need of discipline. It was thought that seeing an innocent companion, often a friend, beaten for the prince’s own transgressions would supply an effective way to ensure the prince would not repeat the same mistake. But the flogging of significance tests, rather than serving as a tool for humbled self-improvement and a commitment to avoiding flagrant rule violations, has tended instead to yield declarations that it is the rules that are invalid! The violators are excused as not being able to help it! The situation is more akin to witch hunting, which in some places became an occupation in its own right.

LUCKY 13 (Criticisms)

Given some slight recuperation delays, interested readers might wish to poke around the multiple layers of goodies on the left-hand side of this web page, wherein all manner of foundational/statistical controversies are considered. In a recent attempt by Aris Spanos and me to address the age-old criticisms from the perspective of the “error statistical philosophy,” we delineate 13 criticisms. Here they are:

A Highly Anomalous Event

The journey to San Francisco was smooth sailing with no plane delays; within two hours of landing I found myself in the E.R. of St. Francis Hospital (with the philosopher of science Ronald Giere), unable to walk. I have just described an unexpected, “anomalous”, highly unusual event, but no one would suppose it was anomalous FOR, i.e., evidence against, some theory, say, in molecular biology. Yet I am getting e-mails (from readers) saying, in effect, that since the improbable coin toss result is very unexpected/anomalous in its own right, it therefore is anomalous for any and all theories, which is patently absurd. What had happened, in case you want to know, is that just as I lunged forward to grab my (bulging) suitcase off the airline baggage thingy, out of the corner of my eye I saw my computer bag being pulled away by someone on my left, and as I simultaneously yanked it back, I tumbled over (very gently, it seemed), twisting my knee in a funny way. To my surprise/alarm, much as I tried, I could put no weight on my right leg without succumbing to a Geppetto-puppet-like collapse. The event, of course, could rightly be regarded as anomalous for hypotheses about my invulnerability to such mishaps, because it runs counter to them. I will assume this issue is now settled for our discussions, yes?

Thursday, September 15, 2011

Getting It Right But for the Wrong Reason

Sitting in the airport . . . a temporary escape from Elba, which I’m becoming more and more loath to leave. I fear that some might agree, rightly, that Kadane’s “trivial test” is no indictment of significance tests, and yet for the WRONG reason. I don’t want to beat a dead horse, but perhaps a certain confusion is going to obstruct understanding later on. Let us abbreviate “tails” on a coin toss that lands tails 5% of the time, as “a rare coin toss outcome”. Some seem to reason: since a rare coin toss outcome is an event with probability .05 REGARDLESS of the truth or falsity of a hypothesis H, then the test is still a legitimate significance test with significance level .05; it is just a lousy one, with no discriminating ability. I claim it is no significance test at all, and that there is an important equivocation going on (in some letters I’ve received), one which I hoped would be skirted by the analogy with ordinary hypothesis testing in science. Heading off this confusion was the key rationale for my discussion in the Kuru post. Finding no nucleic acid in prions is inconsistent, or virtually so, under the hypothesis H: all pathogens are transmitted with nucleic acid. The observed results are anomalous for the central dogma H BECAUSE they are counter to what H says we would expect. If you maintain that the “rare coin toss outcome” is anomalous for a statistical null hypothesis H, then you would also have to say it is anomalous for H: all pathogens have nucleic acid. But it is obvious this is false in the case of the scientific hypothesis. It must also be rejected in the case of the statistical hypothesis (Rule #1).

SF conferences & E. Lehmann

I’m jumping off the Island for a bit. Destination: San Francisco, a conference on “The Experimental Side of Modeling” http://www.isabellepeschard.org/ . Kuru makes a walk-on appearance in my presentation, “How Experiment Gets a Life of its Own”. It does not directly discuss statistics, but I will post my slides.

In Exile, Clinging to Old Ideas?

To take up the first criticism, we can consider J. Kadane’s new book, Principles of Uncertainty (2011, CRC Press*). Kadane, to his credit, does not beat around the bush as regards his subjective Bayesian perspective; his is a leading Bayesian voice in the tradition of Savage. He takes up central criticisms of frequentist methods in Chapter 12 called “Exploration of Old Ideas”. So now I am not only in foundational exile, I am clinging to ideas that are in need of Juvederm!

Here is his criticism: “Flip a biased coin that comes up heads with probability 0.95, and tails with probability 0.05. If the coin comes up tails reject the null hypothesis. Since the probability of rejecting the null hypothesis if it is true is 0.05, this is a valid 5% level test. It is also very robust against data errors; indeed it does not depend on the data at all. It is also nonsense, of course, but nonsense allowed by the rules of significance testing.” (439)
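To see concretely what “does not depend on the data at all” amounts to, here is a minimal simulation sketch. It is an illustration under assumptions of my own choosing, not anything in Kadane’s text: the data are taken to be 25 normal observations with standard deviation 1, the null hypothesis is that their mean is 0, and the coin-flip rule is compared with an ordinary one-sided z-test at roughly the 5% level. All function names are hypothetical.

```python
import random


def trivial_test(data, rng):
    """Coin-flip 'test': ignores the data entirely and rejects
    whenever a coin with P(tails) = 0.05 lands tails."""
    return rng.random() < 0.05


def z_test(data, rng, sigma=1.0, cutoff=1.645):
    """One-sided z-test of H0: mean = 0 at roughly the 5% level.
    `rng` is unused; it is accepted so both tests share a signature."""
    n = len(data)
    z = (sum(data) / n) / (sigma / n ** 0.5)
    return z > cutoff


def rejection_rate(test, true_mean, n=25, trials=10000, seed=0):
    """Fraction of simulated samples (normal, sd 1) on which `test` rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        data = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        rejections += test(data, rng)
    return rejections / trials


if __name__ == "__main__":
    for name, test in (("coin flip", trivial_test), ("z-test", z_test)):
        # Rejection rate when the null is true (mean 0) and false (mean 1).
        print(name, rejection_rate(test, 0.0), rejection_rate(test, 1.0))
```

In this setup the coin-flip rule rejects about 5% of the time whether the true mean is 0 or 1, while the z-test rejects about 5% of the time under the null and almost always when the mean is 1. Only the second rule produces rejections that are rare under the null because of what the null says about the data.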

Comment on the Comments

It was arranged for an islander, J.M., to help with this (my first) blog, especially as service is spotty and I have to take a ferry to get proper internet. During a brief blackout in the village (Saturday), it seems, J.M. was working on the settings and lost the comments! Apologies. Comments are back up now, and open. Given that the ferry is also used as some sort of a funky fishing boat, the thought of which makes me feel seasick, I restrict trips to around 3 a week. Thus, don't expect greater output than that for now, although J.M., having been some kind of a sailor or perhaps a whaler, could go for me. I am staying over this time (dinner invite at the Mulini Palace!) and should put up a new post later, dealing with the first criticism.

Friday, September 9, 2011

KURU

I have been reading about a disorder that intrigues me, Kuru (which means “shaking”), widespread among the Fore people of New Guinea in the 1960s. In around 3-6 months, Kuru victims go from having difficulty walking, to outbursts of laughter, to inability to swallow and death. Kuru and (what we now know to be) related diseases (e.g., Mad Cow, Creutzfeldt-Jakob, scrapie) are “spongiform” diseases, causing brains to appear spongy. (They are also called TSEs: transmissible spongiform encephalopathies.) Kuru clustered in families, in particular among Fore women and their children, or elderly parents.

Researchers began to suspect transmission was through mortuary cannibalism. Apparently this was deemed a way of honoring the dead, and was also a main source of meat permitted to women. It seems that men also took part, but got first dibs on eating the muscle. Ending these cannibalistic practices all but eradicated the disease, which had been of epidemic proportions.

Drilling Rule #1*

A simple rule before getting started: In presenting their arguments, philosophers sometimes appear to go off into far distant islands entirely, and then act as if they have shown something about the case at hand. The mystery evaporates if one keeps in mind the following rule of argument:

If one argument is precisely analogous to another, in all relevant respects, and the second argument is pretty clearly fishy, then so is the first. Likewise, if one argument is precisely analogous to another, in all relevant respects, and the second argument passes swimmingly, then so must the first.

If the argument at hand is murky, while the one in the distant land crystal clear, then appealing to the latter is a powerful way to make a point. Because the relevance for the case at hand seems obvious, details may be left unstated. Of course you may avoid these conclusions by showing just where the analogies break down.

*Full disclosure: I own a fair amount of Diamond Offshore (DO), but do not plan to purchase more in the next 72 hours.

Saturday, September 3, 2011

Overheard at the comedy hour at the Bayesian retreat:

“Did you hear the one about the frequentist . . .

“who claimed that observing “heads” on a biased coin that lands heads with probability .05 is evidence of a statistically significant improvement over the standard treatment of diabetes, on the grounds that such an event occurs with low probability (.05)?”

“who defended the reliability of his radiation reading, despite using a broken radiometer, on the grounds that most of the time he uses one that works, so on average he’s pretty reliable?”

Such jests may work for an after-dinner laugh, but if it turns out that, despite being retreads of “straw-men” fallacies, they form the basis of why some reject frequentist methods, then they are not such a laughing matter. But surely the drubbing of frequentist methods could not be based on a collection of howlers, could it? I invite the curious reader to stay and find out.

Frequentists in Exile: The Purpose of this Blog

Confronted with the position that “arguments for this personalistic theory were so persuasive that anything to any extent inconsistent with that theory should be discarded” (Cox 2006, 196), frequentists might have seen themselves in a kind of exile when it came to foundations, even those who had been active in the dialogues of an earlier period. Sometime around the late 1990s there were signs that this was changing. Regardless of the explanation, the fact that it did occur and is occurring is of central importance to statistical philosophy.

Now that Bayesians have stepped off their a priori pedestal, it may be hoped that a genuinely deep scrutiny of the frequentist and Bayesian accounts will occur. In some corners of practice it appears that frequentist error statistical foundations are being discovered anew. Perhaps frequentist foundations, never made fully explicit, but at most lying deep below the ocean floor, are finally being disinterred. But let’s learn from some of the mistakes in the earlier attempts to understand them. With this goal I invite you to join me in some deep-water drilling here, as I cast about on my Isle of Elba.

Cox, D. R. (2006), Principles of Statistical Inference, CUP.