
Monday, October 31, 2011

Oxford Gaol: Statistical Bogeymen

Oxford Jail is an entirely fitting place to be on Halloween! 


Moreover, rooting around this rather lavish set of jail cells (what used to be a single cell is now a dressing room) is every bit as conducive to philosophical reflection as exile on Elba!  My goal (while in this gaol, as the English sometimes spell it) is to try to free us from the bogeymen and bogeywomen often associated with “classical” statistics. As a start, the very term “classical statistics” should, I think, be shelved, not that names should matter.


In appraising statistical accounts at the foundational level, we need to realize the extent to which accounts are viewed through the eyeholes of a mask or philosophical theory.  Moreover, the mask some wear while pursuing this task might well be at odds with their ordinary way of looking at evidence, inference, and learning. In any event, to avoid question-begging criticisms, the standpoint from which the appraisal is launched must itself be independently defended.  But for Bayesian critics of error statistics, the assumption that uncertain inference demands a posterior probability for claims inferred is thought to be so obvious as not to require support. Such critics are implicitly making assumptions that are at odds with the frequentist statistical philosophy. In particular, they assume a certain philosophy about statistical inference (probabilism), often coupled with the allegation that error statistical methods can only achieve radical behavioristic goals, wherein all that matters are long-run error rates (of some sort).

Sunday, October 30, 2011

Background Knowledge: Not to Quantify, But To Avoid Being Misled By, Subjective Beliefs


Increasingly, I am discovering that one of the biggest sources of confusion about the foundations of statistics has to do with what it means or should mean to use “background knowledge” and “judgment” in making statistical and scientific inferences. David Cox and I address this in our “Conversation” in RMM (2011); it is one of the three or four topics in that special volume that I am keen to take up.

Insofar as humans conduct science and draw inferences, and insofar as learning about the world is not reducible to a priori deductions, it is obvious that “human judgments” are involved. True enough, but too trivial an observation to help us distinguish among the very different ways judgments should enter according to contrasting inferential accounts. When Bayesians claim that frequentists do not use or are barred from using background information, what they really mean is that frequentists do not use prior probabilities of hypotheses, at least when those hypotheses are regarded as correct or incorrect, if only approximately. So, for example, we would not assign relative frequencies to the truth of hypotheses such as (1) prion transmission is via protein folding without nucleic acid, or (2) the deflection of light is approximately 1.75” (as if, as Peirce puts it, “universes were as plenty as blackberries”). How odd it would be to try to model these hypotheses as themselves having distributions: to us, statistical hypotheses assign probabilities to outcomes or values of a random variable. 

Friday, October 28, 2011

RMM-3: Special Volume on Stat Sci Meets Phil Sci

The article "Empirical Economic Model Discovery and Theory Evaluation" by Sir David Hendry has now been published in our special volume of the on-line journal, Rationality, Markets, and Morals (Special Topic: Statistical Science and Philosophy of Science: Where Do/Should They Meet?")
http://www.rmm-journal.de/downloads/Article_Hendry.pdf
Abstract: 
Economies are so high dimensional and non-constant that many features of models cannot be derived by prior reasoning, intrinsically involving empirical discovery and requiring theory evaluation. Despite important differences, discovery and evaluation in economics are similar to those of science. Fitting a pre-specified equation limits discovery, but automatic methods can formulate much more general initial models with many possible variables, long lag lengths and non-linearities, allowing for outliers, data contamination, and parameter shifts; then select congruent parsimonious-encompassing models even with more candidate variables than observations, while embedding the theory; finally rigorously evaluate selected models to ascertain their viability.

Tuesday, October 25, 2011

Thinking of Eating Meat Causes Antisocial Behavior?



Study showing that meat eaters are selfish and less social was a hoax

One of the faculty members here in The Netherlands (Richard Gill) told me about this social scientist (Diederik Stapel) who had long fabricated data purporting to provide evidence for things like: thinking of eating meat causes anti-social behavior.  He was only very recently fired.  My cynical question is: isn't there enough latitude in any data purporting to provide evidence for such claims to avoid the need for outright fabrication?

See also http://ktwop.wordpress.com/tag/diederik-stapel/

The study by three university professors claiming to show that meat eaters are "selfish bastards" is based on fraud.

The professors suggested, based on a study, that meat eaters are more selfish than vegetarians and that they are less social to compensate for their insecurity and loneliness.

The psychologists, from Radboud University Nijmegen and Tilburg University, drew these conclusions from various studies on the psychological significance of meat.

They stated that thinking of meat makes people less social and in many respects more "loutish". It also appears that people are more likely to choose meat when they feel insecure, perhaps because it gives a feeling of superiority or displays status, the researchers suggest.

Sunday, October 23, 2011

OPERA ERROR? or ...


Driven in this rather far-out pink Hummer car rental (not my idea, but cute; it takes diesel too), I quickly got to the Zurich airport.  Next stop: a workshop on error in the sciences (Lorentz Center in the Netherlands). Now last week I’d read that there was a fairly blatant error in the statistical analysis (or in the prediction) involved in the experiments on faster-than-the-speed-of-light particles by the OPERA group (Oscillation Project with Emulsion-tRacking Apparatus), but now it appears there is back-tracking on the back-tracking.  What do readers think? Can anyone update me on this?  (Hunches ok too.)

Saturday, October 22, 2011

The Will to Understand Power: Neyman’s Nursery


Way back when, although I’d never met him, I sent my doctoral dissertation, Philosophy of Statistics, to one person only: Professor Ronald Giere. (And he would read it, too!) I knew from his publications that he was a leading defender of frequentist statistical methods in philosophy of science, and that he'd worked for a time with Birnbaum in NYC.
Some ten years ago, he decided to quit philosophy of statistics (while remaining in philosophy of science): I believe that he’d had enough of a certain form of statistical exile.  He asked me if I wanted his papers—a mass of work on statistics and statistical foundations gathered over many years. Could I make a home for them? I said yes. Then came his caveat: there would be a lot of them.

Thursday, October 20, 2011

Blogging the Likelihood Principle #2: Solitary Fishing: SLP Violations

The Appendix of the "Conversation" (posted yesterday) is an attempt to quickly sketch the SLP argument and its sins.  A couple of notes: First, I am a philosopher (of science and statistics), not a statistician.  That means my treatment will show all of the typical (and perhaps annoying) signs of being a trained philosopher-logician.  I’ve no doubt statisticians would want to use different language, which is welcome.  Second, this is just a blog (although perhaps my published version is still too informal for some).

But Birnbaum’s idea for comparing evidence across different methodologies is also an informal notion!  He abbreviates by Ev(E, x) the inference, conclusion, or evidence report about the parameter μ arising from experiment E and result x, according to the methodology being applied.
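In this notation the strong likelihood principle asserts, roughly: for any two experiments E1 and E2 concerning the same parameter μ, with likelihood (density) functions f1 and f2,

\text{if } f_1(x_1^{*};\mu) = c\, f_2(x_2^{*};\mu) \ \text{ for all } \mu, \text{ with } c > 0 \text{ not depending on } \mu, \ \text{ then } \ \mathrm{Ev}(E_1, x_1^{*}) = \mathrm{Ev}(E_2, x_2^{*}).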

Tuesday, October 18, 2011

RMM: "A Conversation Between Sir David Cox & D.G. Mayo"


Published today in Rationality, Markets and Morals
Studies at the Intersection of Philosophy and Economics

 "A Statistical Scientist Meets a Philosopher of Science: A Conversation between Sir David Cox and Deborah Mayo"
(as recorded, June, 2011) 

Saturday, October 15, 2011

Objectivity #3: Clean(er) Hands With Metastatistics


I claim that all but the first of the “dirty hands” argument’s five premises are flawed. Even the first premise too directly identifies a policy decision with a statistical report. But the key flaws begin with premise 2. Although risk policies may be based on a statistical report of evidence, it does not follow that the considerations suitable for judging risk policies are the ones suitable for judging the statistical report. They are not. The latter, of course, should not be reduced to some kind of unthinking accept/reject report. If responsible, it must clearly and completely report the nature and extent of (risk-related) effects that are and are not indicated by the data, making plain how the methodological choices made in the generation, modeling, and interpreting of data raise or lower the chances of finding evidence of specific risks. These choices may be called risk assessment policy (RAP) choices.

Granted, values do arise from data interpretation, but they reflect the value of responsibly reporting the evidence of risk.  Some ethicists argue that scientists should favor public and environmental values over those of polluters, developers, and others with power. Maybe they should, but it is irrelevant. Even if one were to grant this (and it would be a matter of ethics), it still would be irresponsible (on scientific grounds) to interpret what the data indicate about the risk in the light of policy advancement, even assuming that the vulnerable parties would prefer that policy. The job of the scientist is to unearth what is and is not known about the substance, practice, or technology.

Friday, October 14, 2011

King Tut Includes ErrorStatistics in Top 50 Statblogs!

http://www.thebestcolleges.org/best-statistics-blogs/

I didn’t think our little ragtag blog-in-exile was even noticed. I’m glad to discover several other sites I was unaware of (providing yet more grist for our mills).

(Note: I am not at all happy with the way the comments are appearing here; there's insufficient space.  I will be investigating better solutions.....I'm aware of the problem.)

I will soon be departing from this cushy chateau, where even King Tut reads EGEK.

Wednesday, October 12, 2011

Objectivity #2: The “Dirty Hands” Argument for Ethics in Evidence

Some argue that generating and interpreting data for purposes of risk assessment invariably introduces ethical (and other value) considerations that might not only go beyond, but might even conflict with, the “accepted canons of objective scientific reporting.”  This thesis, which we may call the thesis of ethics in evidence and inference, is thought by some to show that an ethical interpretation of evidence may warrant violating canons of scientific objectivity, and even that a scientist must choose between norms of morality and objectivity.

The reasoning is that since the scientists’ hands must invariably get “dirty” with policy and other values, they should opt for interpreting evidence in a way that promotes ethically sound values, or maximizes public benefit (in some sense).

Monday, October 10, 2011

Objectivity 1: Will the Real Junk Science Please Stand Up?

Have you ever noticed in wranglings over evidence-based policy that it’s always one side that’s politicizing the evidence—the side whose policy one doesn’t like? The evidence on the near side, or your side, however, is solid science. Let’s call those who first coined the term “junk science” Group 1. For Group 1, junk science is bad science that is used to defend pro-regulatory stances, whereas sound science would identify errors in reports of potential risk. For the challengers—let’s call them Group 2—junk science is bad science that is used to defend the anti-regulatory stance, whereas sound science would identify potential risks, advocate precautionary stances, and recognize errors where risk is denied.

Both groups agree that politicizing science is very, very bad—but it’s only the other group that does it!

Sunday, October 9, 2011

RMM: Special Volume on Stat Sci Meets Phil Sci

Little by little the articles on Stat Sci Meets Phil Sci are appearing in "Rationality, Markets and Morals,"  online at:

http://www.rmm-journal.de/htdocs/st01.html

The article "Statistical Science and Philosophy of Science: Where Do/Should They Meet in 2011 (and Beyond)?" has now been published
 http://www.rmm-journal.de/downloads/Article_Mayo.pdf

Friday, October 7, 2011

Probability Poetry


I am reminded it is Friday, having just gotten a Skype call from friends back at Elbar; so here’s another little contest.  Each of the following statisticians provided useful help on drafts of papers I was writing in response to (Bayesian) critics.  One of them, to my surprise, attached the following poem to his remarks:

A toast is due to one who slays
Misguided followers of Bayes,
And in their heart strikes fear and terror,
With probabilities of error!

Without looking this up, guess the author:
1)   I.J. Good
2)   George Barnard
3)   Erich Lehmann
4)   Oscar Kempthorne

The first correct guess will receive an amusing picture from "the whaler" sent from Elbar.
(Note:  The author wanted me to note that this poem was to be taken in a jocular vein. )

Song Corresponding to Poem

Thursday, October 6, 2011

Blogging the (Strong) Likelihood Principle


I am guilty of not having provided the detailed responses that are owed to the several entries in Christian Robert’s blog on Mayo and Spanos (eds.), ERROR AND INFERENCE: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science (E.R.R.O.R.S.) (2010, CUP).  Today, I couldn’t resist writing a (third) follow-up comment having to do with my argument on the (strong) Likelihood Principle, even though I wasn't planning to jump into that issue on this blog just yet. Having been lured to react, and even sketch the argument, I direct interested readers to his blog:

Wednesday, October 5, 2011

Formaldehyde Hearing: How to Tell the Truth With Statistically Insignificant Results


One of the first examples I came across of problems in construing statistically insignificant (or “negative”) results was a House Science and Technology investigation of an EPA ruling on formaldehyde in the 1980s.  Investigators of the EPA (led by then-Representative Al Gore!) used rather straightforward, day-to-day reasoning: No evidence of risk is not evidence of no risk.  Given the growing interest in science and values, both in philosophy and in science and technology studies, I made the “principle” explicit. I thought it was pretty obvious, aside from my Popperian leanings. I’m surprised it’s still an issue.
The case involved the Occupational Safety and Health Administration (OSHA), and possible risks of formaldehyde in the workplace. In 1982, the new EPA assistant administrator, who had come in with Ronald Reagan, “reassessed” the data from the previous administration and, reversing an earlier ruling, announced: “There does not appear to be any relationship, based on the existing data base on humans, between exposure [to formaldehyde] and cancer” (Hearing p. 260).
The trouble was that this assertion was based on epidemiological studies that had little ability to produce a statistically significant result even if there were risks worth worrying about (according to OSHA’s standards of risks of concern, which were not in dispute).[i] The EPA’s assertion that the risks ranged from “0 to no concern” had not passed a very stringent or severe test.
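To make the point concrete, here is a minimal sketch in Python, with made-up numbers rather than the actual formaldehyde data: a study with little power to detect a risk of concern will more often than not fail to yield a statistically significant result even when that risk is present.

from scipy.stats import binom

n = 100        # hypothetical number of exposed workers
p0 = 0.01      # hypothetical background cancer rate
p1 = 0.03      # hypothetical elevated rate regarded as a risk of concern
alpha = 0.05   # one-sided significance level

# smallest case count that would be statistically significant at level alpha
k = 1
while binom.sf(k - 1, n, p0) > alpha:   # binom.sf(k-1, n, p) = P(at least k cases)
    k += 1

power = binom.sf(k - 1, n, p1)          # P(at least k cases when the true rate is p1)
print(f"significance requires at least {k} cases; power to detect p1 is {power:.2f}")

With these numbers the power is only about .35, so failing to find a statistically significant excess is weak grounds for announcing "no relationship": no evidence of risk is not evidence of no risk.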

Tuesday, October 4, 2011

Part 3: Prionvac: How the Reformers Should Have Done Their Job


Here’s how the Prionvac appraisal should have ended:

Prionvac: Our experiments yield a statistically significant increase in survival  among scrapie-infected mice who are given our new vaccine compared to infected mice who are treated with a placebo (p = .01). The data indicate H: an increased survival rate of 9 months, compared to untreated mice.

Reformer: You are exaggerating what your data show. In fact, there is a fairly high probability, more than .5, that your study would produce a p = .01 difference, even if the actual increased rate of survival were only 1 month! (That is, the power to reject the null and infer H: increase of 9 months, is more than .5.)
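As a rough numerical illustration of the Reformer's figure (the example gives no sample sizes, so the standard error below is purely hypothetical), one can compute the probability of reaching the p = .01 cutoff under various true increases in survival:

from scipy.stats import norm

alpha = 0.01                        # significance level matching the reported p = .01
se = 0.4                            # hypothetical standard error of the estimated increase (months)
cutoff = norm.ppf(1 - alpha) * se   # smallest observed increase reaching p = .01

for true_increase in (0, 1, 9):     # candidate true increases in survival (months)
    power = 1 - norm.cdf((cutoff - true_increase) / se)
    print(f"P(p <= .01 | true increase = {true_increase} months) = {power:.2f}")

With a standard error of 0.4 months, the probability of reaching p = .01 is about .57 even if the true increase is only 1 month, which is the sort of "more than .5" figure the Reformer cites.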

Monday, October 3, 2011

Part 2 Prionvac: The Will to Understand Power

As a Nietzschean, I am fond of the statistical notion of power; yet it is often misunderstood by critics of testing. Consider leaders of the reform movement in economics, Ziliak and McCloskey (Michigan, 2009).

In this post, I will adhere precisely to the text, and offer no new interpretation of tests. Type 1 and 2 errors and power are just formal notions with formal definitions.  But we need to get them right (especially if we are giving expert advice).  You can hate them; just define them correctly please.  They write:
“The error of the second kind is the error of accepting the null hypothesis of (say) zero effect when the null is in fact false, that is, when (say) such and such a positive effect is true.”

So far so good.

And the power of a test to detect that such and such a positive effect d is true is equal to the probability of rejecting the null hypothesis of (say) zero effect when the null is in fact false, and a positive effect as large as d is present.

Fine.
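In symbols (a standard textbook rendering, not a quotation from Ziliak and McCloskey), for a test T of the null hypothesis H_0 of zero effect and an alternative effect of size d:

\beta(d) = P(T \text{ accepts } H_0 \,;\, \text{true effect} = d) \quad \text{(probability of a Type II error at } d\text{)},
\mathrm{POW}(T, d) = P(T \text{ rejects } H_0 \,;\, \text{true effect} = d) = 1 - \beta(d).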

Let this alternative be abbreviated H’(d):