The Appendix of the "Conversation" (posted yesterday) is an attempt to quickly sketch the SLP argument, and its sins. A couple of notes: First, I am a philosopher (of science and statistics), not a statistician. That means my treatment will show all the typical (and perhaps annoying) signs of a trained philosopher-logician; I've no doubt statisticians would want to use different language, which is welcome. Second, this is just a blog (although perhaps my published version is still too informal for some).
But Birnbaum’s idea for comparing evidence across different methodologies is also an informal notion! He abbreviates by Ev(E, x): the inference, conclusion or evidence report about the parameter μ arising from experiment E and result x, according to the methodology being applied.
So, for sampling theory (I prefer "error statistics", but no matter), the report might be a p-value (it could also be a confidence interval with its confidence coefficient, etc.).
The strong LP is a general conditional claim:
(SLP): For any two experiments E’ and E” with different probability models but with the same unknown parameter μ, and x’ and x” data from E’ and E” respectively, where the likelihoods of x’ and x” are proportional to each other, x’ and x” ought to have identical evidential import for any inference concerning parameter μ.
For instance, E’ and E” might be Binomial sampling with n fixed, and Negative Binomial sampling, respectively. There are pairs of outcomes from E’ and E” that could serve in SLP violations. For a more extreme example, E’ might be sampling from a Normal distribution with a fixed sample size n, and E” might be the corresponding experiment that uses an optional stopping rule: keep sampling until you obtain a result 2 standard deviations away from a null hypothesis.
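To see how the Binomial/Negative Binomial pair satisfies the SLP's antecedent, here is a quick numeric check (my own sketch, with illustrative numbers not from the post): when the fixed-n Binomial experiment yields x successes in n trials, and the Negative Binomial experiment happens to reach its x-th success on trial n, the two likelihood functions of θ differ only by a constant factor.

```python
# Sketch: likelihoods from Binomial (n fixed in advance) and Negative
# Binomial (sample until the x-th success) sampling are proportional
# when both happen to yield x successes in n trials.
from math import comb

n, x = 12, 3  # illustrative: 3 successes in 12 trials

def binom_lik(theta):
    # n fixed in advance: C(n, x) * theta^x * (1-theta)^(n-x)
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def negbinom_lik(theta):
    # sample until the x-th success, arriving on trial n:
    # C(n-1, x-1) * theta^x * (1-theta)^(n-x)
    return comb(n - 1, x - 1) * theta**x * (1 - theta)**(n - x)

# The ratio is a constant free of theta, so the likelihoods are proportional:
ratios = [binom_lik(t) / negbinom_lik(t) for t in (0.1, 0.3, 0.5, 0.7)]
print(ratios)  # the same constant four times: comb(n, x)/comb(n-1, x-1) = n/x
```

The constant factor (here n/x = 4) drops out of any likelihood-ratio or posterior calculation, which is exactly why a strict likelihoodist or Bayesian treats the two outcomes as evidentially equivalent.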
Suppose we are testing the null hypothesis that μ = 0 (and for simplicity, a known standard deviation).
The SLP tells us (in relation to the optional stopping rule) that once you have observed a 2-standard deviation result, there ought to be no evidential difference between its having arisen from experiment E’, where n was fixed at 100, and experiment E” where the stopping rule happens to stop at n = 100 (i.e., it just happens that a 2-standard deviation result was observed after n = 100 trials).
The key point is that there is a difference in the corresponding p-values from E’ and E”, which we may write as p’ and p”, respectively. While p’ would be ~.05, p” would be much larger, perhaps ~.3 (the numbers do not matter). The error probability accumulates because of the optional stopping.
Clearly p’ is not equal to p”, so the two outcomes are not evidentially equivalent for a frequentist. This constitutes a violation of the strong LP (which of course is just what is proper for a frequentist).
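The accumulation of error probability under optional stopping can be checked by simulation. The following is my own rough sketch (not from the post): under the null μ = 0 with σ = 1 known, compare the chance of rejecting when n is fixed at 100 with the chance of ever crossing the 2-standard-deviation boundary when one is allowed to stop at any n up to 100.

```python
# Sketch: type I error under fixed n = 100 vs. optional stopping
# (stop as soon as the standardized sample mean exceeds 2 in absolute value),
# simulating under the null hypothesis mu = 0 with sigma = 1 known.
import random

def run_trial(n_max=100):
    """One replication; returns (ever crossed 2 SD, crossed at n = n_max)."""
    total = 0.0
    ever_crossed = False
    z = 0.0
    for n in range(1, n_max + 1):
        total += random.gauss(0, 1)
        z = total / n ** 0.5  # sample mean in SD-of-the-mean units
        if abs(z) >= 2:
            ever_crossed = True  # the optional-stopping rule would stop here
    return ever_crossed, abs(z) >= 2

random.seed(1)
reps = [run_trial() for _ in range(2000)]
p_opt = sum(e for e, _ in reps) / len(reps)  # "reject" rate, optional stopping
p_fix = sum(f for _, f in reps) / len(reps)  # reject rate, n fixed at 100
print(p_fix, p_opt)  # p_fix lands near .05; p_opt is much larger
```

The fixed-n rate stays near the nominal .05, while the optional-stopping rate is inflated several-fold, which is the sampling-distribution difference the p-values p’ and p” register and the SLP says we must ignore.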
Unless a violation of the SLP is understood, it will be impossible to understand the issue with Birnbaum's argument. Some people are forgetting that for a "sampling theory" person, evidential import must always consider the sampling distribution. This sounds awfully redundant, and it is, but given what I'm reading on some blogs, it bears repeating.
One excellent feature of Kadane's book is that he is very clear in remarking how frequentists violate the SLP.
I should note that Birnbaum himself rejected the SLP.
The SLP is a conditional (if-then claim) that makes a general assertion about any x', x" that satisfy the conditions in the antecedent. Therefore, it is false so long as there is any case where the antecedent holds and the consequent does not. Any SLP violation takes this form.
(SLP Violation): Any case of two experiments E’ and E” with different probability models but with the same unknown parameter μ, where
- x’ and x” are results from E’ and E” respectively,
- likelihoods of x’ and x” are proportional to each other
- AND YET x’ and x” have different evidential import (i.e., Ev(E',x') is not equal to Ev(E", x"))
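In bare logical notation (my own rendering, not Birnbaum's), the SLP is a universally quantified conditional, and a violation is a single counterexample to it:

```latex
% SLP as a universally quantified conditional:
\forall (E', x'),\, (E'', x''):\;
\bigl[\exists\, c > 0 \;\forall \mu:\; f_{E'}(x'; \mu) = c\, f_{E''}(x''; \mu)\bigr]
\;\Longrightarrow\; \mathrm{Ev}(E', x') = \mathrm{Ev}(E'', x'')

% An SLP violation negates the universal claim with one instance:
\exists (E', x'),\, (E'', x''):\;
\bigl[\exists\, c > 0 \;\forall \mu:\; f_{E'}(x'; \mu) = c\, f_{E''}(x''; \mu)\bigr]
\;\wedge\; \mathrm{Ev}(E', x') \neq \mathrm{Ev}(E'', x'')
```

Here f denotes the sampling density (likelihood) under the respective experiment; one counterexample suffices because the claim quantifies over all qualifying pairs.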
I’ll wait a bit to continue with this. I am traveling around different countries, so blog posts may be erratic (with possible errors you'll point out).
(Made it to Zurich and rented a car to Konstanz.)