jsBonfer
Table of Contents
Which browsers?
jsBonfer
A fictitious example
Dr. Arbuthnot’s sample
What if more females?
A two-dimensional problem
More than three variables
Regression
A simple Markov chain
A less simple Markov chain
Second thoughts about Dr. Arbuthnot
Grue and bleen and black swans
Bonferroni’s method with possibly-unequal weights
Discreteness of sample space
Run a program
Indebtedness
Bibliography
License, revision date, and e-mail address

Which browsers?

Firefox 2, Safari 3, Netscape 7, Opera 8, and their successors display this page correctly. Microsoft Internet Explorer 7 cannot display this page correctly, because it cannot show the big “cup” (union) operator, but it does run the programs correctly. JavaScript must be enabled, or else the programs cannot run.


jsBonfer

Every statistics teacher warns us against constructing the formula(s) of a hypothesis test by looking at the sample, and all students and all statisticians disobey the teacher. See, for example, Babyak (2004). The students and the statisticians are unknowingly doing multiple inference, so I use a Bonferroni correction with infinitely many weights to help them. The present file considers the special case where a sample of numbers or vectors from a discrete sample space is given, and a student or statistician looks at that sample and then chooses a suitable likelihood quotient formula for a likelihood ratio test of a simple null hypothesis against a simple alternative hypothesis. (The quotients for the individual observations are multiplied together to get the ratio.) If the Bonferroni-corrected ratio of the alternative likelihood to the null likelihood is large, then the null hypothesis is rejected. This uses Doob's (1953) remark that, under the null hypothesis, a sequence of likelihood ratios is a nonnegative martingale, together with his inequality for nonnegative martingales.

Doob defines "martingale" on his page 91. The remark is his Example 3, on his page 93. His inequality for nonnegative martingales is THEOREM 3.2 on his page 314. For an easier-looking proof of the inequality, see pages 235-236 of Feller (1966), who mentions likelihood ratios on his page 211 and defines martingales on his page 210. Another good place to look for the inequality is pages 524-526 of Loève (1960).
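For readers who want the inequality written out, here is the form it takes in the present setting. The symbols q, L_n, and c are introduced here only for illustration and are not Doob's or Feller's notation. Let q(x_i) be the likelihood quotient for the i-th observation, and let L_n be the running likelihood ratio. If the null expectation of each quotient is 1, then under the null hypothesis L_n is a nonnegative martingale with L_0 = 1, and the maximal inequality gives, for every c > 0,

$$
L_n = \prod_{i=1}^{n} q(x_i), \qquad L_0 = 1, \qquad
\Pr\Big( \sup_{n \ge 0} L_n \ge c \Big) \le \frac{1}{c}.
$$

Taking c = 1/α, the null hypothesis is wrongly rejected with probability at most α, whenever the ratio is examined. In this sense 1/L_n serves as an upper bound on the p-value.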

The reader is respectfully warned that (1) all of the examples on the present page have very small p-values before correction, (2) the present page does not help at all in finding parameters or models, and (3) a little knowledge of JavaScript (or Java or C or C++ or C#) is needed to use the programs on this page.

All but one of the examples are fictitious. That one is Dr. Arbuthnot’s sample.

The name “jsBonfer” means “JavaScript and Bonferroni.”


A fictitious example

Here is a simple example:
[
[1,3,1,3,1,3,1,1,2,2,1,2,1,3,1,4,3,3,1,1,2,1,2,1,1,2,2,2,1,1,2,1,3,1,1,1,3,1,2,1,1,3,2,4,5,2,2,2,1,2],
"x"
]
The row of 50 numbers is the given sample; these numbers are the values of x. Perhaps the teacher says that the null hypothesis asserts a Poisson distribution with λ equal to 1. Having looked at the sample, the student sees that there are no zeroes, so it does not look like any kind of Poisson. Maybe the alternative probability expression should have a factor of x in it. That works because a Poisson distribution with λ equal to 1 has mean value 1. The sum over all x of x times the Poisson probability is therefore one, so x times that Poisson probability is a probability expression. Then the quotient formula is "x", where the quote marks are mandatory.
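The normalization argument and the product of quotients can be checked numerically. The following is a minimal JavaScript sketch and is not part of jsBonfer. The helper name poisson1 is hypothetical, and the infinite Poisson sum is truncated at x = 100, an assumption that drops only a negligible tail.

```javascript
// Poisson probability with lambda = 1 at a nonnegative integer x,
// computed as exp(-1) / x! by repeated division to avoid huge factorials.
function poisson1(x) {
  let p = Math.exp(-1);
  for (let k = 1; k <= x; k++) p /= k;
  return p;
}

// Null expectation of the quotient "x": the sum over x of x * P_null(x).
// Truncated at x = 100; the omitted tail is far below double precision.
let expectedQuotient = 0;
for (let x = 0; x <= 100; x++) expectedQuotient += x * poisson1(x);

// The 50 observed values from the example above.
const sample = [1,3,1,3,1,3,1,1,2,2,1,2,1,3,1,4,3,3,1,1,2,1,2,1,1,2,2,2,1,1,
                2,1,3,1,1,1,3,1,2,1,1,3,2,4,5,2,2,2,1,2];

// With the quotient "x", the likelihood ratio is just the product of the values.
const ratio = sample.reduce((product, x) => product * x, 1);

console.log(expectedQuotient); // very close to 1
console.log(ratio);            // 51597803520
console.log(1 / ratio);        // upper bound on the uncorrected p-value, about 1.9e-11
```

The product is 2^15 · 3^9 · 4^2 · 5, because the sample contains fifteen 2s, nine 3s, two 4s and one 5.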

I respectfully invite the reader to select and copy the above array, including all the rows and square brackets, to move the mouse to the upper text area, to click on the “Clear” button if needed, to paste into the upper text area, and to click on the “jsBonfer” button. Four numbers will appear in the second text area. The first is the length of the formula, but I call it formulaLength because length is already in use in JavaScript.

This formulaLength is used to find the second number, the Bonferroni weight, by calculating 1/Math.pow(base,formulaLength) * (1-geometricR)/geometricR * Math.pow(geometricR,formulaLength), where base is 128 and geometricR is 128/129. I have partly copied this idea from Solomonoff (1960). The idea is that there are 128 possible formulas having exactly one character, 128*128 having exactly two characters, and so on. (A formula having exactly zero characters is useless to us.) Then the total Bonferroni weight assigned to all formulas having exactly formulaLength characters is (1-geometricR)/geometricR * Math.pow(geometricR,formulaLength). The reader sees that these totals add up to unity as formulaLength runs through the whole numbers from one to infinity; we are merely summing a geometric series. I chose geometricR to be 128/129 because this choice simplifies the Bonferroni weight to 1 / 128 / Math.pow(129,formulaLength).
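The weight formula, its simplified form, and the geometric series can be checked side by side. This is a sketch under the stated choices base = 128 and geometricR = 128/129. The function names are hypothetical and are not taken from the jsBonfer source.

```javascript
// Bonferroni weight exactly as written in the text.
function bonferroniWeight(formulaLength) {
  const base = 128;
  const geometricR = 128 / 129;
  return 1 / Math.pow(base, formulaLength)
       * (1 - geometricR) / geometricR
       * Math.pow(geometricR, formulaLength);
}

// The simplified closed form: 1 / 128 / 129^formulaLength.
function bonferroniWeightSimplified(formulaLength) {
  return 1 / 128 / Math.pow(129, formulaLength);
}

// Total weight shared by all 128^L formulas of length L.
function totalWeightOfLength(formulaLength) {
  const geometricR = 128 / 129;
  return (1 - geometricR) / geometricR * Math.pow(geometricR, formulaLength);
}

// Partial sum of the geometric series over lengths 1..maxLength.
function partialSum(maxLength) {
  let s = 0;
  for (let L = 1; L <= maxLength; L++) s += totalWeightOfLength(L);
  return s;
}

console.log(bonferroniWeight(25), bonferroniWeightSimplified(25)); // both about 1.343e-55
console.log(partialSum(5000)); // approaches 1 as maxLength grows
```

Each extra character in a formula divides its weight by 129. In other words, each character costs a factor of 129 in the corrected p-value.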

The Bonferroni weight is used to correct the upper bound on the uncorrected p-value: the corrected bound is the uncorrected bound divided by the weight. Here the uncorrected bound is 1 over the product of the individual likelihood quotients for the individual values of x. The third number is this upper bound on the uncorrected p-value. The fourth number is the upper bound on the Bonferroni-corrected p-value.
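The path from quotients to the four reported numbers can be sketched in a few lines. The helper bonferroniSummary is hypothetical and is not jsBonfer's own code. As a check, it is fed the inputs of the example in the section on discreteness of the sample space, whose printed output appears there.

```javascript
// Turn the individual likelihood quotients and the formula length into the
// four numbers described in the text.
function bonferroniSummary(quotients, formulaLength) {
  // Bonferroni weight, in the simplified form 1 / 128 / 129^formulaLength.
  const weight = 1 / 128 / Math.pow(129, formulaLength);
  // Likelihood ratio: the product of the individual quotients.
  const ratio = quotients.reduce((product, q) => product * q, 1);
  // Upper bound on the uncorrected p-value.
  const uncorrected = 1 / ratio;
  // Correct the bound by dividing it by the Bonferroni weight.
  // A result above 1 is still a valid bound, but it says nothing.
  const corrected = uncorrected / weight;
  return { formulaLength, weight, uncorrected, corrected };
}

// One quotient of 1e300 and a formula of 25 characters.
const summary = bonferroniSummary([1e300], 25);
console.log(summary); // weight about 1.343e-55, corrected about 7.447e-246
```

For very long samples, the product could overflow or underflow in double precision. Summing logarithms of the quotients would avoid that, but it is not needed for the examples on this page.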

The reader will notice that we are not using a fixed value of n, the size of the sample. Since the likelihood ratio is a martingale under the null hypothesis, we may use optional stopping.
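A small simulation illustrates why optional stopping is allowed. This sketch is not part of jsBonfer, and all of its settings are assumptions chosen for illustration. The null hypothesis is a fair coin. The alternative puts probability 0.7 on 1, so the quotient is 1.4 at 1 and 0.6 at 0, and its null expectation is 1. Each simulated sequence stops and rejects as soon as the running likelihood ratio reaches 1/α. Doob's inequality says that the fraction of null sequences that ever reject is at most α. The small seeded generator mulberry32 is used only to make the run reproducible.

```javascript
// mulberry32: a small seeded pseudo-random generator returning floats in [0, 1).
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6D2B79F5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fraction of null-generated sequences whose running likelihood ratio ever
// reaches 1/alpha within maxSteps observations, stopping at the first hit.
function falseRejectionRate(alpha, paths, maxSteps, seed) {
  const rand = mulberry32(seed);
  let rejections = 0;
  for (let i = 0; i < paths; i++) {
    let ratio = 1;
    for (let n = 0; n < maxSteps; n++) {
      const x = rand() < 0.5 ? 1 : 0;    // data drawn from the null (fair coin)
      ratio *= x === 1 ? 1.4 : 0.6;       // alternative-to-null quotient
      if (ratio >= 1 / alpha) {           // optional stopping: reject and stop
        rejections++;
        break;
      }
    }
  }
  return rejections / paths;
}

const rate = falseRejectionRate(0.05, 10000, 500, 12345);
console.log(rate); // should not exceed 0.05, apart from simulation noise
```

The observed rate usually falls somewhat below α, because the ratio tends to overshoot 1/α at the step where it crosses.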

The reader is respectfully warned that JavaScript may read an integer beginning with a zero digit as base 8; for example, 010 means 8. Numbers with more than one leading zero before the decimal point, such as 00.5, are not legal at all. This applies to both data and programs.

The formula between the quote marks must be legal JavaScript, except that methods of the Math object need not have their Math. prefix. I mean sin, cos, exp, log, pow, sqrt, floor, round, abs, and the like. I did this by using JavaScript's with(Math) statement. The characters used in the formula must have code numbers between zero and 127 inclusive. Any character outside this range will cause an alert.
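One way to build such a formula evaluator is sketched below. This is a guess at the approach, not jsBonfer's actual source. The names compileQuotient and quotientsFor are hypothetical, and the default variable names x, y, z follow the convention used on this page. A function made by the Function constructor is not in strict mode unless its body asks for it, so the legacy with statement is allowed inside it.

```javascript
// Compile a quoted quotient formula into a function of the row variables.
// Characters must have codes 0..127, and Math methods such as abs, sqrt,
// and pow may be written without the "Math." prefix.
function compileQuotient(formula, variables = ["x", "y", "z"]) {
  for (let i = 0; i < formula.length; i++) {
    const code = formula.charCodeAt(i);
    if (code > 127) {
      throw new Error("character code " + code + " at position " + i + " is out of range");
    }
  }
  return new Function(...variables, "with (Math) { return (" + formula + "); }");
}

// Evaluate the formula once per column of the data rows and collect the quotients.
function quotientsFor(rows, formula, variables) {
  const f = compileQuotient(formula, variables);
  const quotients = [];
  for (let j = 0; j < rows[0].length; j++) {
    quotients.push(f(...rows.map(row => row[j])));
  }
  return quotients;
}

// Two columns from the regression example later on this page.
const q = quotientsFor([[129, 2], [449, 7]], "abs(4*x*(1-x/1e3)-y)<=1?1e3/3:0");
console.log(q); // both quotients are 1e3/3: both points lie within 1 unit of the curve
```

Passing a different variables array, such as ["p", "q", "r", "s"], corresponds to the global called variables that is described in the four-variable example.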

Hypotheses: The null hypothesis asserts that the probability at x is given by the Poisson distribution with λ equal to 1. The alternative hypothesis asserts that the probability at x is x multiplied by the probability given by that Poisson distribution.


Dr. Arbuthnot’s sample

All students of statistical inference know about “An argument for Divine Providence, taken from the constant Regularity observ’d in the Births of both Sexes,” by Dr. John Arbuthnott, Physitian in Ordinary to Her Majesty, and Fellow of the College of Physitians and the Royal Society (Phil. Trans. (1710) 27, 186-90). Dr. Arbuthnot looked at all the birth records of the city of London, 82 years’ worth, and found that in every year there were more live male births than live female births. He was greatly astonished, and he used the binomial distribution to calculate the probability of such a thing in the sample if male and female had equal probabilities in the population. He said that the answer was the 82nd power of one half.

However, he looked at the sample before he decided to see how many years had more live male births. What if, instead, there had been more females in each year? Then he would have used a different hypothesis test, would he not? Let us be modern now. I use 1 to represent a year with more male births, and 0 to represent a year with more female births. The null hypothesis says that the probability at 0 is 1/2 and the probability at 1 is 1/2. The alternative hypothesis says that the probability at 0 is 0 and the probability at 1 is 1. The quotient will be 0 at 0 and 2 at 1. That is, the quotient is "2*x". The quote marks in "2*x" are required.

[
[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
"2*x"
]
The p-value for Dr. Arbuthnot’s sample is still astonishing, but much less astonishing than he thought. We are not using his binomial distribution, but we are now permitted to do optional stopping.

The user might get the impression that the correction has reduced the power, or the efficiency, of the test nearly to zero. Things are not that bad. Let us drop 29 of the 82 years and skip the correction:

var nn=82-29;
var temp=[];
for( var jj=0;jj<nn;jj++ )temp[jj]=1;
[
temp,
"2*x"
]
The user is respectfully invited to select and copy this program, and so on. The Bonferroni corrected p-value for 82 years is a little smaller than the uncorrected p-value for 82-29 years.
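The comparison can also be made directly, without pasting into the page. This is a minimal sketch using the weight formula 1 / 128 / 129^formulaLength stated earlier; the helper name weight is hypothetical.

```javascript
// Bonferroni weight for a formula with the given number of characters.
const weight = formulaLength => 1 / 128 / Math.pow(129, formulaLength);

// All 82 years give the quotient 2 under the 3-character formula "2*x".
const uncorrected82 = Math.pow(2, -82);
const corrected82 = uncorrected82 / weight("2*x".length);

// 82 - 29 = 53 years, with no correction at all.
const uncorrected53 = Math.pow(2, -53);

console.log(corrected82);                 // about 5.7e-17
console.log(uncorrected53);               // about 1.1e-16
console.log(corrected82 < uncorrected53); // true
// The correction costs log2(1/weight) bits, here about 28.03 years of data.
console.log(Math.log2(1 / weight(3)));
```

Because each year is worth one bit, the 3-character formula costs a little more than 28 years. This is why 29 years is the break-even point.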

Hypotheses: The null hypothesis asserts that the probabilities at 1 and at 0 are 1/2 and 1/2. The alternative hypothesis asserts that the probabilities at 1 and at 0 are respectively 1 and 0.


What if more females?

What if Dr. Arbuthnot had found more female live births for each of the 82 years? Then he would have put 82 zeroes in the x row. He would say that his alternative hypothesis placed probability 1 at 0 and probability 0 at 1, but his null hypothesis would as before place 1/2 at 0 and 1/2 at 1. His quotient would then be 2 at 0 and 0 at 1. This is "2-x*2". So we use the array
[
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
"2-x*2"
]
The user is respectfully invited as before. The Bonferroni-corrected p-value for 82 female years is much bigger than that for 82 male years, by a factor of 129*129 = 16641, because "2-x*2" is two characters longer than "2*x". It is still impressively small.

Hypotheses: The null hypothesis asserts that the probabilities at 1 and at 0 are 1/2 and 1/2. The alternative hypothesis asserts that the probabilities at 1 and at 0 are respectively 0 and 1.


A two-dimensional problem

The previous samples had only one dimension. Here is a fictitious sample in two dimensions. The first row of numbers is the x values of the points, and the second row is the y values.
[
[144,96,54,47,51,240,42,32,215,136,224,80,176,208,31,224,144,158,240,89,29,29,48,160,116,160,96,224,192,43,240,96,74,227,240,128,48,192,162,123,192,115,80,179,107,22,240,28,73,128,191,15,57,95,58,240,96,176,112,176,224,224,192,240,85,89,41,32,176,141],
[2,47,208,208,176,136,176,240,224,192,89,15,46,150,160,219,12,192,92,176,80,192,40,35,192,73,18,32,156,48,220,66,128,240,50,40,22,116,224,128,38,128,57,208,160,128,49,128,144,41,208,48,160,96,160,101,68,101,17,107,106,179,98,65,208,128,224,0,28,176],
"max(x,y)%16==0?16:0"
]
The values in the x row and the y row seem to lie on the integers between 0 and 255 inclusive, but the (x,y) points cannot be uniformly distributed in the square, as one might at first have thought, because the larger of x and y is always divisible by 16. Fewer than 1/16 of all the lattice points of the square have this property. The reader is respectfully invited to select and copy, and so on, as before.

Hypotheses: The null hypothesis asserts uniformity over the integers in a square with edge size 256, where the x and y are integers between 0 and 255 inclusive. The alternative hypothesis asserts uniformity over only those points of the square whose larger co-ordinate is divisible by 16. That is, the alternative probability on such a point is more than 16 times the null probability on such a point.
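The phrase "more than 16 times" can be checked by counting lattice points. This is a minimal sketch and is not part of jsBonfer. It counts the support of the alternative hypothesis, computes the exact quotient on that support, and computes the null expectation of the conservative quotient 16 used in the formula.

```javascript
// Count the lattice points (x, y), 0 <= x, y <= 255, whose larger coordinate
// is divisible by 16: the support of the alternative hypothesis.
let supportSize = 0;
for (let x = 0; x < 256; x++) {
  for (let y = 0; y < 256; y++) {
    if (Math.max(x, y) % 16 === 0) supportSize++;
  }
}
const total = 256 * 256;

// Exact alternative-to-null quotient on a supported point.
const exactQuotient = total / supportSize;

// Null expectation of the quotient 16 used in "max(x,y)%16==0?16:0".
const expectedConservative = 16 * supportSize / total;

console.log(supportSize);          // 3856
console.log(exactQuotient);        // about 17.0, which is more than 16
console.log(expectedConservative); // below 1, so using 16 keeps the test conservative
```

The count is 3856 because exactly 2m + 1 lattice points have larger coordinate m, and summing 2m + 1 over m = 0, 16, ..., 240 gives 3856.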


More than three variables

The sample of this section is fictitious, but it is inspired by a shocking demonstration that I saw on the Web. The reader is respectfully invited to click on http://www.cs.pitt.edu/~kirk/cs1501/animations/Random.html and run the “3D” demonstration. Tilting the 3D graph with the mouse will show the Marsaglia planes. Returning to my fictitious sample, here are four rows, each containing 85 whole numbers from 0 to 8 inclusive.
variables=[ "p","q","r","s" ];
[
[3,6,4,6,1,0,3,2,0,8,4,0,2,7,2,6,5,8,0,5,7,6,2,1,2,4,2,5,4,5,7,4,0,2,8,4,3,5,6,0,8,7,1,1,6,2,2,5,4,5,5,4,7,1,6,4,2,2,8,8,1,2,6,3,1,8,1,6,8,4,3,4,5,8,4,3,7,4,4,1,5,2,6,1,0],
[1,5,6,3,0,2,3,4,6,6,7,3,7,3,5,0,5,8,0,5,8,3,4,7,8,0,6,3,3,3,6,7,2,7,8,5,2,0,2,2,8,0,7,6,3,4,1,2,0,8,6,4,6,0,4,2,7,6,2,3,0,3,8,4,8,1,1,6,2,7,4,6,7,6,5,8,4,8,6,2,1,0,6,3,8],
[8,7,3,6,3,1,0,1,0,0,3,7,6,3,3,3,7,0,4,6,3,5,8,2,2,2,5,6,6,2,3,4,2,5,6,1,3,4,5,8,2,0,4,3,3,6,3,7,3,7,4,1,3,4,4,8,3,4,4,2,3,5,3,1,2,0,4,6,2,8,4,5,7,3,7,2,4,6,4,8,8,4,6,7,1],
[6,0,5,3,5,6,3,2,3,4,4,8,3,5,8,0,1,2,5,2,0,4,4,8,6,3,5,4,5,8,2,3,5,4,5,8,1,0,5,8,0,2,6,8,6,6,3,4,2,7,3,0,2,4,4,4,6,6,4,5,5,8,1,1,7,0,3,0,6,8,7,3,8,1,2,5,3,0,4,7,4,3,0,7,0],
"(p+q+r+s)%9==0?9:0"
]
Could one think that the 85 points are uniformly distributed in a four-dimensional hypercubical lattice? After all, the digits seem to be nearly uniformly distributed in each row, and each pair of rows seems to show nearly zero correlation. However, let no one be deceived: each vertical sum of four digits is exactly divisible by nine. That is, only one-ninth of the available points are in use, so their probability is nine times what one might have thought, and the probability of the other points is zero.

The new difficulty here is that the English alphabet has no letter after "z", so the program provides no variable name for a fourth row. The user will instead choose her/his own variable names, each exactly one letter of course, and tell the program about them. The letters chosen here are "p", "q", "r", and "s". A JavaScript statement preceding the array of data loads the global variable named variables with an array containing these names. Do please remember to put a semicolon at the end of that statement.

Hypotheses: The null hypothesis asserts that the probabilities for all the points in the hypercubical lattice have the same value. The alternative hypothesis asserts that points whose coordinates sum to an exact multiple of 9 have all the probability, nine times as much as they would in the null hypothesis, and the other points have zero probability.
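The factor of nine can be confirmed by brute force. This is a minimal sketch and is not part of jsBonfer. It counts the points of the 9 × 9 × 9 × 9 lattice whose coordinates sum to a multiple of 9. It also checks the first five columns of the sample above against the rule.

```javascript
// Count the lattice points in {0,...,8}^4 whose coordinates sum to a multiple of 9.
let hits = 0;
let total = 0;
for (let p = 0; p <= 8; p++) {
  for (let q = 0; q <= 8; q++) {
    for (let r = 0; r <= 8; r++) {
      for (let s = 0; s <= 8; s++) {
        total++;
        if ((p + q + r + s) % 9 === 0) hits++;
      }
    }
  }
}

// Null expectation of the quotient 9: nine times the null probability of a hit.
const nullExpectation = (9 * hits) / total;

// The first five columns of the sample, as (p, q, r, s).
const firstColumns = [[3, 1, 8, 6], [6, 5, 7, 0], [4, 6, 3, 5], [6, 3, 6, 3], [1, 0, 3, 5]];
const firstColumnsFollowRule = firstColumns.every(
  c => c.reduce((a, b) => a + b, 0) % 9 === 0);

console.log(hits, total);            // 729 6561
console.log(nullExpectation);        // 1
console.log(firstColumnsFollowRule); // true
```

For every choice of p, q and r there is exactly one s between 0 and 8 that makes the sum a multiple of 9, so exactly one point in nine is allowed.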


Regression

The reader who plots the first and second rows of the following (fictitious) array will see that the curve appears to be parabolic. Indeed, it appears to be y=4*x*(1-x/1e3).
[
[129,125,561,103,743,325,892,492,529,686,552,241,574,499,247,830,719,291,610,541,232,38,709,588,369,155,172,492,105,2,740,960,468,321,61,148,42,917,986,552,175,853,442,665,382,648,223,232,205,264,651,165,549,156,134],
[449,437,985,369,763,877,385,999,996,861,989,731,978,999,743,564,808,825,951,993,712,146,825,969,931,523,569,999,375,7,769,153,995,871,229,504,160,304,55,989,577,501,986,891,944,912,693,712,651,777,908,551,990,526,464],
"abs(4*x*(1-x/1e3)-y)<=1?1e3/3:0"
]
Let the alternative hypothesis assert that the conditional likelihood for y given x is uniform on the integers between 4*x*(1-x/1e3)-1 and 4*x*(1-x/1e3)+1 inclusive. (There may be either two or three such integers, and assuming three makes the more conservative test.) Let the null hypothesis assert that the conditional likelihood for y given x is uniform on the integers between zero and 999 inclusive. The jsBonfer method will not know that we are using a conditional quotient instead of an unconditional one. The reader is respectfully invited to change to different quoted formulas to see what happens. It is legal to pick the formula most favorable to us, because we are doing a multiple inference with the help of the Bonferroni correction.

Hypotheses: The null hypothesis asserts that the conditional likelihood for y given x is uniform on the integers between zero and 999, inclusive. The alternative hypothesis asserts that the conditional likelihood for y given x is uniform on the integers within 1 unit of 4*x*(1-x/1e3), inclusive.
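Two claims in this section can be checked with a short sketch, which is not part of jsBonfer. The first is that every observed point lies within 1 unit of the curve. The second is that the quotient 1e3/3 never has null expectation above 1. For each x, that expectation is the number of allowed integers y times (1/1e3) times (1e3/3), which is the count divided by 3.

```javascript
// Data from the regression example: the x row and the y row.
const xs = [129,125,561,103,743,325,892,492,529,686,552,241,574,499,247,830,719,291,610,541,232,38,709,588,369,155,172,492,105,2,740,960,468,321,61,148,42,917,986,552,175,853,442,665,382,648,223,232,205,264,651,165,549,156,134];
const ys = [449,437,985,369,763,877,385,999,996,861,989,731,978,999,743,564,808,825,951,993,712,146,825,969,931,523,569,999,375,7,769,153,995,871,229,504,160,304,55,989,577,501,986,891,944,912,693,712,651,777,908,551,990,526,464];

// Center of the alternative's window for a given x.
const center = x => 4 * x * (1 - x / 1e3);

// Does every observed point lie within 1 unit of the curve?
const allInside = xs.every((x, i) => Math.abs(center(x) - ys[i]) <= 1);

// For every possible x, count the integers y in 0..999 within 1 unit of the
// curve. A window of width 2 holds at most three integers.
let maxCount = 0;
for (let x = 0; x <= 999; x++) {
  let count = 0;
  for (let y = 0; y <= 999; y++) {
    if (Math.abs(center(x) - y) <= 1) count++;
  }
  maxCount = Math.max(maxCount, count);
}

console.log(xs.length, allInside); // 55 true
console.log(maxCount / 3);         // largest null expectation of the quotient: at most 1
```

Near the edges of the range (for example x = 0 or x = 500), part of the window falls outside 0..999. This only lowers the null expectation further.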


A simple Markov chain

Every Markov chain problem can be converted into a regression problem by using the data more than once. Here is a fictitious sample: 175, 170, 145, 20, 103, 164, 115, 47, 61, 131, 127, 107, 7, 38, 16, 83, 64, 146, 25, 128, 112, 32, 163, 110, 22, 113, 37, 11, 58, 116, 52, 86, 79, 44, 46, 56, 106, 2, 13, 68, and 166. After some trial and error this is seen to be y==(5*x+3)%177 where x is the previous value and y is the present value. This is exactly true, without noise, for the numbers in the sample. To make the array, first use all the numbers except the rightmost, and then use all the numbers except the leftmost:
[
[175,170,145,20,103,164,115,47,61,131,127,107,7,38,16,83,64,146,25,128,112,32,163,110,22,113,37,11,58,116,52,86,79,44,46,56,106,2,13,68],
[170,145,20,103,164,115,47,61,131,127,107,7,38,16,83,64,146,25,128,112,32,163,110,22,113,37,11,58,116,52,86,79,44,46,56,106,2,13,68,166],
"y==(5*x+3)%177?177:0"
]

Hypotheses: The null hypothesis asserts that the numbers are distributed uniformly on the discrete domain from zero to 176, inclusive. The alternative hypothesis asserts that given x we must have y==(5*x+3)%177, and the probability of any other value of y must be zero.
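The construction of the two rows, and the claim that the rule holds without noise, can be checked with a short sketch that is not part of jsBonfer. The corrected bound uses the weight formula 1 / 128 / 129^formulaLength stated earlier on this page.

```javascript
// The 41 observed values of the simple chain, in order.
const chain = [175, 170, 145, 20, 103, 164, 115, 47, 61, 131, 127, 107, 7, 38, 16,
               83, 64, 146, 25, 128, 112, 32, 163, 110, 22, 113, 37, 11, 58, 116,
               52, 86, 79, 44, 46, 56, 106, 2, 13, 68, 166];

// Row x: all values except the rightmost. Row y: all values except the leftmost.
const xs = chain.slice(0, -1);
const ys = chain.slice(1);

// Does every transition follow y == (5*x+3) % 177 exactly?
const allFollow = xs.every((x, i) => ys[i] === (5 * x + 3) % 177);

// Each transition contributes the quotient 177: the null puts 1/177 on each
// value, and the alternative puts probability 1 on the predicted value.
const formula = "y==(5*x+3)%177?177:0";
const uncorrected = Math.pow(177, -xs.length);
const corrected = uncorrected * 128 * Math.pow(129, formula.length);

console.log(xs.length, allFollow);   // 40 true
console.log(uncorrected, corrected); // roughly 1e-90 and 3e-46
```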


A less simple Markov chain

Some people might say that the preceding example is unrealistic: it is too simple and has no noise. Here is a (fictitious) less simple, noisier chain: 717, 521, 715, 522, 713, 523, 711, 524, 709, 525, 707, 526, 705, 527, 703, 528, 701, 529, 699, 530, 697, 531, 695, 532, 693, 533, 691, 534, 689, 535, 687, 536, 685, 537, 683, 538, 681, 539, 679, 540, 677, 541, 675, 542, 673, 543, 671, 544, 669, 545, 667, and 546. This requires three rows instead of only two. The top row lacks the two rightmost numbers. The middle row lacks one number on the left and one on the right. The bottom row lacks the two leftmost numbers. These rows are, as usual, called x, y, and z. The z row is the present, the y row is one step before the present, and the x row is two steps before the present. It seems that z is nearly .99*x+.01*y. Maybe we ought to allow one unit of error. Here is the array to use:
[
[717,521,715,522,713,523,711,524,709,525,707,526,705,527,703,528,701,529,699,530,697,531,695,532,693,533,691,534,689,535,687,536,685,537,683,538,681,539,679,540,677,541,675,542,673,543,671,544,669,545],
[521,715,522,713,523,711,524,709,525,707,526,705,527,703,528,701,529,699,530,697,531,695,532,693,533,691,534,689,535,687,536,685,537,683,538,681,539,679,540,677,541,675,542,673,543,671,544,669,545,667],
[715,522,713,523,711,524,709,525,707,526,705,527,703,528,701,529,699,530,697,531,695,532,693,533,691,534,689,535,687,536,685,537,683,538,681,539,679,540,677,541,675,542,673,543,671,544,669,545,667,546],
"abs(.99*x+.01*y-z)<=1?1e3/3:0"
]

Hypotheses: The null hypothesis asserts that the conditional distribution of z given y and x is uniform on the integers from zero to 999. The alternative hypothesis asserts that the conditional distribution of z given y and x is uniform on the integers not farther from .99*x+.01*y than 1 unit.
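Building the lagged rows by hand is error-prone, so here is a small sketch of a general helper. The name lagRows is hypothetical and is not part of jsBonfer. For a chain of order k it returns k + 1 rows, with the last row holding the present values. With k = 1 it reproduces the two rows of the simple chain. Here it is used with k = 2 to check that every z lies within 1 unit of .99*x+.01*y.

```javascript
// Build the lagged rows for a Markov chain of the given order: row k holds
// chain values k, k+1, ..., so each column is (x, y, z, ...) and the last
// row is the present value.
function lagRows(chain, order) {
  const rows = [];
  for (let k = 0; k <= order; k++) {
    rows.push(chain.slice(k, chain.length - order + k));
  }
  return rows;
}

// The 52 observed values of the less simple chain, in order.
const chain = [717, 521, 715, 522, 713, 523, 711, 524, 709, 525, 707, 526, 705,
               527, 703, 528, 701, 529, 699, 530, 697, 531, 695, 532, 693, 533,
               691, 534, 689, 535, 687, 536, 685, 537, 683, 538, 681, 539, 679,
               540, 677, 541, 675, 542, 673, 543, 671, 544, 669, 545, 667, 546];

const [xs, ys, zs] = lagRows(chain, 2);

// Largest distance between z and the predicted value .99*x+.01*y.
let maxError = 0;
for (let i = 0; i < zs.length; i++) {
  maxError = Math.max(maxError, Math.abs(0.99 * xs[i] + 0.01 * ys[i] - zs[i]));
}

console.log(xs.length, xs[0], zs[zs.length - 1]); // 50 717 546
console.log(maxError);                            // about 0.94, within the allowed 1 unit
```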


Second thoughts about Dr. Arbuthnot

It occurs to me now that Dr. Arbuthnot did some of the statistical calculation in his head, namely, finding out for each year whether there were more male births or more female births. Really we ought to account for all the calculation. Here is an array showing for each column the year, the number of male births, and the number of female births.
[
[1629, 1630, 1631, 1632, 1633, 1634, 1635, 1636, 1637, 1638, 1639, 1640, 1641, 1642, 1643, 1644, 1645, 1646, 1647, 1648, 1649, 1650, 1651, 1652, 1653, 1654, 1655, 1656, 1657, 1658, 1659, 1660, 1661, 1662, 1663, 1664, 1665, 1666, 1667, 1668, 1669, 1670, 1671, 1672, 1673, 1674, 1675, 1676, 1677, 1678, 1679, 1680, 1681, 1682, 1683, 1684, 1685, 1686, 1687, 1688, 1689, 1690, 1691, 1692, 1693, 1694, 1695, 1696, 1697, 1698, 1699, 1700, 1701, 1702, 1703, 1704, 1705, 1706, 1707, 1708, 1709, 1710],
[5218, 4858, 4422, 4994, 5158, 5035, 5106, 4917, 4703, 5359, 5366, 5518, 5470, 5460, 4793, 4107, 4047, 3768, 3796, 3363, 3079, 2890, 3231, 3220, 3196, 3441, 3655, 3668, 3396, 3157, 3209, 3724, 4748, 5216, 5411, 6041, 5114, 4678, 5616, 6073, 6506, 6278, 6449, 6443, 6073, 6113, 6058, 6552, 6423, 6568, 6247, 6548, 6822, 6909, 7577, 7575, 7484, 7575, 7737, 7487, 7601, 7909, 7662, 7602, 7676, 6985, 7263, 7632, 8062, 8426, 7911, 7578, 8102, 8031, 7765, 6113, 8366, 7952, 8379, 8239, 7840, 7640],
[4683, 4457, 4102, 4590, 4839, 4820, 4928, 4605, 4457, 4952, 4784, 5332, 5200, 4910, 4617, 3997, 3919, 3395, 3536, 3181, 2746, 2722, 2840, 2908, 2959, 3179, 3349, 3382, 3289, 3013, 2781, 3247, 4107, 4823, 4881, 5681, 4858, 4319, 5322, 5560, 5829, 5719, 6061, 6120, 5822, 5738, 5717, 5847, 6203, 6033, 6041, 6299, 6533, 6744, 7158, 7127, 7246, 7119, 7214, 7101, 7167, 7302, 7392, 7316, 7483, 6647, 6713, 7229, 7767, 7626, 7452, 7061, 7514, 7656, 7683, 5738, 7779, 7417, 7687, 7623, 7380, 7288],
"y>z?2:0"
]
The reader sees that the formula, "y>z?2:0", is somewhat longer than what was used before for Dr. Arbuthnot’s data, but the null hypothesis is still rejected.
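How much the longer formula costs can be seen by comparing corrected bounds. This sketch assumes that all 82 years contribute the quotient 2. It covers the three formulas used in this page's Arbuthnot sections; the second of them belongs to the hypothetical all-female sample. Only the formula length differs, and the weight formula is the one stated earlier on this page.

```javascript
// Upper bound on the Bonferroni-corrected p-value for each formula,
// when each of the 82 years contributes the quotient 2.
const formulas = ["2*x", "2-x*2", "y>z?2:0"];
const corrected = {};
for (const f of formulas) {
  corrected[f] = Math.pow(2, -82) * 128 * Math.pow(129, f.length);
}
console.log(corrected); // roughly 5.7e-17, 9.5e-13, and 1.6e-8

// Each extra character multiplies the bound by 129.
console.log(corrected["y>z?2:0"] / corrected["2*x"]); // 129 to the 4th power
```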


Grue and bleen and black swans

Let no reader suppose that Dr. Arbuthnot’s hypothesis test permits him to predict that there will be more males than females born in the year 1711. The difficulty is that the years used in his hypothesis test were not chosen at random from the whole population of possible years. Therefore his inference applies only to the years he used, and to no others. For a theoretical attack against the idea of prediction, see Goodman (2006). For a practical attack against the practice of prediction, see Taleb (2007). Dr. Arbuthnot’s data form a Markov chain of order zero. All Markov chains have the same difficulty as his.


Bonferroni’s method with possibly-unequal weights

Experienced statisticians know this proof, but beginners might not. I am certainly not its inventor, but I do not know who is.

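Let $\alpha$ be a real number strictly between 0 and 1. For some integers $j$ let real numbers $w_j$ be strictly positive, with $\sum_j w_j \le 1$. For the same integers $j$ let real numbers $p_j$ be p-values of hypothesis tests, so that under the null hypothesis $\Pr(p_j < t) \le t$ for every $t$ between 0 and 1. Then, using Boole's inequality for the first "$\le$",

$$
\begin{aligned}
\Pr\Big( \bigcup_j \{ p_j / w_j < \alpha \} \Big)
  &= \Pr\Big( \bigcup_j \{ p_j < \alpha w_j \} \Big) \\
  &\le \sum_j \Pr( p_j < \alpha w_j ) \\
  &\le \sum_j \alpha w_j \\
  &= \alpha \sum_j w_j \\
  &\le \alpha .
\end{aligned}
$$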



Discreteness of sample space

To some readers it may not be clear at first why the sample space must be discrete. (To me it was not clear at first.) To see what can go wrong, let the sample space be the open interval from zero to unity. Let the null hypothesis assert uniformity of probability DENSITY on that interval. Let the sample contain only the number .6, so the sample size is one. Let the alternative hypothesis assert that the probability DENSITY is uniform on the open interval from .6-.5e-300 to .6+.5e-300. Then the array to use is
[
[.6],
"abs(x-.6)<.5e-300?1e300:0"
]
I respectfully invite the reader to select and copy the above array, including all the rows and square brackets, to move the mouse to the upper text area, to click on the “Clear” button if needed, to paste into the upper text area, and to click on the “jsBonfer” button. The results will be
formulaLength=25   Bonferroni weight=1.3429111288008964e-55
uncorrected p<=1e-300   Bonferroni p<=7.446509143854616e-246
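That Bonferroni-corrected p-value is of course ridiculous. With a probability density instead of a probability, the alternative can be squeezed into an arbitrarily short interval around the observed value. The quotient can then be made as large as we like, while the formula stays only a few characters long.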


Run a program

While I was writing this page, I needed to run many little programs, and I built the “Run a program” button to run them. After I finished, I left the button for users wishing to practice their JavaScript. For example, to divide 2 by 3, please select and copy
2/3
and move the mouse to the upper text area, and click on the “Clear” button, and paste into the text area, and click on the “Run a program” button. The answer will be printed in the second text area.

To add up the whole numbers from 17 to 53 inclusive, please select and copy

var sum=0;
for( var jj=17;jj<=53;jj++ )sum+=jj;
sum;
and move the mouse to the upper text area as before, and click on the “Clear” button, and paste into the text area, and click on the “Run a program” button. The answer will be printed in the second text area, as before.

Of course, the user’s own programs can be typed directly into the upper text area. Users who have not programmed in JavaScript before are warned that it has the third worst diagnostics in all computing.


Indebtedness

I am indebted to a paper by Rissanen (1983) for telling me about the idea of description length and for showing the need to have a fixed number of fractional digits in the binary representation of each datum. I have simplified this to the need to have discreteness of the sample space.

I am indebted to the book by Miller (1966) for my knowledge of the Bonferroni method.


Bibliography

Arbuthnot, John (1710), “An argument for Divine Providence, taken from the constant Regularity observ’d in the Births of both Sexes,” Philosophical Transactions of the Royal Society, Volume 27, pages 186-190. This is also on the Web at http://www.taieb.net/auteurs/Arbuthnot/arbuth.html

Babyak, Michael A. (2004), “What You See May Not Be What You Get: A Brief, Nontechnical Introduction to Overfitting in Regression-Type Models,” Psychosomatic Medicine, Volume 66, pages 411-421. This is also on the Web at http://www.psychosomaticmedicine.org/cgi/content/full/66/3/411

Doob, J. L. (1953), Stochastic Processes, John Wiley & Sons, Inc., New York, London, Sydney.

Feller, William (1966), An Introduction to Probability Theory and Its Applications, Volume II, John Wiley & Sons, Inc., New York.

Goodman, Nelson (2006), Fact, Fiction, and Forecast, Fourth Edition, Harvard University Press. For just grue and bleen see http://www-math.mit.edu/~tchow/grue.html

Loève, Michel (1960), Probability Theory, Second Edition, D. van Nostrand Company, Inc., Princeton, New Jersey.

Miller, Rupert G., Jr. (1966), Simultaneous Statistical Inference, McGraw-Hill Book Company, New York, San Francisco, St. Louis, London, Toronto, Sydney.

Rissanen, J. (1983), “A Universal Prior for Integers and Estimation by Minimum Description Length,” Annals of Statistics, Volume 11, Number 2, pages 416-431. This is also on the Web at http://projecteuclid.org/euclid.aos/1176346150

Solomonoff, R. J. (1960), “A Preliminary Report on a General Theory of Inductive Inference.” This is on the Web at http://world.std.com/~rjs/z138.pdf

Taleb, Nassim Nicholas (2007), The Black Swan: The Impact of the Highly Improbable, Random House.




License, revision date, and e-mail address

The data of Dr. Arbuthnot are perhaps copyrighted by the Philosophical Transactions, or perhaps by Elisabeth Millet, the transcriber. The remainder of the present file is in the public domain. It was last revised on 31 May 2008. Constructive and destructive remarks may be sent to me, Harold Kaplan, at
smtw2gh at toadmail dot com

