The program of this page is suggested by Sir Ronald Fisher’s book, The Design of Experiments, Hafner Pub. Co., New York, 1966. Any mistakes in the program are mine, not Fisher’s. The test is exact.
The following paragraphs give some examples of using the program. The user is respectfully invited to try out these examples or to use any others. The only thing to remember is to follow the grammatical rules of JavaScript, because the “eval” method of JavaScript is used to pick up the data from the text area. Also, the user is respectfully reminded that integers beginning with a zero digit will be understood to be in base eight.
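To illustrate the base-eight reminder, here is a small sketch. The names evaluateInput and hasLeadingZeroInteger are hypothetical and are not part of the program of this page; the indirect call to eval is only an assumption about how such input might be evaluated as ordinary non-strict JavaScript.

```javascript
// Hypothetical helper: evaluate user text as non-strict global code.
// An indirect eval is used so that legacy base-eight literals are accepted
// even if the surrounding code happens to be in strict mode.
function evaluateInput(text) {
  return (0, eval)(text);
}

// Hypothetical check: does the text contain an integer token that starts
// with a zero digit followed by more digits (for example 010)?
function hasLeadingZeroInteger(text) {
  return /(^|[^\w.])0\d+/.test(text);
}

// 010 is read in base eight, so the first element is eight, not ten.
const octalExample = evaluateInput("[ 010, 10 ]");
console.log(octalExample, hasLeadingZeroInteger("[ 010, 10 ]"));
```

Writing 10 instead of 010 avoids the surprise.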
Which browsers?
Modern browsers such as Safari 3, Microsoft Internet Explorer 6, Netscape 7, and Opera 8 can work this page correctly. Netscape 4 is out of date and cannot work this page correctly.
Symmetry around zero.
The following array is for Darwin’s experiment using matched pairs of plants to compare self-fertilized and cross-fertilized plants.
[ 49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48 ]
Each number is a difference: cross minus self. Fisher explained that under the null hypothesis these differences are distributed symmetrically around zero. The user is respectfully invited to select and copy the array, click on the “Clear” button to clear the upper text area, paste the array into that text area, and click on the “Test for symmetry around zero” button. The answers will appear in the lower text area. Doubtless the user already knows what a “p value” is. For an explanation and example of “normalized negated log p” in Bayesian meta-analysis, I respectfully invite the user to click on martMean.htm#Discrete.
The “harmonic” is the harmonic mean of the left p-value and the right p-value. It is too small to be a frequentist p-value, but its normalized negated logarithm works correctly in Bayesian meta-analysis.
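As a minimal sketch of the arithmetic, assuming the usual definition of the harmonic mean of two numbers, the calculation looks like this. The function name harmonicP is hypothetical and is not taken from the program of this page.

```javascript
// Harmonic mean of the left and right p-values: 2 / (1/left + 1/right).
// Assumes both p-values are strictly positive.
function harmonicP(leftP, rightP) {
  return 2 / (1 / leftP + 1 / rightP);
}

// Illustrative inputs only, not results from any data set.
console.log(harmonicP(0.25, 0.5));
```

The harmonic mean always lies between the two p-values, nearer to the smaller one.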
Fisher’s test for Darwin’s data is non-parametric, and all it does is use convolution. There are no assumptions of normality and no ranking. However, the numbers in the array must be integers. The program is speedy for Darwin’s data, because his data have only one or two digits to each number and only 15 numbers in the array. For 15 numbers with many digits each, the required time can be several seconds. With more such many-digit numbers, the time can be prohibitive. With only one or two digits to each number, the user can have many more numbers in the array.
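The convolution idea can be sketched as follows. This is my own illustration under stated assumptions, not the code of this page: the function name signFlipTest is hypothetical, the left p-value is taken to be the chance of a sum no greater than the observed sum, and the right p-value the chance of a sum no less than it, when every difference keeps its size but takes a plus or minus sign with equal probability.

```javascript
// Exact sign-flip distribution of the sum, built one number at a time.
// Assumes every entry of diffs is an integer.
function signFlipTest(diffs) {
  const sizes = diffs.map(d => Math.abs(d));
  const total = sizes.reduce((s, a) => s + a, 0);
  const observed = diffs.reduce((s, d) => s + d, 0);

  // counts[s + total] = number of sign assignments whose sum equals s
  let counts = new Array(2 * total + 1).fill(0);
  counts[total] = 1; // before any number is used, the only sum is zero
  for (const a of sizes) {
    const next = new Array(2 * total + 1).fill(0);
    for (let i = 0; i < counts.length; i++) {
      if (counts[i] === 0) continue;
      next[i + a] += counts[i]; // this number taken with a plus sign
      next[i - a] += counts[i]; // this number taken with a minus sign
    }
    counts = next;
  }

  const all = Math.pow(2, diffs.length);
  let leftCount = 0;
  let rightCount = 0;
  for (let i = 0; i < counts.length; i++) {
    const s = i - total;
    if (s <= observed) leftCount += counts[i];
    if (s >= observed) rightCount += counts[i];
  }
  return { left: leftCount / all, right: rightCount / all };
}

const darwin = [49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48];
console.log(signFlipTest(darwin));
```

In this sketch the working array has one cell for every possible sum, so its length grows with the sum of the absolute values. That is consistent with the remark above that many-digit numbers take much longer than one-digit or two-digit numbers.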
The rankAbs function.
It is a bit tedious not being allowed to use nonintegers in the array. Also, Fisher’s algorithm takes a long time to run. Furthermore, big outliers in the array can bring down the power. Wilcoxon’s signed ranks test is designed to overcome these three difficulties. The user need not do the ranking by human effort, because I furnish a function to do it. It is called “rankAbs.” It can handle ties correctly, just as the Fisher algorithm can. Here is how to use it on Darwin’s sample:
rankAbs( [ 49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48 ] )
As before, the user is respectfully invited to select and copy that line, click on the “Clear” button to clear the upper text area, paste the line into that text area, and click on the “Test for symmetry around zero” button. The p-values are slightly different, and so are the times.
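For readers curious about what ranking the absolute values involves, here is a rough sketch. It is not the rankAbs function of this page: the name signedMidranks is hypothetical, and giving tied values the average of their ranks is an assumption about how ties are handled. The real function may, for example, rescale the ranks so that they stay integers.

```javascript
// Replace each value by the rank of its absolute value, keeping its sign.
// Tied absolute values share the average of the ranks they occupy.
function signedMidranks(values) {
  const order = values
    .map((v, i) => ({ size: Math.abs(v), index: i }))
    .sort((p, q) => p.size - q.size);
  const ranks = new Array(values.length);
  let start = 0;
  while (start < order.length) {
    // Find the end of the run of equal absolute values.
    let end = start;
    while (end + 1 < order.length && order[end + 1].size === order[start].size) {
      end++;
    }
    const mid = (start + end) / 2 + 1; // average of ranks start+1 .. end+1
    for (let k = start; k <= end; k++) {
      const idx = order[k].index;
      ranks[idx] = Math.sign(values[idx]) * mid;
    }
    start = end + 1;
  }
  return ranks;
}

console.log(signedMidranks([0.3, 3.3, 3.3, -3.3]));
```

Because only the order of the absolute values matters, a big outlier receives the largest rank and nothing more, whatever its size.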
Here is a fictitious example, shown first without the rankAbs function:
[ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, -4096 ]
and then with the rankAbs function:
rankAbs( [ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, -4096 ] )
The p-values are entirely different, and so are the times.
Now let’s have some fractions and ties:
rankAbs( [ 0.3, 1/9, 3.3, 3.3, -3.3, 48.909, 4.89, .0321, 22/7 ] )
The user is respectfully invited to try it.
Let the null hypothesis assert that if we take a value at random from the first population and a value at random from the second population, then the probability that the second is greater than the first is 1/2, and the probability that the second is less than the first is 1/2, and ties cannot occur. This is a much different hypothesis from identity of treatments. Let us suppose that we have sampled some values from the two populations.
If we have just as many values from the first population as from the second population, then we can randomly match them into pairs, each pair having one value from each of the two populations. Let us then score +1 when the second value is greater than the first value, and -1 when the second is less than the first. By null hypothesis they cannot tie. The famous sign test can be used. There is nothing new about this.
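The sign test on such +1 and -1 scores can be sketched as follows. The function name signTestRightP is hypothetical, and the sketch computes only a right-tail p-value: the chance of at least as many +1 scores as were observed when each score is +1 or -1 with probability 1/2.

```javascript
// Right-tail p-value of the sign test for an array of +1 and -1 scores.
function signTestRightP(scores) {
  const n = scores.length;
  const plus = scores.filter(s => s > 0).length;
  let coeff = 1; // binomial coefficient C(n, 0)
  let tail = 0;  // number of sign patterns with at least `plus` pluses
  for (let k = 0; k <= n; k++) {
    if (k >= plus) tail += coeff;
    coeff = coeff * (n - k) / (k + 1); // C(n, k + 1) from C(n, k)
  }
  return tail / Math.pow(2, n);
}

// Four pluses out of five: (5 + 1) / 32
console.log(signTestRightP([1, 1, 1, 1, -1])); // 0.1875
```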
If instead we have more values from the second population than from the first, things are more complicated. Some of the values from the first population must have two partners instead of one. When that happens, let the value from the first population be called X, and let the two values from the second population be called Y and Z. Then there are four possibilities for the orders of the values. Here is a table showing them and their probabilities:
| | Y<X | Y>X | Total |
| Z<X | a | b | 1/2 |
| Z>X | c | d | 1/2 |
| Total | 1/2 | 1/2 | 1 |
The scoring should, of course, be done by a computer, not by a human. For example, let the values in the first sample be 1.2, 4.3, 5.7, 0.9, and 3.2 . Let the values in the second sample be 5.3, 7.2, 8.4, 6.7, 11.5, 9.8, and 5.9 . (There is not any need for these values to be integers.) Also, we need an arbitrary nonnegative six-digit integer to be the seed of the randomness for the random matching, so let us use 132743 . (If the user’s seed is negative or noninteger, it will be silently changed.) I furnish a JavaScript function called “twoSample” to convert these into the kind of array needed by the program of this file. Here is what to put into the upper text area:
twoSample( [1.2, 4.3, 5.7, 0.9, 3.2], [5.3, 7.2, 8.4, 6.7, 11.5, 9.8, 5.9], 132743 )
The user is respectfully invited to try it. And, yes, twoSample can correctly handle the case where the first sample has more values than the second sample, instead of fewer.
twoSample( [1.2, 4.3, 5.7, 0.9, 3.2], [5.3, 7.2, 8.4, 6.7, 11.5, 9.8, 5.9, 1e6, 5.8, 13.0, 10.7, 10.6, 8.1], 132743 )
The user is respectfully invited to try it.
1, 2, 4, 3, 5, 6, 9,
with measurements made after the middle time:
8, 8, 10, 9, 6, 8, 5
in such a way that each pair has the same time difference. For example, we pair 1 with 8, 2 with 8, 4 with 10 and so on. Then we subtract measurements in the same pair, greater time minus smaller time. (Cox and Stuart use only the sign of the difference.) “We” means the computer, of course; a function called “trend” is called to do all the pairing and subtracting. Here is the call:
trend( [1, 2, 4, 3, 5, 6, 9, 8, 8, 10, 9, 6, 8, 5] )
The user is respectfully invited to try it. If the measurements are not integers, or if some appear to be outliers, then a function called “rank” can be used:
trend( rank( [1, 2, 4, 3, 5, 6, 9, 8, 8, 10, 9, 6, 8, 5] ) )
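The pairing and subtracting described above can be sketched like this. It is not the trend function of this page: the name trendSketch is hypothetical, and dropping the middle measurement when the count is odd is an assumption borrowed from the usual Cox and Stuart convention.

```javascript
// Pair each measurement in the first half with the one the same distance
// into the second half, and return later minus earlier for each pair.
function trendSketch(measurements) {
  const n = measurements.length;
  const half = Math.floor(n / 2);
  const offset = n - half; // for odd n this skips the middle measurement
  const diffs = [];
  for (let i = 0; i < half; i++) {
    diffs.push(measurements[i + offset] - measurements[i]);
  }
  return diffs;
}

console.log(trendSketch([1, 2, 4, 3, 5, 6, 9, 8, 8, 10, 9, 6, 8, 5]));
// [ 7, 6, 6, 6, 1, 2, -4 ]
```

The resulting array of differences is the kind of array that the test for symmetry around zero expects.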
var x=[]; for(var j=0;j<10;j++)x[j]=j; x;
The user is respectfully invited to try it.
smtw2gh at gmail dot com
Harold Kaplan’s statistics.htm