
Table of contents

Introduction and the goConfide button
The goPairs button
Rules to keep in mind
Which browsers?
Download
The idea and the algorithm
Advantages and disadvantages
A note to developers
Revision date, licenses, and e-mail address

Introduction and the goConfide button

If the Java applet components appear misaligned, please set the “zoom” to 100% or reset the zoom to zero, depending on the browser. The zoom control is commonly in the dropdown “view” menu.

If the user’s browser can work with Java™, and if Java has been downloaded and installed, and if Java is turned on for this browser, then the user will see a text area, which I will call the upper text area, then a button named “goConfide,” then a button named “goPairs,” and then another text area, which I will call the lower text area. The present page is for working k-sample tests in statistics. Here are four fictitious samples, one to a line:

-.01 .02 .1 .2 .1 .2 .1 .22 .11
.25 .3 .4 .3 .4 .3 .4 .5
.6 .7 .6 .7 .6 .7 .6 .7 .6 .7 8.8 99.7
8.9 102.03 105 104 103 106.7 110.05
I respectfully suggest that the user select those four lines with the mouse, copy them to the clipboard, move the mouse to the upper text area, paste, and click on the “goConfide” button. In less than a second the program will print
	There are 4 samples, each on its own line.

	The (two-sided) p-value is 

	0.015625
in the lower text area. If the user has, say, 5% in mind for an α value, then the null hypothesis is rejected, because 0.015625 is less than 0.05. That is, it seems that no number can be a median of all four populations from which the four samples were taken.

Numbers on the same line of the upper text area belong to the same sample. Numbers on the same line may be separated by one or more spaces or tabs or any mixture of spaces and tabs. Blank lines are permitted, and they have no meaning.
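The input rules above can be sketched in Java as follows. This is only an illustration and not the applet's own parser: the class name SampleParserSketch and the method parseSamples are hypothetical, and the use of Double.parseDouble is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class SampleParserSketch {
    /** Splits the text into samples, one sample per non-blank line. */
    static List<double[]> parseSamples(String text) {
        List<double[]> samples = new ArrayList<>();
        for (String line : text.split("\\R")) {          // any line terminator
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;                                 // blank lines have no meaning
            }
            String[] tokens = trimmed.split("[ \\t]+");   // spaces, tabs, or a mixture
            double[] sample = new double[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                sample[i] = Double.parseDouble(tokens[i]);
            }
            samples.add(sample);
        }
        return samples;
    }

    public static void main(String[] args) {
        String text = "-.01 .02 .1\n\n.25\t.3  .4\n";
        List<double[]> samples = parseSamples(text);
        System.out.println("There are " + samples.size() + " samples.");
    }
}
```

Here the blank line in the middle of the text is skipped, so the two remaining lines give two samples of three numbers each.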

The goPairs button

It is all very well to say that one population is different, but the researcher needs to know which populations are different from which other populations. That is what the other button, the “goPairs” button, is for. I respectfully invite the reader to click with the mouse on the “goPairs” button. (I am assuming that the upper text area is unchanged from the previous section.) Then the lower text area will change to
	a		b		p_value (two-sided)

	0		2		0.015625
	0		1		0.03125
	1		2		0.03125
	0		3		0.0625
	1		3		0.0625
	2		3		0.0625
The six p-values are the result of a multiple inference. The p-value for comparing the zeroth and second samples is 0.015625, and the p-value for comparing the zeroth and first samples is 0.03125, and so on. The table is printed in order of p-values with the smaller at the top and the larger at the bottom. It is an odd thing that the smallest p-value is the same as the p-value found in the previous section. This always happens to me in my numerical trials of the programs, and I do not know why, and I cannot find a proof or a counterexample.

Rules to keep in mind

The numbers in the upper text area may have decimal points and exponents, and they may be negative. They must satisfy the usual format rules of the Java language. Mistakes in number format will be diagnosed in the lower text area.
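As a rough illustration of these format rules, the sketch below checks a few tokens with Double.parseDouble, which is part of the standard Java library. Whether the applet uses exactly this call is an assumption, and the class name NumberFormatSketch, the method diagnose, and the wording of the message are hypothetical.

```java
class NumberFormatSketch {
    /** Returns null if the token parses as a double, otherwise a short diagnosis. */
    static String diagnose(String token) {
        try {
            Double.parseDouble(token);
            return null;
        } catch (NumberFormatException e) {
            return "Not a number: " + token;
        }
    }

    public static void main(String[] args) {
        // The first three tokens are accepted; the last two are not.
        String[] tokens = {"-.01", "1e3", "-2.5E-4", "1,5", "ten"};
        for (String token : tokens) {
            String problem = diagnose(token);
            System.out.println(token + " -> " + (problem == null ? "ok" : problem));
        }
    }
}
```

Note that a decimal comma, as in 1,5, is not accepted by Double.parseDouble.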

Which browsers?

Any major modern browser will work correctly with this page, but only if its “zoom” is set exactly on 100% or reset to zero, depending on the browser. The zoom control is commonly in the dropdown “view” menu.

However, the program can work only if Java 1.5 or higher is installed, and only if Java is turned on. If there is no Java or if Java has been turned off, then the text areas and the “goConfide” button and the “goPairs” button will not even be visible. In that case, the user is respectfully requested to download and install Java and/or to turn Java back on. Users who have trouble doing this are respectfully asked to get help from their classmates, children, spouses, or teachers.

Download

A reader or user wishing to download the files, including the source files, of this applet is respectfully invited to click on KSampleTest.jar to save the “jar” file. All the other KSampleTest files are zipped into it, so it can be unzipped after download to see and change them all. Yes, a “jar” file is merely a special kind of “zip” file.

If the browser’s downloader renames the “jar” file to a “zip” file or a “txt” file, please be sure to rename it back to a “jar”. I know some browsers which take too much on themselves in this way.

The idea and the algorithm

The idea of the present inference is simple. Having the sign test in mind, for each sample we build a two-sided confidence interval for the median of its population distribution. (This is of course a multiple inference, so the confidence of each interval must be adjusted for the multiplicity.) If the intersection of the intervals is empty, then we reject the null hypothesis. I do not claim to have invented this idea, but I do not know to whom to give the credit.
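The idea can be sketched with the usual order-statistic interval of the sign test. This is an assumption about the construction and is not taken from the applet, which works with p-values instead: for a sample of size n, the interval from the r-th smallest to the r-th largest number covers a median with probability at least 1 - 2*P(B <= r-1), where B is binomial with n trials and probability 1/2. The names MedianIntervalSketch, lowerTail, interval, and rejects are hypothetical.

```java
import java.util.Arrays;

class MedianIntervalSketch {
    /** P(B <= j) for B binomial(n, 1/2), from row n of Pascal's triangle. */
    static double lowerTail(int n, int j) {
        double[] row = new double[n + 1];
        row[0] = 1.0;
        for (int m = 1; m <= n; m++) {
            for (int i = m; i >= 1; i--) row[i] += row[i - 1]; // row m from row m-1, in place
        }
        double partial = 0.0, total = 0.0;
        for (int i = 0; i <= n; i++) {
            total += row[i];
            if (i <= j) partial += row[i];
        }
        return partial / total;
    }

    /** Returns {low, high}; both ends are infinite if the sample is too small. Assumes tailAlpha < 0.5. */
    static double[] interval(double[] sample, double tailAlpha) {
        double[] sorted = sample.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        int r = 0; // how many order statistics we may step in from each end
        while (r < (n + 1) / 2 && lowerTail(n, r) <= tailAlpha) r++;
        if (r == 0) {
            return new double[] {Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY};
        }
        return new double[] {sorted[r - 1], sorted[n - r]};
    }

    /** True if no number lies in all k intervals. */
    static boolean rejects(double[][] samples, double alpha) {
        double tailAlpha = alpha / (2.0 * samples.length); // Bonferroni: k intervals, two tails each
        double maxLow = Double.NEGATIVE_INFINITY, minHigh = Double.POSITIVE_INFINITY;
        for (double[] sample : samples) {
            double[] ci = interval(sample, tailAlpha);
            maxLow = Math.max(maxLow, ci[0]);
            minHigh = Math.min(minHigh, ci[1]);
        }
        return maxLow > minHigh; // empty intersection
    }

    public static void main(String[] args) {
        double[][] samples = {
            {-.01, .02, .1, .2, .1, .2, .1, .22, .11},
            {.25, .3, .4, .3, .4, .3, .4, .5},
            {.6, .7, .6, .7, .6, .7, .6, .7, .6, .7, 8.8, 99.7},
            {8.9, 102.03, 105, 104, 103, 106.7, 110.05}
        };
        System.out.println(rejects(samples, 0.05)); // prints true
        System.out.println(rejects(samples, 0.01)); // prints false
    }
}
```

With the four fictitious samples from the introduction, the intervals have an empty intersection at α = 0.05 but not at α = 0.01, which agrees with the p-value 0.015625 reported there.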

The algorithm is less simple. We desire a p-value. We begin by forming Pascal’s triangle. Then we take partial sums in each row. Then we normalize each row by dividing by its right-most partial sum, which is 2^n for row n, so that the right-most number is unity. Each row is then a list of one-sided p-values, namely the cumulative probabilities of a binomial distribution with n trials and success probability 1/2. Then we multiply every number in every row by 2*k, where k is the number of samples. This is the Bonferroni adjustment, because there are k confidence intervals, each two-sided.
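The table described above might be built as in the following sketch. The class name AdjustedTableSketch and the method adjustedTable are hypothetical, and the products are not capped at 1 here, because the text does not say whether the applet caps them.

```java
class AdjustedTableSketch {
    /**
     * table[n][j] is 2*k times the j-th normalized partial sum of row n
     * of Pascal's triangle, for n = 0..maxN.
     */
    static double[][] adjustedTable(int maxN, int k) {
        double[][] table = new double[maxN + 1][];
        double[] pascalRow = {1.0};                      // row 0 of Pascal's triangle
        for (int n = 0; n <= maxN; n++) {
            double[] row = new double[n + 1];
            double running = 0.0;
            for (int j = 0; j <= n; j++) {               // partial sums
                running += pascalRow[j];
                row[j] = running;
            }
            double total = row[n];                       // equals 2^n
            for (int j = 0; j <= n; j++) {               // normalize, then Bonferroni
                row[j] = 2.0 * k * row[j] / total;
            }
            table[n] = row;

            double[] next = new double[n + 2];           // next row of the triangle
            next[0] = 1.0;
            next[n + 1] = 1.0;
            for (int j = 1; j <= n; j++) {
                next[j] = pascalRow[j - 1] + pascalRow[j];
            }
            pascalRow = next;
        }
        return table; // doubles overflow for rows much beyond n = 1000
    }

    public static void main(String[] args) {
        double[][] table = adjustedTable(12, 4);
        System.out.println(table[9][0]); // prints 0.015625
    }
}
```

With k = 4 samples, the first entry of row 9 is 8/512 = 0.015625, the value that appears as the p-value in the introduction.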

Next put all the numbers of all the samples together in a one-dimensional array of double, with an integer in an array of int to show which sample each number came from. Then “sort” the numbers in the array of double without actually moving them, by calling the SortPointer.sortPointer method. It is now possible to see where numbers are tied. Numbers tied together form what I will call a “clump.” Each number not tied forms a clump all by itself.
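The pooling, the pointer sort, and the clumps can be sketched as below. This is not the code of SortPointer.java, which I have not reproduced; the signature of sortPointer here, the class name ClumpSketch, and the method clumpSizes are assumptions made for illustration.

```java
import java.util.Arrays;

class ClumpSketch {
    /** Returns pointer[] with values[pointer[0]] <= values[pointer[1]] <= ...; values is not moved. */
    static int[] sortPointer(double[] values) {
        Integer[] boxed = new Integer[values.length];
        for (int i = 0; i < boxed.length; i++) boxed[i] = i;
        Arrays.sort(boxed, (a, b) -> Double.compare(values[a], values[b]));
        int[] pointer = new int[boxed.length];
        for (int i = 0; i < pointer.length; i++) pointer[i] = boxed[i];
        return pointer;
    }

    /** Sizes of the clumps (runs of tied numbers), from the smallest value to the largest. */
    static int[] clumpSizes(double[] values, int[] pointer) {
        int[] sizes = new int[pointer.length];
        int clumps = 0;
        for (int r = 0; r < pointer.length; r++) {
            if (r > 0 && values[pointer[r]] == values[pointer[r - 1]]) {
                sizes[clumps - 1]++;       // tied with the previous number: same clump
            } else {
                sizes[clumps++] = 1;       // a new clump begins
            }
        }
        return Arrays.copyOf(sizes, clumps);
    }

    public static void main(String[] args) {
        double[][] samples = {{0.1, 0.2, 0.1}, {0.2, 0.3}};
        int total = 0;
        for (double[] sample : samples) total += sample.length;
        double[] values = new double[total];   // all numbers of all samples
        int[] sampleOf = new int[total];       // which sample each number came from
        int next = 0;
        for (int s = 0; s < samples.length; s++) {
            for (double x : samples[s]) {
                values[next] = x;
                sampleOf[next] = s;
                next++;
            }
        }
        int[] pointer = sortPointer(values);
        System.out.println(Arrays.toString(pointer));                     // prints [0, 2, 1, 3, 4]
        System.out.println(Arrays.toString(clumpSizes(values, pointer))); // prints [2, 2, 1]
        for (int p : pointer) {
            System.out.println(values[p] + " from sample " + sampleOf[p]);
        }
    }
}
```

In this small made-up example the two 0.1 values form one clump, the two 0.2 values (one from each sample) form another, and 0.3 forms a clump all by itself.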

Now try out all possible positions of the common median. It can be to the left of all clumps, on a clump, between two clumps, or to the right of all clumps. We move the common median from left to right through all these positions. At each position we find all the p-values, one for each sample. Then we take the minimum of these p-values for this position. Then we take the biggest of all these minima. This is the required grand p-value of the problem.

The above algorithm is for the “goConfide” button. It uses all the samples. If we use the “goPairs” button, things are a little bit different. Let us have a boolean array called permission. If all the samples are in use, then all get true permission, but if only two samples are in use, then only those two get true and the others get false. In a fixed position, removing samples from use cannot lower the minimum p-value in that position but only raise it or leave it where it was. This is because the table of Bonferroni-adjusted p-values is not changed. (We are doing a multiple inference.) Therefore the biggest minimum, over all positions, can get bigger but not smaller.
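The whole procedure of this section might be sketched as follows. This is a reconstruction under stated assumptions and not the code of KSample.java. In particular, how the p-value of one sample at one position is obtained is my assumption: count the numbers strictly below and strictly above the position, take the larger count c, and look up the Bonferroni-adjusted entry n - c in row n. A number lying exactly on the position counts for neither side. The names KSampleSketch, adjusted, minAtPosition, and pValue are hypothetical, the adjusted values are not capped at 1, and the Pascal row is rebuilt at every lookup only to keep the sketch short.

```java
import java.util.Arrays;
import java.util.TreeSet;

class KSampleSketch {
    /** 2*k times the j-th normalized partial sum of row n of Pascal's triangle. */
    static double adjusted(int n, int j, int k) {
        double[] row = new double[n + 1];
        row[0] = 1.0;
        for (int m = 1; m <= n; m++) {
            for (int i = m; i >= 1; i--) row[i] += row[i - 1]; // row m from row m-1, in place
        }
        double partial = 0.0, total = 0.0;
        for (int i = 0; i <= n; i++) {
            total += row[i];
            if (i <= j) partial += row[i];
        }
        return 2.0 * k * partial / total;
    }

    /** Minimum p-value over the permitted samples with the median on m, or just to the right of m. */
    static double minAtPosition(double[][] samples, boolean[] permission, double m, boolean onClump) {
        double min = Double.POSITIVE_INFINITY;
        for (int s = 0; s < samples.length; s++) {
            if (!permission[s]) continue;
            int below = 0, above = 0;
            for (double x : samples[s]) {
                if (x < m || (!onClump && x == m)) below++;
                else if (x > m) above++;
            }
            int n = samples[s].length;
            // k stays samples.length even for pairs: the table is not changed.
            min = Math.min(min, adjusted(n, n - Math.max(below, above), samples.length));
        }
        return min;
    }

    /** Biggest, over all positions, of the minimum p-value among the permitted samples. */
    static double pValue(double[][] samples, boolean[] permission) {
        TreeSet<Double> clumps = new TreeSet<>();
        for (double[] sample : samples) for (double x : sample) clumps.add(x);
        // Negative infinity stands for the position to the left of all clumps.
        double best = minAtPosition(samples, permission, Double.NEGATIVE_INFINITY, false);
        for (double v : clumps) {
            best = Math.max(best, minAtPosition(samples, permission, v, true));  // on the clump
            best = Math.max(best, minAtPosition(samples, permission, v, false)); // just right of it
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] samples = {
            {-.01, .02, .1, .2, .1, .2, .1, .22, .11},
            {.25, .3, .4, .3, .4, .3, .4, .5},
            {.6, .7, .6, .7, .6, .7, .6, .7, .6, .7, 8.8, 99.7},
            {8.9, 102.03, 105, 104, 103, 106.7, 110.05}
        };
        boolean[] all = new boolean[samples.length];
        Arrays.fill(all, true);
        System.out.println(pValue(samples, all)); // prints 0.015625
        for (int a = 0; a < samples.length; a++) {
            for (int b = a + 1; b < samples.length; b++) {
                boolean[] permission = new boolean[samples.length]; // all false
                permission[a] = true;
                permission[b] = true;
                System.out.println(a + "\t" + b + "\t" + pValue(samples, permission));
            }
        }
    }
}
```

With the four fictitious samples from the introduction, this sketch gives 0.015625 for all four samples together and the same six pairwise p-values as the “goPairs” table, although it prints the pairs in the order 0-1, 0-2, 0-3, 1-2, 1-3, 2-3 instead of sorting them by p-value.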

Advantages and disadvantages

The chief advantage of the present test is its null hypothesis:
The numbers in each sample are drawn independently at random from that sample’s population. That population is infinite. There exists a number which is a common median for all the different populations.
Here different samples may have different population distributions. In particular, the distributions may be heteroskedastic. For example, the variances, if any, may be different. (Also, the samples may have different sizes, but sample size is not part of the null hypothesis.) The population distributions may be continuous or discrete or singular or any mixtures of these. The population distributions may be asymmetric. The samples may be dependent on one another. That is, the null hypothesis of the present test is weaker than the null hypothesis of any other k-sample test that I know about.

The present test is conservative or exact if this weak null hypothesis is true.

The chief disadvantage of the present test is its low power. Tests assuming stronger null hypotheses have more power. However, those tests may not be what they seem. Most users of statistics know that the null hypothesis of the Kruskal-Wallis “ranks” test asserts identity of the population distributions. Fewer users know that the null hypothesis of the Brown-Mood “median” test asserts the very same thing: identity of the population distributions. Both tests behave incorrectly in the presence of heteroskedasticity.

A note to developers

Statisticians who do not plan to change or repair the program need not read this. The attention of developers is respectfully drawn to the line
final boolean debugging=false;
in the KSampleTest.java file. The debugging variable is used in the try/catch statement
		try
		{
		parse();
		}

		catch( Throwable thro )
		{
		String temp="";
		temp+=thro.getClass().getCanonicalName();
		temp+="\n\n";
		temp+=thro.getMessage();
		temp+="\n";
		lowerTextArea.setText( temp );

			if( debugging )
			{
			StackTraceElement[] ste=thro.getStackTrace();
			int n=ste.length;
	
				for( int j=0;j<n;j++ )
				{
				lowerTextArea.append( "\n"+ste[j] );
				}

			}

		}
where the line if( debugging ) is the one in which debugging is used. When debugging is true, the StackTraceElements will be shown in the lower text area. That is, the program will say not only what happened, but also where it happened. Doubtless the professional Java programmers know about this already, but I am an amateur and I found out only recently. After the program is changed or repaired, the value of debugging can of course be set back to false.

In the KSample.java file is the actual statistical test. Its work method is called from the KSampleTest.java file. The SortPointer.java file contains the sortPointer method which sorts an array of double without moving any numbers in the array. Instead it creates an array of int pointing to the array of double. That is, pointer[0] points to the most negative double, pointer[1] points to the next most negative, and so on.

Revision date, licenses, and e-mail address

This program and its files are revised 21 January 2010.

This file and all of the other files which I include with it are in the public domain.

I am not claiming that I invented the test. It is so obvious that it must have been done long ago. I respectfully request readers who know more about it to tell me.

The Java™ language is the property of Sun.

Any mistakes in my programs or documentation are my own.

Please send all criticism, both constructive and destructive, to me, Harold M. Kaplan, at
smtw2gh at toadmail dot com
