The Java™ applet on this page is a freeware Monte Carlo program for analyzing two-dimensional contingency tables: it tests whether rows are independent of columns. I make no claim of originality for the idea. I wrote and uploaded this applet merely because I could not find such an applet on the Web. Some proprietary statistics packages claim to contain such a program, if I understand their advertisements correctly.
The AWT controls, in order, are a single-line TextField holding the integer called many, a multi-line TextArea which I will call the upper text area, a Button labeled “goShuffle,” a faded Button labeled “percent,” a faded Button labeled “stop,” and a multi-line TextArea which I will call the lower text area.
When loading is finished, many is already set to its default value: one million. Nothing prevents the user from changing this. (This many is the total number of genuine and fake samples.)
Here is a table copied from P. Diaconis and B. Sturmfels, Algebraic algorithms for sampling from conditional distributions, Annals of Statistics, 1998, Vol. 26, No. 1, 363-397. This table is on their page 364.
1 0 0 0 1 2 0 0 1 0 1 0
1 0 0 1 0 0 0 0 0 1 0 2
1 0 0 0 2 1 0 0 0 0 0 1
3 0 2 0 0 0 1 0 1 3 1 1
2 1 1 1 1 1 1 1 1 1 1 0
2 0 0 0 1 0 0 0 0 0 0 0
2 0 2 1 0 0 0 0 1 1 1 2
0 0 0 3 0 0 1 0 0 1 0 2
0 0 0 1 1 0 0 0 0 0 1 0
1 1 0 2 0 0 1 0 0 1 1 0
0 1 1 1 2 0 0 2 0 1 1 0
0 1 1 0 0 0 1 0 0 0 0 0
Diaconis and Sturmfels say, on their page 363, “The classical rules of thumb for validity of the chi-square approximation (minimum 5 per cell) are badly violated here, and there are too many tables with these margins to permit exact enumeration.”
I respectfully suggest that the reader select the table with the mouse, copy it, move the mouse to the inside of the upper text area, paste into that text area, and click with the mouse on the “goShuffle” button. The “goShuffle” button will vanish, and the remaining two buttons will un-fade. The lower text area will have the word “Working” in it for some seconds, and then the lower text area will change to
pValue of this run is 0.67758 time to run is 17.281 seconds
The following contingency table is quoted from page 584 of Kendall, Maurice G., and Stuart, Alan, The Advanced Theory of Statistics, Volume 2, Inference and Relationship, Charles Griffin & Company Limited, London, 1961. They cite Ammon, Zur Anthropologie der Badener.
1768 807 189 47
946 1387 746 53
115 438 288 16
The numbers in this table are large, so I respectfully ask the user to change the million in the many TextField to only ten thousand. Then I respectfully suggest clearing the upper text area, and copying and pasting Kendall and Stuart’s table into the upper text area, and clicking on the “goShuffle” button. The result will be
pValue of this run is 1.0E-4 time to run is 9.797 seconds
or the like.
Here is a table from Agresti, A., A Survey of Exact Inference for Contingency Tables, Statistical Science, 1992, Vol. 7, No. 1, 131-177. It is Table 1 on page 132.
17066 14464 788 126 37
48 38 5 1 1
I respectfully suggest that the user keep the ten thousand in the many TextField, clear the upper text area, select and copy and paste Agresti’s table, and click on the “goShuffle” button. The result will be
pValue of this run is 0.0366 time to run is 48.203 seconds
or the like.
The value of many is up to the user. If lunch will take an hour, then many can be made larger, so as to get a more precise p-value.
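A rough guide to how much a larger many helps, assuming the fake tables are independent draws so that right behaves like a binomial count: the run-to-run standard error of the reported p-value is about

$$\mathrm{SE} \approx \sqrt{\frac{p\,(1-p)}{\text{many}}}$$

For Agresti’s table above, with p = 0.0366 and many = 10 000, this gives

$$\sqrt{\frac{0.0366 \times 0.9634}{10\,000}} \approx 0.0019$$

Multiplying many by 100 divides the standard error by 10, while the running time grows roughly in proportion to many.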
How do I stop the program?
It may happen that the program is working and working and shows no sign of stopping. This is a good time to click the “percent” button. The percent of work done will be printed in the lower text area. If the percent is near one hundred, then the user may simply wait a little longer and the answers will be printed.
If, on the other hand, the percent is small, please click the “stop” button. Then please change many to something smaller, and click the “goShuffle” button again.
Rules to keep in mind
The integer value for many must not have any decimal point or exponent, nor may it be negative. The same goes for the counts in the upper text area. Violations of these rules, or of the usual Java rules of number format, are diagnosed in the lower text area. Each non-empty row in the upper text area must have the same number of counts as every other non-empty row. Empty rows are permitted and are ignored. Counts in the same row may be separated by one or more blanks, one or more tabs, or any combination.
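The rules above can be illustrated with a small parser. This is a hypothetical sketch, not the applet’s own parse() method, whose code is not reproduced on this page; the names ParseSketch and parseTable are invented for illustration. It leans on Integer.parseInt, which already rejects decimal points and exponents by throwing NumberFormatException.

```java
import java.util.ArrayList;
import java.util.List;

public class ParseSketch {
    // Hypothetical parser for the upper text area (not the applet's parse()).
    // Blank lines are skipped; counts may be separated by any mix of blanks and tabs.
    static int[][] parseTable(String text) {
        List<int[]> rows = new ArrayList<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim(); // also strips a stray '\r' from Windows line endings
            if (trimmed.isEmpty()) {
                continue; // empty rows are permitted and ignored
            }
            String[] tokens = trimmed.split("[ \t]+");
            int[] row = new int[tokens.length];
            for (int j = 0; j < tokens.length; j++) {
                // Integer.parseInt throws NumberFormatException for "1.5" or "1e3"
                row[j] = Integer.parseInt(tokens[j]);
                if (row[j] < 0) {
                    throw new IllegalArgumentException("negative count: " + tokens[j]);
                }
            }
            if (!rows.isEmpty() && row.length != rows.get(0).length) {
                throw new IllegalArgumentException("row " + (rows.size() + 1) + " has "
                        + row.length + " counts, expected " + rows.get(0).length);
            }
            rows.add(row);
        }
        return rows.toArray(new int[0][]);
    }
}
```

For example, parseTable("1 0\t2\n\n3  4 5") returns a table of two rows and three columns, while a count such as 1.5 makes it throw NumberFormatException.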
The experimenter ought to click the “goShuffle” button only once. (Running the program ten times and taking the smallest p-value is clearly cheating.) Students will perhaps be told by their teacher how many times they may click.
Browsers and Java
All the famous modern browsers can display this page correctly, but the “zoom” must be set to 100% or reset to zero, depending on the browser. The zoom control is commonly on the “view” dropdown menu. If the zoom is wrong, then the layout of the Java applet will be wrong.
However, the program can work only if Java is installed and Java is turned on. If there is no Java or if Java has been turned off, then the text field and the text areas and the “goShuffle” and “percent” and “stop” buttons will not even be visible. In that case, the user is respectfully requested to download and install Java and/or to turn Java back on. Users who have trouble doing this are respectfully asked to get help from their classmates, children, spouses, or teachers.
Download
A reader or user wishing to download the files of this applet is respectfully invited to click on
Mcirc.jar
to save the “jar” file. All the other Mcirc files are zipped into it, so it can be unzipped after download to see and change them all. Yes, a “jar” file is merely a special kind of “zip” file.
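Because a jar is a zip archive, the standard java.util.zip.ZipFile class can list what is inside Mcirc.jar. This is a small sketch; the class name JarListSketch is hypothetical, and it assumes the jar was saved in the current directory unless another path is given on the command line.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class JarListSketch {
    // A jar is a zip archive, so ZipFile reads it directly.
    static List<String> listEntries(String path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "Mcirc.jar";
        for (String name : listEntries(path)) {
            System.out.println(name);
        }
    }
}
```

The JDK’s own jar tool does the same with jar tf Mcirc.jar, and jar xf Mcirc.jar extracts the files.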
If the browser’s downloader renames the “jar” file to a “zip” file, please be sure to rename it back to a “jar”. I know of a browser that takes this liberty.
The algorithm
Everything depends on the two methods called expand and contract, which are in the Mcirc.java file. The original genuine two-dimensional table, x, is expanded to two one-dimensional arrays called eyeHelp and jHelp. To make a fake two-dimensional table, called y, we use Shuffle.intShuffle on eyeHelp and then we contract eyeHelp and jHelp. That is, y comes from a population of tables having the same marginal totals as x but having rows independent from columns.
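The three steps can be sketched as follows. This is a hypothetical reconstruction, not the code in Mcirc.java: it assumes that expand writes one entry per unit of count, so that a cell (i, j) holding x[i][j] contributes x[i][j] copies of i to eyeHelp and x[i][j] copies of j to jHelp, and it uses a Fisher-Yates shuffle as a stand-in for Shuffle.intShuffle. Shuffling only eyeHelp keeps every row label and every column label, so both sets of marginal totals survive while the pairing of rows with columns is randomized.

```java
import java.util.Random;

public class ShuffleSketch {
    // Assumed behavior of expand(): one (row, column) pair per unit of count.
    static int[][] expand(int[][] x) {
        int total = 0;
        for (int[] row : x) for (int c : row) total += c;
        int[] eyeHelp = new int[total];
        int[] jHelp = new int[total];
        int k = 0;
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < x[i].length; j++) {
                for (int c = 0; c < x[i][j]; c++) {
                    eyeHelp[k] = i;
                    jHelp[k] = j;
                    k++;
                }
            }
        }
        return new int[][] { eyeHelp, jHelp };
    }

    // Fisher-Yates shuffle, a stand-in for Shuffle.intShuffle (whose code is not shown here).
    static void intShuffle(int[] a, Random rng) {
        for (int k = a.length - 1; k > 0; k--) {
            int m = rng.nextInt(k + 1); // uniform in 0..k
            int tmp = a[k];
            a[k] = a[m];
            a[m] = tmp;
        }
    }

    // Assumed behavior of contract(): count the pairs back into a table.
    static int[][] contract(int[] eyeHelp, int[] jHelp, int rows, int cols) {
        int[][] y = new int[rows][cols];
        for (int k = 0; k < eyeHelp.length; k++) {
            y[eyeHelp[k]][jHelp[k]]++;
        }
        return y;
    }

    // One fake table with the same margins as x.
    static int[][] fakeTable(int[][] x, Random rng) {
        int[][] helpers = expand(x);
        intShuffle(helpers[0], rng); // permute row labels only
        return contract(helpers[0], helpers[1], x.length, x[0].length);
    }
}
```

Since a uniform shuffle of any arrangement is again uniform, a program could expand x once and reshuffle the same eyeHelp for every fake table; the sketch re-expands only to stay short.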
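For each table, genuine or fake, we calculate the usual chi-square statistic. A number called right is the total number of statistics greater than or equal to the genuine statistic; since the genuine statistic counts itself, right is at least one. Finally the p-value is the quotient of right and many, so the smallest p-value a run can report is 1/many. With many set to ten thousand, that is the 1.0E-4 reported above for Kendall and Stuart’s table.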
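The statistic and the count can be sketched in Java as below. The names ChiSquareSketch, chiSquare, and pValue are hypothetical, not taken from Mcirc.java. The sketch uses the textbook Pearson statistic, with expected counts computed from the margins, and takes the genuine statistic plus an array of fake statistics, so that many equals the number of fakes plus one.

```java
public class ChiSquareSketch {
    // Pearson chi-square: sum over cells of (observed - expected)^2 / expected,
    // with expected = rowTotal * columnTotal / grandTotal.
    static double chiSquare(int[][] t) {
        int rows = t.length, cols = t[0].length;
        double[] rowTot = new double[rows];
        double[] colTot = new double[cols];
        double n = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowTot[i] += t[i][j];
                colTot[j] += t[i][j];
                n += t[i][j];
            }
        }
        double stat = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double e = rowTot[i] * colTot[j] / n;
                if (e > 0) { // a zero row or column total leaves an empty cell; skip it
                    double d = t[i][j] - e;
                    stat += d * d / e;
                }
            }
        }
        return stat;
    }

    // p-value = right / many; the genuine table is one of the many samples.
    static double pValue(double genuine, double[] fakes) {
        int right = 1; // the genuine statistic is >= itself
        for (double f : fakes) {
            if (f >= genuine) right++;
        }
        int many = fakes.length + 1;
        return (double) right / many;
    }
}
```

For instance, with many at ten thousand there are 9999 fake statistics; if none of them reaches the genuine statistic, pValue returns 1.0E-4.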
Advantages and disadvantages
The test on this page is frequentist and exact, yet it is fast. Asymptotic methods are faster but not exact. Bayesian methods are faster but not frequentist. Deterministic methods, such as Fisher’s, are usually slow.
However, the test on this page is not deterministic. Since it uses random numbers, the p-values can differ from run to run. This is why the experimenter may run only once. Students, of course, may run as many times as their teacher permits.
A note to developers
Statisticians who do not plan to change or repair the program need not read this. The attention of developers is respectfully drawn to the line
final boolean debugging=false;
in the Mcirc.java file. The debugging variable is used in the try/catch statement
try
{
    parse();
}
catch( Throwable thro )
{
    // Always report what kind of failure happened, and its message.
    String temp="";
    temp+=thro.getClass().getCanonicalName();
    temp+="\n\n";
    temp+=thro.getMessage();
    temp+="\n";
    lowerTextArea.setText( temp );
    if( debugging )
    {
        // While debugging, also list every stack frame, showing where it happened.
        StackTraceElement[] ste=thro.getStackTrace();
        int n=ste.length;
        for( int j=0;j<n;j++ )
        {
            lowerTextArea.append( "\n"+ste[j] );
        }
    }
}
where the line if( debugging ) is the one that uses debugging. When debugging is true, the StackTraceElements will also be shown in the lower text area. That is, the program will say not only what happened, but also where it happened. Doubtless professional Java programmers know about this already, but I am an amateur and I just found out. After the program is changed or repaired, debugging can of course be set back to false.
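For comparison, here is a sketch of a shorter route to a similar report, using the standard Throwable.printStackTrace(PrintWriter), which writes the class, the message, every stack frame, and any “Caused by” chain. The names TraceSketch and report are hypothetical, and the output differs slightly from the catch block above (for example, it uses getName rather than getCanonicalName).

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class TraceSketch {
    // Short report normally; full standard stack trace when debugging.
    static String report(Throwable thro, boolean debugging) {
        if (!debugging) {
            return thro.getClass().getName() + "\n\n" + thro.getMessage() + "\n";
        }
        StringWriter sw = new StringWriter();
        thro.printStackTrace(new PrintWriter(sw, true)); // class: message, then "\tat ..." lines
        return sw.toString();
    }
}
```

Inside the catch block, one would then write lowerTextArea.setText( TraceSketch.report( thro, debugging ) );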
The intShuffle method is in the Shuffle.java file.
Revision date, licenses, and e-mail address
This program and its files were last revised on 7 March 2012.
The tabular data quoted from journals and books and other web pages are copyrighted by their publishers.
All the rest of this file and all of the other files which I include with it are in the public domain.
The Java™ language is the property of Oracle.
Please send all criticism, both constructive and destructive, to me, Harold M. Kaplan,
smtw2gh at gmail dot com