

Random beta
Run a JavaScript program


This is a one-sample inference. The corresponding two-sample inference is at randomBeta2.htm.

In the following sections are examples showing use of the programs. The user is respectfully invited to try out the examples or to use any others. The only thing to remember is: follow the grammatical rules of JavaScript. (This is because the “eval” function of JavaScript is used in picking up the data from the textarea.) In particular, remember that starting an integer with a zero may force the use of base 8, so that 010 may be read as eight rather than ten.
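The leading-zero pitfall can be seen in a minimal sketch. It assumes that the textarea text is evaluated as ordinary non-strict script code; the page's own call to eval is not shown here, so the indirect eval below is only a stand-in.

```javascript
// Indirect eval evaluates the text as global, non-strict code,
// the mode in which a legacy leading zero selects base 8.
var indirectEval = eval;
var values = indirectEval("[ 010, 10 ]");
console.log(values[0], values[1]); // prints 8 10
```

The remedy is simply to type 10, not 010, in the died and censored arrays.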

Links for this page

Which browsers?
Random beta
Two more examples
Modifiers
The idea of the inference
Run a JavaScript program
Bibliography
License, revision date, and e-mail address

Which browsers?

Modern browsers such as Google Chrome, Safari, and Mozilla Firefox can use this page. Microsoft Internet Explorer times out too quickly when the program is running. Also, Microsoft Internet Explorer 8 and earlier cannot do the canvas tag. I have read that version 9 can do the canvas, but my computer cannot load version 9.

Google Chrome and Safari are much speedier than Mozilla Firefox. However, in Mozilla Firefox the canvas is an image, so the canvas can be saved and viewed, and when viewed the canvas can be copied to the clipboard for later pasting into word processors and spread-sheets. Google Chrome and Safari do not treat the canvas as an image.


Random beta

Let us consider the following two JavaScript statements:
died=    [ 6,4,3 ];
censored=[ 7,7,3 ];
Both arrays have the same length: three. That is the number of different times, and both arrays are using the same times. Times increase from left to right. All the numbers in the arrays are non-negative integers. These integers are count data.
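The rules just stated (equal lengths, non-negative integers) can be checked with a few lines. This is a minimal sketch; the helper name checkCounts is hypothetical and is not part of the page's program.

```javascript
// Hypothetical helper: returns "ok" or a short description of the first problem.
function checkCounts(died, censored) {
  if (died.length !== censored.length) {
    return "died and censored must have the same length";
  }
  for (var i = 0; i < died.length; i++) {
    var pair = [died[i], censored[i]];
    for (var k = 0; k < 2; k++) {
      if (!Number.isInteger(pair[k]) || pair[k] < 0) {
        return "entry " + i + " is not a non-negative integer";
      }
    }
  }
  return "ok";
}

console.log(checkCounts([6, 4, 3], [7, 7, 3])); // prints ok
```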

As this page opens, there are three “textareas” visible. I will call them the top, middle, and bottom textareas. The top is for input. The middle and bottom are for output. Also, there will be a “canvas” output at the beginning of the file, but it is not visible yet. I respectfully invite the reader to use the mouse to select the two JavaScript statements and to “copy” them to the clipboard, then to move up to the top textarea, to click the “clear” button if necessary, to paste into that textarea, and finally to click on the “Random beta” button. After maybe five seconds, the canvas will open at the beginning of the file. The graph shows three descending curves. The curve in the middle, which I have colored blue, is the survival curve of Kaplan and Meier (1958). The top black horizontal line is at 1.0, and the bottom black horizontal line is at zero, so the middle thick black horizontal line is at 0.5, half-way between.
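For readers who want to see the arithmetic behind the blue curve, here is a minimal sketch of the Kaplan-Meier product. It assumes that deaths are counted before censorings at each time, and it ignores the “fiction” data, so its numbers need not match the page's table. The function name kaplanMeier is hypothetical.

```javascript
// Kaplan-Meier survival at each time, deaths before censorings.
function kaplanMeier(died, censored) {
  var atRisk = 0;
  for (var i = 0; i < died.length; i++) atRisk += died[i] + censored[i];
  var survival = 1;
  var curve = [];
  for (var j = 0; j < died.length; j++) {
    // Multiply by the fraction of those at risk who survive this time.
    if (atRisk > 0) survival *= (atRisk - died[j]) / atRisk;
    curve.push(survival);
    // Both the dead and the censored leave the risk set.
    atRisk -= died[j] + censored[j];
  }
  return curve;
}

// 30 subjects in all; the first value is 24/30 = 0.8.
console.log(kaplanMeier([6, 4, 3], [7, 7, 3]));
```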

The top and bottom curves, which I have colored red, are the top and bottom edges of a “Bayes credible band” containing the Kaplan-Meier survival curve. I have built it to have 95% credibility. Readers unfamiliar with Bayesian inference are invited to look at Lehmann and Romano (2008). A later section of the present page, Modifiers, will show how to change the percent of credibility and the colors and thicknesses of the line segments.

It may be that some readers would rather have a numerical table instead of a graph. The bottom textarea is a table with tab characters separating the number fields. It is meant to be put into a spread-sheet. The way of doing this is first to move the mouse into the textarea, then click with the right-hand button of the mouse, then use the left-hand button of the mouse to “select all,” then “copy.” Then open one’s favorite spread-sheet. Then click the mouse on the cell at “A1.” Then paste. Most spread-sheets will then accept the table. A few will instead open a “wizard.” Then just please make sure the wizard’s circle or square for separating with a tab character is checked, and proceed.

The columns of the table from left to right are deaths, censorings, row number, lower edge of the band, Kaplan-Meier survival, and upper edge of the band. Row number increases as time increases. The reader will notice that there are four rows, not the three that would be expected. The fourth row contains the “fiction” data. They will be explained later in this section.

Most spread-sheet programs can make “charts.” These are like graphs. To make a chart, just select columns C through F, or columns D through F, depending on the kind of chart, and proceed. Readers who do not know what I am talking about are respectfully asked to consult friends. No two spread-sheet programs have exactly the same kinds of charts.

Let me return to the graph in the canvas at the top of the present file. The best browser to work on it is Mozilla Firefox. I invite the user who has that browser to click the canvas with the mouse and then click the mouse with the right-hand button. It will then be possible to click the left-hand button for “View Image” or “Save Image As.” I recommend the former. The graph will then be viewed all by itself, without the rest of the page. Also, the graph can then be copied to the clipboard: again click the right-hand button of the mouse and then click the left button for “Copy Image.” Then the graph can be pasted from the clipboard into a spread-sheet or into a word processor. Once the graph has been so pasted, its width and height can be changed. To return from the viewing, just use the keyboard’s “Backspace” key.

Now let us return to the topic of “fiction” data. They have been put in the arrays to reduce the bias and improve the stability of the Kaplan-Meier estimator, especially in small samples. Here is an innocent-looking data-set:

died=    [ 1 ];
censored=[ 0 ];
I respectfully suggest that the reader select it and copy it and clear the upper textarea and paste from the clipboard and click on the “Random beta” button. The red credible band edges lie one above the other, and the blue Kaplan-Meier curve is half-way between them. Now instead let us use
died=    [ 1 ];
censored=[ 0 ];
fiction=0;
where the “fiction” is set to 0 instead of defaulting to 1. The band edges and the Kaplan-Meier curve are all three plotted on top of each other, so the band has zero half-width, and the edges and the Kaplan-Meier curve all drop to the zero line for some time values. Really? For only one death? This is not believable. Please do not set the “fiction” to zero unless commanded to do so by your teacher or research director or editor or referee.
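The effect of “fiction” on the blue curve can be imitated in a minimal sketch. The assumption, taken from the arrays that the program echoes in the middle textarea and not from the program's source, is that “fiction” adds one extra final time with no deaths and that many censorings. The names withFiction and kmCurve are hypothetical.

```javascript
// Assumption: "fiction" appends one final time with 0 deaths and
// `fiction` censorings; this mirrors the echoed arrays only.
function withFiction(died, censored, fiction) {
  if (fiction === 0) return { died: died.slice(), censored: censored.slice() };
  return { died: died.concat([0]), censored: censored.concat([fiction]) };
}

// Plain Kaplan-Meier product, deaths before censorings at each time.
function kmCurve(died, censored) {
  var atRisk = 0;
  for (var i = 0; i < died.length; i++) atRisk += died[i] + censored[i];
  var survival = 1;
  var curve = [];
  for (var j = 0; j < died.length; j++) {
    if (atRisk > 0) survival *= (atRisk - died[j]) / atRisk;
    curve.push(survival);
    atRisk -= died[j] + censored[j];
  }
  return curve;
}

var plain = withFiction([1], [0], 0);
var padded = withFiction([1], [0], 1);
console.log(kmCurve(plain.died, plain.censored));   // [ 0 ]
console.log(kmCurve(padded.died, padded.censored)); // [ 0.5, 0.5 ]
```

Under this assumption a single death with no fiction sends the curve straight to zero, while one fictional censoring holds it at one half.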

This section does not explain anything about the middle textarea, but the Modifiers section does.


Two more examples

The previous example had only three time values. Here is an example with thirty. Again I respectfully invite the reader, and this time it really will be necessary to click on the “clear” button before pasting:
died=    [ 0,1,0,1,1,0,2,0,0,3,0,1,2,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0 ];
censored=[ 0,1,1,3,1,1,3,2,1,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ];
The reader will notice the vertical lines inside the graph. There are five time values between two adjacent vertical lines. Every fifth vertical line is heavier.

The next example has 300 time values, so it looks sparse:

died=    [ 0,0,0,0,0,0,0,2,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ];
censored=[ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ];
Now we really do need those heavy vertical lines.

This may be a good time to point out that each example had the same number of time values for its deaths as for its censorings. The first had 3 and 3, the next had 30 and 30, and the last had 300 and 300. This is required.


Modifiers

Now it is time to talk about the middle textarea. Let us again consider the thirty-time-value sample
died=[ 0,1,0,1,1,0,2,0,0,3,0,1,2,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0 ];
censored=[ 0,1,1,3,1,1,3,2,1,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ];
Besides the graph in the canvas and the tab-separated table in the bottom textarea, the user gets in the middle textarea the two lines
half width is 0.28583173176835713
and the time was 5.577 seconds
and the eleven lines
many=100000;
oneOverAlpha=20;
died=[ 0,1,0,1,1,0,2,0,0,3,0,1,2,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0 ];
censored=[ 0,1,1,3,1,1,3,2,1,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1 ];
maxSeconds=30;
separator="\t";
colors=[ "red","blue","red","black" ];
thicknesses=[ 5,5,5,1 ];
censoredBeforeDied="no";
swap="no";
fiction=1;
The “half width” is half of the width of the credible band. The “time” is the number of seconds that the program ran. The eleven lines look like something that one might put into the top textarea, and in fact the lines for died and censored were put there. The remaining nine lines are defaults. The thicknesses array gives the widths for the three survival curves and the thin horizontal and vertical lines. The thick lines are three times as thick as the thin lines. The [ 5,5,5,1 ] may serve well for one’s personal computer, but for display in a big lecture hall one needs perhaps triple those. Let us use instead [ 15,15,15,3 ]. Then the lines to be copied and pasted into the top textarea will be
died=[ 0,1,0,1,1,0,2,0,0,3,0,1,2,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0 ];
censored=[ 0,1,1,3,1,1,3,2,1,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ];
thicknesses=[ 15,15,15,3 ];
Also, the user may wish to change the colors:
died=[ 0,1,0,1,1,0,2,0,0,3,0,1,2,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0 ];
censored=[ 0,1,1,3,1,1,3,2,1,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ];
thicknesses=[ 15,15,15,3 ];
colors=[ "mediumblue","crimson","mediumblue","darkgray" ];
And why need one use a tab to separate? Some programs expect CSV, “Comma Separated Values”:
died=[ 0,1,0,1,1,0,2,0,0,3,0,1,2,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0 ];
censored=[ 0,1,1,3,1,1,3,2,1,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ];
thicknesses=[ 15,15,15,3 ];
colors=[ "mediumblue","crimson","mediumblue","darkgray" ];
separator=",";
Also, some programs expect semicolon as a separator.
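How a separator turns rows of numbers into the text of the bottom textarea can be sketched as follows. The helper name formatTable is hypothetical, and the two rows are made-up illustrative numbers, not output of the program.

```javascript
// Hypothetical helper: join each row's fields with the chosen separator.
function formatTable(rows, separator) {
  return rows.map(function (row) {
    return row.join(separator);
  }).join("\n");
}

// Columns: deaths, censorings, row number, lower edge, Kaplan-Meier, upper edge.
var exampleRows = [
  [6, 7, 1, 0.6, 0.8, 1],   // illustrative values only
  [4, 7, 2, 0.4, 0.6, 0.8]  // illustrative values only
];
console.log(formatTable(exampleRows, "\t")); // tab-separated, for pasting at A1
console.log(formatTable(exampleRows, ";"));  // semicolon-separated
```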

Now another thing: the program works by Monte Carlo, using a great number of times around the big loop. The value of “many” is one more than that number of times. To get more precision, one might use ten times the default value:

died=[ 0,1,0,1,1,0,2,0,0,3,0,1,2,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0 ];
censored=[ 0,1,1,3,1,1,3,2,1,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ];
thicknesses=[ 15,15,15,3 ];
colors=[ "mediumblue","crimson","mediumblue","darkgray" ];
separator=",";
many=1e6;
Now we are in trouble. The diagnostic “The program took more than 30 seconds.” is printed. We need to allow more seconds, maybe 60:
died=[ 0,1,0,1,1,0,2,0,0,3,0,1,2,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0 ];
censored=[ 0,1,1,3,1,1,3,2,1,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ];
thicknesses=[ 15,15,15,3 ];
colors=[ "mediumblue","crimson","mediumblue","darkgray" ];
separator=",";
many=1e6;
maxSeconds=60;
Yes, that works.
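The interplay of “many” and “maxSeconds” can be pictured with a minimal sketch of a loop that watches the clock. It is an assumption about how such a limit might be enforced, not a copy of the page's program; the name runLoop and the choice to look at the clock every 1024 passes are hypothetical.

```javascript
// Go around the loop many-1 times, giving up if the time budget is exceeded.
// Returns the elapsed time in seconds.
function runLoop(many, maxSeconds, oneStep) {
  var start = Date.now();
  for (var i = 0; i < many - 1; i++) {
    // Looking at the clock is not free, so do it only every 1024 passes.
    if ((i & 1023) === 0 && Date.now() - start > 1000 * maxSeconds) {
      throw new Error("The program took more than " + maxSeconds + " seconds.");
    }
    oneStep(i);
  }
  return (Date.now() - start) / 1000;
}

var passes = 0;
var seconds = runLoop(100000, 30, function () { passes++; });
console.log(passes); // 99999
```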

So far the percentage of credibility has been 95. Perhaps somebody needs 99 percent. The way to do that is to change “oneOverAlpha” to 100. (I remark that “oneOverAlpha” must be a factor of “many.” This is enforced.)

died=[ 0,1,0,1,1,0,2,0,0,3,0,1,2,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0 ];
censored=[ 0,1,1,3,1,1,3,2,1,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ];
thicknesses=[ 15,15,15,3 ];
colors=[ "mediumblue","crimson","mediumblue","darkgray" ];
separator=",";
many=1e6;
maxSeconds=60;
oneOverAlpha=100;
Of course, a larger credibility makes the band wider.
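The arithmetic linking “oneOverAlpha” to the percent of credibility, and the factor rule, can be written out in a short sketch. The function names are hypothetical.

```javascript
// Percent of credibility for a given oneOverAlpha: 100 * (1 - alpha).
function credibilityPercent(oneOverAlpha) {
  return 100 - 100 / oneOverAlpha;
}

// The rule stated above: oneOverAlpha must divide many exactly.
function isFactor(oneOverAlpha, many) {
  return many % oneOverAlpha === 0;
}

console.log(credibilityPercent(20));  // 95
console.log(credibilityPercent(100)); // 99
console.log(isFactor(100, 1e6));      // true
console.log(isFactor(30, 100000));    // false
```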

For the next two modifiers, let us change back to the original three-time-value sample:

died=[ 6,4,3 ];
censored=[ 7,7,3 ];
The usual order of calculation on each time value is “died before censored.” The “censoredBeforeDied” can be changed to “yes” to make the order “censored before died.”
died=[ 6,4,3 ];
censored=[ 7,7,3 ];
censoredBeforeDied="yes";
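What the order of calculation changes can be seen in a minimal sketch. It assumes that “censored before died” means the censored subjects leave the risk set before the deaths at that time are counted; it ignores the “fiction” data, so its numbers need not match the page's table. The function name kmOrdered is hypothetical.

```javascript
// Kaplan-Meier product with a choice of order at each time value.
function kmOrdered(died, censored, censoredBeforeDied) {
  var atRisk = 0;
  for (var i = 0; i < died.length; i++) atRisk += died[i] + censored[i];
  var survival = 1;
  var curve = [];
  for (var j = 0; j < died.length; j++) {
    // Assumption: in this order the censored leave the risk set first.
    if (censoredBeforeDied === "yes") atRisk -= censored[j];
    if (atRisk > 0) survival *= (atRisk - died[j]) / atRisk;
    atRisk -= died[j];
    if (censoredBeforeDied !== "yes") atRisk -= censored[j];
    curve.push(survival);
  }
  return curve;
}

console.log(kmOrdered([6, 4, 3], [7, 7, 3], "no"));  // starts at 24/30
console.log(kmOrdered([6, 4, 3], [7, 7, 3], "yes")); // starts at 17/23
```

With censorings taken first, fewer subjects are at risk when the deaths are counted, so each step of the curve is deeper.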
Also, Kaplan and Meier point out that death and censorship are dual to each other. To swap them in the calculation, just set “swap” to “yes” instead of “no”:
died=[ 6,4,3 ];
censored=[ 7,7,3 ];
censoredBeforeDied="yes";
swap="yes";
I have already said something about the “fiction” modifier. Please do not use it, unless you are commanded to do so.

I point out to the reader that these JavaScript statements may be in any order, not just the one I used. Since there may be as many as 11 statements, there may be as many as 11! = 39916800 different orders. Also, one may use blank lines between the statements, if desired.


The idea of the inference

Let there be four times; call them 1, 2, 3, and 4. Suppose that deaths can occur only at 1 and 3, and that censorings can occur only at 2 and 4.

Let the numbers of deaths or censorings for those four times be x, y, z, and w. Let the chances of deaths or censorings for those four times be p, q, r, and s. Write n for the total of all deaths and censorings. The probability of x deaths at time 1 is
$p^{x}(1-p)^{n-x}$,
except that I have not put in the normalizing factorials. Those factorials depend only on x, y, z, and w, but not on p, q, r, or s. The conditional probability of y censorings at time 2 given that x deaths occurred at time 1 is
$q^{y}(1-q)^{n-x-y}$.
Hence the joint probability that x deaths occurred at 1 and y censorings occurred at 2 is the product
$p^{x}(1-p)^{n-x}\,q^{y}(1-q)^{n-x-y}$.
Similarly we get the conditional probability
$r^{z}(1-r)^{n-x-y-z}$
and the joint probability
$p^{x}(1-p)^{n-x}\,q^{y}(1-q)^{n-x-y}\,r^{z}(1-r)^{n-x-y-z}$
and the conditional probability
$s^{w}(1-s)^{n-x-y-z-w}$
and the joint probability
$p^{x}(1-p)^{n-x}\,q^{y}(1-q)^{n-x-y}\,r^{z}(1-r)^{n-x-y-z}\,s^{w}(1-s)^{n-x-y-z-w}$.

This is the likelihood. Since we are doing a Bayes inference, we say that the exponents are constants, but that the chances p, q, r, and s are variables. We shall need a priori probabilities for those variables. Let us use the “improper” formulas
$p^{-1}(1-p)^{-1}$, $q^{-1}(1-q)^{-1}$, $r^{-1}(1-r)^{-1}$, and $s^{-1}(1-s)^{-1}$.
Multiplying them upon the likelihood, we get
$p^{x}(1-p)^{n-x}\,p^{-1}(1-p)^{-1}$
times
$q^{y}(1-q)^{n-x-y}\,q^{-1}(1-q)^{-1}$
times
$r^{z}(1-r)^{n-x-y-z}\,r^{-1}(1-r)^{-1}$
times
$s^{w}(1-s)^{n-x-y-z-w}\,s^{-1}(1-s)^{-1}$.
That is, the a posteriori probability is a product of four factors, each in a different variable. Let us “integrate out” the factor in q. Similarly, let us integrate out the factor in s. The results of the integrals are constants, so there is no need to write them. The marginal a posteriori probability is now
$p^{x}(1-p)^{n-x}\,p^{-1}(1-p)^{-1}$
times
$r^{z}(1-r)^{n-x-y-z}\,r^{-1}(1-r)^{-1}$.
The factor in p is, except for its normalizing factor, a “beta” density in p. Similarly, the factor in r is, except for its normalizing factor, a beta density in r. The program is a Monte Carlo, and each time around the big loop it must take a random point in ( p,r ) space. What is needed is a way of taking a random number from a beta distribution. It is known that the order statistics of a sorted sample from the uniform distribution between zero and one have beta distributions. The program makes use of that fact.
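The order-statistics fact can be turned into a minimal sketch of a beta sampler. For positive integers a and b, the a-th smallest of a+b-1 independent uniform numbers has the Beta(a, b) distribution; the factor in p above, with exponents x-1 and n-x-1, is the kernel of Beta(x, n-x), which needs x and n-x both to be at least 1. Whether the page's program draws its numbers in exactly this way is not shown here, and the name randomBetaSketch is hypothetical.

```javascript
// Draw from Beta(a, b) for positive integers a and b.
// The optional third argument lets a caller supply its own uniform generator.
function randomBetaSketch(a, b, uniform) {
  uniform = uniform || Math.random;
  var sample = [];
  for (var i = 0; i < a + b - 1; i++) sample.push(uniform());
  // Array.prototype.sort compares as strings by default, so give a numeric comparator.
  sample.sort(function (u, v) { return u - v; });
  return sample[a - 1]; // the a-th smallest value
}

// A draw for p when x = 6 deaths are seen among n = 30 subjects.
console.log(randomBetaSketch(6, 24));
```

Sorting a fresh sample for every draw is slow when the counts are large, which may be one reason the program needs several seconds.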

Let the random numbers so made be called p and r. Let km( 1 ) and km( 3 ) be the Kaplan-Meier values at time 1 and time 3. Let us define a function “f” of two variables p and r by

$f(p,r)=\max\bigl(\,|\mathrm{km}(1)-(1-p)|,\ |\mathrm{km}(3)-(1-p)(1-r)|\,\bigr)$

Here I have used the vertical line segments for absolute value.

Then the program goes around its big loop “many”-1 times, making random numbers p and r, and calculating f( p,r ), and putting the values into an array. When the big loop is all finished, the array is sorted. Then the “half width” is found in the array by using “many” and “oneOverAlpha” to calculate the subscript.
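The big loop can be pictured with a minimal sketch that handles any number of death times. It makes two assumptions that are not confirmed by the description above: the caller supplies a function drawChances that returns one random chance per death time, and the subscript is many - many/oneOverAlpha - 1, which picks the value that leaves a fraction 1/oneOverAlpha of the “many” gaps above it. The names halfWidthSketch and drawChances are hypothetical.

```javascript
// km: Kaplan-Meier values at the death times.
// drawChances: returns an array of random chances, one per death time.
function halfWidthSketch(km, drawChances, many, oneOverAlpha) {
  var values = [];
  for (var i = 0; i < many - 1; i++) {
    var chances = drawChances();
    var survival = 1;
    var worst = 0;
    for (var j = 0; j < km.length; j++) {
      survival *= 1 - chances[j]; // (1-p), then (1-p)(1-r), and so on
      worst = Math.max(worst, Math.abs(km[j] - survival));
    }
    values.push(worst); // this is f for the present random point
  }
  values.sort(function (a, b) { return a - b; });
  // Assumed subscript; it is a whole number because oneOverAlpha divides many.
  return values[many - many / oneOverAlpha - 1];
}

// Toy run with uniform chances standing in for the beta draws.
var toy = halfWidthSketch([0.8, 0.6], function () {
  return [Math.random(), Math.random()];
}, 1000, 20);
console.log(toy);
```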

This section’s description simplifies the program and leaves out some details. I hope that the idea of the inference is conveyed.

After I wrote the above, I found out about Lo (1993). My ideas are in his paper, but the notation is different. His ideas are improvements on and extensions of Rubin (1981).


Run a JavaScript program

While building this page I needed a way to run little JavaScript programs, so I constructed the “Run a JavaScript program” button. When I was done I left the button so users could practice JavaScript programming with it. If a program is in the top textarea, the button will run it. Here is a trivial example:
var x=[];
for(var j=0;j<10;j++)x[j]=j;
x;
The user is respectfully invited.


Bibliography

Kaplan, E. L., and Meier, Paul, Nonparametric Estimation from Incomplete Observations, Journal of the American Statistical Association, Volume 53, Number 282 (June 1958), pages 457-481. Stable URL: http://www.jstor.org/stable/2281868

Lehmann, E. L., and Romano, Joseph P., Testing Statistical Hypotheses, Third Edition, Springer, 2005, fourth printing, 2008. See especially Section 5.7, Bayesian Confidence Sets, which is on pages 171-175.

Lo, Albert Y., A Bayesian Bootstrap for Censored Data, Annals of Statistics, Volume 21, Number 1 (1993), pages 100-123. This is also available on the Web at http://projecteuclid.org/euclid.aos/1176349017

Rubin, Donald B., The Bayesian Bootstrap, Annals of Statistics, Volume 9, Number 1 (1981), pages 130-134. This is also available on the Web at http://projecteuclid.org/euclid.aos/1176345338


License, revision date, and e-mail address

All of this file is in the public domain. The date of this revision is 3 March 2012. Criticism both constructive and destructive comes to me, Harold Kaplan, at smtw2gh at gmail dot com.
Harold Kaplan’s statistics.htm
John C. Pezzullo’s page