Sample Size Calculation - Binary Endpoint

Yanhong Zhou, Ying Yuan, J. Jack Lee, and Haitao Pan

Department of Biostatistics, MD Anderson Cancer Center, Houston, TX 77030


PID: 965 ; v.2.1.0.0 ; Last Updated: 01/28/2022

1. Input

Response Rate for Historical Control \((p_c)\): true response rate for the historical control.

Response Rate for Treatment \((p_t)\): true response rate for the treatment.

Type I Error Rate (\(\alpha\)): false positive rate.

Power \((1-\beta)\): where \(\beta\) is type II error rate (i.e., false negative rate).

2. Example (equality test for a one-group design)

Suppose one is interested in determining whether a new treatment is better than a historical control in terms of response rate. The response rate for the historical control is 0.2. If an increase of 0.3 in the response rate is clinically meaningful, how many subjects are needed to detect the difference with a power of 0.8?

Input: \(p_c=0.2, p_t=0.2+0.3=0.5, \alpha=0.05, 1-\beta=0.8\), and assume a one-sided test.

Output:

In a one-sided z-test for a one-sample proportion, at the significance level of 0.05, a sample size of 18 is needed to achieve 80% power when the response rate for the historical control is 0.2 and the response rate for the treatment is 0.5.
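The page does not state which formula the calculator uses. As an assumption, the sketch below applies the usual large-sample normal approximation, \(n=(z_\alpha+z_\beta)^2\,p_t(1-p_t)/(p_t-p_c)^2\), which reproduces the sample size of 18 reported above. The function name `n_one_sample_equality` is hypothetical and chosen for illustration only.

```python
from math import ceil
from statistics import NormalDist


def n_one_sample_equality(p_c, p_t, alpha=0.05, power=0.8):
    """Hypothetical helper: sample size for a one-sided, one-sample z-test.

    Assumes the normal-approximation formula with the variance evaluated
    at the treatment response rate p_t (an assumption, not confirmed by
    the calculator's documentation).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper-alpha normal quantile
    z_beta = NormalDist().inv_cdf(power)       # upper-beta normal quantile
    n = (z_alpha + z_beta) ** 2 * p_t * (1 - p_t) / (p_t - p_c) ** 2
    return ceil(n)  # round up to a whole subject


print(n_one_sample_equality(p_c=0.2, p_t=0.5))  # 18
```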

1. Input

Response Rate for Control \((p_c)\): true response rate for the control.

Response Rate for Treatment \((p_t)\): true response rate for the treatment.

Subject Allocation Ratio \((k=n_t/n_c)\): the ratio of the number of patients allocated to treatment over that to control.

Type I Error Rate (\(\alpha\)): false positive rate.

Power \((1-\beta)\): where \(\beta\) is type II error rate (i.e., false negative rate).

2. Example (chi-square test without continuity correction)

Suppose one is interested in determining whether a new treatment is better than a standard control in terms of response rate. The response rate for the control is 0.2. If an increase of 0.3 in the response rate is clinically meaningful, how many subjects are needed to detect the difference with a power of 0.8 given equal allocation?

Input: \( p_c=0.2, p_t=0.2+0.3=0.5, k=1,\alpha=0.05, 1-\beta=0.8\), and assume a one-sided test.

Output:

In a one-sided chi-square test for a two-sample proportion, at the significance level of 0.05, 29 subjects for the treatment group and 29 subjects for the control group are needed to achieve 80% power to detect a response rate difference of 0.3 between the treatment and control.
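As a rough check of the output above, the following sketch assumes the unpooled normal-approximation formula \(n_c=(z_\alpha+z_\beta)^2\,[p_t(1-p_t)/k+p_c(1-p_c)]/(p_t-p_c)^2\) with \(n_t=k\,n_c\). The calculator's own formula is not given on this page, so this is an assumption; it does reproduce 29 subjects per group. The name `n_two_sample_equality` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_two_sample_equality(p_c, p_t, k=1.0, alpha=0.05, power=0.8):
    """Hypothetical helper: (n_t, n_c) for a one-sided two-sample test.

    Assumes unpooled variances and no continuity correction.
    k is the allocation ratio n_t / n_c.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_t * (1 - p_t) / k + p_c * (1 - p_c)
    n_c = (z_alpha + z_beta) ** 2 * variance / (p_t - p_c) ** 2
    return ceil(k * n_c), ceil(n_c)  # (treatment, control)


print(n_two_sample_equality(p_c=0.2, p_t=0.5, k=1.0))  # (29, 29)
```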

1. Input

Response Rate for Control \((p_c)\): true response rate for the control.

Response Rate for Treatment \((p_t)\): true response rate for the treatment.

Subject Allocation Ratio \((k=n_t/n_c)\): the ratio of the number of patients allocated to treatment over that to control.

Type I Error Rate (\(\alpha\)): false positive rate.

Power \((1-\beta)\): where \(\beta\) is type II error rate (i.e., false negative rate).

2. Example (chi-square test with continuity correction)

Suppose one is interested in determining whether a new treatment is better than a standard control in terms of response rate. The response rate for the control is 0.2. If an increase of 0.3 in the response rate is clinically meaningful, in a chi-square test with continuity correction, how many subjects are needed to detect the difference with a power of 0.8 given equal allocation?

Input: \( p_c=0.2, p_t=0.2+0.3=0.5, k=1,\alpha=0.05, 1-\beta=0.8\), and assume a one-sided test.

Output:

In a one-sided chi-square test with continuity correction for a two-sample proportion, at the significance level of 0.05, 37 subjects for the treatment group and 37 subjects for the control group are needed to achieve 80% power to detect a response rate difference of 0.3 between the treatment and control.

1. Input

Equivalence Limit \((\delta>0)\): \(\delta\) is the width of the margin, which is called (1) the equivalence limit in an equivalence test, (2) the noninferiority margin in a noninferiority test, and (3) the superiority margin in a superiority test. The difference among the three types of tests is shown intuitively in the following figure.

Response Rate for Historical Control \((p_c)\): true response rate for the historical control.

Response Rate for Treatment \((p_t)\): true response rate for the treatment.

Type I Error Rate (\(\alpha\)): false positive rate.

Power \((1-\beta)\): where \(\beta\) is type II error rate (i.e., false negative rate).

2. Example (equivalence test for a one-group design)

Suppose an investigator is interested in determining whether a new treatment is equivalent to a historical control in terms of response rate. The response rate for the historical control is 0.2. The equivalence limit is 0.2. The investigator believes that the new treatment has a response rate of 0.35. How many subjects are needed to have a power of 0.8 to determine that the new treatment is equivalent to the historical control?

Input: \(\delta=0.2, p_c=0.2, p_t=0.35, \alpha=0.05, 1-\beta=0.8\).

Output:

At the significance level of 0.05, with an equivalence limit of 0.2, a sample size of 563 is required to achieve 80% power when the response rate for the historical control is 0.2 and the response rate for the treatment is 0.35.
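The sketch below shows one way to arrive at the figure of 563. It assumes the normal-approximation formula \(n=(z_\alpha+z_\beta)^2\,p_t(1-p_t)/(\delta-|p_t-p_c|)^2\) for the case where the two rates differ; the page does not confirm that the calculator uses this formula, and the function name `n_one_sample_equivalence` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_one_sample_equivalence(p_c, p_t, delta, alpha=0.05, power=0.8):
    """Hypothetical helper: one-sample equivalence test sample size.

    Assumes p_t != p_c and |p_t - p_c| < delta; otherwise the
    alternative lies outside the equivalence region.
    """
    gap = delta - abs(p_t - p_c)
    if gap <= 0:
        raise ValueError("|p_t - p_c| must be smaller than delta")
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return ceil((z_alpha + z_beta) ** 2 * p_t * (1 - p_t) / gap ** 2)


print(n_one_sample_equivalence(p_c=0.2, p_t=0.35, delta=0.2))  # 563
```

The required sample size grows quickly as \(|p_t-p_c|\) approaches \(\delta\), because the denominator shrinks toward zero.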

1. Input

Equivalence Limit \((\delta>0)\): \(\delta\) is the width of the margin, which is called (1) the equivalence limit in an equivalence test, (2) the noninferiority margin in a noninferiority test, and (3) the superiority margin in a superiority test. The difference among the three types of tests is shown intuitively in the following figure.

Response Rate for Control \((p_c)\): true response rate for the control.

Response Rate for Treatment \((p_t)\): true response rate for the treatment.

Subject Allocation Ratio \((k=n_t/n_c)\): the ratio of the number of patients allocated to treatment over that to control.

Type I Error Rate (\(\alpha\)): false positive rate.

Power \((1-\beta)\): where \(\beta\) is type II error rate (i.e., false negative rate).

2. Example (equivalence test for a two-group design)

Suppose an investigator is interested in determining whether a new treatment is equivalent to a standard control in terms of response rate. The response rate for the control is 0.25. The equivalence limit is 0.2. The investigator believes that the new treatment has a response rate of 0.35. How many subjects are needed to have a power of 0.8 to determine that the new treatment is equivalent to the control given equal patient allocation?

Input: \(\delta=0.2, p_c=0.25,p_t=0.35, k=1, \alpha=0.05, 1-\beta=0.8\).

Output:

In a two-sided test for two-sample proportion, at the significance level of 0.05, 257 subjects for treatment group and 257 subjects for control group are needed to achieve 80% power when the response rate for control is 0.25 and the response rate for treatment is 0.35.
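As a worked check under an assumed formula (the page does not state the one the calculator uses), the unpooled normal approximation with \(z_{0.05}\approx 1.6449\) and \(z_{0.20}\approx 0.8416\) gives

\[
n_c=\frac{(z_\alpha+z_\beta)^2\left[\dfrac{p_t(1-p_t)}{k}+p_c(1-p_c)\right]}{(\delta-|p_t-p_c|)^2}
\approx\frac{6.183\times(0.2275+0.1875)}{(0.2-0.1)^2}\approx 256.6,
\]

which rounds up to 257 per group with \(n_t=k\,n_c\) and \(k=1\), matching the output above.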

1. Input

Noninferiority Margin \((\delta>0)\): \(\delta\) is the width of the margin, which is called (1) the equivalence limit in an equivalence test, (2) the noninferiority margin in a noninferiority test, and (3) the superiority margin in a superiority test. The difference among the three types of tests is shown intuitively in the following figure.

Response Rate for Historical Control \((p_c)\): true response rate for the historical control.

Response Rate for Treatment \((p_t)\): true response rate for the treatment.

Type I Error Rate (\(\alpha\)): false positive rate.

Power \((1-\beta)\): where \(\beta\) is type II error rate (i.e., false negative rate).

2. Example (noninferiority test for a one-group design)

Suppose an investigator is interested in claiming that a new treatment is not worse than a historical control in terms of response rate. The response rate for the historical control is 0.3. The noninferiority margin is 0.2. The investigator believes that the new treatment has a response rate of 0.2. How many subjects are needed to have a power of 0.8 to claim that the new treatment is not worse than the historical control?

Input: \(\delta=0.2,p_c=0.3, p_t=0.2, \alpha=0.05, 1-\beta=0.8\).

Output:

At the significance level of 0.05, with a noninferiority margin of 0.2, a sample size of 99 is required to achieve 80% power when the response rate for the historical control is 0.3 and the response rate for the treatment is 0.2.
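The reported sample size can be checked by hand under an assumed normal-approximation formula (not confirmed by this page), using \(z_{0.05}\approx 1.6449\) and \(z_{0.20}\approx 0.8416\):

\[
n=\frac{(z_\alpha+z_\beta)^2\,p_t(1-p_t)}{(p_t-p_c+\delta)^2}
\approx\frac{6.183\times 0.2\times 0.8}{(0.2-0.3+0.2)^2}
=\frac{0.9893}{0.01}\approx 98.9,
\]

which rounds up to 99. Here the margin \(\delta\) is added to the difference \(p_t-p_c=-0.1\), so the treatment may be up to 0.2 worse than the historical control and still be declared noninferior.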

1. Input

Noninferiority Margin \((\delta>0)\): \(\delta\) is the width of the margin, which is called (1) the equivalence limit in an equivalence test, (2) the noninferiority margin in a noninferiority test, and (3) the superiority margin in a superiority test. The difference among the three types of tests is shown intuitively in the following figure.

Response Rate for Control \((p_c)\): true response rate for the control.

Response Rate for Treatment \((p_t)\): true response rate for the treatment.

Subject Allocation Ratio \((k=n_t/n_c)\): the ratio of the number of patients allocated to treatment over that to control.

Type I Error Rate (\(\alpha\)): false positive rate.

Power \((1-\beta)\): where \(\beta\) is type II error rate (i.e., false negative rate).

2. Example (noninferiority test for a two-group design)

Suppose an investigator is interested in claiming that a new treatment is not worse than a standard control in terms of response rate. The response rate for the control is 0.4. The noninferiority margin is 0.2. The investigator believes that the new treatment has a response rate of 0.3. How many subjects are needed to have a power of 0.8 to determine that the new treatment is not worse than the control given equal patient allocation?

Input: \(\delta=0.2, p_c=0.4, p_t=0.3, k=1, \alpha=0.05, 1-\beta=0.8\).

Output:

At the significance level of 0.05, with a noninferiority margin of 0.2, 279 subjects for the treatment group and 279 subjects for the control group are needed to achieve 80% power when the response rate for control is 0.4 and for treatment is 0.3.
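A minimal sketch of this calculation follows. It assumes the unpooled normal-approximation formula \(n_c=(z_\alpha+z_\beta)^2\,[p_t(1-p_t)/k+p_c(1-p_c)]/(p_t-p_c+\delta)^2\) with \(n_t=k\,n_c\), which reproduces the 279 per group reported above but is not confirmed as the calculator's method. The name `n_two_sample_noninferiority` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_two_sample_noninferiority(p_c, p_t, delta, k=1.0, alpha=0.05, power=0.8):
    """Hypothetical helper: (n_t, n_c) for a two-sample noninferiority test.

    Assumes unpooled variances; k is the allocation ratio n_t / n_c.
    """
    shift = p_t - p_c + delta  # distance from the noninferiority boundary
    if shift <= 0:
        raise ValueError("p_t - p_c must be greater than -delta")
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_t * (1 - p_t) / k + p_c * (1 - p_c)
    n_c = (z_alpha + z_beta) ** 2 * variance / shift ** 2
    return ceil(k * n_c), ceil(n_c)  # (treatment, control)


print(n_two_sample_noninferiority(p_c=0.4, p_t=0.3, delta=0.2))  # (279, 279)
```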

1. Input

Superiority Margin \((\delta>0)\): \(\delta\) is the width of the margin, which is called (1) the equivalence limit in an equivalence test, (2) the noninferiority margin in a noninferiority test, and (3) the superiority margin in a superiority test. The difference among the three types of tests is shown intuitively in the following figure.

Response Rate for Historical Control \((p_c)\): true response rate for the historical control.

Response Rate for Treatment \((p_t)\): true response rate for the treatment.

Type I Error Rate (\(\alpha\)): false positive rate.

Power \((1-\beta)\): where \(\beta\) is type II error rate (i.e., false negative rate).

2. Example (superiority test for a one-group design)

Suppose an investigator is interested in determining whether a new treatment is superior to a historical control in terms of response rate. The response rate for the historical control is 0.3. The superiority margin is 0.15. The investigator believes that the new treatment has a response rate of 0.5. How many subjects are needed to have a power of 0.8 to claim that the new treatment is superior to the historical control?

Input: \(\delta=0.15, p_c=0.3, p_t=0.5,\alpha=0.05, 1-\beta=0.8\).

Output:

At the significance level of 0.05, with a superiority margin of 0.15, a sample size of 619 is required to achieve 80% power when the response rate for the historical control is 0.3 and the response rate for the treatment is 0.5.
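As a hand check under an assumed normal-approximation formula (the page does not say which formula the calculator applies), with \(z_{0.05}\approx 1.6449\) and \(z_{0.20}\approx 0.8416\):

\[
n=\frac{(z_\alpha+z_\beta)^2\,p_t(1-p_t)}{(p_t-p_c-\delta)^2}
\approx\frac{6.183\times 0.5\times 0.5}{(0.5-0.3-0.15)^2}
=\frac{1.5457}{0.0025}\approx 618.3,
\]

which rounds up to 619. The margin is subtracted here, so only the 0.05 by which the assumed improvement exceeds \(\delta\) counts toward the detectable effect; this is why the sample size is so much larger than in the equality test.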

1. Input

Superiority Margin \((\delta>0)\): \(\delta\) is the width of the margin, which is called (1) the equivalence limit in an equivalence test, (2) the noninferiority margin in a noninferiority test, and (3) the superiority margin in a superiority test. The difference among the three types of tests is shown intuitively in the following figure.

Response Rate for Control \((p_c)\): true response rate for the control.

Response Rate for Treatment \((p_t)\): true response rate for the treatment.

Subject Allocation Ratio \((k=n_t/n_c)\): the ratio of the number of patients allocated to treatment over that to control.

Type I Error Rate (\(\alpha\)): false positive rate.

Power \((1-\beta)\): where \(\beta\) is type II error rate (i.e., false negative rate).

2. Example (superiority test for a two-group design)

Suppose an investigator is interested in determining whether a new treatment is superior to a standard control in terms of response rate. The response rate for the control is 0.3. The superiority margin is 0.2. The investigator believes that the new treatment has a response rate of 0.6. How many subjects are needed to have a power of 0.8 to determine that the new treatment is superior to the control given equal patient allocation?

Input: \(\delta=0.2, p_c=0.3, p_t=0.6, k=1,\alpha=0.05, 1-\beta=0.8\).

Output:

At the significance level of 0.05, with a superiority margin of 0.2, 279 subjects for the treatment group and 279 subjects for the control group are needed to achieve 80% power when the response rate for control is 0.3 and for treatment is 0.6.
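Under an assumed unpooled normal-approximation formula (not confirmed by this page), with \(z_{0.05}\approx 1.6449\) and \(z_{0.20}\approx 0.8416\), the arithmetic is

\[
n_c=\frac{(z_\alpha+z_\beta)^2\left[\dfrac{p_t(1-p_t)}{k}+p_c(1-p_c)\right]}{(p_t-p_c-\delta)^2}
\approx\frac{6.183\times(0.24+0.21)}{(0.6-0.3-0.2)^2}
=\frac{2.782}{0.01}\approx 278.2,
\]

which rounds up to 279 per group with \(n_t=k\,n_c\) and \(k=1\).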

1. Input

Response Rate for Control \((p_c)\): true response rate for the control.

Response Rate for Treatment \((p_t)\): true response rate for the treatment.

Subject Allocation Ratio \((k=n_t/n_c)\): the ratio of the number of patients allocated to treatment over that to control.

Type I Error Rate (\(\alpha\)): false positive rate.

Power \((1-\beta)\): where \(\beta\) is type II error rate (i.e., false negative rate).

2. Example (Fisher's exact test for a two-group design)

Suppose an investigator is interested in confirming a difference between a treatment and a control. It is assumed that the treatment has a response rate of 0.6 and the control has a response rate of 0.35. In order to confirm that this difference truly exists with 80% power at the significance level of 0.05, how many subjects are needed given equal patient allocation between the two groups?

Input: \( p_c=0.35, p_t=0.6, k=1, \alpha=0.05, 1-\beta=0.8\), and assume a one-sided test.

Output:

In a one-sided Fisher's exact test, at the significance level of 0.05, 56 subjects for the treatment group and 56 for the control group are needed to achieve a power of 0.8 when the response rate of treatment is 0.6 and the response rate of control is 0.35.

1. Input

Let \(x_1\) and \(x_2\) denote the indicator of whether a laboratory value is normal pre- and post-treatment, respectively, with the value 1 indicating "normal". Define \(p_{10}=Pr(x_{1}=1,x_{2}=0)\) and \(p_{01}=Pr(x_{1}=0,x_{2}=1) \).

Probability of Shifting from Normal to Abnormal \((p_{10})\): probability of shifting from normal in pre-treatment to abnormal in post-treatment.

Probability of Shifting from Abnormal to Normal \((p_{01})\): probability of shifting from abnormal in pre-treatment to normal in post-treatment.

Type I Error Rate (\(\alpha\)): false positive rate.

Power \((1-\beta)\): where \(\beta\) is type II error rate (i.e., false negative rate).

2. Example (McNemar's test for a two-group design)

Suppose an investigator is planning a clinical trial in which subjects will be tested on a variable of interest (say, \(Y\)) both prior to and after a new treatment. It is assumed that about 50% of subjects will have \(Y\) shifting from abnormal to normal while 20% of subjects will have \(Y\) shifting from normal to abnormal. The investigator needs to select a sample size such that there is 80% power to detect such a difference in a one-sided test at the significance level of 0.05. How many subjects does the investigator need for this trial?

Input: \(p_{10}=0.2, p_{01}=0.5, \alpha=0.05, 1-\beta=0.8\).

Output:

In a one-sided McNemar's test for paired two-sample proportions, at the significance level of 0.05, 46 subjects are needed to achieve a power of 0.8 when the probability of shifting from normal to abnormal is 0.2 and the probability of shifting from abnormal to normal is 0.5.
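The sketch below assumes a common large-sample formula for McNemar's test, written in terms of the ratio \(\psi=p_{01}/p_{10}\) and the total discordant probability \(\pi_d=p_{10}+p_{01}\): \(n=[z_\alpha(\psi+1)+z_\beta\sqrt{(\psi+1)^2-(\psi-1)^2\pi_d}\,]^2/[(\psi-1)^2\pi_d]\). This is an assumption about the method rather than something stated on this page, though it does reproduce the 46 subjects reported above for a one-sided test at 80% power. The name `n_mcnemar` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_mcnemar(p10, p01, alpha=0.05, power=0.8):
    """Hypothetical helper: number of paired subjects for a one-sided
    McNemar's test, using an assumed large-sample formula."""
    if p10 == p01:
        raise ValueError("p10 and p01 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    psi = p01 / p10        # ratio of the two discordant probabilities
    p_disc = p10 + p01     # probability that a pair is discordant
    root = sqrt((psi + 1) ** 2 - (psi - 1) ** 2 * p_disc)
    numerator = (z_alpha * (psi + 1) + z_beta * root) ** 2
    return ceil(numerator / ((psi - 1) ** 2 * p_disc))


print(n_mcnemar(p10=0.2, p01=0.5))  # 46
```

Only the discordant probabilities enter the calculation, since pairs whose status does not change carry no information about the direction of the shift.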

1. Input

The Probability that the First Rater Gives Positive Evaluation \((p_{+.})\): the probability that the first rater will give a positive evaluation on the outcome.

The Probability that the Second Rater Gives Positive Evaluation \((p_{.+})\): the probability that the second rater will give a positive evaluation on the outcome.

The Value of Kappa coefficient under the Null Hypothesis \((k_0)\): the Kappa coefficient assumed under the null hypothesis.

The Value of Kappa coefficient under the Alternative Hypothesis \((k_1)\): the Kappa coefficient assumed under the alternative hypothesis.

Type I Error Rate (\(\alpha\)): false positive rate.

Power \((1-\beta)\): where \(\beta\) is type II error rate (i.e., false negative rate).

2. Example (Agreement)

Suppose two observers are asked to observe a group of subjects and to decide whether or not each exhibits a particular behavior. The probability that the first rater will give a positive evaluation on the subjects is 0.4 and the corresponding probability for the second rater is 0.3. The Kappa coefficient is assumed to be 0.3 and 0.5 under the null and alternative hypotheses, respectively. At the significance level of 0.05, how many subjects are required to achieve a power of 0.8 in a one-sided test?

Input: \(p_{+.}=0.4, p_{.+}=0.3, k_0=0.3, k_1=0.5, \alpha=0.05, 1-\beta=0.8\).

Output:

In a one-sided test for agreement using the Kappa coefficient, at the significance level of 0.05, 136 subjects are needed to achieve a power of 0.8 when the probability that the first rater will give a positive evaluation is 0.4 and the probability that the second rater will give a positive evaluation is 0.3.