1. Input
Mean for Historical Control (\(\mu_c\)):
true mean response for the historical control.
Mean for Treatment (\(\mu_t\)):
true mean response for the treatment.
Standard Deviation (\(s\)):
standard deviation calculated by
\(s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar{x})^2}\), where \(\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_{i}\) and
\(x_{i}\) is the observed response for the \(i\)th patient, obtained from previous research or the literature, \(i=1,\cdots,n\).
Type I Error Rate (\(\alpha\)):
false positive rate.
Power \((1-\beta)\):
the probability of rejecting the null hypothesis when the alternative hypothesis is true, where \(\beta\) is the type II error rate (i.e., the false negative rate).
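The standard deviation formula above uses the \(n-1\) denominator. A minimal Python sketch, assuming the pilot responses are available as a plain list (the values and the helper name `sample_sd` are hypothetical), computes \(s\) directly and compares it with the standard library's `statistics.stdev`, which uses the same denominator:

```python
import math
import statistics

def sample_sd(xs):
    """Sample standard deviation s with the n - 1 denominator (illustrative helper)."""
    n = len(xs)
    x_bar = sum(xs) / n  # sample mean
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))

# Hypothetical pilot responses, for illustration only.
pilot = [0.1, 0.4, 0.35, 0.8, 0.15]
print(round(sample_sd(pilot), 4), round(statistics.stdev(pilot), 4))
```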
2. Example
(equality test for a one-group design)
Suppose one is interested in determining whether a new treatment is better than
a historical control in terms of mean response. The mean response for the historical control is 0.2. The standard deviation of response
is approximately 1. If an increase of 0.3
in the mean response is clinically meaningful, how many subjects are needed to detect the difference with a power of 0.8?
Input:
\(\mu_c=0.2, \mu_t=0.2+0.3=0.5, s=1, \alpha=0.05, 1-\beta=0.8\), and assume a one-sided test.
Output:
In a one-sided t-test for a one-sample mean, at the significance level of 0.05, a sample size of 71 is needed to achieve 80% power when the mean for the historical control is 0.2 and the mean for the treatment is 0.5.
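For a one-sided test, this calculation can be approximated by the normal-theory formula \(n=(z_{1-\alpha}+z_{1-\beta})^2 s^2/(\mu_t-\mu_c)^2\). The Python sketch below uses only `statistics.NormalDist`; the helper `one_sample_n` is illustrative and is not the App's method. Because the App reports a t-test, its answer is slightly larger (71 here, versus 69 from the normal approximation).

```python
import math
from statistics import NormalDist

def one_sample_n(mu_c, mu_t, s, alpha=0.05, power=0.8, sides=1):
    """Normal-approximation sample size for a one-group test of a mean.

    Illustrative helper (not part of the App); sides is 1 or 2.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / sides)  # critical value for the chosen test
    z_beta = z(power)               # z_{1-beta}
    n = (z_alpha + z_beta) ** 2 * s ** 2 / (mu_t - mu_c) ** 2
    return math.ceil(n)             # round up to whole subjects

print(one_sample_n(mu_c=0.2, mu_t=0.5, s=1.0))
```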
1. Input
Difference in Mean (\(\mu_t-\mu_c\)):
\(\mu_t\) and \(\mu_c\) are the true mean responses for
the treatment and control, respectively.
Standard Deviation (\(s\)):
standard deviation calculated by
\(s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar{x})^2}\), where \(\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_{i}\) and
\(x_{i}\) is the observed response for the \(i\)th patient, obtained from previous research or the literature, \(i=1,\cdots,n\).
Subject Allocation Ratio (\(k=n_t/n_c\)):
the ratio of the number of subjects assigned to
treatment to the number of subjects assigned to control,
where \(n_t\) and \(n_c\) are the sample sizes for the treatment and control groups, respectively.
Type I Error Rate (\(\alpha\)):
false positive rate.
Power \((1-\beta)\):
the probability of rejecting the null hypothesis when the alternative hypothesis is true, where \(\beta\) is the type II error rate (i.e., the false negative rate).
2. Example
(equality test for a two-group design)
Suppose one is interested in determining whether a new treatment is better than
a standard control in terms of mean response. The mean response for the control is 0.2. The standard deviation of response
is approximately 1. If an increase of 0.3
in the mean response is clinically meaningful, how many subjects are needed to detect the difference with a power of 0.8 given equal allocation?
Input:
\(\mu_t-\mu_c=0.3, s=1, k=1,\alpha=0.05, 1-\beta=0.8\), and assume a one-sided test.
Output:
In a one-sided t-test for a two-sample mean comparison, at the significance level of 0.05, 139 subjects for the treatment group and 139 subjects for the control group are needed to achieve 80% power to detect a mean difference of 0.3 between treatment and control.
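With allocation ratio \(k=n_t/n_c\), the corresponding normal approximation is \(n_c=(1+1/k)(z_{1-\alpha}+z_{1-\beta})^2 s^2/(\mu_t-\mu_c)^2\) and \(n_t=k\,n_c\). A minimal sketch (the helper `two_sample_n` is illustrative, not the App's code) gives 138 per group for the example, slightly below the App's t-test result of 139.

```python
import math
from statistics import NormalDist

def two_sample_n(diff, s, k=1.0, alpha=0.05, power=0.8, sides=1):
    """Normal-approximation sizes (n_t, n_c) for a two-group test of means.

    diff = mu_t - mu_c and k = n_t / n_c. Illustrative helper only.
    """
    z = NormalDist().inv_cdf
    total = z(1 - alpha / sides) + z(power)
    n_c = math.ceil((1 + 1 / k) * total ** 2 * s ** 2 / diff ** 2)
    n_t = math.ceil(k * n_c)  # treatment group scaled by the allocation ratio
    return n_t, n_c

print(two_sample_n(diff=0.3, s=1.0, k=1))
```

Rounding \(n_c\) up first and then scaling by \(k\) keeps the realized ratio \(n_t/n_c\) equal to \(k\) whenever \(k\) is an integer.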
1. Input
Equivalence Limit (\(\delta>0\)):
\(\delta\) is the size of the margin, which is called (1) the equivalence limit in an equivalence test,
(2) the noninferiority margin in a noninferiority test, or (3) the superiority margin in a superiority test. The differences among the three types of tests are illustrated
in the following figure.
Mean for Historical Control (\(\mu_c\)):
true mean response for control treatment.
Mean for Treatment (\(\mu_t\)):
true mean response for experimental treatment.
Standard Deviation (\(s\)):
standard deviation calculated by
\(s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar{x})^2}\), where \(\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_{i}\) and
\(x_{i}\) is the observed response for the \(i\)th patient, obtained from previous research or the literature, \(i=1,\cdots,n\).
Type I Error Rate (\(\alpha\)):
false positive rate.
Power \((1-\beta)\):
the probability of rejecting the null hypothesis when the alternative hypothesis is true, where \(\beta\) is the type II error rate (i.e., the false negative rate).
2. Example
(equivalence test for a one-group design)
Suppose an investigator is interested in determining whether a new treatment is equivalent to
a historical control in terms of mean response. The mean response for the historical control is 0.2. The standard deviation of response
is approximately 0.1. The equivalence limit is 0.2. The investigator believes that the new treatment has a mean response of 0.35.
How many subjects are needed to have a power of 0.8 to determine that the new treatment is equivalent to the historical control?
Input:
\(\delta=0.2, \mu_c=0.2, \mu_t=0.35, s=0.1, \alpha=0.05, 1-\beta=0.8\).
Output:
At the significance level of 0.05, with an equivalence limit of 0.2, a sample size of 27 is required to achieve 80% power when the absolute mean difference between the treatment and the historical control is 0.15.
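For an equivalence test (two one-sided tests) with true difference \(\varepsilon=\mu_t-\mu_c\) and \(|\varepsilon|<\delta\), a common normal approximation is \(n=(z_{1-\alpha}+z_{1-\beta})^2 s^2/(\delta-|\varepsilon|)^2\). It assumes that the one-sided test against the nearer limit dominates the power; when \(\varepsilon=0\), \(z_{1-\beta}\) is usually replaced by \(z_{1-\beta/2}\). The sketch below (illustrative helper, not the App's code) gives 25 for the example, a little below the App's 27.

```python
import math
from statistics import NormalDist

def equivalence_one_group_n(delta, mu_c, mu_t, s, alpha=0.05, power=0.8):
    """Normal-approximation n for a one-group equivalence (TOST) design.

    Illustrative helper; assumes the true difference is nonzero, so the
    one-sided test against the nearer equivalence limit dominates the power.
    """
    eps = abs(mu_t - mu_c)
    if eps >= delta:
        raise ValueError("|mu_t - mu_c| must be smaller than delta")
    z = NormalDist().inv_cdf
    n = (z(1 - alpha) + z(power)) ** 2 * s ** 2 / (delta - eps) ** 2
    return math.ceil(n)

print(equivalence_one_group_n(delta=0.2, mu_c=0.2, mu_t=0.35, s=0.1))
```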
1. Input
Equivalence Limit (\(\delta>0\)):
\(\delta\) is the size of the margin, which is called (1) the equivalence limit in an equivalence test,
(2) the noninferiority margin in a noninferiority test, or (3) the superiority margin in a superiority test. The differences among the three types of tests are illustrated
in the following figure.
Difference in Mean (\(\mu_t-\mu_c\)):
\(\mu_t\) and \(\mu_c\) are the true mean responses for
the treatment and control, respectively.
Standard Deviation (\(s\)):
standard deviation calculated by
\(s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar{x})^2}\), where \(\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_{i}\) and
\(x_{i}\) is the observed response for the \(i\)th patient, obtained from previous research or the literature, \(i=1,\cdots,n\).
Subject Allocation Ratio (\(k=n_t/n_c\)):
the ratio of the number of subjects assigned to
treatment to the number of subjects assigned to control,
where \(n_t\) and \(n_c\) are the sample sizes for the treatment and control groups, respectively.
Type I Error Rate (\(\alpha\)):
false positive rate.
Power \((1-\beta)\):
the probability of rejecting the null hypothesis when the alternative hypothesis is true, where \(\beta\) is the type II error rate (i.e., the false negative rate).
2. Example
(equivalence test for a two-group design)
Suppose an investigator is interested in determining whether a new treatment is equivalent to
a standard control in terms of mean response. The mean response for the control is 0.2. The standard deviation of response
is approximately 0.1. The equivalence limit is 0.2. The investigator believes that the new treatment has a mean response of 0.35.
How many subjects are needed to have a power of 0.8 to determine that the new treatment is equivalent to the control given equal patient allocation?
Input:
\(\delta=0.2, \mu_t-\mu_c=0.35-0.2=0.15, s=0.1, k=1, \alpha=0.05, 1-\beta=0.8\).
Output:
At the significance level of 0.05, with an equivalence limit of 0.2, 51 subjects for the treatment group and 51 subjects for the control group are needed to achieve 80% power when the mean response difference between treatment and control is 0.15.
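The two-group version of the same normal approximation is \(n_c=(1+1/k)(z_{1-\alpha}+z_{1-\beta})^2 s^2/(\delta-|\mu_t-\mu_c|)^2\) with \(n_t=k\,n_c\), again assuming a nonzero true difference. This minimal sketch (illustrative helper) returns 50 per group for the example, versus the App's 51.

```python
import math
from statistics import NormalDist

def equivalence_two_group_n(delta, diff, s, k=1.0, alpha=0.05, power=0.8):
    """Normal-approximation (n_t, n_c) for a two-group equivalence design.

    diff = mu_t - mu_c, k = n_t / n_c. Illustrative helper only.
    """
    eps = abs(diff)
    if eps >= delta:
        raise ValueError("|mu_t - mu_c| must be smaller than delta")
    z = NormalDist().inv_cdf
    total = z(1 - alpha) + z(power)
    n_c = math.ceil((1 + 1 / k) * total ** 2 * s ** 2 / (delta - eps) ** 2)
    return math.ceil(k * n_c), n_c

print(equivalence_two_group_n(delta=0.2, diff=0.15, s=0.1))
```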
1. Input
Noninferiority margin (\(\delta>0\)):
\(\delta\) is the size of the margin, which is called (1) the equivalence limit in an equivalence test,
(2) the noninferiority margin in a noninferiority test, or (3) the superiority margin in a superiority test. The differences among the three types of tests are illustrated
in the following figure.
Mean for Historical Control (\(\mu_c\)):
true mean response for control treatment.
Mean for Treatment (\(\mu_t\)):
true mean response for experimental treatment.
Standard Deviation (\(s\)):
standard deviation calculated by
\(s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar{x})^2}\), where \(\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_{i}\) and
\(x_{i}\) is the observed response for the \(i\)th patient, obtained from previous research or the literature, \(i=1,\cdots,n\).
Type I Error Rate (\(\alpha\)):
false positive rate.
Power \((1-\beta)\):
the probability of rejecting the null hypothesis when the alternative hypothesis is true, where \(\beta\) is the type II error rate (i.e., the false negative rate).
2. Example
(noninferiority test for a one-group design)
Suppose an investigator is interested in claiming that a new treatment
is not worse than a historical control in terms of mean response. The mean response for the historical control is 0.3. The standard deviation of response
is approximately 1. The noninferiority margin is 0.2. The investigator believes that the new treatment has a mean response of 0.2.
How many subjects are needed to have a power of 0.8 to claim that the new treatment is not worse than the historical control?
Input:
\(\delta=0.2, \mu_c=0.3, \mu_t=0.2, s=1, \alpha=0.05, 1-\beta=0.8\).
Output:
At the significance level of 0.05, with a noninferiority margin of 0.2, a sample size of 620 is required to achieve 80% power when the mean difference between treatment and the historical control is -0.1.
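Assuming larger responses are better, the noninferiority hypotheses are \(H_0:\mu_t-\mu_c\le-\delta\) versus \(H_1:\mu_t-\mu_c>-\delta\), and a normal approximation is \(n=(z_{1-\alpha}+z_{1-\beta})^2 s^2/(\mu_t-\mu_c+\delta)^2\), valid when \(\mu_t-\mu_c>-\delta\). The sketch (illustrative helper, not the App's code) gives 619 for the example, versus the App's 620.

```python
import math
from statistics import NormalDist

def noninferiority_one_group_n(delta, mu_c, mu_t, s, alpha=0.05, power=0.8):
    """Normal-approximation n for a one-group noninferiority design.

    Illustrative helper; assumes larger responses are better, so
    H0: mu_t - mu_c <= -delta versus H1: mu_t - mu_c > -delta.
    """
    eps = mu_t - mu_c
    if eps + delta <= 0:
        raise ValueError("mu_t - mu_c must exceed -delta")
    z = NormalDist().inv_cdf
    n = (z(1 - alpha) + z(power)) ** 2 * s ** 2 / (eps + delta) ** 2
    return math.ceil(n)

print(noninferiority_one_group_n(delta=0.2, mu_c=0.3, mu_t=0.2, s=1.0))
```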
1. Input
Noninferiority margin (\(\delta>0\)):
\(\delta\) is the size of the margin, which is called (1) the equivalence limit in an equivalence test,
(2) the noninferiority margin in a noninferiority test, or (3) the superiority margin in a superiority test. The differences among the three types of tests are illustrated
in the following figure.
Difference in Mean (\(\mu_t-\mu_c\)):
\(\mu_t\) and \(\mu_c\) are the true mean responses for
the treatment and control, respectively.
Standard Deviation (\(s\)):
standard deviation calculated by
\(s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar{x})^2}\), where \(\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_{i}\) and
\(x_{i}\) is the observed response for the \(i\)th patient, obtained from previous research or the literature, \(i=1,\cdots,n\).
Subject Allocation Ratio (\(k=n_t/n_c\)):
the ratio of the number of subjects assigned to
treatment to the number of subjects assigned to control,
where \(n_t\) and \(n_c\) are the sample sizes for the treatment and control groups, respectively.
Type I Error Rate (\(\alpha\)):
false positive rate.
Power \((1-\beta)\):
the probability of rejecting the null hypothesis when the alternative hypothesis is true, where \(\beta\) is the type II error rate (i.e., the false negative rate).
2. Example
(noninferiority test for a two-group design)
Suppose an investigator is interested in claiming that a new treatment is not worse than
a standard control in terms of mean response. The mean response for the control is 0.4. The standard deviation of response
is approximately 0.1. The noninferiority margin is 0.2. The investigator believes that the new treatment has a mean response of 0.3.
How many subjects are needed to have a power of 0.8 to determine that the new treatment is not worse than the control given equal patient allocation?
Input:
\(\delta=0.2, \mu_t-\mu_c=0.3-0.4=-0.1, s=0.1, k=1, \alpha=0.05, 1-\beta=0.8\).
Output:
At the significance level of 0.05, with a noninferiority margin of 0.2, 14 subjects for the treatment group and 14 subjects for the control group are needed to achieve 80% power when the mean response difference between treatment and control is -0.1.
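For two groups with allocation ratio \(k\), the same approximation becomes \(n_c=(1+1/k)(z_{1-\alpha}+z_{1-\beta})^2 s^2/(\mu_t-\mu_c+\delta)^2\) and \(n_t=k\,n_c\), assuming larger responses are better. This minimal sketch (illustrative helper) returns 13 per group for the example, versus the App's 14.

```python
import math
from statistics import NormalDist

def noninferiority_two_group_n(delta, diff, s, k=1.0, alpha=0.05, power=0.8):
    """Normal-approximation (n_t, n_c) for a two-group noninferiority design.

    diff = mu_t - mu_c, k = n_t / n_c. Illustrative helper only.
    """
    if diff + delta <= 0:
        raise ValueError("mu_t - mu_c must exceed -delta")
    z = NormalDist().inv_cdf
    total = z(1 - alpha) + z(power)
    n_c = math.ceil((1 + 1 / k) * total ** 2 * s ** 2 / (diff + delta) ** 2)
    return math.ceil(k * n_c), n_c

print(noninferiority_two_group_n(delta=0.2, diff=-0.1, s=0.1))
```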
1. Input
Superiority margin (\(\delta>0\)):
\(\delta\) is the size of the margin, which is called (1) the equivalence limit in an equivalence test,
(2) the noninferiority margin in a noninferiority test, or (3) the superiority margin in a superiority test. The differences among the three types of tests are illustrated
in the following figure.
Mean for Historical Control (\(\mu_c\)):
true mean response for control treatment.
Mean for Treatment (\(\mu_t\)):
true mean response for experimental treatment.
Standard Deviation (\(s\)):
standard deviation calculated by
\(s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar{x})^2}\), where \(\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_{i}\) and
\(x_{i}\) is the observed response for the \(i\)th patient, obtained from previous research or the literature, \(i=1,\cdots,n\).
Type I Error Rate (\(\alpha\)):
false positive rate.
Power \((1-\beta)\):
the probability of rejecting the null hypothesis when the alternative hypothesis is true, where \(\beta\) is the type II error rate (i.e., the false negative rate).
2. Example
(superiority test for a one-group design)
Suppose an investigator is interested in determining whether a new treatment is
superior to a historical control in terms of mean response. The mean response for the historical control is 0.3. The standard deviation of response
is approximately 0.5. The superiority margin is 0.15. The investigator believes that the new treatment has a mean response of 0.5.
How many subjects are needed to have a power of 0.8 to claim that the new treatment is superior to the historical control?
Input:
\(\delta=0.15, \mu_c=0.3, \mu_t=0.5,s=0.5, \alpha=0.05, 1-\beta=0.8\).
Output:
At the significance level of 0.05, with a superiority margin of 0.15, a sample size of 620 is required to achieve 80% power when the mean difference between treatment and the historical control is 0.2.
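Assuming larger responses are better, the superiority hypotheses are \(H_0:\mu_t-\mu_c\le\delta\) versus \(H_1:\mu_t-\mu_c>\delta\), and a normal approximation is \(n=(z_{1-\alpha}+z_{1-\beta})^2 s^2/(\mu_t-\mu_c-\delta)^2\), valid when \(\mu_t-\mu_c>\delta\). The sketch (illustrative helper, not the App's code) gives 619 for the example, versus the App's 620.

```python
import math
from statistics import NormalDist

def superiority_one_group_n(delta, mu_c, mu_t, s, alpha=0.05, power=0.8):
    """Normal-approximation n for a one-group superiority design.

    Illustrative helper; assumes larger responses are better, so
    H0: mu_t - mu_c <= delta versus H1: mu_t - mu_c > delta.
    """
    eps = mu_t - mu_c
    if eps <= delta:
        raise ValueError("mu_t - mu_c must exceed delta")
    z = NormalDist().inv_cdf
    n = (z(1 - alpha) + z(power)) ** 2 * s ** 2 / (eps - delta) ** 2
    return math.ceil(n)

print(superiority_one_group_n(delta=0.15, mu_c=0.3, mu_t=0.5, s=0.5))
```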
1. Input
Superiority margin (\(\delta>0\)):
\(\delta\) is the size of the margin, which is called (1) the equivalence limit in an equivalence test,
(2) the noninferiority margin in a noninferiority test, or (3) the superiority margin in a superiority test. The differences among the three types of tests are illustrated
in the following figure.
Difference in Mean (\(\mu_t-\mu_c\)):
\(\mu_t\) and \(\mu_c\) are the true mean responses for
the treatment and control, respectively.
Standard Deviation (\(s\)):
standard deviation calculated by
\(s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar{x})^2}\), where \(\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_{i}\) and
\(x_{i}\) is the observed response for the \(i\)th patient, obtained from previous research or the literature, \(i=1,\cdots,n\).
Subject Allocation Ratio (\(k=n_t/n_c\)):
the ratio of the number of subjects assigned to
treatment to the number of subjects assigned to control,
where \(n_t\) and \(n_c\) are the sample sizes for the treatment and control groups, respectively.
Type I Error Rate (\(\alpha\)):
false positive rate.
Power \((1-\beta)\):
the probability of rejecting the null hypothesis when the alternative hypothesis is true, where \(\beta\) is the type II error rate (i.e., the false negative rate).
2. Example
(superiority test for a two-group design)
Suppose an investigator is interested in determining whether a new treatment is superior to
a standard control in terms of mean response. The mean response for the control is 0.3. The standard deviation of response
is approximately 0.1. The superiority margin is 0.25. The investigator believes that the new treatment has a mean response of 0.6.
How many subjects are needed to have a power of 0.8 to determine that the new treatment is superior to the control given equal patient allocation?
Input:
\(\delta=0.25, \mu_t-\mu_c=0.6-0.3=0.3, s=0.1, k=1, \alpha=0.05, 1-\beta=0.8\).
Output:
At the significance level of 0.05, with a superiority margin of 0.25, 51 subjects for the treatment group and 51 subjects for the control group are needed to achieve 80% power when the mean response difference between treatment and control is 0.3.
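For two groups with allocation ratio \(k\), the superiority approximation becomes \(n_c=(1+1/k)(z_{1-\alpha}+z_{1-\beta})^2 s^2/(\mu_t-\mu_c-\delta)^2\) and \(n_t=k\,n_c\), assuming larger responses are better. This minimal sketch (illustrative helper) returns 50 per group for the example, versus the App's 51.

```python
import math
from statistics import NormalDist

def superiority_two_group_n(delta, diff, s, k=1.0, alpha=0.05, power=0.8):
    """Normal-approximation (n_t, n_c) for a two-group superiority design.

    diff = mu_t - mu_c, k = n_t / n_c. Illustrative helper only.
    """
    if diff <= delta:
        raise ValueError("mu_t - mu_c must exceed delta")
    z = NormalDist().inv_cdf
    total = z(1 - alpha) + z(power)
    n_c = math.ceil((1 + 1 / k) * total ** 2 * s ** 2 / (diff - delta) ** 2)
    return math.ceil(k * n_c), n_c

print(superiority_two_group_n(delta=0.25, diff=0.3, s=0.1))
```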
1. Input
Correlation Coefficient \((r)\) under Alternative Hypothesis:
the expected correlation coefficient.
Type I Error Rate \((\alpha)\):
false positive rate.
Power \((1-\beta)\):
the probability of rejecting the null hypothesis when the alternative hypothesis is true, where \(\beta\) is the type II error rate (i.e., the false negative rate).
2. Example
Suppose that an investigator is interested in testing whether two groups are correlated in terms of their outcome values. The null hypothesis is that there is no correlation between the two groups, while the alternative
is that the correlation is 0.4. At the significance level of 0.05, how many subjects are required to have a power of 0.8 for the test?
Input:
\(r=0.4,\alpha=0.05, 1-\beta=0.8\), assume a two-sided test.
Output:
Result based on t-test:
In a two-sided t-test, at the significance level of 0.05, a sample size of 46 is needed to achieve 80% power when the correlation coefficient under the alternative is 0.4.
Result based on z-test:
In a two-sided z-test, at the significance level of 0.05, a sample size of 47 is needed to achieve 80% power when the correlation coefficient under the alternative is 0.4.
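The z-test figure is consistent with the standard Fisher z-transformation approximation \(n=\left[(z_{1-\alpha/2}+z_{1-\beta})/\operatorname{artanh}(r)\right]^2+3\). A minimal sketch (the helper `correlation_n` is illustrative) returns 47 for \(r=0.4\), matching the z-test output; the t-test figure is not reproduced here.

```python
import math
from statistics import NormalDist

def correlation_n(r, alpha=0.05, power=0.8, sides=2):
    """Fisher z approximation to the sample size for testing H0: rho = 0.

    Illustrative helper; r is the correlation under the alternative.
    """
    z = NormalDist().inv_cdf
    c = math.atanh(r)  # Fisher z-transform of r
    n = ((z(1 - alpha / sides) + z(power)) / c) ** 2 + 3
    return math.ceil(n)

print(correlation_n(0.4))
```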
1. Input
Number of Groups \((m)\):
the number of experimental groups considered.
Effect size \((f)\):
defined as
\(f=\sqrt{\frac{\sigma^2_m}{\sigma^2}}=\frac{\sigma_m}{\sigma}\). Enter this value directly or calculate it using the App. Details for \(f\) are available in the Document.
\(\sigma^2\): the variance of the outcome values within the populations (i.e., common variance for the groups).
\(\sigma^2_m\): the variance of the \(m\) true means, calculated by \(\sigma^2_m=\sum_{i=1}^{m}(\mu_i-\bar{\mu})^2/m\),
where \(\bar{\mu}=\sum_{i=1}^{m}\mu_i/m\) and \(\mu_i\) is the true mean response for the \(i\)th group, \(i=1,\cdots,m\).
Type I Error Rate \((\alpha)\):
false positive rate.
Power \((1-\beta)\):
the probability of rejecting the null hypothesis when the alternative hypothesis is true, where \(\beta\) is the type II error rate (i.e., the false negative rate).
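The effect size \(f\) defined above (often called Cohen's \(f\)) can be computed from assumed group means and a common standard deviation. A minimal sketch with hypothetical means follows (the helper `cohens_f` is illustrative). Turning \(f\) into a sample size requires the noncentral F distribution, which this sketch does not attempt.

```python
import math

def cohens_f(means, sigma):
    """Effect size f = sigma_m / sigma, with sigma_m using the 1/m denominator.

    Illustrative helper; means are the assumed true group means and sigma is
    the common within-group standard deviation.
    """
    m = len(means)
    mu_bar = sum(means) / m
    sigma_m = math.sqrt(sum((mu - mu_bar) ** 2 for mu in means) / m)
    return sigma_m / sigma

# Hypothetical four-arm means with a common variance of 0.2.
print(round(cohens_f([0.2, 0.3, 0.4, 0.5], math.sqrt(0.2)), 4))
```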
2. Example
Suppose that an investigator is interested in conducting a four-arm (\(m=4\)) parallel, double-blind, equally randomized clinical trial to
compare four treatments. The effect size is assumed to be 0.25. At the significance level of 0.05, how many subjects are required to have a power of 0.8 for the investigation?
Input:
\(m=4, f=0.25, \alpha=0.05, 1-\beta=0.8\).
Output:
In a one-way ANOVA test for a 4-arm design, at the significance level of 0.05, 45 subjects per group are needed to achieve 80% power to detect an effect size of 0.25.
1. Input
Effect size \((\Delta_d)\):
effect size defined as \(\Delta_d=\frac{|\mu_1-\mu_2|}{\sigma_d}\), where \(\mu_1\) is the pre-study mean, \(\mu_2\) is the post-study mean, and \(\sigma_d\) is the standard deviation of the pre-post difference within each subject.
Type I Error Rate (\(\alpha\)):
false positive rate.
Power \((1-\beta)\):
the probability of rejecting the null hypothesis when the alternative hypothesis is true, where \(\beta\) is the type II error rate (i.e., the false negative rate).
2. Example
(paired t-test)
Suppose one is interested in determining the effect of an experimental treatment. The pre-study mean is known to be 0.3,
the post-study mean is assumed to be 0.5, and the standard deviation of the pre-post difference is 0.5. In a two-sided test, how many subjects are required to achieve 80% power at the
significance level of 0.05?
Input:
We know that \(\mu_1=0.3, \mu_2=0.5, \sigma_d=0.5,\alpha=0.05, 1-\beta=0.8\), and the test is two-sided.
So we can select "Calculate effect size \(\Delta_d=|\mu_1-\mu_2|/\sigma_d\)" and type in the known values. Alternatively, we can calculate the
effect size \(|0.3-0.5|/0.5=0.4\) by hand and enter it directly.
Output:
If the effect size is entered directly:
In a two-sided paired t-test, at the significance level of 0.05, 52 subjects are needed to achieve 80% power to detect an effect size of 0.4.
If the effect size is calculated by the App:
In a two-sided paired t-test, at the significance level of 0.05, 52 subjects are needed to achieve 80% power to detect an effect size of 0.4, which is calculated from a pre-study mean of 0.3, a post-study mean of 0.5, and a standard deviation of the pre-post difference of 0.5.
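A paired design reduces to a one-sample test on the within-subject differences, so a normal approximation is \(n=(z_{1-\alpha/2}+z_{1-\beta})^2/\Delta_d^2\) for a two-sided test. The sketch below (the helper `paired_n` is illustrative, not the App's code) computes \(\Delta_d\) from the means and returns 50 for the example; the App reports 52 for its paired t-test, which typically needs slightly more subjects than the normal approximation.

```python
import math
from statistics import NormalDist

def paired_n(mu1, mu2, sigma_d, alpha=0.05, power=0.8, sides=2):
    """Return (effect size, normal-approximation n) for a paired design.

    Illustrative helper; sigma_d is the SD of the within-subject difference.
    """
    effect = abs(mu1 - mu2) / sigma_d  # Delta_d
    z = NormalDist().inv_cdf
    n = (z(1 - alpha / sides) + z(power)) ** 2 / effect ** 2
    return effect, math.ceil(n)

print(paired_n(mu1=0.3, mu2=0.5, sigma_d=0.5))
```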