[R] Sample size for proportion, not binomial

Tue Mar 23 22:49:19 CET 2010

On Mar 23, 2010, at 11:05 AM, Prew, Paul wrote:

> Hello,  I am looking for a sample size function to test proportions that are not binomial proportions.  The proportions represent a ratio of (final measure) / (baseline measure) on the same experimental unit.  Searches using RSeek and such bring multiple hits for binomial proportions, but that doesn't seem to fit my situation.  Perhaps there's some standard terminology from a different field that would provide better hits than deeming this a 'rate' or a 'proportion'.
> 
> Of course, most sample size functions assume a normal distribution, while this data will be bounded between 0 and 1.  The scientist I'm working with feels that, to make fair comparisons, any weight loss must account for the baseline weight.  A logistic transformation seems appropriate, but that term also didn't yield hits I recognized as useful.
> 
> Loss of weight --- compare treatments:
> Treatment A:  1 - Final weight / Initial weight
> Treatment B:  1 - Final weight / Initial weight
> 
> This appears to be a situation that would be common, but I'm not framing it in a way that matches an R package.  Any guidance is appreciated.
> 
> Regards, Paul

If you and the scientist are open to better options for analyzing "change from baseline" data, I would recommend that you both read the following two papers:

Statistics notes: analysing controlled trials with baseline and follow up measurements. 
Vickers AJ, Altman DG.
BMJ 2001;323:1123–4.
http://www.bmj.com/cgi/content/full/323/7321/1123
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1121605/pdf/1123.pdf

The use of percentage change from baseline as an outcome in a controlled trial is statistically inefficient: a simulation study. 
Vickers AJ.
BMC Med Res Methodol 2001;1:6.
http://www.biomedcentral.com/1471-2288/1/6
http://www.biomedcentral.com/content/pdf/1471-2288-1-6.pdf

and review an additional web site:

  http://biostat.mc.vanderbilt.edu/wiki/Main/MeasureChange

Once you are hopefully in a position to adopt a regression-based approach (e.g., FinalWeight ~ BaseWeight + Treatment), there are various options for calculating sample sizes.  The key advantage of this approach is that you get the baseline-adjusted between-group comparison (the regression beta coefficient and confidence interval for Treatment), which is the key outcome of interest when comparing treatments in a parallel design.

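The easiest, albeit conservative, approach for sample size is to use power.t.test() with your assumptions for the inter-group delta in actual weight change (not percent change), the standard deviation of actual change, the desired power and the target alpha. It is conservative because adjusting for baseline in the regression generally gives at least as much power as a t-test on the change scores.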
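As a minimal sketch of the power.t.test() approach just described: the delta and standard deviation below are purely illustrative assumptions (they are not from Paul's data), and you would substitute values from pilot data or the literature.

```r
# Illustrative assumptions only; replace with values justified for your study.
delta_kg  <- 3   # assumed between-group difference in mean weight change (kg)
sd_change <- 6   # assumed standard deviation of the weight change (kg)

# Solve for n per group in a two-sample, two-sided t-test.
res <- power.t.test(delta = delta_kg, sd = sd_change,
                    sig.level = 0.05, power = 0.80,
                    type = "two.sample", alternative = "two.sided")

# power.t.test() returns a fractional n; round up to whole subjects.
n_per_group <- ceiling(res$n)
print(res)
print(n_per_group)
```

Leaving n unspecified tells power.t.test() to solve for it. Supplying n and omitting power instead returns the power achieved at that sample size.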

I am not aware off-hand of any power/sample size functions in R for regular linear regression, though they may exist. There are third party programs that do provide that functionality. 

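If you are willing to code and experiment a bit, you could construct a Monte Carlo simulation with a linear model, using data generated with rnorm() based upon reasonable assumptions about the distribution of your data in each group for the baseline and final values. The estimated power for a given sample size is then the proportion of simulated data sets in which the Treatment coefficient is significant at your target alpha.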
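A rough sketch of such a simulation follows. The function name sim_power() and every numeric default (baseline mean and SD, slope, residual SD, treatment effect) are hypothetical choices for illustration, not recommendations; the data-generating model is simply the one implied by FinalWeight ~ BaseWeight + Treatment.

```r
set.seed(42)

# Hypothetical helper: estimated power for the Treatment term in
# FinalWeight ~ BaseWeight + Treatment with n subjects per group.
# All defaults are illustrative assumptions.
sim_power <- function(n, n_sims = 500, effect = -3,
                      base_mean = 90, base_sd = 12,
                      slope = 0.9, resid_sd = 5, alpha = 0.05) {
  pvals <- replicate(n_sims, {
    Treatment   <- factor(rep(c("A", "B"), each = n))
    BaseWeight  <- rnorm(2 * n, mean = base_mean, sd = base_sd)
    # Final weight depends on baseline, plus a shift for treatment B.
    FinalWeight <- slope * BaseWeight +
      effect * (Treatment == "B") +
      rnorm(2 * n, mean = 0, sd = resid_sd)
    fit <- lm(FinalWeight ~ BaseWeight + Treatment)
    summary(fit)$coefficients["TreatmentB", "Pr(>|t|)"]
  })
  # Proportion of simulated trials with a significant Treatment effect.
  mean(pvals < alpha)
}

n_grid     <- c(20, 40, 60)
power_by_n <- sapply(n_grid, sim_power)
names(power_by_n) <- n_grid
print(power_by_n)
```

Increasing n_sims tightens the Monte Carlo error of each estimate, and widening n_grid lets you find the smallest n that reaches the desired power.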

Once you get your actual data collected and ready for analysis, you will also need to test for a baseline*treatment interaction (FinalWeight ~ BaseWeight * Treatment). If an interaction is present, interpreting the treatment effect becomes more complicated, because the effect is then conditional upon the baseline weight and you can no longer report a single mean treatment effect.
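One way to carry out that check, sketched here on simulated data (the data frame dat and all of its parameters are made up for illustration), is to fit the additive and interaction models and compare them with anova():

```r
set.seed(1)

# Simulated stand-in for real trial data; all values are assumptions.
n   <- 50
dat <- data.frame(
  Treatment  = factor(rep(c("A", "B"), each = n)),
  BaseWeight = rnorm(2 * n, mean = 90, sd = 12)
)
dat$FinalWeight <- 0.9 * dat$BaseWeight -
  3 * (dat$Treatment == "B") + rnorm(2 * n, sd = 5)

# Additive model: one baseline-adjusted treatment effect.
fit_add <- lm(FinalWeight ~ BaseWeight + Treatment, data = dat)
# Interaction model: the treatment effect may vary with baseline weight.
fit_int <- lm(FinalWeight ~ BaseWeight * Treatment, data = dat)

# F test for the added BaseWeight:Treatment term.
int_test <- anova(fit_add, fit_int)
print(int_test)

# If the interaction is not needed, report the adjusted treatment effect
# and its confidence interval from the additive model.
print(confint(fit_add)["TreatmentB", ])
```

With two treatment groups the interaction adds a single coefficient, so this F test is equivalent to the t-test on the BaseWeight:TreatmentB row of summary(fit_int).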

HTH,

Marc Schwartz