[BioC] Bootstrapping paired samples and tiny groups

Naomi Altman naomi at stat.psu.edu
Tue Sep 14 17:45:03 CEST 2010

I really think this problem is beyond what the 
mailing list can do.  You need to chat with a statistician.

Naomi Altman

At 05:46 AM 9/9/2010, Benjamin Otto wrote:
>Hi guys,
>in principle the problem is how to compute a 
>statistic for ultra-tiny group sizes with paired samples.
>Here is the Model:
>Assumption 1:
>A data set of microarrays consists of four 
>classes describing the disease phenotype: type 
>1, type 2, type 3 and control group. As the 
>type 2 and type 3 phenotypes of the disease are 
>extremely rare, there are only two samples in 
>each of these two groups. The data set consists of
>control: 8 samples
>type 1: 7 samples
>type 2: 2 samples
>type 3: 2 samples
>Assumption 2:
>We assume that gender and age might have an 
>influence on the phenotype. Therefore samples in 
>the control group were selected so that age and 
>gender match the samples in the other three 
>groups. Unfortunately, as the disease is so 
>rare, the age and gender of the patients in the 
>groups are not all the same. So we end up with 
>some kind of semi-paired comparisons, "paired" 
>because for each type1/2/3 sample we pick a 
>control sample defined by age and gender and 
>"semi" because the control sample does not really come from the same patient.
>We suppose (but that IS an assumption) that 
>differences between type1/2/3 samples and 
>controls with non-matching age and gender might 
>naturally exhibit bigger (disease-unrelated) 
>variance, so the selection of the control-disease pairs is targeted.
>In the end, the type 1/2/3 groups are to be compared 
>with the control group. As type 1 has 7 samples, a 
>paired analysis is possible. The problem lies with types 2 and 3.
>Here is a suggested analysis approach:
>As no real statistical test can be applied to 
>groups of size 2, one idea is to introduce a 
>bootstrapping approach in which, for each gene, 
>no test statistic but only the fold change is 
>computed. The position of the native fold 
>change(s) (e.g. the mean fold change for the 
>correct pairs) within the distribution of all 
>computed fold changes is then used as the 
>significance statistic.
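
For concreteness, the re-pairing idea described above could be sketched as
follows for a single gene. This is only an illustration under stated
assumptions: expression values are taken to be on the log2 scale (so a fold
change is a difference), "remapping" is taken to mean pairing the disease
samples with any distinct controls from the pool, and all function names and
numbers are hypothetical. Note that what is described is closer to a
permutation-style re-pairing than to a bootstrap in the strict sense of
resampling with replacement.

```python
from itertools import permutations
from statistics import mean


def mean_log_fc(disease, controls):
    """Mean paired log2 fold change; disease[i] is paired with controls[i]."""
    return mean([d - c for d, c in zip(disease, controls)])


def remapping_distribution(disease, control_pool):
    """Mean log2 fold change for every ordered pairing of the disease
    samples with distinct controls (the matched pairing is included)."""
    k = len(disease)
    return [mean_log_fc(disease, chosen)
            for chosen in permutations(control_pool, k)]


def location_of_observed(observed, distribution):
    """Fraction of pairings whose |mean log FC| is at least the observed one."""
    hits = sum(1 for v in distribution if abs(v) >= abs(observed) - 1e-12)
    return hits / len(distribution)


# Hypothetical log2 expression values for one gene (made up for illustration).
type2 = [9.1, 8.7]                                    # two type 2 samples
controls = [7.9, 8.0, 8.3, 7.6, 8.1, 8.4, 7.8, 8.2]   # eight controls
matched = [controls[0], controls[1]]                  # assumed matched controls

observed = mean_log_fc(type2, matched)
dist = remapping_distribution(type2, controls)
location = location_of_observed(observed, dist)

print(len(dist))   # 8 * 7 = 56 ordered pairings
print(location)
```

Because the mean of the paired differences depends only on which two controls
are chosen, the 56 ordered pairings yield at most 28 distinct values.
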
>Now here are the questions:
>1) As the samples are "paired", is it at all 
>defensible to break up the pairings in order to 
>perform a bootstrapping? Is such a bootstrapping 
>the correct approach for "paired" samples in such a case anyway?
>2) Should the samples of group 2/3 "only" 
>be randomly remapped to control samples other 
>than the initial ones? Or does it make more 
>sense to randomly assign the control and type 2/3 samples to the groups?
>3) If the samples were randomly assigned to the 
>groups, does at least one "disease" sample 
>always have to remain in the type 2/3 
>group? Or would it be legitimate in this case to use 
>a permutation where two control samples are compared to two other control samples?
>4) Any suggestions for a better way to calculate a statistic here?
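
One way to see how much information each relabelling scheme in questions 2
and 3 can carry is simply to count the distinct relabellings, since a
permutation-type p-value cannot be smaller than 1 divided by that count. The
sketch below does this arithmetic for one tiny group (2 disease samples, 8
controls, as in the data set above). The scheme names are illustrative labels,
and which pool of samples gets relabelled is an assumption, not something the
post specifies.

```python
from math import comb

n_controls = 8   # control samples in the data set
n_disease = 2    # samples in type 2 (or type 3)

schemes = {
    # keep the disease samples, pick any 2 of the 8 controls as partners
    "re-pair with other controls": comb(n_controls, n_disease),
    # pool the disease samples with their 2 matched controls, relabel 2 of 4
    "relabel within matched samples": comb(2 * n_disease, n_disease),
    # pool the disease samples with all 8 controls, relabel 2 of 10
    "relabel within all samples": comb(n_controls + n_disease, n_disease),
    # keep the pairs, swap disease/control labels within each pair
    "sign flips within pairs": 2 ** n_disease,
}

for name, count in schemes.items():
    # smallest one-sided p-value, reached only if the observed labelling
    # is the single most extreme one
    print(f"{name}: {count} relabellings, min p = {1 / count:.3f}")
```

These lower bounds apply per gene, before any adjustment for testing many
genes on the array.
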
>Thanks and best regards,
>Benjamin Otto, PhD
>University Medical Center Hamburg-Eppendorf
>Institute For Clinical Chemistry / Central Laboratories
>Campus Forschung N27
>Martinistr. 52,
>D-20246 Hamburg
>Tel.: +49 40 7410 51908
>Fax.: +49 40 7410 54971