# [BioC] Bootstrapping paired samples and tiny groups

Naomi Altman naomi at stat.psu.edu
Tue Sep 14 17:45:03 CEST 2010

I really think this problem is beyond what the
mailing list can do.  You need to chat with a statistician.

Naomi Altman

At 05:46 AM 9/9/2010, Benjamin Otto wrote:
>Hi guys,
>
>in principle the problem is how to compute a
>statistic for ultra-tiny group sizes with paired samples.
>
>
>Here is the Model:
>-------------------------
>
>Assumption 1:
>
>A data set of microarrays consists of four
>classes describing the disease phenotype: type
>1, type 2, type 3 and control group. As the
>type 2 and type 3 phenotypes of the disease are
>extremely rare, there are only two samples in
>each of these two groups. The data set consists of
>
>control: 8 samples
>type 1: 7 samples
>type 2: 2 samples
>type 3: 2 samples
>
>Assumption 2:
>We assume that gender and age might have an
>influence on the phenotype. Therefore samples in
>the control groups were selected so that age and
>gender match the samples in the other three
>groups. Unfortunately, as the disease is so
>rare, the age and gender of the patients in the
>groups are not all the same. So we end up with
>some kind of semi-paired comparisons, "paired"
>because for each type1/2/3 sample we pick a
>control sample defined by age and gender, and
>"semi" because the control sample does not really come from the same patient.
>
>We suppose (but that IS an assumption) that
>differences between type1/2/3 samples and
>controls with non-matching age and gender might
>naturally exhibit bigger (disease-unrelated)
>variance, so the selection of the control-disease pairs is targeted.
>
>In the end, the type 1/2/3 groups are to be compared
>with the control group. As group 1 has 7 samples, a
>paired analysis is possible. The problem lies with groups 2 and 3.
>
>
>
>Here is a suggested analysis approach:
>------------------------------------------------------
>
>As there is no real statistical test that can be
>applied to groups of size 2, one idea is to
>introduce a bootstrapping approach in which, for
>each gene, no test statistic but only the fold
>change is computed. The location of the
>native fold change(s) (e.g. the mean fold change
>for the correct pairs) within the distribution
>of fold changes computed from the resampled data
>is then used as the significance statistic.
>
>
>
>Now here are the questions:
>--------------------------------------
>
>1) As the samples are "paired", is it at all
>convincing to break up the pairings in order to
>perform a bootstrapping? Is such a bootstrapping
>the correct approach for "paired" samples anyway in such a case?
>
>2) Should the samples of groups 2/3 "only"
>be randomly remapped to control samples other
>than the initial ones? Or does it make more
>sense to randomly assign the control and type 2/3 samples to the groups?
>
>3) If the samples were randomly assigned to the
>groups, does at least one
>"disease" sample always have to remain in the type 2/3
>group? Or would it be legitimate in this case to use
>a permutation where two control samples are compared to two other control samples?
>
>4) Any suggestions on how best to calculate a statistic here?
>
>
>
>Thanks and best regards,
>
>Benjamin
>
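
The two resampling schemes raised in questions 1 and 2 above can be compared in a rough sketch. This is only an illustration under stated assumptions, not a recommended analysis: the function names and the log2 expression values for a single gene are made up, and the statistic is the mean log2 fold change mentioned in the suggested approach. Because the groups are so small, every relabeling can be enumerated instead of sampled, which also shows how few distinct outcomes exist.

```python
from itertools import combinations, product


def unpaired_relabel_pvalue(disease, controls):
    """Two-sided permutation p-value for the difference in group means,
    enumerating every way to choose len(disease) samples from the pool.
    Pairing is ignored (the 'randomly assign samples to groups' option)."""
    pooled = list(disease) + list(controls)
    k, n = len(disease), len(pooled)
    total = sum(pooled)

    def mean_diff(idx):
        s = sum(pooled[i] for i in idx)
        return s / k - (total - s) / (n - k)

    observed = mean_diff(range(k))
    diffs = [mean_diff(idx) for idx in combinations(range(n), k)]
    # Small tolerance so the observed labeling always counts as extreme.
    extreme = sum(abs(d) >= abs(observed) - 1e-12 for d in diffs)
    return observed, extreme / len(diffs), len(diffs)


def paired_signflip_pvalue(pair_diffs):
    """Two-sided sign-flip p-value for the mean within-pair difference.
    Pairing is kept; each pair's disease/control labels may be swapped."""
    k = len(pair_diffs)
    observed = sum(pair_diffs) / k
    stats = [
        sum(s * d for s, d in zip(signs, pair_diffs)) / k
        for signs in product((1, -1), repeat=k)
    ]
    extreme = sum(abs(t) >= abs(observed) - 1e-12 for t in stats)
    return observed, extreme / len(stats), len(stats)


# Hypothetical log2 expression values for ONE gene (invented for illustration).
type2 = [9.1, 9.4]
controls = [7.0, 7.2, 6.9, 7.1, 7.3, 6.8, 7.0, 7.2]

obs_u, p_u, n_u = unpaired_relabel_pvalue(type2, controls)
print("unpaired:", n_u, "relabelings, p =", round(p_u, 4))

# Hypothetical disease-minus-matched-control differences for the two pairs.
obs_p, p_p, n_p = paired_signflip_pvalue([2.1, 2.2])
print("paired:  ", n_p, "sign patterns, p =", round(p_p, 4))
```

With 2 disease and 8 control samples there are only 45 distinct relabelings, so the smallest attainable p-value per gene is 1/45 (about 0.022), even for the clearly separated values used here. Keeping the pairing leaves 4 sign patterns for two pairs, so the two-sided p-value cannot fall below 0.5. Neither floor leaves room for a multiple-testing correction over thousands of genes, which is one concrete reason the group-of-two comparisons are hard to settle by resampling alone.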
