# [BioC] Bootstrapping paired samples and tiny groups

Benjamin Otto b.otto at uke.uni-hamburg.de
Thu Sep 9 11:46:15 CEST 2010

```
Hi guys,

In principle, the problem is how to compute a test statistic for paired samples when the group sizes are extremely small.

Here is the Model:
-------------------------

Assumption 1:

A data set of microarrays consists of four classes describing the disease phenotype: type 1, type 2, type 3 and a control group. As the type 2 and type 3 phenotypes of the disease are extremely rare, there are only two samples in each of these two groups. The data set consists of:

control: 8 samples
type 1: 7 samples
type 2: 2 samples
type 3: 2 samples

Assumption 2:
We assume that gender and age might influence the phenotype. Therefore, the samples in the control group were selected so that their age and gender match the samples in the other three groups. Unfortunately, as the disease is so rare, the age and gender of the patients within the groups are not all the same. So we end up with a kind of semi-paired comparison: "paired" because for each type 1/2/3 sample we pick a control sample defined by age and gender, and "semi" because the control sample does not really come from the same patient.

We suppose (but that IS an assumption) that differences between type 1/2/3 samples and controls with non-matching age and gender might naturally exhibit larger (disease-unrelated) variance, which is why the control-disease pairs are selected in a targeted way.

In the end, each of the type 1/2/3 groups shall be compared with the control group. As group 1 has 7 samples, a paired analysis is possible. The problem lies with groups 2 and 3.

Here is a suggested analysis approach:
------------------------------------------------------

As no real statistical test can be applied to groups of size 2, one idea would be a bootstrapping approach in which, for each gene, no test statistic but only the fold change is computed for each resampled assignment. The location of the native fold change (e.g. the mean fold change for the correct pairs) within the distribution of computed fold changes is then used as the significance statistic.

Now here are the questions:
--------------------------------------

1) As the samples are "paired", is it at all convincing to break up the pairings in order to perform a bootstrapping? Is such a bootstrapping the correct approach for "paired" samples in such a case anyway?

2) Should the samples of groups 2/3 "only" be randomly remapped to control samples other than the initial ones? Or does it make more sense to randomly assign the control and type 2/3 samples to the groups?
3) If the samples are randomly assigned to the groups, does at least one "disease" sample always have to remain in the type 2/3 group? Or would it be legitimate in this case to use a permutation in which two control samples are compared to two other control samples?

4) Are there any better ideas for how to calculate a statistic here?

Thanks and best regards,

Benjamin

___________________________________________
Benjamin Otto, PhD
University Medical Center Hamburg-Eppendorf
Institute For Clinical Chemistry / Central Laboratories
Campus Forschung N27
Martinistr. 52,
D-20246 Hamburg

Tel.: +49 40 7410 51908
Fax.: +49 40 7410 54971
___________________________________________
```
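To make the two resampling options from question 2 concrete, here is a minimal sketch in Python that enumerates every possible assignment instead of sampling at random, which is feasible because the groups are so small. It is only an illustration under stated assumptions, not a recommended analysis: the data are simulated, the values are assumed to be on a log2 scale, and the function names (`repairing_fold_changes`, `relabelling_fold_changes`, `location_in_distribution`) as well as the choice of matched controls are hypothetical.

```python
from itertools import combinations, permutations

import numpy as np


def repairing_fold_changes(disease, controls):
    """Mean log fold change for every way of pairing each disease
    sample with a distinct control (option 1 in question 2).

    disease: (n_disease, n_genes), controls: (n_controls, n_genes).
    """
    n_disease = disease.shape[0]
    assignments = list(permutations(range(controls.shape[0]), n_disease))
    fold_changes = np.array(
        [(disease - controls[list(idx)]).mean(axis=0) for idx in assignments]
    )
    return assignments, fold_changes


def relabelling_fold_changes(disease, controls):
    """Difference of group means for every way of labelling n_disease
    of the pooled samples as 'disease' (option 2 in question 2)."""
    pooled = np.vstack([disease, controls])
    n_total, n_disease = pooled.shape[0], disease.shape[0]
    labellings = list(combinations(range(n_total), n_disease))
    fold_changes = []
    for idx in labellings:
        mask = np.zeros(n_total, dtype=bool)
        mask[list(idx)] = True
        fold_changes.append(pooled[mask].mean(axis=0) - pooled[~mask].mean(axis=0))
    return labellings, np.array(fold_changes)


def location_in_distribution(observed, reference, tol=1e-9):
    """Per gene: fraction of reference fold changes that are at least
    as extreme (in absolute value) as the observed one."""
    return (np.abs(reference) >= np.abs(observed) - tol).mean(axis=0)


# Simulated data only: 2 disease samples, 8 controls, 100 genes.
rng = np.random.default_rng(0)
controls = rng.normal(size=(8, 100))
disease = rng.normal(size=(2, 100))
matched = (0, 1)  # hypothetical: controls 0 and 1 are the age/gender matches

# Option 1: keep the disease samples, re-pair them with other controls.
observed_paired = (disease - controls[list(matched)]).mean(axis=0)
_, repaired = repairing_fold_changes(disease, controls)
p_repaired = location_in_distribution(observed_paired, repaired)

# Option 2: pool all samples and reassign the group labels.
observed_groups = disease.mean(axis=0) - controls.mean(axis=0)
_, relabelled = relabelling_fold_changes(disease, controls)
p_relabelled = location_in_distribution(observed_groups, relabelled)

print(repaired.shape[0], relabelled.shape[0])  # number of assignments
```

With 2 disease samples and 8 controls there are 8 × 7 = 56 ordered re-pairings and C(10, 2) = 45 label assignments. The location of the observed fold change can therefore never be smaller than 1/45 under relabelling. Under re-pairing, the mean over the pairs does not depend on which control goes with which disease sample, so only 28 distinct values occur and the smallest attainable fraction is 2/56. Note also that in the re-pairing variant the disease samples never leave their group, so its reference distribution reflects only the choice of controls.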