[BioC] ComBat: Could it utilize technical replicates?

Essi Laajala essi.laajala at gmail.com
Mon Aug 12 12:29:11 CEST 2013


Dear Evan,

Thank you for your message! That's a good plan, but wouldn't it lead to a
singularity error? At least that is what happens for me. I've been using
the old ComBat script, and I have now tried the Bioconductor version as
well. I've attached the real sample_info file. (Sorry, it's a bit
complicated: there are actually 4 batches, and batch 4 is the one with the
re-hybridizations. I have 113 samples, 15 of which were re-hybridized, so
128 arrays altogether. The "high risk" group has the label E, "medium risk"
is T and "low risk" is P.) Here's what I did with the Bioconductor ComBat:

> library(sva)
> b <- sample_info[,"Batch"]
> mm <- model.matrix(~as.factor(Covariate1), data=sample_info)
> data_combat <- ComBat(exprs_data, b, mm)
Found 4 batches
Found 112  categorical covariate(s)
Standardizing Data across genes
Error in solve.default(t(design) %*% design) :
  Lapack routine dgesv: system is exactly singular: U[51,51] = 0
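
One way to look into an error like this is to check the rank of the combined
batch-plus-covariate design. The sketch below is only an illustration: the
data frame `toy` and the helper `check_design` are hypothetical, and it
assumes that ComBat binds the batch indicator columns to the covariate
columns with the intercept dropped, which is what the error message suggests.
In the toy layout, batch 2 shares no sample with any other batch, so its
effect cannot be separated from its own samples and the design loses rank.
The attached sample_info appears to have the same feature: the four batch 2
arrays (E121, E125, E128, E133) have no counterpart in batch 4.

```r
# Hypothetical helper: build the batch + covariate design and report its rank.
check_design <- function(info) {
  batch_mat  <- model.matrix(~ -1 + as.factor(Batch), data = info)
  sample_mat <- model.matrix(~ as.factor(Covariate1), data = info)
  # Batch indicators plus covariate columns without the intercept
  design <- cbind(batch_mat, sample_mat[, -1, drop = FALSE])
  c(rank = qr(design)$rank, columns = ncol(design))
}

# Made-up layout: batches 1 and 3 are linked to batch 4 by re-hybridized
# samples (E1, E2, P1); batch 2 has no sample in common with any other batch.
toy <- data.frame(
  Batch      = c(1, 1, 2, 2, 3, 3, 4, 4, 4),
  Covariate1 = c("E1", "E2", "E3", "E4", "P1", "P2", "E1", "E2", "P1")
)
res_toy <- check_design(toy)        # rank is smaller than the column count

# Re-hybridizing one batch 2 sample on batch 4 links batch 2 to the rest.
linked <- rbind(toy, data.frame(Batch = 4, Covariate1 = "E3"))
res_linked <- check_design(linked)  # rank equals the column count

print(res_toy)
print(res_linked)
```

If the rank is smaller than the number of columns, t(design) %*% design
cannot be inverted, which would be consistent with the solve.default error
above.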

Best regards,

Essi




On Sun, Aug 11, 2013 at 2:57 AM, Johnson, William Evan <wej at bu.edu> wrote:

> Hi Essi,
>
> Yes, ComBat can definitely utilize this information. Just replace your
> current 'Covariate 1' with a covariate that just has the sample letter
> (e.g. A, B, C, C, D, D, E, ... ). Note that this will be sufficient because
> your 'Covariate 1' is nested within sample letter. Under this setup, ComBat
> will preserve all variation due to sample type (and as a result risk level)
> and effectively just use the repeated samples to adjust for batch.
>
> Hope this helps. Thanks!
>
> Evan
>
>
> On Aug 9, 2013, at 8:23 AM, Essi Laajala wrote:
>
> > Hi,
> >
> > I'm dealing with quite an unusual study design. Originally (due to
> unfortunate and inevitable circumstances) we had all "high_risk" and
> "medium_risk" samples on batch 1 and "low_risk" samples on batch 2. Then we
> discussed the batch effect and decided to re-hybridize some randomly
> selected samples from each risk group on batch 3. The resulting study
> design looks a bit like this (in reality we have 30 - 45 samples in
> each group and 16 samples re-hybridized but you'll get the idea):
> >
> > Array name    Batch    Covariate 1
> > sample_A      1        High_risk
> > sample_B      1        High_risk
> > sample_C      1        High_risk
> > sample_C_2    3        High_risk
> > sample_D      1        High_risk
> > sample_D_2    3        High_risk
> > sample_E      1        Medium_risk
> > sample_F      1        Medium_risk
> > sample_G      1        Medium_risk
> > sample_G_2    3        Medium_risk
> > sample_H      2        Low_risk
> > sample_I      2        Low_risk
> > sample_J      2        Low_risk
> > sample_J_2    3        Low_risk
> > sample_K      2        Low_risk
> > sample_K_2    3        Low_risk
> >
> > For example Sample_C and Sample_C_2 are the same RNA sample and the only
> difference between them is the batch (the same applies to D and D_2 etc.).
> Such array pairs should be valuable for estimating batch effects. The
> question is: Can ComBat utilize this information? Or can you recommend some
> other batch correction method that could? For now, I've applied ComBat
> after removing the replicated samples on batches 1 and 2 (C, D, G, J and K
> in the above example) but this is certainly not an optimal solution.
> >
> > Best regards,
> >
> > Essi Laajala
> > PhD student in bioinformatics
> > Turku, Finland
> >
>
>
-------------- next part --------------
"Batch"	"Covariate1"
"E0107_53_E05.CEL"	1	"E0107"
"E0112_54_E06.CEL"	1	"E0112"
"E0116_55_E07.CEL"	1	"E0116"
"E022_13_B01.CEL"	1	"E022"
"E024_14_B02.CEL"	1	"E024"
"E025_15_B03.CEL"	1	"E025"
"E026_16_B04.CEL"	1	"E026"
"E027_17_B05.CEL"	1	"E027"
"E029_130023_E07.CEL"	4	"E029"
"E029_18_B06.CEL"	1	"E029"
"E031_19_B07.CEL"	1	"E031"
"E033_20_B08.CEL"	1	"E033"
"E036_21_B09.CEL"	1	"E036"
"E044_22_B10.CEL"	1	"E044"
"E048_23_B11.CEL"	1	"E048"
"E049_24_B12.CEL"	1	"E049"
"E050_25_C01.CEL"	1	"E050"
"E051_26_C02.CEL"	1	"E051"
"E052_27_C03.CEL"	1	"E052"
"E054_28_C04.CEL"	1	"E054"
"E057_30_C06.CEL"	1	"E057"
"E060_31_C07.CEL"	1	"E060"
"E061_32_C08.CEL"	1	"E061"
"E063_33_C09.CEL"	1	"E063"
"E066_34_C10.CEL"	1	"E066"
"E067_130023_F07.CEL"	4	"E067"
"E067_35_C11.CEL"	1	"E067"
"E068_36_C12.CEL"	1	"E068"
"E069_37_D01.CEL"	1	"E069"
"E070_38_D02.CEL"	1	"E070"
"E071_39_D03.CEL"	1	"E071"
"E074_40_D04.CEL"	1	"E074"
"E082_41_D05.CEL"	1	"E082"
"E083_42_D06.CEL"	1	"E083"
"E086_130023_G07.CEL"	4	"E086"
"E086_43_D07.CEL"	1	"E086"
"E088_44_D08.CEL"	1	"E088"
"E091_45_D09.CEL"	1	"E091"
"E093_46_D10.CEL"	1	"E093"
"E096_47_D11.CEL"	1	"E096"
"E098_48_D12.CEL"	1	"E098"
"E102_49_E01.CEL"	1	"E102"
"E104_130023_H07.CEL"	4	"E104"
"E104_50_E02.CEL"	1	"E104"
"E105_51_E03.CEL"	1	"E105"
"E106_52_E04.CEL"	1	"E106"
"E118_56_E08.CEL"	1	"E118"
"E120_57_E09.CEL"	1	"E120"
"E121_58_110049_E09.CEL"	2	"E121"
"E125_59_110049_F09.CEL"	2	"E125"
"E128_60_110049_G09.CEL"	2	"E128"
"E133_61_110049_H09.CEL"	2	"E133"
"P001_130003_A05.CEL"	3	"P001"
"P005_130003_B05.CEL"	3	"P005"
"P009_130003_C05.CEL"	3	"P009"
"P014_130003_D05.CEL"	3	"P014"
"P017_130003_E05.CEL"	3	"P017"
"P017_130023_A05.CEL"	4	"P017"
"P020_130003_G05.CEL"	3	"P020"
"P021_130003_H05.CEL"	3	"P021"
"P024_130003_A07.CEL"	3	"P024"
"P025_130003_B07.CEL"	3	"P025"
"P025_130023_C05.CEL"	4	"P025"
"P026_130003_C07.CEL"	3	"P026"
"P027_130003_D07.CEL"	3	"P027"
"P028_130003_E07.CEL"	3	"P028"
"P030_130003_F07.CEL"	3	"P030"
"P031_130003_G07.CEL"	3	"P031"
"P033_130003_H07.CEL"	3	"P033"
"P033_130023_D05.CEL"	4	"P033"
"P035_130003_A09.CEL"	3	"P035"
"P036_130003_B09.CEL"	3	"P036"
"P039_130003_C09.CEL"	3	"P039"
"P041_130003_D09.CEL"	3	"P041"
"P041_130023_E05.CEL"	4	"P041"
"P042_130003_E09.CEL"	3	"P042"
"P042_130023_F05.CEL"	4	"P042"
"P044_130003_F09.CEL"	3	"P044"
"P045_130003_G09.CEL"	3	"P045"
"P046_130003_H09.CEL"	3	"P046"
"P047_130003_A05.CEL"	3	"P047"
"P048_130003_B05.CEL"	3	"P048"
"P052_130003_C05.CEL"	3	"P052"
"P054_130003_D05.CEL"	3	"P054"
"P055_130003_E05.CEL"	3	"P055"
"P056_130003_F05.CEL"	3	"P056"
"P061_130003_G05.CEL"	3	"P061"
"P063_130003_H05.CEL"	3	"P063"
"P066_130003_A07.CEL"	3	"P066"
"P066_130023_G05.CEL"	4	"P066"
"P067_130003_B07.CEL"	3	"P067"
"P070_130003_C07.CEL"	3	"P070"
"P072_130003_D07.CEL"	3	"P072"
"P073_130003_E07.CEL"	3	"P073"
"P074_130003_F07.CEL"	3	"P074"
"P075_130003_G07.CEL"	3	"P075"
"P077_130003_H07.CEL"	3	"P077"
"P082_130003_A09.CEL"	3	"P082"
"P082_130023_H05.CEL"	4	"P082"
"T021_130023_A07.CEL"	4	"T021"
"T021_68_F04.CEL"	1	"T021"
"T032_71_F07.CEL"	1	"T032"
"T038_72_F08.CEL"	1	"T038"
"T056_73_F09.CEL"	1	"T056"
"T059_74_F10.CEL"	1	"T059"
"T062_75_F11.CEL"	1	"T062"
"T063_76_F12.CEL"	1	"T063"
"T064_77_G01.CEL"	1	"T064"
"T065_78_G02.CEL"	1	"T065"
"T066_79_G03.CEL"	1	"T066"
"T069_80_G04.CEL"	1	"T069"
"T070_81_G05.CEL"	1	"T070"
"T071_82_G06.CEL"	1	"T071"
"T073_83_G07.CEL"	1	"T073"
"T076_84_G08.CEL"	1	"T076"
"T077_85_G09.CEL"	1	"T077"
"T078_130023_B07.CEL"	4	"T078"
"T078_86_G10.CEL"	1	"T078"
"T093_110049_H09.CEL"	1	"T093"
"T094_90_H02.CEL"	1	"T094"
"T095_130023_C07.CEL"	4	"T095"
"T095_91_H03.CEL"	1	"T095"
"T099_93_H05.CEL"	1	"T099"
"T103_96_H08.CEL"	1	"T103"
"T106_98_H10.CEL"	1	"T106"
"T109_130023_D07.CEL"	4	"T109"
"T109_99_H11.CEL"	1	"T109"
"T111_100_H12.CEL"	1	"T111"

