[BioC] HTqPCR normalization issue ?

Sun Aug 25 00:26:15 CEST 2013

CC'ing package maintainers.
Dan

On Sat, Aug 24, 2013 at 1:37 PM, Alessandro Guffanti [guest]
<guest at bioconductor.org> wrote:
>
> Dear all (and, we think, especially Heidi):
>
> We are using HTqPCR to analyze a set of cards, which we converted from the original format into the one below (the only format accepted as input):
>
> 2    Run05    41    Passed    sample 41    ABCC5    Target    30
> 3    Run05    41    Passed    sample 41    ADM    Target    31.3
> 4    Run05    41    Passed    sample 41    CEBPB    Target    29.8
> 5    Run05    41    Passed    sample 41    CSF1R    Target    31.2
> 6    Run05    41    Passed    sample 41    CXCL16    Target    26.9
> 7    Run05    41    Passed    sample 41    CYC1    Target    25.7
> 8    Run05    41    Passed    sample 41    DYNLT1    Target    25.8
> 9    Run05    41    Passed    sample 41    EREG    Target    35.6
> 10    Run05    41    Passed    sample 41    ERH    Target    25.9
> 11    Run05    41    Passed    sample 41    FGD4    Target    40
> 12    Run05    41    Passed    sample 41    GPX1    Target    20.4
> [...]
>
> The full list of files and groups is as follows (this is the file "Elenco_1.txt" used below):
>
> File    Group
> 41.txt    Sano
> 39.txt    Sano
> 37.txt    Sano
> 35.txt    Sano
> 43.txt    Sano
> 34.txt    Sano
> 44.txt    Sano
> 38.txt    Sano
> 48.txt    Sano
> 40.txt    Sano
> 47.txt    Sano
> 6.txt    Non Responder DISEASE
> 26.txt    Non Responder DISEASE
> 2.txt    Non Responder DISEASE
> 69.txt    Non Responder DISEASE
> 68.txt    Non Responder DISEASE
> 5.txt    Non Responder DISEASE
> 71.txt    Responder DISEASE
> 3.txt    Responder DISEASE
> 17.txt    Responder DISEASE
> 1.txt    Responder DISEASE
> 19.txt    Responder DISEASE
>
> The comparison is DISEASE vs non-DISEASE, but what we are unsure about is the normalization step.
>
> Here is the code up to the dump of the normalized values matrices:
>
> library("HTqPCR")
> path <- ("C:/Users/BRINIEL/Desktop/new_analisi_card1/analisiAeB/")
> files <- read.delim (file.path(path, "Elenco_1.txt"))
> files
> filelist <- as.character(files$File)
> filelist
> raw <- readCtData(files = filelist, path = path, n.features=46, type=7, flag=NULL, feature=6, Ct=8, header=FALSE, n.data=1)
> featureNames (raw)
> raw.cat <- setCategory(raw, Ct.max=36, Ct.min=9, replicates=FALSE, quantile=0.9, groups =files$Group, verbose=TRUE)
>
> s.norm <- normalizeCtData(raw.cat, norm="scale.rank")
> exprs(s.norm)
> write.table(exprs(s.norm),file="Ct norm scaling.txt")
>
> g.norm <- normalizeCtData(raw.cat, norm="geometric.mean")
> exprs(g.norm)
> write.table(exprs(g.norm),file="Ct norm media geometrica.txt")
>
> Looking at the contents of the two expression value files, the first column (corresponding to the first sample) appears to be unchanged in both, while all the other columns have been normalized.
>
> Here the first dataset is sample 41, so you can check this by comparing the raw values above with the normalized values below.
>
> We do not include all the columns here, but every sample except the first has had its values normalized.
>
> Ct norm scaling:
>
>     41    39    37    35    43    34    44    38
> ABCC5    30    27.37706161    26.47393365    29.7721327    31.20189573    26.39260664    26.32436019    27.54274882
> ADM    31.3    30.36540284    28.51753555    32.31241706    34.40473934    26.29800948    29.82796209    28.60208531
> CEBPB    29.8    28.53383886    26.65971564    27.84151659    30.06540284    27.3385782    27.36597156    26.29080569
> CSF1R    31.2    27.66625592    28.05308057    37.18976303    36.98767773    31.0278673    34.56255924    29.75772512
> CXCL16    26.9    27.56985782    24.15165877    30.28018957    28.82559242    25.91962085    26.89251185    26.96492891
> CYC1    25.7    23.52113744    22.01516588    26.92701422    27.27582938    22.89251185    22.53668246    23.88322275
> DYNLT1    25.8    23.71393365    21.17914692    25.8092891    26.03601896    22.89251185    22.63137441    23.01649289
> EREG    35.6    31.32938389    30.18957346    35.66559242    37.29763033    29.79810427    32.76341232    30.33554502
>
>
> Ct norm geometric:
>
>     41    39    37    35    43    34    44    38
> ABCC5    30    27.73443878    26.93934246    29.88113261    30.76352197    26.51166676    26.8989347    27.49219508
> ADM    31.3    30.76178949    29.01887064    32.4307173    33.92136694    26.41664286    30.47900874    28.5495872
> CEBPB    29.8    28.90631647    27.12839047    27.94344824    29.64299633    27.46190571    27.96328103    26.24254985
> CSF1R    31.2    28.0274082    28.5462506    37.32591991    36.46801611    31.16783762    35.31694663    29.70310587
> CXCL16    26.9    27.92975172    24.57624224    30.39104955    28.42060473    26.03654728    27.47948724    26.91543574
> CYC1    25.7    23.82817979    22.40219004    27.02559775    26.89261523    22.99578263    23.02858438    23.83938594
> DYNLT1    25.8    24.02349274    21.55147396    25.90378049    25.67022363    22.99578263    23.12534314    22.97424694
> EREG    35.6    31.73835423    30.7203028    35.7961691    36.77361401    29.93252698    33.47853023    30.27986521
>
> This looks a bit odd: why does the first sample seem to be taken as a 'reference' by both normalization methods and hence left unchanged?
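
An unchanged first column is what you would expect from any normalization that scales the samples towards one reference sample, if that reference happens to be sample 1. Whether that is what normalizeCtData does for these two methods should be confirmed in ?normalizeCtData. The snippet below is only a minimal base R sketch of the idea, not the package's code: the helper scale_to_reference() and the Ct values are made up for illustration.

```r
# Toy Ct matrix: rows are genes, columns are samples.
# All values are invented for this illustration.
ct <- matrix(c(30.0, 31.3, 29.8,
               27.0, 30.0, 28.2,
               26.1, 28.1, 26.3),
             nrow = 3,
             dimnames = list(c("geneA", "geneB", "geneC"),
                             c("s1", "s2", "s3")))

# Hypothetical helper (not part of HTqPCR): rescale every sample so that
# its geometric mean matches the geometric mean of the reference sample.
scale_to_reference <- function(ct, ref = 1) {
  geo_mean <- apply(ct, 2, function(x) exp(mean(log(x))))
  scale_factor <- geo_mean[ref] / geo_mean  # exactly 1 for the reference
  sweep(ct, 2, scale_factor, `*`)           # multiply each column by its factor
}

norm_ref1 <- scale_to_reference(ct, ref = 1)  # column s1 stays as it was
norm_ref2 <- scale_to_reference(ct, ref = 2)  # column s2 stays as it was
```

The scale factor of the reference sample is 1 by construction, so that column comes out untouched whichever sample is chosen. Comparing exprs(raw.cat)[, 1] with exprs(s.norm)[, 1] on your own data would show whether the same thing is happening there.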
>
> Another (possibly related) oddity is that, in the final differential analysis results, the same sample ID is always reported in the feature.pos field, as you can see below:
>
>     genes    feature.pos    t.test    p.value    adj.p.value
> 22    NUCB1    41    -1.998838921    0.077900837    0.251381346
> 8    ERH    41    -1.958143348    0.091329532    0.251381346
> 16    MAFB    41    -1.887142703    0.09421993    0.251381346
> 28    RNF130    41    -1.904866754    0.099644523    0.251381346
> 3    CEBPB    41    -1.853176708    0.103563968    0.251381346
> 18    MSR1    41    -1.80887129    0.10432619    0.251381346
>
> Are we doing something wrong in the data input or in the subsequent processing? Can we actually trust these normalizations?
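
On feature.pos: readCtData takes a position argument naming the input column that holds the feature position, and your call does not set it. If it defaults to column 3 in your version (an assumption on my part, please check ?readCtData), it would pick up the sample number, which is the same on every row of a file. The base R sketch below only inspects the columns of three rows copied from your example. It assumes that the file is tab-separated and that "sample 41" is a single field, which is what feature=6 and Ct=8 in your call suggest.

```r
# Three rows copied from the input file quoted above.
# Assumption: fields are tab-separated and "sample 41" is one field.
lines <- c("2\tRun05\t41\tPassed\tsample 41\tABCC5\tTarget\t30",
           "3\tRun05\t41\tPassed\tsample 41\tADM\tTarget\t31.3",
           "4\tRun05\t41\tPassed\tsample 41\tCEBPB\tTarget\t29.8")
card <- read.delim(text = lines, header = FALSE, stringsAsFactors = FALSE)

card[[1]]  # running row index, different for every feature
card[[3]]  # sample number, identical on every row
card[[6]]  # feature (gene) names
card[[8]]  # Ct values
```

If that is the cause, pointing position at a column that is unique per feature (column 1 in this layout) should give more informative feature.pos values. It would not by itself change the Ct values or the test statistics.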
>
> Many thanks in advance - kind regards
>
> Alessandro & Elena
>
>
>  -- output of sessionInfo():
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252
> [3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
> [5] LC_TIME=Italian_Italy.1252
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] HTqPCR_1.14.0      limma_3.16.5       RColorBrewer_1.0-5 Biobase_2.20.0
> [5] BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] affy_1.38.1           affyio_1.28.0         BiocInstaller_1.10.2
> [4] gdata_2.12.0.2        gplots_2.11.0.1       gtools_2.7.1
> [7] preprocessCore_1.22.0 stats4_3.0.1          zlibbioc_1.6.0
>
> --
> Sent via the guest posting facility at bioconductor.org.