[BioC] HTqPCR normalization issues - third posting
James W. MacDonald
jmacdon at uw.edu
Thu Oct 10 15:56:47 CEST 2013
Hi Alessandro,
I believe this package is still maintained, and it is unfortunate that
you have not received a reply. The expectation is that package
maintainers will subscribe (and pay attention) to the Bioc listserv,
but the list is fairly high traffic, so it never hurts to add a CC to
the maintainer as well (which I have done for you).
Best,
Jim
On Thursday, October 10, 2013 8:35:06 AM, Alessandro Guffanti [guest]
wrote:
>
> Dear all, this is our third posting without a real reply, so we wonder whether this package is still maintained. If it is not, it would be useful for us to know.
>
>
> We are using HTqPCR to analyze a set of cards, which we transformed into the following format accepted by HTqPCR:
>
> 2 Run05 41 Passed sample 41 ABCC5 Target 30
> 3 Run05 41 Passed sample 41 ADM Target 31.3
> 4 Run05 41 Passed sample 41 CEBPB Target 29.8
> 5 Run05 41 Passed sample 41 CSF1R Target 31.2
> 6 Run05 41 Passed sample 41 CXCL16 Target 26.9
> 7 Run05 41 Passed sample 41 CYC1 Target 25.7
>
> [...]
>
> The total number of files and groups is as follows - summarized in the file "Elenco_1.txt" which is used below:
>
> File Group
> 41.txt Sano
> 39.txt Sano
> 37.txt Sano
> 35.txt Sano
> 43.txt Sano
> 34.txt Sano
> 44.txt Sano
> 38.txt Sano
> 48.txt Sano
> 40.txt Sano
> 47.txt Sano
> 6.txt Non Responder DISEASE
> 26.txt Non Responder DISEASE
> 2.txt Non Responder DISEASE
> 69.txt Non Responder DISEASE
> 68.txt Non Responder DISEASE
> 5.txt Non Responder DISEASE
> 71.txt Responder DISEASE
> 3.txt Responder DISEASE
> 17.txt Responder DISEASE
> 1.txt Responder DISEASE
> 19.txt Responder DISEASE
>
> The comparison is DISEASE vs non DISEASE, but what leaves us in doubt is the normalization step.
> Note that sample 41 is the *first* of the list.
>
> Here is the code up to the dump of the normalized values matrices:
>
> library("HTqPCR")
> path <- ("whatever/")
> files <- read.delim (file.path(path, "Elenco_1.txt"))
> files
> filelist <- as.character(files$File)
> filelist
> raw <- readCtData(files = filelist, path = path, n.features=46, type=7, flag=NULL, feature=6, Ct=8, header=FALSE, n.data=1)
> featureNames (raw)
> raw.cat <- setCategory(raw, Ct.max=36, Ct.min=9, replicates=FALSE, quantile=0.9, groups =files$Group, verbose=TRUE)
>
> s.norm <- normalizeCtData(raw.cat, norm="scale.rank")
> exprs(s.norm)
> write.table(exprs(s.norm),file="Ct norm scaling.txt")
>
> g.norm <- normalizeCtData(raw.cat, norm="geometric.mean")
> exprs(g.norm)
> write.table(exprs(g.norm),file="Ct norm media geometrica.txt")
>
> Now if we look at the content of the two expression value files, it looks like the first column
> (corresponding to the first sample) is always unchanged, while all the others have been normalized.
>
> In this case the first dataset is sample 41, so you can see what is happening by comparing the
> corresponding column in the raw data above with the normalized values below.
>
> We do not include all the columns here; however, you can see that all the samples *except the first (number 41)* have had their values normalized.
>
> Ct norm scaling:
>
> 41 39 37 35 43 34 44 38
> ABCC5 30 27.37706161 26.47393365 29.7721327 31.20189573 26.39260664 26.32436019 27.54274882
> ADM 31.3 30.36540284 28.51753555 32.31241706 34.40473934 26.29800948 29.82796209 28.60208531
> CEBPB 29.8 28.53383886 26.65971564 27.84151659 30.06540284 27.3385782 27.36597156 26.29080569
> CSF1R 31.2 27.66625592 28.05308057 37.18976303 36.98767773 31.0278673 34.56255924 29.75772512
> CXCL16 26.9 27.56985782 24.15165877 30.28018957 28.82559242 25.91962085 26.89251185 26.96492891
>
> Ct norm geometric:
>
> 41 39 37 35 43 34 44 38
> ABCC5 30 27.73443878 26.93934246 29.88113261 30.76352197 26.51166676 26.8989347 27.49219508
> ADM 31.3 30.76178949 29.01887064 32.4307173 33.92136694 26.41664286 30.47900874 28.5495872
> CEBPB 29.8 28.90631647 27.12839047 27.94344824 29.64299633 27.46190571 27.96328103 26.24254985
> CSF1R 31.2 28.0274082 28.5462506 37.32591991 36.46801611 31.16783762 35.31694663 29.70310587
> CXCL16 26.9 27.92975172 24.57624224 30.39104955 28.42060473 26.03654728 27.47948724 26.91543574
>
> This looks odd: why does the first sample seem to be taken as a 'reference' by both normalization methods and hence left unchanged?
>
> This happens with ANY normalization procedure selected.
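
One possible explanation, offered as an assumption rather than something confirmed from the HTqPCR source: a scaling normalization that expresses every sample relative to a reference column will return that reference column unchanged by construction. The sketch below is plain base R, not HTqPCR code. The helper `scale_to_reference` and the `toyA`/`toyB` columns are made up for illustration; only the `s41` values are taken from the post.

```r
# Hypothetical helper (not part of HTqPCR): rescale each sample (column) so that
# its geometric mean Ct matches the geometric mean of a chosen reference column.
scale_to_reference <- function(ct, ref = 1) {
  # Per-sample geometric mean of the Ct values
  geo_mean <- apply(ct, 2, function(x) exp(mean(log(x))))
  # Scaling factor relative to the reference; the reference itself gets factor 1
  scale_factor <- geo_mean / geo_mean[ref]
  # Divide every column by its own scaling factor
  sweep(ct, 2, scale_factor, "/")
}

ct <- cbind(
  s41  = c(30.0, 31.3, 29.8, 31.2, 26.9),  # sample 41 values quoted in the post
  toyA = c(28.0, 31.0, 29.2, 28.3, 28.2),  # made-up values, illustration only
  toyB = c(27.1, 29.2, 27.3, 28.7, 24.7)   # made-up values, illustration only
)
rownames(ct) <- c("ABCC5", "ADM", "CEBPB", "CSF1R", "CXCL16")

norm <- scale_to_reference(ct, ref = 1)

# Compare the reference column before and after scaling
all.equal(norm[, "s41"], ct[, "s41"])
```

If the package's scaling methods work along these lines, an unchanged first column would be expected behaviour rather than a sign that the normalization failed. Putting a different file first in "Elenco_1.txt" and checking whether the unchanged column moves with it would be one way to test this.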
>
> Another (related?) oddity is that in the final differential analysis result the same sample ID is always reported
> in the feature.pos field, as you can see below:
>
> genes feature.pos t.test p.value adj.p.value
> 22 NUCB1 41 -1.998838921 0.077900837 0.251381346
> 8 ERH 41 -1.958143348 0.091329532 0.251381346
> 16 MAFB 41 -1.887142703 0.09421993 0.251381346
> 28 RNF130 41 -1.904866754 0.099644523 0.251381346
> 3 CEBPB 41 -1.853176708 0.103563968 0.251381346
> 18 MSR1 41 -1.80887129 0.10432619 0.251381346
>
> Are we doing something wrong in the data input or the subsequent processing? Can we actually trust these normalizations?
>
> Many thanks in advance - kind regards
>
> Alessandro & Elena
>
>
>
>
> -- output of sessionInfo():
>
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] HTqPCR_1.14.0 limma_3.16.8 RColorBrewer_1.0-5 Biobase_2.20.1
> [5] BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] affy_1.38.1 affyio_1.28.0 BiocInstaller_1.10.3
> [4] gdata_2.13.2 gplots_2.11.3 gtools_3.0.0
> [7] preprocessCore_1.22.0 stats4_3.0.1 zlibbioc_1.6.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099