[BioC] HTqPCR normalization issues - third posting

Thu Oct 10 15:56:47 CEST 2013

Hi Alessandro,

I believe this package is still maintained, and it is unfortunate that 
you have not received a reply. The expectation is that package 
maintainers will subscribe (and pay attention) to the Bioc listserv, 
but the list is fairly high traffic, so it never hurts to add a CC to 
the maintainer as well (which I have done for you).
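
In the meantime, a quick check on the numbers quoted below may help narrow
things down. The following is only a sketch: it uses the first three genes and
the first three samples copied from the two normalized tables in the original
post, and the object names are illustrative, not part of HTqPCR. It divides
the "geometric.mean" result by the "scale.rank" result, element by element.

```r
# Values copied from the two quoted tables
# (first three genes; samples 41, 39 and 37).
genes   <- c("ABCC5", "ADM", "CEBPB")
samples <- c("41", "39", "37")

scale.rank <- matrix(c(30,   27.37706161, 26.47393365,
                       31.3, 30.36540284, 28.51753555,
                       29.8, 28.53383886, 26.65971564),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(genes, samples))

geo.mean <- matrix(c(30,   27.73443878, 26.93934246,
                     31.3, 30.76178949, 29.01887064,
                     29.8, 28.90631647, 27.12839047),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(genes, samples))

# Element-wise ratio between the two normalizations.
ratio <- geo.mean / scale.rank
print(round(ratio, 5))

# Spread of the ratio within each sample. Values near zero mean the two
# results differ only by a single multiplicative factor per sample.
ratio.spread <- apply(ratio, 2, function(x) diff(range(x)))
print(ratio.spread)
```

If the ratio is constant within each column and exactly 1 for sample 41, that
would be consistent with both methods multiplying every Ct value in a sample by
one per-sample factor, with the first sample's factor equal to 1. Whether the
first sample is used as the reference by design is something the maintainer
would need to confirm.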

Best,

Jim

On Thursday, October 10, 2013 8:35:06 AM, Alessandro Guffanti [guest] 
wrote:
>
> Dear all, this is our third posting without a real reply, so we wonder whether this package is no longer maintained. If so, it would be useful for us to know.
>
>
> We are using HTqPCR to analyze a set of cards, which we transformed into the following format accepted by HTqPCR:
>
>   2    Run05    41    Passed    sample 41    ABCC5    Target    30
>   3    Run05    41    Passed    sample 41    ADM    Target    31.3
>   4    Run05    41    Passed    sample 41    CEBPB    Target    29.8
>   5    Run05    41    Passed    sample 41    CSF1R    Target    31.2
>   6    Run05    41    Passed    sample 41    CXCL16    Target    26.9
>   7    Run05    41    Passed    sample 41    CYC1    Target    25.7
>
>   [...]
>
>   The files and their groups are as follows, as summarized in the file "Elenco_1.txt", which is used below:
>
>   File    Group
>   41.txt    Sano
>   39.txt    Sano
>   37.txt    Sano
>   35.txt    Sano
>   43.txt    Sano
>   34.txt    Sano
>   44.txt    Sano
>   38.txt    Sano
>   48.txt    Sano
>   40.txt    Sano
>   47.txt    Sano
>   6.txt    Non Responder DISEASE
>   26.txt    Non Responder DISEASE
>   2.txt    Non Responder DISEASE
>   69.txt    Non Responder DISEASE
>   68.txt    Non Responder DISEASE
>   5.txt    Non Responder DISEASE
>   71.txt    Responder DISEASE
>   3.txt    Responder DISEASE
>   17.txt    Responder DISEASE
>   1.txt    Responder DISEASE
>   19.txt    Responder DISEASE
>
>   The comparison is DISEASE vs non DISEASE, but we have doubts about the normalization step.
>   Note that sample 41 is the *first* of the list.
>
>   Here is the code up to the dump of the normalized values matrices:
>
>   library("HTqPCR")
>   path <- ("whatever/")
>   files <- read.delim (file.path(path, "Elenco_1.txt"))
>   files
>   filelist <- as.character(files$File)
>   filelist
>   raw <- readCtData(files = filelist, path = path, n.features=46, type=7, flag=NULL, feature=6, Ct=8, header=FALSE, n.data=1)
>   featureNames (raw)
>   raw.cat <- setCategory(raw, Ct.max=36, Ct.min=9, replicates=FALSE, quantile=0.9, groups =files$Group, verbose=TRUE)
>
>   s.norm <- normalizeCtData(raw.cat, norm="scale.rank")
>   exprs(s.norm)
>   write.table(exprs(s.norm),file="Ct norm scaling.txt")
>
>   g.norm <- normalizeCtData(raw.cat, norm="geometric.mean")
>   exprs(g.norm)
>   write.table(exprs(g.norm),file="Ct norm media geometrica.txt")
>
>   Now if we look at the content of the two expression value files, it looks as though the first column
>   (corresponding to the first sample) is always unchanged, while all the others have been normalized.
>
>   In this case the first dataset is sample 41, so you can see what is happening by comparing the
>   corresponding column in the input above with the tables below.
>
>   We do not include all the columns here; however, you can see that all the samples *except the first (number 41)* have had all their values normalized.
>
>   Ct norm scaling:
>
>       41    39    37    35    43    34    44    38
>   ABCC5    30    27.37706161    26.47393365    29.7721327    31.20189573    26.39260664    26.32436019    27.54274882
>   ADM    31.3    30.36540284    28.51753555    32.31241706    34.40473934    26.29800948    29.82796209    28.60208531
>   CEBPB    29.8    28.53383886    26.65971564    27.84151659    30.06540284    27.3385782    27.36597156    26.29080569
>   CSF1R    31.2    27.66625592    28.05308057    37.18976303    36.98767773    31.0278673    34.56255924    29.75772512
>   CXCL16    26.9    27.56985782    24.15165877    30.28018957    28.82559242    25.91962085    26.89251185    26.96492891
>
>   Ct norm geometric:
>
>       41    39    37    35    43    34    44    38
>   ABCC5    30    27.73443878    26.93934246    29.88113261    30.76352197    26.51166676    26.8989347    27.49219508
>   ADM    31.3    30.76178949    29.01887064    32.4307173    33.92136694    26.41664286    30.47900874    28.5495872
>   CEBPB    29.8    28.90631647    27.12839047    27.94344824    29.64299633    27.46190571    27.96328103    26.24254985
>   CSF1R    31.2    28.0274082    28.5462506    37.32591991    36.46801611    31.16783762    35.31694663    29.70310587
>   CXCL16    26.9    27.92975172    24.57624224    30.39104955    28.42060473    26.03654728    27.47948724    26.91543574
>
>   This looks odd - why does the first sample seem to be taken as a 'reference' for both normalization methods, and hence left unchanged?
>
>   This happens with ANY normalization procedure selected.
>
>   Another (related?) oddity is that, in the final differential analysis result, the same sample ID is always reported
>   in the feature.pos field, as you can see below:
>
>       genes    feature.pos    t.test    p.value    adj.p.value
>   22    NUCB1    41    -1.998838921    0.077900837    0.251381346
>   8    ERH    41    -1.958143348    0.091329532    0.251381346
>   16    MAFB    41    -1.887142703    0.09421993    0.251381346
>   28    RNF130    41    -1.904866754    0.099644523    0.251381346
>   3    CEBPB    41    -1.853176708    0.103563968    0.251381346
>   18    MSR1    41    -1.80887129    0.10432619    0.251381346
>
>   Are we doing something wrong in the data input or the subsequent processing here? Can we actually trust these normalizations?
>
>   Many thanks in advance - kind regards
>
>   Alessandro & Elena
>
>
>
>
>   -- output of sessionInfo():
>
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] HTqPCR_1.14.0      limma_3.16.8       RColorBrewer_1.0-5 Biobase_2.20.1
> [5] BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] affy_1.38.1           affyio_1.28.0         BiocInstaller_1.10.3
> [4] gdata_2.13.2          gplots_2.11.3         gtools_3.0.0
> [7] preprocessCore_1.22.0 stats4_3.0.1          zlibbioc_1.6.0
>
> --
> Sent via the guest posting facility at bioconductor.org.

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099