[BioC] questions concerning quantile normalization in HTqPCR

Fri Sep 17 19:29:46 CEST 2010

Hello Andreia,

sorry for the late reply, this email slipped under my radar.
>
> I am analyzing data of qPCR plates in which each well correponds to a
> miRNA.
> The system does not have replicates within the same plate.
> the data consistes in 3 types of cells: I have 3 replicates for cell type
> one; 2 replicates for cell type 2 and only one sampled cell type 3.
> The data of each cell type and each replicate is in a separate file where
> the data has 5 columns
> miRNA ID  Target or Endogenous  Position on plate  Passed or Failed Ct
> value
> miR-18a Target A1 Passed 26.99  this means that I have 6 files:
> File    Treatment
> sample1.txt    cell type1
> sample2.txt    cell type1
> sample3.txt    cell type1
> sample1.txt    cell type2
> sample2.txt    cell type2
> sample1.txt    cell type3
>
>
>
> For the miRNA which did not have expression in a biological replicate, I
> have replaced NA by 0.
>
> question 1 -So after normalizing using quantile, several miRNA which did
> not
> have expression in sample 1 cell type 3 the value were normalized to
> 23.1191666666667. I notice that this happened only for this cell type, for
> which I do not have replicates. Can someone explain me why does this
> happen?
>
This is probably all the values that were replaced with 0, no? This is an
artefact of the quantile normalisation, which will force all the
distributions of Ct values to be identical - whether this is appropriate
or not! Values within each sample are sorted according to Ct value, the
average Ct value for rank 1, 2, ... n are calculated across samples, and
that value is assigned to the genes in that order. So if for example 20
out of 96 genes are 0 in one sample, these will be ranked 77-96 (ties are
treated equally) and are assigned the mean value of genes ranked 77-96 in
all samples. In your case that mean value is apparently 23.12.

Quantile normalisation is in general quite a harsh normalisation method,
and shouldn't be used if you have e.g. many missing values, or if you
expect a genuine, i.e. biologically real, higher or lower gene expression
level in some samples.

> question 2- the interpretation of values of Ct=0 and Ct>35. Shouldn't be
> the
> same, that there is no expression, or levels of expression under capacity
> of
> detection?
>
Ct > 35 is generally taken to mean expression under capacity of detection,
whereas Ct=0 in the output of many qPCR instruments is the same as NA,
i.e. a value that is completely missing for any number of reasons, not
necessarily because of low gene expression level.

> question 3- I notice that miRNAs which are flagged by failed are being
> included on the analysis. Shouldn't these miRNAs be excluded? I thought
> that
> these would be, and I could find a filter for miRNA classified as
> undetermined but not for miRNAs flagged as failed.
>
They're not excluded per default, since the user might not necessarily
agree with the parameters set by the qPCR instrument (or HTqPCR functions)
for failed Ct measurements. Removing them does make sense in some cases
though. Does it work to say something like:

q.filt <- filterCtData(q.norm, remove.category="failed")

You might have to run filterCtData twice. Once to remove
category="Undetermined", and a second time to remove category="failed".
Depends on how many types of classes you have in featureCategory(q.norm).
Per default I think HTqPCR just expect "OK", "Undetermined" and
"Unreliable".

HTH
\Heidi

> thanks in advance for the help. Kind regards,
> Andreia
>
> the code that I have used is bellow:
>
> library(HTqPCR)
> source("http://www.bioconductor.org/biocLite.R")
> biocLite("statmod")
> library(statmod)
> setwd("G:/margarida_miRNA")
> path<-("G:/margarida_miRNA")
> files <- read.delim(file.path(path, "Data.txt"))
> raw <- readCtData(files=files$File, path=path, n.features = 96, flag = 4,
> feature = 1, type = 2, position = 3, Ct = 5, header = FALSE, SDS = FALSE)
> pdf("plotCtOverview_raw_data.pdf")
> plotCtOverview(raw,genes=g, xlim=c(0,50), groups=files$Treatment, conf.int
> =TRUE)
> dev.off()
> raw.cat<-setCategory(raw, groups=files$Treatment, Ct.max=38, Ct.min=8,
> flag=TRUE, flag.out="Failed",verbose=TRUE, quantile=0.8)
> pdf("plot_Category.pdf")
> plotCtCategory(raw.cat)
> plotCtCategory(raw.cat, by.feature=TRUE, cexRow=0.1)
> dev.off()
> q.norm<-normalizeCtData(raw.cat,norm="quantile")
> library(HTqPCR)
> qFilt<-filterCtData(q.norm, remove.type="Endogenous")
> iqr.values<-apply(exprs(q.norm),1,IQR)
> iqr.filt<-filterCtData(q.norm, remove.IQR=1.5,
> remove.category="Undetermined")
> pdf("severalgraphs.pdf")
> plotCtCor(q.norm, main="Ct correlation")
> plotCtDensity(q.norm)
> plotCtDensity(iqr.filt)
> plotCtBoxes(q.norm)
> plotCtBoxes(iqr.filt)
> plotCtScatter(q.norm, cards=c(1,2), col="type", diag=TRUE)
> plotCtScatter(q.norm, cards=c(1,3), col="type", diag=TRUE)
> plotCtScatter(q.norm, cards=c(2,3), col="type", diag=TRUE)
> plotCtScatter(q.norm, cards=c(4,5), col="type", diag=TRUE)
> plotCtScatter(q.norm, cards=c(1,6), col="type", diag=TRUE)
> plotCtScatter(q.norm, cards=c(2,6), col="type", diag=TRUE)
> plotCtScatter(q.norm, cards=c(3,6), col="type", diag=TRUE)
> plotCtScatter(q.norm, cards=c(4,6), col="type", diag=TRUE)
> plotCtScatter(q.norm, cards=c(5,6), col="type", diag=TRUE)
> plotCtPairs(q.norm, col="type", diag=TRUE)
> dev.off()
>
>
>
> design<-model.matrix(~0+files$Treatment)
>
> colnames(design)<-c("CT1","CT2","CT3")
>
>
>
>
> contrasts<-makeContrasts(CT1-CT2, levels=design)
> qDE.limma<-limmaCtData(iqr.filt, design=design, contrasts=contrasts,
> ndups=1, spacing=1)
> qDE.limma
> dim(qDE.limma)
> names(qDE.limma)
> pdf("testingandheatmap.pdf")
> plotCtRQ(qDE.limma, p.val=0.05, transform="log10", col="#9E0142")
> g<-featureNames(iqr.filt)
> plotCtHeatmap(iqr.filt, gene.names=g, dist="euclidean", cexRow=0.4)
> cluster.list<-clusterCt(iqr.filt, type="genes", dist="euclidean",
> n.cluster=8, hang=-1,cex=0.5)
> dev.off()
>
>  sessionInfo()
> R version 2.11.1 (2010-05-31)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=Portuguese_Portugal.1252  LC_CTYPE=Portuguese_Portugal.1252
>
> [3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
>
> [5] LC_TIME=Portuguese_Portugal.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] statmod_1.4.6      HTqPCR_1.2.0       limma_3.4.4
> RColorBrewer_1.0-2
> [5] Biobase_2.8.0
>
> loaded via a namespace (and not attached):
> [1] affy_1.26.1           affyio_1.16.0         gdata_2.8.0
> [4] gplots_2.8.0          gtools_2.6.2          preprocessCore_1.10.0
> [7] tools_2.11.1
>
>
>
>
> --
> --------------------------------------------
> Andreia J. Amaral
> Unidade de Imunologia Clínica
> Instituto de Medicina Molecular
> Universidade de Lisboa
> email: andreiaamaral at fm.ul.pt
>           andreia.fonseca at gmail.com
>