[BioC] warning with lmFit command in limma

Tue Aug 5 23:31:13 CEST 2014

Dear List & Gordon,

Firstly thanks to you and your team for the wonderful limma package and its well dcoumented manual. 
I have used it earlier for analysing both one colored and two colored data. But with one of the datasets, I get the following warning with lmFit command in limma package.

> fit <- lmFit(gset38932, design38932)
Warning message:
Partial NA coefficients for 28837 probe(s)

Background of experiment:

    It is a two color experiment with common reference.
    Paired samples (cancerous and adjacent noncancerous tissues were obtained from 12 patients)

I have followed the instructions under section 9.4.1 "Paired Samples" of limma manual (last revised on 17 june 2014) and also goggled for this warning but I am unable to understand it. My another concern is that if I ignore the warning and proceed with limma analysis, I get only 84 differentially expressed genes (genes with adj.P.Value < 0.05), which appears too small a number for a microarray experiment. 

Complete code:
> library(Biobase)
> library(GEOquery)
> library(limma)

> gset38932 <- getGEO("GSE38932", GSEMatrix =TRUE)
> GPLid <- levels((gset38932[[1]])$platform_id)
> if (length(gset38932) > 1)
+ {
+   #idx <- grep("GPL5936", attr(gset38932, "names"))
+   idx <- grep(GPLid, attr(gset38932, "names"))
+ } else
+ {
+   idx <- 1
+ }
> gset38932 <- gset38932[[idx]]

> ConventionalMicroarrayData <- which(pData(gset38932)$characteristics_ch1.10 == "strategy: conventional microarray")
> ## remove data with "Indirect strategy." and keep ones with "Direct strategy."
> gset38932 <- gset38932[,ConventionalMicroarrayData]

> # log2 transform
> ex38932 <- exprs(gset38932)
> qx <- as.numeric(quantile(ex38932, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T))
> LogC <- (qx[5] > 100) ||
+   (qx[6]-qx[1] > 50 && qx[2] > 0) ||
+   (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
> if (LogC) { ex38932[which(ex38932 <= 0)] <- NaN
+             exprs(gset38932) <- log2(ex38932) }

> PatientInfo <- pData(gset38932)[,c(2,10:20,27)]
> ## remove anything that is not a digit
> PatientID <- as.numeric(sub(pattern="\D+", replacement="", x=PatientInfo$characteristics_ch1.9,ignore.case=T))
> TissueType <- sub(pattern="tissue type: ", replacement="", x=PatientInfo$characteristics_ch1.8,ignore.case=T)
> TissueType <- make.names(TissueType)
> Block <- factor(PatientID)
> Treatment <- factor(TissueType,levels=c("adjacent.non.tumor","tumor")) ## order of levels is important
> design38932 <- model.matrix(~Block+Treatment)

> fit <- lmFit(gset38932, design38932)
Warning message:
Partial NA coefficients for 28837 probe(s)

> fit2 <- eBayes(fit, proportion=0.01) #proportion: assumed proportion of genes which are differentially expressed
>
> ## to get the no.of genes in each array
> nrow38932 <- nrow(ex38932)
> LastColumn <- dim(design38932)[2]
> tT <- topTable(fit2, coef=colnames(design38932)[LastColumn], adjust="fdr", sort.by="B",number=nrow38932)
> dim(subset(tT,adj.P.Val<0.05,))
[1] 84 15

 -- output of sessionInfo(): 

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] limma_3.20.8        GEOquery_2.30.1     Biobase_2.24.0      BiocGenerics_0.10.0

loaded via a namespace (and not attached):
[1] RCurl_1.95-4.3 tools_3.1.0    XML_3.98-1.1  

--
Sent via the guest posting facility at bioconductor.org.