[BioC] Error in lm.fit(design, t(M)) from GEO dataset

Jeremiah H. Savage jeremiahsavage at gmail.com
Fri May 28 03:39:18 CEST 2010


Hello,

I am using limma 3.4.0, and received an error with a GEO series (
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12113 ) that
doesn't occur with other series data:

Error in lm.fit(design, t(M)) :
  NA/NaN/Inf in foreign function call (arg 4)

If I do a summary on the eset, I get some -Inf values. Does this mean
there is some invalid data in the series that lmFit doesn't handle?
> summary(exprs(eset12113))
   GSM305488        GSM305489        GSM305490        GSM305491
 Min.   :-2.322   Min.   :-3.322   Min.   :-3.322   Min.   :  -Inf
 1st Qu.: 3.263   1st Qu.: 2.982   1st Qu.: 4.585   1st Qu.: 3.868
 Median : 5.202   Median : 4.916   Median : 6.200   Median : 5.638
 Mean   : 5.140   Mean   : 4.934   Mean   : 6.054   Mean   :  -Inf
 3rd Qu.: 6.785   3rd Qu.: 6.637   3rd Qu.: 7.485   3rd Qu.: 7.323
 Max.   :15.620   Max.   :15.180   Max.   :14.749   Max.   :14.290


The code I'm using is:
> source("sampleEset.R")
>  fit <- getLMFit()


sampleEset.R :
-------------------8<-----------------------------

getLMFit <- function()
  {
    eset12113 <- getEset("12113")

    design12113 <- model.matrix(~ 0+factor(c(1,1,2,3,3,4,5,5,6,7,7,8,9,9,10)))
    colnames(design12113) <-
c("untreated1","treated1","untreated2","treated2","untreated3","treated3","untreated4","treated4","untreated5","treated5")
    contrast.matrix <- makeContrasts(untreated1-treated1,levels=design12113)
    fit <- lmFit(eset12113,design12113)
}

getEset <- function(value)
  {
    DATADIR = "/home/jeremiah/R/i486-pc-linux-gnu-library/2.11/GEOquery/extdata"
    data_file = paste("GSE",value,".soft",sep="")
    data_dir_file= paste(DATADIR,data_file,sep="/")

    library(GEOquery)
    library(Biobase)
    library(limma)

    if (file.exists(data_dir_file))
      {
        GSE_parsed <- getGEO(filename=data_dir_file)
      }
    else
      {
        GSE_value <- paste("GSE",value,sep="")
        GSE_file <- getGEOfile(GSE_value,destdir=DATADIR)
        GSE_parsed <- getGEO(filename=data_dir_file)
      }
    GSM_platform <- lapply(GSMList(GSE_parsed), function(x) {
      Meta(x)$platform
    })
    GSM_platform <- GSM_platform[[1]]

    probe_set <- Table(GPLList(GSE_parsed)[[1]])$ID

    data.matrix <- do.call("cbind", lapply(GSMList(GSE_parsed), function(x) {
      tab <- Table(x)
      mymatch <- match(probe_set, tab$ID_REF)
      return(tab$VALUE[mymatch])
    }))

    data.matrix <- apply(data.matrix, 2, function(x) {
      as.numeric(as.character(x))
    })

    dataLog2.matrix <- log2(data.matrix)

    rownames(dataLog2.matrix) <- probe_set
    colnames(dataLog2.matrix) <- names(GSMList(GSE_parsed))
    p_data <- data.frame(samples = names(GSMList(GSE_parsed)))
    rownames(p_data) <- names(GSMList(GSE_parsed))
    pheno <- as(p_data, "AnnotatedDataFrame")
    eset <- new("ExpressionSet", exprs = dataLog2.matrix, phenoData = pheno)
    return(eset)
  }

-------------------8<-----------------------------

Thanks,
Jeremiah



More information about the Bioconductor mailing list