[BioC] Bioconductor Digest, Vol 121, Issue 29

Thornton, Matthew Matthew.Thornton at med.usc.edu
Fri Mar 29 15:30:00 CET 2013



"bioconductor-request at r-project.org" <bioconductor-request at r-project.org> wrote:


Send Bioconductor mailing list submissions to
        bioconductor at r-project.org

To subscribe or unsubscribe via the World Wide Web, visit
        https://stat.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
        bioconductor-request at r-project.org

You can reach the person managing the list at
        bioconductor-owner at r-project.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioconductor digest..."


Today's Topics:

   1. Heatmaps for Agilent Single Color 4x44k using limma and vsn.
      (Cornish, Joseph (NIH/NIAID) [F])
   2. Re: Heatmaps for Agilent Single Color 4x44k using limma and
      vsn. (James W. MacDonald)
   3. Re: Heatmaps for Agilent Single Color 4x44k using limma and
      vsn. (Cornish, Joseph (NIH/NIAID) [F])
   4. Re: Heatmaps for Agilent Single Color 4x44k using limma and
      vsn. (Sean Davis)
   5. Re: Heatmaps for Agilent Single Color 4x44k using limma and
      vsn. (James W. MacDonald)
   6. Re: Heatmaps for Agilent Single Color 4x44k using limma and
      vsn. (Cornish, Joseph (NIH/NIAID) [F])
   7. analysis of HuGene2.0-st array affymetrix -aroma package
      (Andreia Fonseca)
   8. Re: Heatmaps for Agilent Single Color 4x44k using limma and
      vsn. (Sean Davis)
   9. Interpreting DESeq2 results (Michael Muratet)
  10. identifying drosophila miRNA targets (Fiona Ingleby)
  11. Re: identifying drosophila miRNA targets (James W. MacDonald)
  12. Re: Limit on number of sequence files for forging a BSgenome
      (Blanchette, Marco)
  13. Re: [R] Error in setMethod("combine"... was - Error when
      installing globaltest package (Shields, Rusty (IMS))
  14. Re: [R] Error in setMethod("combine"... was - Error when
      installing globaltest package (Martin Morgan)
  15. Re: [R] Error in setMethod("combine"... was - Error when
      installing globaltest package (Shields, Rusty (IMS))
  16. Re: Interpreting DESeq2 results (Michael Love)
  17. Re: Interpreting DESeq2 results (somnath bandyopadhyay)


----------------------------------------------------------------------

Message: 1
Date: Thu, 28 Mar 2013 13:27:23 +0000
From: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish at nih.gov>
To: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: [BioC] Heatmaps for Agilent Single Color 4x44k using limma
        and vsn.
Message-ID: <E68EA43FBF80174C85A92BC24EFAA3B84B7BC3 at MLBXV01.nih.gov>
Content-Type: text/plain

I am attempting to write a script that will allow me to do bulk benchmarking of various methods for normalization and background correction. I am concerned that I am not processing the data correctly, and that this may be leading to the abnormal heatmaps I am seeing. For a test case, I am using an experiment with two different conditions, two different states, and two replicates per state.

The heatmaps concern me because of the uniformity between them: the values are all completely green or completely red, yet the heatmap shows differences between genes within a state.

In the examples I have seen, people have suggested using the coefficients from the eBayes fit of the data; however, in my example the eBayes MArrayLM object does not have two columns in the results, only one (I am not sure whether this is a sign of an error on my part). I have instead used the lmFit coefficients, since that matrix has a column for each group.

To further confuse someone who is new to R, the simplified example and the full script generate different heatmaps for the same data and the same methods (normexp for background correction, cyclic loess for normalization). The differences are that in the simplified version the gene IDs show up on the right side, while in the full version only the row numbers appear; the color for some genes also changes between states.

My questions are:
Why does the bayes model only have one column?
Am I approaching the processing correctly?
Why are there differences between the resulting heatmaps of each version?

I know that the idea is to keep the amount of posted code to a minimum but I don't see any other way here.

Here is the simplified example I have made:
library(limma)
library(vsn)
library(gplots)

set.seed(0xabcd) #from vsn manual

#constants for reading Agilent output from our imager
COL__   <- list(G="Raw intensity (tr.mean) {A}",Gb="Background (tr.mean) {A}")
ANO__   <- c("Position", "Name", "ID", "Description", "GeneName")

#parse data
targets <- readTargets("test.csv", sep = ",")
gal     <- readGAL("026806_D_20110524.gal")
layout  <- getLayout(gal)
agi     <- read.maimages(targets, columns    = COL__,
                                  annotation = ANO__,
                                  green.only = TRUE)
design <- model.matrix(~0+factor(c(1,1,2,2)))
colnames(design) <- c("s1", "s2")

#process
bg     <- backgroundCorrect(agi, method = "normexp")
norm   <- normalizeBetweenArrays(bg, method = "cyclicloess")
avg    <- avereps(norm, ID=norm$genes$ProbeName)

#fit
contmat <- makeContrasts(s2-s1, levels = design)
fit_lm  <- lmFit(avg$E, design)
fit_be  <- eBayes(contrasts.fit(fit_lm, contmat))

#results
difexp <- topTable(fit_be, coef = 1, adjust = "BH")
res    <- decideTests(fit_be)
comp   <- vennDiagram(res)

#plot
pdf("example-heatmap.pdf")
genes <- as.numeric(rownames(topTable(fit_be, n = 100)))
heatmap(fit_lm$coefficients[genes,], col=redgreen(75), key = TRUE)
dev.off()


The full version:
library(vsn)
library(limma)
library(gplots)
set.seed(0xabcd) #from the vsn manual

#constants
A_SUB__ <- 1
A_NXP__ <- 2
A_MIN__ <- 3
A_VSN__ <- 1
A_QNT__ <- 2
A_LWS__ <- 3
M_SUB__ <- "subtract"
M_NXP__ <- "normexp"
M_MIN__ <- "minimum"
M_VSN__ <- "vsn"
M_LWS__ <- "cyclicloess"
M_QNT__ <- "quantile"
COL__   <- list(G="Raw intensity (tr.mean) {A}",Gb="Background (tr.mean) {A}")
ANO__   <- c("Position", "Name", "ID", "Description", "GeneName")
N_BGM__ <- 3 #number of background correction methods
N_NRM__ <- 3 #number of normalization methods
A_BGM__ <- c(A_SUB__, A_NXP__, A_MIN__)
A_NRM__ <- c(A_VSN__, A_QNT__, A_LWS__)


#utility methods
#applies subtraction background correction with limma
bg_subtract <- function(x){
    return(list(norm = x$norm,
                bg   = M_SUB__,
                agi  = backgroundCorrect(x$agi, method = M_SUB__)))
}

#applies normexp background correction with limma
bg_normexp <- function(x){
    return(list(norm = x$norm,
                bg   = M_NXP__,
                agi  = backgroundCorrect(x$agi, method = M_NXP__)))
}

#applies minimum background correction with limma
bg_minimum <- function(x){
    return(list(norm = x$norm,
                bg   = M_MIN__,
                agi  = backgroundCorrect(x$agi, method = M_MIN__)))
}

#applies VSN normalization with VSN package
nb_vsn <- function(x){
    return(list(norm = M_VSN__,
                bg   = x$bg,
                agi  = normalizeVSN(x$agi)))
}

#applies cyclic lowess normalization with limma
nb_lowess <- function(x){
   return(list(norm = M_LWS__,
                 bg = x$bg,
                agi = normalizeBetweenArrays(x$agi, method = M_LWS__)))
}

#applies quantile normalization with limma
nb_quantile <- function(x){
    return(list(norm = M_QNT__,
                  bg = x$bg,
                 agi = normalizeBetweenArrays(x$agi, method = M_QNT__)))
}

#generates a plot of MA values with limma
pl_ma <- function(x){
    name <- paste(c(x[[1]]$bg, x[[1]]$norm, "-ma.pdf"), collapse="")
    pdf(name)
    plotMA(x[[1]]$agi$E)
    dev.off()
}

#generates a plot of mean versus stdev with VSN
pl_meansd <- function(x){
    name <- paste(c(x[[1]]$bg, x[[1]]$norm, "-meansd.pdf"), collapse="")
    pdf(name)
    meanSdPlot(x[[1]]$agi$E)
    dev.off()
}

#generate volcanoplot of log-fold versus log-odds
pl_volcano <- function(x){
    name <- paste(c(x[[1]]$bg, x[[1]]$norm, "-volcano.pdf"), collapse="")
    pdf(name)
    volcanoplot(x[[1]]$bayes)
    dev.off()
}

#generate plot of expression data over time
pl_lines <- function(x){
    name <- paste(c(x[[1]]$bg, x[[1]]$norm, "-lines.pdf"), collapse="")
    pdf(name)
    plotlines(x[[1]]$agi$E)
    dev.off()
}

#generate heatmap of final results
pl_heatmap <- function(x, ntop){
    name <- paste(c(x[[1]]$bg, x[[1]]$norm, "-heatmap.pdf"), collapse="")
    pdf(name)
    genes <- as.numeric(rownames(topTable(x[[1]]$bayes, n = ntop)))
    heatmap(x[[1]]$lm$coefficients[genes,],
              col   = redgreen(75),
              key   = TRUE)
    dev.off()
}

#averages replicates of probes
m_repavg <- function(x){
    return(list(agi     = x[[1]]$agi,
                avg     = avereps(x[[1]]$agi, ID=x[[1]]$agi$genes$ProbeName),
                bg      = x[[1]]$bg,
                norm    = x[[1]]$norm))
}

#generate contrasts
m_fit <- function(x, design){
    contmat   <- makeContrasts(s2-s1,
                               levels = design)
    fit1      <- lmFit(x[[1]]$agi$E, design)
    fit2      <- eBayes(contrasts.fit(fit1, contmat))
    diff      <- topTable(fit2, coef = 1, adjust = "BH")
    res       <- decideTests(fit2)
    comp      <- vennDiagram(res)
    return(list(agi   = x[[1]]$agi,
                bg    = x[[1]]$bg,
                norm  = x[[1]]$norm,
                ave   = x[[1]]$avg,
                cont  = contmat,
                lm    = fit1,
                bayes = fit2,
                diff  = diff,
                res   = res,
                comp  = comp))
}

###############################################################################
# Function to parse array data into limma objects
# Args:
#     tfile:   target file (see limma documentation)
#     tsep:    field separator; limma defaults to TSV, set manually for CSV and so on
#     galfile: GAL file for printer/layout
# Returns data object with targets, genes and printer layout
###############################################################################
parse <- function(tfile, tsep, galfile){
    targets <- readTargets(tfile, sep=tsep)
    gal     <- readGAL(galfile)
    layout  <- getLayout(gal)
    agi     <- read.maimages(targets, columns    = COL__,
                                      annotation = ANO__,
                                      green.only = TRUE)
    bg      <- "none"
    norm    <- "none"
    desg    <- model.matrix(~0+factor(c(1,1,2,2)))
    colnames(desg) <- c("s1","s2")
    d <- list(agi     = agi,
              gal     = gal,
              layout  = layout,
              targets = targets,
              bg      = bg,
              desg    = desg,
              norm    = norm)

    processed        <- matrix(NA, nrow = N_BGM__, ncol = N_NRM__)
    processed        <- apply(processed, 1:2, function(x){return(d)})
    return(processed)
}

#parse command args
args <- commandArgs(trailingOnly = TRUE)

#load files into matrix
a <- parse(args[1], args[2], args[3])

#keep a copy of the raw dataset
raw <- a[1,1]
print(raw[[1]]$desg)

#perform background corrections
a[,A_SUB__] <- lapply(a[,A_SUB__], bg_subtract)
a[,A_NXP__] <- lapply(a[,A_NXP__], bg_normexp)
a[,A_MIN__] <- lapply(a[,A_MIN__], bg_minimum)

#perform normalization
a[A_VSN__,] <- lapply(a[A_VSN__,], nb_vsn)
a[A_LWS__,] <- lapply(a[A_LWS__,], nb_lowess)
a[A_QNT__,] <- lapply(a[A_QNT__,], nb_quantile)

#fit model and test for differential expression
a <- apply(a, 1:2, m_repavg)
a <- apply(a, 1:2, m_fit, design = raw[[1]]$desg)

#generate plots
apply(a, 1:2, pl_ma)
apply(a, 1:2, pl_meansd)
apply(a, 1:2, pl_volcano)
apply(a, 1:2, pl_lines)
apply(a, 1:2, pl_heatmap, ntop = as.numeric(args[4])) #command-line args are character, so convert ntop to numeric





------------------------------

Message: 2
Date: Thu, 28 Mar 2013 10:07:32 -0400
From: "James W. MacDonald" <jmacdon at uw.edu>
To: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish at nih.gov>
Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
        limma and vsn.
Message-ID: <51544EA4.5080805 at uw.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Joseph,


On 3/28/2013 9:27 AM, Cornish, Joseph (NIH/NIAID) [F] wrote:
> I am attempting to write a script that will allow me to do bulk benchmarking of various methods for normalization and background correction. I am concerned that I am not processing the data correctly and this may be leading to the abnormal heatmaps that I am seeing. For a test case, I am using an experiment where I have two different conditions with two different states with two replicates per state.
>
> The heatmaps concern me because of the uniformity between them, the values are all completely green or completely red, there are differences between genes within a state according to the heatmap.
>
> In the examples I have seen people have suggested that I use the coefficients from the eBayes fitting of the data, however in my example the eBayes LMarray object does not have two columns in the results, only on (not sure if this is a sign of errors on my part). I have instead used the lmFit coefficients since there are column for each matrix.
>
> To further confuse someone who is new to R, the simplified example and full code generate different heatmaps for the same data and same methods (normexp for background, cyclic loess for norm). The differences being in the simplified version the gene IDs show up on the right side but in the full version, only row number appears. The color for some genes changes between states as well.
>
> My questions are:
> Why does the bayes model only have one column?

Because you are computing one contrast (which is all you can do here). One
contrast means one test, which means one column of results. In other
words, the coefficient you are estimating in the contrasts.fit() step is
the difference in means between the two groups, which is one value per gene.
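
For instance, a minimal self-contained sketch with toy data (not your
arrays) shows the shape of the two fits:

library(limma)
y <- matrix(rnorm(400), nrow = 100, ncol = 4)  # 100 genes x 4 arrays (toy data)
design <- model.matrix(~0 + factor(c(1, 1, 2, 2)))
colnames(design) <- c("s1", "s2")
fit  <- lmFit(y, design)
ncol(fit$coefficients)    # 2: one column per group mean
cm   <- makeContrasts(s2 - s1, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cm))
ncol(fit2$coefficients)   # 1: the single s2 - s1 contrast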

> Am I approaching the processing correctly?

I tried to read through that business below, but got some weird
affliction due to an apparent overdose of underscores and had to stop.

> Why are there differences between the resulting heatmaps of each version?

I don't know, as I worried my brain might explode if I read any more.
But you should note that it is fairly unusual to plot coefficients in a
heatmap, as what you are then presenting are log ratios, and most people
can't understand that.

Instead, you might consider getting the list of genes and then creating
a heatmap using the normalized expression values for each sample. You
would then want to scale the values in some way. We often sweep the
mean of the controls out of the matrix (by row), so the colors can then
be interpreted as differences between the treated samples and the controls.
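
As a rough sketch, assuming the avg and fit_be objects from your
simplified example, that the row names of avg$E match those reported by
topTable(), and that arrays 1 and 2 are the s1 (control) group:

library(gplots)
top   <- topTable(fit_be, coef = 1, number = 100)  # genes of interest
mat   <- avg$E[rownames(top), ]                    # normalized expression values
s1mn  <- rowMeans(mat[, 1:2])                      # mean of the s1 (control) arrays
swept <- sweep(mat, 1, s1mn, "-")                  # subtract the row-wise s1 mean
pdf("example-heatmap-swept.pdf")
heatmap.2(swept, col = redgreen(75), trace = "none")
dev.off()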

Best,

Jim


>
> I know that the idea is to keep the amount of posted code to a minimum but I don't see any other way here.
>
> Here is the simplified example I have made:
> library(limma)
> library(vsn)
> library(gplots)
>
> set.seed(0xabcd) #from vsn manual
>
> #constants for reading agi outs from our imager
> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background (tr.mean) {A}")
> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
>
> #parse data
> targets<- readTargets("test.csv", sep = ",")
> gal<- readGAL("026806_D_20110524.gal")
> layout<- getLayout(gal)
> agi<- read.maimages(targets, columns    = COL__,
>                                    annotation = ANO__,
>                                    green.only = TRUE)
> design<- model.matrix(~0+factor(c(1,1,2,2)))
> colnames(design)<- c("s1", "s2")
>
> #process
> bg<- backgroundCorrect(agi, method = "normexp")
> norm<- normalizeBetweenArrays(bg, method = "cyclicloess")
> avg<- avereps(norm, ID=norm$genes$ProbeName)
>
> #fit
> contmat<- makeContrasts(s2-s1, levels = design)
> fit_lm<- lmFit(avg$E, design)
> fit_be<- eBayes(contrasts.fit(fit_lm, contmat))
>
> #results
> difexp<- topTable(fit_be, coef = 1, adjust = "BH")
> res<- decideTests(fit_be)
> comp<- vennDiagram(res)
>
> #plot
> pdf("example-heatmap.pdf")
> genes<- as.numeric(rownames(topTable(fit_be, n = 100)))
> heatmap(fit_lm$coefficients[genes,], col=redgreen(75), key = TRUE)
> dev.off()
>
>
> The full version:
> library(vsn)
> library(limma)
> library(gplots)
> set.seed(0xabcd) #from the vsn manual
>
> #constants
> A_SUB__<- 1
> A_NXP__<- 2
> A_MIN__<- 3
> A_VSN__<- 1
> A_QNT__<- 2
> A_LWS__<- 3
> M_SUB__<- "subtract"
> M_NXP__<- "normexp"
> M_MIN__<- "minimum"
> M_VSN__<- "vsn"
> M_LWS__<- "cyclicloess"
> M_QNT__<- "quantile"
> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background (tr.mean) {A}")
> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
> N_BGM__<- 3 #number of background correction methods
> N_NRM__<- 3 #number of normalization methods
> A_BGM__<- c(A_SUB__, A_NXP__, A_MIN__)
> A_NRM__<- c(A_VSN__, A_QNT__, A_LWS__)
>
>
> #utility methods
> #applies subtraction background correction with limma
> bg_subtract<- function(x){
>      return(list(norm = x$norm,
>                  bg   = M_SUB__,
>                  agi  = backgroundCorrect(x$agi, method = M_SUB__)))
> }
>
> #applies normexp backgroud correction with limma
> bg_normexp<- function(x){
>      return(list(norm = x$norm,
>                  bg   = M_NXP__,
>                  agi  = backgroundCorrect(x$agi, method = M_NXP__)))
> }
>
> #applies minimum background correction with limma
> bg_minimum<- function(x){
>      return(list(norm = x$norm,
>                  bg   = M_MIN__,
>                  agi  = backgroundCorrect(x$agi, method = M_MIN__)))
> }
>
> #applies VSN normalization with VSN package
> nb_vsn<- function(x){
>      return(list(norm = M_VSN__,
>                  bg   = x$bg,
>                  agi  = normalizeVSN(x$agi)))
> }
>
> #applies VSN normalization with VSN package
> nb_vsn<- function(x){
>      return(list(norm = M_VSN__,
>                  bg   = x$bg,
>                  agi  = normalizeVSN(x$agi)))
> }
>
> #applies cyclic lowess normalization with limma
> nb_lowess<- function(x){
>     return(list(norm = M_LWS__,
>                   bg = x$bg,
>                  agi = normalizeBetweenArrays(x$agi, method = M_LWS__)))
> }
>
> #applies quantile normalization with limma
> nb_quantile<- function(x){
>      return(list(norm = M_QNT__,
>                    bg = x$bg,
>                   agi = normalizeBetweenArrays(x$agi, method = M_QNT__)))
> }
>
> #generates a plot of MA values with limma
> pl_ma<- function(x){
>      name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-ma.pdf"), collapse="")
>      pdf(name)
>      plotMA(x[[1]]$agi$E)
>      dev.off()
> }
>
> #generates a plot of mean versus stdev with VS
> pl_meansd<- function(x){
>      name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-meansd.pdf"), collapse="")
>      pdf(name)
>      meanSdPlot(x[[1]]$agi$E)
>      dev.off()
> }
>
> #generate volcanoplot of log-fold versus log-odds
> pl_volcano<- function(x){
>      name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-volcano.pdf"), collapse="")
>      pdf(name)
>      volcanoplot(x[[1]]$bayes)
>      dev.off()
> }
>
> #generate plot of expression data over time
> pl_lines<- function(x){
>      name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-lines.pdf"), collapse="")
>      pdf(name)
>      plotlines(x[[1]]$agi$E)
>      dev.off()
> }
>
> #generate heatmat of final results
> pl_heatmap<- function(x, ntop){
>      name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-heatmap.pdf"), collapse="")
>      pdf(name)
>      genes<- as.numeric(rownames(topTable(x[[1]]$bayes, n = ntop)))
>      heatmap(x[[1]]$lm$coefficients[genes,],
>                col   = redgreen(75),
>                key   = TRUE)
>      dev.off()
> }
>
> #averages replicates of probes
> m_repavg<- function(x){
>      return(list(agi     = x[[1]]$agi,
>                  avg     = avereps(x[[1]]$agi, ID=x[[1]]$agi$genes$ProbeName),
>                  bg      = x[[1]]$bg,
>                  norm    = x[[1]]$norm))
> }
>
> #generate contrasts
> m_fit<- function(x, design){
>      contmat<- makeContrasts(s2-s1,
>                                 levels = design)
>      fit1<- lmFit(x[[1]]$agi$E, design)
>      fit2<- eBayes(contrasts.fit(fit1, contmat))
>      diff<- topTable(fit2, coef = 1, adjust = "BH")
>      res<- decideTests(fit2)
>      comp<- vennDiagram(res)
>      return(list(agi   = x[[1]]$agi,
>                  bg    = x[[1]]$bg,
>                  norm  = x[[1]]$norm,
>                  ave   = x[[1]]$ave,
>                  cont  = contmat,
>                  lm    = fit1,
>                  bayes = fit2,
>                  diff  = diff,
>                  res   = res,
>                  comp  = comp))
> }
>
> ###############################################################################
> # Function to parse array data into limma objects
> # Args:
> #     tfile:   target file (see limma documentation)
> #     tsep:    limma defaults TSV, have to set manually for CSV and so on
> #     galfile: GAL file for printer/layout
> # Returns data object with targets, genes and printer layout
> ###############################################################################
> parse<- function(tfile, tsep, galfile){
>      targets<- readTargets(tfile, sep=tsep)
>      gal<- readGAL(galfile)
>      layout<- getLayout(gal)
>      agi<- read.maimages(targets, columns    = COL__,
>                                        annotation = ANO__,
>                                        green.only = TRUE)
>      bg<- "none"
>      norm<- "none"
>      desg<- model.matrix(~0+factor(c(1,1,2,2)))
>      colnames(desg)<- c("s1","s2")
>      d<- list(agi     = agi,
>                gal     = gal,
>                layout  = layout,
>                targets = targets,
>                bg      = bg,
>                desg    = desg,
>                norm    = norm)
>
>      processed<- matrix(NA, nrow = N_BGM__, ncol = N_NRM__)
>      processed<- apply(processed, 1:2, function(x){return(d)})
>      return(processed)
> }
>
> #parse command args
> args<- commandArgs(trailingOnly = TRUE)
>
> #load files into matrix
> a<- parse(args[1], args[2], args[3])
>
> #keepy a copy of a raw dataset
> raw<- a[1,1]
> print(raw[[1]]$desg)
>
> #perform backgroung corrections
> a[,A_SUB__]<- lapply(a[,A_SUB__], bg_subtract)
> a[,A_NXP__]<- lapply(a[,A_NXP__], bg_normexp)
> a[,A_MIN__]<- lapply(a[,A_MIN__], bg_minimum)
>
> #perform normalization
> a[A_VSN__,]<- lapply(a[A_VSN__,], nb_vsn)
> a[A_LWS__,]<- lapply(a[A_LWS__,], nb_lowess)
> a[A_QNT__,]<- lapply(a[A_QNT__,], nb_quantile)
>
> #fit model and differentiate expression
> a<- apply(a, 1:2, m_repavg)
> a<- apply(a, 1:2, m_fit, design = raw[[1]]$desg)
>
> #generate plots
> apply(a, 1:2, pl_ma)
> apply(a, 1:2, pl_meansd)
> apply(a, 1:2, pl_volcano)
> apply(a, 1:2, pl_lines)
> apply(a, 1:2, pl_heatmap, args[4])
>
>
>       [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



------------------------------

Message: 3
Date: Thu, 28 Mar 2013 14:22:32 +0000
From: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish at nih.gov>
To: "James W. MacDonald" <jmacdon at uw.edu>
Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
        limma and vsn.
Message-ID: <E68EA43FBF80174C85A92BC24EFAA3B84B7BDF at MLBXV01.nih.gov>
Content-Type: text/plain; charset="us-ascii"

Hello James,

Are you referring to function names or constants? The constants are an old habit I picked up long ago in my CS classes; there are a ton, but it makes them stand out for me more readily in vim. I've come to avoid camel case since switching to Linux, as it makes searching for things a nightmare; it also helps if I group functions with a prefix (e.g. plot functions start with "pl_"). If you want, I can find-and-replace all of the constants with their values...

The differences came from using the unaveraged replicates in the larger script versus the averaged replicates in the example.

I'm not sure what you mean by "mean of the controls"; would this simply be filtering out the control spots?

I'll generate the heatmaps using the normalized expression values. I suppose in that case it would be worthwhile to filter ahead of time by p-value and fold change (the logFC column in topTable).
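
For what it's worth, topTable() can do that filtering directly; for
example, a sketch using the fit_be object from the simplified example:

# keep genes with BH-adjusted p < 0.05 and |logFC| > 1 (two-fold)
sig <- topTable(fit_be, coef = 1, number = Inf,
                adjust.method = "BH", p.value = 0.05, lfc = 1)
dim(sig)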


_________thanks_____________
_______Joe______

-----Original Message-----
From: James W. MacDonald [mailto:jmacdon at uw.edu]
Sent: Thursday, March 28, 2013 10:08 AM
To: Cornish, Joseph (NIH/NIAID) [F]
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using limma and vsn.

Hi Joseph,


On 3/28/2013 9:27 AM, Cornish, Joseph (NIH/NIAID) [F] wrote:
> I am attempting to write a script that will allow me to do bulk benchmarking of various methods for normalization and background correction. I am concerned that I am not processing the data correctly and this may be leading to the abnormal heatmaps that I am seeing. For a test case, I am using an experiment where I have two different conditions with two different states with two replicates per state.
>
> The heatmaps concern me because of the uniformity between them, the values are all completely green or completely red, there are differences between genes within a state according to the heatmap.
>
> In the examples I have seen people have suggested that I use the coefficients from the eBayes fitting of the data, however in my example the eBayes LMarray object does not have two columns in the results, only on (not sure if this is a sign of errors on my part). I have instead used the lmFit coefficients since there are column for each matrix.
>
> To further confuse someone who is new to R, the simplified example and full code generate different heatmaps for the same data and same methods (normexp for background, cyclic loess for norm). The differences being in the simplified version the gene IDs show up on the right side but in the full version, only row number appears. The color for some genes changes between states as well.
>
> My questions are:
> Why does the bayes model only have one column?

Because you are computing one contrast (which is all you can do). One contrast means one test which means one column of results. In other words, the coefficient you are estimating in the contrasts.fit() step is the difference in means between the two groups, which is one value per gene.

> Am I approaching the processing correctly?

I tried to read through that business below, but got some weird affliction due to an apparent overdose of underscores and had to stop.

> Why are there differences between the resulting heatmaps of each version?

I don't know, as I worried my brain might explode if I read any more.
But you should note that it is fairly unusual to plot coefficients in a heatmap, as what you are then presenting are log ratios, and most people can't understand that.

Instead you might consider getting the list of genes, and then creating a heatmap using the normalized expression values for each sample instead. You then would want to scale in some way. We often sweep the mean of the controls out of the matrix (by row), so the colors can then be interpreted as differences between the treated samples and controls.

Best,

Jim


>
> I know that the idea is to keep the amount of posted code to a minimum but I don't see any other way here.
>
> Here is the simplified example I have made:
> library(limma)
> library(vsn)
> library(gplots)
>
> set.seed(0xabcd) #from vsn manual
>
> #constants for reading agi outs from our imager
> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background (tr.mean)
> {A}")
> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
>
> #parse data
> targets<- readTargets("test.csv", sep = ",")
> gal<- readGAL("026806_D_20110524.gal")
> layout<- getLayout(gal)
> agi<- read.maimages(targets, columns    = COL__,
>                                    annotation = ANO__,
>                                    green.only = TRUE)
> design<- model.matrix(~0+factor(c(1,1,2,2)))
> colnames(design)<- c("s1", "s2")
>
> #process
> bg<- backgroundCorrect(agi, method = "normexp")
> norm<- normalizeBetweenArrays(bg, method = "cyclicloess")
> avg<- avereps(norm, ID=norm$genes$ProbeName)
>
> #fit
> contmat<- makeContrasts(s2-s1, levels = design)
> fit_lm<- lmFit(avg$E, design)
> fit_be<- eBayes(contrasts.fit(fit_lm, contmat))
>
> #results
> difexp<- topTable(fit_be, coef = 1, adjust = "BH")
> res<- decideTests(fit_be)
> comp<- vennDiagram(res)
>
> #plot
> pdf("example-heatmap.pdf")
> genes<- as.numeric(rownames(topTable(fit_be, n = 100)))
> heatmap(fit_lm$coefficients[genes,], col=redgreen(75), key = TRUE)
> dev.off()
>
>
> The full version:
> library(vsn)
> library(limma)
> library(gplots)
> set.seed(0xabcd) #from the vsn manual
>
> #constants
> A_SUB__<- 1
> A_NXP__<- 2
> A_MIN__<- 3
> A_VSN__<- 1
> A_QNT__<- 2
> A_LWS__<- 3
> M_SUB__<- "subtract"
> M_NXP__<- "normexp"
> M_MIN__<- "minimum"
> M_VSN__<- "vsn"
> M_LWS__<- "cyclicloess"
> M_QNT__<- "quantile"
> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background (tr.mean)
> {A}")
> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
> N_BGM__<- 3 #number of background correction methods
> N_NRM__<- 3 #number of normalization methods
> A_BGM__<- c(A_SUB__, A_NXP__, A_MIN__)
> A_NRM__<- c(A_VSN__, A_QNT__, A_LWS__)
>
>
> #utility methods
> #applies subtraction background correction with limma
> bg_subtract<- function(x){
>      return(list(norm = x$norm,
>                  bg   = M_SUB__,
>                  agi  = backgroundCorrect(x$agi, method = M_SUB__))) }
>
> #applies normexp backgroud correction with limma
> bg_normexp<- function(x){
>      return(list(norm = x$norm,
>                  bg   = M_NXP__,
>                  agi  = backgroundCorrect(x$agi, method = M_NXP__))) }
>
> #applies minimum background correction with limma
> bg_minimum<- function(x){
>      return(list(norm = x$norm,
>                  bg   = M_MIN__,
>                  agi  = backgroundCorrect(x$agi, method = M_MIN__))) }
>
> #applies VSN normalization with VSN package
> nb_vsn<- function(x){
>      return(list(norm = M_VSN__,
>                  bg   = x$bg,
>                  agi  = normalizeVSN(x$agi))) }
>
> #applies VSN normalization with VSN package
> nb_vsn<- function(x){
>      return(list(norm = M_VSN__,
>                  bg   = x$bg,
>                  agi  = normalizeVSN(x$agi))) }
>
> #applies cyclic lowess normalization with limma
> nb_lowess<- function(x){
>     return(list(norm = M_LWS__,
>                   bg = x$bg,
>                  agi = normalizeBetweenArrays(x$agi, method =
> M_LWS__))) }
>
> #applies quantile normalization with limma
> nb_quantile<- function(x){
>      return(list(norm = M_QNT__,
>                    bg = x$bg,
>                   agi = normalizeBetweenArrays(x$agi, method =
> M_QNT__))) }
>
> #generates a plot of MA values with limma
> pl_ma<- function(x){
>      name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-ma.pdf"), collapse="")
>      pdf(name)
>      plotMA(x[[1]]$agi$E)
>      dev.off()
> }
>
> #generates a plot of mean versus stdev with VS
> pl_meansd<- function(x){
>      name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-meansd.pdf"), collapse="")
>      pdf(name)
>      meanSdPlot(x[[1]]$agi$E)
>      dev.off()
> }
>
> #generate volcanoplot of log-fold versus log-odds
> pl_volcano<- function(x){
>      name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-volcano.pdf"), collapse="")
>      pdf(name)
>      volcanoplot(x[[1]]$bayes)
>      dev.off()
> }
>
> #generate plot of expression data over time
> pl_lines<- function(x){
>      name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-lines.pdf"), collapse="")
>      pdf(name)
>      plotlines(x[[1]]$agi$E)
>      dev.off()
> }
>
> #generate heatmat of final results
> pl_heatmap<- function(x, ntop){
>      name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-heatmap.pdf"), collapse="")
>      pdf(name)
>      genes<- as.numeric(rownames(topTable(x[[1]]$bayes, n = ntop)))
>      heatmap(x[[1]]$lm$coefficients[genes,],
>                col   = redgreen(75),
>                key   = TRUE)
>      dev.off()
> }
>
> #averages replicates of probes
> m_repavg<- function(x){
>      return(list(agi     = x[[1]]$agi,
>                  avg     = avereps(x[[1]]$agi, ID=x[[1]]$agi$genes$ProbeName),
>                  bg      = x[[1]]$bg,
>                  norm    = x[[1]]$norm))
> }
>
> #generate contrasts
> m_fit<- function(x, design){
>      contmat<- makeContrasts(s2-s1,
>                                 levels = design)
>      fit1<- lmFit(x[[1]]$agi$E, design)
>      fit2<- eBayes(contrasts.fit(fit1, contmat))
>      diff<- topTable(fit2, coef = 1, adjust = "BH")
>      res<- decideTests(fit2)
>      comp<- vennDiagram(res)
>      return(list(agi   = x[[1]]$agi,
>                  bg    = x[[1]]$bg,
>                  norm  = x[[1]]$norm,
>                  ave   = x[[1]]$ave,
>                  cont  = contmat,
>                  lm    = fit1,
>                  bayes = fit2,
>                  diff  = diff,
>                  res   = res,
>                  comp  = comp))
> }
>
> ######################################################################
> ######### # Function to parse array data into limma objects # Args:
> #     tfile:   target file (see limma documentation)
> #     tsep:    limma defaults TSV, have to set manually for CSV and so on
> #     galfile: GAL file for printer/layout
> # Returns data object with targets, genes and printer layout
> ######################################################################
> #########
> parse<- function(tfile, tsep, galfile){
>      targets<- readTargets(tfile, sep=tsep)
>      gal<- readGAL(galfile)
>      layout<- getLayout(gal)
>      agi<- read.maimages(targets, columns    = COL__,
>                                        annotation = ANO__,
>                                        green.only = TRUE)
>      bg<- "none"
>      norm<- "none"
>      desg<- model.matrix(~0+factor(c(1,1,2,2)))
>      colnames(desg)<- c("s1","s2")
>      d<- list(agi     = agi,
>                gal     = gal,
>                layout  = layout,
>                targets = targets,
>                bg      = bg,
>                desg    = desg,
>                norm    = norm)
>
>      processed<- matrix(NA, nrow = N_BGM__, ncol = N_NRM__)
>      processed<- apply(processed, 1:2, function(x){return(d)})
>      return(processed)
> }
>
> #parse command args
> args<- commandArgs(trailingOnly = TRUE)
>
> #load files into matrix
> a<- parse(args[1], args[2], args[3])
>
> #keepy a copy of a raw dataset
> raw<- a[1,1]
> print(raw[[1]]$desg)
>
> #perform backgroung corrections
> a[,A_SUB__]<- lapply(a[,A_SUB__], bg_subtract)
> a[,A_NXP__]<- lapply(a[,A_NXP__], bg_normexp)
> a[,A_MIN__]<- lapply(a[,A_MIN__], bg_minimum)
>
> #perform normalization
> a[A_VSN__,]<- lapply(a[A_VSN__,], nb_vsn)
> a[A_LWS__,]<- lapply(a[A_LWS__,], nb_lowess)
> a[A_QNT__,]<- lapply(a[A_QNT__,], nb_quantile)
>
> #fit model and differentiate expression
> a<- apply(a, 1:2, m_repavg)
> a<- apply(a, 1:2, m_fit, design = raw[[1]]$desg)
>
> #generate plots
> apply(a, 1:2, pl_ma)
> apply(a, 1:2, pl_meansd)
> apply(a, 1:2, pl_volcano)
> apply(a, 1:2, pl_lines)
> apply(a, 1:2, pl_heatmap, args[4])
>
>
>       [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



------------------------------

Message: 4
Date: Thu, 28 Mar 2013 10:33:51 -0400
From: Sean Davis <sdavis2 at mail.nih.gov>
To: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish at nih.gov>
Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>,  "James
        W. MacDonald" <jmacdon at uw.edu>
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
        limma and vsn.
Message-ID:
        <CANeAVBkYV+mb61JVne0aKLiCXoVxYveLUsBHGowwR61_q9s-dw at mail.gmail.com>
Content-Type: text/plain

On Thu, Mar 28, 2013 at 10:22 AM, Cornish, Joseph (NIH/NIAID) [F] <
joseph.cornish at nih.gov> wrote:

> Hello James,
>
> Are you referring to function names or constants? The constants are an old
> habit I picked up long ago in my CS classes, there are a ton but it makes
> them stand out for me in vim more readily. I've come to avoid camel case
> since switching to linux, it makes searching for things a nightmare, it can
> also help if I group functions with a prefix (eg plot functions start with
> "pl_"). If you want I can find+replace all of the constants with their
> values...
>
>
Hi, Joe.

I think I can help translate part of what James said. If you could
boil your example down to a few lines of reproducible code that show the
problem, that would be simpler to interpret and easier to comment on
directly. Sometimes in the process of producing such a simplified example,
I have found my own misunderstandings. It also sometimes helps to have
someone local lend a second pair of eyes.


> The differences came from using the unaveraged replicates in the larger
> script versus the average in the example.
>
> I'm not sure what you mean by "mean of the controls", would this simply be
> filtering out the control spots?
>
>
No.  I think he means for you to subtract the mean of the control samples
(or one group of samples) for each gene from all the normalized expression
values.


> I'll generate the heatmaps using the normalized expression values. I
> suppose in that case it would be worthwhile to filter ahead of time by
> p-value and fold change (the coef in topTable).
>
>
Yes.

Sean


>
> _________thanks_____________
> _______Joe______
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at uw.edu]
> Sent: Thursday, March 28, 2013 10:08 AM
> To: Cornish, Joseph (NIH/NIAID) [F]
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using limma
> and vsn.
>
> Hi Joseph,
>
>
> On 3/28/2013 9:27 AM, Cornish, Joseph (NIH/NIAID) [F] wrote:
> > I am attempting to write a script that will allow me to do bulk
> benchmarking of various methods for normalization and background
> correction. I am concerned that I am not processing the data correctly and
> this may be leading to the abnormal heatmaps that I am seeing. For a test
> case, I am using an experiment where I have two different conditions with
> two different states with two replicates per state.
> >
> > The heatmaps concern me because of the uniformity between them, the
> values are all completely green or completely red, there are differences
> between genes within a state according to the heatmap.
> >
> > In the examples I have seen people have suggested that I use the
> coefficients from the eBayes fitting of the data, however in my example the
> eBayes LMarray object does not have two columns in the results, only on
> (not sure if this is a sign of errors on my part). I have instead used the
> lmFit coefficients since there are column for each matrix.
> >
> > To further confuse someone who is new to R, the simplified example and
> full code generate different heatmaps for the same data and same methods
> (normexp for background, cyclic loess for norm). The differences being in
> the simplified version the gene IDs show up on the right side but in the
> full version, only row number appears. The color for some genes changes
> between states as well.
> >
> > My questions are:
> > Why does the bayes model only have one column?
>
> Because you are computing one contrast (which is all you can do). One
> contrast means one test which means one column of results. In other words,
> the coefficient you are estimating in the contrasts.fit() step is the
> difference in means between the two groups, which is one value per gene.
>
> > Am I approaching the processing correctly?
>
> I tried to read through that business below, but got some weird affliction
> due to an apparent overdose of underscores and had to stop.
>
> > Why are there differences between the resulting heatmaps of each version?
>
> I don't know, as I worried my brain might explode if I read any more.
> But you should note that it is fairly unusual to plot coefficients in a
> heatmap, as what you are then presenting are log ratios, and most people
> can't understand that.
>
> Instead you might consider getting the list of genes, and then creating a
> heatmap using the normalized expression values for each sample instead. You
> then would want to scale in some way. We often sweep the mean of the
> controls out of the matrix (by row), so the colors can then be interpreted
> as differences between the treated samples and controls.
>
> Best,
>
> Jim
>
>
> >
> > I know that the idea is to keep the amount of posted code to a minimum
> but I don't see any other way here.
> >
> > Here is the simplified example I have made:
> > library(limma)
> > library(vsn)
> > library(gplots)
> >
> > set.seed(0xabcd) #from vsn manual
> >
> > #constants for reading agi outs from our imager
> > COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background (tr.mean)
> > {A}")
> > ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
> >
> > #parse data
> > targets<- readTargets("test.csv", sep = ",")
> > gal<- readGAL("026806_D_20110524.gal")
> > layout<- getLayout(gal)
> > agi<- read.maimages(targets, columns    = COL__,
> >                                    annotation = ANO__,
> >                                    green.only = TRUE)
> > design<- model.matrix(~0+factor(c(1,1,2,2)))
> > colnames(design)<- c("s1", "s2")
> >
> > #process
> > bg<- backgroundCorrect(agi, method = "normexp")
> > norm<- normalizeBetweenArrays(bg, method = "cyclicloess")
> > avg<- avereps(norm, ID=norm$genes$ProbeName)
> >
> > #fit
> > contmat<- makeContrasts(s2-s1, levels = design)
> > fit_lm<- lmFit(avg$E, design)
> > fit_be<- eBayes(contrasts.fit(fit_lm, contmat))
> >
> > #results
> > difexp<- topTable(fit_be, coef = 1, adjust = "BH")
> > res<- decideTests(fit_be)
> > comp<- vennDiagram(res)
> >
> > #plot
> > pdf("example-heatmap.pdf")
> > genes<- as.numeric(rownames(topTable(fit_be, n = 100)))
> > heatmap(fit_lm$coefficients[genes,], col=redgreen(75), key = TRUE)
> > dev.off()
> >
> >
> > The full version:
> > library(vsn)
> > library(limma)
> > library(gplots)
> > set.seed(0xabcd) #from the vsn manual
> >
> > #constants
> > A_SUB__<- 1
> > A_NXP__<- 2
> > A_MIN__<- 3
> > A_VSN__<- 1
> > A_QNT__<- 2
> > A_LWS__<- 3
> > M_SUB__<- "subtract"
> > M_NXP__<- "normexp"
> > M_MIN__<- "minimum"
> > M_VSN__<- "vsn"
> > M_LWS__<- "cyclicloess"
> > M_QNT__<- "quantile"
> > COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background (tr.mean)
> > {A}")
> > ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
> > N_BGM__<- 3 #number of background correction methods
> > N_NRM__<- 3 #number of normalization methods
> > A_BGM__<- c(A_SUB__, A_NXP__, A_MIN__)
> > A_NRM__<- c(A_VSN__, A_QNT__, A_LWS__)
> >
> >
> > #utility methods
> > #applies subtraction background correction with limma
> > bg_subtract<- function(x){
> >      return(list(norm = x$norm,
> >                  bg   = M_SUB__,
> >                  agi  = backgroundCorrect(x$agi, method = M_SUB__))) }
> >
> > #applies normexp backgroud correction with limma
> > bg_normexp<- function(x){
> >      return(list(norm = x$norm,
> >                  bg   = M_NXP__,
> >                  agi  = backgroundCorrect(x$agi, method = M_NXP__))) }
> >
> > #applies minimum background correction with limma
> > bg_minimum<- function(x){
> >      return(list(norm = x$norm,
> >                  bg   = M_MIN__,
> >                  agi  = backgroundCorrect(x$agi, method = M_MIN__))) }
> >
> > #applies VSN normalization with VSN package
> > nb_vsn<- function(x){
> >      return(list(norm = M_VSN__,
> >                  bg   = x$bg,
> >                  agi  = normalizeVSN(x$agi))) }
> >
> > #applies VSN normalization with VSN package
> > nb_vsn<- function(x){
> >      return(list(norm = M_VSN__,
> >                  bg   = x$bg,
> >                  agi  = normalizeVSN(x$agi))) }
> >
> > #applies cyclic lowess normalization with limma
> > nb_lowess<- function(x){
> >     return(list(norm = M_LWS__,
> >                   bg = x$bg,
> >                  agi = normalizeBetweenArrays(x$agi, method =
> > M_LWS__))) }
> >
> > #applies quantile normalization with limma
> > nb_quantile<- function(x){
> >      return(list(norm = M_QNT__,
> >                    bg = x$bg,
> >                   agi = normalizeBetweenArrays(x$agi, method =
> > M_QNT__))) }
> >
> > #generates a plot of MA values with limma
> > pl_ma<- function(x){
> >      name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-ma.pdf"), collapse="")
> >      pdf(name)
> >      plotMA(x[[1]]$agi$E)
> >      dev.off()
> > }
> >
> > #generates a plot of mean versus stdev with VS
> > pl_meansd<- function(x){
> >      name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-meansd.pdf"), collapse="")
> >      pdf(name)
> >      meanSdPlot(x[[1]]$agi$E)
> >      dev.off()
> > }
> >
> > #generate volcanoplot of log-fold versus log-odds
> > pl_volcano<- function(x){
> >      name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-volcano.pdf"), collapse="")
> >      pdf(name)
> >      volcanoplot(x[[1]]$bayes)
> >      dev.off()
> > }
> >
> > #generate plot of expression data over time
> > pl_lines<- function(x){
> >      name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-lines.pdf"), collapse="")
> >      pdf(name)
> >      plotlines(x[[1]]$agi$E)
> >      dev.off()
> > }
> >
> > #generate heatmat of final results
> > pl_heatmap<- function(x, ntop){
> >      name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-heatmap.pdf"), collapse="")
> >      pdf(name)
> >      genes<- as.numeric(rownames(topTable(x[[1]]$bayes, n = ntop)))
> >      heatmap(x[[1]]$lm$coefficients[genes,],
> >                col   = redgreen(75),
> >                key   = TRUE)
> >      dev.off()
> > }
> >
> > #averages replicates of probes
> > m_repavg<- function(x){
> >      return(list(agi     = x[[1]]$agi,
> >                  avg     = avereps(x[[1]]$agi,
> ID=x[[1]]$agi$genes$ProbeName),
> >                  bg      = x[[1]]$bg,
> >                  norm    = x[[1]]$norm))
> > }
> >
> > #generate contrasts
> > m_fit<- function(x, design){
> >      contmat<- makeContrasts(s2-s1,
> >                                 levels = design)
> >      fit1<- lmFit(x[[1]]$agi$E, design)
> >      fit2<- eBayes(contrasts.fit(fit1, contmat))
> >      diff<- topTable(fit2, coef = 1, adjust = "BH")
> >      res<- decideTests(fit2)
> >      comp<- vennDiagram(res)
> >      return(list(agi   = x[[1]]$agi,
> >                  bg    = x[[1]]$bg,
> >                  norm  = x[[1]]$norm,
> >                  ave   = x[[1]]$ave,
> >                  cont  = contmat,
> >                  lm    = fit1,
> >                  bayes = fit2,
> >                  diff  = diff,
> >                  res   = res,
> >                  comp  = comp))
> > }
> >
> > ######################################################################
> > ######### # Function to parse array data into limma objects # Args:
> > #     tfile:   target file (see limma documentation)
> > #     tsep:    limma defaults TSV, have to set manually for CSV and so on
> > #     galfile: GAL file for printer/layout
> > # Returns data object with targets, genes and printer layout
> > ######################################################################
> > #########
> > parse<- function(tfile, tsep, galfile){
> >      targets<- readTargets(tfile, sep=tsep)
> >      gal<- readGAL(galfile)
> >      layout<- getLayout(gal)
> >      agi<- read.maimages(targets, columns    = COL__,
> >                                        annotation = ANO__,
> >                                        green.only = TRUE)
> >      bg<- "none"
> >      norm<- "none"
> >      desg<- model.matrix(~0+factor(c(1,1,2,2)))
> >      colnames(desg)<- c("s1","s2")
> >      d<- list(agi     = agi,
> >                gal     = gal,
> >                layout  = layout,
> >                targets = targets,
> >                bg      = bg,
> >                desg    = desg,
> >                norm    = norm)
> >
> >      processed<- matrix(NA, nrow = N_BGM__, ncol = N_NRM__)
> >      processed<- apply(processed, 1:2, function(x){return(d)})
> >      return(processed)
> > }
> >
> > #parse command args
> > args<- commandArgs(trailingOnly = TRUE)
> >
> > #load files into matrix
> > a<- parse(args[1], args[2], args[3])
> >
> > #keepy a copy of a raw dataset
> > raw<- a[1,1]
> > print(raw[[1]]$desg)
> >
> > #perform backgroung corrections
> > a[,A_SUB__]<- lapply(a[,A_SUB__], bg_subtract)
> > a[,A_NXP__]<- lapply(a[,A_NXP__], bg_normexp)
> > a[,A_MIN__]<- lapply(a[,A_MIN__], bg_minimum)
> >
> > #perform normalization
> > a[A_VSN__,]<- lapply(a[A_VSN__,], nb_vsn)
> > a[A_LWS__,]<- lapply(a[A_LWS__,], nb_lowess)
> > a[A_QNT__,]<- lapply(a[A_QNT__,], nb_quantile)
> >
> > #fit model and differentiate expression
> > a<- apply(a, 1:2, m_repavg)
> > a<- apply(a, 1:2, m_fit, design = raw[[1]]$desg)
> >
> > #generate plots
> > apply(a, 1:2, pl_ma)
> > apply(a, 1:2, pl_meansd)
> > apply(a, 1:2, pl_volcano)
> > apply(a, 1:2, pl_lines)
> > apply(a, 1:2, pl_heatmap, args[4])
> >
> >
> >       [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>




------------------------------

Message: 5
Date: Thu, 28 Mar 2013 10:45:21 -0400
From: "James W. MacDonald" <jmacdon at uw.edu>
To: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish at nih.gov>
Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
        limma and vsn.
Message-ID: <51545781.7000100 at uw.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Joe,

I don't mean to be critical. If you like to use gobtons of underscores,
then rock on. I would just find it very distracting.

Anyway, by 'mean of the controls', I mean the average value of the
control samples. But this could be generalized in any way. For example,
your contrast was s2 - s1, so you could do

mat <- avg$E[<index of interesting genes>, ]
mn <- fit_lm$coefficients[<index of interesting genes>, "s1"]
swept <- sweep(mat, 1, mn, "-")

and then do the heatmap. In this case the colors would be interpreted as
the log fold difference between any sample and the mean of the s1 samples.

But the general take-home message here is that you can't just use the
raw expression values, as you will then have a solid color to your
heatmap. Another possibility is to use the scale argument to
heatmap.2() to scale by row (convert to z-scores), in which case the
interpretation of the colors is less obvious.
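
A minimal sketch of that alternative, assuming the mat object from the
snippet above and heatmap.2() from gplots:

library(gplots)
# z-score each gene across samples rather than sweeping out the s1 mean
heatmap.2(mat, scale = "row", col = redgreen(75), trace = "none")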

Best,

Jim



On 3/28/2013 10:22 AM, Cornish, Joseph (NIH/NIAID) [F] wrote:
> Hello James,
>
> Are you referring to function names or constants? The constants are an old habit I picked up long ago in my CS classes, there are a ton but it makes them stand out for me in vim more readily. I've come to avoid camel case since switching to linux, it makes searching for things a nightmare, it can also help if I group functions with a prefix (eg plot functions start with "pl_"). If you want I can find+replace all of the constants with their values...
>
> The differences came from using the unaveraged replicates in the larger script versus the average in the example.
>
> I'm not sure what you mean by "mean of the controls", would this simply be filtering out the control spots?
>
> I'll generate the heatmaps using the normalized expression values. I suppose in that case it would be worthwhile to filter ahead of time by p-value and fold change (the coef in topTable).
>
>
> _________thanks_____________
> _______Joe______
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at uw.edu]
> Sent: Thursday, March 28, 2013 10:08 AM
> To: Cornish, Joseph (NIH/NIAID) [F]
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using limma and vsn.
>
> Hi Joseph,
>
>
> On 3/28/2013 9:27 AM, Cornish, Joseph (NIH/NIAID) [F] wrote:
>> I am attempting to write a script that will allow me to do bulk benchmarking of various methods for normalization and background correction. I am concerned that I am not processing the data correctly and this may be leading to the abnormal heatmaps that I am seeing. For a test case, I am using an experiment where I have two different conditions with two different states with two replicates per state.
>>
>> The heatmaps concern me because of the uniformity between them, the values are all completely green or completely red, there are differences between genes within a state according to the heatmap.
>>
>> In the examples I have seen people have suggested that I use the coefficients from the eBayes fitting of the data, however in my example the eBayes LMarray object does not have two columns in the results, only on (not sure if this is a sign of errors on my part). I have instead used the lmFit coefficients since there are column for each matrix.
>>
>> To further confuse someone who is new to R, the simplified example and full code generate different heatmaps for the same data and same methods (normexp for background, cyclic loess for norm). The differences being in the simplified version the gene IDs show up on the right side but in the full version, only row number appears. The color for some genes changes between states as well.
>>
>> My questions are:
>> Why does the bayes model only have one column?
> Because you are computing one contrast (which is all you can do). One contrast means one test which means one column of results. In other words, the coefficient you are estimating in the contrasts.fit() step is the difference in means between the two groups, which is one value per gene.
>
>> Am I approaching the processing correctly?
> I tried to read through that business below, but got some weird affliction due to an apparent overdose of underscores and had to stop.
>
>> Why are there differences between the resulting heatmaps of each version?
> I don't know, as I worried my brain might explode if I read any more.
> But you should note that it is fairly unusual to plot coefficients in a heatmap, as what you are then presenting are log ratios, and most people can't understand that.
>
> Instead you might consider getting the list of genes, and then creating a heatmap using the normalized expression values for each sample instead. You then would want to scale in some way. We often sweep the mean of the controls out of the matrix (by row), so the colors can then be interpreted as differences between the treated samples and controls.
>
> Best,
>
> Jim
>
>
>> I know that the idea is to keep the amount of posted code to a minimum but I don't see any other way here.
>>
>> Here is the simplified example I have made:
>> library(limma)
>> library(vsn)
>> library(gplots)
>>
>> set.seed(0xabcd) #from vsn manual
>>
>> #constants for reading agi outs from our imager
>> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background (tr.mean)
>> {A}")
>> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
>>
>> #parse data
>> targets<- readTargets("test.csv", sep = ",")
>> gal<- readGAL("026806_D_20110524.gal")
>> layout<- getLayout(gal)
>> agi<- read.maimages(targets, columns    = COL__,
>>                                     annotation = ANO__,
>>                                     green.only = TRUE)
>> design<- model.matrix(~0+factor(c(1,1,2,2)))
>> colnames(design)<- c("s1", "s2")
>>
>> #process
>> bg<- backgroundCorrect(agi, method = "normexp")
>> norm<- normalizeBetweenArrays(bg, method = "cyclicloess")
>> avg<- avereps(norm, ID=norm$genes$ProbeName)
>>
>> #fit
>> contmat<- makeContrasts(s2-s1, levels = design)
>> fit_lm<- lmFit(avg$E, design)
>> fit_be<- eBayes(contrasts.fit(fit_lm, contmat))
>>
>> #results
>> difexp<- topTable(fit_be, coef = 1, adjust = "BH")
>> res<- decideTests(fit_be)
>> comp<- vennDiagram(res)
>>
>> #plot
>> pdf("example-heatmap.pdf")
>> genes<- as.numeric(rownames(topTable(fit_be, n = 100)))
>> heatmap(fit_lm$coefficients[genes,], col=redgreen(75), key = TRUE)
>> dev.off()
>>
>>
>> The full version:
>> library(vsn)
>> library(limma)
>> library(gplots)
>> set.seed(0xabcd) #from the vsn manual
>>
>> #constants
>> A_SUB__<- 1
>> A_NXP__<- 2
>> A_MIN__<- 3
>> A_VSN__<- 1
>> A_QNT__<- 2
>> A_LWS__<- 3
>> M_SUB__<- "subtract"
>> M_NXP__<- "normexp"
>> M_MIN__<- "minimum"
>> M_VSN__<- "vsn"
>> M_LWS__<- "cyclicloess"
>> M_QNT__<- "quantile"
>> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background (tr.mean)
>> {A}")
>> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
>> N_BGM__<- 3 #number of background correction methods
>> N_NRM__<- 3 #number of normalization methods
>> A_BGM__<- c(A_SUB__, A_NXP__, A_MIN__)
>> A_NRM__<- c(A_VSN__, A_QNT__, A_LWS__)
>>
>>
>> #utility methods
>> #applies subtraction background correction with limma
>> bg_subtract<- function(x){
>>       return(list(norm = x$norm,
>>                   bg   = M_SUB__,
>>                   agi  = backgroundCorrect(x$agi, method = M_SUB__))) }
>>
>> #applies normexp background correction with limma
>> bg_normexp<- function(x){
>>       return(list(norm = x$norm,
>>                   bg   = M_NXP__,
>>                   agi  = backgroundCorrect(x$agi, method = M_NXP__))) }
>>
>> #applies minimum background correction with limma
>> bg_minimum<- function(x){
>>       return(list(norm = x$norm,
>>                   bg   = M_MIN__,
>>                   agi  = backgroundCorrect(x$agi, method = M_MIN__))) }
>>
>> #applies VSN normalization with VSN package
>> nb_vsn<- function(x){
>>       return(list(norm = M_VSN__,
>>                   bg   = x$bg,
>>                   agi  = normalizeVSN(x$agi))) }
>>
>> #applies cyclic lowess normalization with limma
>> nb_lowess<- function(x){
>>      return(list(norm = M_LWS__,
>>                    bg = x$bg,
>>                   agi = normalizeBetweenArrays(x$agi, method =
>> M_LWS__))) }
>>
>> #applies quantile normalization with limma
>> nb_quantile<- function(x){
>>       return(list(norm = M_QNT__,
>>                     bg = x$bg,
>>                    agi = normalizeBetweenArrays(x$agi, method =
>> M_QNT__))) }
>>
>> #generates a plot of MA values with limma
>> pl_ma<- function(x){
>>       name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-ma.pdf"), collapse="")
>>       pdf(name)
>>       plotMA(x[[1]]$agi$E)
>>       dev.off()
>> }
>>
>> #generates a plot of mean versus stdev with VSN
>> pl_meansd<- function(x){
>>       name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-meansd.pdf"), collapse="")
>>       pdf(name)
>>       meanSdPlot(x[[1]]$agi$E)
>>       dev.off()
>> }
>>
>> #generate volcanoplot of log-fold versus log-odds
>> pl_volcano<- function(x){
>>       name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-volcano.pdf"), collapse="")
>>       pdf(name)
>>       volcanoplot(x[[1]]$bayes)
>>       dev.off()
>> }
>>
>> #generate plot of expression data over time
>> pl_lines<- function(x){
>>       name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-lines.pdf"), collapse="")
>>       pdf(name)
>>       plotlines(x[[1]]$agi$E)
>>       dev.off()
>> }
>>
>> #generate heatmap of final results
>> pl_heatmap<- function(x, ntop){
>>       name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-heatmap.pdf"), collapse="")
>>       pdf(name)
>>       genes<- as.numeric(rownames(topTable(x[[1]]$bayes, n = ntop)))
>>       heatmap(x[[1]]$lm$coefficients[genes,],
>>                 col   = redgreen(75),
>>                 key   = TRUE)
>>       dev.off()
>> }
>>
>> #averages replicates of probes
>> m_repavg<- function(x){
>>       return(list(agi     = x[[1]]$agi,
>>                   avg     = avereps(x[[1]]$agi, ID=x[[1]]$agi$genes$ProbeName),
>>                   bg      = x[[1]]$bg,
>>                   norm    = x[[1]]$norm))
>> }
>>
>> #generate contrasts
>> m_fit<- function(x, design){
>>       contmat<- makeContrasts(s2-s1,
>>                                  levels = design)
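>>       # note: this fits the unaveraged x[[1]]$agi$E, while the simplified example
>>       # fits the avereps() output (avg$E); that difference is what makes the two
>>       # versions' heatmaps disagree (see the follow-up message below)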
>>       fit1<- lmFit(x[[1]]$agi$E, design)
>>       fit2<- eBayes(contrasts.fit(fit1, contmat))
>>       diff<- topTable(fit2, coef = 1, adjust = "BH")
>>       res<- decideTests(fit2)
>>       comp<- vennDiagram(res)
>>       return(list(agi   = x[[1]]$agi,
>>                   bg    = x[[1]]$bg,
>>                   norm  = x[[1]]$norm,
>>                   avg   = x[[1]]$avg,
>>                   cont  = contmat,
>>                   lm    = fit1,
>>                   bayes = fit2,
>>                   diff  = diff,
>>                   res   = res,
>>                   comp  = comp))
>> }
>>
>> ######################################################################
>> # Function to parse array data into limma objects
>> # Args:
>> #     tfile:   target file (see limma documentation)
>> #     tsep:    limma defaults to TSV; set manually for CSV and so on
>> #     galfile: GAL file for printer/layout
>> # Returns a data object with targets, genes and printer layout
>> ######################################################################
>> parse<- function(tfile, tsep, galfile){
>>       targets<- readTargets(tfile, sep=tsep)
>>       gal<- readGAL(galfile)
>>       layout<- getLayout(gal)
>>       agi<- read.maimages(targets, columns    = COL__,
>>                                         annotation = ANO__,
>>                                         green.only = TRUE)
>>       bg<- "none"
>>       norm<- "none"
>>       desg<- model.matrix(~0+factor(c(1,1,2,2)))
>>       colnames(desg)<- c("s1","s2")
>>       d<- list(agi     = agi,
>>                 gal     = gal,
>>                 layout  = layout,
>>                 targets = targets,
>>                 bg      = bg,
>>                 desg    = desg,
>>                 norm    = norm)
>>
>>       processed<- matrix(NA, nrow = N_BGM__, ncol = N_NRM__)
>>       processed<- apply(processed, 1:2, function(x){return(d)})
>>       return(processed)
>> }
>>
>> #parse command args
>> args<- commandArgs(trailingOnly = TRUE)
>>
>> #load files into matrix
>> a<- parse(args[1], args[2], args[3])
>>
>> #keep a copy of the raw dataset
>> raw<- a[1,1]
>> print(raw[[1]]$desg)
>>
>> #perform background corrections
>> a[,A_SUB__]<- lapply(a[,A_SUB__], bg_subtract)
>> a[,A_NXP__]<- lapply(a[,A_NXP__], bg_normexp)
>> a[,A_MIN__]<- lapply(a[,A_MIN__], bg_minimum)
>>
>> #perform normalization
>> a[A_VSN__,]<- lapply(a[A_VSN__,], nb_vsn)
>> a[A_LWS__,]<- lapply(a[A_LWS__,], nb_lowess)
>> a[A_QNT__,]<- lapply(a[A_QNT__,], nb_quantile)
>>
>> #fit model and differentiate expression
>> a<- apply(a, 1:2, m_repavg)
>> a<- apply(a, 1:2, m_fit, design = raw[[1]]$desg)
>>
>> #generate plots
>> apply(a, 1:2, pl_ma)
>> apply(a, 1:2, pl_meansd)
>> apply(a, 1:2, pl_volcano)
>> apply(a, 1:2, pl_lines)
>> apply(a, 1:2, pl_heatmap, args[4])
>>
>>
>>      [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



------------------------------

Message: 6
Date: Thu, 28 Mar 2013 15:00:49 +0000
From: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish at nih.gov>
To: "Davis, Sean (NIH/NCI) [E]" <sdavis2 at mail.nih.gov>
Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>,  "James
        W. MacDonald" <jmacdon at uw.edu>
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
        limma and vsn.
Message-ID: <E68EA43FBF80174C85A92BC24EFAA3B84B7C02 at MLBXV01.nih.gov>
Content-Type: text/plain

Hello Davis,

    I posted a simple example, which after fixing a problem in the longer example now produces the same results. I only posted the longer one because I wasn't able to reproduce the results of the longer version initially. I've been sort of banging my head on this for a while, so I missed the difference between "agi" and "avg". I'm thankful anyone bothered to look after I posted that much code, so thanks. If I sounded offended by the underscores, I was only admitting that I know they're terrible. c:

Thanks for clarifying the mean-subtraction aspect. That has been one of the more troubling parts of learning this analysis: there seem to be plenty of ways to do things but, without experience, no way to tell which is better.

I will give the mean subtraction and the z-score scaling a try. Am I reading too much into the usefulness of these heatmaps? In most papers I've seen, they seem to be there because they look nice or because the attached clustering might be of some use.

Thanks again!

From: Davis, Sean (NCI) On Behalf Of Davis, Sean (NIH/NCI) [E]
Sent: Thursday, March 28, 2013 10:34 AM
To: Cornish, Joseph (NIH/NIAID) [F]
Cc: James W. MacDonald; bioconductor at r-project.org
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using limma and vsn.



On Thu, Mar 28, 2013 at 10:22 AM, Cornish, Joseph (NIH/NIAID) [F] <joseph.cornish at nih.gov<mailto:joseph.cornish at nih.gov>> wrote:
Hello James,

Are you referring to function names or constants? The constants are an old habit I picked up long ago in my CS classes; there are a ton, but it makes them stand out for me in vim more readily. I've come to avoid camel case since switching to Linux because it makes searching for things a nightmare, and it also helps if I group functions with a prefix (e.g. plot functions start with "pl_"). If you want, I can find+replace all of the constants with their values...

Hi, Joe.

I think I could help to translate part of what James said.  If you could boil your example down to a few lines of reproducible code that show the problem, that would be simpler to interpret and easier to comment on directly.  Sometimes in the process of producing such a simplified example, I have found my own misunderstandings.  It also sometimes helps to get someone local to give a second pair of eyes.

The differences came from using the unaveraged replicates in the larger script versus the average in the example.

I'm not sure what you mean by "mean of the controls", would this simply be filtering out the control spots?

No.  I think he means for you to subtract the mean of the control samples (or one group of samples) for each gene from all the normalized expression values.

I'll generate the heatmaps using the normalized expression values. I suppose in that case it would be worthwhile to filter ahead of time by p-value and fold change (the coef in topTable).

Yes.

Sean


_________thanks_____________
_______Joe______
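
For the p-value / fold-change pre-filter agreed on above, a minimal sketch using
topTable's own arguments (assuming fit_be from the simplified example; the
cutoffs are arbitrary):

sel <- topTable(fit_be, coef = 1, number = Inf, adjust.method = "BH",
                p.value = 0.05, lfc = 1)  # BH-adjusted p < 0.05 and |log2 fold change| > 1
## rownames(sel) then identifies which probes to carry into the heatmap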



------------------------------

Message: 7
Date: Thu, 28 Mar 2013 15:18:59 +0000
From: Andreia Fonseca <andreia.fonseca at gmail.com>
To: bioconductor <bioconductor at stat.math.ethz.ch>
Subject: [BioC] analysis of HuGene2.0-st array affymetrix -aroma
        package
Message-ID:
        <CAG43sEL74j9EYpo8izYGSy7=C0Jye5bJw5eGcoHkug9mmYp_FQ at mail.gmail.com>
Content-Type: text/plain

Dear all,

I am analysing Affymetrix HuGene 2.0 ST arrays using the aroma and limma
packages. I have 2 treatments and 3 arrays per condition. I am getting 11
genes that are differentially expressed, but these lose significance after
applying the FDR correction. I want to filter out genes with similar
distributions between treatments before applying the FDR correction, to see
whether I am losing significance due to overcorrection. Can someone advise
me on a function or package to do this? I have explored genefilter and
could not find a function for this.
Thanks for the help.
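
One common form of such a pre-filter (a sketch in base R, with exprs_mat
standing in for a genes-by-arrays expression matrix) keeps only the most
variable probes, computed across all arrays and ignoring the treatment labels,
so that the subsequent FDR correction stays valid:

iqrs     <- apply(exprs_mat, 1, IQR)    # overall spread per probe, labels ignored
keep     <- iqrs > quantile(iqrs, 0.5)  # drop the least variable half (cutoff is arbitrary)
filtered <- exprs_mat[keep, ]           # test and FDR-correct only these probes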

Kind regards,

Andreia


-----------------------------------------------------------------------------------------------
Andreia J. Amaral, PhD
BioFIG - Center for Biodiversity, Functional and Integrative Genomics
Instituto de Medicina Molecular
University of Lisbon
Tel: +352 217500000 (ext. office: 28253)
email:andreiaamaral at fm.ul.pt ; andreiaamaral at fc.ul.pt

        [[alternative HTML version deleted]]



------------------------------

Message: 8
Date: Thu, 28 Mar 2013 11:47:12 -0400
From: Sean Davis <sdavis2 at mail.nih.gov>
To: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish at nih.gov>
Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>,  "James
        W. MacDonald" <jmacdon at uw.edu>
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
        limma and       vsn.
Message-ID:
        <CANeAVB=Xn9C2bJqK6GF5DpTGHbLtgrcOV+oh0T05qsmD1s92dg at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

On Thu, Mar 28, 2013 at 11:00 AM, Cornish, Joseph (NIH/NIAID) [F]
<joseph.cornish at nih.gov> wrote:
> Hello Davis,
>
>     I posted a simple example, which after fixing a problem in the longer example now produces the same results. I only posted the longer one because I wasn't able to reproduce the results of the longer version initially. I've been sort of banging my head on this for a while, so I missed the difference between "agi" and "avg". I'm thankful anyone bothered to look after I posted that much code, so thanks. If I sounded offended by the underscores, I was only admitting that I know they're terrible. c:
>
> Thanks for clarifying the mean-subtraction aspect. That has been one of the more troubling parts of learning this analysis: there seem to be plenty of ways to do things but, without experience, no way to tell which is better.
>

There is not a "right" way to do this since the only metric I have
seen for the evaluation of the color scale in a heatmap is subjective
and qualitative.

> I will give the mean subtraction and the z-score scaling a try. Am I reading too much into the usefulness of these heatmaps? In most papers I've seen, they seem to be there because they look nice or because the attached clustering might be of some use.
>

Heatmaps are very good visualization tools.  I would not try to attach
more utility to them than that.

Sean





------------------------------

Message: 9
Date: Thu, 28 Mar 2013 11:00:17 -0500
From: Michael Muratet <mmuratet at hudsonalpha.org>
To: bioconductor at r-project.org
Subject: [BioC] Interpreting DESeq2 results
Message-ID: <14DFA994-3719-4FDF-BD71-4290D4149457 at hudsonalpha.org>
Content-Type: text/plain; charset=us-ascii

Greetings

I have an experiment:

> design(dse)
~ factor1 + factor2 + factor3

where factor1 has two levels, factor2 has three levels and factor3 has three levels. I extract a gene of interest from the results for each term (I've changed the indices to reflect the condition):

> lapply(resultsNames(dse),function(u) results(dse,u)["gene_A",])
[["Intercept"]]
        baseMean log2FoldChange        pvalue           FDR
gene_A 1596.548       10.77485 3.309439e-216 7.025442e-216
[["factor1_level2"]]
        baseMean log2FoldChange    pvalue       FDR
gene_A 1596.548      0.3386776 0.1307309 0.3587438
[["factor2_level2"]]
        baseMean log2FoldChange    pvalue       FDR
gene_A 1596.548     -0.6882543 0.0613569 0.1007896
[["factor2_level3"]]
        baseMean log2FoldChange   pvalue       FDR
gene_A 1596.548      0.2393368 0.513216 0.6589575
[["factor3_level2"]]
        baseMean log2FoldChange    pvalue       FDR
gene_A 1596.548      0.1584153 0.6423634 0.8503163
[["factor3_level3]]
        baseMean log2FoldChange       pvalue         FDR
gene_A 1596.548      -1.627898 1.823141e-06 0.001409384

I want to be sure I understand the output format. Is it true that the coefficients (the vector beta) from the fit are the baseMean value scaled by the log2FoldChange? Is the true intercept value 1596.548*2^10.77485=2797274.13?

mcols() tells me that the baseMean term is calculated over "all rows". The baseMean is different for different genes, although it is the same for each gene across all the conditions; I'm not seeing how the rows are selected.
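
On the baseMean point, a minimal check (assuming dse is the DESeqDataSet used
above): baseMean is the mean of the size-factor-normalized counts taken over
all samples, so it is one number per gene and therefore identical in every
coefficient's results table.

all.equal(results(dse)$baseMean,
          rowMeans(counts(dse, normalized = TRUE)))  # should come back TRUE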

Thanks

Mike

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806



------------------------------

Message: 10
Date: Thu, 28 Mar 2013 16:30:12 +0000
From: Fiona Ingleby <fiona.ingleby at gmail.com>
To: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: [BioC] identifying drosophila miRNA targets
Message-ID: <04CBBC25-9E0F-4E2B-8008-D3EB093E2A10 at gmail.com>
Content-Type: text/plain

Hi everyone,

I am working with mRNA data from Affy 'drosophila2' arrays and miRNA data from Affy 'mirna3' arrays. I have identified a list of differentially expressed mRNAs and miRNAs. I'm having a bit of trouble with some downstream analyses and I'm hoping someone might be able to offer some help.

I would like to use my list of differentially expressed miRNAs to access online databases (e.g. miRBase, microRNA.org?) and extract the names of all the potential target mRNAs. Then I'd like to use this list of mRNAs to look through my mRNA expression data. I'm aware of packages like 'RmiR' and 'microRNA' which have built-in functions for finding miRNA targets, but as far as I can tell, 'RmiR' uses miRNA databases for humans only and 'microRNA' works with human and mouse data only. So is there a package I am unaware of (or another application of 'RmiR'/'microRNA' that I am unaware of) for looking at drosophila data?

So far I have also considered the 'biomaRt' package to see if the database query function on there can help me, but I haven't had much luck. For instance, if I try an example list of miRNAs:

mirna<-c("dme-miR-1002","dme-miR-312","dme-miR-973")
library(biomaRt)
ensembl<-useMart("ensembl",dataset="dmelanogaster_gene_ensembl")
getBM(attributes="mirbase_accession",filters="mirbase_id",values=mirna,mart=ensembl)

then 'logical(0)' is returned, as if there are no records for those miRNAs - but by searching the database manually I know the records are there.

Alternatively I can try:

miRNA <- getBM(c("mirbase_accession","mirbase_id", "ensembl_gene_id", "start_position", "chromosome_name"), filters = c("with_mirbase"), values = list(T), mart = ensembl)

which returns a table of various bits of information on miRNAs, but I cannot adapt this command to just look at my list of miRNAs of interest (i.e. the 'mirna' vector above). I've included the sessionInfo() output for these at the bottom of the email, but I suspect my problem is more to do with the fact that I'm not going about this the right way (as opposed to a problem with package versions, coding, etc.). I'm not even sure that using 'biomaRt' will give me the information I eventually want (the target mRNAs of these miRNAs); I was just trying it out to see what it was capable of in terms of querying these databases. So I apologise for the vagueness. Since I haven't managed to get very far by myself, it's difficult to be more specific, but I'd really appreciate it if anyone could offer some advice, even just to point me in the direction of a useful package which might have gone unnoticed by me.
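
(One small debugging aid, a sketch using the standard biomaRt helpers:
listFilters() and listAttributes() show the exact filter and attribute names
the mart expects, which makes it easier to see why a getBM() call comes back
empty.)

library(biomaRt)
ensembl <- useMart("ensembl", dataset = "dmelanogaster_gene_ensembl")
subset(listFilters(ensembl),    grepl("mirbase", name))     # valid filter names
subset(listAttributes(ensembl), grepl("mirbase", name))     # valid attribute names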

Many thanks,

Fiona

Dr Fiona C Ingleby
Postdoctoral Research Fellow
University of Sussex
Email: F.Ingleby at sussex.ac.uk
Website: fionaingleby.weebly.com


> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.14.0     affy_1.36.1        Biobase_2.18.0     BiocGenerics_0.4.0

loaded via a namespace (and not attached):
 [1] affyio_1.26.0         BiocInstaller_1.8.3   grid_2.15.2           lattice_0.20-14       Matrix_1.0-11         MCMCglmm_2.17
 [7] preprocessCore_1.20.0 RCurl_1.95-4.1        tools_2.15.2          XML_3.95-0.2          zlibbioc_1.4.0
        [[alternative HTML version deleted]]



------------------------------

Message: 11
Date: Thu, 28 Mar 2013 12:43:14 -0400
From: "James W. MacDonald" <jmacdon at uw.edu>
To: Fiona Ingleby <fiona.ingleby at gmail.com>
Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: Re: [BioC] identifying drosophila miRNA targets
Message-ID: <51547322.6020803 at uw.edu>
Content-Type: text/plain; charset=windows-1252; format=flowed

Hi Fiona,

I have a function called mirna2mrna (yeah, I know, lame function
name...) in my affycoretools package that does this, based on the Sanger
MicroCosm Targets data that you can download here:

http://www.ebi.ac.uk/enright-srv/microcosm/cgi-bin/targets/v5/download.pl

There is also a function makeHmap() that will create a heatmap of the
miRNA/mRNA pairs, where the color of the cells is based on the
correlation between the two RNA species (with the intent to show
negative correlations, indicating that the miRNA is hypothetically
causing premature degradation of the mRNA).

I think the help pages for these two functions are reasonable, but
please let me know if you have any questions.

Best,

Jim




--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



------------------------------

Message: 12
Date: Thu, 28 Mar 2013 16:58:21 +0000
From: "Blanchette, Marco" <MAB at stowers.org>
To: Kasper Daniel Hansen <kasperdanielhansen at gmail.com>
Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: Re: [BioC] Limit on number of sequence files for forging a
        BSgenome
Message-ID: <1CE7C98DF956594C98E0956F4747178136D97D at MBSRV02.sgc.loc>
Content-Type: text/plain; charset="us-ascii"

Kasper,

I see your line of thought: is there a particular FASTA file causing
forgeBSgenomeDataPkg() to break?

The answer is no. Once I reach a certain number of FASTA files, adding one
more contig breaks the function. For instance, taking the first 454
contigs of C. brenneri breaks, while removing the last or the first FASTA
file from the list (keeping only 453) compiles without a problem (neither
the last nor the first FASTA file is responsible for breaking the
function; the number of files is the trigger).

What's even more puzzling is that the number at which it breaks is not
fixed. Selecting a random subset of contigs or changing the genome will
change the number that triggers the failure... However, it is always
around 440 files, which might be because the FASTA files are all of very
similar sizes.

Any clues?


--  Marco Blanchette, Ph.D.
Stowers Institute for Medical Research
1000 East 50th Street
Kansas City MO 64110
www.stowers.org


Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018






On 3/27/13 8:22 PM, "Kasper Daniel Hansen" <kasperdanielhansen at gmail.com>
wrote:

>Marco,
>
>You are probably right in diagnosing the problem, but sometimes I
>think I have seen FASTA files with the entire sequence on a single
>line, instead of (say) 80 nucleotides and then a newline.  I could
>believe that a really long contig on a single line without a newline,
>could cause an error like this. You could quickly check if there is a
>suspicious file by
>  wc -l *
>and look for files with #lines like 2-3.  Somehow 460 seems a weird
>number to fail at.
>
>This may not be your problem, and I am sure Herve will respond in due
>time.
>
>Best,
>Kasper
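
If a single-line FASTA does turn out to be the trigger, one minimal way to
re-wrap it before forging (a sketch assuming Biostrings; the file names are
hypothetical):

library(Biostrings)
contigs <- readDNAStringSet("suspect_contigs.fa")                 # hypothetical input
writeXStringSet(contigs, filepath = "suspect_contigs_80col.fa",
                width = 80)                                       # 80 bases per line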
>
>On Wed, Mar 27, 2013 at 4:28 PM, Blanchette, Marco <MAB at stowers.org>
>wrote:
>> Hi,
>>
>> Is there a maximum number of sequence files (chromosomes or contigs in
>>my case) that can be fed to the forgeBSgenomeDataPkg() function? I am
>>trying to build a BSgenome for C. brenneri and C. japonica available
>>from EnsemblGenomes. These genomes are made from thousands of contigs
>>with genes annotated to them. Currently, I get the error "Error: Line
>>longer than buffer size" when running on the full set of contigs.
>>However, it works fine on a seed file containing a
>>subset of the contigs (I can forge a genome with 450 contigs but not
>>with 460!)
>>
>> Any suggestions will be appreciated (I can provide a toy example but I
>>am not sure what would be the merit of it at this point)
>>
>> Thanks
>>
>> --  Marco Blanchette, Ph.D.
>> Stowers Institute for Medical Research
>> 1000 East 50th Street
>> Kansas City MO 64110
>> www.stowers.org
>>
>> Tel: 816-926-4071
>> Cell: 816-726-8419
>> Fax: 816-926-2018
>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>>http://news.gmane.org/gmane.science.biology.informatics.conductor



------------------------------

Message: 13
Date: Thu, 28 Mar 2013 19:07:19 +0000
From: "Shields, Rusty (IMS)" <ShieldsR at imsweb.com>
To: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: Re: [BioC] [R] Error in setMethod("combine"... was - Error
        when installing globaltest package
Message-ID:
        <E2E839B5BFA711438B481AD9906B0C340A6A70A8 at VARUNA.omni.imsweb.com>
Content-Type: text/plain; charset="iso-8859-1"

Thanks Martin,

No apologies necessary, just happy to have some help.

Here's the info on Biobase:

> packageDescription("Biobase")
Package: Biobase
Title: Biobase: Base functions for Bioconductor
Version: 2.18.0
Author: R. Gentleman, V. Carey, M. Morgan, S. Falcon
Description: Functions that are needed by many other packages or which
        replace R functions.
Suggests: tools, tkWidgets, ALL
Depends: R (>= 2.10), BiocGenerics (>= 0.3.2), utils
Imports: methods, BiocGenerics
Maintainer: Bioconductor Package Maintainer
        <maintainer at bioconductor.org>
License: Artistic-2.0
Collate: tools.R strings.R environment.R vignettes.R packages.R
        AllGenerics.R VersionsClass.R VersionedClasses.R
        methods-VersionsNull.R methods-VersionedClass.R DataClasses.R
        methods-aggregator.R methods-container.R methods-MIAxE.R
        methods-MIAME.R methods-AssayData.R
        methods-AnnotatedDataFrame.R methods-eSet.R
        methods-ExpressionSet.R methods-MultiSet.R methods-SnpSet.R
        methods-NChannelSet.R anyMissing.R rowOp-methods.R
        updateObjectTo.R methods-ScalarObject.R zzz.R
LazyLoad: yes
biocViews: Infrastructure, Bioinformatics
Packaged: 2012-10-02 02:41:50 UTC; biocbuild
Built: R 2.14.0; x86_64-unknown-linux-gnu; 2013-03-22 15:01:20 UTC;
        unix

-- File: /usr/local/R-2.14.0/lib64/R/library/Biobase/Meta/package.rds

-----Original Message-----
From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
Sent: Thursday, March 28, 2013 2:52 PM
To: Shields, Rusty (IMS)
Cc: r-help at r-project.org
Subject: Re: [R] Error in setMethod("combine"... was - Error when installing globaltest package

On 3/28/2013 11:05 AM, Shields, Rusty (IMS) wrote:
> Hi All,
>
> I posted this on the bioconductor list and didn't get a response there, so I'm hoping someone here can help.
>
> I don't know a heck of a lot about R, so I apologize if this seems like a trivial issue.  This error comes up when trying to install the bioconductor globaltest package.
>

Sorry that you didn't get a response on the Bioc mailing list; I'd actually
suggest returning to the original thread there and I'll see that it gets answered.

My guess is that you have a version of Biobase that is too new compared to the
version (2.14.0) expected in the version of Bioconductor you are using. What
does packageDescription("Biobase") say?

Martin




> Any clues?
>
> Thanks!
> Rusty
>
> -----Original Message-----
> From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Shields, Rusty (IMS)
> Sent: Tuesday, March 26, 2013 12:34 PM
> To: bioconductor at r-project.org
> Subject: [BioC] Error when installing globaltest package
>
> Hi all,
>
> I've run into a problem when attempting to install the globaltest package from Bioconductor.  I'm using R 2.14.0 on 64bit SLES 11.  Let me know what other information you might need about my system to troubleshoot this.
>
> Using the method described for this installation on the Bioconductor website:
>
>      source("http://bioconductor.org/biocLite.R")
>      biocLite("globaltest")
>
> I get the following result, which I can't find any reference to in the list archives:
>
>> biocLite("globaltest")
> BioC_mirror: 'http://www.bioconductor.org'
> Using R version 2.14, BiocInstaller version 1.2.1.
> Installing package(s) 'globaltest'
> trying URL 'http://www.bioconductor.org/packages/2.9/bioc/src/contrib/globaltest_5.8.1.tar.gz'
> Content type 'application/x-gzip' length 956376 bytes (933 Kb)
> opened URL
> ==================================================
> downloaded 933 Kb
>
> * installing *source* package 'globaltest' ...
> ** R
> ** data
> ** inst
> ** preparing package for lazy loading
> Warning in .simpleDuplicateClass(def, prev) :
>    A specification for class 'data.frameOrNULL'
> Creating a generic function for 'sort' from package 'base' in package 'globaltest'
> Creating a generic function for 'model.matrix' from package 'stats' in package 'globaltest'
> Creating a generic function for 'coefficients' from package 'stats' in package 'globaltest'
> Creating a generic function for 'fitted.values' from package 'stats' in package 'globaltest'
> Creating a generic function for 'residuals' from package 'stats' in package 'globaltest'
> Error in setMethod("combine", signature(x = "gt.result", y = "gt.result"),  :
>    no existing definition for function 'combine'
> Error : unable to load R code in package 'globaltest'
> ERROR: lazy loading failed for package 'globaltest'
> * removing '/usr/local/R-2.14.0/lib64/R/library/globaltest'
>
> The downloaded packages are in
>          '/tmp/RtmpdCwjNB/downloaded_packages'
> Updating HTML index of packages in '.Library'
> Making packages.html  ... done
> Old packages: 'caret'
> Update all/some/none? [a/s/n]:
>
>
>          [[alternative HTML version deleted]]
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


--
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109




------------------------------

Message: 14
Date: Thu, 28 Mar 2013 12:26:11 -0700
From: Martin Morgan <mtmorgan at fhcrc.org>
To: "Shields, Rusty (IMS)" <ShieldsR at imsweb.com>
Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: Re: [BioC] [R] Error in setMethod("combine"... was - Error
        when installing globaltest package
Message-ID: <51549953.6080003 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 03/28/2013 12:07 PM, Shields, Rusty (IMS) wrote:
> Thanks Martin,
>
> No apologies necessary, just happy to have some help.
>
> Here's the info on Biobase:
>
>> packageDescription("Biobase")
> Package: Biobase
> Title: Biobase: Base functions for Bioconductor
> Version: 2.18.0

Yes, this is the version of Biobase from the current version of R /
Bioconductor, whereas you're using a relatively old R / Bioconductor (R 2.14,
Bioc 2.9, from the 'Previous versions' box at http://bioconductor.org/help/)
where Biobase is at version 2.14. This doesn't bode well, as other packages are
also likely at incorrect versions.

The easiest solution is to upgrade to a recent R, and start again with
biocLite('globaltest').

You could also try biocLite("Biobase") to get the correct version, then
biocLite("globaltest"), but if other packages are also installed incorrectly...

Something like

   source("http://bioconductor.org/biocLite.R")
   avail = available.packages(contriburl=contrib.url(biocinstallRepos()))
   inst = installed.packages(priority="NA")
   idx = rownames(avail) %in% rownames(inst)
   vers = avail[idx, "Version"]
   vers[vers < inst[names(vers), "Version"]]

will give you the too-new packages that need to be re-installed.

It's hard to know how your system got to its current state; one candidate is
that .libPaths() includes a path shared by several versions of R, but using
biocLite() to install packages should help you avoid that in the future.

Martin
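
A minimal sketch of checking whether .libPaths() mixes libraries across R
versions (an illustrative addition to the reply above; the version-specific
path below is only an example):

   ## show the current library search path; a directory shared between R
   ## versions is a likely culprit for mismatched package versions
   .libPaths()
   ## optionally point this session at a version-specific library
   lib214 <- "~/R/x86_64-unknown-linux-gnu-library/2.14"   # example path
   dir.create(lib214, recursive = TRUE, showWarnings = FALSE)
   .libPaths(c(lib214, .libPaths()))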

> Author: R. Gentleman, V. Carey, M. Morgan, S. Falcon
> Description: Functions that are needed by many other packages or which
>          replace R functions.
> Suggests: tools, tkWidgets, ALL
> Depends: R (>= 2.10), BiocGenerics (>= 0.3.2), utils
> Imports: methods, BiocGenerics
> Maintainer: Bioconductor Package Maintainer
>          <maintainer at bioconductor.org>
> License: Artistic-2.0
> Collate: tools.R strings.R environment.R vignettes.R packages.R
>          AllGenerics.R VersionsClass.R VersionedClasses.R
>          methods-VersionsNull.R methods-VersionedClass.R DataClasses.R
>          methods-aggregator.R methods-container.R methods-MIAxE.R
>          methods-MIAME.R methods-AssayData.R
>          methods-AnnotatedDataFrame.R methods-eSet.R
>          methods-ExpressionSet.R methods-MultiSet.R methods-SnpSet.R
>          methods-NChannelSet.R anyMissing.R rowOp-methods.R
>          updateObjectTo.R methods-ScalarObject.R zzz.R
> LazyLoad: yes
> biocViews: Infrastructure, Bioinformatics
> Packaged: 2012-10-02 02:41:50 UTC; biocbuild
> Built: R 2.14.0; x86_64-unknown-linux-gnu; 2013-03-22 15:01:20 UTC;
>          unix
>
> -- File: /usr/local/R-2.14.0/lib64/R/library/Biobase/Meta/package.rds
>
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Sent: Thursday, March 28, 2013 2:52 PM
> To: Shields, Rusty (IMS)
> Cc: r-help at r-project.org
> Subject: Re: [R] Error in setMethod("combine"... was - Error when installing globaltest package
>
> On 3/28/2013 11:05 AM, Shields, Rusty (IMS) wrote:
>> Hi All,
>>
>> I posted this on the bioconductor list and didn't get a response there, so I'm hoping someone here can help.
>>
>> I don't know a heck of a lot about R, so I apologize if this seems like a trivial issue.  This error comes up when trying to install the bioconductor globaltest package.
>>
>
> Sorry that you didn't get a response on the Bioc mailing list; I'd actually
> suggest returning to the original thread there and I'll see that it gets answered.
>
> My guess is that you have a version of Biobase that is too new compared to the
> version (2.14.0) expected in the version of Bioconductor you are using. What
> does packageDescription("Biobase") say?
>
> Martin
>
>
>
>
>> Any clues?
>>
>> Thanks!
>> Rusty
>>
>> -----Original Message-----
>> From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Shields, Rusty (IMS)
>> Sent: Tuesday, March 26, 2013 12:34 PM
>> To: bioconductor at r-project.org
>> Subject: [BioC] Error when installing globaltest package
>>
>> Hi all,
>>
>> I've run into a problem when attempting to install the globaltest package from Bioconductor.  I'm using R 2.14.0 on 64bit SLES 11.  Let me know what other information you might need about my system to troubleshoot this.
>>
>> Using the method described for this installation on the Bioconductor website:
>>
>>       source("http://bioconductor.org/biocLite.R")
>>       biocLite("globaltest")
>>
>> I get the following result, which I can't find any reference to in the list archives:
>>
>>> biocLite("globaltest")
>> BioC_mirror: 'http://www.bioconductor.org'
>> Using R version 2.14, BiocInstaller version 1.2.1.
>> Installing package(s) 'globaltest'
>> trying URL 'http://www.bioconductor.org/packages/2.9/bioc/src/contrib/globaltest_5.8.1.tar.gz'
>> Content type 'application/x-gzip' length 956376 bytes (933 Kb)
>> opened URL
>> ==================================================
>> downloaded 933 Kb
>>
>> * installing *source* package 'globaltest' ...
>> ** R
>> ** data
>> ** inst
>> ** preparing package for lazy loading
>> Warning in .simpleDuplicateClass(def, prev) :
>>     A specification for class 'data.frameOrNULL'
>> Creating a generic function for 'sort' from package 'base' in package 'globaltest'
>> Creating a generic function for 'model.matrix' from package 'stats' in package 'globaltest'
>> Creating a generic function for 'coefficients' from package 'stats' in package 'globaltest'
>> Creating a generic function for 'fitted.values' from package 'stats' in package 'globaltest'
>> Creating a generic function for 'residuals' from package 'stats' in package 'globaltest'
>> Error in setMethod("combine", signature(x = "gt.result", y = "gt.result"),  :
>>     no existing definition for function 'combine'
>> Error : unable to load R code in package 'globaltest'
>> ERROR: lazy loading failed for package 'globaltest'
>> * removing '/usr/local/R-2.14.0/lib64/R/library/globaltest'
>>
>> The downloaded packages are in
>>           '/tmp/RtmpdCwjNB/downloaded_packages'
>> Updating HTML index of packages in '.Library'
>> Making packages.html  ... done
>> Old packages: 'caret'
>> Update all/some/none? [a/s/n]:
>>
>>
>>           [[alternative HTML version deleted]]
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
> --
> Dr. Martin Morgan, PhD
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



------------------------------

Message: 15
Date: Thu, 28 Mar 2013 20:00:24 +0000
From: "Shields, Rusty (IMS)" <ShieldsR at imsweb.com>
To: Martin Morgan <mtmorgan at fhcrc.org>
Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: Re: [BioC] [R] Error in setMethod("combine"... was - Error
        when installing globaltest package
Message-ID:
        <E2E839B5BFA711438B481AD9906B0C340A6A78AD at VARUNA.omni.imsweb.com>
Content-Type: text/plain; charset="iso-8859-1"

Correcting the version of Biobase seems to have fixed it, and, based on the code you provided below, it doesn't appear that anything else is at the wrong version.

globaltest now installs successfully through biocLite().

As for how the wrong version of Biobase was installed, I have to take the blame for that.  Rather than following the instructions on the Bioconductor website to install using biocLite(), I made the mistake of downloading the package source manually and installing via 'R CMD INSTALL ...'.  I believe I used the biocLite() method to install all of the other packages.  It has been a very educational few days.

Thanks again.

-----Original Message-----
From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
Sent: Thursday, March 28, 2013 3:26 PM
To: Shields, Rusty (IMS)
Cc: bioconductor at r-project.org
Subject: Re: [BioC] [R] Error in setMethod("combine"... was - Error when installing globaltest package

On 03/28/2013 12:07 PM, Shields, Rusty (IMS) wrote:
> Thanks Martin,
>
> No apologies necessary, just happy to have some help.
>
> Here's the info on Biobase:
>
>> packageDescription("Biobase")
> Package: Biobase
> Title: Biobase: Base functions for Bioconductor
> Version: 2.18.0

Yes, this is the version of Biobase from the current version of R /
Bioconductor, whereas you're using a relatively old R / Bioconductor (R 2.14,
Bioc 2.9, from the 'Previous versions' box at http://bioconductor.org/help/)
where Biobase is at version 2.14. This doesn't bode well, as other packages are
also likely at incorrect versions.

The easiest solution is to upgrade to a recent R, and start again with
biocLite('globaltest').

You could also try biocLite("Biobase") to get the correct version, then
biocLite("globaltest"), but if other packages are also installed incorrectly...

Something like

   source("http://bioconductor.org/biocLite.R")
   avail = available.packages(contriburl=contrib.url(biocinstallRepos()))
   inst = installed.packages(priority="NA")
   idx = rownames(avail) %in% rownames(inst)
   vers = avail[idx, "Version"]
   vers[vers < inst[names(vers), "Version"]]

will give you the too-new packages that need to be re-installed.

It's hard to know how your system got to its current state; one candidate is
that .libPaths() includes a path shared by several versions of R, but using
biocLite() to install packages should help you avoid that in the future.

Martin

> Author: R. Gentleman, V. Carey, M. Morgan, S. Falcon
> Description: Functions that are needed by many other packages or which
>          replace R functions.
> Suggests: tools, tkWidgets, ALL
> Depends: R (>= 2.10), BiocGenerics (>= 0.3.2), utils
> Imports: methods, BiocGenerics
> Maintainer: Bioconductor Package Maintainer
>          <maintainer at bioconductor.org>
> License: Artistic-2.0
> Collate: tools.R strings.R environment.R vignettes.R packages.R
>          AllGenerics.R VersionsClass.R VersionedClasses.R
>          methods-VersionsNull.R methods-VersionedClass.R DataClasses.R
>          methods-aggregator.R methods-container.R methods-MIAxE.R
>          methods-MIAME.R methods-AssayData.R
>          methods-AnnotatedDataFrame.R methods-eSet.R
>          methods-ExpressionSet.R methods-MultiSet.R methods-SnpSet.R
>          methods-NChannelSet.R anyMissing.R rowOp-methods.R
>          updateObjectTo.R methods-ScalarObject.R zzz.R
> LazyLoad: yes
> biocViews: Infrastructure, Bioinformatics
> Packaged: 2012-10-02 02:41:50 UTC; biocbuild
> Built: R 2.14.0; x86_64-unknown-linux-gnu; 2013-03-22 15:01:20 UTC;
>          unix
>
> -- File: /usr/local/R-2.14.0/lib64/R/library/Biobase/Meta/package.rds
>
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Sent: Thursday, March 28, 2013 2:52 PM
> To: Shields, Rusty (IMS)
> Cc: r-help at r-project.org
> Subject: Re: [R] Error in setMethod("combine"... was - Error when installing globaltest package
>
> On 3/28/2013 11:05 AM, Shields, Rusty (IMS) wrote:
>> Hi All,
>>
>> I posted this on the bioconductor list and didn't get a response there, so I'm hoping someone here can help.
>>
>> I don't know a heck of a lot about R, so I apologize if this seems like a trivial issue.  This error comes up when trying to install the bioconductor globaltest package.
>>
>
> Sorry that you didn't get a response on the Bioc mailing list; I'd actually
> suggest returning to the original thread there and I'll see that it gets answered.
>
> My guess is that you have a version of Biobase that is too new compared to the
> version (2.14.0) expected in the version of Bioconductor you are using. What
> does packageDescription("Biobase") say?
>
> Martin
>
>
>
>
>> Any clues?
>>
>> Thanks!
>> Rusty
>>
>> -----Original Message-----
>> From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Shields, Rusty (IMS)
>> Sent: Tuesday, March 26, 2013 12:34 PM
>> To: bioconductor at r-project.org
>> Subject: [BioC] Error when installing globaltest package
>>
>> Hi all,
>>
>> I've run into a problem when attempting to install the globaltest package from Bioconductor.  I'm using R 2.14.0 on 64bit SLES 11.  Let me know what other information you might need about my system to troubleshoot this.
>>
>> Using the method described for this installation on the Bioconductor website:
>>
>>       source("http://bioconductor.org/biocLite.R")
>>       biocLite("globaltest")
>>
>> I get the following result, which I can't find any reference to in the list archives:
>>
>>> biocLite("globaltest")
>> BioC_mirror: 'http://www.bioconductor.org'
>> Using R version 2.14, BiocInstaller version 1.2.1.
>> Installing package(s) 'globaltest'
>> trying URL 'http://www.bioconductor.org/packages/2.9/bioc/src/contrib/globaltest_5.8.1.tar.gz'
>> Content type 'application/x-gzip' length 956376 bytes (933 Kb)
>> opened URL
>> ==================================================
>> downloaded 933 Kb
>>
>> * installing *source* package 'globaltest' ...
>> ** R
>> ** data
>> ** inst
>> ** preparing package for lazy loading
>> Warning in .simpleDuplicateClass(def, prev) :
>>     A specification for class 'data.frameOrNULL'
>> Creating a generic function for 'sort' from package 'base' in package 'globaltest'
>> Creating a generic function for 'model.matrix' from package 'stats' in package 'globaltest'
>> Creating a generic function for 'coefficients' from package 'stats' in package 'globaltest'
>> Creating a generic function for 'fitted.values' from package 'stats' in package 'globaltest'
>> Creating a generic function for 'residuals' from package 'stats' in package 'globaltest'
>> Error in setMethod("combine", signature(x = "gt.result", y = "gt.result"),  :
>>     no existing definition for function 'combine'
>> Error : unable to load R code in package 'globaltest'
>> ERROR: lazy loading failed for package 'globaltest'
>> * removing '/usr/local/R-2.14.0/lib64/R/library/globaltest'
>>
>> The downloaded packages are in
>>           '/tmp/RtmpdCwjNB/downloaded_packages'
>> Updating HTML index of packages in '.Library'
>> Making packages.html  ... done
>> Old packages: 'caret'
>> Update all/some/none? [a/s/n]:
>>
>>
>>           [[alternative HTML version deleted]]
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
> --
> Dr. Martin Morgan, PhD
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793




------------------------------

Message: 16
Date: Thu, 28 Mar 2013 22:19:57 +0100
From: Michael Love <michaelisaiahlove at gmail.com>
To: Michael Muratet <mmuratet at hudsonalpha.org>
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Interpreting DESeq2 results
Message-ID:
        <CADqzidV-QP+=Nm_goxJ2AsO+2MAEvQ_i_y0MFgu1vv3o8Fzy1g at mail.gmail.com>
Content-Type: text/plain

Hi Michael,

The baseMean column is not on the log scale; it is the mean of normalized
counts for a gene. The intercept from the GLM is labelled intercept in
mcols(dse).

Mike
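
A minimal sketch of inspecting these quantities directly (an illustrative
addition, assuming dse is a DESeqDataSet already processed with DESeq(); the
name of the intercept column in mcols() may differ between DESeq2 versions):

   library(DESeq2)
   ## baseMean reported by results() is the row mean of the normalized counts
   head(rowMeans(counts(dse, normalized = TRUE)))
   ## the fitted GLM coefficients (log2 scale), including the intercept,
   ## are stored as columns of mcols(dse)
   colnames(mcols(dse))
   head(mcols(dse)$Intercept)
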
On Mar 28, 2013 5:00 PM, "Michael Muratet" <mmuratet at hudsonalpha.org> wrote:

> Greetings
>
> I have an experiment:
>
> > design(dse)
> ~ factor1 + factor2 + factor3
>
> where factor1 has two levels, factor2 has three levels and factor3 has
> three levels. I extract a gene of interest from the results for each term
> (I've changed the indices to reflect the condition):
>
> > lapply(resultsNames(dse),function(u) results(dse,u)["gene_A",])
> [["Intercept"]]
>         baseMean log2FoldChange        pvalue           FDR
> gene_A 1596.548       10.77485 3.309439e-216 7.025442e-216
> [["factor1_level2"]]
>         baseMean log2FoldChange    pvalue       FDR
> gene_A 1596.548      0.3386776 0.1307309 0.3587438
> [["factor2_level2"]]
>         baseMean log2FoldChange    pvalue       FDR
> gene_A 1596.548     -0.6882543 0.0613569 0.1007896
> [["factor2_level3"]]
>         baseMean log2FoldChange   pvalue       FDR
> gene_A 1596.548      0.2393368 0.513216 0.6589575
> [["factor3_level2"]]
>         baseMean log2FoldChange    pvalue       FDR
> gene_A 1596.548      0.1584153 0.6423634 0.8503163
> [["factor3_level3]]
>         baseMean log2FoldChange       pvalue         FDR
> gene_A 1596.548      -1.627898 1.823141e-06 0.001409384
>
> I want to be sure I understand the output format. Is it true that the
> coefficients (the vector beta) from the fit are the baseMean value scaled
> by the log2FoldChange? Is the true intercept value
> 1596.548*2^10.77485=2797274.13?
>
> mcols() tells me that the baseMean term is calculated over "all rows". The
> baseMean is different for different genes although it is the same for each
> gene across all the conditions; I'm not seeing how the rows are selected.
>
> Thanks
>
> Mike
>
> Michael Muratet, Ph.D.
> Senior Scientist
> HudsonAlpha Institute for Biotechnology
> mmuratet at hudsonalpha.org
> (256) 327-0473 (p)
> (256) 327-0966 (f)
>
> Room 4005
> 601 Genome Way
> Huntsville, Alabama 35806
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>

        [[alternative HTML version deleted]]



------------------------------

Message: 17
Date: Thu, 28 Mar 2013 17:50:04 -0400
From: somnath bandyopadhyay <genome1976 at hotmail.com>
Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: Re: [BioC] Interpreting DESeq2 results
Message-ID: <BAY175-W24D794C2988D81C509EF9FCDD20 at phx.gbl>
Content-Type: text/plain

Will the limma package work on sparse matrices?

Thanks,
Som.
        [[alternative HTML version deleted]]



------------------------------

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor


End of Bioconductor Digest, Vol 121, Issue 29


