[BioC] Importing RMA processed data into limma for eBayes

Thornton, Matthew Matthew.Thornton at med.usc.edu
Mon Dec 23 20:30:01 CET 2013


Dear List Patrons,

I am processing some microarray data obtained with Affymetrix Mouse Gene 2.0 ST chips. Using the xps package, I have performed step-wise RMA processing - Background correction, Quantiles Normalization, and Median Polish summarization. The output of xps is a tab-delimited text file with 33,960 non-log base 2 transformed expression measures, the UnitName (an 8 digit number starting at 17550607) and a Unit ID (starts at 0 but isn't continuous).  What I would like to do is import this data into the limma package, from my output text file.  I know that I can make an expressionset and use this expressionset directly after making a design matrix, via the instructions here:

http://permalink.gmane.org/gmane.science.biology.informatics.conductor/50591

I would like to add the gene names into the expressionset so that I can make volcano plots and graphs with limma, in accord with the limma users guide.

I know how to access the gene names and relate them to the UnitName code copied from here: (http://biowhat.ucsd.edu/homer/basicTutorial/affymetrix.html)

library(mogene20sttranscriptcluster.db)
library(pd.mogene.2.0.st)

mat1 <- as.matrix(exprs_mp)
post_rma  <- ExpressionSet(assayData=mat1, annotation="mogene.2.0.st")

my_frame <- data.frame(exprs(post_rma))

Annot <- data.frame(ACCNUM=sapply(contents(mogene20sttranscriptclusterACCNUM), paste, collapse=", "), SYMBOL=sapply(contents(mogene20sttranscriptclusterSYMBOL), paste, collapse=", "), DESC=sapply(contents(mogene20sttranscriptclusterGENENAME), paste, collapse=", "))

all <- merge(Annot, my_frame, by.x=0, by.y=0, all=T)

write.table(all,file="21Dec13_data.ann.txt",sep="\t")

Is there a way to modify the above code so that I can add the gene names and gene information to my expresionset, in a way that makes them AS accessible to the limma package as they would be if I had used the affy package to process the data initially?

Also, I would like to compare the output of VSN with RMA using eBayes in limma. I don't wish to import the raw intensities into limma, not because I couldn't use the same methods (vsn is in limma), only because I would have to start my data processing over. Any help or comments is greatly appreciated.

Thank you.

Sincerely,

Matt

Matthew E. Thornton

Research Lab Specialist
Saban Research Institute

USC/Children’s Hospital Los Angeles
513X,  Mail Stop 35
4661 W. Sunset Blvd.
Los Angeles, CA 90027-6020

matthew.thornton at med.usc.edu

________________________________________
From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] on behalf of bioconductor-request at r-project.org [bioconductor-request at r-project.org]
Sent: Monday, December 23, 2013 3:00 AM
To: bioconductor at r-project.org
Subject: Bioconductor Digest, Vol 130, Issue 25

Send Bioconductor mailing list submissions to
        bioconductor at r-project.org

To subscribe or unsubscribe via the World Wide Web, visit
        https://stat.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
        bioconductor-request at r-project.org

You can reach the person managing the list at
        bioconductor-owner at r-project.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioconductor digest..."


Today's Topics:

   1. Re: Complete variant toolbox:
      gmapR/VariantTools/VariantAnnotation (Thomas Girke)
   2. DESeq / DESeq2 for translation efficiency in polysome
      profiling or ribsome profiling experiments (Gilgi [guest])
   3. forgeSeqFiles (Melo [guest])
   4. Multiple contrasts in DESeq2 (Hugo Varet)


----------------------------------------------------------------------

Message: 1
Date: Sun, 22 Dec 2013 09:36:40 -0800
From: Thomas Girke <thomas.girke at ucr.edu>
To: Valerie Obenchain <vobencha at fhcrc.org>
Cc: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Complete variant toolbox:
        gmapR/VariantTools/VariantAnnotation
Message-ID: <20131222173640.GA4345 at Thomas-Girkes-MacBook-Pro.local>
Content-Type: text/plain; charset="us-ascii"

Sorry for not responding sooner. - I agree, providing the variant mapping
position relative to transcripts will be the most useful one, at least
for features that are part of transcripts. For others that are not, e.g.
intron and intergenic features, one could report the variant mappings
relative to the start position of these features.

Thomas

On Thu, Dec 19, 2013 at 03:08:19AM +0000, Valerie Obenchain wrote:
> Hi,
>
> On 12/17/2013 09:40 AM, Robert Castelo wrote:
> > hi Valerie cc Thomas,
> >
> > sorry for hijacking the thread, regarding the request made below..
> >
> > On 12/09/2013 09:07 PM, Valerie Obenchain wrote:
> > [...]
> >> I could add a 'REFLOC' column to the otuput of locateVariants() that
> >> would essentially be the "equivalent" to 'CDSLOC' from predictCoding().
> >
> > for the purpose of ordering cDNA primers flanking variants which one may
> > want to validate through sanger sequencing, it is useful to have at hand
> > the position of the variant with respect to the beginning of the
> > transcript (cDNA) where it has been observed, thus not just from the
> > beginning of the CDS but from the beginning of the transcript.
> >
> > is this newer 'REFLOC' going to contain this position? if not, would it
> > be possible to get also a column for that from the locateVariants()
> > call? (e.g., TXLOC)
>
> Yes, I think it makes sense to have 'REFLOC' be the position in the
> reference starting from the beginning of the transcript. Unless others
> have different thoughts this is what I'll go ahead with.
>
> Valerie
>
>
>
>
>
> >
> >
> > thanks!!
> > robert.
>
>
> --
> Valerie Obenchain
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B155
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: vobencha at fhcrc.org
> Phone:  (206) 667-3158
> Fax:    (206) 667-1319



------------------------------

Message: 2
Date: Sun, 22 Dec 2013 23:37:06 -0800 (PST)
From: "Gilgi [guest]" <guest at bioconductor.org>
To: bioconductor at r-project.org, gilgi.friedlander at weizmann.ac.il
Cc: DESeq Maintainer <sanders at fs.tum.de>
Subject: [BioC] DESeq / DESeq2 for translation efficiency in polysome
        profiling or ribsome profiling experiments
Message-ID: <20131223073706.DA826143685 at mamba.fhcrc.org>


Hello,
I wanted to ask whether it is possible to use DESeq for the analysis of translation efficiency in polysome profiling or ribsome profiling.
In polysome profiling - polysomes are isolated (e.g. by sucrose gradients), and mRNA that were associated with the polysomes are then isolated and sequenced.
In ribosome profiling, mRNA fragments that are protected by the ribosome are sequenced.
When such experiments are conducted in parallel to total RNA-seq, one can ask questions regarding the translation efficiency.
For example: if we have 4 samples: ribosome sequencing (without treatment), total sequencing (without treatment), sequencing (with treatment), total sequencing (with treatment), one can ask how does the treatment affects the protein synthesis rate (which is the ratio ribosome/total).
I wanted to ask whether it is possible to define such a model in DESeq for such analysis, in order to answer how is this ratio affected by the treatment.
Thanks a lot


 -- output of sessionInfo():

none

--
Sent via the guest posting facility at bioconductor.org.



------------------------------

Message: 3
Date: Sun, 22 Dec 2013 23:47:30 -0800 (PST)
From: "Melo [guest]" <guest at bioconductor.org>
To: bioconductor at r-project.org, memoly101 at yahoo.com
Subject: [BioC] forgeSeqFiles
Message-ID: <20131223074730.1C4F514369A at mamba.fhcrc.org>


Hello everyone,

I am so new to R as well as Bioconductor but found them very helpful so I'm trying to use.
Now I need to make a BSgenome pckg for my own organism. Therefore I have made a folder named seqs_srcdir which contains 17,000 files (one gene_sequence per file), but when I downloaded, and even unzipped BSgenome I kept getting the  Error: could not find function "forgeSeqFiles".
I know I need to make the seed files so I gave the command line:

forgeSeqlengthsFile(seqnames, prefix="pi1>", suffix=".fa", seqs_srcdir="/Users/Me/Documents/Microarray", seqs_destdir="/Users/Me/Documents/Microarray/Seeds", verbose=TRUE)

Error: could not find function "forgeSeqFiles".

I would appreciate it if you could please advice,
Melo

 -- output of sessionInfo():

forgeSeqlengthsFile(seqnames, prefix="pi1>", suffix=".fa", seqs_srcdir="/Users/Me/Documents/Microarray", seqs_destdir="/Users/Me/Documents/Microarray/Seeds", verbose=TRUE)

Error: could not find function "forgeSeqFiles".

--
Sent via the guest posting facility at bioconductor.org.



------------------------------

Message: 4
Date: Mon, 23 Dec 2013 11:10:40 +0100
From: Hugo Varet <hugo.varet at pasteur.fr>
To: bioconductor at r-project.org
Subject: [BioC] Multiple contrasts in DESeq2
Message-ID: <52B80C20.5020009 at pasteur.fr>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed

Dear Michael Love and list members,

I'm used to analyze RNA-seq data with your nice DESeq2 package (1.2.0)
and I sometimes have to test complex null hypotheses using contrasts.
For one of my project, I would like using multiple contrasts, i.e.
testing for example (H0: beta1-beta2 = beta3-beta4 = 0) against (H1:
beta1-beta2 != 0 and/or beta3-beta4 != 0), where the beta's are the
regression coefficients of the models. To do so, I would like using a
numeric matrix for the contrast argument in results(), but it only
accepts a numeric vector.

Hence, I was wondering if you planned to extend DESeq2 to the use of
multiple contrasts?

Best regards,

Hugo

--
Hugo Varet
Plate-Forme Transcriptome et Epig?nome
Institut Pasteur



------------------------------

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor


End of Bioconductor Digest, Vol 130, Issue 25
*********************************************



More information about the Bioconductor mailing list