[BioC] Motif enrichment analysis: Error in transfac format and background frequencies from BSGenome
Robert Stojnic
rainmansr at gmail.com
Mon Sep 8 19:22:28 CEST 2014
Hi Dips,
If you haven't already done so, please first update to the latest
version of PWMEnrich (in release this is 3.6.1). I would recommend
converting the MotifDb motifs directly into PFMs that PWMEnrich expects.
The only issue here is that MotifDb motifs come from different sources
and are not always in the same format (e.g. some are probability
matrices, others are count matrices). Here is some example code to
extract the motifs from MotifDb:
library(MotifDb)
library(PWMEnrich)
# extract mouse motifs
d = values(MotifDb)
dm.sel = which(d$organism == "Mmusculus")
# output list of motifs
motifs = list()
for(i in dm.sel){
  # fall back to 100 sequences when the count is not recorded
  seq.count = d$sequenceCount[i]
  if(is.na(seq.count))
    seq.count = 100
  # scale to counts and store as an integer matrix
  motifs[[length(motifs)+1]] = apply(round(MotifDb[[i]] * seq.count),
                                     1:2, as.integer)
}
motif.names = d$geneSymbol[dm.sel]
motif.ids = d$providerName[dm.sel]
motif.names[is.na(motif.names)] = motif.ids[is.na(motif.names)]
names(motifs) = motif.ids
# get A,C,G,T counts
prior = getBackgroundFrequencies("mm9")
# convert to PWMEnrich PWM format
pwms = PFMtoPWM(motifs, id=motif.ids, name=motif.names, prior.params=prior)
# create background distributions
bg = makeBackground(pwms, "mm9")
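The scaling step inside the loop can be tried on its own in base R. This is only a sketch: the helper name toCounts, the toy matrix and the rule "columns summing to 1 means probabilities" are assumptions for illustration, not something PWMEnrich or MotifDb provides.

```r
# Hypothetical helper: turn one motif matrix into an integer count matrix.
# Assumption: a matrix whose columns all sum to 1 holds probabilities;
# anything else is treated as counts and only rounded.
toCounts = function(m, seq.count = 100) {
  if (is.na(seq.count))
    seq.count = 100
  is.prob = all(abs(colSums(m) - 1) < 1e-6)
  if (is.prob)
    m = m * seq.count
  m = round(m)
  storage.mode(m) = "integer"   # keeps dimensions and row names
  m
}

# Toy probability matrix: rows are A, C, G, T; columns are motif positions
prob = matrix(c(0.70, 0.10, 0.10, 0.10,
                0.25, 0.25, 0.25, 0.25,
                0.05, 0.05, 0.80, 0.10),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

counts = toCounts(prob, seq.count = 100)
# A matrix that already holds counts passes through unscaled
counts.again = toCounts(counts)
```

Checking the column sums first avoids multiplying a matrix that already holds counts by seq.count a second time.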
The makeBackground(pwms, "mm9") call uses the mm9 promoters that are
built into PWMEnrich as genomic background. If you want to use a
different set of promoter sequences (e.g. mm10), you will have to
extract them yourself into a DNAStringSet object and pass them like this:
bg = makeBackground(pwms, bg.seq=your_DNAStringSet_object)
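A minimal sketch of one way to build such a DNAStringSet for mm10. The annotation package (TxDb.Mmusculus.UCSC.mm10.knownGene), the 2000 bp upstream promoter definition and the variable names are assumptions for illustration and not part of the advice above; the code has not been checked against PWMEnrich 3.6.1.

```r
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)   # assumed annotation source

genome = BSgenome.Mmusculus.UCSC.mm10
# one range per gene, then the 2000 bp upstream of each gene start
gene.ranges = genes(TxDb.Mmusculus.UCSC.mm10.knownGene)
prom = promoters(gene.ranges, upstream = 2000, downstream = 0)
# clip ranges that run past chromosome ends and keep only full-length ones
prom = trim(prom)
prom = prom[width(prom) == 2000]
# extract the promoter sequences as a DNAStringSet
prom.seq = getSeq(genome, prom)
```

prom.seq can then be passed as the bg.seq argument in the makeBackground call shown above.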
Cheers, Robert
On 07/09/14 23:17, deepti anand wrote:
> Hi Roberts,
>
> Thank you for the suggestion. The backgrounds available in PWMEnrich for
> mouse use the mm9 assembly (the current one is mm10). Also, I found that it
> has 329 PWMs, which is fewer than the current MotifDb (528 motifs). That is
> why I want to create a background with the current mouse genome and use all
> 528 motifs for enrichment analysis of my gene list. Could you please
> tell me how I can export the motifs in 'transfac' format and get the
> background frequencies from 'BSgenome.Mmusculus.UCSC.mm10'?
>
> I would appreciate it.
>
> Dips
>
>
> > Date: Sun, 7 Sep 2014 19:38:43 +0100
> > From: rainmansr at gmail.com
> > To: anand.deepti at outlook.com
> > CC: bioconductor at r-project.org
> > Subject: Re: [BioC] Motif enrichment analysis: Error in transfac format and background frequencies from BSGenome
> >
> >
> > Dear Deepti,
> >
> > If you want to use the mouse MotifDb motifs you can retrieve them in the
> > correct format for PWMEnrich here:
> >
> > http://bioconductor.org/packages/2.14/data/experiment/html/PWMEnrich.Mmusculus.background.html
> >
> > Cheers, Robert
> >
> > On 07/09/14 16:47, deepti anand wrote:
> > > Hi all,
> > > I am scanning a gene set for all the Mmusculus motifs and comparing
> > > their enrichment to the genomic background. I am using the MotifDb
> > > package to retrieve motifs and PWMEnrich for motif enrichment. I am
> > > getting errors in the code below.
> > >
> > > 1) Get all Mmusculus motifs from MotifDb in TRANSFAC format:
> > > I get an error when exporting the motifs in TRANSFAC format. Here is my code:
> > >
> > >
> > >> motifs.denovo = query(MotifDb, 'Mmusculus')
> > >> export(motifs.denovo,con='MotifDBFile',format='transfac')
> > > Error in cat(list(...), file, sep, fill, labels, append) :
> > > argument 1 (type 'closure') cannot be handled by 'cat'
> > >
> > >
> > >
> > > 2) Convert count matrices into PWMs: in this step the error comes from
> > > getting the background frequencies from the Mmusculus BSgenome. Here is
> > > my code:
> > >
> > >
> > >> library(BSgenome.Mmusculus.UCSC.mm10)
> > >> genome = BSgenome.Mmusculus.UCSC.mm10
> > >> genomic.acgt = getBackgroundFrequencies("BSgenome.Mmusculus.UCSC.mm10")
> > > Error in pickGenome(organism) :
> > > Please pick one of the valid organisms: "dm3" or provide a BSgenome object of the target genome.
> > >
> > >
> > > I would appreciate any help
> > >
> > >
> > > Dips
> > > [[alternative HTML version deleted]]
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at r-project.org
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> > >
> >