[BioC] Motif enrichment analysis: Error in transfac format and background frequencies from BSGenome

Robert Stojnic rainmansr at gmail.com
Mon Sep 8 19:22:28 CEST 2014


Hi Dips,

If you haven't already done so, please first update to the latest 
version of PWMEnrich (3.6.1 in the current release). I would recommend 
converting the MotifDb motifs directly into the PFMs that PWMEnrich expects. 
The only issue is that MotifDb motifs come from different sources 
and are not always in the same format (sometimes they are 
probabilities, sometimes count matrices). Here is some example code to 
extract the motifs from MotifDb:

library(MotifDb)
library(PWMEnrich)

# extract mouse motifs
d = values(MotifDb)
dm.sel = which(d$organism == "Mmusculus")

# output list of motifs as integer count matrices
motifs = list()
for(i in dm.sel){
     # scale by the number of sequences behind the motif (assume 100 if unknown)
     seq.count = d$sequenceCount[i]
     if(is.na(seq.count))
         seq.count = 100
     motifs[[length(motifs)+1]] = apply(round(MotifDb[[i]] * seq.count), 1:2, as.integer)
}

# use the gene symbol as the motif name, falling back to the provider name
motif.names = d$geneSymbol[dm.sel]
motif.ids = d$providerName[dm.sel]
motif.names[is.na(motif.names)] = motif.ids[is.na(motif.names)]

names(motifs) = motif.ids

# get A,C,G,T background frequencies
prior = getBackgroundFrequencies("mm9")

# convert to PWMEnrich PWM format
pwms = PFMtoPWM(motifs, id=motif.ids, name=motif.names, prior.params=prior)

# create background distributions
bg = makeBackground(pwms, "mm9")

The last line uses the mm9 promoters that are built into 
PWMEnrich as the genomic background. If you want to use a different set of 
promoter sequences (e.g. mm10), you will have to extract them yourself 
into a DNAStringSet object and pass them like this:

bg = makeBackground(pwms, bg.seq=your_DNAStringSet_object)
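
For mm10, one way to get both the background frequencies and a promoter DNAStringSet is sketched below. It is untested against your setup and rests on assumptions that are not from this thread: that promoters are defined from the TxDb.Mmusculus.UCSC.mm10.knownGene gene models, that a 2 kb window upstream of the TSS is a sensible promoter definition, and that getBackgroundFrequencies accepts a BSgenome object directly (as its error message suggests).

```r
library(PWMEnrich)
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)  # assumption: gene models used to define promoters

genome = BSgenome.Mmusculus.UCSC.mm10
txdb = TxDb.Mmusculus.UCSC.mm10.knownGene

# A,C,G,T background frequencies from mm10: pass the BSgenome object itself,
# not the package name as a string
prior = getBackgroundFrequencies(genome)

# promoter windows: 2 kb upstream of each gene's TSS (window size is an arbitrary choice)
prom = promoters(genes(txdb), upstream=2000, downstream=0)

# clip windows that run past chromosome ends, then keep only full-length ones
prom = trim(prom)
prom = prom[width(prom) == 2000]

# fetch the (strand-aware) sequences as a DNAStringSet
prom.seq = getSeq(genome, prom)

# drop promoters that contain unknown bases (N)
prom.seq = prom.seq[alphabetFrequency(prom.seq)[, "N"] == 0]
```

The resulting prior can then be passed as prior.params to PFMtoPWM, and prom.seq as bg.seq to makeBackground, in place of the "mm9" arguments used above.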

Cheers, Robert

On 07/09/14 23:17, deepti anand wrote:
> Hi Robert,
>
> Thank you for the suggestion. The backgrounds available in PWMEnrich for 
> mouse use the mm9 assembly (the current one is mm10). Also, I found that 
> it has 329 PWMs, which is fewer than the current MotifDb (528 motifs). 
> That is why I want to create a background with the current mouse genome 
> and use all 528 motifs for enrichment analysis of my gene list. Could you 
> please tell me how I can export the motifs in 'transfac' format and get 
> the background frequencies from 'BSgenome.Mmusculus.UCSC.mm10'?
>
> I would appreciate it.
>
> Dips
>
>
> > Date: Sun, 7 Sep 2014 19:38:43 +0100
> > From: rainmansr at gmail.com
> > To: anand.deepti at outlook.com
> > CC: bioconductor at r-project.org
> > Subject: Re: [BioC] Motif enrichment analysis: Error in transfac format and background frequencies from BSGenome
> >
> >
> > Dear Deepti,
> >
> > If you want to use the mouse MotifDb motifs, you can retrieve them in the
> > correct format for PWMEnrich here:
> >
> > http://bioconductor.org/packages/2.14/data/experiment/html/PWMEnrich.Mmusculus.background.html
> >
> > Cheers, Robert
> >
> > On 07/09/14 16:47, deepti anand wrote:
> > > Hi all,
> > > I am scanning a gene set for all the Mmusculus motifs and comparing
> > > their enrichment to the genomic background. I am using the MotifDb
> > > package to retrieve motifs and PWMEnrich for motif enrichment. I am
> > > getting errors in the code below.
> > >
> > > 1) Get all Mmusculus motifs from MotifDb in TRANSFAC format.
> > > When exporting the motifs in TRANSFAC format I get an error. Here is
> > > my code:
> > >
> > >
> > >> motifs.denovo = query(MotifDb, 'Mmusculus')
> > >> export(motifs.denovo,con='MotifDBFile',format='transfac')
> > > Error in cat(list(...), file, sep, fill, labels, append) :
> > > argument 1 (type 'closure') cannot be handled by 'cat'
> > >
> > >
> > >
> > > 2) Convert count matrices into PWMs. In this step the error occurs
> > > when getting the background frequencies from the Mmusculus BSgenome.
> > > Here is my code:
> > >
> > >
> > >> library(BSgenome.Mmusculus.UCSC.mm10)
> > >> genome = BSgenome.Mmusculus.UCSC.mm10
> > >> genomic.acgt = getBackgroundFrequencies("BSgenome.Mmusculus.UCSC.mm10")
> > > Error in pickGenome(organism) :
> > > Please pick one of the valid organisms: "dm3" or provide a BSgenome object of the target genome.
> > >
> > >
> > > I would appreciate any help
> > >
> > >
> > > Dips
> > >
> >

