[BioC] Motif enrichment analysis: Error in transfac format and background frequencies from BSGenome

deepti anand anand.deepti at outlook.com
Mon Sep 8 20:13:01 CEST 2014


Hi Robert,
Thank you for the example code. I was able to extract all 528 Mmusculus
motifs from MotifDb by running the example code you sent. However, the
code below gives me an error when I try to get the A,C,G,T counts using
getBackgroundFrequencies():

> prior = getBackgroundFrequencies("mm9")
Error in pickGenome(organism) :
  Please pick one of the valid organisms: "dm3" or provide a BSgenome object of the target genome.

I have the updated version of PWMEnrich (3.6.1) installed. Could you
please suggest how to proceed with this error? I appreciate your help.
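
As background to the failing call: the quantity it is meant to return is,
conceptually, the frequency of each of A, C, G and T over a set of
background sequences. A minimal base-R sketch of that idea follows,
assuming toy sequences. The names `bg_seqs` and `acgt_frequencies` are
hypothetical, and this is not PWMEnrich's implementation.

```r
# Hypothetical toy sequences standing in for real promoter sequences.
bg_seqs <- c("ACGTACGT", "AAAACCGT", "GGGTTTAC")

# Hypothetical helper: pooled A, C, G, T frequencies over all sequences.
# Characters other than A, C, G, T (e.g. N) are ignored.
acgt_frequencies <- function(seqs) {
  bases <- unlist(strsplit(toupper(seqs), ""))
  counts <- table(factor(bases, levels = c("A", "C", "G", "T")))
  freqs <- as.numeric(counts) / sum(counts)
  names(freqs) <- c("A", "C", "G", "T")
  freqs
}

prior <- acgt_frequencies(bg_seqs)
print(prior)
```

Note that the error message above points to passing a BSgenome object,
rather than a string, as the supported route for genomes other than dm3.
Whether a hand-computed vector like this one is suitable as a prior
should be checked against the PWMEnrich documentation.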
-Dips

Date: Mon, 8 Sep 2014 18:22:28 +0100
From: rainmansr at gmail.com
To: anand.deepti at outlook.com
CC: bioconductor at r-project.org
Subject: Re: [BioC] Motif enrichment analysis: Error in transfac format and background frequencies from BSGenome

Hi Dips,

If you haven't already done so, please first update to the latest
version of PWMEnrich (in release this is 3.6.1). I would recommend
converting the MotifDb motifs directly into the PFMs that PWMEnrich
expects. The only issue here is that MotifDb motifs come from
different sources and are not always in the same format (i.e.
sometimes they are probabilities, sometimes count matrices). Here
is some example code to extract the motifs from MotifDb:

    library(MotifDb)
    library(PWMEnrich)

    # extract mouse motifs
    d = values(MotifDb)
    dm.sel = which(d$organism == "Mmusculus")

    # output list of motifs
    motifs = list()
    for(i in dm.sel){
        seq.count = d$sequenceCount[i]
        # fall back to 100 sequences when the count is not recorded
        if(is.na(seq.count))
            seq.count = 100
        # scale to counts and store as an integer matrix
        motifs[[length(motifs)+1]] = apply(round(MotifDb[[i]] * seq.count), 1:2, as.integer)
    }

    motif.names = d$geneSymbol[dm.sel]
    motif.ids = d$providerName[dm.sel]
    # use the provider name where no gene symbol is available
    motif.names[is.na(motif.names)] = motif.ids[is.na(motif.names)]

    names(motifs) = motif.ids

    # get A,C,G,T counts
    prior = getBackgroundFrequencies("mm9")

    # convert to PWMEnrich PWM format
    pwms = PFMtoPWM(motifs, id=motif.ids, name=motif.names, prior.params=prior)

    # create background distributions
    bg = makeBackground(pwms, "mm9")

The last line uses the mm9 promoters that are built into PWMEnrich
as the genomic background. If you want to use a different set of
promoter sequences (e.g. mm10), you will have to extract them
yourself into a DNAStringSet object and pass them like this:

    bg = makeBackground(pwms, bg.seq=your_DNAStringSet_object)
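
The count conversion used in the loop of the example code earlier in
this message can also be tried in isolation. The following is a minimal
base-R sketch, assuming a toy probability matrix. The names `prob_matrix`
and `to_count_matrix` are hypothetical and are not part of MotifDb or
PWMEnrich.

```r
# Hypothetical toy motif: rows are A, C, G, T; columns are motif positions.
# matrix() fills column by column, so each group of four values is one position.
prob_matrix <- matrix(
  c(0.70, 0.10, 0.10, 0.10,
    0.25, 0.25, 0.25, 0.25,
    0.00, 0.00, 1.00, 0.00),
  nrow = 4,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

# Hypothetical helper: scale probabilities to integer counts, falling back
# to a default number of sequences when the count is missing (NA).
to_count_matrix <- function(prob, seq.count = NA, default.count = 100) {
  if (is.na(seq.count)) seq.count <- default.count
  counts <- round(prob * seq.count)
  storage.mode(counts) <- "integer"  # keeps dimensions and row names
  counts
}

counts <- to_count_matrix(prob_matrix)
print(counts)
```

Because each cell is rounded independently, column sums are not
guaranteed to equal seq.count for every motif (for example, three cells
of 1/3 give 33 + 33 + 33 = 99 at seq.count = 100).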

      

Cheers, Robert

On 07/09/14 23:17, deepti anand wrote:

Hi Robert,

Thank you for the suggestion. The backgrounds available in PWMEnrich
for mouse use the mm9 assembly (the current one is mm10). Also, I found
that it has 329 PWMs, which is fewer than the current MotifDb (528
motifs). That is why I want to create a background with the current
mouse genome and use all 528 motifs for enrichment analysis of my gene
list. Could you please tell me how I can export the motifs in
'transfac' format and get the background frequencies from
'BSgenome.Mmusculus.UCSC.mm10'?

I would appreciate it.

Dips

> Date: Sun, 7 Sep 2014 19:38:43 +0100
> From: rainmansr at gmail.com
> To: anand.deepti at outlook.com
> CC: bioconductor at r-project.org
> Subject: Re: [BioC] Motif enrichment analysis: Error in transfac format and background frequencies from BSGenome
>
> Dear Deepti,
>
> If you want to use the mouse MotifDb motifs you can retrieve them in
> the correct format for PWMEnrich here:
>
> http://bioconductor.org/packages/2.14/data/experiment/html/PWMEnrich.Mmusculus.background.html
>
> Cheers, Robert
>
> On 07/09/14 16:47, deepti anand wrote:
> > Hi all,
> > I am scanning a gene set for all the Mmusculus motifs and comparing
> > their enrichment to a genomic background. I am using the MotifDb
> > package to retrieve motifs and PWMEnrich for motif enrichment. I am
> > getting errors in the code below.
> >
> > 1) Get all Mmusculus motifs from MotifDb in TRANSFAC format:
> > In this step I get an error when exporting the motifs in TRANSFAC
> > format. Here is my code:
> >
> >> motifs.denovo = query(MotifDb, 'Mmusculus')
> >> export(motifs.denovo,con='MotifDBFile',format='transfac')
> > Error in cat(list(...), file, sep, fill, labels, append) :
> >   argument 1 (type 'closure') cannot be handled by 'cat'
> >
> > 2) Convert count matrices into PWMs: in this step the error occurs
> > when getting the background frequencies from the Mmusculus BSgenome.
> > Here is my code:
> >
> >> library(BSgenome.Mmusculus.UCSC.mm10)
> >> genome = BSgenome.Mmusculus.UCSC.mm10
> >> genomic.acgt = getBackgroundFrequencies("BSgenome.Mmusculus.UCSC.mm10")
> > Error in pickGenome(organism) :
> >   Please pick one of the valid organisms: "dm3" or provide a BSgenome object of the target genome.
> >
> > I would appreciate any help.
> >
> > Dips
