[BioC] occurrence of rGADEM motifs (mattia pelizzola)

Wed Jul 27 19:27:43 CEST 2011

Hi Mattia,

I also just had to figure out how to access the alignment locations
returned by rGADEM. The object returned by GADEM() contains a list of
"motif" objects, each of which has a slot "alignList" holding a list
of "align" objects. To pull out the locations for the second motif,
for example, this worked for me:

> # gadem is the object returned by GADEM()
> chrs <- sapply(gadem[[2]]@alignList, slot, 'chr')
> starts <- sapply(gadem[[2]]@alignList, slot, 'start')
> ends <- sapply(gadem[[2]]@alignList, slot, 'end')
> pos <- sapply(gadem[[2]]@alignList, slot, 'pos')
> locations <- cbind(chrs, starts, ends, pos)
> head(locations)
     chrs   starts      ends        pos
[1,] "chr3" "11250871"  "11251624"  "167"
[2,] "chr7" "2975746"   "2976319"   "412"
[3,] "chr7" "129587981" "129588370" "140"
[4,] "chrX" "18735991"  "18736550"  "457"
[5,] "chr1" "40002871"  "40003399"  "232"
[6,] "chr1" "175910829" "175911459" "502"

I believe that "start" and "end" are the coordinates of the original
search regions you gave to GADEM, and that "pos" is the offset of the
motif within that region.
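
If that reading is right, the genomic position of each motif hit could
be worked out roughly as in the sketch below. It is only a sketch: the
values are copied from the head(locations) output above, the column
names and "motif_start" are my own, and I am assuming that "pos" is a
1-based offset counted from the region start (I have not checked this
against the rGADEM documentation, nor how hits on the reverse strand
are reported). I use a data.frame here because cbind() on mixed
character and numeric vectors turns everything into character strings.

```r
# Rebuild the table as a data.frame so the coordinates stay numeric.
# Values are copied from the head(locations) output shown earlier.
locations <- data.frame(
  chr   = c("chr3", "chr7", "chr7", "chrX", "chr1", "chr1"),
  start = c(11250871, 2975746, 129587981, 18735991, 40002871, 175910829),
  end   = c(11251624, 2976319, 129588370, 18736550, 40003399, 175911459),
  pos   = c(167, 412, 140, 457, 232, 502),
  stringsAsFactors = FALSE
)

# ASSUMPTION: pos is a 1-based offset from the region start.
# If it turns out to be 0-based, drop the "- 1".
locations$motif_start <- locations$start + locations$pos - 1

# Sanity check: every computed position should fall inside its region.
stopifnot(all(locations$motif_start >= locations$start),
          all(locations$motif_start <= locations$end))

print(locations)
```

With the real object, the same data.frame could be filled from the
chrs, starts, ends and pos vectors extracted above instead of the
hard-coded values.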

Hope that helps - if someone knows better, please correct me.

Chris

> Message: 4
> Date: Tue, 26 Jul 2011 12:51:33 +0200
> From: mattia pelizzola <mattia.pelizzola at gmail.com>
> To: bioconductor <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] occurrence of rGADEM motifs
> Message-ID:
>        <CAG10-br939Ye_+Gss13L6ZqHOzUzCaT7H_8urSk0FD9Wi-7KXQ at mail.gmail.com>
> Content-Type: text/plain
>
> Hi,
> I am using rGADEM and MotIV to find out enriched motifs in my ChIPseq peaks
> and determine the similarity with Jaspar TFBS. These tools look very useful!
>
> rGADEM provides a list of enriched motifs. The total number of motifs is
> provided by the nOccurrences function, but I can't find a way to get to know
> which peak regions do contain these motifs. In particular, what are the
> startPos and endPos functions supposed to do? I would expect a set of
> genomic positions (or positions relative to the peak regions) with the same
> length as nOccurrences, but I only get one number for each motif, with no
> chromosome associated.
> Even in the rGADEM vignette you have nOccurrences equal to 60 but then you
> get only one number out of the startPos and endPos functions. Am I missing
> or misunderstanding anything?
>
> Additionally, I was also wondering if it is possible to control the max
> number of processors used in the analysis. I am working on a cluster shared
> between many people and apparently the software uses as many processors as
> possible, while I do not want to be that greedy with other users ..
>
> Thanks for any hint,
>
> mattia