[BioC] A basic question about retrieve the absolute genome position from the output of of function matchPWM .

Steve Lianoglou mailinglist.honeypot at gmail.com
Sun Oct 28 02:19:16 CEST 2012


Hi Holly,

On Sat, Oct 27, 2012 at 7:23 PM, Holly <xyang2 at uchicago.edu> wrote:
> Dear all,
> I am learning how to "Finding Candidate Binding Sites for Known
> Transcription Factors via Sequence Matching" from the Bioconductor
> online help file at
> http://www.bioconductor.org/help/workflows/gene-regulation-tfbs/
>
> When I ran the following codes, I can not match the results back to the
> UCSC genome browser results.
>
> ||>|orfs <- as.character(mget("RHOA", org.Hs.egSYMBOL2EG))
> ||||>||  pfm.grhl <- query(MotifDb,"GRHL")[[1]]
>> pcm.grhl <- round(100 * pfm.grhl)
>> chromosomal.loc <-  transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by="gene") [orfs]
>> promoter.seqs <-  getPromoterSeq(chromosomal.loc, Hsapiens, upstream=1000, downstream=0)
>> promoter.seqs <- unlist(promoter.seqs)
>> matchPWM(pcm.grhl, promoter.seqs[[1]], "80%")
>   Views on a 1000-letter DNAString subject
> subject: AGCCCAGGTCAGGTTAGAGACCACTGGTATTGTT...GGCACTCGGAGGCGCGCACGTCGTTCCCCGCCCT
> views:
>      start end width
> [1]   347 354     8 [AAGCCTCA]
>
>> chromosomal.loc
> GRangesList of length 1:
> $387
> GRanges with 2 ranges and 2 metadata columns:
>        seqnames               ranges strand |     tx_id     tx_name
>           <Rle>            <IRanges>  <Rle> | <integer> <character>
>    [1]     chr3 [49396579, 49449428]      - |     15517  uc010hku.3
>    [2]     chr3 [49396579, 49449526]      - |     15518  uc003cwu.3
>
> ---
> seqlengths:
>                    chr1                  chr2 ...        chrUn_gl000249
>               249250621             243199373 ...                 38502
>
> As|||49396579-1000+347=49395926, I use it as an absolute position.|
> However, when I visit the UCSC genome browser, for the region chr3:||||49395926||-||||49395934.||
>   I got [GTGCTCCTC], instead of [|||AAGCCTCA] or similar.|
>
> So, could anyone tell me how to calculate the absolute genome  position from the output of of function matchPWM ?.

Beware of the evil strand.

This transcript is on the negative strand, so the 1000bp upstream of
the first transcript you've listed starts at its end, namely
`(end+1):(end+999)`

HTH,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



More information about the Bioconductor mailing list