[BioC] output from function "matchSeeds"

Steve Lianoglou mailinglist.honeypot at gmail.com
Mon Aug 10 16:46:49 CEST 2009


Hi,

On Aug 10, 2009, at 9:10 AM, <mauede at alice.it> wrote:

> I am not sure I can interpret the output of function "matchSeeds". I
> have run the on-line example myself:
>
>> data(hsSeqs)
>> data(s3utr)
>> hSeedReg = seedRegions(hsSeqs)
>> comphSeed = as.character(reverseComplement(RNAStringSet(hSeedReg)))
>> comph = RNA2DNA(comphSeed)
>> mx = matchSeeds(comph, s3utr)
>
> mx is a vector of lists:
>
>> is.vector(mx)
> [1] TRUE
>> length(mx)
> [1] 676

I think you mean a list of vectors. In R a list is itself a generic
vector, which is why is.vector(mx) returns TRUE; call "is.list(mx)" and
you should get TRUE as well. Also, the documentation on the function
says its value is:

"""A list containing one entry for each element of seeds that had at  
least one match in one entry
of seqs. Each element of this list is a named vector containing the  
elements of seqs that the
corresponding seed has an exact match in."""
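
A quick way to check a description like this against what you actually got is to poke at the structure directly. The sketch below does not call matchSeeds; toy_mx is a hand-made stand-in whose shape and values are copied from the printout in your mail, so treat the structure as an assumption:

```r
# Toy stand-in for the matchSeeds() result (hypothetical object, values
# taken from the printout in this thread).
toy_mx <- list(
  "hsa-let-7a" = list("1588" = c(125L, 1107L), "599" = 1240L, "9180" = 757L)
)

is.list(toy_mx)          # TRUE
is.vector(toy_mx)        # also TRUE: a list is a generic vector in R
names(toy_mx)            # the seeds that had at least one match
names(toy_mx[[1]])       # labels of the sequences matched by the first seed
toy_mx[[1]][["1588"]]    # the numbers stored under one of those labels
str(toy_mx)              # compact overview of the whole nesting
```

Running the same calls on the real mx should tell you how deep the nesting goes and what the labels are.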


> I would appreciate some help at understanding the results.
> For instance, I understand mx[1] is the list of matches for miRNA  
> "hsa-let-7a".
> Each element of list mx[[1]] should be the list of target 3'utr
> sequences (??) that
> the current miRNA 5' region matches? For instance:
>
>> mx[1]
> $`hsa-let-7a`
> $`hsa-let-7a`$`1588`
> [1]  125 1107
>
> $`hsa-let-7a`$`599`
> [1] 1240
>
> $`hsa-let-7a`$`9180`
> [1] 757
>
> $`hsa-let-7a`$`9180`
> [1] 757
>
> I would like to understand what is the number appearing on the same
> line as the miRNA identifier, following the "$" sign.
> ($`1588`, $`599`, $`9180`;....)
> It must be an index into something ... But miRNA sequences are no  
> more than 26 nucleotides long ...

Actually, I think about 21 nt is what people are classifying as miRNAs
these days. Small RNAs that are slightly longer or shorter than that
are classified as other things (piRNAs, etc.).

Sometimes it helps to ask yourself how you would design such a
function, and what information would actually be useful for the
function to return to the user. So I bet that if you call the
function like so:

mx = matchSeeds(comph, s3utr)

A return value of:

$`hsa-let-7a`$`1588`
[1]  125 1107

means that hsa-let-7a has two matches in the 1588th sequence in s3utr.
These matches probably start at positions 125 and 1107 of that
sequence.
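
To make that guess concrete, here is how I would build such a result myself in base R. This is only a sketch of the idea, not the package's actual implementation; the seed and the three sequences are made up:

```r
# Hypothetical seed and target sequences, invented for illustration.
seed <- "CTACCTC"
utrs <- c(a = "AAACTACCTCGGG", b = "TTTTTTTT", c = "CTACCTCAACTACCTC")

# For every sequence, collect the start positions of exact matches of the
# seed. gregexpr() reports -1 when there is no match at all.
hits <- lapply(utrs, function(s) {
  as.integer(gregexpr(seed, s, fixed = TRUE)[[1]])
})

# Keep only the sequences with at least one match.
hits <- hits[vapply(hits, function(h) h[1] != -1L, logical(1))]
hits
```

The result is a list labelled by the matched sequences, each entry holding the start positions inside that sequence, which is the same shape as the printout above.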

> Does this coordinate mark the beginning of the matching region ?
> What about the following list of numbers ? For instance:
> [1]  125 1107
> [1] 1240
> [1] 757
> Are these indices of 3'utr sequences ?

It looks like you are on the right smoke trail, but why not just
finish finding the fire yourself rather than take anybody's word for it?
Not to sound rude, but these are questions you can easily answer
yourself by playing around in your workspace to see if your intuition
is correct: just look at the sequence of your 1588th UTR from 125 to
125+21 and see if the output makes sense.
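
Something along these lines would do the check. The sketch uses a made-up seed, sequence and position so that it runs on its own; swap in your real objects once you know how they are labelled:

```r
# Hypothetical data: one seed, one labelled UTR, one reported start position.
seed  <- "CTACCTC"
utr   <- c("1588" = "AAACTACCTCGGG")
start <- 4L

# Cut out the stretch that begins at the reported position and is as long
# as the seed, then compare it with the seed itself.
window <- substr(utr[["1588"]], start, start + nchar(seed) - 1L)
window
window == seed   # TRUE if the position really marks the start of a match
```

Note that utr[["1588"]] selects by name while utr[[1588]] would select by position; a look at names(s3utr) should tell you which of the two the labels in mx refer to.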

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
   |  Memorial Sloan-Kettering Cancer Center
   |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


