[BioC] kmer and zscore calculation

Fabrice Tourre fabrice.ciup at gmail.com
Thu Dec 19 19:45:14 CET 2013


Thank you very much. It is helpful.
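
The reply below does not cover the z-score part of the question. Here is a minimal base-R sketch of that step. It assumes the z-score of a k-mer in a region is its count minus that k-mer's mean count across all regions, divided by the standard deviation across regions. Other backgrounds, such as shuffled or genomic sequence, are equally valid choices, and the question does not say which one is wanted. `kmer_counts()` and `kmer_zscores()` are hypothetical helper names, not Biostrings functions. `kmer_counts()` only mimics the matrix layout of oligonucleotideFrequency() so the sketch runs without Bioconductor.

```r
# Hypothetical helper, NOT a Biostrings function: counts overlapping k-mers
# per sequence. Returns a matrix with one row per sequence and one column
# per possible k-mer, in A/C/G/T lexicographic order (AA..A, AA..C, ...).
# K-mers containing other letters (e.g. N) are not counted.
kmer_counts <- function(seqs, k) {
  bases <- c("A", "C", "G", "T")
  # expand.grid varies its first column fastest, so paste the columns in
  # reverse to make the last position vary fastest: AA, AC, AG, AT, CA, ...
  grid <- expand.grid(rep(list(bases), k), stringsAsFactors = FALSE)
  kmers <- do.call(paste0, grid[k:1])
  counts <- vapply(seqs, function(s) {
    s <- toupper(s)
    n <- nchar(s)
    if (n < k) return(integer(length(kmers)))
    # All overlapping substrings of length k
    sub <- substring(s, 1:(n - k + 1), k:n)
    # factor() maps k-mers outside 'kmers' to NA, which table() drops
    as.vector(table(factor(sub, levels = kmers)))
  }, integer(length(kmers)), USE.NAMES = FALSE)
  counts <- t(counts)
  colnames(counts) <- kmers
  counts
}

# Assumption: per-k-mer z-score across regions,
# z = (count - mean count of that k-mer) / sd of that k-mer's counts.
# Needs at least two regions; k-mers with constant counts get z = 0.
kmer_zscores <- function(counts) {
  z <- scale(counts)   # centre and scale each column (one k-mer)
  z[is.nan(z)] <- 0    # 0/0 for zero-variance columns
  z
}

# Toy 10 bp regions (made-up sequences, for illustration only)
regions <- c("ACGTACGTAC", "AAAAAAAAAA", "GGGCCCATTA")
z6 <- kmer_zscores(kmer_counts(regions, 6))  # hexamers: 3 x 4096
z5 <- kmer_zscores(kmer_counts(regions, 5))  # pentamers: 3 x 1024
```

On real data, the integer matrix returned by oligonucleotideFrequency(x, width=6) has the same shape (one row per sequence, one column per k-mer) and could be passed to kmer_zscores() directly. With 10 bp regions each row holds only 5 overlapping 6-mers (6 overlapping 5-mers), so most columns are zero and the z-scores rest on very sparse counts.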

On Thu, Dec 19, 2013 at 1:33 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
> [Oops, forgot to Cc the list when I answered this. Sending it again...]
>
>
> Hi Fabrice,
>
> The oligonucleotideFrequency() function in the Biostrings
> package counts the number of occurrences of all possible 5-mers
> (use 'width=5') or 6-mers (use 'width=6'). You need to store
> your sequence(s) in a DNAString or DNAStringSet object first.
> On a DNAStringSet, the counts are returned in a matrix with 1
> row per sequence and 1 column per k-mer:
>
>   library(Biostrings)
>   library(hgu95av2probe)
>   probes <- DNAStringSet(hgu95av2probe)
>   count5 <- oligonucleotideFrequency(probes, width=5)
>
> Then:
>
>   > dim(count5)
>   [1] 201800 1024
>   > count5[1:6, 1:10]
>        AAAAA AAAAC AAAAG AAAAT AAACA AAACC AAACG AAACT AAAGA AAAGC
>   [1,]     0     0     0     0     0     0     0     0     0     0
>   [2,]     0     0     0     0     0     0     0     0     0     0
>   [3,]     0     0     0     0     0     0     0     0     0     0
>   [4,]     0     0     0     0     0     0     0     0     0     0
>   [5,]     0     0     0     0     0     0     0     0     0     0
>   [6,]     0     0     0     0     0     0     0     0     0     0
>
> Maybe this function should have been called kmerFrequency()...
>
> Cheers,
> H.
>
>
> On 12/17/2013 10:28 AM, Fabrice Tourre wrote:
>>
>> Dear list,
>>
>> I have a list of BED regions; each region is 10 bp long. I want to
>> count the hexamers and pentamers in these regions and get their
>> z-scores. Are there any existing packages that do this?
>>
>> Thank you very much in advance.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319


