[BioC] kmer and zscore calculation

Hervé Pagès hpages at fhcrc.org
Thu Dec 19 19:33:04 CET 2013


[Oops, forgot to Cc the list when I answered this. Sending it again...]

Hi Fabrice,

The oligonucleotideFrequency() function in the Biostrings
package counts the number of occurrences of all possible 5-mers
(use 'width=5') or 6-mers (use 'width=6'). You need to store
your sequence(s) in a DNAString or DNAStringSet object first.
On a DNAStringSet, the counts are returned in a matrix with 1
row per sequence and 1 column per k-mer:

   library(Biostrings)
   library(hgu95av2probe)
   probes <- DNAStringSet(hgu95av2probe)
   count5 <- oligonucleotideFrequency(probes, width=5)

Then:

   > dim(count5)
   [1] 201800   1024
   > count5[1:6, 1:10]
        AAAAA AAAAC AAAAG AAAAT AAACA AAACC AAACG AAACT AAAGA AAAGC
   [1,]     0     0     0     0     0     0     0     0     0     0
   [2,]     0     0     0     0     0     0     0     0     0     0
   [3,]     0     0     0     0     0     0     0     0     0     0
   [4,]     0     0     0     0     0     0     0     0     0     0
   [5,]     0     0     0     0     0     0     0     0     0     0
   [6,]     0     0     0     0     0     0     0     0     0     0

Maybe this function should have been called kmerFrequency()...
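
For the z-score part, here is a rough sketch on toy data. Note that the
z-score definition used below is only an assumption on my side (each
k-mer's total count standardized against the mean and standard deviation
of the totals over all k-mers); you may well want a different background
model, e.g. counts from shuffled or control regions. The 'regions'
sequences are made up and just stand in for the sequences of your BED
regions:

```r
library(Biostrings)

# Toy 10-bp regions standing in for sequences extracted from BED intervals
# (hypothetical data, for illustration only).
regions <- DNAStringSet(c("ACGTACGTAC", "AAAAACCCCC", "ACGTAAAAAC"))

# One row per region, one column per 5-mer (4^5 = 1024 columns).
count5 <- oligonucleotideFrequency(regions, width=5)

# Total number of occurrences of each 5-mer over all regions.
total5 <- colSums(count5)

# Assumed z-score definition: each 5-mer's total count standardized
# against the mean and sd of the totals across all 1024 5-mers.
z5 <- (total5 - mean(total5)) / sd(total5)

# 5-mers with the highest z-scores.
head(sort(z5, decreasing=TRUE))
```

The same should work for hexamers with 'width=6' (4096 columns).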

Cheers,
H.


On 12/17/2013 10:28 AM, Fabrice Tourre wrote:
> Dear list,
>
> I have a list of BED regions. Each region is 10 bp long. I want to
> count the hexamers and pentamers in these regions and get the
> z-scores. Is there an existing package that does this?
>
> Thank you very much in advance.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


