[BioC] kmer and zscore calculation

Julian Gehring julian.gehring at embl.de
Wed Dec 18 19:40:24 CET 2013


Hi Fabrice,

I'm not sure whether there is a package which offers this out of the 
box.  However, it should be easy to achieve with some standard 
functionality of Bioconductor.

1. Import the sequences with the 'Rsamtools' or 'ShortRead' package.

2. Define your k-mers and count their occurrences with functions like 
'countPattern' or 'vcountPattern' from the 'Biostrings' package.

3. Calculate your z-scores.
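
As a rough illustration of the arithmetic in steps 2 and 3, here is a minimal sketch. It is written in plain Python rather than R only to keep it self-contained; it is not the Biostrings API. The function names 'count_kmers' and 'kmer_zscores' are made up for this example. It also assumes that the mean and standard deviation in the z-score are taken across several control sets, which the formula in your mail does not spell out, and the second and third control sets below are invented purely for demonstration.

```python
from collections import Counter
from statistics import mean, stdev


def count_kmers(sequences, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_zscores(sample, control_sets, k):
    """z = (sample count - mean control count) / sd of control counts.

    Assumption: mean and sd are taken across several control sets
    (at least two are needed for a sample standard deviation).
    A k-mer whose control counts have zero spread gets None.
    """
    sample_counts = count_kmers(sample, k)
    control_counts = [count_kmers(c, k) for c in control_sets]
    zscores = {}
    for kmer, observed in sample_counts.items():
        # A Counter returns 0 for k-mers absent from a control set.
        values = [c[kmer] for c in control_counts]
        sd = stdev(values)
        zscores[kmer] = (observed - mean(values)) / sd if sd > 0 else None
    return zscores


# Sample sequences from the example in this thread.
sample = ["GGCACA", "AAAACG", "AGAGGC", "AAGAGG"]
control_sets = [
    ["CCTACA", "TCGACG", "TCAGAT", "CAAGGC"],  # control from this thread
    ["GGCACA", "GGCACA", "TTTTTT", "CCCCCC"],  # hypothetical control set
    ["ACGTAC", "GATTAC", "CATCAT", "TGCATG"],  # hypothetical control set
]
z = kmer_zscores(sample, control_sets, k=6)
```

Here 'z' maps each 6-mer seen in the sample to its z-score. With a single control set the standard deviation is undefined, so you would need either several control sets or a different definition of the spread.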

Hope this helps.

Best wishes
Julian


On 12/18/2013 06:38 PM, Fabrice Tourre wrote:
> Julian,
>
> Thank you for your reply.
>
> For example, I have a sample fastq file.
>
>> chr1:14404-14435(-)
> GGCACA
>> chr1:14409-14440(-)
> AAAACG
>> chr1:14423-14454(-)
> AGAGGC
>> chr1:14424-14455(-)
> AAGAGG
>
> I want to count the 6-mers in this sample file. I have also extracted a
> control fastq file.
>
>> chr1:14404-14435(-)
> CCTACA
>> chr1:14409-14440(-)
> TCGACG
>> chr1:14423-14454(-)
> TCAGAT
>> chr1:14424-14455(-)
> CAAGGC
>
> The z-score was calculated for each pentamer as:
>
> (occurrence in sequences – average occurrence in control sequences) /
> standard deviation of occurrence in control sequences
>
> On Wed, Dec 18, 2013 at 4:07 AM, Julian Gehring <julian.gehring at embl.de> wrote:
>>
>>
>> Hi Fabrice,
>>
>> Could you give an example of this, especially of what the z-score
>> statistic is meant to tell you?
>>
>> Best wishes
>> Julian
>>
>>
>>
>> On 12/17/2013 07:28 PM, Fabrice Tourre wrote:
>>>
>>> Dear list,
>>>
>>> I have a list of BED regions. Each region is 10 bp long. I want to
>>> count the hexamers and pentamers in these regions and get the
>>> z-scores. Are there any existing packages to do this?
>>>
>>> Thank you very much in advance.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor-0bNBQ1PAWB4BXFe83j6qeQ at public.gmane.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>
>


