[BioC] kmer and zscore calculation

Fabrice Tourre fabrice.ciup at gmail.com
Wed Dec 18 18:38:21 CET 2013


Julian,

Thank you for your reply.

For example, I have a sample FASTA file:

>chr1:14404-14435(-)
GGCACA
>chr1:14409-14440(-)
AAAACG
>chr1:14423-14454(-)
AGAGGC
>chr1:14424-14455(-)
AAGAGG

I want to count the 6-mers (hexamers) in this sample file. I have also
extracted a control FASTA file:

>chr1:14404-14435(-)
CCTACA
>chr1:14409-14440(-)
TCGACG
>chr1:14423-14454(-)
TCAGAT
>chr1:14424-14455(-)
CAAGGC
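
Counting k-mers in records like the ones above could be sketched as
follows. This is only an illustration in Python, not an existing
Bioconductor function: the helper names read_fasta and count_kmers are
made up for this example, and it assumes the records are in FASTA
layout (a ">" header line followed by sequence lines) and that
overlapping k-mers are counted on the given strand only.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def count_kmers(sequences, k):
    """Count every overlapping k-mer across all sequences (given strand only)."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# The four sample records from this message.
sample_fasta = """>chr1:14404-14435(-)
GGCACA
>chr1:14409-14440(-)
AAAACG
>chr1:14423-14454(-)
AGAGGC
>chr1:14424-14455(-)
AAGAGG
"""

sample_seqs = [seq for _, seq in read_fasta(sample_fasta)]
hexamers = count_kmers(sample_seqs, 6)
pentamers = count_kmers(sample_seqs, 5)
```

With 6 bp sequences, each record contributes one hexamer and two
pentamers, so the pentamer AGAGG (present in the last two records) is
counted twice.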

The z-score was calculated for each pentamer as:

(occurrence in sequences – average occurrence in control sequences) /
standard deviation of occurrence in control sequences
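
One possible reading of this formula, sketched in Python. The
assumptions here are mine and not part of the formula itself: the
control data is split into several control sets (at least two, since a
standard deviation needs more than one value), the "occurrence" is the
total count of a k-mer in a set, and the sample standard deviation
(n - 1 denominator) is used. The names count_kmers and kmer_zscores
and the control sets below are invented for the example.

```python
from collections import Counter
from statistics import mean, stdev


def count_kmers(sequences, k):
    """Count every overlapping k-mer across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_zscores(sample_seqs, control_sets, k):
    """z = (sample count - mean control count) / SD of control counts.

    Assumption: control_sets is a list of sequence lists, each treated
    as one control replicate. Returns None for a k-mer whose control
    counts have zero spread, where the z-score is undefined.
    """
    if len(control_sets) < 2:
        raise ValueError("need at least two control sets to estimate an SD")
    sample = count_kmers(sample_seqs, k)
    controls = [count_kmers(seqs, k) for seqs in control_sets]
    zscores = {}
    for kmer, observed in sample.items():
        # A Counter returns 0 for k-mers absent from a control set.
        values = [c[kmer] for c in controls]
        sd = stdev(values)  # sample standard deviation (n - 1)
        zscores[kmer] = (observed - mean(values)) / sd if sd > 0 else None
    return zscores


# Hypothetical toy input: two sample sequences, three control sets.
sample_seqs = ["AGAGGC", "AAGAGG"]
control_sets = [["AGAGGT"], ["CCTACA"], ["TCGACG"]]
z = kmer_zscores(sample_seqs, control_sets, 5)
```

In this toy input AGAGG occurs twice in the sample and once across the
three control sets, so it gets a positive z-score, while k-mers never
seen in any control set come back as None because the control standard
deviation is zero.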

On Wed, Dec 18, 2013 at 4:07 AM, Julian Gehring <julian.gehring at embl.de> wrote:
>
>
> Hi Fabrice,
>
> Could you give an example of this, especially of what the z-score
> statistic is meant to tell you?
>
> Best wishes
> Julian
>
>
>
> On 12/17/2013 07:28 PM, Fabrice Tourre wrote:
>>
>> Dear list,
>>
>> I have a list of BED regions, each 10 bp long. I want to
>> count the hexamers and pentamers in these regions and compute
>> z-scores. Are there any existing packages to do this?
>>
>> Thank you very much in advance.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor-0bNBQ1PAWB4BXFe83j6qeQ at public.gmane.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>


