[BioC] Delta CT data distribution and cluster analyses; machine learning or other

Kevin R. Coombes kevin.r.coombes at gmail.com
Fri May 13 22:06:49 CEST 2011


What is the range of the data that you received?

In most TaqMan real-time PCR experiments, the Ct values range from 
about 10 (for extremely abundant things like 18S) to 40.  These 
measurements are in cycles.  In principle, if you had a perfectly 
efficient probe-primer combination, the number of target molecules 
present would double every cycle.  As a result, cycle values are already 
essentially on the "negative log base two" scale.
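
To make the "negative log base two" point concrete, here is a minimal 
Python sketch.  It assumes perfect efficiency (exact doubling every 
cycle); the function name and the Ct values are made up for 
illustration and do not come from any real data set.

```python
def relative_abundance(ct, efficiency=2.0):
    """Starting abundance relative to a hypothetical target with Ct = 0.

    Assumes the amount of product is multiplied by `efficiency` every
    cycle (2.0 means perfect doubling), so the starting abundance is
    proportional to efficiency ** (-ct).
    """
    return efficiency ** (-ct)


# Illustrative Ct values only: an abundant target and a rarer one.
ct_abundant = 10.0
ct_rare = 13.0

# A difference of 3 cycles corresponds to a 2**3 = 8-fold difference
# in starting abundance under the perfect-doubling assumption.
fold = relative_abundance(ct_abundant) / relative_abundance(ct_rare)
print(fold)
```

With real probes the efficiency is usually somewhat below 2, so the 
fold difference per cycle is correspondingly smaller.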

As Richard already pointed out, the Delta-Ct or Delta-Delta-Ct values on 
this scale are usually approximately normally distributed.
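
For reference, a small Python sketch of how these quantities are 
usually computed, following the definition in Richard's message quoted 
below.  All Ct values and function names here are hypothetical, and the 
final fold-change line assumes perfect doubling per cycle.

```python
def delta_ct(ct_target, ct_reference):
    """Ct of the target minus Ct of the internal reference (e.g. GAPDH)."""
    return ct_target - ct_reference


def delta_delta_ct(dct_condition, dct_control):
    """Difference in Delta-Ct between two experimental conditions."""
    return dct_condition - dct_control


# Hypothetical Ct values, for illustration only.
dct_treated = delta_ct(ct_target=24.0, ct_reference=18.0)
dct_control = delta_ct(ct_target=26.0, ct_reference=18.0)

ddct = delta_delta_ct(dct_treated, dct_control)

# Lower Ct means more target, hence the minus sign in the exponent.
fold_change = 2.0 ** (-ddct)
print(dct_treated, dct_control, ddct, fold_change)
```

Both Delta-Ct and Delta-Delta-Ct are plain differences of cycle 
values, so they stay on the log base two scale.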

If your data are not in a range that makes sense as cycles, then it is 
likely that someone exponentiated the data to get them back to the "raw" 
scale, and thus converted them from normally distributed to log-normal.
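
A quick way to see this effect, sketched in Python with simulated 
values (the mean of 5 and standard deviation of 1.5 are arbitrary 
assumptions, not estimates from any experiment): values that are 
normal on the cycle scale become right-skewed after exponentiation, 
and taking minus log base two recovers the cycle scale.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the simulation is reproducible

# Simulated Delta-Ct values, normal on the cycle scale (made-up parameters).
dct = [random.gauss(5.0, 1.5) for _ in range(1000)]

# "Raw" scale: what you get if someone computes 2 ** (-Delta-Ct).
raw = [2.0 ** (-x) for x in dct]

# Going back to the cycle scale undoes the exponentiation.
recovered = [-math.log2(r) for r in raw]

# On the raw scale the mean sits well above the median (right skew);
# on the cycle scale the two are close together.
print(statistics.mean(raw), statistics.median(raw))
print(statistics.mean(recovered), statistics.median(recovered))
```

If your values look like the first printed pair (all positive, long 
right tail), taking log base two is simply returning them to the scale 
on which they were measured.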

     Kevin


> Hi Richard,
> Thank you. It is from TaqMan real-time PCR. I have sent a mail asking how
> exactly they normalised the data.
> We only have biological replicates and no common reference, so I was told we
> can only use Delta CT values.
>
> I assume, maybe wrongly, that if Delta Delta CT values are normally
> distributed, then Delta CT values will also be normally distributed?
>
> I will make plots of the raw data and Delta CT as I know it.
>
>
>
>
>
> On Fri, May 13, 2011 at 3:53 PM, Richard Friedman<
> friedman at cancercenter.columbia.edu>  wrote:
>
>> Dear John,
>>
>>         Is the Delta CT data from PCR or from some other method?
>> If it is from PCR, in my experience Delta Delta CT is usually normally
>> distributed,
>> where the first delta refers to the difference between the experiment
>> and the internal reference
>> (e.g. GAPDH) and the second delta refers to the difference between
>> 2 experimental conditions.
>>
>> With hopes that the above helps,
>> Rich
>> ------------------------------------------------------------
>> Richard A. Friedman, PhD
>> Associate Research Scientist,
>> Biomedical Informatics Shared Resource
>> Herbert Irving Comprehensive Cancer Center (HICCC)
>> Lecturer,
>> Department of Biomedical Informatics (DBMI)
>> Educational Coordinator,
>> Center for Computational Biology and Bioinformatics (C2B2)/
>> National Center for Multiscale Analysis of Genomic Networks (MAGNet)
>> Room 824
>> Irving Cancer Research Center
>> Columbia University
>> 1130 St. Nicholas Ave
>> New York, NY 10032
>> (212)851-4765 (voice)
>> friedman at cancercenter.columbia.edu
>> http://cancercenter.columbia.edu/~friedman/
>>
>> I am a Bayesian. When I see a multiple-choice question on a test and I
>> don't
>> know the answer I say "eeney-meaney-miney-moe".
>>
>> Rose Friedman, Age 14
>>
>>
>>
>>
>>
>>
>>
>>
>> On May 13, 2011, at 10:46 AM, john herbert wrote:
>>
>>    Dear Bioconductors,
>>> I have a bunch of DeltaCT values for several tissues. If I boxplot the
>>> data,
>>> it looks very similar to microarray data, a lot of congestion around zero.
>>>
>>> Likewise, if I log2 the data, as in microarray, the distributions look
>>> close to normal and like microarray data.
>>>
>>> Please see the image here for different plots;
>>>
>>> https://docs.google.com/leaf?id=0B9IUGsKecS4GNDc0OWVlNzEtZjE5Yi00Y2Q4LWI0M2MtMGFiNzZhMDU0YTFm&hl=en
>>>
>>> My question: is data manipulation in this manner OK for this type of data,
>>> and
>>> will it affect/invalidate any unsupervised machine learning/clustering?
>>>
>>> Can I quantile normalise the data and still do valid clustering?
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>


