[BioC] Delta CT data distribution and cluster analyses; machine learning or other

Richard Friedman friedman at cancercenter.columbia.edu
Sun May 15 20:01:50 CEST 2011


John,

 	Do not convert deltaCt to 2^-deltaCt before doing a t-test.
Instead, test the hypothesis deltaCt(condition 1) = deltaCt(condition 2)
directly with a t-test.

deltaCt = -log2(M), so it will be closer to normally distributed than 2^-deltaCt.
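
As a rough illustration of testing on the deltaCt scale, here is a minimal Python sketch. The deltaCt numbers are made up for illustration, the function name `welch_t` is hypothetical, and the unequal-variance (Welch) form of the t-test is my choice here, not something specified above. It uses only the standard library, so it reports the t statistic and degrees of freedom and leaves the p-value lookup to a statistics package.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical deltaCt values (target Ct minus housekeeper Ct), 5 per group.
case_dct = [5.0, 5.4, 4.8, 5.6, 5.2]
control_dct = [6.0, 6.3, 5.9, 6.4, 6.1]

# The test is run on deltaCt itself, not on 2^-deltaCt.
t_stat, df = welch_t(case_dct, control_dct)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The difference in group means that the test examines is the delta-delta-Ct between the two conditions.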

I hope this helps.


Best wishes,
Rich

On Sat, 14 May 2011, john herbert wrote:

> The range of raw CT values is around 15 to 35. The 2^-deltaCT values are very small, less than one. An example is 0.079703285.
> I have 5 case samples and 5 control samples. For all samples, there are CT measures for target genes and housekeeper genes. Our approach is to
> use the housekeeper CT from each sample in that sample's Delta CT calculation.
> 
> E.g.
> Sample Case 1 target CT = 15
> Sample Case 1 house keeper CT = 10
> Delta CT = 15-10 = 5
> A = 2 to the power of minus delta CT, as in Excel =POWER(2,-5) = 0.03125
> 
> Then normal sample is the same....
> Sample normal 1 target CT = 10
> Sample normal 1 house keeper CT = 4
> Delta CT = 10-4 = 6
> 2 to the power of minus delta CT, as in Excel =POWER(2,-6) = 0.015625
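
The arithmetic in the two worked examples above can be checked with a short Python sketch. The function names `delta_ct` and `relative_expression` are hypothetical, and the 2^-deltaCt step assumes perfect doubling per cycle.

```python
import math


def delta_ct(target_ct, housekeeper_ct):
    """Delta CT: target CT minus housekeeper CT for the same sample."""
    return target_ct - housekeeper_ct


def relative_expression(dct):
    """2^-deltaCt, assuming the product doubles every cycle."""
    return 2.0 ** (-dct)


case_dct = delta_ct(15, 10)    # CT values from the case example above
normal_dct = delta_ct(10, 4)   # CT values from the normal example above

print(case_dct, relative_expression(case_dct))      # 5 0.03125
print(normal_dct, relative_expression(normal_dct))  # 6 0.015625

# Taking -log2 of the exponentiated value recovers delta CT.
print(-math.log2(relative_expression(case_dct)))    # 5.0
```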
> 
> I have lots of these small values. These values don't look normally distributed. 
> 
> My view is that maybe I should make M values (log2 ratios) and do t-tests etc.
> 
> Is this the best way to go for gene expression and subsequent clustering?
> 
> Thank you.  
> 
> 
> On Fri, May 13, 2011 at 9:06 PM, Kevin R. Coombes <kevin.r.coombes at gmail.com> wrote:
>       What is the range of the data that you received?
>
>       In most TaqMan real-time PCR experiments, the Ct values range between about 10 (for really really abundant things like 18S) to 40.
>       These measurements are in cycles.  In principle, if you had a perfectly efficient probe-primer combination, the number of mRNA
>       molecules present would double every cycle.  As a result, cycle values are already essentially on the "negative log base two" scale.
>
>       As Richard already pointed out, the Delta-Ct or Delta-Delta-Ct values on this scale are usually normal.
>
>       If your data are not in a range that makes sense as cycles, then it is likely that someone exponentiated the data to get it back to
>       the "raw" scale, and thus converted from normally distributed to log-normal.
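
To make the point about exponentiation concrete, here is a small Python simulation sketch. The mean and standard deviation of the simulated Delta-Ct values are assumptions chosen only for illustration, and `skewness` is a hypothetical helper. It draws normally distributed Delta-Ct values, converts them to the "raw" scale with 2^-DeltaCt, and compares the sample skewness of the two sets.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness (third standardized moment, population form)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated Delta-Ct values: normal, mean 5 cycles, SD 1 cycle (assumed values).
dct = [rng.gauss(5.0, 1.0) for _ in range(10_000)]

# Exponentiating back to the "raw" scale gives log-normal values.
linear = [2.0 ** (-d) for d in dct]

# Taking -log2 returns to the cycle scale.
back = [-math.log2(v) for v in linear]

print("skewness on cycle scale:", skewness(dct))
print("skewness on raw scale:  ", skewness(linear))
```

The cycle-scale values should come out roughly symmetric, while the exponentiated values are strongly right-skewed, which is the pattern to look for when checking whether received data have been exponentiated.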
>
>          Kevin
> 
> 
>
>       Hi Richard,
>       Thank you. It is from taqman real time PCR. I have sent a mail asking how
>       exactly they normalised the data.
>       We only have biological replicates and no common reference, so I was told we
>       can only use Delta CT values.
>
>       I assume, maybe wrongly, that if Delta Delta CT values are normally
>       distributed, then Delta CT values will also be normally distributed?
>
>       I will make plots of the raw data and Delta CT as I know it.
> 
> 
> 
> 
>
>       On Fri, May 13, 2011 at 3:53 PM, Richard Friedman<
>       friedman at cancercenter.columbia.edu>  wrote:
>
>             Dear John,
>
>                    Is the Delta CT data from PCR or from some other method?
>             If it is from PCR, in my experience Delta Delta CT is usually normally
>             distributed,
>             where the first delta refers to the difference between the experiment
>             and an internal reference
>             (e.g. GAPDH) and the second delta refers to the two experimental conditions.
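
The two deltas described above can be sketched in Python as follows. The CT values are hypothetical, the function name `delta_delta_ct` is an assumption, and the conversion of Delta Delta CT to a fold change via 2^-DeltaDeltaCT assumes perfect doubling per cycle.

```python
import statistics


def delta_delta_ct(target_a, ref_a, target_b, ref_b):
    """Mean Delta CT of condition A minus mean Delta CT of condition B.

    First delta: target CT minus internal reference CT, per sample.
    Second delta: difference between the two experimental conditions.
    """
    dct_a = [t - r for t, r in zip(target_a, ref_a)]
    dct_b = [t - r for t, r in zip(target_b, ref_b)]
    return statistics.fmean(dct_a) - statistics.fmean(dct_b)


# Hypothetical CT values, three samples per condition.
treated_target = [24.0, 24.5, 25.0]
treated_ref = [18.0, 18.5, 19.0]      # e.g. GAPDH
control_target = [26.0, 26.5, 27.0]
control_ref = [18.0, 18.5, 19.0]

ddct = delta_delta_ct(treated_target, treated_ref, control_target, control_ref)
fold_change = 2.0 ** (-ddct)

print(ddct)         # -2.0
print(fold_change)  # 4.0
```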
>
>             With hopes that the above helps,
>             Rich
>             ------------------------------------------------------------
>             Richard A. Friedman, PhD
>             Associate Research Scientist,
>             Biomedical Informatics Shared Resource
>             Herbert Irving Comprehensive Cancer Center (HICCC)
>             Lecturer,
>             Department of Biomedical Informatics (DBMI)
>             Educational Coordinator,
>             Center for Computational Biology and Bioinformatics (C2B2)/
>             National Center for Multiscale Analysis of Genomic Networks (MAGNet)
>             Room 824
>             Irving Cancer Research Center
>             Columbia University
>             1130 St. Nicholas Ave
>             New York, NY 10032
>             (212)851-4765 (voice)
>             friedman at cancercenter.columbia.edu
>             http://cancercenter.columbia.edu/~friedman/
>
>             I am a Bayesian. When I see a multiple-choice question on a test and I
>             don't
>             know the answer I say "eeney-meaney-miney-moe".
>
>             Rose Friedman, Age 14
> 
> 
> 
> 
> 
> 
> 
>
>             On May 13, 2011, at 10:46 AM, john herbert wrote:
>
>               Dear Bioconductors,
>                   I have a bunch of DeltaCT values for several tissues. If I boxplot the
>                   data, it looks very similar to microarray data, with a lot of congestion
>                   around zero.
>
>                   Likewise, if I log2 the data, as in microarray, the distributions look
>                   close to normal and like microarray data.
>
>                   Please see the image here for different plots;
>
>                   https://docs.google.com/leaf?id=0B9IUGsKecS4GNDc0OWVlNzEtZjE5Yi00Y2Q4LWI0M2MtMGFiNzZhMDU0YTFm&hl=en
>
>                   My question is: is data manipulation in this manner OK for this type of
>                   data, and will it affect/invalidate any unsupervised machine
>                   learning/clustering?
>
>                   Can I quantile normalise the data and still do valid clustering?
>
>                   _______________________________________________
>                   Bioconductor mailing list
>                   Bioconductor at r-project.org
>                   https://stat.ethz.ch/mailman/listinfo/bioconductor
>                   Search the archives:
>                   http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
>
> 
> 
> 
>

-- 
------------------------------------------------------------
Richard A. Friedman, PhD
Associate Research Scientist
Herbert Irving Comprehensive Cancer Center
Biomedical Informatics Shared Resource
Lecturer
Department of Biomedical Informatics
Box 95, Room 130BB or P&S 1-420C
Columbia University Medical Center
630 W. 168th St.
New York, NY 10032
(212)305-6901 (5-6901) (voice)
friedman at cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/

"The last 250 pages of the last Harry Potter
book took place in one day because alot
happened in that day. All of Ulysses takes
place in one day and nothing happened in that day."
-Rose Friedman, age 11

