[BioC] No replicates and differential analysis !!
aedin at jimmy.harvard.edu
Thu Jan 26 18:11:57 CET 2006
I completely agree with you. I think I was glad to find that anything
agreed with an n=1. So we first "validated" (I agree it's a bad word)
that the results from the arrays made sense using RT-PCR. Then we have
followed the findings, and validated using a different in vivo
Naomi Altman wrote:
> Again, we need to be careful about what is "validated" by PCR.
> If the RNA used for PCR were the same samples hybridized to the
> arrays, you have validated that the arrays "worked" technically. (And
> this is certainly worth knowing.)
> But what we usually want to validate is that the genes are
> differentially expressed in the population, which can only be
> validated by use of an independent sample.
> At 11:06 AM 1/26/2006, Aedin Culhane wrote:
>> Hi Nicolas,
>> I recently had to analyse the same type of data. We had only 2 arrays
>> from rare mRNA (each array contained pooled mRNA from 5 animals), and
>> we wanted to compare them. All we could do was rank the genes by their
>> difference and take those with the largest fold change. We found that
>> the method used to compute the probeset expression values made a big
>> difference to the number of genes that had a >2 fold difference. When we
>> used mas5 to compute the expression values, we had over 2,700 genes with
>> greater than a 2 fold change. When gcRMA was used, 260 genes had a 2
>> fold difference, and with vsn only 11 genes had a 2 fold difference. I
>> have lots of details on this analysis if it will help you. We found most
>> of the genes that mas5 called different were in the low expression
>> range, and could not be trusted.
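
As an illustration of the ranking step described above, here is a minimal Python sketch (Python rather than R, purely for illustration). It assumes expression values on the raw, non-log scale that some preprocessing method has already produced. The function name `rank_by_fold_change` and all gene names and values are made up and do not come from the original analysis.

```python
import math

def rank_by_fold_change(array_a, array_b, threshold=2.0):
    """Rank genes by absolute log2 ratio between two unreplicated arrays.

    array_a, array_b: dicts mapping gene name -> positive expression value.
    Returns (genes ranked by |log2 ratio|, genes beyond the threshold, log ratios).
    """
    log_ratios = {gene: math.log2(array_b[gene] / a) for gene, a in array_a.items()}
    # Largest absolute change first, regardless of direction.
    ranked = sorted(log_ratios, key=lambda g: abs(log_ratios[g]), reverse=True)
    cutoff = math.log2(threshold)
    selected = [g for g in ranked if abs(log_ratios[g]) > cutoff]
    return ranked, selected, log_ratios

# Made-up expression values for four hypothetical genes.
array_a = {"g1": 40.0, "g2": 1000.0, "g3": 500.0, "g4": 60.0}
array_b = {"g1": 100.0, "g2": 1100.0, "g3": 1600.0, "g4": 55.0}

ranked, selected, log_ratios = rank_by_fold_change(array_a, array_b)
print(ranked)
print(selected)
```

With no replicates this gives only an ordering and a count, not a significance estimate, so the number of selected genes depends entirely on how the input values were computed.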
>> We validated 8 genes which were >2 fold different on both vsn and gcRMA
>> using RT-PCR. We had excellent correlation in all cases. vsn does very
>> slightly "under-estimate" the fold difference. I would definitely trust
>> any genes that have a >2 fold difference when using vsn. I would not
>> trust these if they are called using mas5. The glog transformation is
>> worth applying particularly in these kinds of analyses. We found the
>> glog-ratio to be reliable. Of course we have no real idea of the number
>> of true positives we missed (false -ve).
>> By using vsn, you remove the intensity-dependence of the variance. You
>> can then argue that you have removed the denominator of the t-statistic
>> and that comparing the "mean" difference is therefore valid. Of course
>> the mean has an n of 1, so it is just the glog-ratio. Albeit a woolly
>> assumption, it at least gives a better basis to your analysis.
>> The second thing I might consider is checking for replicate probesets
>> on the array. If the replicate probesets agree, then you can be more
>> confident in the result.
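
One way the replicate-probeset check could be sketched in Python is shown below. It assumes you already have a log2 ratio per probeset and a probeset-to-gene mapping. The function name `probeset_agreement`, the cutoff of 1.0 (a 2 fold change on the log2 scale), and all identifiers and values are hypothetical.

```python
def probeset_agreement(log_ratios, probeset_to_gene, cutoff=1.0):
    """For genes with two or more probesets, report whether all probesets
    change in the same direction and all exceed the cutoff in absolute value."""
    by_gene = {}
    for probeset, ratio in log_ratios.items():
        by_gene.setdefault(probeset_to_gene[probeset], []).append(ratio)

    agreement = {}
    for gene, ratios in by_gene.items():
        if len(ratios) < 2:
            continue  # a single probeset cannot confirm itself
        same_direction = all(r > 0 for r in ratios) or all(r < 0 for r in ratios)
        agreement[gene] = same_direction and all(abs(r) > cutoff for r in ratios)
    return agreement

# Made-up log2 ratios for five hypothetical probesets.
log_ratios = {"ps1": 1.6, "ps2": 1.3, "ps3": 1.4, "ps4": -0.2, "ps5": 2.0}
probeset_to_gene = {"ps1": "geneA", "ps2": "geneA",
                    "ps3": "geneB", "ps4": "geneB",
                    "ps5": "geneC"}

agreement = probeset_agreement(log_ratios, probeset_to_gene)
print(agreement)
```

Probesets for the same gene are measured on the same array and the same sample, so their agreement supports the measurement but is not a substitute for biological replication.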
>> Although fold change isn't a good statistical measure, a good variance
>> estimate can be difficult to obtain. We just completed a comparison of
>> feature selection methods (Jeffery et al.) in which we showed that at
>> low numbers of replicates (n<5), rankproducts or even fold change can
>> perform as
>> well as or outperform t-statistic and moderated t-statistic methods,
>> dependent on the variance structure of the data.
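
For readers unfamiliar with the rank product idea mentioned above, the core statistic can be sketched in a few lines of Python. This is only the geometric mean of per-replicate ranks for up-regulation. It leaves out the permutation-based significance estimate and tie handling, and it is not the code used in the comparison referred to above. The function name and the data are made up.

```python
import math

def rank_products(replicates):
    """Geometric mean of each gene's rank across replicate comparisons.

    replicates: list of dicts, each mapping gene -> log ratio for one
    replicate comparison. Rank 1 is the most up-regulated gene.
    """
    genes = list(replicates[0])
    k = len(replicates)
    log_rank_sum = {g: 0.0 for g in genes}
    for rep in replicates:
        ordered = sorted(genes, key=lambda g: rep[g], reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            log_rank_sum[gene] += math.log(rank)
    # exp(mean(log(rank))) is the geometric mean of the ranks.
    return {g: math.exp(log_rank_sum[g] / k) for g in genes}

# Two made-up replicate comparisons for three hypothetical genes.
rep1 = {"g1": 2.0, "g2": 0.1, "g3": 1.5}
rep2 = {"g1": 1.8, "g2": -0.2, "g3": 2.1}

rp = rank_products([rep1, rep2])
print(rp)
```

Small rank products indicate genes that are consistently near the top of the list. With a single comparison the statistic reduces to the fold-change rank itself.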
>> Hope this helps,
>> Date: Wed, 25 Jan 2006 16:43:51 +0000
>> From: Wolfgang Huber <huber at ebi.ac.uk>
>> Subject: Re: [BioC] No replicates and differential analysis !!
>> To: Nicolas Servant <Nicolas.Servant at curie.fr>
>> Cc: Bioconductor <bioconductor at stat.math.ethz.ch>
>> Message-ID: <43D7AAC7.9080401 at ebi.ac.uk>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>> Hi Nicolas,
>> > And it is
>> > supported that the FC tends to be greater at low expression levels.
>> What is supported is that the variance of the _estimate_ of the FC (the
>> true underlying quantity) by the log-ratio of measured probe intensities
>> tends to be greater at low expression levels. Indeed this depends on the
>> preprocessing and background correction. Consider this paper:
>> and the accompanying "vsn" package in bioC. It removes the
>> intensity-dependence of the variance, and you can use the "glog-ratio",
>> which is an alternative estimator of FC, to select genes. This amounts
>> to assuming that all genes have the same variance.
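
To make the "glog-ratio" concrete, here is a small Python sketch of a generalized-log difference. It is not the vsn implementation: vsn fits calibration parameters to each array and uses an arsinh transformation, whereas this sketch uses one common form of the generalized log with a single offset `c` that is simply assumed. The function names and the intensities are made up.

```python
import math

def glog2(x, c):
    """Generalized log, base 2: close to log2(x) when x is much larger
    than c, but it flattens out for intensities near zero."""
    return math.log2((x + math.sqrt(x * x + c * c)) / 2.0)

def glog_ratio(a, b, c):
    """Difference of generalized logs, an alternative estimator of log fold change."""
    return glog2(b, c) - glog2(a, c)

c = 100.0  # assumed offset; in practice it would be estimated from the data

# The same 2.5 fold ratio of raw intensities at low and at high expression.
low = glog_ratio(20.0, 50.0, c)
high = glog_ratio(2000.0, 5000.0, c)
plain = math.log2(50.0 / 20.0)

print(low, high, plain)
```

At high intensity the glog-ratio is nearly identical to the ordinary log2 ratio, while at low intensity it is shrunk towards zero. This is why a fixed cutoff on the glog-ratio selects far fewer low-expression genes than the same cutoff on the plain log ratio.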
>> Of course the assumption is not really true, there can be gene-specific
>> causes for different variances (besides overall intensity). But with
>> only two arrays you have no way of seeing them. Hence, using glog-ratio
>> to select genes when there are no replicates is an extreme version of
>> the moderated t-statistic (which is often used when there are few
>> Best wishes
>> Nicolas Servant wrote:
>> >> Thanks for your answer,
>> >> But in this case, I have to choose a fold change threshold! And it is
>> >> supported that the FC tends to be greater at low expression levels.
>> >> For instance a FC greater than 2 for expression values near 50 is
>> >> readily seen, but there is a low probability of observing a FC
>> >> greater than 2 for expression values near 1000.
>> >> So I would like to use a more robust approach.
>> >> Regards,
>> >> Nicolas S.
> Naomi S. Altman 814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics 814-863-7114 (fax)
> Penn State University 814-865-1348 (Statistics)
> University Park, PA 16802-2111