[BioC] Query related to topGene functionality in RankProd library.
Fangxin Hong
fxhong at jimmy.harvard.edu
Tue Dec 8 16:46:11 CET 2009
Hi Rohan,
What I suspect is that some outliers or extreme values are distorting the
average fold-change, especially when you have an uneven design (a different
number of samples in the two groups).
For example, suppose that for a given gene the readouts under class A
(4 samples) are 100, 300, 60, 80, and under class B (2 samples) are 120, 140.
The average fold-change (class A / class B) is then 135/130 = 1.038.
However, the value 300 is likely an outlier, since it is not consistent
with the other three measurements; this gene is therefore likely
downregulated in class A compared with class B, even though the average FC > 1.
In fact, this is the strength of the rank product method: it is much more
robust against outliers than t-like statistics, which would call this gene
up-regulated in class A.
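The arithmetic in this example can be checked with a few lines of base R. This is only a sketch of the averaging effect, not of what RankProd computes internally; the names `classA`, `classB` and `pairwise` are made up for illustration, and the pairwise ratios are used here simply as an outlier-resistant summary.

```r
# Readouts for one gene, taken from the example above
classA <- c(100, 300, 60, 80)  # 4 samples
classB <- c(120, 140)          # 2 samples

# Average fold change (class A / class B) on the raw scale
fc_mean <- mean(classA) / mean(classB)
round(fc_mean, 3)              # 1.038

# Same ratio with the suspected outlier (300) left out
fc_no_outlier <- mean(classA[classA != 300]) / mean(classB)
round(fc_no_outlier, 3)        # 0.615

# All 4 x 2 pairwise A/B ratios: most of them fall below 1
pairwise <- as.vector(outer(classA, classB, "/"))
sum(pairwise < 1)              # 6 of the 8 ratios
median(pairwise)
```

A single large value is enough to push the mean-based ratio above 1, while the majority of the individual comparisons point the other way.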
If you like, you could send me the expression values of such genes over these
20 samples so that I can take a look.
Best,
Fangxin
Rohan M wrote:
> Dear Fangxin,
>
> Thank you very much for looking into my mail and replying promptly, I
> really appreciate it.
> For the answers to questions 2 and 3, I understood the points. I
> cross-checked that the data was normalized using the MAS algorithm.
>
> Regarding question 1, here are a few more details:
> I'm using two-class data with a single origin to calculate the rank product.
> For example, Class A contains 20 samples and Class B contains 10
> samples. I'm using the following cutoff for the topGene functionality.
>
> # classes: the vector assigning the two class labels
> RP.out <- RP(arab, classes, num.perm = 100, logged = TRUE,
>              na.rm = FALSE, plot = FALSE, rand = 123)
> outPut <- topGene(RP.out, cutoff = 0.05, method = "pfp", logged = TRUE,
>                   logbase = 2, gene.names = rownames(arab))
>
> Is the cutoff of 0.05 enough, or does it need to be more stringent? Also,
> with this setup, is it possible to have an FC value greater than 1 in
> Table1 or an FC value less than 1 in Table2 for many probes?
>
> Thank you once again.
>
> Regards,
> Rohan
>
> On Tue, Dec 8, 2009 at 12:16 AM, Fangxin Hong
> <fxhong at jimmy.harvard.edu> wrote:
>
> Hi Rohan,
> Please see my comments below,
>
>
>
>
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Rohan M
> Sent: Monday, December 07, 2009 6:55 AM
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] Query related to topGene functionality in RankProd
> library.
>
> Dear Sir,
>
> I'm using the RankProd library to find the significant genes in
> microarray studies. I'm facing some problems in understanding the
> output of the topGene functionality.
> Could you help me with the following queries?
>
> 1) The topGene functionality outputs Table1: Genes called significant
> under class1 < class2 (up-regulated genes) and Table2: Genes called
> significant under class1 > class2 (down-regulated genes). When I look at
> the fold change values in both tables, some genes in Table1 have a fold
> change less than 1 and some genes in Table2 have a fold change greater
> than 1. Given the rule "if a gene has a fold change less than 1 then it
> is down-regulated", how should I interpret the fold change value
> (up-regulated or down-regulated) in such cases?
>
>
> Theoretically this shouldn't happen, as the expression level is
> suppressed when a gene is downregulated. I don't know what cutoff point
> you selected for topGene.
> If you use a loose criterion, which leads to genes without a strong
> signal being identified, this can happen.
> For example, if gene A has 4 fold-change readouts (in the one-channel
> case) of 1.6, 0.9, 0.7, 0.9, then the average fold-change is 1.025.
> However, this gene might be identified in the downregulation list, as 3
> out of 4 fold changes are less than 1.
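The numbers in this example can be reproduced in base R. The sketch below also compares the arithmetic mean with the geometric mean (the average taken on the log scale); that comparison is an added illustration under the assumption that a log-scale summary is of interest, not something the reply above states about topGene. The names `fc` and `geo_mean` are made up.

```r
# The four fold-change readouts from the example above
fc <- c(1.6, 0.9, 0.7, 0.9)

mean(fc)        # arithmetic mean: 1.025
sum(fc < 1)     # 3 of the 4 fold changes are below 1

# Geometric mean: average on the log scale, then back-transform
geo_mean <- exp(mean(log(fc)))
geo_mean < 1    # TRUE
```

The single value of 1.6 lifts the arithmetic mean above 1, whereas the log-scale average stays below 1, in line with the direction of most of the individual readouts.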
>
>
> 2) In some cases both Table 1 and Table 2 contain the same probe. Is it
> possible to have one probe present in both tables? If yes, then which
> one should be considered?
>
>
> A result of this type strongly suggests that the gene doesn't have a
> decent signal in either direction. This can happen when random
> variation gives a fake signal in both directions.
> For example, a gene with 4 fold-changes of 1.3, 0.6, 0.9, 1.2
> would end up in both lists if a loose cut-off point is selected.
>
> 3) Sometimes I see "Inf" as the fold change value, which must be
> infinity. Is it possible to have such a value?
>
>
> If there are 0 or negative values in the expression data (like data
> normalized with MAS5), then it is possible.
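A minimal base R sketch of how a zero can turn a fold change into "Inf", assuming the fold change is formed as a ratio of class averages. All values and names here (`class1`, `class2`, `expr`, `geneA`, `geneB`) are made up for illustration and are not taken from the data discussed in this thread.

```r
# Hypothetical readouts for one gene (made-up values)
class1 <- c(12, 8)
class2 <- c(-3, 3)   # contains a negative value and averages to 0

# Dividing a positive number by zero gives Inf in R
fc <- mean(class1) / mean(class2)
fc                   # Inf
is.infinite(fc)      # TRUE

# Count zero or negative entries per gene before running the analysis
expr <- rbind(geneA = c(class1, class2), geneB = c(5, 6, 7, 8))
rowSums(expr <= 0)
```

Checking for non-positive entries up front makes it easier to tell which probes can produce infinite or undefined fold changes.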
>
>
> As stated in the package manual, it is always a good idea to look at the
> data when such results come out, which helps a lot in selecting the
> cut-off point and interpreting the results.
>
> Hope this helps; let me know if this is not clear or if you
> prefer to send over your data for me to take a look.
>
> Best,
> Fangxin
>
> Could you please help me understanding the above points?
>
> Thanks and regards,
> Rohan
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
> --
>
> Fangxin Hong Ph.D.
> Research Scientist
> Department of Biostatistics and Computational Biology
> Dana-Farber Cancer Institute, Harvard School of Public Health
> Phone: 617-632-3602
> Email: fxhong at jimmy.harvard.edu
>
>
--
Fangxin Hong Ph.D.
Research Scientist
Department of Biostatistics and Computational Biology
Dana-Farber Cancer Institute, Harvard School of Public Health
Phone: 617-632-3602
Email: fxhong at jimmy.harvard.edu