[BioC] Query related to topGene functionality in RankProd library.

Fangxin Hong fxhong at jimmy.harvard.edu
Tue Dec 8 16:46:11 CET 2009


Hi Rohan,
I suspect that some outliers or extreme values are distorting the 
average fold-change, especially with an unbalanced design (different 
numbers of samples in the two groups).

For example, suppose a given gene has readouts of 100, 300, 60, 80 
under class A (4 samples) and 120, 140 under class B (2 samples). The 
average fold-change (class A / class B) is then 135/130 = 1.038.
However, the value 300 is likely an outlier, since it is not 
consistent with the other three measurements. This gene is therefore 
likely downregulated in class A compared with class B, even though the 
average FC > 1.
In fact, this is the strength of the rank product method: it is much 
more robust against outliers than t-like statistics, which would call 
this gene up-regulated in class A.
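
The arithmetic in this example can be checked with a few lines of base R. This is only a sketch: the pairwise ratios and the ratio of medians are illustrative ways of seeing which direction most samples point in, not RankProd's internal computation, and the variable names are made up for this example.

```r
# Readouts from the example above (raw, unlogged scale).
class_a <- c(100, 300, 60, 80)   # class A, 4 samples
class_b <- c(120, 140)           # class B, 2 samples

# Average fold-change: ratio of the class means (135 / 130).
mean_fc <- mean(class_a) / mean(class_b)
print(round(mean_fc, 3))

# Every class A readout divided by every class B readout (4 x 2 matrix).
pair_ratios <- outer(class_a, class_b, "/")
n_below_one <- sum(pair_ratios < 1)
print(n_below_one)   # how many of the 8 pairwise ratios are below 1

# The ratio of medians is far less sensitive to the single value of 300.
median_fc <- median(class_a) / median(class_b)
print(round(median_fc, 3))

# The sign of a Welch t statistic follows the difference of the means.
t_stat <- unname(t.test(class_a, class_b)$statistic)
print(t_stat > 0)
```

Six of the eight pairwise ratios fall below 1 and the ratio of medians is about 0.69, while the ratio of means is pulled above 1 by the single value of 300.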

If you prefer, you could send me the expression values of such genes 
over these 20 samples so that I can take a look.

Best,
Fangxin

Rohan M wrote:
> Dear Fangxin,
>
> Thank you very much for looking into my mail and replying promptly, I 
> really appreciate it.
> I understood the points in your answers to questions 2 and 3. I 
> cross-checked that the data was normalized using the MAS algorithm.
>
> Regarding question 1, here are a few more details:
> I'm using two-class data with a single origin to calculate the rank 
> product. For example, Class A contains 20 samples and Class B contains 
> 10 samples. I'm using the following cutoff for the topGene function.
>
> classes - the vector assigning the two class labels.
>
> RP.out <- RP(arab, classes, num.perm = 100, logged = TRUE,
>              na.rm = FALSE, plot = FALSE, rand = 123)
> outPut <- topGene(RP.out, cutoff = 0.05, method = "pfp",
>                   logged = TRUE, logbase = 2,
>                   gene.names = rownames(arab))
>
> Is a cutoff of 0.05 sufficient, or does it need to be more stringent? 
> Also, with this setup, is it possible to have FC values greater than 1 
> in Table1 or FC values less than 1 in Table2 for many probes?
>
> Thank you once again.
>
> Regards,
> Rohan
>
> On Tue, Dec 8, 2009 at 12:16 AM, Fangxin Hong 
> <fxhong at jimmy.harvard.edu> wrote:
>
>     Hi Rohan,
>     Please see my comments below,
>
>
>          
>
>             -----Original Message-----
>             From: bioconductor-bounces at stat.math.ethz.ch
>             [mailto:bioconductor-bounces at stat.math.ethz.ch]
>             On Behalf Of Rohan M
>             Sent: Monday, December 07, 2009 6:55 AM
>             To: bioconductor at stat.math.ethz.ch
>             Subject: [BioC] Query related to topGene functionality in
>             RankProd library.
>
>             Dear Sir,
>
>             I'm using the RankProd library to find significant genes
>             in microarray studies. I'm having some trouble
>             understanding the output of the topGene function.
>             Could you help me with the following queries?
>
>             1) The topGene function outputs
>             Table1: Genes called significant under class1 < class2
>             (up-regulated genes)
>             and Table2: Genes called significant under class1 > class2
>             (down-regulated genes). When I look at the fold change
>             values in both tables, some genes in Table 1 have a fold
>             change less than 1 and some genes in Table 2 have a fold
>             change greater than 1.
>             Given the rule "if a gene has a fold change less than 1
>             then it is down-regulated", how should I interpret the
>             fold change value (up-regulated or down-regulated) in
>             such cases?
>                
>
>     Theoretically this shouldn't happen, as the expression level is
>     suppressed when a gene is downregulated. I don't know what cutoff
>     you selected for topGene.
>     If you use a loose criterion, genes without a strong signal get
>     identified, and this can happen.
>     For example, if gene A has 4 fold-change readouts (in the
>     one-channel case) of 1.6, 0.9, 0.7, 0.9, then the average
>     fold-change is 1.025. However, this gene might be placed in the
>     downregulated list, as 3 out of 4 fold changes are less than 1.
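
As a quick check of the numbers in this example, the following base R sketch compares the arithmetic mean of the four fold changes with a count of how many fall below 1. The geometric mean is included only to illustrate a summary that is less dominated by the single large value; it is an assumption for this sketch, not necessarily what topGene reports.

```r
# Fold-change readouts for the hypothetical gene A from the example.
fc <- c(1.6, 0.9, 0.7, 0.9)

arith_mean <- mean(fc)             # (1.6 + 0.9 + 0.7 + 0.9) / 4
n_down     <- sum(fc < 1)          # readouts pointing to down-regulation
geo_mean   <- exp(mean(log(fc)))   # geometric mean (mean on the log scale)

print(arith_mean)
print(n_down)
print(round(geo_mean, 3))
```

The arithmetic mean is above 1 while three of the four readouts, and the geometric mean, are below 1, which is the kind of disagreement described above.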
>
>
>             2) In some cases both Table 1 and Table 2 contain the
>             same probe. Is it possible to have one probe present in
>             both tables? If yes, which one should be considered?
>                
>
>     This type of result strongly suggests that the gene doesn't have
>     a decent signal in either direction. It can happen when random
>     variation gives a spurious signal in both directions.
>     For example, a gene with 4 fold-changes of 1.3, 0.6, 0.9, 1.2
>     could end up in both lists if a loose cutoff is selected.
>
>             3) Sometimes I see "Inf" as the fold change value, which
>             must mean infinity. Is it possible to have such a value?
>                
>
>     If there are zero or negative values in the expression data (as
>     with data normalized with MAS5), then it is possible.
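
How zero or negative values lead to non-finite fold changes can be seen with plain R arithmetic. This is a generic sketch with made-up values, not a trace of what RankProd does internally; the last step shows one way to count offending entries in an expression matrix before analysis.

```r
# Made-up expression values for one gene in two classes (hypothetical data).
class1 <- c(12, 8)
class2 <- c(0, 0)          # all zeros in the denominator class

# A ratio with a zero denominator is Inf in R.
fc <- mean(class1) / mean(class2)
print(fc)
print(is.infinite(fc))

# On the log scale, zero maps to -Inf and negative values give NaN.
log_zero <- log2(0)
neg_log  <- suppressWarnings(log2(-1))
print(log_zero)
print(is.nan(neg_log))

# Count non-positive entries in a small hypothetical expression matrix.
expr <- matrix(c(5, 0, 3, -1, 7, 2), nrow = 2)
n_nonpositive <- sum(expr <= 0)
print(n_nonpositive)
```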
>
>
>     As stated in the package manual, it is always a good idea to look
>     at the data when such results come up, which helps a lot in
>     selecting the cutoff and interpreting the results.
>
>     I hope this helps. Let me know if anything is unclear, or if you
>     would prefer to send over your data for me to take a look.
>
>     Best,
>     Fangxin
>
>             Could you please help me understand the above points?
>
>             Thanks and regards,
>             Rohan
>
>
>
>     -- 
>
>     Fangxin Hong Ph.D.
>     Research Scientist
>     Department of Biostatistics and Computational Biology
>     Dana-Farber Cancer Institute, Harvard School of Public Health
>     Phone: 617-632-3602
>     Email: fxhong at jimmy.harvard.edu
>
>

-- 

Fangxin Hong Ph.D.
Research Scientist
Department of Biostatistics and Computational Biology
Dana-Farber Cancer Institute, Harvard School of Public Health
Phone: 617-632-3602
Email: fxhong at jimmy.harvard.edu


