[BioC] which statistical test to perform?

Richard Friedman friedman at cancercenter.columbia.edu
Wed Nov 23 15:57:13 CET 2011


Dear Anand,

	You can calculate the log2 ratio of each pair of groups
you wish to compare. A few more pointers:

1. It is desirable to have at least 3 samples in each group in
order to calculate statistical significance.
2. The MAS5 algorithm has been shown to lead to many false
positives. The RMA and GCRMA algorithms are more reliable,
but to use them you have to work from the CEL files.
3. There are special statistical problems in analyzing microarray data
because of the large number of genes. The best way to address this
problem is the limma package.
4. Considerations 2 and 3 are addressable through user-friendly
Bioconductor programs called affylmGUI and oneChannelGUI.
5. I can send you my course notes on the theory and workflow of the
above approach offline upon request.
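
As a minimal sketch of the log2-ratio calculation mentioned above (this is
not limma, and it is no substitute for a proper statistical test): assuming
the expression values are already on the log2 scale, the log2 ratio of two
groups is the difference of their mean log2 values, and the fold change is
2 raised to that difference. The tissue names come from the thread, but the
replicate values and the function names pairwise_log2_ratios and
fold_change are invented for illustration. Python is used only for brevity.

```python
from itertools import combinations
from statistics import mean

# Hypothetical log2-scale expression values for a single probe set:
# three biological replicates per tissue (numbers invented for illustration).
log2_expr = {
    "amygdala":   [12.1, 12.3, 12.2],
    "cerebellum": [11.4, 11.5, 11.6],
    "thalamus":   [12.5, 12.6, 12.4],
}


def pairwise_log2_ratios(groups):
    """Return {(a, b): mean(a) - mean(b)} for every pair of groups.

    The inputs are assumed to be on the log2 scale already, so the
    log2 ratio of two groups is the difference of their group means.
    """
    means = {name: mean(values) for name, values in groups.items()}
    return {(a, b): means[a] - means[b] for a, b in combinations(means, 2)}


def fold_change(log2_ratio):
    """Convert a log2 ratio back to a linear-scale fold change."""
    return 2.0 ** log2_ratio


ratios = pairwise_log2_ratios(log2_expr)
for (a, b), lr in ratios.items():
    print(f"{a} vs {b}: log2 ratio = {lr:.3f}, fold change = {fold_change(lr):.3f}")
```

With 11 tissues there are 55 such pairs per probe set. The sketch only
gives effect sizes; the replicates are still needed to judge significance.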

Best wishes,
Rich
------------------------------------------------------------
Richard A. Friedman, PhD
Associate Research Scientist,
Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Lecturer,
Department of Biomedical Informatics (DBMI)
Educational Coordinator,
Center for Computational Biology and Bioinformatics (C2B2)/
National Center for Multiscale Analysis of Genomic Networks (MAGNet)
Room 824
Irving Cancer Research Center
Columbia University
1130 St. Nicholas Ave
New York, NY 10032
(212)851-4765 (voice)
friedman at cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/

I am a Bayesian. When I see a multiple-choice question on a test and I
don't know the answer I say "eeney-meaney-miney-moe".

Rose Friedman, Age 14







On Nov 23, 2011, at 7:34 AM, anand m t wrote:

> Dear Sir,
> Thank you for your valuable suggestion. Will definitely look into it.
>
> I've one more question though.
> Sir, if we have only two datasets (say lung and liver), we can calculate
> the log ratio (lung/liver) and finally the fold change (2^log_ratio,
> assuming a log2 ratio).
> But in this case, if I want to determine the differentially expressed
> genes based on fold change, how do I do that?
> Do I have to take the ratio of the expression value of each tissue with
> all the other remaining tissues?
>
> Sorry again, if my question doesn't make any sense.
>
>
> On Wed, Nov 23, 2011 at 5:43 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
>> Hi, Anand.
>>
>> Please try to keep the conversations on the list so that you can get
>> the best answers to your questions.
>>
>> First, you will need to split the data back out to include the values
>> for all your replicates.  In other words, do not use the means.
>> Working with means only essentially precludes any statistical testing
>> at all.
>>
>> Second, I would suggest that you take a look at the limma package and
>> the wonderful limma user guide.  The statistical framework used in
>> limma is the linear model and it works well for two-class or
>> multi-class problems.
>>
>> Sean
>>
>>
>> On Wed, Nov 23, 2011 at 6:40 AM, anand m t <anandrox05 at gmail.com> wrote:
>>> Sir,
>>> I'm trying to compare the data from all the brain tissues. The data which
>>> I've shown here is the mean value of all 3 biological replicates of each
>>> tissue.
>>>
>>> On Wed, Nov 23, 2011 at 4:57 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>>>>
>>>> On Wed, Nov 23, 2011 at 5:14 AM, anand mt [guest]
>>>> <guest at bioconductor.org> wrote:
>>>>>
>>>>> hi all,
>>>>>
>>>>> I'm new to microarray data analysis.
>>>>> Recently I've been given data consisting of 11 tissues.
>>>>> I've now normalized the data and filtered it using MAS5 A/P calls.
>>>>> My question is: which statistical test must I perform
>>>>> to calculate the significance values?
>>>>>
>>>>> sample data is as below:
>>>>>
>>>>>
>>>>>              accumbens     amygdala   cerebellum  corpus.collosum  hippocampus     midbrain
>>>>> 1007_s_at  11.93852233  12.21404093  11.46118612      13.41594885  12.42216256  12.89589133
>>>>> 1053_at    7.490706858  7.526181155  7.551069308      7.891002293   7.49104271  7.971097552
>>>>> 117_at     8.486898268  8.773089087  7.642339349      8.560352732  7.676296801  7.865961146
>>>>>
>>>>>                 p.lobe      putamen      s.nigra       t.lobe     thalamus
>>>>> 1007_s_at  11.58715914  11.85803472  12.79920479  12.07087932  12.55338306
>>>>> 1053_at    8.088918072  7.660258014   7.92423132   7.54689645  7.128753703
>>>>> 117_at     7.250275943  7.929165261  7.874073766  7.940298941   8.10731601
>>>>>
>>>>>
>>>>> From some web results I learned that the chi-square test is the
>>>>> more relevant one in this case (to compare 3 or more unmatched groups,
>>>>> binomial). Is it correct to choose the chi-square test?
>>>>>
>>>>> Sorry if my question is too lame.
>>>>
>>>> Hi, Anand.
>>>>
>>>> I'm assuming that for you the biological question that you are asking
>>>> is obvious, but to me it seems unclear.  In particular, what groups
>>>> above are you trying to compare?  It seems you have no replicates?
>>>>
>>>> Sean
>>>>
>>>>
>>>>> thanks in advance.
>>>>>
>>>>> -- output of sessionInfo():
>>>>>
>>>>> R version 2.13.1 (2011-07-08)
>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>>
>>>>> locale:
>>>>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
>>>>> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>>>> [5] LC_TIME=English_United States.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>
>>>>> other attached packages:
>>>>> [1] MASS_7.3-13
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] tools_2.13.1
>>>>>
>>>>> --
>>>>> Sent via the guest posting facility at bioconductor.org.
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>
>>>
>>>
>>> --
>>> ******************************************************************
>>> Anand M.T
>>> School of Biotechnology (Bio-Informatics),
>>> International Institute of Information Technology (I2IT),
>>> P-14, Rajiv Gandhi Infotech park,
>>> Hinjewadi,
>>> Pune-411 057.
>>> INDIA.
>>>
>>
>
>
>
> -- 
> ******************************************************************
> Anand M.T
> School of Biotechnology (Bio-Informatics),
> International Institute of Information Technology (I2IT),
> P-14, Rajiv Gandhi Infotech park,
> Hinjewadi,
> Pune-411 057.
> INDIA.
>
> "The secret of success comprised in three words.. Work, Finish & Publish" -
> Michael Faraday
>


