[BioC] Gene Selection

Arne.Muller at sanofi-aventis.com Arne.Muller at sanofi-aventis.com
Mon Feb 7 10:56:50 CET 2005


> 
> 
> Dear Arne,
> 
> thanks for your reply.
> 
> >please correct me if I got it wrong: The experiment is a factorial
> >design with factor 2 being nested within factor 1, i.e.
> >
> >1. the "category" with three levels (category 1 to 3)
> >2. nested within each level of the above factor there is another
> >factor (sub-categories) with 3 to 4 levels.
> >
> 
> That is exactly the design I have.
> 
> >What do you mean by "select differential expressed genes for one
> >category"?
> >
> 
> First, I want to select those genes discriminating between the 3 to 4
> sub-categories within one category. (e.g. "Which genes are significantly
> differentially expressed [up or down] for sub-category 'g' and not for
> all others?")

You should use a linear model for this, perhaps limma, for which you'd need to set up a proper model matrix and contrasts. I can only give you some hints for the standard "poor man's" linear models in R. For coding complex models in R, look at

http://cran.r-project.org/doc/manuals/R-intro.html#Formulae-for-statistical-models
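
As a rough sketch of what "model matrix and contrasts" means here, the following base-R example builds a group-means design for the sub-categories of one category and estimates "g versus the average of the other sub-categories" for a single gene. The data are simulated, and the names subcategory, design, contrast and y are made up for illustration. With limma the same kind of design would go to lmFit and the contrast to contrasts.fit followed by eBayes; that part is not run here.

```r
set.seed(2)

# Hypothetical sub-category labels for nine chips (three per sub-category)
subcategory <- factor(rep(c("g", "n", "r"), each = 3))

# Group-means design: one column per sub-category, no intercept
design <- model.matrix(~ 0 + subcategory)
colnames(design) <- levels(subcategory)

# Contrast: sub-category g versus the average of n and r
contrast <- c(g = 1, n = -0.5, r = -0.5)

# Simulated log2 intensities for one gene; g is shifted up by 2
y <- c(rnorm(3, mean = 10, sd = 0.2),
       rnorm(3, mean = 8,  sd = 0.2),
       rnorm(3, mean = 8,  sd = 0.2))

# Least-squares fit; the coefficients are the three group means
fit <- lm.fit(design, y)
est <- sum(contrast * fit$coefficients[names(contrast)])
est   # estimated log2 difference, g versus the rest
```

The contrast weights sum to zero, so est is a difference on the log2 scale and 2^est is the corresponding fold change.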

For your purpose you may want to have a look at a nested model (sub-categories are nested within categories):

fit <- lm(Intensity ~ category + subcategory %in% category, data=x)

summary(fit) gives you the estimates and p-values, and anova(fit) tells you whether there are overall differences between the sub-categories within the categories (the category:subcategory term). It also compares the categories with each other. You can use the fit to calculate fold changes with the predict function (it gives you the predicted values, i.e. intensities, for the model, and you can use those to calculate ratios).
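
A minimal sketch of such a nested fit for a single gene, on simulated data. The data frame x, the level names c1..c3 and s1..s3, and the size of the shift are all assumptions for illustration. The formula category/subcategory is R's shorthand for category + category:subcategory, i.e. sub-categories nested within categories.

```r
set.seed(1)

# Hypothetical design: 3 categories x 3 sub-categories x 4 chips per cell
x <- expand.grid(rep = 1:4,
                 subcategory = c("s1", "s2", "s3"),
                 category = c("c1", "c2", "c3"))
x$category <- factor(x$category)
x$subcategory <- factor(x$subcategory)

# Simulated log2 intensities; s3 within c1 is shifted up by 2
x$Intensity <- rnorm(nrow(x), mean = 8, sd = 0.3) +
  ifelse(x$category == "c1" & x$subcategory == "s3", 2, 0)

# Nested model: sub-categories within categories
fit <- lm(Intensity ~ category/subcategory, data = x)
summary(fit)   # estimates and p-values per coefficient
anova(fit)     # overall tests for category and category:subcategory

# Predicted intensity for every category/sub-category combination
cells <- unique(x[, c("category", "subcategory")])
cells$pred <- predict(fit, newdata = cells)

# log2 ratio of s3 versus s1 within category c1, and the fold change
lfc <- with(cells, pred[category == "c1" & subcategory == "s3"] -
                   pred[category == "c1" & subcategory == "s1"])
2^lfc
```

Because the intensities are on the log2 scale, the ratio is obtained as a difference of predicted values and then back-transformed.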

If you're interested in specific comparisons you need to construct contrasts, e.g.

contrasts(x$subcategory) <- contr.treatment(levels(x$subcategory), base=1)

see ?contrasts
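
To illustrate what the base argument does, here is a small sketch on a made-up factor (the labels g, n, r are taken from the example table, the factor itself is hypothetical). Changing base changes which sub-category all the others are compared against.

```r
# Hypothetical sub-category factor; levels are sorted alphabetically: g, n, r
subcategory <- factor(c("n", "g", "r", "n", "g", "r"))
levels(subcategory)

# Treatment contrasts with the first level ("g") as the reference
contrasts(subcategory) <- contr.treatment(levels(subcategory), base = 1)
contrasts(subcategory)

# Use the second level ("n") as the reference instead
contrasts(subcategory) <- contr.treatment(levels(subcategory), base = 2)
cm <- contrasts(subcategory)
cm   # the row for "n" is all zeros, so the coefficients are g - n and r - n
```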

	kind regards,

	Arne

> >I see two choices:
> >
> >1. is there an overall difference between the three main categories
> >2. Within each category, are all sub-categories the same in terms of
> >gene expression or is there a (any) difference?
> >
> 
> The second question is what I am looking for.
> 
> Thanks for your help and best wishes,
> Heike
> 
> 
> 
> >>-----Original Message-----
> >>From: bioconductor-bounces at stat.math.ethz.ch
> >>[mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Heike
> >>Pospisil
> >>Sent: 04 February 2005 16:28
> >>To: Naomi Altman
> >>Cc: bioconductor at stat.math.ethz.ch
> >>Subject: Re: [BioC] Gene Selection
> >>
> >>
> >>Dear Naomi (and Stephen)
> >>
> >>thanks for your replies. Sorry for the little information I gave in my
> >>last email.
> >>
> >>I have 79 cel-files. Each chip is classified concerning three different
> >>criteria (categories). For each category, there exist at least 3
> >>subclasses:
> >>
> >>           Cat.A           Cat.B           Cat.C
> >>1.CEL      g               l               n
> >>2.CEL      n               0               r
> >>3.CEL      r               n               l
> >>...
> >>79.CEL     n               r               0
> >>           ---------       ----------      ----------
> >>           3 subclasses    4 subclasses    4 subclasses
> >>           n,g,r           l,0,n,r         l,0,n,r
> >>
> >>For the first analysis, I only need to select differentially expressed
> >>genes for one category.
> >>
> >>I read some tutorials and could reproduce these analyses, but I am not
> >>sure what the right strategy is for me (limma or multtest or simple
> >>t-test or whatever).
> >>
> >>Thanks for your help and best wishes
> >>Heike
> >>
> >>    
> >>
> >>>Dear Dr. Pospisil,
> >>>I am sure someone would be happy to assist you, but we need more 
> >>>information.
> >>>
> >>>How many treatments (conditions, types of tissue, genotype, or whatever)?
> >>>What is the objective of the study: differential expression? gene
> >>>expression clustering?  predicting tissue type?
> >>>
> >>>--Naomi Altman
> >>>
> >>>At 10:06 AM 2/3/2005, Heike Pospisil wrote:
> >>>
> >>>>Dear users,
> >>>>
> >>>>I am (nearly) a BioC beginner and hope someone could help me with my
> >>>>first analysis.
> >>>>I am looking for methods to select discriminating genes from a couple
> >>>>of cel-files using the following metrics: T-statistics, chi-square,
> >>>>Wilkins' and correlation-based feature selection. I would be glad to
> >>>>get some hints or links to some tutorials.
> >>>>
> >>>>Thanks in advance,
> >>>>Heike
> >>>>
> >>>>-- 
> >>>>Dr. Heike Pospisil
> >>>>Center for Bioinformatics, University of Hamburg
> >>>>Bundesstrasse 43, 20146 Hamburg, Germany
> >>>>phone: +49-40-42838-7303 fax: +49-40-42838-7312
> >>>>
> >>>>_______________________________________________
> >>>>Bioconductor mailing list
> >>>>Bioconductor at stat.math.ethz.ch
> >>>>https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>>
> >>>Naomi S. Altman                                814-865-3791 (voice)
> >>>Associate Professor
> >>>Bioinformatics Consulting Center
> >>>Dept. of Statistics                              814-863-7114 (fax)
> >>>Penn State University                         814-865-1348 (Statistics)
> >>>University Park, PA 16802-2111
> >>>
> 


