[BioC] subsetting the genes for cluster

Mark Cowley m.cowley at garvan.org.au
Fri Sep 5 06:07:39 CEST 2008


On 05/09/2008, at 5:06 AM, Sean Davis wrote:
> On Thu, Sep 4, 2008 at 10:59 AM, Abhilash Venu <abhivenu at gmail.com> wrote:
>> On Thu, Sep 4, 2008 at 5:21 AM, Mark Cowley <m.cowley at garvan.org.au> wrote:
>>
>>> Hi Abhilash,
>>>
>>> On 02/09/2008, at 11:09 PM, Abhilash Venu wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am working on single-color expression data using limma. I would like to
>>>> perform a cluster analysis after selecting the differentially expressed
>>>> genes based on the P value (say 0.001). As far as I know, I have to
>>>> subset the normalized data (MA) to these selected genes to retrieve
>>>> their distribution across the samples.
>>>>
>>> That's correct
>>
>>
>>
>>> Thank you, Mark. But I am quite confused here. Our collaborator has
>>> already performed a single-color analysis on the Agilent platform. When I
>>> performed clustering using the method I mentioned, the color key showed
>>> only positive values (as all the values are positive if I choose values
>>> from MA). Our collaborator feels that this is quite unusual, because green
>>> usually represents down-regulation. Could you suggest how I should go
>>> about it?
Part of this confusion stems from your non-standard use of 'MA' (I've
checked your past posts to work this out). 'MA' implies two-colour data,
where the M-values (the log-ratios) are the quantity of interest. You are
dealing with single-colour data, so I assume that by MA you are referring
to the A-values, but I'm not sure how limma handles this in the way you
have used it. When dealing with single-colour data, my clear preference is
not to use two-colour data objects. However, I assume you have been able
to identify and subset this data, since you sent your previous reply to
the list, so let's move on.

Back to your confusion: your collaborator is right. The vast majority of
cluster heatmaps show RELATIVE expression, not absolute expression.
If you 'mean-correct' your absolute (log-scale) expression data, that is,
subtract each gene's mean across the samples, you convert it to ratios
relative to that mean, and heatmap.2 should then give you a sensible picture.
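A minimal sketch of the 'mean correct' step in base R. It assumes `expr` is a genes x samples matrix of log2-scale intensities (for example, the A-values already subset to the DE genes); the matrix here is simulated purely for illustration.

```r
# Simulated log2 expression matrix: 5 genes x 4 samples (illustrative only)
set.seed(1)
expr <- matrix(rnorm(20, mean = 8, sd = 1), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))

# Absolute log2 intensities: every value is positive, so a colour key
# built from them has no "down" colour
summary(as.vector(expr))

# Mean-correct each gene (row): subtract that gene's mean across samples.
# On the log scale this yields log-ratios relative to the gene's average:
# below-average samples become negative, above-average ones positive.
rel <- sweep(expr, 1, rowMeans(expr))

# Each row now averages to (numerically) zero
round(rowMeans(rel), 10)
```

Plotting `rel` instead of `expr` gives a colour key that spans both negative and positive values, which is what your collaborator expects to see.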

I agree with Sean (which I seem to be doing a lot recently) that you
need to improve your basic R usage. The links Sean provided are a great
place to start, as is 'R for Beginners' by Paradis.

cheers, Mark

>>>
>
> Did you use heatmap.2 to do the heatmap?  If so, there is an argument
> "scale" that might be useful.  For ALL functions that are new, I would
> advise reading the whole help page, as there is often very useful
> information there.
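The `scale` argument Sean mentions does the centring for you. With `scale = "row"`, both base R's `stats::heatmap` and gplots' `heatmap.2` subtract each row's mean and divide by its standard deviation before assigning colours. The sketch below computes that row z-score matrix explicitly so you can check what is actually being plotted. The data are simulated, and the plotting calls are left as comments so the snippet runs without opening a graphics device.

```r
# Simulated log2 expression matrix: 6 genes x 4 samples (illustrative only)
set.seed(2)
expr <- matrix(rnorm(24, mean = 8, sd = 1), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("s", 1:4)))

# Row z-scores: scale() works on columns, so transpose, scale, transpose back
z <- t(scale(t(expr)))

# The same computation written out with sweep():
# centre each row on its mean, then divide by the row's standard deviation
z2 <- sweep(expr, 1, rowMeans(expr))
z2 <- sweep(z2, 1, apply(expr, 1, sd), "/")
all.equal(z, z2, check.attributes = FALSE)

# Plotting with row scaling applied internally:
# heatmap(expr, scale = "row")
# gplots::heatmap.2(expr, scale = "row")
```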
>
>>>>
>>>> But I am wondering whether I can perform this using the R script.
>>>>
>>> Can you elaborate on "using the R script"?
>> I was not sure about the R script for subsetting, so I did it using Python.
>
> You can try help.search('subset') as a start. RSiteSearch is also
> useful for searching for answers.
>
> You will likely benefit from reading:
>
> http://cran.r-project.org/doc/manuals/R-intro.html
>
> And potentially from:
>
> http://biostat-09.berkeley.edu/~bullard/courses/T-berkeley-08/resources/R_intro_easy.pdf
>
>>>>
>>>> I would appreciate any help.
>>>>
>>> You need two things: the names of the DE genes, and the normalised data.
>>> Get the DE genes from your toptable, and the normalised data from within
>>> your MA object (hint: names(MA)).
>>> Then subset the normalised data to just the rows for the DE genes, and
>>> perform the cluster analysis. There are a large number of ways of doing
>>> this. To get you started, have a look at heatmap.2 from the package
>>> gplots. Others include the built-in
>>> hclust( dist( yourDEdata ) )
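A minimal end-to-end sketch of this recipe in base R, doing in R the subsetting that was otherwise done in Python. It makes two assumptions: (a) a results table with a gene-ID column and a P-value column, standing in for the toptable output, and (b) a genes x samples matrix of normalised log2 expression whose row names are those same IDs. Both objects are simulated here. The column names `ID` and `P.Value` are assumptions about how your table is laid out, so check them against your own output.

```r
n_genes <- 50
ids <- paste0("gene", seq_len(n_genes))

# (b) Simulated normalised log2 expression: genes x 6 samples
set.seed(3)
norm_expr <- matrix(rnorm(n_genes * 6, mean = 8), nrow = n_genes,
                    dimnames = list(ids, paste0("s", 1:6)))

# (a) Simulated results table standing in for toptable output;
#     the first 5 genes are made "significant" by construction
tt <- data.frame(ID = ids,
                 P.Value = c(rep(1e-4, 5), rep(0.5, n_genes - 5)),
                 stringsAsFactors = FALSE)

# 1. Names of the DE genes at the chosen cut-off (0.001, as in the question)
de_ids <- tt$ID[tt$P.Value < 0.001]

# 2. Subset the normalised data to those rows, matching on row names
de_expr <- norm_expr[rownames(norm_expr) %in% de_ids, , drop = FALSE]

# 3. Cluster the genes: Euclidean distances between rows, then hclust
hc <- hclust(dist(de_expr))

# plot(hc) draws the dendrogram; gplots::heatmap.2(de_expr) draws a heatmap
```

Indexing with `norm_expr[de_ids, ]` also works and keeps the rows in toptable order. The `%in%` form is used here because it silently skips any ID that is missing from the matrix instead of raising an error.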
>>>
>>> cheers,
>>> Mark
>>>
>>> -----------------------------------------------------
>>> Mark Cowley, BSc (Bioinformatics)(Hons)
>>>
>>> Peter Wills Bioinformatics Centre
>>> Garvan Institute of Medical Research, Sydney, Australia
>>> -----------------------------------------------------
>>>
>>>
>>
>>
>> --
>>
>> Regards,
>> Abhilash
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>


