[BioC] genenames in KEGG2heatmap, GO2heatmap

James W. MacDonald jmacdon at med.umich.edu
Mon Aug 9 21:30:55 CEST 2010


Hi Dave,

Please don't take things off-list; the archives are intended to be a 
resource for others to query.


On 8/9/2010 12:09 PM, Dave Bridges wrote:
> Jim,
>
> When I try to load the new featurenames I get:
>
>> featureNames(mat)<- ifelse(!is.na(symbs), symbs, featureNames(mat))
>
> Error in `row.names<-.data.frame`(`*tmp*`, value = c("Copg", "Atp6v0d1",  :
>    duplicate 'row.names' are not allowed
> In addition: Warning message:
> non-unique values when setting 'row.names': '0610007L01Rik', '0610007P08Rik', '0610007P22Rik', '0610009D07Rik', '0610009O20Rik', '0610010F05Rik', '0610010K06Rik', '0610010K14Rik', '0610010O12Rik', '0610011F06Rik', '0610011L14Rik', '0610012G03Rik', '0610030E20Rik', '0610037L13Rik', '0610040B10Rik', '0610042E11Rik', '1110003E01Rik', '1110004E09Rik', '1110004M10Rik', '1110005A03Rik', '1110007A13Rik', '1110008F13Rik', '1110008P14Rik', '1110012D08Rik', '1110012J17Rik', '1110014N23Rik', '1110018G07Rik', '1110018J18Rik', '1110020G09Rik', '1110021J02Rik', '1110028C15Rik', '1110030E23Rik', '1110032A04Rik', '1110032F04Rik', '1110034A24Rik', '1110034G24Rik', '1110037F02Rik', '1110054O05Rik', '1110057K04Rik', '1110059G02Rik', '1110067D22Rik', '1110069O07Rik', '1190002H23Rik', '1190003J15Rik', '1190005F20Rik', '1200009O22Rik', '1200011I18Rik', '1200014J11Rik', '1200014M14Rik', '1200016B10Rik', '1300010F03Rik', '1300018I17Rik', '1500002K03Rik', '1500003O03Rik', '1500004A13Rik', '1500005C15Rik', '150 [... truncated]
>
> Is there a way to override this?

Good point; I wasn't thinking about the fact that you would be using all 
the probesets on a given chip. Several probesets often map to the same 
gene symbol, so the symbols are not unique. Row names have to be unique, 
and there isn't a way to override that restriction.
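
As a rough illustration of why the assignment fails, and of one possible
workaround if every probeset really has to stay on the plot, here is a
base R sketch. The symbols and probeset IDs are made up, and make.unique()
is a swapped-in alternative rather than the approach recommended in this
thread; it only appends suffixes so the labels become legal row names.

```r
# Toy stand-ins for the real vectors; these values are invented for illustration.
symbs <- c("Copg", "Atp6v0d1", "Copg", NA, "Copg")
ids   <- c("p1_at", "p2_at", "p3_at", "p4_at", "p5_at")

# Fall back to the probeset ID where no symbol is available.
labels <- ifelse(!is.na(symbs), symbs, ids)

# Repeated symbols are what trigger the 'duplicate row.names' error.
dups <- unique(labels[duplicated(labels)])

# make.unique() appends .1, .2, ... to repeats so every label is distinct.
uniq_labels <- make.unique(labels)
```

The suffixed labels (e.g. 'Copg.1') can be used as feature names, but the
heatmap will then show the same gene on several rows.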

The best way to handle this would be to use genefilter to restrict the 
data to one probeset per Entrez Gene ID (which I believe would give you 
unique gene symbols as well). You can then convert the remaining probe 
IDs to gene symbols without hitting duplicates.
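
To make the idea concrete, here is a minimal base R sketch of "one
probeset per gene" on a toy matrix. Everything in it (the matrix, the
probeset names, the gene mapping, and the choice of IQR as the score) is
an assumption for illustration. On a real ExpressionSet, genefilter's
nsFilter() with remove.dupEntrez = TRUE should do something along these
lines, but check its help page rather than relying on this sketch.

```r
# Toy expression matrix: rows are probesets, columns are samples (made-up values).
expr <- matrix(c(1, 2, 3, 4,
                 5, 5, 5, 6,
                 1, 9, 1, 9,
                 2, 2, 3, 3),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("p1_at", "p2_at", "p3_at", "p4_at"),
                               paste0("s", 1:4)))

# Hypothetical probeset-to-gene mapping; p1_at and p3_at hit the same gene.
gene <- c(p1_at = "GeneA", p2_at = "GeneB", p3_at = "GeneA", p4_at = NA)

# Drop probesets with no gene, then score each remaining row by its IQR.
keep  <- !is.na(gene[rownames(expr)])
expr  <- expr[keep, , drop = FALSE]
gene  <- gene[rownames(expr)]
score <- apply(expr, 1, IQR)

# For each gene, keep the row index of the probeset with the largest score.
best <- vapply(split(seq_along(gene), gene),
               function(i) i[which.max(score[i])],
               integer(1))

# The surviving rows now have one row per gene, so gene names are safe row names.
filtered <- expr[best, , drop = FALSE]
rownames(filtered) <- names(best)
```

Because each gene appears once in 'filtered', assigning the gene names as
row names no longer raises the duplicate 'row.names' error.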

Best,

Jim
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
> Sent: Monday, August 09, 2010 11:33 AM
> To: Dave Bridges
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] genenames in KEGG2heatmap, GO2heatmap
>
> Hi Dave,
>
> On 8/9/2010 11:03 AM, Dave Bridges wrote:
>> Is there a way to display gene names as opposed to affymetrix identifiers on the y axis of a heatmap?
>
> All you need to do is convert the Affy IDs to, e.g., gene symbols first.
> Here I assume you are using an HG-U133plus2 chip, and have an
> ExpressionSet called 'eset'.
>
> library(hgu133plus2.db)
> symbs<- mget(featureNames(eset), hgu133plus2SYMBOL, ifnotfound = NA)
>
> At this point you might want to check that you only get one symbol per
> probeset ID (IIRC, you should).
>
> table(sapply(symbs, length))
>
> If all are length one, let's proceed. If not, you might want to do
>
> symbs<- lapply(symbs, function(x) x[1])
>
> where we simply use the first symbol. At this point we can unlist.
>
> symbs<- unlist(symbs)
>
> Now, some of those Affy IDs might not have symbols, so you will have NA
> values there. In that case, we will just use the Affy IDs. You probably
> don't want to pollute your original ExpressionSet with these symbols, so
> let's make a copy and use that.
>
> mat<- eset
> featureNames(mat)<- ifelse(!is.na(symbs), symbs, featureNames(mat))
>
> et voila! Now just go ahead with your call to GO2heatmap() or whatever.
>
> Best,
>
> Jim
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826