[BioC] geneName in GEOquery package

Thu Jan 12 20:08:45 CET 2006

On 1/12/06 1:58 PM, "Ting-Yuan Liu" <tliu at fhcrc.org> wrote:

> 
> Hi Sean,
> 
> I noticed that you made some modifications in GEOquery to handle the
> geneNames in the transformed exprSets.  I am really glad to see this
> improvement, but I think there is still a bug in the geneNames.  For example,
> 
>> library(GEOquery)
>> 
>> gds82 <- getGEO("GDS82")
> trying URL 'ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_gz/GDS82.soft.gz'
> ftp data connection made, file length 98375 bytes
> opened URL
> ==================================================
> downloaded 96Kb
> 
> File stored at: 
> /tmp/RtmpY010FQ/GDS82.soft.gz
> parsing geodata
> parsing subsets
> ready to return
>> gds82eSet <- GDS2eSet(gds82, do.log2=FALSE)
>> head(geneNames(gds82eSet), 20)
>  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
> [16] "16" "17" "18" "19" "20"
> 
> This is not quite right.  I think you used the first column of the data
> table as the geneNames, but I think it is supposed to be the second
> column:

Ting-Yuan

There is a problem with using the IDENTIFIER column: its values are not
guaranteed to be unique, whereas the geneNames of an exprSet must be unique.
ID_REF, on the other hand, is unique and, for the typical Affymetrix GDS,
contains the Affymetrix probeset IDs; that is the reason for using it rather
than the IDENTIFIER column.  If you know that the IDENTIFIER column IS unique
and would rather use that, it is pretty simple to do so:

 geneNames(gds82eSet) <- Table(gds82)$IDENTIFIER
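
If you are not sure whether the IDENTIFIER values are unique, a check along
the following lines may help before doing the assignment above.  This is only
a minimal sketch in base R: the data frame "tab" is a made-up stand-in for
Table(gds82), its values are invented for illustration, and falling back to
make.unique() is just one possible way to deal with duplicates, not something
GEOquery does for you.

```r
# Hypothetical stand-in for Table(gds82); the values are invented.
tab <- data.frame(
  ID_REF     = c("1", "2", "3", "4"),
  IDENTIFIER = c("geneA", "geneB", "geneA", "geneC"),
  stringsAsFactors = FALSE
)

ids <- as.character(tab$IDENTIFIER)

# anyDuplicated() returns 0 when all values are unique, otherwise the
# index of the first duplicated value.
if (anyDuplicated(ids) == 0) {
  gene_names <- ids
} else {
  # make.unique() appends ".1", ".2", ... to repeated values so that the
  # result can be used where unique names are required.
  gene_names <- make.unique(ids)
}

gene_names
```

With the real data, you would replace "tab" by Table(gds82) and then assign
the result with geneNames(gds82eSet) <- gene_names.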

I hope that solves your problem.

Sean