[BioC] regarding package ArrayExpress

Thu Sep 10 15:33:52 CEST 2009

Hi Misha,

Thanks for the trick, it seems to work.
But this brings another problem: the identifiers in the array design file
and the row names in the expression file are now different. So maybe the
right question is: why do we have duplicate identifiers in the expression
file, and do we really want to read a file with those duplicates?
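
To make the mismatch concrete, here is a minimal sketch. It assumes a made-up
three-row table: the values, the column names probe_id and value, and the
id_map lookup are invented for illustration and are not taken from the real
dataset or from the package. It reads the identifiers as an ordinary column,
lists the duplicated ones, and keeps a lookup from the new row names back to
the original identifiers.

```r
# Hypothetical stand-in for a processed expression file with a repeated probe ID.
txt <- "R:A-MEXP-58:210099\t1.2
R:A-MEXP-58:210100\t0.7
R:A-MEXP-58:210099\t1.3"

# Read the identifiers as an ordinary column instead of as row names.
d <- read.table(text = txt, sep = "\t", header = FALSE,
                col.names = c("probe_id", "value"),
                stringsAsFactors = FALSE)

# Which identifiers occur more than once?
dup_ids <- unique(d$probe_id[duplicated(d$probe_id)])

# make.names() makes names syntactically valid as well as unique, so even
# the non-duplicated identifiers are altered (":" and "-" become ".").
new_ids <- make.names(d$probe_id, unique = TRUE)

# Keep a lookup from the new row names back to the original identifiers,
# so rows can still be matched against the array design file.
id_map <- data.frame(row_name = new_ids, probe_id = d$probe_id,
                     stringsAsFactors = FALSE)
rownames(d) <- new_ids
```

With such a lookup the rows can still be matched to the array design file,
but it does not say which of the duplicated rows is the one we want.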

Sorry, maybe we should continue this conversation off list.
To the people using the ArrayExpress package on processed data: I am
sorry for the problems that are still to be fixed. Thank you for your
feedback; it helps me identify the problems so that I can try to fix them.

Audrey

> Hi,
>
> Without tweaking read.table, you'd have to read row names as one of the
> data columns, then make.names on that set of names and set the row names
> to the modified ones. So, something like
>
> d <- read.table("foo.tab") ## if read.table("foo.tab", row.names=1) fails
> rownames(d) <- make.names(d[,1], unique=TRUE)
> d <- d[,-1, drop=FALSE]    ## to remove the column used; stays a data frame
>
> Whether these newly made "unique" row names are what you need is a good
> question... :)
>
> --Misha
>
> On Thu, 10 Sep 2009, audrey at ebi.ac.uk wrote:
>
>> Dear Amit,
>>
>> You are not making any mistakes. This is the proper way of calling the
>> functions to create an object from a processed dataset. However, the
>> problem comes from the dataset itself. It contains duplicate probe
>> identifiers as row names, which is not allowed by the function
>> read.table that is used in the procset function.
>> Unfortunately, I do not have an idea of how to prevent this. Does anyone
>> know how I could allow duplicate row names in my function?
>>
>> Best regards,
>> Audrey
>>
>> --
>> Audrey Kauffmann
>> EMBL - EBI
>> Cambridge UK
>> +44 (0) 1223 492 631
>> http://www.ebi.ac.uk/~audrey
>>
>>> Hello list,
>>>
>>> I am trying to build an object from ArrayExpress processed data using
>>> the Bioconductor package ArrayExpress. I did the following:
>>>
>>> CAGE99d = getAE("E-GAGE-99",type="processed")
>>> colname = getcolproc(CAGE99d)
>>> CAGE99p = procset(CAGE99d, colname[3])
>>>
>>> and I got the following error:
>>> Error in `row.names<-.data.frame`(`*tmp*`, value = c(6995L, 7017L, 7006L, :
>>>   duplicate 'row.names' are not allowed
>>> In addition: Warning message:
>>> non-unique values when setting 'row.names': ‘R:A-MEXP-58:210099’,
>>> ‘R:A-MEXP-58:210100’, ‘R:A-MEXP-58:210111’,
>>> ‘R:A-MEXP-58:210123’,‘R:A-MEXP-
>>> [... truncated]
>>>
>>> I am not able to figure out what mistake I am making. Please help!
>>> Amit
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor