[BioC] regarding package ArrayExpress

Thu Sep 10 18:04:41 CEST 2009

Tim Rayner wrote:
> Hi,
> 
> This would work assuming the featureData is kept synchronised with the
> assayData. I guess the alternative would be to take the mean or median of
> the duplicated reporters, which might be more useful in some cases.
> Perhaps that could be added as an option? I know quite a few
> custom-printed arrays had duplicated reporter identifiers such as
> these; it should be less of a problem for the commercial arrays.

This comes up fairly regularly when using ExpressionSets with custom
arrays. The rationale for requiring unique row names (and consequently
featureNames) is that accepting non-unique names would force some kind of
software 'decision', e.g., that reporters with the same id should be
averaged, or that their names should be mangled. There doesn't seem to be a
universally right answer, so in my own work I usually put duplicate
reporter names into a column of featureData, and leave the rows
un-named. I then have to think explicitly about what to do with the
duplicates, at each stage of the analysis where this is important.
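
A minimal sketch of that workflow in base R, assuming a tab-delimited
file whose first column holds the reporter ids. The toy data and the
column names (reporter, s1, s2) are made up for illustration, and taking
the median is only one possible choice, not a recommendation:

```r
# Toy stand-in for a processed-data file with one duplicated reporter id
# (hypothetical values, for illustration only).
lines <- c("reporter\ts1\ts2",
           "R:A-MEXP-58:210099\t1\t2",
           "R:A-MEXP-58:210100\t10\t20",
           "R:A-MEXP-58:210099\t5\t6")

# Read WITHOUT row.names=1, so the ids stay in an ordinary column and the
# rows keep their automatic names "1", "2", "3".
d <- read.delim(text = lines, stringsAsFactors = FALSE)

# Make the duplicates visible instead of resolving them silently.
dup_ids <- unique(d$reporter[duplicated(d$reporter)])

# One explicit decision among several: collapse duplicates by median.
collapsed <- aggregate(d[, c("s1", "s2")],
                       by = list(reporter = d$reporter),
                       FUN = median)

# The ids are unique after collapsing, so they are now valid row names.
rownames(collapsed) <- collapsed$reporter
```

Replacing median with mean gives the other variant suggested above; the
point is only that the choice is written down in the analysis script.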

The problem with this for ArrayExpress is that the appropriate column of
featureData is an ad-hoc convention ('column X of featureData') rather
than enforced by the software.
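
For the name-mangling route, it may help to see what base R actually
does to ids like the ones in the warning message. This is only a sketch
comparing make.names and make.unique on three example ids; which of the
two, if either, suits a given analysis is an open question:

```r
# Reporter ids in the style of the warning message, one of them duplicated.
ids <- c("R:A-MEXP-58:210099", "R:A-MEXP-58:210100", "R:A-MEXP-58:210099")

# make.names() first turns each id into a syntactically valid R name, so
# ':' and '-' both become '.', and then de-duplicates with a numeric suffix.
mangled <- make.names(ids, unique = TRUE)
# "R.A.MEXP.58.210099" "R.A.MEXP.58.210100" "R.A.MEXP.58.210099.1"

# make.unique() only appends the suffix and leaves the original text alone.
suffixed <- make.unique(ids)
# "R:A-MEXP-58:210099" "R:A-MEXP-58:210100" "R:A-MEXP-58:210099.1"

# Either result is unique, so either can be used as row names.
stopifnot(!anyDuplicated(mangled), !anyDuplicated(suffixed))
```

In both cases the suffixed name no longer matches the reporter id in the
array design, so the original ids would still need to be kept somewhere.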

Martin

> 
> Cheers,
> 
> Tim
> 
> 
> 2009/9/10 Misha Kapushesky <ostolop at ebi.ac.uk>:
>> Hi,
>>
>> Without tweaking read.table, you'd have to read the row names as one of the
>> data columns, then call make.names on that set of names and set the row
>> names to the modified ones. So, something like
>>
>> d <- read.table("foo.tab") ## if read.table("foo.tab", row.names=1) fails
>>
>> rownames(d) <- make.names(d[,1], unique=TRUE)
>>
>> d <- d[,-1]                ## to remove the column used
>>
>> Whether these newly made "unique" row names are what you need is a good
>> question... :)
>>
>> --Misha
>>
>> On Thu, 10 Sep 2009, audrey at ebi.ac.uk wrote:
>>
>>> Dear Amit,
>>>
>>> You are not making any mistakes. This is the proper way of calling the
>>> functions to create an object from a processed dataset. However, the
>>> problem comes from the dataset itself. It contains duplicate probe
>>> identifiers as row names, which are not allowed by the read.table
>>> function that is used in the procset function.
>>> Unfortunately, I do not know how to prevent this. Does someone
>>> know how I could allow duplicate row names in my function?
>>>
>>> Best regards,
>>> Audrey
>>>
>>> --
>>> Audrey Kauffmann
>>> EMBL - EBI
>>> Cambridge UK
>>> +44 (0) 1223 492 631
>>> http://www.ebi.ac.uk/~audrey
>>>
>>>> Hello list,
>>>>
>>>> I am trying to build an object from ArrayExpress processed data using
>>>> the Bioconductor package ArrayExpress. I did the following:
>>>>
>>>> CAGE99d = getAE("E-GAGE-99",type="processed")
>>>> colname = getcolproc(CAGE99d)
>>>> CAGE99p = procset(CAGE99d, colname[3])
>>>>
>>>> and I got the following error:
>>>> Error in `row.names<-.data.frame`(`*tmp*`, value = c(6995L, 7017L, 7006L,  :
>>>>   duplicate 'row.names' are not allowed
>>>> In addition: Warning message:
>>>> non-unique values when setting 'row.names': 'R:A-MEXP-58:210099',
>>>> 'R:A-MEXP-58:210100', 'R:A-MEXP-58:210111',
>>>> 'R:A-MEXP-58:210123','R:A-MEXP-
>>>> [... truncated]
>>>>
>>>> I am not able to figure out what mistake I am making. Please help!
>>>> Amit
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor