[R] how to deduplicate records, e.g. using melt() and cast()

Karl Brand k.brand at erasmusmc.nl
Tue May 8 14:10:38 CEST 2012


Fantastic, Jan,

Thanks a lot for the example of how I can achieve this with melt()/cast().
It's very helpful for my understanding of these functions.

Karl


On 07/05/12 13:49, Jan van der Laan wrote:
> using reshape:
>
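> library(reshape)
> # long format: one row per (pathway, variable, value); NA values dropped
> m <- melt(my.df, id.vars = "pathway", na.rm = TRUE)
> # back to wide: one row per pathway, duplicates summed, empty cells NA
> cast(m, pathway ~ variable, sum, fill = NA)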
>
> Jan
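For readers without the reshape package, the same melt-then-cast idea can be sketched in base R, assuming the my.df example from the original post: stack() plays the role of melt() (one value per row) and tapply() plays the role of cast() (one cell per pathway and condition, duplicates summed). This is only an illustrative analogue, not what reshape does internally; `long` and `wide` are hypothetical names, and the result is a matrix rather than a data frame.

```r
# Example data from the original post
my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3), rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))

# "melt": stack() stacks the condition columns one after another into
# (values, ind) pairs, so pathway is repeated once per condition column
long <- data.frame(pathway = rep(my.df$pathway, times = ncol(my.df) - 1),
                   stack(my.df[-1]))
long <- long[!is.na(long$values), ]   # analogue of na.rm = TRUE

# "cast": one cell per (pathway, condition); duplicates are summed and
# combinations with no values are left as NA (tapply's default fill)
wide <- tapply(long$values, long[c("pathway", "ind")], sum)
wide
```

Dropping the NA rows before summing is what keeps empty cells as NA here; summing with na.rm = TRUE instead would turn them into 0.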
>
>
> On 05/07/2012 12:30 PM, Karl Brand wrote:
>> Dimitris, Petra,
>>
>> Thank you! aggregate() is my lesson for today, not melt()/cast().
>>
>> Really appreciate the super fast help,
>>
>> Karl
>>
>> On 07/05/12 12:09, Dimitris Rizopoulos wrote:
>>> you could try aggregate(), e.g.,
>>>
>>> my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3), rep("pw.C", 1)),
>>>                     cond.one = c(0.5, NA, 0.4, NA, NA, NA),
>>>                     cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
>>>                     cond.three = c(NA, NA, NA, NA, 0.1, NA))
>>>
>>>
>>> # note: an all-NA group sums to 0 here (e.g. pw.A/cond.three), not NA
>>> aggregate(my.df[-1], my.df['pathway'], sum, na.rm = TRUE)
>>>
>>> or
>>>
>>> # like sum(), but returns NA when every value in the group is NA
>>> sum. <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
>>> aggregate(my.df[-1], my.df['pathway'], sum.)
>>>
>>>
>>> I hope it helps.
>>>
>>> Best,
>>> Dimitris
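To see which of the two aggregate() calls above reproduces wanted.df from the original post, one can compare them column by column with all.equal(). A minimal sketch, assuming a current R version (>= 4.0, so pathway stays character); `res.sum`, `res.na`, `same.sum` and `same.na` are illustrative names.

```r
my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3), rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))
wanted.df <- data.frame(pathway = c("pw.A", "pw.B", "pw.C"),
                        cond.one = c(0.5, 0.4, NA),
                        cond.two = c(0.6, 0.9, 0.2),
                        cond.three = c(NA, 0.1, NA))

# NA-preserving sum, as in the reply above
sum. <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)

res.sum <- aggregate(my.df[-1], my.df["pathway"], sum, na.rm = TRUE)
res.na  <- aggregate(my.df[-1], my.df["pathway"], sum.)

# TRUE where a column of the result matches the corresponding column of wanted.df
same.sum <- sapply(names(wanted.df),
                   function(v) isTRUE(all.equal(res.sum[[v]], wanted.df[[v]])))
same.na  <- sapply(names(wanted.df),
                   function(v) isTRUE(all.equal(res.na[[v]], wanted.df[[v]])))
same.sum  # cond.one and cond.three differ: their all-NA cells became 0
same.na   # every column matches
```

Only cond.two has a non-NA value in every pathway, which is why it is the one condition where the plain sum() version already agrees with wanted.df.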
>>>
>>>
>>> On 5/7/2012 11:50 AM, Karl Brand wrote:
>>>> Esteemed UseRs,
>>>>
>>>> This must be embarrassingly trivial to achieve with, e.g., melt() and
>>>> cast(): deduplicating records ("pw.X" in the example) for a given set of
>>>> responses ("cond.Y" in the example).
>>>>
>>>> Hopefully the runnable example shows clearly what I have and what I'm
>>>> trying to convert it to. But I'm just not getting it, ?cast that is! So
>>>> I'd really appreciate someone's patience to clarify this, using the
>>>> reshape package or any other approach.
>>>>
>>>> With sincere thanks in advance,
>>>>
>>>> Karl
>>>>
>>>>
>>>> ## Runnable example
>>>> ## The data.frame I have:
>>>> library("reshape")
>>>> my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3), rep("pw.C", 1)),
>>>>                     cond.one = c(0.5, NA, 0.4, NA, NA, NA),
>>>>                     cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
>>>>                     cond.three = c(NA, NA, NA, NA, 0.1, NA))
>>>> my.df
>>>> ## The data.frame I want:
>>>> wanted.df <- data.frame(pathway = c("pw.A", "pw.B", "pw.C"),
>>>>                         cond.one = c(0.5, 0.4, NA),
>>>>                         cond.two = c(0.6, 0.9, 0.2),
>>>>                         cond.three = c(NA, 0.1, NA))
>>>> wanted.df
>>>>
>>>>
>>>
>>
>
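Summing only deduplicates correctly because, in this example, each pathway has at most one non-NA value per condition; if two rows ever disagreed, sum() would silently add them. A slightly stricter variant (a different technique from the summing used in the replies) is to keep the single non-NA value and stop with an error when a group has more than one. `first_non_na` is a hypothetical helper written for this sketch, not part of base R or reshape.

```r
my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3), rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))

# Collapse a group to its single non-NA value: NA if there is none,
# an error if there are several (i.e. the rows are not true duplicates)
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) > 1L) stop("more than one non-NA value in a group")
  if (length(x) == 0L) NA_real_ else x
}

dedup <- aggregate(my.df[-1], my.df["pathway"], first_non_na)
dedup
```

On the example data this gives the same table as wanted.df; adding a conflicting row (say a second non-NA cond.one value for pw.A) makes the call fail instead of returning a sum.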

-- 
Karl Brand
Dept of Cardiology and Dept of Bioinformatics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
T +31 (0)10 703 2460 |M +31 (0)642 777 268 |F +31 (0)10 704 4161



More information about the R-help mailing list