[R] how to deduplicate records, e.g. using melt() and cast()

Jan van der Laan rhelp at eoos.dds.nl
Mon May 7 13:49:35 CEST 2012


using reshape:

library(reshape)
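# na.rm = TRUE drops the NA rows while melting
m <- melt(my.df, id.vars = "pathway", na.rm = TRUE)
# fill = NA puts NA back where a pathway has no value for a condition
cast(m, pathway ~ variable, sum, fill = NA)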
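
For reference, here is a self-contained sketch of the same idea, run against the example data from the original question quoted below. The base R cross-check borrows the NA-preserving helper from Dimitris's reply; the names sum_or_na and agg are only illustrative, and the reshape part is guarded in case the package is not installed.

```r
# Example data copied from the original question quoted below.
my.df <- data.frame(
  pathway    = c(rep("pw.A", 2), rep("pw.B", 3), rep("pw.C", 1)),
  cond.one   = c(0.5, NA, 0.4, NA, NA, NA),
  cond.two   = c(NA, 0.6, NA, 0.9, NA, 0.2),
  cond.three = c(NA, NA, NA, NA, 0.1, NA)
)

# Base R cross-check. sum_or_na is a hypothetical helper name: it returns NA
# when a group holds only missing values, otherwise the sum of the rest.
sum_or_na <- function(x) if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
agg <- aggregate(my.df[-1], my.df["pathway"], sum_or_na)
print(agg)

# reshape version, run only if the package is available.
if (requireNamespace("reshape", quietly = TRUE)) {
  library(reshape)
  # Long format: one row per non-missing (pathway, condition) value.
  m <- melt(my.df, id.vars = "pathway", na.rm = TRUE)
  # Back to wide format: sum within each cell, NA for empty cells.
  print(cast(m, pathway ~ variable, sum, fill = NA))
}
```

Both results should have one row per pathway and match wanted.df from the question. Since each pathway has at most one non-missing value per condition here, sum just picks that value up; if real data can hold several values per cell, sum adds them, which may or may not be what is wanted.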

Jan


On 05/07/2012 12:30 PM, Karl Brand wrote:
> Dimitris, Petra,
>
> Thank you! aggregate() is my lesson for today, not melt() | cast()
>
> Really appreciate the super fast help,
>
> Karl
>
> On 07/05/12 12:09, Dimitris Rizopoulos wrote:
>> you could try aggregate(), e.g.,
>>
>> my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
>> rep("pw.C", 1)),
>> cond.one = c(0.5, NA, 0.4, NA, NA, NA),
>> cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
>> cond.three = c(NA, NA, NA, NA, 0.1, NA))
>>
>>
>> aggregate(my.df[-1], my.df['pathway'], sum, na.rm = TRUE)
>>
>> or, to get NA rather than 0 where a pathway has no values at all for a
>> condition:
>>
>> sum. <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
>> aggregate(my.df[-1], my.df['pathway'], sum.)
>>
>>
>> I hope it helps.
>>
>> Best,
>> Dimitris
>>
>>
>> On 5/7/2012 11:50 AM, Karl Brand wrote:
>>> Esteemed UseRs,
>>>
>>> This must be embarrassingly trivial to achieve with e.g., melt() and
>>> cast(): deduplicating records ("pw.X" in example) for a given set of
>>> responses ("cond.Y" in example).
>>>
>>> Hopefully the runnable example shows clearly what i have and what i'm
>>> trying to convert it to. But i'm just not getting it, ?cast that is! So
>>> i'd really appreciate someone's patience to clarify this, using the
>>> reshape package, or any other approach.
>>>
>>> With sincere thanks in advance,
>>>
>>> Karl
>>>
>>>
>>> ## Runnable example
>>> ## The data.frame i have:
>>> library("reshape")
>>> my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
>>> rep("pw.C", 1)),
>>> cond.one = c(0.5, NA, 0.4, NA, NA, NA),
>>> cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
>>> cond.three = c(NA, NA, NA, NA, 0.1, NA))
>>> my.df
>>> ## The data frame i want:
>>> wanted.df <- data.frame(pathway = c("pw.A", "pw.B", "pw.C"),
>>> cond.one = c(0.5, 0.4, NA),
>>> cond.two = c(0.6, 0.9, 0.2),
>>> cond.three = c(NA, 0.1, NA))
>>> wanted.df
>>>
>>>
>>
>


