[R] how to deduplicate records, e.g. using melt() and cast()

Dimitris Rizopoulos d.rizopoulos at erasmusmc.nl
Mon May 7 12:09:33 CEST 2012


You could try aggregate(), e.g.,

my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
                                rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))


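# note: sum() with na.rm = TRUE returns 0 for a group whose values are all NA
aggregate(my.df[-1], my.df['pathway'], sum, na.rm = TRUE)

or, to keep NA for groups that are entirely NA (as in your wanted.df),

# NA if every value in the group is missing, otherwise the sum of the rest
sum. <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
aggregate(my.df[-1], my.df['pathway'], sum.)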
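
As a quick check, here is a minimal sketch that runs both calls on the example data and compares them with the values in wanted.df. The object names res.zero and res.na are introduced here purely for illustration and are not from the original post.

```r
# Example data from the thread.
my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
                                rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))

# Variant 1: plain sum(); groups that are entirely NA come back as 0.
res.zero <- aggregate(my.df[-1], my.df['pathway'], sum, na.rm = TRUE)

# Variant 2: NA-preserving sum; groups that are entirely NA stay NA.
sum. <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
res.na <- aggregate(my.df[-1], my.df['pathway'], sum.)

res.zero$cond.three   # 0.0 0.1 0.0
res.na$cond.three     # NA 0.1 NA

# Compare variant 2 column by column with the values in wanted.df.
all.equal(res.na$cond.one,   c(0.5, 0.4, NA))
all.equal(res.na$cond.two,   c(0.6, 0.9, 0.2))
all.equal(res.na$cond.three, c(NA, 0.1, NA))
```

The two variants differ only in groups where a condition has no observed value at all, so which one to use depends on whether 0 or NA is the right answer there. Both assume each pathway has at most one non-missing value per condition; with several, the values would be added together.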


I hope it helps.

Best,
Dimitris


On 5/7/2012 11:50 AM, Karl Brand wrote:
> Esteemed UseRs,
>
> This must be embarrassingly trivial to achieve with e.g., melt() and
> cast(): deduplicating records ("pw.X" in example) for a given set of
> responses ("cond.Y" in example).
>
> Hopefully the runnable example shows clearly what I have and what I'm
> trying to convert it to. But I'm just not getting it, ?cast that is! So
> I'd really appreciate someone's patience to clarify this, using the
> reshape package, or any other approach.
>
> With sincere thanks in advance,
>
> Karl
>
>
> ## Runnable example
> ## The data.frame I have:
> library("reshape")
> my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
> rep("pw.C", 1)),
> cond.one = c(0.5, NA, 0.4, NA, NA, NA),
> cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
> cond.three = c(NA, NA, NA, NA, 0.1, NA))
> my.df
> ## The data frame I want:
> wanted.df <- data.frame(pathway = c("pw.A", "pw.B", "pw.C"),
> cond.one = c(0.5, 0.4, NA),
> cond.two = c(0.6, 0.9, 0.2),
> cond.three = c(NA, 0.1, NA))
> wanted.df
>
>

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/


