[R] summing up column values for unique IDs when multiple IDs exist in data frame

Seth Falcon sfalcon at fhcrc.org
Tue May 29 23:47:38 CEST 2007


"Young Cho" <young.stat at gmail.com> writes:

> I have data.frames with IDs and multiple columns. Because some of the IDs
> show up more than once, I need to sum up column values to create a new
> data frame with unique IDs.
>
> I hope there are some cheaper ways of doing it...  Because the
> dataframe is huge, it takes almost an hour to do the task.  Thanks
> so much in advance!

Does this do what you want in a faster way?

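sum_dup <- function(df) {
    ## Group row indices by ID: one integer vector per unique ID.
    idIdx <- split(seq_len(nrow(df)), as.character(df$ID))
    ## Every column except ID gets summed.
    whID <- match("ID", names(df))
    colNms <- names(df)[-whID]
    ## For each column, sum its values within each ID group.
    ans <- lapply(colNms, function(cn) {
        col <- df[[cn]]
        unlist(lapply(idIdx,
                      function(x) sum(col[x])),
               use.names=FALSE)
    })
    ## Turn the list of columns into a data.frame by setting its
    ## attributes directly; the row names are the unique IDs.
    attributes(ans) <- list(names=colNms,
                            row.names=names(idIdx),
                            class="data.frame")
    ans
}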
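
For comparison, base R's rowsum() computes the same kind of per-group
column sums and may be worth timing against the function above. This is
only a sketch on a made-up data frame: the column names x and y and all
of the values are invented for illustration, and it assumes every
non-ID column is numeric.

```r
# Toy data: IDs "a" and "b" each appear twice (values are made up).
df <- data.frame(ID = c("b", "a", "b", "c", "a"),
                 x  = c(1, 2, 3, 4, 5),
                 y  = c(10, 20, 30, 40, 50),
                 stringsAsFactors = FALSE)

# Drop the ID column, then sum the remaining columns within each ID.
# rowsum() returns one row per unique ID, sorted by ID by default.
whID <- match("ID", names(df))
res <- rowsum(df[-whID], group = as.character(df$ID))
print(res)
```

As with sum_dup(), the unique IDs end up as the row names of the
result rather than as a column.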


-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org


