[R] filter data set unique, duplicate..

Sundar Dorai-Raj sundar.dorai-raj at pdf.com
Wed Aug 3 20:58:24 CEST 2005


Hi, Anders/Dimitris,

Dimitris Rizopoulos wrote:
> maybe you could consider something like this:
> 
> dat <- data.frame(x = c(1, 2, 2, 3, 3, 4),
>                   y1 = c(1, 1, 2, 1, 7, 8),
>                   y2 = c(NA, NA, NA, 5, 5, 4),
>                   y3 = c(3, 11, NA, 16, 2, 1))
> #############
> out <- as.data.frame(lapply(dat[-1], function(y, x) tapply(y, x, max, 
> na.rm = TRUE), x = dat["x"]))
> out[out == -Inf] <- NA
> out$x <- unique(dat["x"])

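Beware this line. tapply() returns its groups in sorted order of "x",
whereas unique() keeps the order in which the values first appear. If
"x" is not sorted in "dat", your rows will be misaligned.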
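Here is a minimal sketch of one way to avoid this, assuming the same "dat" as in Dimitris's code. It reads "x" back from the labels of the tapply() result instead of calling unique(). The names max.na and res are chosen here for illustration. max.na returns NA rather than -Inf for an all-NA group, so the out == -Inf step is no longer needed.

```r
dat <- data.frame(x  = c(1, 2, 2, 3, 3, 4),
                  y1 = c(1, 1, 2, 1, 7, 8),
                  y2 = c(NA, NA, NA, 5, 5, 4),
                  y3 = c(3, 11, NA, 16, 2, 1))

# Illustrative helper: like max(), but NA (not -Inf) for an all-NA group
max.na <- function(v) if (all(is.na(v))) NA else max(v, na.rm = TRUE)

# One result per y column; tapply() orders the groups by sorted x
res <- lapply(dat[-1], function(y) tapply(y, dat$x, max.na))

# Take x from the group labels so it always lines up with the maxima
out <- data.frame(x = as.numeric(names(res[[1]])),
                  lapply(res, as.vector))
out
```

With the example data this reproduces the first table in Anders's message. Because "x" comes from the result's own labels, the rows stay aligned even if "dat" is not sorted by "x".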

Here's another solution using "by", though it's no more efficient than
what Dimitris has given.

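out <- by(dat[-1], dat[1], function(y) {
   # like max(), but NA (not -Inf) when a column is all NA
   max.na <- function(x)
     if(all(is.na(x))) NA else max(x, na.rm = TRUE)
   # column-wise maximum within this group of rows
   apply(y, 2, max.na)
})
# stack the per-group results; the row names are the x values
out <- as.data.frame(do.call("rbind", out))
out <- cbind(x = as.numeric(row.names(out)), out)
out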
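For comparison, here is a minimal sketch using aggregate(), an alternative base R function that neither solution above uses, assuming the same "dat". aggregate() returns one row per sorted value of "x" with the group labels in their own "x" column, so there is nothing to realign afterwards. max.na is the same NA-preserving maximum as in the by() code.

```r
dat <- data.frame(x  = c(1, 2, 2, 3, 3, 4),
                  y1 = c(1, 1, 2, 1, 7, 8),
                  y2 = c(NA, NA, NA, 5, 5, 4),
                  y3 = c(3, 11, NA, 16, 2, 1))

# Same NA-preserving maximum as max.na in the by() solution
max.na <- function(v) if (all(is.na(v))) NA else max(v, na.rm = TRUE)

# Group the y columns by x; the grouping column comes back as "x"
out <- aggregate(dat[-1], by = dat["x"], FUN = max.na)
out
```

The data-frame method of aggregate() passes NA values through to FUN unchanged, which is why the NA-aware helper is still needed.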

HTH,

--sundar

> out
> 
> 
> I hope it helps.
> 
> Best,
> Dimitris
> 
> ----
> Dimitris Rizopoulos
> Ph.D. Student
> Biostatistical Centre
> School of Public Health
> Catholic University of Leuven
> 
> Address: Kapucijnenvoer 35, Leuven, Belgium
> Tel: +32/16/336899
> Fax: +32/16/337015
> Web: http://www.med.kuleuven.be/biostat/
>      http://www.student.kuleuven.be/~m0390867/dimitris.htm
> 
> 
> ----- Original Message ----- 
> From: "Anders Bjørgesæter" <anders.bjorgesater at bio.uio.no>
> To: <r-help at stat.math.ethz.ch>
> Sent: Wednesday, August 03, 2005 10:40 AM
> Subject: [R] filter data set unique, duplicate..
> 
> 
> 
>>Hello
>>
>>First, thanks for the help with an earlier question about error
>>handling!
>>
>>I have a problem filtering a dataset.
>>I'm trying to filter the data in the y columns based on the values
>>in the x column, e.g.:
>>
>>x      y1     y2     yn
>>1.0    1      NA     3
>>2.0    1      NA     11
>>2.0    2      NA     NA
>>3.0    1      5      16
>>3.0    7      5      2
>>4.0    8      4      1
>>
>>and want to keep the highest y in each column for rows with identical x,
>>like this:
>>
>>x      y1     y2     yn
>>1.0    1      NA     3
>>2.0    2      NA     11
>>3.0    7      5      16
>>4.0    8      4      1
>>
>>or just as good:
>>
>>x      y1     y2     yn
>>1.0    1      NA     3
>>2.0    NA*    NA     NA
>>2.0    2      NA     11
>>3.0    NA*    5      16
>>3.0    7      NA*    NA*
>>4.0    8      4      1
>>
>>If anyone has any suggestions or pointers on how to do this, I would
>>really appreciate it.
>>
>>/Anders
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide! 
>>http://www.R-project.org/posting-guide.html
>>



