[R] finding both rows that are duplicated in a data frame

arun smartpink111 at yahoo.com
Sat Sep 7 16:52:00 CEST 2013


Hi,
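# cbind() on these vectors gives a character matrix, so build a data frame instead
example <- data.frame(id1, id2, GENDER, ETH, stringsAsFactors = FALSE)

# drop every row with an UNK in GENDER or ETH, then drop exact duplicate rows
res <- unique(example[!(grepl("UNK", example$GENDER) | grepl("UNK", example$ETH)), ])
res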
#   id1 id2 GENDER  ETH
#1    1  22    G-M E-VT
#3    2  34    G-M E-AF
#5    3  15    G-M E-AF
#7    4  76    G-F E-VT
#8    5  45    G-F E-VT
#12   7  37    G-F E-AF
#13   8  52    G-F E-AF
#14   9  66    G-F E-AF
#16  10  91    G-F E-VT


The condition for id1 6 is unclear to me. Each of its two rows has an UNK in one column, so the filter above drops both, which leaves 9 rows. If I include both of them, nrow will be 11.

10   6  84  G-UNK  E-AF
11   6  84    G-F E-UNK
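
If the intent is instead to keep one row per id pair and take, column by column, whichever value is not UNK, id1 6 would collapse to a single row (G-F, E-AF) and the result would have 10 rows. Below is a minimal sketch of that reading, not a definitive answer. The GENDER and ETH values are fixed illustrative data, not the original random draw: rows 2, 4, 6, 9 and 15 are assumed, the others are taken from the rows shown above. The helper name pick_known is hypothetical.

```r
id1 <- c(1,1,2,2,3,3,4,5,5,6,6,7,8,9,9,10)
id2 <- c(22,22,34,34,15,15,76,45,45,84,84,37,52,66,66,91)
# Illustrative values only: rows 2, 4, 6, 9 and 15 are assumed, not from the post
GENDER <- c("G-M","G-UNK","G-M","G-M","G-M","G-M","G-F","G-F",
            "G-UNK","G-UNK","G-F","G-F","G-F","G-F","G-F","G-F")
ETH <- c("E-VT","E-VT","E-AF","E-UNK","E-AF","E-AF","E-VT","E-VT",
         "E-VT","E-AF","E-UNK","E-AF","E-AF","E-AF","E-UNK","E-VT")
example <- data.frame(id1, id2, GENDER, ETH, stringsAsFactors = FALSE)

# Hypothetical helper: first value without "UNK",
# or the first value if every entry is unknown
pick_known <- function(x) {
  known <- x[!grepl("UNK", x)]
  if (length(known) > 0) known[1] else x[1]
}

# Collapse each (id1, id2) pair to one row, choosing GENDER and ETH separately
res2 <- aggregate(example[c("GENDER", "ETH")],
                  by = example[c("id1", "id2")],
                  FUN = pick_known)
res2 <- res2[order(res2$id1), ]

nrow(res2)             # 10 with these illustrative values
res2[res2$id1 == 6, ]  # the single merged row for id1 6
```

If two rows for the same id carry different known values (say G-M and G-F), this sketch simply keeps the first one, so such conflicts would need a separate check.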


A.K.



----- Original Message -----
From: Robert Lynch <robert.b.lynch at gmail.com>
To: R help <r-help at r-project.org>
Cc: 
Sent: Saturday, September 7, 2013 3:02 AM
Subject: [R] finding both rows that are duplicated in a data frame

I have a data frame that looks like

id1<-c(1,1,2,2,3,3,4,5,5,6,6,7,8,9,9,10)
id2<-c(22,22,34,34,15,15,76,45,45,84,84,37,52,66,66,91)
GENDER<-sample(c("G-UNK","G-M","G-F"),16, replace = TRUE)
ETH <-sample(c("E-AF","E-UNK","E-VT"),16, replace = TRUE)
example<-cbind(id1,id2,GENDER,ETH)

where there are two IDs, and some IDs have duplicate entries that differ in
GENDER or ETH(nicity).
I would like to get a data frame without the duplicates, where the entry
that is kept has whichever GENDER is not G-UNK (unknown) and whichever ETH
is not E-UNK.

The resulting data frame should have 10 rows with no *-UNK in either of the
last two columns (unless both entries were UNK).
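
As a quick check of the description above, the rows belonging to a duplicated id pair can be flagged with duplicated() run in both directions, so that both members of each pair are marked rather than only the second one. This is a sketch using only the id vectors from this message; the names ids and is_dup are illustrative.

```r
id1 <- c(1,1,2,2,3,3,4,5,5,6,6,7,8,9,9,10)
id2 <- c(22,22,34,34,15,15,76,45,45,84,84,37,52,66,66,91)
ids <- data.frame(id1, id2)

# duplicated() alone marks only the later occurrence of each pair;
# combining it with fromLast = TRUE marks the earlier occurrence as well
is_dup <- duplicated(ids) | duplicated(ids, fromLast = TRUE)

which(is_dup)      # row numbers that belong to a duplicated id pair
sum(is_dup)        # 12 rows, i.e. 6 id pairs that appear twice
nrow(unique(ids))  # 10 distinct id pairs, the expected size of the result
```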

Yes, the example data may contain some impossible combinations, but it does
capture the important aspects:
1) G-UNK is alphabetically last of G-F, G-M & G-UNK
2) E-UNK is in the middle alphabetically
3) sometimes the first entry has the unknown gender, sometimes the second
(likely to happen with a random sample)
4) sometimes both entries for one variable, GENDER or ETH, are unknown
5) there appear to be only two of each row (not 100% sure)

Thanks!
Robert




