[R] identify duplicate from more than one column

Joshua Wiley jwiley.psych at gmail.com
Sun Nov 13 21:32:51 CET 2011


Hi Carlos,

Am I Jim? (I ask because there are at least two quite active Jims on
this list, and one of them could conceivably have replied to you off-list.)

Regarding your error, it is rather difficult to tell what is going on
without knowing exactly what your data look like and what you did. For
_just_ the unit, home, and sex variables that we are working with, could
you post the output of str() and summary()? Something like:

str(dat[c("unit", "home", "sex")])
summary(dat[c("unit", "home", "sex")])

where you replace 'dat' with the name of your data frame and the column
names with your variable names. Also, please post the exact code you ran
leading up to the error. I am not certain whether you used mine,
David's, or some mix. As near as I can tell, neither David nor I used
the 'coupleid' variable name, so you at least changed some names.
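For what it is worth, one common way to get that particular error is to build the new ID per group with split() (or tapply()) when a grouping column contains NA: split() silently drops those rows, so the result is shorter than the data frame and cannot be assigned back. The toy data below are made up, and the split()/seq_along() numbering is only a hypothetical stand-in for whatever code you actually ran. The last step counts NAs in every key column (unit and home as well, not just the new ID and sex):

```r
## Hypothetical toy data: one row has a missing 'unit' value.
dat <- data.frame(
  unit = c(1, 1, 2, 2, NA),
  home = c("a", "a", "b", "b", "c"),
  sex  = c("M", "F", "M", "F", "M")
)

## split() drops rows whose grouping value is NA, so this per-group
## numbering (a hypothetical stand-in for the real code) has 4 values.
ids <- unlist(lapply(split(dat$unit, dat$unit), seq_along))
length(ids)   # 4, while nrow(dat) is 5

## Assigning it back triggers the same kind of error reported below.
res <- tryCatch({
  dat$coupleid <- ids
  "assigned"
}, error = function(e) conditionMessage(e))
print(res)

## Count NAs in every key column, not only in the new ID.
na_counts <- colSums(is.na(dat[c("unit", "home", "sex")]))
print(na_counts)
```

If the counts show NAs in unit or home, those rows are the likely source of the gap between 123586 and 123631.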

Best Regards,

Josh

On Sun, Nov 13, 2011 at 10:37 AM, jour4life <jour4life at gmail.com> wrote:
> Thanks Jim and David!
>
> It seems like both were great options. Both of your suggestions of pasting
> both IDs together worked well, and keeping the pasted ID as a character is
> better. Although Jim's example was interesting, it gave me the following error:
>
> Error in `$<-.data.frame`(`*tmp*`, "coupleid", value = c(1L, 1L, 2L, 2L,  :
>  replacement has 123586 rows, data has 123631
>
> Since this is a large data frame, I don't know exactly where the error
> occurred. It seems to be detecting missing values in some of the rows,
> but when I checked with the is.na() function, it did not report any
> missing values in the variables used (i.e. the new mID or sex).
>
> What do you guys think may be happening?
>
> Thanks,
>
> Carlos
>
> --
> View this message in context: http://r.789695.n4.nabble.com/identify-duplicate-from-more-than-one-column-tp4035888p4037177.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
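For reference, a minimal sketch of the pasting approach discussed above. The code Jim and David posted is not reproduced in this message, so this is not their exact method: the data are invented (only the column names come from this thread), and the derived integer ID is an illustrative assumption. It builds a character key from both ID columns, flags every row whose key occurs more than once with duplicated(), and derives an integer ID with match(), which returns exactly one value per row:

```r
## Invented data reusing the thread's column names.
dat <- data.frame(
  unit = c(101, 101, 102, 102, 101),
  home = c(1, 1, 1, 2, 1),
  sex  = c("M", "F", "M", "F", "M")
)

## One character key from both ID columns; the separator prevents
## collisions such as paste0(1, 11) and paste0(11, 1) both giving "111".
dat$key <- paste(dat$unit, dat$home, sep = "_")

## duplicated() flags repeats after the first occurrence; combining it
## with fromLast = TRUE flags every row that shares a key with another.
dat$dup_any <- duplicated(dat$key) | duplicated(dat$key, fromLast = TRUE)

## An integer ID per distinct key, one value per row.
dat$coupleid <- match(dat$key, unique(dat$key))
print(dat)
```

Note that paste() turns NA into the string "NA", so rows with a missing unit or home are grouped under keys such as "NA_1" rather than dropped; check for NAs first if that grouping would be wrong.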



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/


