[R] problem with duplicated function

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Mon May 25 00:12:14 CEST 2015


You are going wrong in a few places: posting using HTML format, not using dput to share your data sample, and comparing floating point numbers for equality.

HTML email is stripped to plain text on this list so we don't see what you see. In addition, HTML formatting corrupts code, so we cannot even run it.

The dput function is highly recommended for making reproducible examples. [1]
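As a rough sketch of what that looks like (the three-row data frame here is only a stand-in built from values shown in your post, not your real data):

```r
# Hypothetical stand-in for data07; values copied from the printed sample.
data07 <- data.frame(
  cell     = c(710228L, 715545L, 720690L),
  prN      = c(288L, 276L, 286L),
  Location = factor(c("Sage", "Sage", "Sage")),
  Xcor     = c(-111.044, -111.044, -111.044),
  Ycor     = c(41.7403, 41.7245, 41.7131)
)

# dput() prints R code that rebuilds the object, so a reader can paste
# the output straight into their own session.
dput(head(data07, 3))

# Round trip: capture the dput() text and evaluate it to rebuild the object.
txt <- paste(capture.output(dput(data07)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
```

With your data, pasting the output of something like dput(head(data07, 10)) and dput(head(data08, 10)) into a plain-text message would let readers work with the same values you have.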

FAQ 7.31 warns against expecting floating point numbers that appear the same when printed to actually be equal. This advice is not specific to R: it applies to any programming language that uses binary floating point.
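A minimal illustration follows. The coordinates are made up, and rounding to 4 decimal places is an assumption based on the precision printed in your sample. It is one possible workaround, not necessarily the right tolerance for your data.

```r
# Two values that print the same but are not exactly equal.
a <- 0.1 + 0.2
b <- 0.3
print(a)                  # 0.3
print(b)                  # 0.3
a == b                    # FALSE
isTRUE(all.equal(a, b))   # TRUE: comparison within a tolerance

# Hypothetical coordinates: the first row of xy08 is the "same" point as the
# first row of xy07, but differs by a tiny amount that printing would hide.
xy07 <- data.frame(Xcor = c(-111.044, -111.043),
                   Ycor = c(41.7403, 41.7766))
xy08 <- data.frame(Xcor = c(-111.044 + 1e-9, -111.044),
                   Ycor = c(41.7403, 41.7517))

tt <- rbind(xy07, xy08)

# Exact comparison: nothing in xy08 is flagged as a duplicate.
exact.dup <- duplicated(tt)[-seq_len(nrow(xy07))]
exact.dup                 # FALSE FALSE

# Build a text key from rounded coordinates and compare the keys instead.
key <- paste(round(tt$Xcor, 4), round(tt$Ycor, 4))
round.dup <- duplicated(key)[-seq_len(nrow(xy07))]
round.dup                 # TRUE FALSE
```

Keep in mind that rounding can still separate two nearly equal values that happen to fall on either side of a rounding boundary, so check the result against points you know should match.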

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

On May 24, 2015 2:34:13 PM PDT, Curtis Burkhalter <curtisburkhalter at gmail.com> wrote:
>Hello everyone,
>
>I have two very large dataframes (~1 million rows x 5 columns), of which
>two of the columns are lat/long coordinates. The names of the dataframes
>are 'data07' and 'data08'. Data08 has a few more sampling points than
>data07 so I want to subset data08 so that it has the same number of data
>points as data07 using the unique lat/long coordinates.
>
>Here are the associated data structures:
>
>str(data07)
>'data.frame':   969109 obs. of  5 variables:
> $ cell    : int  710228 715545 720690 720824 695611 700490 700626 705371 705507 710363 ...
> $ prN     : int  288 276 286 304 258 257 264 272 286 316 ...
> $ Location: Factor w/ 32 levels " ","Blacks_Fork",..: 24 24 24 24 24 24 24 24 24 24 ...
> $ Xcor    : num  -111 -111 -111 -111 -111 ...
> $ Ycor    : num  41.7 41.7 41.7 41.7 41.8 ...
>
>str(data08)
>'data.frame':   969810 obs. of  5 variables:
> $ cell    : int  705528 710321 710456 715677 720762 720896 699953 700635 700771 705664 ...
> $ prN     : int  293 281 299 278 276 266 282 255 287 280 ...
> $ Location: Factor w/ 31 levels "Blacks_Fork",..: 23 23 23 23 23 23 23 23 23 23 ...
> $ Xcor    : num  -111 -111 -111 -111 -111 ...
> $ Ycor    : num  41.8 41.7 41.7 41.7 41.7 ...
>
>I've tried using the following code to accomplish my problem:
>
>tt <- rbind(data07, data08)
>
>tt.dup <- duplicated(tt[,4:5]) # marks all duplicate rows in data08 from
>                               # last 2 cols that correspond to the lat/long
>
>tt.dup <- tt.dup[-seq_len(nrow(data07))] # remove all data07 entries (first n)
>
>test=data08[tt.dup, ] # index only TRUE/duplicated elements from data08
>
>When I run the code, 'tt.dup' is FALSE for all entries, which I know isn't
>true.
>
>Here's a small subset of the data so that you can see exactly where there
>are duplicates:
>
>data07[1:10,]
>          cell prN Location     Xcor    Ycor
>710229 *710228 288     Sage -111.044 41.7403*
>715546 *715545 276     Sage -111.044 41.7245*
>720691 *720690 286     Sage -111.044 41.7131*
>720825 *720824 304     Sage -111.044 41.7109*
>695612  695611 258     Sage -111.043 41.7766
>700491  700490 257     Sage -111.043 41.7653
>700627  700626 264     Sage -111.043 41.7630
>705372  705371 272     Sage -111.043 41.7517
>705508  705507 286     Sage -111.043 41.7495
>710364  710363 316     Sage -111.043 41.7381
>
>data08[1:10,]
>          cell prN Location     Xcor    Ycor
>705529  705528 293     Sage -111.044 41.7517
>710322 *710321 281     Sage -111.044 41.7403*
>710457  710456 299     Sage -111.044 41.7381
>715678 *715677 278     Sage -111.044 41.7245*
>720763 *720762 276     Sage -111.044 41.7131*
>720897 *720896 266     Sage -111.044 41.7109*
>699954  699953 282     Sage -111.043 41.7767
>700636  700635 255     Sage -111.043 41.7653
>700772  700771 287     Sage -111.043 41.7631
>705665  705664 280     Sage -111.043 41.7495
>
>
>If anyone has any suggestions as to where I might be going wrong I'd
>greatly appreciate it.
>
>Thank you


