[R] Check results between two data.frame

Sarah Goslee sarah.goslee at gmail.com
Wed Mar 21 17:57:50 CET 2012


On Wed, Mar 21, 2012 at 12:48 PM, HJ YAN <yhj204 at googlemail.com> wrote:
> Thanks a lot Sarah, for your nice example and code.
>
> I know '==' can do the job.  As an R beginner, though, I sometimes want to
> be more like a 'real programmer', if you know what I mean...

Being a "real programmer" means using one line of code instead of a
dozen, when that one line is more elegant and understandable.
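
As a rough illustration of that point (a sketch only: x1 and x2 and the
"good"/"check" labels come from the messages quoted below, while `labels`
and `diffs` are names made up here), the whole nested loop can collapse
into a vectorized comparison:

```r
# Example data from the earlier reply in this thread
x1 <- matrix(1:20, ncol = 4)
x2 <- ifelse(x1 > 18, 22, x1)

# Label every cell at once; ifelse() keeps the dimensions of its test argument
labels <- ifelse(x1 == x2, "good", "check")

# Row/column positions of the cells that differ (hypothetical name 'diffs')
diffs <- which(x1 != x2, arr.ind = TRUE)
print(diffs)
```

For a 400*100 comparison, `diffs` is usually easier to read than a full
matrix of labels, since it lists only the mismatching cells.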

Being a real programmer also means learning how to debug your code: a
good first step is to set x and y to actual matrices and try each step
in the console. (And don't give them default values that are strings
if your function expects them to be matrices.)

If you'd actually tried to run each line, you would quickly discover
that as.matrix() and matrix() do not do the same thing, and you need
the latter.

> NROW
[1] 5
> NCOL
[1] 4
> CHECK_XY <- as.matrix(NA, NROW, NCOL)
> CHECK_XY
     [,1]
[1,]   NA
>

> CHECK_XY <- matrix(NA, NROW, NCOL)
> CHECK_XY
     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]   NA   NA   NA   NA
[3,]   NA   NA   NA   NA
[4,]   NA   NA   NA   NA
[5,]   NA   NA   NA   NA
>
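
Putting that fix back into the loop version might look like the sketch
below. It assumes x and y are matrices (or data frames) with identical
dimensions and no NA values; the name check_xy, the dimension check, and
dropping the string defaults are choices made here, not part of the
original code.

```r
check_xy <- function(x, y) {
  # Assumption: both inputs have the same shape
  stopifnot(identical(dim(x), dim(y)))
  # matrix(), not as.matrix(), allocates the full nrow x ncol result
  out <- matrix(NA_character_, nrow(x), ncol(x))
  # seq_len() is safe when a dimension is 0 (1:0 would iterate over 1, 0)
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(ncol(x))) {
      out[i, j] <- if (x[i, j] == y[i, j]) "good" else "check"
    }
  }
  out
}

x1 <- matrix(1:20, ncol = 4)
x2 <- ifelse(x1 > 18, 22, x1)
res <- check_xy(x1, x2)
print(res)
```

Returning the matrix rather than only printing it lets the caller keep
the result and inspect it later.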


> If any R expert could help check where my code went wrong here, I would
> be very grateful!

Getting help from "any R expert" is far more likely if you copy your
messages to the R-help list as well as just to me. I've taken the
liberty of doing so with my reply.

Sarah


> Many thanks!
> HJ
> On Wed, Mar 21, 2012 at 4:13 PM, Sarah Goslee <sarah.goslee at gmail.com>
> wrote:
>>
>> As long as == is an appropriate test for your data, why not just use
>> R's innate ability to handle matrices/data frames?
>>
>> > x1 <- matrix(1:20, ncol=4)
>> > x2 <- ifelse(x1 > 18, 22, x1)
>> > x1
>>      [,1] [,2] [,3] [,4]
>> [1,]    1    6   11   16
>> [2,]    2    7   12   17
>> [3,]    3    8   13   18
>> [4,]    4    9   14   19
>> [5,]    5   10   15   20
>> > x2
>>      [,1] [,2] [,3] [,4]
>> [1,]    1    6   11   16
>> [2,]    2    7   12   17
>> [3,]    3    8   13   18
>> [4,]    4    9   14   22
>> [5,]    5   10   15   22
>> > x1 == x2
>>      [,1] [,2] [,3]  [,4]
>> [1,] TRUE TRUE TRUE  TRUE
>> [2,] TRUE TRUE TRUE  TRUE
>> [3,] TRUE TRUE TRUE  TRUE
>> [4,] TRUE TRUE TRUE FALSE
>> [5,] TRUE TRUE TRUE FALSE
>>
>>
>> On Wed, Mar 21, 2012 at 8:48 AM, HJ YAN <yhj204 at googlemail.com> wrote:
>> > Dear R-user,
>> >
>> > I'm trying to compare two sets of results and want to find out which
>> > elements of the two data frames/matrices differ.
>> >
>> > I wrote the following function; it works OK and prints a long list of
>> > "good" as its output.
>> >
>> >
>> > CHECK<-
>> > function (x = "file1", y = "file2")
>> > {
>> >    for (i in 1:nrow(x)) {
>> >        for (j in 1:ncol(x)) {
>> >            if (x[i, j] == y[i, j]) {
>> >                print("good")
>> >            }
>> >            else {
>> >                print("check")
>> >            }
>> >        }
>> >    }
>> > }
>> >
>> >
>> > However, as the two datasets I am comparing are large (roughly 400*100),
>> > I would like to create a matrix that identifies which elements differ
>> > between the two data frames.
>> >
>> > So I added 'CHECK_XY' to my code, but when I ran it I got 'Error in
>> > CHECK_XY[i, j] = c("good") : subscript out of bounds'.
>> >
>> > Could anyone help please??
>> >
>> > CHECK_1<-
>> > function (x = "file1", y = "file2")
>> > {
>> >    NROW <- nrow(x)
>> >    NCOL <- ncol(x)
>> >    CHECK_XY <- as.matrix(NA, NROW, NCOL)
>> >    for (i in 1:nrow(x)) {
>> >        for (j in 1:ncol(x)) {
>> >            if (x[i, j] == y[i, j]) {
>> >                CHECK_XY[i, j] = c("good")
>> >            }
>> >            else {
>> >                CHECK_XY[i, j] = c("check")
>> >            }
>> >        }
>> >    }
>> >    print(CHECK_XY)
>> > }
>> >
>> > Thanks!
>> > HJ
>>

-- 
Sarah Goslee
http://www.functionaldiversity.org


