[R] Difficulties with dataframe filter using elements from an array created using a for loop or seq()

Todd A. Johnson tjohnson at src.riken.jp
Tue Feb 20 10:48:17 CET 2007


Hi All-
 
This seems like such a pathetic problem to be posting about, but I have no
idea why this test case does not work.  I have tried it with R 2.4.1,
2.4.0, 2.3.0, and 2.0.0 on several different computers (Mac OS X 10.4.8,
Windows XP, Linux).  Below the signature, you will find my test case R code.
 
My goal in this folly is to take a data frame of 300,000 rows, create a
filter based on two of its columns, and count the number of rows in the
filtered and unfiltered data frame.  One of the filter columns contains only
the values 0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, and 0.95, so
I thought that I could simply iterate over them in a for loop and get the job
done. Only the simple single-column filter case is presented here. Since
there are only ten values, typing them in "manually" is easy, but I would like
a more flexible program. (Plus, it worries me when the simple things don't do
what I expect... :-) )
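For the goal of counting rows per bin value, one option (a swapped-in technique, not what the test case does) is base R's table(), which counts the rows for every distinct value actually stored in a column, so no vector of bin values has to be generated and compared. This is a sketch under assumptions: it uses the small test-case data frame rather than the real 300,000-row one, and varA merely stands in for the second filter column, which the post does not name.

```r
## Sketch (swapped-in technique): count rows per distinct value with table().
## The data frame mirrors the test case; a larger one would be used the same way.
a <- c(0, 1, 2, 0, 1, 3, 4, 5, 3, 5)
b <- c(0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95)
testdf <- data.frame(varA = a, bBins = b)

## One count per distinct value stored in bBins. No bin vector is generated,
## so nothing is compared against separately computed doubles.
bin_counts <- table(testdf$bBins)
print(bin_counts)

## Two-column version (varA is only a stand-in for the real second column):
## counts for every observed (bBins, varA) combination.
pair_counts <- table(testdf$bBins, testdf$varA)
print(pair_counts)

## Size of the unfiltered data frame, for comparison with the counts above.
print(nrow(testdf))
```

The counts come back labelled by value, so the ten bins do not have to be known in advance.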

From the output, you can see that the loop over "handmadevector", which
builds a filter and counts the matching rows, correctly finds one match for
each element of the vector. The loops over the vectors produced by the for
loop ("loopvector") and by seq() ("seqvector"), however, find a match only for
0.05, 0.25, 0.45, and 0.55, and no match for the other six values.
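The mismatch can be reproduced without a data frame. A plausible cause, consistent with the output, is binary floating point: 0.1 and 0.05 have no exact double representation, so the computed value 1*0.10 + 0.05 can land on a neighbouring double rather than on the one the literal 0.15 produces. The sketch below checks only the second element; sprintf("%.20f", ...) is used simply to print more digits than R's default of 7 significant digits.

```r
## Second element of the loop-built vector, computed as in the test case.
computed <- 1 * 0.10 + 0.05
literal  <- 0.15

print(computed == literal)                   # exact comparison: prints FALSE
print(isTRUE(all.equal(computed, literal)))  # comparison with tolerance: prints TRUE
## Show enough digits to see that the two stored doubles differ.
print(sprintf("%.20f", c(computed, literal)))
```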

Can anyone tell me why "loopvector" and "seqvector" do not produce the
same output as "handmadevector", even though all.equal() reports them as equal?
 

Thank you for your assistance!

Todd

-- 
Todd A. Johnson
Research Associate, Laboratory for Medical Informatics
SNP Research Center, RIKEN
1-7-22 Suehiro, Tsurumi-ku, Yokohama
Kanagawa 230-0045,Japan

Cellphone: 090-5309-5867

E-mail: tjohnson at src.riken.jp



Here's the test case, with the sample code between the lines and the output
following:
 
_____________________________________________________________________
## Set up three different vectors, each holding the numbers
## 0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95;
## each is used to select records from a data frame based on
## equality with a particular column.
## The first vector is created by using a for loop
loopvector <- c()
for (i in 0:9) {
  loopvector <- c(loopvector, (i * 0.10) + 0.05)
}
## The second vector is made "by hand"
handmadevector <- c(0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95)
## The third vector is made using seq()
seqvector <- seq(0.05, 0.95, 0.10)
## Are the vectors the same?
all.equal(loopvector, handmadevector)
all.equal(loopvector, seqvector)
print(handmadevector)
print(loopvector)
print(seqvector)
## As a simple test case, create a data frame with two variables: varA,
## holding dummy data, and bBins, the column on which I was trying to filter.
a <- c(0,1,2,0,1,3,4,5,3,5)
b <- c(0.05,0.15,0.25,0.35,0.45,0.55,0.65,0.75,0.85,0.95)
testdf <- data.frame(varA = a, bBins = b)
attach(testdf)
## Loop through each of the vectors, create a filter on the data frame based
## on equality with the current element, and print that number and the count
## of records in the data frame that match it.
for (i in loopvector) {
  aqs_filt <- bBins == i
  print(i)
  print(length(testdf$varA[aqs_filt]))
}
for (i in handmadevector) {
  aqs_filt <- bBins == i
  print(i)
  print(length(testdf$varA[aqs_filt]))
}
for (i in seqvector) {
  aqs_filt <- bBins == i
  print(i)
  print(length(testdf$varA[aqs_filt]))
}
 
_____________________________________________________________________
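A minimal sketch of the same counting loop with the exact `==` replaced by a tolerance test. The helper name `near_equal` and the tolerance `1e-8` are assumptions chosen for illustration: any tolerance much smaller than the 0.10 spacing between bins and much larger than double rounding error (on the order of 1e-16 here) should behave the same way. The sketch also reads the column through `testdf$bBins` instead of relying on attach().

```r
## Hypothetical helper (assumption): element-wise equality within an
## absolute tolerance, instead of exact ==.
near_equal <- function(x, y, tol = 1e-8) abs(x - y) < tol

a <- c(0, 1, 2, 0, 1, 3, 4, 5, 3, 5)
b <- c(0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95)
testdf <- data.frame(varA = a, bBins = b)

## Rebuild the loop-generated vector exactly as in the test case.
loopvector <- c()
for (i in 0:9) {
  loopvector <- c(loopvector, (i * 0.10) + 0.05)
}

## Number of rows matching each bin value, collected into a vector.
counts <- sapply(loopvector, function(v) sum(near_equal(testdf$bBins, v)))
print(counts)

## The same check for the seq()-generated vector.
seq_counts <- sapply(seq(0.05, 0.95, 0.10),
                     function(v) sum(near_equal(testdf$bBins, v)))
print(seq_counts)
```

With a tolerance, each generated value should match exactly one row, as the hand-typed vector already does.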
 
Here's the output from R 2.4.1 running on an Apple 12" Powerbook.
 
> ## Set up three different vectors, each with the numbers 0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95
> ## each of which is used to select records from a dataframe based on equality to a particular column
> ## The first vector is created by using a for loop
> loopvector <- c()
> for (i in 0:9){
+ loopvector <- c(loopvector, (i*0.10)+0.05);
+ }
> ## The second vector is made "by hand"
> handmadevector <- c(0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95)
> ## The third vector is made using seq()
> seqvector <- seq(0.05, 0.95, 0.10)
> ## Are the vectors the same?
> all.equal(loopvector, handmadevector)
[1] TRUE
> all.equal(loopvector, seqvector)
[1] TRUE
> 
> print(handmadevector)
 [1] 0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 0.85 0.95
> print(loopvector)
 [1] 0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 0.85 0.95
> print(seqvector)
 [1] 0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 0.85 0.95
> ## As a simple testcase, I create a dataframe with two variables, a varA of dummy data, and bBins
> ## which is the column on which I was trying to filter.
> a <- c(0,1,2,0,1,3,4,5,3,5)
> b <- c(0.05,0.15,0.25,0.35,0.45,0.55,0.65,0.75,0.85,0.95)
> testdf <- data.frame(varA = a, bBins = b)
> attach(testdf)
> ## Loop through each of the vectors, create a filter on the dataframe based on equality with the current iteration,
> ## and print that number and the count of records in the dataframe that match that number.
> for (i in loopvector){
+ aqs_filt <- bBins==i;
+ print(i);
+ print(length(testdf$varA[aqs_filt]));
+ }
[1] 0.05
[1] 1
[1] 0.15
[1] 0
[1] 0.25
[1] 1
[1] 0.35
[1] 0
[1] 0.45
[1] 1
[1] 0.55
[1] 1
[1] 0.65
[1] 0
[1] 0.75
[1] 0
[1] 0.85
[1] 0
[1] 0.95
[1] 0
> for (i in handmadevector){
+ aqs_filt <- bBins==i;
+ print(i);
+ print(length(testdf$varA[aqs_filt]));
+ }
[1] 0.05
[1] 1
[1] 0.15
[1] 1
[1] 0.25
[1] 1
[1] 0.35
[1] 1
[1] 0.45
[1] 1
[1] 0.55
[1] 1
[1] 0.65
[1] 1
[1] 0.75
[1] 1
[1] 0.85
[1] 1
[1] 0.95
[1] 1
> for (i in seqvector){
+ aqs_filt <- bBins==i;
+ print(i);
+ print(length(testdf$varA[aqs_filt]));
+ }
[1] 0.05
[1] 1
[1] 0.15
[1] 0
[1] 0.25
[1] 1
[1] 0.35
[1] 0
[1] 0.45
[1] 1
[1] 0.55
[1] 1
[1] 0.65
[1] 0
[1] 0.75
[1] 0
[1] 0.85
[1] 0
[1] 0.95
[1] 0
>
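The transcript shows all.equal() returning TRUE for vectors that nonetheless filter differently. all.equal() tests near equality (its default tolerance is about 1.5e-8), so it hides differences as small as these. A quick sketch for locating them is to compare exactly with identical() and `!=` and print the differing elements with 17 significant digits. Judging by the transcript, the reported positions should be those of 0.15, 0.35, 0.65, 0.75, 0.85, and 0.95, but that is an inference from the output above rather than a verified listing.

```r
## Rebuild two of the test-case vectors.
loopvector <- c()
for (i in 0:9) {
  loopvector <- c(loopvector, (i * 0.10) + 0.05)
}
handmadevector <- c(0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95)

print(isTRUE(all.equal(loopvector, handmadevector)))  # near equality: TRUE
print(identical(loopvector, handmadevector))          # exact equality: FALSE

## Positions where the stored doubles differ exactly.
differ <- which(loopvector != handmadevector)
print(differ)

## Seventeen significant digits are enough to distinguish any two doubles.
print(loopvector[differ], digits = 17)
print(handmadevector[differ], digits = 17)
```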


