[R] subsetting large data frames.

hesicaia dboyce at dal.ca
Sun Dec 7 18:16:38 CET 2008


Hi all,
  I have a question about subsetting large data frames. I have two data
frames, “catches” and “tows”, which both have the same 30 variables
(columns). I would like to select the rows of “tows” whose combination of 6
specific variables is NOT matched in “catches”. That is to say, the
combination of these 6 variables does not occur anywhere in “catches”. One or
more of the variables could match individually, but the combination as a
whole would not. This is confusing to describe, so here is a short example of
what I am trying to do:

Example data catches:

Row  Cruise  Order  Townumber  Towtype  Ship  Netlocation  Var1  Var2
1    22      1      4          A        B     S            X1    X2
2    22      1      4          A        B     S            X1    X2
3    22      1      4          BL       AM    S            X1    X2
4    22      1      4          BL       AM    S            X1    X2
5    260     1      4          BL       B     S            X1    X2
6    260     1      4          BL       B     S            X1    X2

Example data tows:

Row  Cruise  Order  Townumber  Towtype  Ship  Netlocation  Var1  Var2
1    22      1      4          A        B     S            X1    X2
2    400     1      4          BL       AM    S            X1    X2
3    260     1      4          BL       B     S            X1    X2
4    260     10     10         BL       B     S            X1    X2
5    22      99     4          BL       B     S            X1    X2

I would want to select rows 2, 4, and 5 from “tows” because the same
combination of “cruise”, “order”, “townumber”, “towtype”, “ship”, and
“netlocation” is not found in “catches”. All rows in “tows” are unique.
Clear as mud? Sorry I couldn’t provide real data, but these datasets are
quite large.

So far I have tried:

New <- tows[(tows$cruise != catches$cruise) & (tows$order != catches$order) &
            (tows$townumber != catches$townumber) & (tows$towtype != catches$towtype) &
            (tows$ship != catches$ship) & (tows$netlocation != catches$netlocation), ]

But this didn’t work. 
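
A note on the attempt above: `!=` compares the two data frames element by element (row i of “tows” against row i of “catches”, recycling values when the lengths differ), so it never checks whether a combination occurs anywhere in “catches”. One possible way to do that check is sketched below in base R. It is only an illustration under stated assumptions, not the answer given on the list: the column names are assumed to be lower case as in the attempt, the two data frames are toy copies of the example rows, and make_key() is a hypothetical helper that pastes the six columns into one string per row.

```r
# Toy copies of the example data (assumed lower-case column names;
# Var1/Var2 omitted because they play no part in the matching).
catches <- data.frame(
  cruise      = c(22, 22, 22, 22, 260, 260),
  order       = c(1, 1, 1, 1, 1, 1),
  townumber   = c(4, 4, 4, 4, 4, 4),
  towtype     = c("A", "A", "BL", "BL", "BL", "BL"),
  ship        = c("B", "B", "AM", "AM", "B", "B"),
  netlocation = rep("S", 6),
  stringsAsFactors = FALSE
)
tows <- data.frame(
  cruise      = c(22, 400, 260, 260, 22),
  order       = c(1, 1, 1, 10, 99),
  townumber   = c(4, 4, 4, 10, 4),
  towtype     = c("A", "BL", "BL", "BL", "BL"),
  ship        = c("B", "AM", "B", "B", "B"),
  netlocation = rep("S", 5),
  stringsAsFactors = FALSE
)

# The columns whose combination identifies a tow.
keys <- c("cruise", "order", "townumber", "towtype", "ship", "netlocation")

# Hypothetical helper: collapse the key columns into one string per row,
# using a separator that is unlikely to occur in the data itself.
make_key <- function(df) do.call(paste, c(df[keys], sep = "\r"))

# Keep the rows of tows whose key does not occur anywhere in catches.
New <- tows[!(make_key(tows) %in% make_key(catches)), ]
print(New)
```

Because `%in%` works on a single character vector per data frame, this avoids comparing rows position by position, and it should scale reasonably to large data frames. Pasting assumes the key columns print identically in both data frames (for example, no differing numeric formatting).
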
Thanks for your time and help (in advance).
Dan.


-- 
View this message in context: http://www.nabble.com/subsetting-large-data-frames.-tp20883217p20883217.html
Sent from the R help mailing list archive at Nabble.com.


