[R] Looping through values in a data frame that are >zero

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Sat May 21 15:12:48 CEST 2011


Hello!

I've tried for a while but can't figure it out. I have a data frame x:

y=c("a","b","c","d","e")
z=c("m","n","o","p","r")
a=c(0,0,1,0,0)
b=c(2,0,0,0,0)
c=c(0,0,0,4,0)
x<-data.frame(y,z,a,b,c,stringsAsFactors=F)
str(x)
Some of the values in columns a, b, and c are > 0.

I need to write a loop over all the cells in columns a, b, and c that are
greater than 0 (and only over those cells).
For each of those cells, I need to know:
1. The name of the column it is in.
2. The entry of column y that is in the same row.
3. The entry of column z that is in the same row.
It would be good to save this information in a data frame, so that I
could then loop through the rows of that data frame.
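
One possible way to collect that information, given only as a sketch: which() with arr.ind=TRUE returns the row and column position of every cell that satisfies a condition. The names value.cols, pos and hits are made up for illustration and are not part of the original data.

```r
# Sample data frame x, rebuilt here so the snippet runs on its own
x <- data.frame(
  y = c("a", "b", "c", "d", "e"),
  z = c("m", "n", "o", "p", "r"),
  a = c(0, 0, 1, 0, 0),
  b = c(2, 0, 0, 0, 0),
  c = c(0, 0, 0, 4, 0),
  stringsAsFactors = FALSE
)

# Columns to scan (hypothetical helper name)
value.cols <- c("a", "b", "c")
vals <- as.matrix(x[value.cols])

# Row/column positions of every cell > 0 within the scanned columns
pos <- which(vals > 0, arr.ind = TRUE)

# One row per positive cell: column name, matching y and z, and the value
hits <- data.frame(
  column = value.cols[pos[, "col"]],
  y      = x$y[pos[, "row"]],
  z      = x$z[pos[, "row"]],
  value  = vals[pos],
  stringsAsFactors = FALSE
)
hits
```

Each row of hits then describes one positive cell, so a loop over seq_len(nrow(hits)) visits only those cells.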


To explain what I need it for eventually: I have a different data
frame "large.df" that has the same columns (variables) - but with many
more entries than "x". Something like:
large.df<-expand.grid(y,z,1:3)[,1:2]  # 25 (y,z) pairs x 3 copies = 75 rows
names(large.df)<-c("y","z")
set.seed(123)
large.df$a<-sample(0:5,75,replace=T)
set.seed(234)
large.df$b<-sample(0:5,75,replace=T)
set.seed(345)
large.df$c<-sample(0:5,75,replace=T)
large.df$y<-as.character(large.df$y)
large.df$z<-as.character(large.df$z)
large.df<-large.df[order(large.df$y,large.df$z),]
row.names(large.df)<-1:nrow(large.df)
(large.df);str(large.df)

1. Find the first cell in x that is > 0 (in this case, it's x[3,"a"]).
2. Find all the corresponding cells in the large.df - in this case, it's:
large.df[large.df$y %in% "c" & large.df$z %in% "o","a"]
and those 3 values can be found in rows 37:39 of large.df, in column "a".
3. Take those 3 values and add to each of them the corresponding value in x
(in this case, 1) divided by the number of matching values (in this case, 3).
4. Do the same for the other cells in x that are >0.
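
The four steps above could be sketched as a loop like the one below. This is only an illustration under stated assumptions: hits is a hypothetical data frame with one row per positive cell of x (typed in by hand here), and large.df is a stand-in with three copies of every (y, z) pair and all values set to 0 rather than random numbers, so the effect of the update is easy to see.

```r
# Hypothetical lookup table: one row per cell of x that is > 0
hits <- data.frame(
  column = c("a", "b", "c"),
  y      = c("c", "a", "d"),
  z      = c("o", "m", "p"),
  value  = c(1, 2, 4),
  stringsAsFactors = FALSE
)

# Stand-in for large.df: each (y, z) pair three times, values all 0 (assumption)
large.df <- expand.grid(
  y = c("a", "b", "c", "d", "e"),
  z = c("m", "n", "o", "p", "r"),
  copy = 1:3,
  stringsAsFactors = FALSE
)[, c("y", "z")]
large.df$a <- 0
large.df$b <- 0
large.df$c <- 0
large.df <- large.df[order(large.df$y, large.df$z), ]
row.names(large.df) <- seq_len(nrow(large.df))

for (i in seq_len(nrow(hits))) {
  # Rows of large.df with the same y and z as the current positive cell
  rows <- large.df$y == hits$y[i] & large.df$z == hits$z[i]
  n <- sum(rows)
  if (n > 0) {
    col <- hits$column[i]
    # Spread the value from x evenly over the matching cells
    large.df[rows, col] <- large.df[rows, col] + hits$value[i] / n
  }
}

large.df[37:39, ]
```

Because the divisor n is computed from the number of matching rows, nothing in the loop depends on there being exactly 3 of them.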

The final result will be (sorry for lengthy code):

large.df[large.df$y %in% "c" & large.df$z %in% "o","a"]<-
  large.df[large.df$y %in% "c" & large.df$z %in% "o","a"]+x[3,"a"]/3
large.df[large.df$y %in% "a" & large.df$z %in% "m","b"]<-
  large.df[large.df$y %in% "a" & large.df$z %in% "m","b"]+x[1,"b"]/3
large.df[large.df$y %in% "d" & large.df$z %in% "p","c"]<-
  large.df[large.df$y %in% "d" & large.df$z %in% "p","c"]+x[4,"c"]/3
(large.df)

(It just happens that I divide by 3 each time. In general the divisor is
the number of matching cells, e.g.
length(large.df[large.df$y %in% "c" & large.df$z %in% "o","a"]).)


Thanks a lot for your suggestions!


-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com


