[R] different way for a for loop for several columns?

ilai keren at math.montana.edu
Tue Feb 14 18:50:23 CET 2012


Inline

On Tue, Feb 14, 2012 at 3:16 AM, Nerak T <nerak.t at hotmail.com> wrote:
> Dear Ilai,
>
>
>
> Thanks for your answer. I'm indeed kind of a beginner in R, starting
> to discover the endless possibilities in R. My goal for the moment is indeed
> to get rid of the use of loops and to see through the apply family (which
> seems to be an endless maze for the moment). (For some reason, the apply
> function doesn’t seem logical to my brain, which prefers to think in
> loops.)
>

apply, lapply, tapply, etc. (see ?apply, ?lapply, ?tapply) are just
wrappers for writing more compact (and sometimes more efficient) loops.
If you "think in loops" (which you shouldn't) you are also thinking in
"apply". The reason it may seem like an endless maze is that you use
different wrappers for looping over different object classes and
indices, but in the end a call to apply() is similar to calling for().
e.g. consider a matrix m with dimensions n x p. To sum the rows you could use
for(i in 1:n) print(sum(m[i,]))
But better: apply(m, 1, sum)  # where the 1 denotes the 1st dimension (rows)
Same thing for summing the columns:
for(j in 1:p) print(sum(m[,j]))
But better: apply(m, 2, sum)  # where the 2 denotes the 2nd dimension (columns)
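
To make the comparison concrete, here is a small sketch on a made-up
3 x 4 matrix (m, n, p and the result names are only for illustration).
It also checks the results against rowSums() and colSums(), the base R
shortcuts for this particular case:

```r
m <- matrix(1:12, nrow = 3)        # made-up 3 x 4 example matrix
n <- nrow(m); p <- ncol(m)

# loop version: needs storage created up front
row.loop <- numeric(n)
for(i in 1:n) row.loop[i] <- sum(m[i, ])

row.apply <- apply(m, 1, sum)      # 1 = rows
col.apply <- apply(m, 2, sum)      # 2 = columns

# both approaches agree with the dedicated base functions
all(row.loop == rowSums(m))
all(col.apply == colSums(m))
```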

The same "stuff" that goes in the loop can go to apply with small
syntax changes. e.g.

ll<- list(1:5,1:10,letters)
out<- list()               # "empty" storage for the results
for(L in seq_along(ll)){   # L is the index 1, 2, 3
# ...
# ... a bunch of complicated functions/calculations
# ...
out[[ L ]] <- length( ll[[ L ]] )
}
out

Can be replaced with

lapply( ll, function(L) {
# ...
# ... a bunch of complicated function/calculations
# ...
length(L)
} )

This time use lapply since you are looping over the elements of the
list ll (inside the function, L is the element itself, not its index).
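
If you would rather get a plain vector back than a list, sapply() and
vapply() do the same looping and simplify the result. A small sketch
reusing the list from above (the out.* names are made up):

```r
ll <- list(1:5, 1:10, letters)

out.list <- lapply(ll, length)                 # a list of 3 lengths
out.vec  <- sapply(ll, length)                 # simplified to a vector
out.safe <- vapply(ll, length, integer(1))     # same, but checks each result
                                               # is a single integer
out.vec
```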
>
>
> Your answer is really helpful.
> Something I found really interesting is your general comment. You say that I
> don’t need to declare variables? The reason I started to do this is because
> if I don’t, I get a message that the object is not found.

???

xx<- c(0,10,100)   # declare xx
print(xx)
rnorm(3,xx)          # use it
rm(xx)                 # remove xx
rnorm(3, xx<- c(0,10,100))   # define and use it "at the same time"
print(xx)

> If I create a data
> frame before the calculation with the right number of rows, it seems to
> work. But if there is a way not to have to make it beforehand, that would
> be great.

Only in loops. What happens is you need to create a "storage" for the
result of your loop, since the objects created in the loop are
overwritten at each step:
for(i in 1:10) cat( i, '\n' )
i     # i = 10; everything before was overwritten
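
A minimal sketch of the storage idea, with made-up names (res, res2):
the loop needs a place to keep each step, while vapply() collects the
results for you.

```r
res <- numeric(10)                 # storage created before the loop
for(i in 1:10) res[i] <- i^2       # each step is saved, not overwritten
res

# no storage needed: vapply() builds the result vector itself
res2 <- vapply(1:10, function(i) i^2, numeric(1))
identical(res, res2)
```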

> Because most of the time, I don’t need that column that I created (but
> couldn't create an empty data frame with right dimensions to solve the
> problem)…

See the for loop in my example above: out<- list() creates the "empty"
storage. With lapply you don't need it at all.

>
>> Last, in R you want to avoid loops as much as possible especially for
>> large data sets. Operations are performed on objects so 1:4 + 1 is
>> equivalent to for(i in 1:4) i+1

> This part I don’t understand… where do you put that “ 1:4 + 1 ” ?
>

You don't put it anywhere, it was in answer to your comment: " but
it’s not that I have created a function that has to be applied on a
whole column, calculations are done for the different rows…"

So, no! Calculations are done on the whole object (a vector, an object
with dimensions, or a list); only in rare cases do you need to loop
over each element (or dimension) of the object itself.
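
For instance, a sketch with a made-up vector x: arithmetic and
comparisons work on the whole object at once. The aggregate() example in
my earlier message relies on the same thing (test[,-1] > 1 compares
every cell in one call).

```r
x <- c(0, 5, 2, 0, 15)

x + 1            # adds 1 to every element, no loop
x > 1            # logical vector, one comparison per element
sum(x > 1)       # TRUE counts as 1, so this counts the elements > 1

# the explicit loop that x + 1 replaces
y <- numeric(length(x))
for(i in seq_along(x)) y[i] <- x[i] + 1
identical(y, x + 1)
```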

>
> Many thanks, I’m trying to learn as much as possible to be able to use R
> more efficiently so I really appreciate your help.

Pleasure. Good luck !

>
>
>
> Kind regards,
>
> Nerak
>
>> Date: Mon, 13 Feb 2012 23:43:29 -0700
>> Subject: Re: [R] different way for a for loop for several columns?
>> From: keren at math.montana.edu
>> To: nerak.t at hotmail.com
>
>>
>> Nerak,
>> Your example could have been done without a loop at all (at least this
>> calculation), or as you already know by calling one of the apply
>> family functions which are more efficient (but are still "loops"):
>>
>> test<- data.frame(
>>
>> Date=c(1980,1980,1980,1980,1981,1981,1981,1981,1982,1982,1982,1982,1983,1983,1983,1983),
>> C = c(0,0,0,0,5,2,0,0,0,15,12,10,6,0,0,0),
>> B = c(0,0,0,0,9,6,2,0,0,24,20,16,2,0,0,0),
>> F = c(0,0,0,0,6,5,1,0,0,18,16,12,10,5,1,0)
>> )
>> test.2 <- test[,-1] > 1
>> aggregate(test.2, list(test$Date), sum)
>>
>> # See ?aggregate for more details. it also has a time series method
>> which may be useful for you.
>>
>> A general comment. if you are or will be using R a bit more, it may
>> benefit you to study the manuals or find a good basic tutorial. You
>> seem to be applying the conventions of some other programming language
>> and that's slowing you down. e.g. you don't need to declare variables,
>> so all this stuff before your loop is unnecessary:
>> > Year<-data.frame(Date)
>> > test.1<-data.frame(c(1980:1983))
>> > test.4<-data.frame(c(1:4))
>> Also 1:4 is equivalent to data.frame(c(1:4)) without the extra attributes.
>>
>> Last, in R you want to avoid loops as much as possible especially for
>> large data sets. Operations are performed on objects so 1:4 + 1 is
>> equivalent to for(i in 1:4) i+1
>> Bottom line, the inner loops, calls to which, all that stuff...
>>
>> Hope that helps.
>>
>> >        test.3<-test.2[which(Year$Date== y)]
>> >        test.4$length[y-Year[1,]+1]<-length(which(test.3>0))
>> >        }
>> > test.1<-cbind(test.1, test.4$length)
>> > }
>> > names(test.1)<-c("year","C","B","F")
>> >
>> > test.1
>> >
>> >
>> > You can see that it will take a lot of time for more objects and
>> > years. A problem is that the for (y in 1980:1983) { } loop takes a
>> > lot of time because "[which(Year$Date== y)]" is used several times
>> > and it takes a lot of time to search through all the rows. And then
>> > all of this has to be repeated several times for the different
>> > objects. But actually, exactly the same calculations have to be made
>> > for all the objects; only the input data are different (calculations
>> > are made with the values of the corresponding column of data frame
>> > test). I thought it could be faster to calculate each step of the
>> > inner loop (for (y in 1980:1983)) at the same time for each object.
>> > So for example: now, test.2<-ifelse(test[,l]>1,1,0) is first
>> > calculated for year 1980 for object 1, then for year 1981 for object
>> > 1 and so on for all the years, and this is all repeated for the
>> > different objects. I’m looking for a way to calculate
>> > test.2<-ifelse(test[,l]>1,1,0) first for all the objects for year
>> > 1980, then for all the objects for year 1981 and so on.
>> >
>> > Does somebody know a way to do this? I was thinking about some form
>> > of apply, but it’s not that I have created a function that has to be
>> > applied on a whole column, calculations are done for the different
>> > rows…
>> >
>> > Many thanks for your help!
>> > Kind regards,
>> > Nerak
>> >
>> >
>> > --
>> > View this message in context:
>> > http://r.789695.n4.nabble.com/different-way-for-a-for-loop-for-several-columns-tp4385705p4385705.html
>> > Sent from the R help mailing list archive at Nabble.com.
>> >
>


