[R] avoiding time-consuming for loop renaming identifiers

toby909 at gmail.com toby909 at gmail.com
Sat Jul 21 03:26:07 CEST 2007


Hi All

I was wondering if I can avoid a time-consuming for loop on my dataset of 600,000 observations.

school_id   y
8           9.87
8           8.89
8           7.89
8           8.88
20          6.78
20          9.99
20          8.79
31          10.1
31          11

There are, say, 143 different schools in this dataset of 600,000 observations.

I need to have sequential identifiers: 1, 2, 3, 4, 5, ..., 143.

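I was using an awkward for loop that took 30 minutes to run:
sid <- 1                       # running sequential identifier
dta$sid[1] <- 1
for (i in 2:nrow(dta)) {
  # start a new identifier whenever school_id differs from the previous row
  if (dta$school_id[i] != dta$school_id[i - 1]) sid <- sid + 1
  dta$sid[i] <- sid
}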
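
For reference, here is a rough sketch of what a vectorized version of the same logic might look like. It is untested on the full data. The toy data frame just copies the example rows above, and the names starts and sid_match are made up for illustration. The idea is to flag every row where school_id differs from the row before it and take a running sum of those flags.

```r
# Toy data copied from the example above (the real dta has about 600,000 rows)
dta <- data.frame(
  school_id = c(8, 8, 8, 8, 20, 20, 20, 31, 31),
  y = c(9.87, 8.89, 7.89, 8.88, 6.78, 9.99, 8.79, 10.1, 11)
)

# TRUE where school_id differs from the previous row; the first row always starts a run
starts <- c(TRUE, dta$school_id[-1] != dta$school_id[-nrow(dta)])

# The running count of run starts numbers the runs 1, 2, 3, ... as the loop does
dta$sid <- cumsum(starts)

# Alternative, assuming all rows of a school are contiguous:
# number the schools in order of first appearance
sid_match <- match(dta$school_id, unique(dta$school_id))
```

The two versions should agree only when each school's rows sit together. If a school_id reappeared later in the file, cumsum() would give it a new number, as the loop does, whereas match() would reuse the earlier one.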

Any hints appreciated.

Thanks Toby


