[R] Large dataset operations

Phil Spector spector at stat.berkeley.edu
Fri Mar 11 21:24:03 CET 2011


To get the equivalent of what your loop does, you could use

lapply(data[, 3:5], function(x) x / ave(x, data$plateNo, FUN = mean))

but you might find the output of

sapply(data[, 3:5], function(x) x / ave(x, data$plateNo, FUN = mean))

to be more useful, since sapply returns a matrix rather than a list.
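
If you want the normalised values as new columns in the data frame, something along these lines might work. This is only a minimal sketch: it assumes the sample data from the question, borrows the "normalised_" prefix from the original loop, and uses "result" as a made-up name for the combined data frame.

```r
# Sample data, copied from the original question
data <- data.frame(
  plateNo = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Well    = c("A01", "A02", "A03", "A04", "A05",
              "A01", "A02", "A03", "A04", "A05"),
  rep_1   = c(1312, 10464, 3301, 3895, 8731, 7893, 2912, 985, 1346, 794),
  rep_2   = c(963, 6715, 3257, 3350, 7389, 6748, 2385, 785, 1018, 314),
  rep_3   = c(1172, 5628, 3281, 3496, 5701, 5920, 2586, 809, 1001, 486)
)

# Divide every rep_ column by its per-plate mean.
# ave() returns the group mean repeated for each row, so the division
# is vectorised; sapply() collects the three results into a matrix.
norm <- sapply(data[, 3:5], function(x) x / ave(x, data$plateNo, FUN = mean))

# Rename the columns (the "normalised_" prefix is taken from the question)
colnames(norm) <- sub("rep_", "normalised_", colnames(norm))

# "result" is a hypothetical name: the original columns plus the new ones
result <- cbind(data, norm)
```

Because each value is divided by its own plate mean, the normalised values within a plate should average to 1, which is a quick way to check the result.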

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu




On Fri, 11 Mar 2011, hi Berven wrote:

>
> Hello all,
>
> I'm new to R and trying to figure out how to perform calculations on a large dataset (300 000 datapoints). I have already written some code to do this, but it is awfully slow. What I want to do is add a new column for each "rep_" column, in which each value is divided by the mean of all values with the same "PlateNo". My data is in the following format:
>
>> data
>
>   PlateNo  Well  rep_1  rep_2  rep_3
>         1   A01   1312    963   1172
>         1   A02  10464   6715   5628
>         1   A03   3301   3257   3281
>         1   A04   3895   3350   3496
>         1   A05   8731   7389   5701
>         2   A01   7893   6748   5920
>         2   A02   2912   2385   2586
>         2   A03    985    785    809
>         2   A04   1346   1018   1001
>         2   A05    794    314    486
>
> To generate it copy:
> a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
> b <- c("A01", "A02", "A03", "A04", "A05", "A01", "A02", "A03", "A04", "A05")
> c <- c(1312, 10464,  3301,  3895,  8731,  7893,  2912,   985,  1346,   794)
> d <- c(963, 6715, 3257, 3350, 7389, 6748, 2385, 785, 1018,  314)
> e <- c(1172, 5628, 3281, 3496, 5701, 5920, 2586,  809, 1001,  486)
> data <- data.frame(plateNo = a, Well = b, rep_1 = c, rep_2 = d, rep_3 = e)
>
> Here is the code I have come up with:
>
> rows <- length(data$plateNo)
> reps <- 3
> norm <- list()
> for (rep in 1:reps) {
>     x <- paste("rep_", rep, sep = "")
>     normx <- paste("normalised_", rep, sep = "")
>     for (row in 1:rows) {
>         plateMean <- mean(data[[x]][data$plateNo == data$plateNo[row]])
>         wellData <- data[[x]][row]
>         norm[[normx]][row] <- wellData / plateMean
>     }
> }
>
>
> Any help or tips would be greatly appreciated!
> Thanks,
> Haakon


