[R] Group averages

Gabor Grothendieck ggrothendieck at gmail.com
Tue Jun 13 00:25:12 CEST 2006


Assuming that yr and conf are the two factors referred to in the
description, create a function f that calculates the ith row
of the output, then call it for every row index with sapply:

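attach(data)  # puts hsgpa, conf and yr on the search path
f <- function(i) {
	# hsgpa of the other rows whose conf and yr match row i;
	# rows with a missing hsgpa, conf or yr drop out via na.omit
	hsgpa <- na.omit(hsgpa[-i][conf[-i] == conf[i] & yr[-i] == yr[i]])
	if (length(hsgpa)) c(mean = mean(hsgpa), var = var(hsgpa))
	else c(mean = NA, var = NA)
}
out <- t(sapply(seq_len(nrow(data)), f))
detach(data)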
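
If the row-by-row scan turns out to be slow on the full data set, the
same leave-one-out statistics can probably be obtained from per-group
totals instead: subtract each row's own contribution from its group's
count, sum and sum of squares. What follows is only a sketch under that
assumption, not something from the original question; loo_stats and the
toy data frame are hypothetical names used for illustration.

```r
# Hypothetical helper, not from the original post: leave-one-out mean and
# variance of x within groups defined by g1 and g2, using per-group totals.
loo_stats <- function(x, g1, g2) {
  key <- paste(g1, g2, sep = "\t")
  key[is.na(g1) | is.na(g2)] <- NA        # a missing factor means no group
  grp <- factor(key)
  gi  <- as.integer(grp)                  # NA where the group is unknown

  use <- !is.na(x) & !is.na(gi)           # rows that count toward the totals
  x0  <- ifelse(use, x, 0)

  # Per-group count, sum and sum of squares, expanded back to one value per row
  n_g <- as.vector(tapply(use,  grp, sum))[gi]
  s_g <- as.vector(tapply(x0,   grp, sum))[gi]
  q_g <- as.vector(tapply(x0^2, grp, sum))[gi]

  # Remove row i's own contribution (zero if its x is missing)
  n <- n_g - use
  s <- s_g - x0
  q <- q_g - x0^2

  m <- ifelse(n >= 1, s / n, NA_real_)                     # needs one other value
  v <- ifelse(n >= 2, (q - s^2 / n) / (n - 1), NA_real_)   # needs two other values
  cbind(mean = m, var = v)
}

# Small made-up data set with the same column names as the question
toy <- data.frame(
  hsgpa = c(3.1, NA, 2.5, 3.7, 3.9, 3.0),
  yr    = c(1999, 1999, 1999, 1999, 2000, NA),
  conf  = c("a", "a", "a", "a", "b", "a"),
  stringsAsFactors = FALSE
)
out2 <- loo_stats(toy$hsgpa, toy$conf, toy$yr)
```

On the real data the call would be loo_stats(data$hsgpa, data$conf,
data$yr). The sum-of-squares form of the variance can lose precision
when the values are large relative to their spread, so it is worth
checking a few rows against f before relying on it.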

On 6/12/06, David Kling <klingd at reed.edu> wrote:
> Hello:
>
> I hope none of you will mind helping a newbie.  I'm a student research
> assistant working with a large data set in which observations are
> categorized according to two factors. For each observation, I'm trying to
> calculate the group mean and variance of a variable (called 'hsgpa' in the
> example data presented below), excluding that observation itself.  For
> example, if there are 20 observations with the same value of the two
> factors, for each of the 20 I'd like to generate the mean and variance
> of the 'hsgpa' values of the other 19 group members.  This must be done
> for every observation in the data set.
>
> I've searched the R mail archives, read the manuals, and read
> documentation for tapply() and by() as well as summaryBy() in the 'doBy'
> package and with() from 'Hmisc.'  It may be that since I'm new to
> writing functions and R is the first language I've ever worked with, I'm
> less able to come up with a solution than some other new R users.  None
> of the functions I have tried have been successful, and it doesn't seem
> worth it to reproduce and explain my best effort.  I hope someone has
> some ideas!  Looking at what an experienced user would try should help
> me with my present task as well as future problems.
>
> Below I've included some lines that will generate a sample data set
> similar to the one I'm working with:
>
> #
> #Example data:
> #
> case <- sample(seq(1,10000,1),5000,replace=FALSE)
> hsgpa <- rbeta(5000,7,1.5)*4.25
> yr <- sample(seq(1993,2005,1),5000,replace=TRUE)
> conf <- sample(letters[1:5],5000,replace=TRUE)
> data <- data.frame(case=case,hsgpa=hsgpa,yr=yr,conf=conf)
> data$conf <- as.character(data$conf)
> s1 <- sample(seq(1,5000,1),500,replace=FALSE)
> k <- data$hsgpa
> k[row.names(data) %in% s1] <- NA
> data$hsgpa <- k
> s2 <- sample(seq(1,5000,1),100,replace=FALSE)
> k <- data$yr
> k[row.names(data) %in% s2] <- NA
> data$yr <- k
> k <- data$conf
> k[row.names(data) %in% s2] <- NA
> data$conf <- k
> remove(case,hsgpa,yr,conf,s1,s2,k)
> #
>


