[R] within-groups variance and between-groups variance

Daniel Malter daniel at umd.edu
Thu Aug 25 22:02:20 CEST 2011


#Here is, I think, an easier way of computing all the components you need.

#You get the within-group variances with this call:

apply(iris[,1:4],2,function(x) tapply(x,iris$Species,var))
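The same per-species variance table can also be produced with base R's aggregate() formula interface. This is a minimal sketch of an alternative to the apply()/tapply() call above, shown only for comparison. It returns a data frame with one row per species instead of a matrix.

```r
# Per-species variance of every non-grouping column in iris (data frame output)
within_var <- aggregate(. ~ Species, data = iris, FUN = var)
within_var
```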

#Taking the column means gives what you computed:

apply(apply(iris[,1:4],2,function(x) tapply(x,iris$Species,var)),2,mean)

#HOWEVER, note that this takes unweighted column means. If your groups
#contain an UNequal number of observations, you do not want the
#UNweighted means: weight each group's variance by its degrees of
#freedom (n - 1) instead.
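Here is a minimal sketch of the weighted version. It assumes the goal is the pooled within-groups variance that calcWithinGroupsVariance (quoted below) computes. Each group's variance is weighted by its degrees of freedom, n_i - 1, and the total is divided by N - k. For iris, every species has 50 rows, so the result agrees with the unweighted column means.

```r
# Within-group variances: rows = species, columns = the four measurements
v <- apply(iris[, 1:4], 2, function(x) tapply(x, iris$Species, var))
# Degrees of freedom per group (n_i - 1), in factor-level order like the rows of v
dof <- as.vector(table(iris$Species)) - 1
# Pooled variance: sum of (n_i - 1) * s_i^2 divided by sum of (n_i - 1) = N - k
pooled <- colSums(v * dof) / sum(dof)
pooled
```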

#Here is how you get the group and grand means:

apply(iris[,1:4],2,function(x) tapply(x,iris$Species,mean)) #group means

apply(iris[,1:4],2,mean) #grand means


#You will also need the number of observations in each group:

apply(iris[,1:4],2,function(x) tapply(x,iris$Species,length))
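The three pieces above can be combined into the between-groups sum of squares, sum_i n_i (mean_i - grand mean)^2, for all four variables at once. The sketch below then divides by N - k, the denominator used by calcBetweenGroupsVariance (quoted below). It should therefore reproduce that function's value for Sepal.Length.

```r
gm <- apply(iris[, 1:4], 2, function(x) tapply(x, iris$Species, mean))   # group means (3 x 4)
m  <- apply(iris[, 1:4], 2, mean)                                         # grand means (length 4)
n  <- apply(iris[, 1:4], 2, function(x) tapply(x, iris$Species, length))  # group sizes (3 x 4)

# Squared deviation of each group mean from its column's grand mean
dev2 <- sweep(gm, 2, m)^2
# Between-groups sum of squares per variable
ssb <- colSums(n * dev2)

N <- nrow(iris)              # total number of observations
k <- nlevels(iris$Species)   # number of groups
vb <- ssb / (N - k)          # same denominator as calcBetweenGroupsVariance
vb
```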

#I don't want to ruin all the fun for you.

HTH,
Daniel






Coghlan, Avril wrote:
> 
> Hello,
> 
> I have been looking for functions for calculating the within-groups
> variance and between-groups variance, for the case where you have
> several numerical variables describing samples from a number of groups.
> 
> I didn't find such functions in R, so I wrote my own versions (see
> below). I can calculate the within- and between-groups variance for the
> Sepal.Length variable (iris[1]) in the "iris" data set by typing:
>> calcWithinGroupsVariance(iris[1],iris[5])
> [1] 0.2650082
>> calcBetweenGroupsVariance(iris[1],iris[5])
> [1] 0.4300145
> 
> I am wondering, however, whether there are already functions for doing
> this in R.
> I would prefer to use a standard R function if one exists. 
> 
> Kind Regards,
> Avril
> 
> 
> Within-Groups Variance:
> =======================
> 
> calcWithinGroupsVariance <- function(variable,groupvariable) 
>       {
>          # find out how many values the group variable can take
>          groupvariable2 <- as.factor(groupvariable[[1]])
>          levels <- levels(groupvariable2)
>          numlevels <- length(levels)
>          # accumulate the sums over all groups:
>          numtotal <- 0
>          denomtotal <- 0
>          for (i in 1:numlevels)
>          {
>             leveli <- levels[i]
>             levelidata <- variable[groupvariable==leveli,]
>             levelilength <- length(levelidata)
>             # get the mean and standard deviation for group i:
>             meani <- mean(levelidata)
>             sdi <- sd(levelidata)
>             numi <- (levelilength - 1)*(sdi * sdi)
>             denomi <- levelilength
>             numtotal <- numtotal + numi
>             denomtotal <- denomtotal + denomi 
>          } 
>          # calculate the within-groups variance
>          Vw <- numtotal / (denomtotal - numlevels) 
>          return(Vw)
>       } 
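On the question of a standard function: in a one-way ANOVA, the residual mean square is exactly this pooled within-groups variance. Base R's lm() and anova() should therefore reproduce calcWithinGroupsVariance. A sketch for Sepal.Length, expected to agree with the 0.2650082 reported above:

```r
# One-way ANOVA of Sepal.Length on Species, fitted with base R
fit <- lm(Sepal.Length ~ Species, data = iris)
tab <- anova(fit)
# Residual mean square = sum((n_i - 1) * s_i^2) / (N - k), the pooled within-groups variance
Vw <- tab["Residuals", "Mean Sq"]
Vw
```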
> 
> Between-Groups Variance:
> ========================
> 
> calcBetweenGroupsVariance <- function(variable,groupvariable) 
>       {
>          # find out how many values the group variable can take
>          groupvariable2 <- as.factor(groupvariable[[1]])
>          levels <- levels(groupvariable2)
>          numlevels <- length(levels)
>          # calculate the overall grand mean: 
>          grandmean <- mean(variable) 
>          # accumulate the sums over all groups:
>          numtotal <- 0
>          denomtotal <- 0
>          for (i in 1:numlevels)
>          {
>             leveli <- levels[i]
>             levelidata <- variable[groupvariable==leveli,]
>             levelilength <- length(levelidata)
>             # get the mean and standard deviation for group i:
>             meani <- mean(levelidata)
>             sdi <- sd(levelidata)
>             numi <- levelilength * ((meani - grandmean)^2)
>             denomi <- levelilength
>             numtotal <- numtotal + numi
>             denomtotal <- denomtotal + denomi 
>          } 
>          # calculate the between-groups variance
>          Vb <- numtotal / (denomtotal - numlevels) 
>          Vb <- Vb[[1]]
>          return(Vb)
>       }
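The between-groups part can be checked the same way with anova(). In that table, the "Sum Sq" entry for Species is the between-groups sum of squares, sum_i n_i (mean_i - grand mean)^2. Note that calcBetweenGroupsVariance divides this by N - k. The conventional between-groups mean square (the "Mean Sq" column, used in the F test) divides by k - 1 instead, so the two agree only when N - k equals k - 1. A sketch for Sepal.Length:

```r
fit <- lm(Sepal.Length ~ Species, data = iris)
tab <- anova(fit)
ssb <- tab["Species", "Sum Sq"]   # between-groups sum of squares
N <- nrow(iris)                   # total number of observations
k <- nlevels(iris$Species)        # number of groups
ssb / (N - k)                     # denominator used by calcBetweenGroupsVariance
tab["Species", "Mean Sq"]         # conventional mean square, ssb / (k - 1)
```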
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
