[R] "tapply versus by" in function with more than 1 arguments

Gabor Grothendieck ggrothendieck at gmail.com
Wed Oct 1 18:06:32 CEST 2008


The first tapply in your question subsets V1 but not V2, so the two
vectors passed to cor have different lengths.  To subset both, tapply
over the row names and perform the subsetting inside the function:

tapply(rownames(dataf), dataf$class,
       function(r) cor(dataf[r, "V1"], dataf[r, "V2"]))

or

tapply(rownames(dataf), dataf$class, function(r) with(dataf[r, ], cor(V1, V2)))
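As a rough check of the idea, here is a minimal sketch on a small made-up data frame (the values below are not from your post, and the names lens, r_tapply, r_by and r_split are only for illustration). It indexes rows by position rather than by row name, which should be equivalent here, and compares the result with the by() output and with a split()/mapply() variant:

```r
# Small fixed data set so the result is reproducible (made-up values).
dataf <- data.frame(
  V1    = c(1, 2, 3, 4, 6, 5, 1, 3, 2, 4),
  V2    = c(2, 4, 6, 1, 3, 2, 5, 4, 6, 3),
  class = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# tapply passes FUN one group of V1 at a time, but extra arguments go
# through unsplit, so cor() would see 3 or 4 values against all 10 of V2.
lens <- tapply(dataf$V1, dataf$class, length)

# Group the row indices instead and subset both columns inside FUN.
r_tapply <- tapply(seq_len(nrow(dataf)), dataf$class,
                   function(i) cor(dataf$V1[i], dataf$V2[i]))

# Same numbers from by(): keep only the off-diagonal element per group.
r_by <- sapply(by(dataf[, c("V1", "V2")], dataf$class, cor),
               function(m) m["V1", "V2"])

# split() + mapply() is another way to keep the two columns in step.
r_split <- mapply(cor, split(dataf$V1, dataf$class),
                  split(dataf$V2, dataf$class))

print(r_tapply)
```

All three approaches should give one correlation per class; which one reads best is largely a matter of taste.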


On Wed, Oct 1, 2008 at 8:21 AM, Cézar Freitas <cafanselmo12 at yahoo.com.br> wrote:
> Hi. I searched the list and didn't find anything similar to this. I simplified my example as shown below:
>
> #I need to calculate the correlation (for example) between 2 columns of a data.frame, grouped by a third one, like below:
>
> #number of rows
> nr = 10
>
> #the third column is there to stress that I need the correlation between two variables only
> dataf = as.data.frame(matrix(c(rnorm(nr),rnorm(nr)*2,runif(nr),sort(c(1,1,2,2,3,3,sample(1:3,nr-6,replace=TRUE)))),ncol=4))
> names(dataf)[4] = "class"
>
> #> dataf
> #            V1         V2         V3 class
> #1   0.56933020  1.2529931 0.30774422     1
> #2   0.41702299 -1.6441547 0.76140046     1
> #3  -1.07671647 -4.8747575 0.43706944     1
> #4  -1.97701167  1.3015196 0.04390175     2
> #5   0.56501325  1.8597720 0.08174124     2
> #6   0.70068638  1.7922641 0.74730126     2
> #7  -1.39956177 -1.9918904 0.64521918     3
> #8   0.27086664  0.3745362 0.61026133     3
> #9   0.04282347  3.7360407 0.48696109     3
> #10 -0.34262654  0.7933674 0.09824913     3
>
> #I tried:
>
> tapply(dataf$V1, dataf$class, cor, dataf$V2)
> #Error FUN(X[[1L]], ...) : incompatible dimensions
>
> tapply(dataf$V1, dataf$class, cor, tapply(dataf$V2, dataf$class))
> #Error FUN(X[[1L]], ...) : incompatible dimensions
>
> #But using "by" I obtain:
>
> by(dataf[,c("V1","V2")], dataf$class, cor)
>
> #dataf$class: 1
> #        V1      V2
> #V1 1.00000 0.91777
> #V2 0.91777 1.00000
> #--------------------------------------------------------------------------------------------------
> #dataf$class: 2
> #         V1       V2
> #V1 1.000000 0.987857
> #V2 0.987857 1.000000
> #--------------------------------------------------------------------------------------------------
> #dataf$class: 3
> #          V1        V2
> #V1 1.0000000 0.7318938
> #V2 0.7318938 1.0000000
>
> #I am only interested in the [1,2] element of each matrix, i.e. cor(V1,V2), so I can take 0.91777, 0.987857 and 0.7318938, but I think tapply could work better if I can solve the problem.
>
> Thanks,
> Cezar


