[R] Calculated mean value based on another column bin from dataframe.

Fabrice Tourre fabrice.ciup at gmail.com
Wed Apr 6 23:59:35 CEST 2011


Thanks. Before I never used findInterval function. It seems very nice.

On Wed, Apr 6, 2011 at 11:20 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Apr 6, 2011, at 9:46 AM, Fabrice Tourre wrote:
>
>> Dear Henrique Dallazuanna,
>>
>> Thank you very much for your suggestion.
>>
>> It is obvious that your method is better than me.
>>
>> Is it possible to use cut, table,by etc? Whether there is some
>> aggregate function in R can do this?
>>
>> Thanks.
>>
>> On Wed, Apr 6, 2011 at 2:16 PM, Henrique Dallazuanna <wwwhsd at gmail.com>
>> wrote:
>>>
>>> Try this:
>>>
>>> fil <- sapply(ran, '<', e1 = dat[,1]) & sapply(ran[2:(length(ran) +
>>> 1)], '>=', e1 = dat[,1])
>>> mm <- apply(fil, 2, function(idx)mean(dat[idx, 2]))
>>>
>>> On Wed, Apr 6, 2011 at 5:48 AM, Fabrice Tourre <fabrice.ciup at gmail.com>
>>> wrote:
>>>>
>>>> Dear list,
>>>>
>>>> I have a dataframe with two column as fellow.
>>>>
>>>>> head(dat)
>>>>
>>>>      V1      V2
>>>>  0.15624 0.94567
>>>>  0.26039 0.66442
>>>>  0.16629 0.97822
>>>>  0.23474 0.72079
>>>>  0.11037 0.83760
>>>>  0.14969 0.91312
>>>>
>>>> I want to get the column V2 mean value based on the bin of column of
>>>> V1. I write the code as fellow. It works, but I think this is not the
>>>> elegant way. Any suggestions?
>>>>
>>>> dat<-read.table("dat.txt",head=F)
>>>> ran<-seq(0,0.5,0.05)
>>>> mm<-NULL
>>>> for (i in c(1:(length(ran)-1)))
>>>> {
>>>>   fil<- dat[,1] > ran[i] & dat[,1]<=ran[i+1]
>>>>   m<-mean(dat[fil,2])
>>>>   mm<-c(mm,m)
>>>> }
>>>> mm
>>>>
>>>> Here is the first 20 lines of my data.
>>>>
>>>>> dput(head(dat,20))
>>>>
>>>> structure(list(V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037,
>>>> 0.14969, 0.16166, 0.09785, 0.36417, 0.08005, 0.29597, 0.14856,
>>>> 0.17307, 0.36718, 0.11621, 0.23281, 0.10415, 0.1025, 0.04238,
>>>> 0.13525), V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376,
>>>> 0.91312, 0.88463, 0.82432, 0.55582, 0.9429, 0.78956, 0.93424,
>>>> 0.87692, 0.83996, 0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022
>>>> )), .Names = c("V1", "V2"), row.names = c(NA, 20L), class =
>>>> "data.frame")
>>>>
>>>> ______________________________________________
>
> Here is how I would have done it with findInterval and tapply which is very
> similar to using a `cut` and `table` approach:
>
>> dat$grp <- findInterval(dat$V1, seq(0,0.5,0.05) )
>> tapply(dat$V2, dat$grp, mean)
>        1         2         3         4         5         6         8
> 0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900 0.6978900
> #####---------------
>
> You do not get exactly the same form of the result as with Henrique's
> method. His yields:
>> mm
>  [1] 0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900       NaN
>  [8] 0.6978900       NaN       NaN       NaN
>
> ####----------------
>
> The cut approach would yield this, which is more informatively labeled. (I'm
> wasn't completely sure the second to last word in the prior sentence was a
> real word, but several dictionaries seem to think so.):
>
>> dat$grp2 <- cut(dat$V1 , breaks=ran)
>> tapply(dat$V2, dat$grp2, mean)
>  (0,0.05] (0.05,0.1] (0.1,0.15] (0.15,0.2] (0.2,0.25] (0.25,0.3]
>  0.9252300  0.8836100  0.9135429  0.9213600  0.8493450  0.7269900
> (0.3,0.35] (0.35,0.4] (0.4,0.45] (0.45,0.5]
>        NA  0.6978900         NA         NA
>
>
>>
>
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list