[R] help with grouping data and calculating the means

Bert Gunter bgunter@4567 @ending from gm@il@com
Thu Nov 15 21:19:43 CET 2018


On Thu, Nov 15, 2018 at 10:40 AM Boris Steipe <boris.steipe using utoronto.ca> wrote:
>
> Use round() with the appropriate  "digits" argument. Then use unique() to define your groups.

No.
> round(c(.124,.126),2)
[1] 0.12 0.13

As I understand it, the OP said he wanted the last decimal to be ignored.
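
To make the distinction concrete, here is a minimal sketch contrasting rounding with truncation for the two values above. It assumes non-negative inputs and two kept decimals; the variable names are only illustrative.

```r
x <- c(0.124, 0.126)

# Rounding splits the two values into different groups.
rounded <- round(x, 2)

# Truncation (scale, floor, unscale) drops the last decimal instead,
# so both values fall into the same group.
truncated <- floor(x * 100) / 100

rounded    # 0.12 0.13
truncated  # 0.12 0.12
```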

The OP also did not specify what he wanted to calculate means of. I
assume TK-QUADRANT. It is also not clear whether the calculations are
to be done separately by latitude and longitude, or both together.
I'll assume separately. In that case, the TK-QUADRANT means, grouped by
latitude truncated to 4 decimal places, could be calculated as follows
(using the provided example data):
(Note: ignore all that follows if my interpretation is incorrect)

> with(df, tapply(TK.QUADRANT, floor(1e4*LAT),mean))
 549249  549749  550249  550749
10158.5 10156.5  9163.5  9161.5
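
For reference, a self-contained version of the call above might look like the sketch below. It rebuilds the example data from the original post; note that data.frame() converts the name `TK-QUADRANT` to the syntactic name TK.QUADRANT by default, which is why the call uses that spelling. The names lat_key, lon_key, lat_means and lon_means are introduced here for illustration only, and grouping by longitude is shown on the same assumption that the two coordinates are handled separately.

```r
# Example data from the original post, with the syntactic column name.
df <- data.frame(
  TK.QUADRANT = c(9161, 9162, 9163, 9164, 10152, 10154, 10161, 10163),
  LAT = c(55.07496, 55.07496, 55.02495, 55.02496,
          54.97496, 54.92495, 54.97496, 54.92496),
  LON = c(8.37477, 8.458109, 8.37477, 8.45811,
          8.291435, 8.291437, 8.374774, 8.374774)
)

# Grouping keys: coordinates truncated to 4 decimals, kept as whole numbers.
lat_key <- floor(1e4 * df$LAT)
lon_key <- floor(1e4 * df$LON)

# Mean of TK.QUADRANT within each truncated-latitude group ...
lat_means <- tapply(df$TK.QUADRANT, lat_key, mean)
# ... and, separately, within each truncated-longitude group.
lon_means <- tapply(df$TK.QUADRANT, lon_key, mean)

lat_means
lon_means
```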

## Note that this assumes positive values of latitude, because:
> floor(c(-1.2,1.2))
[1] -2  1

This could easily be modified if both positive and negative values were
used, e.g.:
> x <-c(-1.2,1.2)
> sign(x)*floor(abs(x))
[1] -1  1
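
As a side note, base R's trunc() rounds toward zero and so gives the same result as the sign/floor/abs combination. A small sketch, using made-up latitudes of both signs (not from the original data):

```r
x <- c(-1.2, 1.2)

# trunc() drops the fractional part toward zero for either sign.
manual  <- sign(x) * floor(abs(x))
builtin <- trunc(x)

# Hypothetical southern and northern latitudes, truncated to 4 decimals.
lat <- c(-55.07496, 55.07496)
lat_key <- trunc(1e4 * lat)

builtin  # -1  1
lat_key  # -550749  550749
```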

Confession: I suspect that this multiply-and-floor() procedure
might fail with lots of decimal places because of the usual issues
with binary representation of decimals. It may even fail here. If
so, I would appreciate someone pointing this out and, if possible,
providing a better strategy.
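
The concern is real in general, even though the tapply() output above shows the expected groups for the example latitudes. The sketch below shows one value where scale-and-floor goes wrong, and one possible workaround: round the scaled value to a few extra "guard" digits before truncating. The helper trunc_key() and its guard argument are hypothetical, not part of base R, and the approach assumes the data carry fewer meaningful decimals than digits + guard.

```r
# 4.35 is stored as slightly less than 4.35, so 4.35 * 100 is just under 435.
bad <- floor(4.35 * 100)   # 434, not 435

# Hypothetical helper: round away representation noise, then truncate.
trunc_key <- function(x, digits, guard = 6) {
  scaled <- round(x * 10^digits, guard)  # e.g. 434.99999999999994 -> 435
  trunc(scaled)                          # whole-number grouping key
}

good <- trunc_key(4.35, 2)             # 435
keys <- trunc_key(c(0.124, 0.126), 2)  # 12 12, same group as intended
```

Such keys could then be passed to tapply() in place of floor(1e4*LAT). Whether 6 guard digits is appropriate depends on the precision of the recorded coordinates.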

Cheers,
Bert



>
> HTH,
> B.
>
>
> > On 2018-11-15, at 11:48, sasa kosanic <sasa.kosanic using gmail.com> wrote:
> >
> > Dear All,
> >
> > I would very much appreciate the help with following:
> > I need to calculate the mean of  different lat/long points that should be
> > grouped.
> > However I would like that r excludes taking  values that are different in
> > only last decimal.
> > So instead 4 values in the group it would calculate the mean for only 3(
> > excluding the ones that differs in only one decimal).
> > # construct the dataframe
> > `TK-QUADRANT` <- c(9161,9162,9163,9164,10152,10154,10161,10163)
> > LAT <- c(55.07496,55.07496,55.02495,55.02496
> > ,54.97496,54.92495,54.97496,54.92496)
> > LON <-
> > c(8.37477,8.458109,8.37477,8.45811,8.291435,8.291437,8.374774,8.374774)
> > df <- data.frame(`TK-QUADRANT`=`TK-QUADRANT`,LAT=LAT,LON=LON)
> >
> >
> > I would like to group the data and calculate means by group but in a way to
> > exclude every number that differs in only last decimal.
> >
> >
> > Also please see pdf. example-attached .
> >
> > Many thanks!
> > Best wishes,
> > Sasha
> >
> > --
> >
> > Dr Sasha Kosanic
> > Ecology Lab (Biology Department)
> > Room M644
> > University of Konstanz
> > Universitätsstraße 10
> > D-78464 Konstanz
> > Phone: +49 7531 883321 & +49 (0)175 9172503
> >
> > http://cms.uni-konstanz.de/vkleunen/
> > https://tinyurl.com/y8u5wyoj
> > https://tinyurl.com/cgec6tu
> > <dataset example.pdf>
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>


