[R] Computing means of multiple variables based on a condition

Thierry Onkelinx thierry.onkelinx at inbo.be
Thu May 26 15:55:30 CEST 2016


Another option would be to convert the data into a long format and add
columns for each condition.

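library(dplyr)
library(tidyr)
# DF is the sample data frame defined in Jeff Newmiller's message below
DF %>%
  # long format: one row per (a, d, variable), variable being b or c
  gather(key = "key", value = "value", -a, -d) %>%
  # one column per condition; values failing the condition become NA
  mutate(
    "d>=2" = ifelse(d >= 2, value, NA),
    "d>=4" = ifelse(d >= 4, value, NA),
    "d>=6" = ifelse(d >= 6, value, NA)
  ) %>%
  select(-d, -value) %>%
  # stack the condition columns and drop the rows that failed them
  gather(key = "condition", value = "value", -a, -key, na.rm = TRUE) %>%
  group_by(a, key, condition) %>%
  summarise(mean = mean(value)) %>%
  # back to wide: one column of means per variable (b, c)
  spread(key = key, value = mean) %>%
  arrange(condition, a)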


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2016-05-26 8:34 GMT+02:00 Jeff Newmiller <jdnewmil at dcn.davis.ca.us>:

> Thank you for including some sample data, but I have to ask that you
> please invest some time in learning how to edit your code in a text editor
> and to post in plain text. The quote marks in your example were "curly",
> which R does not understand. There are other ways in which HTML email leads
> to corruption on this mailing list as well, so you will save everyone
> numerous headaches by investing this time sooner rather than later.
>
> The type of operation you are looking for is referred to as an "outer
> join" in SQL nomenclature (strictly, a join on the inequality d >= dmin,
> i.e. a cross join followed by a filter), and it is intrinsically slow
> because the only way to accomplish it is computationally equivalent to a
> for loop that successively applies each minimum "d" value to your whole
> data set.
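The join formulation described above can be sketched in base R as follows. This is a minimal illustration, not the author's code: `merge()` with `by = NULL` returns the Cartesian product (every row of the data paired with every threshold), a logical filter keeps the pairs with `d >= dmin`, and `aggregate()` averages `b` and `c` within each threshold and group. The `DF` and `passes` data frames are rebuilt from this message so the block runs on its own; the output names `meanb` and `meanc` are chosen to match the dplyr version.

```r
# Sample data from the thread (straight quotes, so R can parse it)
DF <- data.frame(
  a = c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B"),
  b = c(15, 35, 20, 99, 75, 64, 33, 78, 45, 20),
  c = c(111, 234, 456, 876, 246, 662, 345, 480, 512, 179),
  d = c(1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3),
  stringsAsFactors = FALSE
)
passes <- data.frame(dmin = c(2, 4, 6))

# Cross join: by = NULL pairs every row of DF with every threshold
paired <- merge(DF, passes, by = NULL)

# Keep only the pairs that satisfy the condition d >= dmin
paired <- paired[paired$d >= paired$dmin, ]

# Average b and c within each (threshold, a) combination
res <- aggregate(cbind(b, c) ~ dmin + a, data = paired, FUN = mean)
res$condition <- paste0("d>=", res$dmin)

# Order and rename to match the requested layout
res <- res[order(res$dmin, res$a), c("a", "condition", "b", "c")]
names(res)[3:4] <- c("meanb", "meanc")
rownames(res) <- NULL
res
```

Before filtering, the paired data frame has one row per (row, threshold) combination, 10 x 3 = 30 rows here, so the work grows with the number of rows times the number of thresholds.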
>
> Having said that, you can accomplish this in the "dplyr" syntax instead of
> using a for loop, if that makes you happy, but it is not really any
> "better" than a for loop (and some people might consider it misleading to
> drape a for loop in such fancy syntax):
>
> DF <- data.frame( a = c( "A", "B", "A", "B", "A", "B", "A", "B", "A", "B" )
>                 , b = c( 15, 35, 20,  99, 75, 64, 33, 78, 45, 20 )
>                 , c = c( 111, 234, 456, 876, 246, 662, 345, 480, 512, 179 )
>                 , d = c( 1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3 )
>                 , stringsAsFactors = FALSE
>                 )
> passes <- data.frame( dmin = c( 2, 4, 6 ) )
>
> library(dplyr)
>
> DF2 <- (   passes
>        %>% rowwise
>        %>% do({ # run once for each row in "passes"
>             dmin <- .$dmin # dot here refers to row of
>                            # "passes" data frame
>             (   DF
>             %>% filter( d >= dmin )
>             %>% group_by( a )
>             %>% summarise( meanb = mean( b )
>                          , meanc = mean( c )
>                          )
>             %>% mutate( condition = paste0( "d>=", dmin ) )
>             )
>            })
>        %>% select( a, condition, meanb, meanc )
>        %>% as.data.frame
>        )
>
>
> On Wed, 25 May 2016, KMNanus wrote:
>
>> These will be overlapping subgroups from the same data frame.  For
>> example, d>=2 will have length=9, d>=4 will have length=7, etc.
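A quick base-R check of the subgroup sizes mentioned above, using the `d` values from the sample data and conditions of the form `d >= threshold` (the thresholds 2, 4 and 6 are the ones in the requested table). Because each higher threshold keeps a subset of the rows kept by the lower one, the sizes can only shrink.

```r
# d values from the sample data posted in this thread
d <- c(1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3)
thresholds <- c(2, 4, 6)

# Number of rows satisfying d >= threshold, for each threshold
sizes <- sapply(thresholds, function(m) sum(d >= m))
names(sizes) <- paste0("d>=", thresholds)
print(sizes)  # 9, 7 and 6 rows respectively
```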
>>
>>
>> Ken
>> kmnanus at gmail.com
>> 914-450-0816 (tel)
>> 347-730-4813 (fax)
>>
>>
>>
>> On May 25, 2016, at 9:06 PM, William Dunlap <wdunlap at tibco.com> wrote:
>>>
>>> Just to be clear, do you really want your 'condition' groups to be
>>> subsets of one another?  Most (all?) of the *ply functions assume you want
>>> non-overlapping groups so they do a split-summarize-combine sequence.
>>> You would have to replace the split part of that.
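Replacing the split step can be sketched in base R like this, under the assumption that the thresholds are 2, 4 and 6 as in the requested table. Instead of `split()`, which yields disjoint groups, `lapply()` builds one (overlapping) subset per threshold; each subset is then summarised by `a` with `tapply()`, the function asked about in the original question, and the pieces are stacked with `rbind()`. The column names `meanb` and `meanc` are illustrative, and the data frame is rebuilt with straight quotes so the block runs on its own.

```r
# Sample data from the original question (straight quotes)
df <- data.frame(
  a = c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B"),
  b = c(15, 35, 20, 99, 75, 64, 33, 78, 45, 20),
  c = c(111, 234, 456, 876, 246, 662, 345, 480, 512, 179),
  d = c(1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3),
  stringsAsFactors = FALSE
)
thresholds <- c(2, 4, 6)

# Replacement for the "split" step: one overlapping subset per threshold
subsets <- lapply(thresholds, function(m) df[df$d >= m, ])

# "Summarize" each subset by group a with tapply()
pieces <- Map(function(sub, m) {
  meanb <- tapply(sub$b, sub$a, mean)
  meanc <- tapply(sub$c, sub$a, mean)
  data.frame(a = names(meanb),
             condition = paste0("d>=", m),
             meanb = as.vector(meanb),
             meanc = as.vector(meanc[names(meanb)]),
             stringsAsFactors = FALSE)
}, subsets, thresholds)

# "Combine" step: stack the per-threshold summaries
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```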
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap at tibco.com
>>> On Wed, May 25, 2016 at 3:37 PM, KMNanus <kmnanus at gmail.com> wrote:
>>> I have a large dataset, a sample of which is:
>>>
>>> a<- c(“A”, “B”,“A”, “B”,“A”, “B”,“A”, “B”,“A”, “B”)
>>> b <-c(15, 35, 20,  99, 75, 64, 33, 78, 45, 20)
>>> c<- c( 111, 234, 456, 876, 246, 662, 345, 480, 512, 179)
>>> d<- c(1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3)
>>>
>>> df <- data.frame(a,b,c,d)
>>>
>>> I'm trying to construct a data frame that shows the means of c & b based
>>> on the condition of d and grouped by a.
>>>
>>> I want to create the data frame below, then use ggplot2 to create a line
>>> plot of b at various conditions of d.
>>>
>>> I can compute the grouped means (d>=2, d>=4, etc.) one at a time using
>>> dplyr but haven't figured out how to put them all together or put them in
>>> one data frame.
>>>
>>> I'd rather not use a loop and am relatively new to R.  Is there a way I
>>> can use tapply and set it to the conditions above so that I can create the
>>> df below?
>>>
>>>
>>>     a   condition   mean(b)   mean(c)
>>>     A   d>=2        ____      _____
>>>     B   d>=2        ____      _____
>>>     A   d>=4        ____      _____
>>>     B   d>=4        ____      _____
>>>     A   d>=6        ____      _____
>>>     B   d>=6        ____      _____
>>>
>>>
>>>
>>> Ken
>>>
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>



