[R] how can I convert a long to wide matrix?

Jeff Newmiller jdnewmil using dcn.davis.ca.us
Wed May 2 06:33:08 CEST 2018


Here is a stab in the dark. I agree with Jim that the description of the 
problem is hard to follow. The original posting being in HTML format did 
not help.

#########
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(tidyr)

# indenting was just a side-effect of me cleaning up the HTML mess
dat <- structure( list( ID = structure( c( 1L, 1L, 1L, 2L, 2L)
                                       , .Label = c("id_X","id_Y")
                                       , class = "factor"
                                       )
                       , EventDate = structure( c( 4L, 5L, 2L
                                                 , 3L, 1L )
                                              , .Label = c( "9/15/16"
                                                          , "9/15/17"
                                                          , "9/7/16"
                                                          , "9/8/16"
                                                          , "9/9/16"
                                                          )
                                              , class = "factor"
                                              )
                       , timeGroup = structure( c( 1L, 1L, 2L, 1L, 2L)
                                              , .Label = c("B1", "B2")
                                              , class = "factor"
                                              )
                       , SITE = structure( c( 1L, 1L, 2L, 1L, 2L)
                                         , .Label = c("A", "B" )
                                         , class = "factor"
                                         )
                       )
                 , .Names = c( "ID", "EventDate"
                             , "timeGroup", "SITE")
                 , class = "data.frame"
                 , row.names = c(NA, -5L)
                 )
dat2 <- (   dat
         %>% mutate( EventDate = as.Date( as.character( EventDate )
                                        , format = "%m/%d/%y"
                                        )
                   )
         %>% arrange( ID, timeGroup, EventDate )
         %>% group_by( ID, timeGroup )
         %>% top_n( 1, EventDate )
         %>% ungroup
         )
dat2
#> # A tibble: 4 x 4
#>   ID    EventDate  timeGroup SITE
#>   <fct> <date>     <fct>     <fct>
#> 1 id_X  2016-09-09 B1        A
#> 2 id_X  2017-09-15 B2        B
#> 3 id_Y  2016-09-07 B1        A
#> 4 id_Y  2016-09-15 B2        B
dat3a <- (   dat2
          %>% mutate( timeGroup = paste( "EventDate"
                                       , timeGroup
                                       , sep="_"
                                       )
                    )
          %>% select( ID, timeGroup, EventDate )
          %>% spread( timeGroup, EventDate )
          )
dat3a
#> # A tibble: 2 x 3
#>   ID    EventDate_B1 EventDate_B2
#>   <fct> <date>       <date>
#> 1 id_X  2016-09-09   2017-09-15
#> 2 id_Y  2016-09-07   2016-09-15
dat3b <- (   dat2
          %>% mutate( timeGroup = paste( "SITE"
                                       , timeGroup
                                       , sep = "_"
                                       )
                    )
          %>% select( ID, timeGroup, SITE )
          %>% spread( timeGroup, SITE )
          )
dat3b
#> # A tibble: 2 x 3
#>   ID    SITE_B1 SITE_B2
#>   <fct> <fct>   <fct>
#> 1 id_X  A       B
#> 2 id_Y  A       B
dat4 <- (   dat3a
         %>% left_join( dat3b, by = "ID" ) )
dat4
#> # A tibble: 2 x 5
#>   ID    EventDate_B1 EventDate_B2 SITE_B1 SITE_B2
#>   <fct> <date>       <date>       <fct>   <fct>
#> 1 id_X  2016-09-09   2017-09-15   A       B
#> 2 id_Y  2016-09-07   2016-09-15   A       B
#########
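
A note on the pipeline above: top_n( 1, EventDate ) keeps the row with the
largest (latest) EventDate in each ID/timeGroup group, which is why id_X shows
2016-09-09 for B1. If the earliest reading per group is what is wanted, as the
quoted follow-up suggests, one possible variation is sketched below. It is only
a sketch under assumptions: the data are re-entered as plain character columns
rather than factors, slice( 1 ) after arrange() stands in for top_n(), and
pivot_wider() (tidyr 1.0.0 or later, which is newer than this thread) replaces
the two spread() calls and the join.

```r
library(dplyr)
library(tidyr)

# Same values as the dput() data, re-entered as character columns (assumption:
# factors are not needed for this illustration)
dat <- data.frame(
  ID        = c("id_X", "id_X", "id_X", "id_Y", "id_Y"),
  EventDate = c("9/8/16", "9/9/16", "9/15/17", "9/7/16", "9/15/16"),
  timeGroup = c("B1", "B1", "B2", "B1", "B2"),
  SITE      = c("A", "A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

wide <- dat %>%
  mutate(EventDate = as.Date(EventDate, format = "%m/%d/%y")) %>%
  # sort so that the earliest date comes first within each group
  arrange(ID, timeGroup, EventDate) %>%
  group_by(ID, timeGroup) %>%
  # keep only the first (earliest) row of each ID/timeGroup group
  slice(1) %>%
  ungroup() %>%
  # spread both value columns at once; names become EventDate_B1, SITE_B1, ...
  pivot_wider(names_from = timeGroup, values_from = c(EventDate, SITE))

wide
```

With this variation the B1 date for id_X is 2016-09-08 instead of 2016-09-09;
the SITE columns are unchanged (A for B1, B for B2 in both rows).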

On Wed, 2 May 2018, Jim Lemon wrote:

> Hi Marna,
> This is a condition that the function cannot handle. It would be
> possible to reformat the result based on the time intervals, but the
> stretch_df function doesn't try to interpret the values, just
> stretches them out to a wide format.
>
> Jim
>
>
> On Wed, May 2, 2018 at 9:16 AM, Marna Wagley <marna.wagley using gmail.com> wrote:
>> Hi Jim,
>> The data set is correct. I took two readings from "SITE A" within a
>> short time interval, so I want to take the first value when readings are
>> repeated within the same "timeGroup".
>> This is the result I want:
>>
>> FinalData1
>>
>>          B1    B2
>> id_X   "A"   "B"
>> id_Y   "A"   "B"
>>
>> thanks,
>>
>>
>>
>> On Tue, May 1, 2018 at 4:05 PM, Jim Lemon <drjimlemon using gmail.com> wrote:
>>>
>>> Hi Marna,
>>> I think this is due to having three rows for id_X and only two for
>>> id_Y. The function creates a data frame with enough columns to hold
>>> the greatest number of values for each ID variable. Notice that the
>>> SITE_n columns contain three values for id_X (A, A, B) and two for
>>> id_Y (A, B, NA) as there was no third occasion of measurement for the
>>> latter. Even though there are only two _values_ for SITE, there must
>>> be enough space for three. In your desired output, SITE for the second
>>> occasion of measurement is wrong (it should be "A"), and for the third
>>> occasion it is unknown. Even if there was only one value for SITE in
>>> the original data frame, it should be repeated for the correct number
>>> of observations. I think you may be mixing up case ID with location of
>>> observation.
>>>
>>> Jim
>>>
>>>
>>> On Wed, May 2, 2018 at 8:48 AM, Marna Wagley <marna.wagley using gmail.com>
>>> wrote:
>>>> Hi Jim,
>>>> Thank you very much for your suggestions. I used it but it gave me three
>>>> sites. But actually I have only two sites for "id_X" and "id_Y". In fact
>>>> "A" is repeated two times for "id_X". If a value is repeated, I would like
>>>> to take the first one among the repeated values.
>>>>
>>>> dat <- structure(list(
>>>>   ID = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("id_X", "id_Y"),
>>>>     class = "factor"),
>>>>   EventDate = structure(c(4L, 5L, 2L, 3L, 1L), .Label = c("9/15/16",
>>>>     "9/15/17", "9/7/16", "9/8/16", "9/9/16"), class = "factor"),
>>>>   timeGroup = structure(c(1L, 1L, 2L, 1L, 2L), .Label = c("B1", "B2"),
>>>>     class = "factor"),
>>>>   SITE = structure(c(1L, 1L, 2L, 1L, 2L), .Label = c("A", "B"),
>>>>     class = "factor")),
>>>>   .Names = c("ID", "EventDate", "timeGroup", "SITE"),
>>>>   class = "data.frame", row.names = c(NA, -5L))
>>>>
>>>> library(prettyR)
>>>>
>>>> stretch_df(dat,idvar="ID",to.stretch=c("EventDate","SITE"))
>>>>
>>>>
>>>>     ID timeGroup EventDate_1 EventDate_2 EventDate_3 SITE_1 SITE_2 SITE_3
>>>> 1 id_X        B1      9/8/16      9/9/16     9/15/17      A      A      B
>>>> 2 id_Y        B1      9/7/16     9/15/16        <NA>      A      B   <NA>
>>>>
>>>> Basically I am looking for a table like the following:
>>>>
>>>>     ID timeGroup EventDate_1 EventDate_2 EventDate_3 SITE_1 SITE_2
>>>> 1 id_X        B1      9/8/16      9/9/16     9/15/17      A      B
>>>> 2 id_Y        B1      9/7/16     9/15/16        <NA>      A      B
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Tue, May 1, 2018 at 3:32 PM, Jim Lemon <drjimlemon using gmail.com> wrote:
>>>>>
>>>>> Hi Marna,
>>>>> Try this:
>>>>>
>>>>> library(prettyR)
>>>>> stretch_df(dat,idvar="ID",to.stretch=c("EventDate","SITE"))
>>>>>
>>>>> Jim
>>>>>
>>>>>
>>>>> On Wed, May 2, 2018 at 8:24 AM, Marna Wagley <marna.wagley using gmail.com>
>>>>> wrote:
>>>>>> Hi R users,
>>>>>> I am trying to convert a long matrix to a wide one. I have an example and
>>>>>> would like to get the following table (FinalData1):
>>>>>>
>>>>>>
>>>>>> FinalData1
>>>>>>          B1    B2
>>>>>> id_X   "A"   "B"
>>>>>> id_Y   "A"   "B"
>>>>>>
>>>>>> but I got the following table instead:
>>>>>>
>>>>>> FinalData1
>>>>>>      B1  B2
>>>>>> id_X "A" "A"
>>>>>> id_Y "A" "B"
>>>>>>
>>>>>> The code and the example data I used are given below. Are there any
>>>>>> suggestions to fix the problem?
>>>>>>
>>>>>>
>>>>>> dat <- structure(list(
>>>>>>   ID = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("id_X", "id_Y"),
>>>>>>     class = "factor"),
>>>>>>   EventDate = structure(c(4L, 5L, 2L, 3L, 1L), .Label = c("9/15/16",
>>>>>>     "9/15/17", "9/7/16", "9/8/16", "9/9/16"), class = "factor"),
>>>>>>   timeGroup = structure(c(1L, 1L, 2L, 1L, 2L), .Label = c("B1", "B2"),
>>>>>>     class = "factor"),
>>>>>>   SITE = structure(c(1L, 1L, 2L, 1L, 2L), .Label = c("A", "B"),
>>>>>>     class = "factor")),
>>>>>>   .Names = c("ID", "EventDate", "timeGroup", "SITE"),
>>>>>>   class = "data.frame", row.names = c(NA, -5L))
>>>>>>
>>>>>> tmp <- split(dat, dat$ID)
>>>>>> tmp1 <- do.call(rbind, lapply(tmp, function(dat) {
>>>>>>   tb <- table(dat$timeGroup)
>>>>>>   idx <- which(tb > 0)
>>>>>>   tb1 <- replace(tb, idx, as.character(dat$SITE))
>>>>>> }))
>>>>>>
>>>>>> tmp1
>>>>>> FinalData <- print(tmp1, quote = FALSE)
>>>>>>
>>>>>>         [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>
>>
>

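Regarding the original base R attempt quoted above: the unexpected "A" "A" row
for id_X appears to come from replace(). id_X has three rows but only two
non-empty timeGroup cells, so the first two SITE values ("A", "A") end up in B1
and B2 and the third value is dropped. A minimal base R sketch that instead
takes the first SITE value within each ID/timeGroup cell is given below. It
assumes the rows are already in the desired order within each group, and it
re-enters the data as character columns for brevity.

```r
# Same values as the dput() data, as character columns (assumption: factors
# are not needed for this illustration)
dat <- data.frame(
  ID        = c("id_X", "id_X", "id_X", "id_Y", "id_Y"),
  EventDate = c("9/8/16", "9/9/16", "9/15/17", "9/7/16", "9/15/16"),
  timeGroup = c("B1", "B1", "B2", "B1", "B2"),
  SITE      = c("A", "A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# One cell per ID/timeGroup combination; keep the first SITE value in each cell
first_site <- tapply(dat$SITE, list(dat$ID, dat$timeGroup), function(x) x[1])
first_site
```

The result is a character matrix with rows id_X and id_Y and columns B1 and B2,
holding "A" in column B1 and "B" in column B2 for both rows.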
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil using dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k



