[R] Restructuring Star Wars data from rwars package

Ulrik Stervbo ulrik.stervbo at gmail.com
Fri Aug 4 09:07:28 CEST 2017


Hi Matt,

the usual way would be to use do.call():

.lst <- list(x = list(a = 1, b = 2), y = list(a = 5, b = 8))
do.call(rbind, lapply(.lst, data.frame, stringsAsFactors = FALSE))

However, your list has vectors of unequal lengths, which makes the above fail.
You somehow need to get everything to the same length. The dplyr data
set has nested columns, but I believe a more transparent way is simply to
concatenate the elements of each vector longer than 1.

library("rwars")
library("tidyverse")

people <- get_all_people(parse_result = TRUE)
people <- get_all_people(getElement(people, "next"), parse_result = TRUE)

list_to_df_collapse <- function(.list){
  .list %>%
    lapply(paste, collapse = "|") %>%
    bind_rows()
}

people$results %>%
  lapply(list_to_df_collapse) %>%
  bind_rows()
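
To see the collapsing idea without calling the API, here is a minimal base R
sketch on made-up toy records (the `records` list and the `collapse_record`
helper are my own illustration, not part of rwars). It assumes every record has
the same fields; unlike bind_rows(), rbind() will not fill in missing columns.

```r
# Toy records standing in for people$results (made-up data, not from the API)
records <- list(
  list(name = "Luke", films = c("IV", "V", "VI"),
       starships = c("X-wing", "Shuttle")),
  list(name = "Leia", films = c("IV", "V"),
       starships = character(0))
)

# Converting a record directly fails because the fields have lengths 1, 3 and 2
direct_fails <- inherits(try(data.frame(records[[1]]), silent = TRUE),
                         "try-error")

# Collapse every field to a single string so each record becomes one row
collapse_record <- function(rec) {
  as.data.frame(lapply(rec, paste, collapse = "|"), stringsAsFactors = FALSE)
}

collapsed <- do.call(rbind, lapply(records, collapse_record))
collapsed
```

Note that an empty field collapses to an empty string, so you may want to turn
those into NA afterwards.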

This does not re-create the dplyr data set, though. To do that you need to
nest the variables that are longer than 1. It turns out that some variables
are not found in all members of results, and some variables have a length of
1 in one entry but more than one in another. This means we are probably
better off knowing up front which columns must be nested.

# Find the variables that must be nested
vars_to_nest <- people$results %>%
  # Get the length of each variable at each entry
  map_df(function(.list){
    .names <- names(.list)
    .lengths <- sapply(.list, length)
    data.frame(col = .names, len = .lengths, stringsAsFactors = FALSE)
  }) %>%
  # Get those that have a length of 2 or more in any entry
  filter(len > 1) %>%
  distinct(col) %>% flatten_chr()
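
The same check can be sketched in base R on toy data, which may make it easier
to see what is going on. The `records` list below is made up for illustration,
and `fields_to_nest` is just my name for the result; a field missing from a
record simply counts as length 0.

```r
# Toy records: "species" is missing from the second record
records <- list(
  list(name = "Luke", films = c("IV", "V"), species = "Human"),
  list(name = "Leia", films = "IV"),
  list(name = "R2-D2", films = "IV", species = "Droid")
)

# All field names that occur in at least one record
all_fields <- unique(unlist(lapply(records, names)))

# Longest length each field reaches across the records
# (rec[[field]] is NULL, i.e. length 0, when the field is absent)
max_len <- sapply(all_fields, function(field) {
  max(sapply(records, function(rec) length(rec[[field]])))
})

# Fields that are longer than 1 in any record must be nested
fields_to_nest <- names(max_len)[max_len > 1]
fields_to_nest
```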

list_to_df_nest <- function(.list, .vars_to_nest){
  # Create a list of data.frames
  tmp_lst <-  .list %>%
      map2(names(.), function(.value, .id){
        data_frame(.value) %>%
          set_names(.id)})

  # Nest those that must be nested
  nested_vars <- tmp_lst[.vars_to_nest] %>%
    # We might have selected something that does not exist, so we better
    # clear it away
    compact() %>%
      # Do the nesting
      map2(names(.), function(.df, .id){
          nest(.df, one_of(.id)) %>%
            set_names(.id)
      })

  # Overwrite the list elements with the nested data.frames
  tmp_lst[names(nested_vars)] <- nested_vars
  tmp_lst %>% bind_cols()
}

people$results %>%
  lapply(list_to_df_nest, .vars_to_nest = vars_to_nest) %>%
  bind_rows()

The first solution is considerably faster than my second, though everything
might be done in a more clever way...
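
For comparison, here is a rough base R sketch of the nested (list-column)
layout, again on made-up toy records. It is not a drop-in replacement for the
code above: `records` and `fields_to_nest` are assumptions for the example, and
the non-nested fields are assumed to be single values that can be stored as
character.

```r
# Toy records standing in for people$results (made-up data)
records <- list(
  list(name = "Luke", films = c("IV", "V"), species = "Human"),
  list(name = "Leia", films = "IV"),
  list(name = "R2-D2", films = "IV", species = "Droid")
)
fields_to_nest <- "films"

all_fields <- unique(unlist(lapply(records, names)))
scalar_fields <- setdiff(all_fields, fields_to_nest)

# One character column per scalar field; NA where a record lacks the field
df <- as.data.frame(
  lapply(setNames(scalar_fields, scalar_fields), function(field) {
    vapply(records, function(rec) {
      value <- rec[[field]]
      if (is.null(value)) NA_character_ else as.character(value)
    }, character(1))
  }),
  stringsAsFactors = FALSE
)

# One list column per nested field; each cell keeps the full vector
for (field in fields_to_nest) {
  df[[field]] <- lapply(records, function(rec) rec[[field]])
}

str(df)
```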

HTH
Ulrik

On Fri, 4 Aug 2017 at 05:57 Matt Van Scoyoc <scoyoc at gmail.com> wrote:

> I'm having trouble restructuring data from the rwars package into a
> dataframe. Can someone help me?
>
> Here's what I have...
>
> library("rwars")
> library("tidyverse")
>
> # These data are json, so they load into R as a list
> people <- get_all_people(parse_result = T)
> people <- get_all_people(getElement(people, "next"), parse_result = T)
>
> # Look at Anakin Skywalker's data
> people$results[[1]]
> people$results[[1]][1] # print his name
>
> # To use them in R, I need to restructure them to a dataframe like they are
> # in dplyr
> data("starwars")
> glimpse(starwars)
>
> Thanks for the help.
>
> Cheers,
> MVS
> =====
> Matthew Van Scoyoc
> =====
> Think SNOW!
