[R] How to extract same columns from identical dataframes in a list?

Bert Gunter bgunter.4567 at gmail.com
Wed Feb 10 16:27:29 CET 2016


Google! (e.g. on "R Language tutorials")

Some specific recommendations can be found here:

https://www.rstudio.com/resources/training/online-learning/#R


Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Wed, Feb 10, 2016 at 1:04 AM, Wolfgang Waser
<waser at frankenfoerder-fg.de> wrote:
> Hi,
>
> sapply(l,"[",T,2)
>
> and
>
> sapply(l, function(e) e[, 2])
>
>
> work fine!
>
>
> Thanks a lot!
>
> Why is the second version "brute force and ignorance"? Is one of the
> versions to be preferred? If so, which and why (very briefly, please)?
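
A minimal sketch of how the two calls compare, using made-up data in place of the original list (the names l_mat, l_df, a, b and d are hypothetical, and the seed is arbitrary). Extra arguments given to sapply() are passed on to the function, so "[" with TRUE, 2 and the anonymous function should select the same column:

```r
set.seed(1)  # arbitrary seed, only so the toy data is reproducible

# Hypothetical stand-in for the poster's list: 7 matrices, 24 rows x 5 columns
l_mat <- replicate(7,
                   matrix(rnorm(24 * 5), nrow = 24,
                          dimnames = list(NULL, paste0("week", 1:5))),
                   simplify = FALSE)
l_df <- lapply(l_mat, as.data.frame)  # the same values stored as data frames

a <- sapply(l_mat, "[", TRUE, 2)        # TRUE and 2 are handed on to `[`
b <- sapply(l_mat, function(e) e[, 2])  # explicit anonymous function
d <- sapply(l_df,  function(e) e[, 2])  # same call on the data frame version

dim(a)               # 24 rows (hours) by 7 columns (list elements)
identical(a, b)      # both forms select the same values
identical(b, d)      # the anonymous function also works on data frames
hourly <- rowMeans(a)  # row-wise means, like apply(a, 1, mean)
```

Writing TRUE rather than T is a little safer, since T is an ordinary variable that can be reassigned. Beyond that, the choice between the two forms looks like a matter of taste; the anonymous function simply spells out the indexing.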
>
>
> Results of the other options mentioned:
>
>> sapply(l,"[[",2)
>
> results in a single vector of length 7
>
>
>> sapply(l,"[",,2)
> Error in lapply(X = X, FUN = FUN, ...) :
> argument is missing, with no default
>
> These versions probably don't work due to the "data frames" in the list
> actually being matrices.
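
A small sketch of the difference in indexing semantics, assuming toy objects (m, df, res_m and res_df are hypothetical names, not the original data). A matrix is an atomic vector with dimensions, so a single index counts elements in column-major order, while a data frame is a list of columns, so a single index picks columns:

```r
m  <- cbind(w1 = 1:4, w2 = 5:8)  # a 4 x 2 integer matrix
df <- as.data.frame(m)           # the same values as a data frame

m[[2]]    # 2: the second element, counting down the first column
df[[2]]   # 5 6 7 8: the whole second column
m[1:4]    # 1 2 3 4: linear indexing happens to return the first column
df[1:2]   # a data frame holding columns 1 and 2

# With a list of matrices, "[[" therefore yields one value per list element
res_m  <- sapply(list(m, m, m), "[[", 2)
# With a list of data frames, it yields one column per list element
res_df <- sapply(list(df, df, df), "[[", 2)
```

This would be consistent with the length-7 vector reported for a list of 7 matrices, and with 1:24 returning the first column of a 24-row matrix.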
>
>
> I'm not enough of a programmer to always make complete sense of the R
> help pages. Should I have found this information in the sapply - R help
> page?
> Where else could I check before pestering the R mailing list (which, of
> course, provides quick and valuable answers)?
>
>
> Cheers,
>
> Wolfgang
>
>
>
>
> On 09/02/16 16:19, peter dalgaard wrote:
>> Like this?
>>
>>> l <- replicate(3,data.frame(w1=sample(1:4),w2=sample(1:4)), simplify=FALSE)
>>> l
>> [[1]]
>>   w1 w2
>> 1  2  2
>> 2  3  3
>> 3  1  1
>> 4  4  4
>>
>> [[2]]
>>   w1 w2
>> 1  3  4
>> 2  2  2
>> 3  1  3
>> 4  4  1
>>
>> [[3]]
>>   w1 w2
>> 1  1  4
>> 2  4  3
>> 3  2  1
>> 4  3  2
>>
>>> sapply(l,"[[",2)
>>      [,1] [,2] [,3]
>> [1,]    2    4    4
>> [2,]    3    2    3
>> [3,]    1    3    1
>> [4,]    4    1    2
>>
>> Or even
>>
>>> sapply(l,"[",,2)
>>      [,1] [,2] [,3]
>> [1,]    2    4    4
>> [2,]    3    2    3
>> [3,]    1    3    1
>> [4,]    4    1    2
>>
>>
>> Notice that if dd[1:24] gives you the 1st column, then dd is not a data frame but rather a matrix, and indexing semantics are different. In that case, for some unspeakable reason, the empty index does not work and you'll need something like
>>
>>> l <- replicate(3,cbind(w1=sample(1:4),w2=sample(1:4)), simplify=FALSE)
>>> sapply(l,"[",T,2)
>>      [,1] [,2] [,3]
>> [1,]    4    3    2
>> [2,]    1    1    4
>> [3,]    3    2    3
>> [4,]    2    4    1
>>
>> Or, brute-force-and-ignorance:
>>
>>> sapply(l, function(e) e[, 2])
>>      [,1] [,2] [,3]
>> [1,]    4    3    2
>> [2,]    1    1    4
>> [3,]    3    2    3
>> [4,]    2    4    1
>>
>>
>>
>>
>>
>> On 09 Feb 2016, at 10:03 , Wolfgang Waser <waser at frankenfoerder-fg.de> wrote:
>>
>>> Hi,
>>>
>>> sorry if my description was too short / unclear.
>>>
>>>> I have a list of 7 data frames, each data frame having 24 rows (hour of
>>>> the day) and 5 columns (weeks) with a total of 5 x 24 values
>>>
>>> [1]
>>>      week1   week2   week3   ...
>>> 1    x       a       m       ...
>>> 2    y       b       n
>>> 3    z       c       o
>>> .    .       .       .
>>> .    .       .       .
>>> .    .       .       .
>>> 24   .       .       .
>>>
>>>
>>> [2]
>>>      week1   week2   week3   ...
>>> 1    x2      a2      m2      ...
>>> 2    y2      b2      n2
>>> 3    z2      c2      o2
>>> .    .       .       .
>>> .    .       .       .
>>> .    .       .       .
>>> 24   .       .       .
>>>
>>>
>>> [3]
>>> ...
>>>
>>> .
>>> .
>>> .
>>>
>>>
>>> [7]
>>> ...
>>>
>>>
>>>
>>> I now would like to extract e.g. all week2 columns of all data frames in
>>> the list and combine them in a new data frame using cbind.
>>>
>>> new data frame
>>>
>>> week2 ([1])  week2 ([2])     week2 ([3])     ...
>>> a            a2              .
>>> b            b2              .
>>> c            c2              .
>>> .
>>> .
>>> .
>>>
>>> I will then do further row-wise calculations using e.g. apply(x,1,mean),
>>> the result being a vector of 24 values.
>>>
>>>
>>> I have not found a way to extract specific columns of the data frames in
>>> a list.
>>>
>>>
>>> As mentioned I can use
>>>
>>> sapply(list_of_dataframes,"[",1:24)
>>>
>>> which will pick the first 24 values (first column) of each data frame in
>>> the list and arrange them as an array of 24 rows and 7 columns (7 data
>>> frames are in the list).
>>> To pick the second column (week2) using sapply I have to use the next 24
>>> values from 25 to 48:
>>>
>>> sapply(list_of_dataframes,"[",25:48)
>>>
>>>
>>> It seems that sapply treats the data frames in the list as vectors. I
>>> can of course extract all consecutive weeks using consecutive blocks of
>>> 24 values, but this seems cumbersome.
>>>
>>>
>>> The question remains, how to select specific columns from data frames in
>>> a list, e.g. all columns 3 of all data frames in the list.
>>>
>>>
>>> Reformatting (unlist(), dim()) into one data frame with one column for
>>> each week does not help, since I'm not calculating colMeans etc, but
>>> row-wise calculations using apply(x,1,FUN) ("applying a function to
>>> margins of an array or matrix").
>>>
>>>
>>>
>>> Thanks for your help and suggestions!
>>>
>>>
>>> Wolfgang
>>>
>>>
>>>
>>> On 08/02/16 18:00, Dénes Tóth wrote:
>>>> Hi,
>>>>
>>>> Although you did not provide any reproducible example, it seems you
>>>> store the same type of values in your data.frames. If this is true, it
>>>> is much more efficient to store your data in an array:
>>>>
>>>> mylist <- list(a = data.frame(week1 = rnorm(24), week2 = rnorm(24)),
>>>>               b = data.frame(week1 = rnorm(24), week2 = rnorm(24)))
>>>>
>>>> myarray <- unlist(mylist, use.names = FALSE)
>>>> dim(myarray) <- c(nrow(mylist$a), ncol(mylist$a), length(mylist))
>>>> dimnames(myarray) <- list(hour = rownames(mylist$a),
>>>>                          week = colnames(mylist$a),
>>>>                          other = names(mylist))
>>>> # now you can do:
>>>> mean(myarray[, "week1", "a"])
>>>>
>>>> # or:
>>>> colMeans(myarray)
>>>>
>>>>
>>>> Cheers,
>>>>  Denes
>>>>
>>>>
>>>> On 02/08/2016 02:33 PM, Wolfgang Waser wrote:
>>>>> Hello,
>>>>>
>>>>> I have a list of 7 data frames, each data frame having 24 rows (hour of
>>>>> the day) and 5 columns (weeks) with a total of 5 x 24 values
>>>>>
>>>>> I would like to combine all 7 columns of week 1 (and 2 ...) in a
>>>>> separate data frame for hourly calculations, e.g.
>>>>>> apply(new.data.frame,1,mean)
>>>>>
>>>>> In some way sapply (lapply) works, but I cannot directly select columns
>>>>> of the original data frames in the list. As a workaround I have to
>>>>> select a range of values:
>>>>>
>>>>>> sapply(list_of_dataframes,"[",1:24)
>>>>>
>>>>> Values 1:24 give the first column, 25:48 the second and so on.
>>>>>
>>>>> Is there an easier / more direct way to select for specific columns
>>>>> instead of selecting a range of values, avoiding loops?
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Wolfgang
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.