[R] Data.frames can not hold objects...What can be done in the following scenario?

Rui Barradas ruipbarradas at sapo.pt
Tue Jun 12 17:15:51 CEST 2012


Hello,

You're right, putting lists or vectors as elements of data frames is not
the best practice.

Note, however, that the opposite is not true: it's common and good
practice to have data frames and other objects as list elements,
especially if they are in some way related. If, for instance, we have
several files, each with the same structure but with measurements
taken at different sites or dates, it's common to read them into a
list of data frames.
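
As a rough sketch of that idea: the file names, the temporary directory
and the columns 'day' and 'temp' below are made up for illustration and
are not from the original question. The example first writes two small
CSV files so that it is self-contained, then reads them back into one
list of data frames.

```r
# Hypothetical setup: write two small CSV files with the same structure
# to a temporary directory so the example can run on its own.
tmpdir <- tempfile("sites")
dir.create(tmpdir)
write.csv(data.frame(day = 1:3, temp = c(10.1, 11.3, 9.8)),
          file.path(tmpdir, "siteA.csv"), row.names = FALSE)
write.csv(data.frame(day = 1:3, temp = c(14.2, 15.0, 13.7)),
          file.path(tmpdir, "siteB.csv"), row.names = FALSE)

# Read every file into one list of data frames, named after the files.
files <- list.files(tmpdir, pattern = "\\.csv$", full.names = TRUE)
sites <- lapply(files, read.csv)
names(sites) <- sub("\\.csv$", "", basename(files))

# The same operation can then be applied to each data frame in turn.
sapply(sites, function(d) mean(d$temp))
```

Each site stays a separate, ordinary data frame, and lapply/sapply do
the looping over them.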

In your case, maybe it would be better to create a list, not another 
column of the data.frame.

testlist <- lapply(...etc...)

This way the extra information would be kept in a more flexible
structure while still sharing the index with the data frame's rows.
Anyway, all general rules have exceptions, and I don't dislike the
original approach. It does make sense.
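
A minimal sketch of the separate-list idea, reusing testfun and
testframe from earlier in the thread. Note that it uses Map() as one
possible way to walk the two columns in parallel; that is my choice for
the illustration, not necessarily what the lapply() call above would
look like. The extra column 'n' is also just an example.

```r
# Same function and data frame as earlier in the thread.
testfun <- function(x, y) seq(x, y, 1)
testframe <- data.frame(xvalues = c(2, 3), yvalues = c(4, 5))

# One list element per row of testframe; Map() walks the two columns in
# parallel, so testlist[[i]] belongs to row i.
testlist <- Map(testfun, testframe$xvalues, testframe$yvalues)

testlist[[1]]      # the vector computed from row 1
testlist[[2]][2]   # 2nd element of the vector computed from row 2

# Results of further per-row computations can go back into the data
# frame when they are plain scalars (here: the length of each vector).
testframe$n <- lengths(testlist)
```

The data frame keeps only atomic columns, and the vectors live in a
list with one element per row.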

Rui Barradas

On 11-06-2012 23:55, Onur Uncu wrote:
> Thank you Rui! You have been very helpful to me.
>
> I was told in another R forum today that it is bad programming
> practice to put lists/vectors as elements into data.frames. I am very
> new to R programming and I am trying to figure out elegant ways of
> writing code. This is why I asked the below questions... Thank you for
> your help.
>
>
>
> On Mon, Jun 11, 2012 at 11:35 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>> Hello,
>>
>> There are also other possibilities. What I believe is the easiest is to go
>> back to the beginning, i.e., have the function return a vector as before,
>> and then use lapply on the data.frame's rows.
>>
>> testfun <- function (x, y) seq(x, y, 1)
>>
>>
>> testframe$newcolumn <- lapply(seq_len(nrow(testframe)), function(i)
>>     testfun(testframe[i, 1], testframe[i, 2]))
>> class(testframe$newcolumn)  # [1] "list"
>>
>> testframe$newcolumn[[1]]    # a vector, no longer a list
>> testframe$newcolumn[[1]][2]  # 2nd element of that vector
>>
>>
>> The main point is that data.frames are lists of a special kind: they
>> implement the statistical concept of variables and their observations, the
>> columns and the rows. And as with any list, their elements can be any R
>> object, including lists.
>>
>> Rui Barradas
>>
>> On 11-06-2012 23:02, R. Michael Weylandt wrote:
>>>
>>> It is possible to chain together uses of `[[` -- e.g.,
>>>
>>> x <- list(1:5, list(letters[1:5], list(LETTERS[1:5])))
>>>
>>> x[[c(1,2)]] # 2L
>>>
>>> x[[c(2,1,3)]] # "c"
>>>
>>> x[[c(2,2,1,3)]] # "C"
>>>
>>> which is sometimes useful.
>>>
>>> Best,
>>> Michael
>>>
>>> On Mon, Jun 11, 2012 at 4:35 PM, Onur Uncu <onuruncu at gmail.com> wrote:
>>>>
>>>> Rui and the R-help team,
>>>>
>>>> In Rui's helpful answer below, the function returns a list as output.
>>>> When we apply() the function to the data.frame, dataframe$newcolumn
>>>> has 2 layers of list before we can access each vector's elements. For
>>>> instance, dataframe$newcolumn[[1]][[1]] is a vector whereas
>>>> dataframe$newcolumn and dataframe$newcolumn[[1]] are lists. Is there a
>>>> solution that involves fewer layers of lists? I am just trying to
>>>> understand the R language better.
>>>>
>>>> Thank you.
>>>>
>>>>
>>>> On Sun, Jun 10, 2012 at 3:18 PM, Rui Barradas <ruipbarradas at sapo.pt>
>>>> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> What you need is to have your function return a list, not a vector,
>>>>> like this:
>>>>>
>>>>> testfun <- function (x, y) list(seq(x, y, 1))
>>>>>
>>>>> testframe<-data.frame(xvalues=c(2,3),yvalues=c(4,5))
>>>>>
>>>>> testframe$newcolumn <- apply(testframe, 1, function(x) testfun(x[1],
>>>>> x[2]))
>>>>> class(testframe$newcolumn)  # [1] "list"
>>>>>
>>>>> Then you access lists and list elements.
>>>>>
>>>>> testframe$newcolumn[[1]]  # a list with just one element
>>>>> testframe$newcolumn[[1]][[1]]  # that element, a vector
>>>>> testframe$newcolumn[[1]][[1]][2]  # the vector's 2nd element
>>>>>
>>>>>
>>>>> Since you want the function to return vectors in order to do further
>>>>> computations, you'll access those vectors by varying the list index,
>>>>>
>>>>>
>>>>> testframe$newcolumn[[1]][[1]]  # first list, its only vector
>>>>> testframe$newcolumn[[2]][[1]]  # second list, its only vector
>>>>>
>>>>>
>>>>> Etc.
>>>>>
>>>>> Hope this helps,
>>>>>
>>>>> Rui Barradas
>>>>>
>>>>> On 10-06-2012 12:29, Onur Uncu wrote:
>>>>>>
>>>>>>
>>>>>> Thank you Duncan. A follow-up question is, how can I achieve the
>>>>>> desired result in the earlier email? (i.e. Add the resulting vectors
>>>>>> as a new column to the existing data.frame?)   I tried the following:
>>>>>>
>>>>>> testframe$newcolumn<-apply(testframe,1,function(x)testfun(x[1],x[2]))
>>>>>>
>>>>>> but I am getting the following error:
>>>>>>
>>>>>> Error in `$<-.data.frame`(`*tmp*`, "vecss", value = c(2, 3, 4, 3, 4, 5
>>>>>> : replacement has 3 rows, data has 2
>>>>>>
>>>>>> Thanks for the help.
>>>>>>
>>>>>>
>>>>>> On Sun, Jun 10, 2012 at 12:02 PM, Duncan Murdoch
>>>>>> <murdoch.duncan at gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 12-06-10 6:41 AM, Onur Uncu wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> R-Help community,
>>>>>>>>
>>>>>>>> I understand that data.frames can hold elements of type double,
>>>>>>>> string etc. but NOT objects (such as a matrix etc.).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> That is incorrect.  Dataframes can hold list vectors.  For example:
>>>>>>>
>>>>>>> A <- data.frame(x = 1:3)
>>>>>>> A$y <- list(matrix(1, 2,2), matrix(2, 3,3), matrix(3,4,4))
>>>>>>>
>>>>>>> A[[1,2]] will now extract the 2x2 matrix, A[[2,2]] will extract the
>>>>>>> 3x3, etc. (A[1,2] returns the matrix wrapped in a one-element list.)
>>>>>>>
>>>>>>> Duncan Murdoch
>>>>>>>
>>>>>>>> This is not convenient for me in the following situation. I have a
>>>>>>>> function that takes 2 inputs and returns a vector:
>>>>>>>>
>>>>>>>> testfun<- function (x,y) seq(x,y,1)
>>>>>>>>
>>>>>>>> I have a data.frame defined as follows:
>>>>>>>>
>>>>>>>> testframe<-data.frame(xvalues=c(2,3),yvalues=c(4,5))
>>>>>>>>
>>>>>>>> I would like to apply testfun to every row of testframe and then
>>>>>>>> create a new column in the data.frame which holds the returned
>>>>>>>> vectors as objects. Why do I want this? Because the returned vectors
>>>>>>>> are an intermediate step towards further calculations. It would be
>>>>>>>> great to keep adding new columns to the data.frame with the
>>>>>>>> intermediate objects. But this is not possible since data.frames
>>>>>>>> cannot hold objects as elements. What do you suggest as an elegant
>>>>>>>> solution in this scenario? Thank you for any help!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I would love to hear if forum
>>>>>>>>
>>>>>>>> ______________________________________________
>>>>>>>> R-help at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>> PLEASE do read the posting guide
>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>
>>


