[R] problem for strsplit function

Duncan Murdoch
Sat Jul 10 00:05:41 CEST 2021


On 09/07/2021 5:51 p.m., Jeff Newmiller wrote:
> "Strictly speaking", Greg is correct, Bert.
> 
> https://cran.r-project.org/doc/manuals/r-release/R-lang.html#List-objects
> 
> Lists in R are vectors. What we colloquially refer to as "vectors" are more precisely referred to as "atomic vectors". And without a doubt, this "vector" nature of lists is a key underlying concept that explains why adding a dim attribute creates a matrix that can hold data frames. It is also a stumbling block for programmers from other languages that have things like linked lists.

I would also object to v3 (below) as "extracting" a column from d. 
"d[2]" doesn't extract anything, it "subsets" the data frame, so the 
result is a data frame, not what you get when you extract something from 
a data frame.
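
To make the distinction concrete, here is a minimal sketch. It rebuilds Bert's d (with stringsAsFactors = FALSE spelled out so it behaves the same on older R versions); the names "sub" and "ext" are only for illustration.

```r
d <- data.frame(col1 = 1:3, col2 = letters[1:3], stringsAsFactors = FALSE)

sub <- d[2]      # subsetting: still a data frame, with one column
ext <- d[[2]]    # extracting: the column itself, a character vector

class(sub)       # "data.frame"
class(ext)       # "character"

# Extracting from the subset gets you back to the same column
identical(sub[[1]], ext)
```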

People don't realize that "x <- 1:10; y <- x[[3]]" is perfectly legal. 
That extracts the 3rd element (the number 3).  The problem is that R has 
no way to represent a scalar number, only a vector of numbers, so x[[3]] 
gets promoted to a vector containing that number when it is returned and 
assigned to y.
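
A short sketch of that, as I understand the behaviour: for a plain atomic vector without names, "[[" and "[" with a single index end up giving the same length-1 vector.

```r
x <- 1:10
y <- x[[3]]          # extract the 3rd element

length(y)            # 1: y is a vector of length one, not a scalar type
is.vector(y)         # TRUE
identical(y, x[3])   # TRUE here, since x is atomic and has no names
```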

Lists are vectors of R objects, so if x is a list, x[[3]] is something 
that can be returned, and it is different from x[3].
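
For example (the list contents below are made up purely for illustration):

```r
x <- list(1:2, "b", c(3.5, 4.5))

elem <- x[[3]]   # the third element itself: a numeric vector
part <- x[3]     # a list of length 1 that contains that element

is.list(elem)    # FALSE
is.list(part)    # TRUE
length(part)     # 1

# Extracting from the one-element sublist recovers the element
identical(part[[1]], elem)

# And the list as a whole still counts as a vector
is.vector(x)
```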

Duncan Murdoch

> 
> On July 9, 2021 2:36:19 PM PDT, Bert Gunter <bgunter.4567 using gmail.com> wrote:
>> "1.  a column, when extracted from a data frame, *is* a vector."
>> Strictly speaking, this is false; it depends on exactly what is meant
>> by "extracted." e.g.:
>>
>>> d <- data.frame(col1 = 1:3, col2 = letters[1:3])
>>> v1 <- d[,2] ## a vector
>>> v2 <- d[[2]] ## the same, i.e
>>> identical(v1,v2)
>> [1] TRUE
>>> v3 <- d[2] ## a data.frame
>>> v1
>> [1] "a" "b" "c"  ## a character vector
>>> v3
>>   col2
>> 1    a
>> 2    b
>> 3    c
>>> is.vector(v1)
>> [1] TRUE
>>> is.vector(v3)
>> [1] FALSE
>>> class(v3)  ## data.frame
>> [1] "data.frame"
>> ## but
>>> is.list(v3)
>> [1] TRUE
>>
>> which is simply explained in ?data.frame (where else?!) by:
>> "A data frame is a **list** [emphasis added] of variables of the same
>> number of rows with unique row names, given class "data.frame". If no
>> variables are included, the row names determine the number of rows."
>>
>> "2.  maybe your question is "is a given function for a vector, or for a
>>     data frame/matrix/array?".  if so, i think the only way is reading
>>     the help information (?foo)."
>>
>> Indeed! Is this not what the Help system is for?! But note also that
>> the S3 class system may somewhat blur the issue: foo() may work
>> appropriately and differently for different (S3) classes of objects. A
>> detailed explanation of this behavior can be found in appropriate
>> resources or (more tersely) via ?UseMethod .
>>
>> "you might find reading ?"[" and  ?"[.data.frame" useful"
>>
>> Not just "useful" -- **essential** if you want to work in R, unless
>> one gets this information via any of the numerous online tutorials,
>> courses, or books that are available. The Help system is accurate and
>> authoritative, but terse. I happen to like this mode of documentation,
>> but others may prefer more extended expositions. I stand by this claim
>> even if one chooses to use the "Tidyverse", data.table package, or
>> other alternative frameworks for handling data. Again, others may
>> disagree, but R is structured around these basics, and imo one remains
>> ignorant of them at their peril.
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>> On Fri, Jul 9, 2021 at 11:57 AM Greg Minshall <minshall using umich.edu>
>> wrote:
>>>
>>> Kai,
>>>
>>>> one more question, how can I know if the function is for column
>>>> manipulations or for vector?
>>>
>>> i still stumble around R code.  but, i'd say the following (and look
>>> forward to being corrected! :):
>>>
>>> 1.  a column, when extracted from a data frame, *is* a vector.
>>>
>>> 2.  maybe your question is "is a given function for a vector, or for a
>>>      data frame/matrix/array?".  if so, i think the only way is reading
>>>      the help information (?foo).
>>>
>>> 3.  sometimes, extracting the column as a vector from a data frame-like
>>>      object might be non-intuitive.  you might find reading ?"[" and
>>>      ?"[.data.frame" useful (as well as ?"[.data.table" if you use that
>>>      package).  also, the str() command can be helpful in understanding
>>>      what is happening.  (the lobstr:: package's sxp() function, as well
>>>      as more verbose .Internal(inspect()) can also give you insight.)
>>>
>>>      with the data.table:: package, for example, if "DT" is a data.table
>>>      object, with "x2" as a column, adding or leaving off quotation marks
>>>      for the column name can make all the difference between ending up
>>>      with a vector, or with a (much reduced) data table:
>>> ----
>>>> is.vector(DT[, x2])
>>> [1] TRUE
>>>> str(DT[, x2])
>>>   num [1:9] 32 32 32 32 32 32 32 32 32
>>>>
>>>> is.vector(DT[, "x2"])
>>> [1] FALSE
>>>> str(DT[, "x2"])
>>> Classes ‘data.table’ and 'data.frame':  9 obs. of  1 variable:
>>>   $ x2: num  32 32 32 32 32 32 32 32 32
>>>   - attr(*, ".internal.selfref")=<externalptr>
>>> ----
>>>
>>>      a second level of indexing may or may not help, mostly depending on
>>>      the use of '[' versus '[['.  this can sometimes cause confusion
>>>      when you are learning the language.
>>> ----
>>>> str(DT[, "x2"][1])
>>> Classes ‘data.table’ and 'data.frame':  1 obs. of  1 variable:
>>>   $ x2: num 32
>>>   - attr(*, ".internal.selfref")=<externalptr>
>>>> str(DT[, "x2"][[1]])
>>>   num [1:9] 32 32 32 32 32 32 32 32 32
>>> ----
>>>
>>>      the tibble:: package (used in, e.g., the dplyr:: package) also
>>>      (always?) returns a single column as a non-vector.  again, a
>>>      second indexing with double '[[]]' can produce a vector.
>>> ----
>>>> DP <- tibble(DT)
>>>> is.vector(DP[, "x2"])
>>> [1] FALSE
>>>> is.vector(DP[, "x2"][[1]])
>>> [1] TRUE
>>> ----
>>>
>>>      but, note that a list of lists is also a vector:
>>>> is.vector(list(list(1), list(1,2,3)))
>>> [1] TRUE
>>>> str(list(list(1), list(1,2,3)))
>>> List of 2
>>>   $ :List of 1
>>>    ..$ : num 1
>>>   $ :List of 3
>>>    ..$ : num 1
>>>    ..$ : num 2
>>>    ..$ : num 3
>>>
>>>      etc.
>>>
>>> hth.  good luck learning!
>>>
>>> cheers, Greg
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>


