[R] problem for strsplit function

Bert Gunter
Fri Jul 9 23:36:19 CEST 2021


"1.  a column, when extracted from a data frame, *is* a vector."
Strictly speaking, this is not always true; it depends on exactly what
is meant by "extracted". For example:

> d <- data.frame(col1 = 1:3, col2 = letters[1:3])
> v1 <- d[,2] ## a vector
> v2 <- d[[2]] ## the same vector, i.e.:
> identical(v1,v2)
[1] TRUE
> v3 <- d[2] ## a data.frame
> v1
[1] "a" "b" "c"  ## a character vector
> v3
  col2
1    a
2    b
3    c
> is.vector(v1)
[1] TRUE
> is.vector(v3)
[1] FALSE
> class(v3)  ## data.frame
[1] "data.frame"
## but
> is.list(v3)
[1] TRUE

which is simply explained in ?data.frame (where else?!) by:
"A data frame is a **list** [emphasis added] of variables of the same
number of rows with unique row names, given class "data.frame". If no
variables are included, the row names determine the number of rows."
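To make the "list" point concrete, here is a minimal base R sketch (my own illustration, not taken from the help page). It assumes the same small data frame as above; stringsAsFactors = FALSE is spelled out only so that col2 is a character vector on older R versions too.

```r
d <- data.frame(col1 = 1:3, col2 = letters[1:3], stringsAsFactors = FALSE)

# A data frame is a list whose elements are its columns.
is.list(d)   # TRUE
length(d)    # 2: the number of columns, not the number of rows
names(d)     # "col1" "col2"

# List-style extraction: `[[` and `$` return the column itself ...
identical(d[["col2"]], d$col2)   # TRUE

# ... while `[` with a single index returns a one-column data frame.
class(d["col2"])   # "data.frame"

# Matrix-style indexing drops to a vector by default;
# drop = FALSE keeps the result a data frame.
is.vector(d[, "col2"])                     # TRUE
is.data.frame(d[, "col2", drop = FALSE])   # TRUE

# Because d is a list, list tools such as sapply() iterate over its columns.
col_classes <- sapply(d, class)
```

So whether one ends up with a vector or a data frame depends on which extractor is used (and, for d[, j], on the drop argument), not on the data itself.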

"2.  maybe your question is "is a given function for a vector, or for a
    data frame/matrix/array?".  if so, i think the only way is reading
    the help information (?foo)."

Indeed! Is this not what the Help system is for?! But note also that
the S3 class system may somewhat blur the issue: foo() may be a generic
function that dispatches to different methods, and so works
appropriately and differently, for different (S3) classes of objects. A
detailed explanation of this behavior can be found in most R
tutorials and books or (more tersely) via ?UseMethod and ?methods .
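A small sketch of how S3 dispatch lets one function name serve both vectors and data frames. The generic describe() and its two methods are hypothetical names made up for this illustration; they are not part of base R or of any package I know of.

```r
# Hypothetical generic: describe() is invented for this example.
describe <- function(x, ...) UseMethod("describe")

# Fallback method, used when no class-specific method exists.
describe.default <- function(x, ...) {
  sprintf("a %s vector of length %d", typeof(x), length(x))
}

# Method chosen automatically for objects of class "data.frame".
describe.data.frame <- function(x, ...) {
  sprintf("a data frame with %d rows and %d columns", nrow(x), ncol(x))
}

d <- data.frame(col1 = 1:3, col2 = letters[1:3], stringsAsFactors = FALSE)

describe(d)        # dispatches to describe.data.frame
describe(d[[2]])   # no character method, so describe.default is used
```

For real functions, methods(foo) lists the methods defined for a generic foo, and methods(class = "data.frame") lists the generics that have a data.frame method, which is often a quick way to answer the "vector or data frame?" question.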

"you might find reading ?"[" and  ?"[.data.frame" useful"

Not just "useful" -- **essential** if one wants to work in R, unless
one gets this information via any of the numerous online tutorials,
courses, or books that are available. The Help system is accurate and
authoritative, but terse. I happen to like this mode of documentation,
but others may prefer more extended expositions. I stand by this claim
even if one chooses to use the "Tidyverse", the data.table package, or
other alternative frameworks for handling data. Again, others may
disagree, but R is structured around these basics, and imo one remains
ignorant of them at one's peril.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Jul 9, 2021 at 11:57 AM Greg Minshall <minshall using umich.edu> wrote:
>
> Kai,
>
> > one more question, how can I know if the function is for column
> > manipulations or for vector?
>
> i still stumble around R code.  but, i'd say the following (and look
> forward to being corrected! :):
>
> 1.  a column, when extracted from a data frame, *is* a vector.
>
> 2.  maybe your question is "is a given function for a vector, or for a
>     data frame/matrix/array?".  if so, i think the only way is reading
>     the help information (?foo).
>
> 3.  sometimes, extracting the column as a vector from a data frame-like
>     object might be non-intuitive.  you might find reading ?"[" and
>     ?"[.data.frame" useful (as well as ?"[.data.table" if you use that
>     package).  also, the str() command can be helpful in understanding
>     what is happening.  (the lobstr:: package's sxp() function, as well
>     as more verbose .Internal(inspect()) can also give you insight.)
>
>     with the data.table:: package, for example, if "DT" is a data.table
>     object, with "x2" as a column, adding or leaving off quotation marks
>     for the column name can make all the difference between ending up
>     with a vector, or with a (much reduced) data table:
> ----
> > is.vector(DT[, x2])
> [1] TRUE
> > str(DT[, x2])
>  num [1:9] 32 32 32 32 32 32 32 32 32
> >
> > is.vector(DT[, "x2"])
> [1] FALSE
> > str(DT[, "x2"])
> Classes ‘data.table’ and 'data.frame':  9 obs. of  1 variable:
>  $ x2: num  32 32 32 32 32 32 32 32 32
>  - attr(*, ".internal.selfref")=<externalptr>
> ----
>
>     a second level of indexing may or may not help, mostly depending on
>     the use of '[' versus '[['.  this can sometimes cause confusion
>     when you are learning the language.
> ----
> > str(DT[, "x2"][1])
> Classes ‘data.table’ and 'data.frame':  1 obs. of  1 variable:
>  $ x2: num 32
>  - attr(*, ".internal.selfref")=<externalptr>
> > str(DT[, "x2"][[1]])
>  num [1:9] 32 32 32 32 32 32 32 32 32
> ----
>
>     the tibble:: package (used in, e.g., the dplyr:: package) also
>     (always?) returns a single column as a non-vector.  again, a
>     second indexing with double '[[]]' can produce a vector.
> ----
> > DP <- tibble(DT)
> > is.vector(DP[, "x2"])
> [1] FALSE
> > is.vector(DP[, "x2"][[1]])
> [1] TRUE
> ----
>
>     but, note that a list of lists is also a vector:
> > is.vector(list(list(1), list(1,2,3)))
> [1] TRUE
> > str(list(list(1), list(1,2,3)))
> List of 2
>  $ :List of 1
>   ..$ : num 1
>  $ :List of 3
>   ..$ : num 1
>   ..$ : num 2
>   ..$ : num 3
>
>     etc.
>
> hth.  good luck learning!
>
> cheers, Greg


