[R] Question about Rfast colMins and colMaxs

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Wed Dec 1 04:42:35 CET 2021


Stephen,

Although what ships with the STANDARD R distribution can vary in several
ways, in general, if you need to add a line like:

library(something)
or
require(something)

and your code does not work unless you have done that, then you can assume
that package is not built in to R as it starts.

Having said that, plenty of exceptions may exist that cause R to load
things on your machine, for everyone or just for you, without you
noticing.
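
A rough way to check where a function comes from, using only tools that
ship with R. This is just a sketch: colMeans and Rfast are names from this
thread, and the helper is_attached is made up for illustration.

```r
# Which attached package provides a function? find() looks along the search path.
find("colMeans")            # "package:base"

# Packages installed with priority "base", i.e. the ones that come with R itself.
base_pkgs <- rownames(installed.packages(priority = "base"))

# Hypothetical helper: is a package currently attached via library()/require()?
is_attached <- function(pkg) paste0("package:", pkg) %in% search()

is_attached("base")    # TRUE, base is always attached
is_attached("Rfast")   # TRUE only if library(Rfast) has been called
```

If a function only shows up after a library() call, it is an add-on rather
than part of what R loads at startup.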

I think this forum lately has been deluged with questions about all kinds of
add-on packages and in particular, lots of the ones in the tidyverse.
Clearly the purpose here is not that broad.

But since I use some packages like the tidyverse extensively, and I am far
from alone, I wonder if someday the powers that be will realize it is a losing
battle to exclude at least some of it. It would be so nice not having to
include a long list of packages for some programs or somehow arrange that
people using something you shared had installed and loaded them. But there
are too many packages out there of varying quality and usefulness and
popularity with more every day being added. Worse, many are somewhat
incompatible such as having functions with the same names that hide earlier
ones loaded.

Base R does come with functions like colSums and colMeans and similar row
functions. But as mentioned, a data.frame is a list of vectors and R
supports certain functional programming constructs over lists using things
like:

lapply(df, min)
sapply(df, min)

And actually quite a few other ways, depending on what info you want back
and whether you insist it be returned as a list or vector or something
else. You can even supply additional arguments that might be needed, such
as if you want to ignore any NA values:

lapply(df, min, na.rm=TRUE)
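
To make the difference in return types concrete, here is a small sketch on
a made-up data frame (the columns a and b are invented for the example):

```r
# Small made-up data frame with one missing value.
df <- data.frame(a = c(3, 1, NA), b = c(10, 20, 5))

lapply(df, min)                  # a list; element a is NA because of the missing value
sapply(df, min, na.rm = TRUE)    # a named numeric vector: a = 1, b = 5

# vapply works like sapply but also checks that each result is one number.
vapply(df, min, numeric(1), na.rm = TRUE)
```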

The package you looked at is trying to be fast and uses what looks like
compiled external code, but so does lapply.

If this is too bothersome for you, consider making a one-liner function like
this:

mycolMins <- function(df, ...) lapply(df, min, ...)

Once defined, you can use that just fine and not think about it again and I
note this answer (like others) is offering you something in base R that
works fine on data.frames and the like.

You can extend this to many similar ideas, like a version that calculates
the min unless you override it with max or mean or sd or even an odd
choice like `[`, so that a call to:

mycolCalc(df, `[`, 3)

will return exactly the third item in each column, that is, the third row!
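
The message does not show a definition for mycolCalc. One plausible
version, assuming it simply generalizes mycolMins with a function argument
that defaults to min, might look like this (the data frame is made up):

```r
# Assumed definition: apply FUN (min by default) to every column of df.
mycolCalc <- function(df, FUN = min, ...) lapply(df, FUN, ...)

df <- data.frame(i = 1:3, f = c(1.2, 2.3, 3.4))  # made-up data

mycolCalc(df)            # column minimums: i = 1, f = 1.2
mycolCalc(df, max)       # column maximums: i = 3, f = 3.4
mycolCalc(df, `[`, 3)    # third element of each column: i = 3, f = 3.4
```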

I find it to be very common for someone these days to do a quick search for
a way to do something in a language like R and not really look to see if it
is a standard way or something special. Matrices in R are not quite the same
as some other objects like a data.frame or tibble, and a package written to
be used on one may (or may not) happen to work with another. Some packages
are carefully written to try to detect what kind of object they get and,
when possible, convert it to another. The "apply" function is meant for
matrices, but if it sees something else it looks at the dimensionality and
tries to coerce it with as.matrix or as.array first. As others have noted,
this means a data.frame containing non-numeric parts may fail, or should
have those columns hidden/removed, as in this df that has some non-numeric
fields:

> df
  i       s   f     b i2
1 1   hello 1.2  TRUE  5
2 2   there 2.3 FALSE  4
3 3 goodbye 3.4  TRUE  3

So a bit more complex one-liner removes any non-numeric columns like this:

> mycolMins(df[, sapply(df, is.numeric)])
$i
[1] 1

$f
[1] 1.2

$i2
[1] 3

Clearly, converting that whole data.frame to a matrix would result in
everything being converted to character, and a meaningful minimum may be
elusive.
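
A short sketch of that coercion, rebuilding the df shown above by hand.
The base R calls are standard. The Rfast call at the end is guarded and
assumes the colMins(x, value = ...) form seen in the original question, so
check ?colMins for your installed version.

```r
df <- data.frame(i  = 1:3,
                 s  = c("hello", "there", "goodbye"),
                 f  = c(1.2, 2.3, 3.4),
                 b  = c(TRUE, FALSE, TRUE),
                 i2 = c(5, 4, 3))

typeof(as.matrix(df))    # "character": the string column forces everything to text

# Keep only the numeric columns, then convert. drop = FALSE keeps a
# data frame even if only one column survives the filter.
m <- as.matrix(df[, sapply(df, is.numeric), drop = FALSE])
typeof(m)                # "double"

apply(m, 2, min)         # base R column minimums: i = 1, f = 1.2, i2 = 3

# Runs only if Rfast happens to be installed. value = TRUE asks for the
# minimums themselves rather than their positions.
if (requireNamespace("Rfast", quietly = TRUE)) {
  Rfast::colMins(m, value = TRUE)
}
```

The same numeric matrix m is the kind of input the Rfast functions appear
to expect, which is why passing the data.frame directly failed.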

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Stephen H. Dawson,
DSL via R-help
Sent: Tuesday, November 30, 2021 5:37 PM
To: Bert Gunter <bgunter.4567 using gmail.com>
Cc: r-help using r-project.org
Subject: Re: [R] Question about Rfast colMins and colMaxs

Oh, you are segmenting standard R from the rest of R.

Well, that part did not come across to me in your original reply. I am not
clear on a standard versus non-standard list. I will look into this aspect
and see what I can learn going forward.


Thanks,
*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com <http://www.shdawson.com>


On 11/30/21 5:26 PM, Bert Gunter wrote:
> ... but Rfast is *not* a "standard" package, as the rest of the PG 
> excerpt says. So contact the maintainer and ask him/her what they 
> think the best practice should be for their package. As has been 
> pointed out already, it appears to differ from the usual "read it in 
> as a data frame" procedure.
>
> Bert
>
> On Tue, Nov 30, 2021 at 2:11 PM Stephen H. Dawson, DSL 
> <service using shdawson.com> wrote:
>> Right, R Studio is not R.
>>
>> However, the Rfast package is part of R.
>>
>> https://cran.r-project.org/web/packages/Rfast/index.html
>>
>> So, rephrasing my question...
>> What is the best practice to bring a csv file into R so it can be 
>> accessed by colMaxs and colMins, please?
>>
>> *Stephen Dawson, DSL*
>>
>>
>> On 11/30/21 3:19 PM, Bert Gunter wrote:
>>> RStudio is **not** R. In particular, the so-called TidyVerse 
>>> consists of all *non*-standard contributed packages, about which the
>>> PG says:
>>>
>>> "For questions about functions in standard packages distributed with 
>>> R (see the FAQ Add-on packages in R), ask questions on R-help.
>>> [The link is:
>>> https://cran.r-project.org/doc/FAQ/R-FAQ.html#Add-on-packages-in-R
>>> This gives the list of current _standard_ packages]
>>>
>>> If the question relates to a contributed package, e.g., one 
>>> downloaded from CRAN, try contacting the package maintainer first. 
>>> You can also use find("functionname") and
>>> packageDescription("packagename") to find this information. Only 
>>> send such questions to R-help or R-devel if you get no reply or need 
>>> further assistance. This applies to both requests for help and to 
>>> bug reports."
>>>
>>> Note that RStudio maintains its own help resources at:
>>> https://community.rstudio.com/
>>> This is where questions about the TidyVerse, ggplot, etc. should be
>>> posted.
>>>
>>>
>>>
>>> Bert Gunter
>>>
>>> "The trouble with having an open mind is that people keep coming 
>>> along and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>
>>> On Tue, Nov 30, 2021 at 10:55 AM Stephen H. Dawson, DSL via R-help 
>>> <r-help using r-project.org> wrote:
>>>> Hi,
>>>>
>>>>
>>>> I am working to understand the Rfast functions of colMins and 
>>>> colMaxs. I worked through the example listed on page 54 of the PDF.
>>>>
>>>> https://cran.r-project.org/web/packages/Rfast/index.html
>>>>
>>>> https://cran.r-project.org/web/packages/Rfast/Rfast.pdf
>>>>
>>>> My data is in a CSV file. So, I bring it into R Studio using:
>>>> Data <- read.csv("./input/DataSet05.csv", header=T)
>>>>
>>>> However, I read the instructions listed on page 54 of the PDF 
>>>> saying I need to bring data into R using a matrix. I think read.csv 
>>>> brings the data in as a dataframe. I think colMins is failing 
>>>> because it is looking for a matrix but finds a dataframe.
>>>>
>>>>    > colMaxs(Data)
>>>> Error in colMaxs(Data) :
>>>>      Not compatible with requested type: [type=list; target=double].
>>>>    > colMins(Data, na.rm = TRUE)
>>>> Error in colMins(Data, na.rm = TRUE) :
>>>>      unused argument (na.rm = TRUE)
>>>>    > colMins(Data, value = FALSE, parallel = FALSE)
>>>> Error in colMins(Data, value = FALSE, parallel = FALSE) :
>>>>      Not compatible with requested type: [type=list; target=double].
>>>>
>>>> QUESTION
>>>> What is the best practice to bring a csv file into R Studio so it 
>>>> can be accessed by colMaxs and colMins, please?
>>>>
>>>>
>>>> Thanks,
>>>> --
>>>> *Stephen Dawson, DSL*
>>>>
>>>> ______________________________________________
>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>
