[R] Question about Rfast colMins and colMaxs

Stephen H. Dawson, DSL @erv|ce @end|ng |rom @hd@w@on@com
Wed Dec 1 18:06:23 CET 2021


Thanks, Avi.

Yes, loading a package with the library command is necessary to access a 
function that is not part of the standard R distribution.

The column names in the data sets I am reviewing change from one input to 
the next. The idea is to review the max and min of each column, whatever 
the column names happen to be in the input at hand.
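
A minimal base R sketch of that idea follows. It assumes only the numeric 
columns are of interest. The helper name col_ranges and the inline sample 
data are made up for illustration; they stand in for a real CSV file and 
are not taken from Rfast or from my data.

```r
# Hypothetical helper: min and max of every numeric column, whatever it is named.
col_ranges <- function(df, na.rm = TRUE) {
  # Keep only the numeric columns; drop = FALSE keeps a data.frame
  # even when a single column survives.
  num <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  data.frame(
    column = names(num),
    min = vapply(num, min, numeric(1), na.rm = na.rm),
    max = vapply(num, max, numeric(1), na.rm = na.rm),
    row.names = NULL
  )
}

# Made-up sample data standing in for a real CSV file.
csv_text <- "id,label,score\n1,a,2.5\n2,b,NA\n3,c,0.5"
Data <- read.csv(text = csv_text, header = TRUE)

ranges <- col_ranges(Data)
print(ranges)
```

Because the columns are selected by type rather than by name, the same call 
should work unchanged when the next input arrives with different column 
names.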


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com <http://www.shdawson.com>


On 11/30/21 10:42 PM, Avi Gross via R-help wrote:
> Stephen,
>
> Although what is in the STANDARD R distribution can vary several ways, in
> general, if you need to add a line like:
>
> library(something)
> or
> require(something)
>
> and your code does not work unless you have done that, then you can assume
> it is not built in to R as it starts.
>
> Having said that, tons of exceptions may exist that cause R to load in
> things on your machine for everyone or just for you without you having to
> notice.
>
> I think this forum lately has been deluged with questions about all kinds of
> add-on packages and in particular, lots of the ones in the tidyverse.
> Clearly the purpose here is not that broad.
>
> But since I use some packages like the tidyverse extensively, and I am far
> from alone, I wonder if someday the powers that be will realize it is a
> losing battle to exclude at least some of it. It would be so nice not having to
> include a long list of packages for some programs or somehow arrange that
> people using something you shared had installed and loaded them. But there
> are too many packages out there of varying quality and usefulness and
> popularity with more every day being added. Worse, many are somewhat
> incompatible such as having functions with the same names that hide earlier
> ones loaded.
>
> Base R does come with functions like colSums and colMeans and similar row
> functions. But as mentioned, a data.frame is a list of vectors and R
> supports certain functional programming constructs over lists using things
> like:
>
> lapply(df, min)
> sapply(df, min)
>
> And actually quite a few ways depending on what info you want back and
> whether you insist it be returned as a list or vector or other things. You
> can even supply additional arguments that might be needed such as if you
> want to ignore any NA values,
>
> lapply(df, min, na.rm=TRUE)
>
> The package you looked at is trying to be fast and uses what looks like
> compiled external code but so does lapply.
>
> If this is too bothersome for you, consider making a one-liner function like
> this:
>
> mycolMins <- function(df, ...) lapply(df, min, ...)
>
> Once defined, you can use that just fine and not think about it again and I
> note this answer (like others) is offering you something in base R that
> works fine on data.frames and the like.
>
> You can extend to many similar ideas like this one that calculates the min
> unless you override it with max or mean or sd or a bizarre function like
> `[` so a call to:
>
> mycolCalc(df, `[`, 3)
>
> Will return exactly the third item in each column, that is, the third row!
>
> I find it to be very common for someone these days to do a quick search for
> a way to do something in a language like R and not really look to see if it
> is a standard way or something special. Matrices in R are not quite the same
> as some other objects like a data.frame or tibble and a package written to
> be used on one may (or may not) happen to work with another. Some packages
> are carefully written to try to detect what kind of object they get and when
> possible convert it to another. The "apply" function is meant for matrices
> but if it sees something else it looks at the dimensionality and tries to
> coerce it with as.matrix or as.array first. As others have noted, this means
> a data.frame containing non-numeric parts may fail or should have any other
> columns hidden/removed as in this df that has some non-numeric fields:
>
> > df
>   i       s   f     b i2
> 1 1   hello 1.2  TRUE  5
> 2 2   there 2.3 FALSE  4
> 3 3 goodbye 3.4  TRUE  3
>
> So a bit more complex one-liner removes any non-numeric columns like this:
>
> > mycolMins(df[, sapply(df, is.numeric)])
> $i
> [1] 1
>
> $f
> [1] 1.2
>
> $i2
> [1] 3
>
> Clearly converting the whole data.frame to a matrix would result in
> everything being converted to character and a minimum may be elusive.
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Stephen H. Dawson,
> DSL via R-help
> Sent: Tuesday, November 30, 2021 5:37 PM
> To: Bert Gunter <bgunter.4567 using gmail.com>
> Cc: r-help using r-project.org
> Subject: Re: [R] Question about Rfast colMins and colMaxs
>
> Oh, you are segmenting standard R from the rest of R.
>
> Well, that part did not come across to me in your original reply. I am not
> clear on a standard versus non-standard list. I will look into this aspect
> and see what I can learn going forward.
>
>
> Thanks,
> *Stephen Dawson, DSL*
> /Executive Strategy Consultant/
> Business & Technology
> +1 (865) 804-3454
> http://www.shdawson.com <http://www.shdawson.com>
>
>
> On 11/30/21 5:26 PM, Bert Gunter wrote:
>> ... but Rfast is *not* a "standard" package, as the rest of the PG
>> excerpt says. So contact the maintainer and ask him/her what they
>> think the best practice should be for their package. As has been
>> pointed out already, it appears to differ from the usual "read it in
>> as a data frame" procedure.
>>
>> Bert
>>
>> On Tue, Nov 30, 2021 at 2:11 PM Stephen H. Dawson, DSL
>> <service using shdawson.com> wrote:
>>> Right, R Studio is not R.
>>>
>>> However, the Rfast package is part of R.
>>>
>>> https://cran.r-project.org/web/packages/Rfast/index.html
>>>
>>> So, rephrasing my question...
>>> What is the best practice to bring a csv file into R so it can be
>>> accessed by colMaxs and colMins, please?
>>>
>>> *Stephen Dawson, DSL*
>>> /Executive Strategy Consultant/
>>> Business & Technology
>>> +1 (865) 804-3454
>>> http://www.shdawson.com <http://www.shdawson.com>
>>>
>>>
>>> On 11/30/21 3:19 PM, Bert Gunter wrote:
>>>> RStudio is **not** R. In particular, the so-called TidyVerse
>>>> consists of all *non*-standard contributed packages, about which the
>>>> PG says:
>>>> "For questions about functions in standard packages distributed with
>>>> R (see the FAQ Add-on packages in R), ask questions on R-help.
>>>> [The link is:
>>>> https://cran.r-project.org/doc/FAQ/R-FAQ.html#Add-on-packages-in-R
>>>> This gives the list of current _standard_ packages]
>>>>
>>>> If the question relates to a contributed package, e.g., one
>>>> downloaded from CRAN, try contacting the package maintainer first.
>>>> You can also use find("functionname") and
>>>> packageDescription("packagename") to find this information. Only
>>>> send such questions to R-help or R-devel if you get no reply or need
>>>> further assistance. This applies to both requests for help and to
>>>> bug reports."
>>>>
>>>> Note that RStudio maintains its own help resources at:
>>>> https://community.rstudio.com/
>>>> This is where questions about the TidyVerse, ggplot, etc. should be
>>>> posted.
>>>>
>>>>
>>>> Bert Gunter
>>>>
>>>> "The trouble with having an open mind is that people keep coming
>>>> along and sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>
>>>> On Tue, Nov 30, 2021 at 10:55 AM Stephen H. Dawson, DSL via R-help
>>>> <r-help using r-project.org> wrote:
>>>>> Hi,
>>>>>
>>>>>
>>>>> I am working to understand the Rfast functions of colMins and
>>>>> colMaxs. I worked through the example listed on page 54 of the PDF.
>>>>>
>>>>> https://cran.r-project.org/web/packages/Rfast/index.html
>>>>>
>>>>> https://cran.r-project.org/web/packages/Rfast/Rfast.pdf
>>>>>
>>>>> My data is in a CSV file. So, I bring it into R Studio using:
>>>>> Data <- read.csv("./input/DataSet05.csv", header=T)
>>>>>
>>>>> However, I read the instructions listed on page 54 of the PDF
>>>>> saying I need to bring data into R using a matrix. I think read.csv
>>>>> brings the data in as a dataframe. I think colMins is failing
>>>>> because it is looking for a matrix but finds a dataframe.
>>>>>
>>>>>     > colMaxs(Data)
>>>>> Error in colMaxs(Data) :
>>>>>       Not compatible with requested type: [type=list; target=double].
>>>>>     > colMins(Data, na.rm = TRUE)
>>>>> Error in colMins(Data, na.rm = TRUE) :
>>>>>       unused argument (na.rm = TRUE)
>>>>>     > colMins(Data, value = FALSE, parallel = FALSE)
>>>>> Error in colMins(Data, value = FALSE, parallel = FALSE) :
>>>>>       Not compatible with requested type: [type=list; target=double].
>>>>>
>>>>> QUESTION
>>>>> What is the best practice to bring a csv file into R Studio so it
>>>>> can be accessed by colMaxs and colMins, please?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> --
>>>>> *Stephen Dawson, DSL*
>>>>> /Executive Strategy Consultant/
>>>>> Business & Technology
>>>>> +1 (865) 804-3454
>>>>> http://www.shdawson.com <http://www.shdawson.com>
>>>>>
>>>>> ______________________________________________
>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.