[R] I need to create new variables based on two numeric variables and one dichotomize conditional category variables.

CALUM POLWART po|c1410 @end|ng |rom gm@||@com
Sun Nov 5 09:27:17 CET 2023


In this case I think I've made the labels the number and so a double is
possible. But it wasn't what I actually set out to do!  (I'm a tidy fab so
would have to engage the brain in base too much)

When I thought 'I think I might say that's a factor (sometimes in medical
equations gender is shown as a "F" and described as a "factor". Not our
factors... But still a factor).

My programming is never efficient!

But I was planning for a scenario for where gender is not a binary concept,
adding unknowns, trans etc. and if the data set is very large then
obviously storing as factors is less memory intensive, but processing may
be more intensive.








On Sun, 5 Nov 2023, 04:27 , <avi.e.gross using gmail.com> wrote:

> There are many techniques Callum and yours is an interesting twist I had
> not considered.
>
> Yes, you can specify what integer a factor uses to represent things but
> not what I meant. Of course your trick does not work for some other forms
> of data like real numbers in double format. There is a cost to converting a
> column to a factor that is recouped best if it speeds things up multiple
> times.
>
> The point I was making was that when you will be using group_by,
> especially if done many times, it might speed things up if the column is
> already a normal factor, perhaps just indexed from 1 onward. My guess is
> that underneath the covers, some programs implicitly do such a factor
> conversion if needed. An example might be aspects of the ggplot program
> where you may get a mysterious order of presentation in the graph unless
> you create a factor with the order you wish to have used and avoid it
> making one invisibly.
>
> From: CALUM POLWART <polc1410 using gmail.com>
> Sent: Saturday, November 4, 2023 7:14 PM
> To: avi.e.gross using gmail.com
> Cc: Jorgen Harmse <JHarmse using roku.com>; r-help using r-project.org;
> mkzaman.m using gmail.com
> Subject: Re: [R] I need to create new variables based on two numeric
> variables and one dichotomize conditional category variables.
>
> I might have factored the gender.
>
> I'm not sure it would in any way be quicker.  But might be to some extent
> easier to develop variations of. And is sort of what factors should be
> doing...
>
> # make dummy data
> gender <- c("Male", "Female", "Male", "Female")
> WC <- c(70,60,75,65)
> TG <- c(0.9, 1.1, 1.2, 1.0)
> myDf <- data.frame( gender, WC, TG )
>
> # label a factor
> myDf$GF <- factor(myDf$gender, labels= c("Male"=65, "Female"=58))
>
> # do the maths
> myDf$LAP <- (myDf$WC - as.numeric(myDf$GF))* myDf$TG
>
> #show results
> head(myDf)
>
> gender WC  TG GF  LAP
> 1   Male 70 0.9 58 61.2
> 2 Female 60 1.1 65 64.9
> 3   Male 75 1.2 58 87.6
> 4 Female 65 1.0 65 64.0
>
>
> (Reality: I'd have probably used case_when in tidy to create a new numeric
> column)
>
>
>
>
> The equation to
> calculate LAP is different for male and females. I am giving both equations
> below.
>
> LAP for male = (WC-65)*TG
> LAP for female = (WC-58)*TG
>
> My question is 'how can I calculate the LAP and create a single new column?
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list