[R] Variable Class "numeric" instead recognized by dplyr as a 'factor'

Bert Gunter bgunter.4567 at gmail.com
Sun Sep 27 22:12:01 CEST 2015


I believe you need to spend some time with an R tutorial, as I don't
believe you understand what factors are and how they should be
used. "Dummy variables" are also almost certainly unnecessary and
usually undesirable.

A few comments below may help.

Cheers,
Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Sun, Sep 27, 2015 at 12:58 AM,  <james.vordtriede at att.net> wrote:
> Hi--I’m new to R.  For a dissertation, my panel data is for 48 Sub-Saharan countries (cross-sectional index=’i’) over 55 years 1960-2014 (time-series index=’t’).  The variables read into R from a text file are levels data.  The 2SLS regression due to reverse causality will be based on change in the levels data, so will need to difference the data grouped by cross-sectional index ‘i’.
>
>
> There are nearly 50 total variables, but the model essentially will regress the differenced Yit ~ X1it+X2it+X3it+X4it+X5it+X6it, with a dummy variable attached to each of the change-X(s).
>
>
> Due to missing data, R originally classified each X and Y variable as a ‘factor’, subsequently changed to ‘numeric’ via ‘as.numeric’ command.

No.
a) Missing data will not cause numeric data to become a factor. There's
something wrong in the data from the beginning (as Thierry said).

b) If f is a factor whose levels are numbers, as.numeric(f) is almost
certainly **not** the correct way to change it to numeric. It returns
the internal integer codes of the levels, not the values, so you will
get garbage, viz.:

> f <- runif(5)
> f
[1] 0.42568762 0.03105132 0.46606135 0.35251240 0.57303571
> as.numeric(factor(f))
[1] 3 1 4 2 5
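
The usual idiom (R FAQ 7.10) is to convert via the level labels rather
than the integer codes. A minimal sketch, with made-up values and a
hypothetical "n/a" entry standing in for whatever non-numeric text is
actually in your file:

```r
# Hypothetical column: numbers read in as text, with one stray
# non-numeric entry ("n/a" is an assumption, not from the real data)
raw <- c("0.42", "0.03", "n/a", "0.35", "0.57")
f <- factor(raw)

# Wrong: returns the internal integer codes of the levels
codes <- as.numeric(f)

# Right: go through the level labels; "n/a" becomes NA
x <- suppressWarnings(as.numeric(as.character(f)))
# equivalent and slightly more efficient: as.numeric(levels(f))[f]

# Entries that were present in the input but failed to convert;
# these are the ones that made the column non-numeric
bad <- unique(raw[is.na(x) & !is.na(raw)])
```

If the offending entries turn out to be just a missing-value code,
passing that code via the na.strings argument of read.table() or
read.delim() when the file is read should avoid the problem in the
first place.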




>
>
> However, when I write the following command for dplr solely to difference Yit (=Yit-Yi[t-1]) mutated to new variable dYit, I receive error messages to the effect that Yit and each of the X variables are ‘factors’.
>
>
>
>
>>library (dplr)
>
>>dt = CSUdata2 %>% group_by (i) %>% (dYit=Yit-lag(Yit))
>
>
>
> ‘CSUdata2’ is the object in which the tab-delimited text file dataset is stored.
>
>
> Questions:
>
>
>  Any idea why dplyr reads the variables as ‘factors’?  A class(*) command per variable shows R to know each Y and X as ‘numeric’.
>
>
> Is the command to difference Yit done correctly?  I plan to use the same command for each variable requiring change until I understand the commands better.

Almost certainly not. See ?diff
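
A sketch of one way to difference within each country in base R, using
diff() inside ave(). The data frame and its numbers are invented for
illustration; the column names i, t and Yit follow your post:

```r
# Toy panel: 2 countries, 3 years each (made-up numbers)
d <- data.frame(i   = rep(c("A", "B"), each = 3),
                t   = rep(1960:1962, times = 2),
                Yit = c(10, 12, 15, 100, 90, 95))

# Sort by country and year so diff() sees consecutive years
d <- d[order(d$i, d$t), ]

# First difference within each country; the first year of each
# country has no predecessor, so it gets NA
d$dYit <- ave(d$Yit, d$i, FUN = function(v) c(NA, diff(v)))
```

In dplyr the same thing would go inside mutate(), i.e.
group_by(i) %>% mutate(dYit = Yit - lag(Yit)); the command as posted
has no mutate() call.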


>
>
>
> Thank you.


