[R] HELP - as.numeric changing column data

William Dunlap wdunlap at tibco.com
Wed Jan 6 22:48:01 CET 2016


By the way, here is an example where the advice in FAQ 7.10 (change the
factor columns to numeric) would give incorrect results.  The incorrect
header setting in the call to read.table causes an extra row of non-numeric
data to appear at the start of the imported data.

  > txt <- "ColA ColB\n101 102\n201 202\n"
  > str(read.table(text=txt))
  'data.frame':   3 obs. of  2 variables:
   $ V1: Factor w/ 3 levels "101","201","ColA": 3 1 2
   $ V2: Factor w/ 3 levels "102","202","ColB": 3 1 2
  > str(read.table(text=txt, header=TRUE))
  'data.frame':   2 obs. of  2 variables:
   $ ColA: int  101 201
   $ ColB: int  102 202
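
To make the failure concrete, here is a small sketch of what the two
conversions do to the mis-read column from the example above.  The names
bad, codes and vals are only illustrative, and stringsAsFactors=TRUE is
spelled out on the assumption that you may be running R 4.0.0 or later,
where read.table no longer creates factors by default.

```r
# Same text as in the example above; the header row is read as data here.
txt <- "ColA ColB\n101 102\n201 202\n"

# Ask for factors explicitly so the result matches the str() output above
# regardless of the R version's default.
bad <- read.table(text = txt, stringsAsFactors = TRUE)

# as.numeric() on a factor returns the internal level codes, not the values.
codes <- as.numeric(bad$V1)

# The FAQ 7.10 idiom recovers the real values, but the stray header cell
# becomes NA (normally with a coercion warning), so a bogus row remains.
vals <- suppressWarnings(as.numeric(as.character(bad$V1)))

print(codes)
print(vals)
```

Here codes comes out as 3 1 2 (the level codes shown by str above) and
vals as NA 101 201, so neither conversion repairs the data; only
re-reading with header=TRUE does.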

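One way to have read.table complain instead of quietly producing factors
is the colClasses argument mentioned in my earlier message quoted below.
This is only a sketch on the same toy text; the names good and res and
the "read failed" marker are illustrative, and your real file will need
its own column classes.

```r
txt <- "ColA ColB\n101 102\n201 202\n"

# With the header read correctly, both columns arrive as integer.
good <- read.table(text = txt, header = TRUE,
                   colClasses = c("integer", "integer"))
stopifnot(all(vapply(good, is.numeric, logical(1))))

# With the header wrongly treated as data, the declared classes cannot be
# satisfied, so read.table() stops with an error rather than returning
# factor or character columns.
res <- tryCatch(
  read.table(text = txt, colClasses = c("integer", "integer")),
  error = function(e) "read failed"
)
print(res)
```

Declaring the classes up front turns a silent misread into an immediate
error, which is easier to notice than a plot with a scrambled axis.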

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jan 6, 2016 at 1:26 PM, Andy Schneider <andyschneider85 at gmail.com>
wrote:

>
> Hi Bill -
>
> Thanks so much! This was actually a great help, and the problem is worked out.
>
> Cheers,
> Andy
>
> On Wed, Jan 6, 2016 at 4:23 PM, William Dunlap <wdunlap at tibco.com> wrote:
>
>> You may have read in your data incorrectly - a column you expected to be
>> numeric was not recognized as such, so it was read in as character and then
>> converted to a 'factor'.
>>
>> FAQ 7.10 tells how to work around the problem
>>
>> https://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f
>> but a better solution is to repeatedly call read.table with various
>> parameters (esp. colClasses=c(...), header=TRUE/FALSE, dec=","/".",
>> stringsAsFactors=FALSE) until str(yourData) shows you that all the column
>> types are what you expect.
>>
>> It is a waste of time to do much of anything with your data until
>> str(yourData) and some simple plots of it show you that it was read into R
>> correctly.
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Wed, Jan 6, 2016 at 12:01 PM, Andy Schneider <
>> andyschneider85 at gmail.com> wrote:
>>
>>> Hi -
>>>
>>> I'm trying to plot some data and having a lot of trouble! I have a simple
>>> dataset consisting of two columns - income_per_capita and mass_beauty_value.
>>> When I read the data in and plot it, I get the attached plot Mass Beauty
>>> Non-Numeric:
>>> <http://r.789695.n4.nabble.com/file/n4716202/Mass_Beauty_Non-Numeric.jpg>.
>>> You can see that, while it contains all the values, the income_per_capita
>>> axis is out of order and there are some weird vertical lines happening.
>>>
>>> To fix this, I converted both columns to numerics using:
>>>
>>> mass_beauty$income_per_capita <- as.numeric(mass_beauty$income_per_capita)
>>> mass_beauty$mass_beauty_value <- as.numeric(mass_beauty$mass_beauty_value)
>>>
>>> When I did this, I noticed that my income_per_capita column's values
>>> suddenly changed. Whereas I had values extending all the way to 30,000 or
>>> so before, now they maxed out at around 1,400. While at first I thought
>>> they might at least have changed to scale, it unfortunately looks like the
>>> changes were more or less random. But, they plotted much better:
>>> <http://r.789695.n4.nabble.com/file/n4716202/Mass_Beauty_Plot.jpg> .
>>>
>>> Does anyone have any solution for how I can convert my income_per_capita
>>> column to a plottable numeric without changing up its values? I've tried
>>> doing as.numeric(as.character(mass_beauty_value$income_per_capita)) but it
>>> didn't work.
>>>
>>> Thanks so much for your help!
