[R] Error occurred during mean calculation of a column of a data frame, which apparently contains numeric data

Duncan Murdoch murdoch.duncan at gmail.com
Wed Feb 29 14:41:26 CET 2012


On 12-02-29 8:16 AM, R. Michael Weylandt wrote:
> Factors are internally stored as integers (enums if you have used
> other programming languages) with a special label set -- it's more
> memory efficient than storing the whole string over and over.

That was one of the original justifications, but character vectors are 
just as memory efficient these days.

The other justifications are still valid:  sometimes you have a vector 
which only takes on a subset of the possible values it could take, and 
when you tabulate it, you'd like to see those zero counts.  You may also 
want to control the display order, and a factor allows that.

For example:

x <- c("a", "a", "b")
table(x)    # counts a = 2, b = 1
x <- factor(x, levels=c("c", "b", "a"))
table(x)    # counts c = 0, b = 1, a = 2: zero count kept, order set by levels
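
The question quoted below (why `as.numeric()` on the factor column prints small integers) follows from the integer coding Michael describes. A minimal sketch, using made-up pulse values rather than the poster's data: `as.numeric()` on a factor returns the level codes, while converting through the level labels recovers the numbers.

```r
# Hypothetical readings stored as text, as they would be after strsplit()
pulse_txt <- c("68.0", "72.0", "66.0", "72.0")

# With stringsAsFactors = TRUE (the default before R 4.0.0),
# a column like this becomes a factor
pulse_fac <- factor(pulse_txt)
levels(pulse_fac)    # sorted unique labels: "66.0" "68.0" "72.0"

# as.numeric() on a factor returns the internal integer codes, not the labels
codes <- as.numeric(pulse_fac)
codes                # 2 3 1 3

# Convert via the labels instead; indexing the numeric levels by the
# factor is the idiom suggested in ?factor
values <- as.numeric(levels(pulse_fac))[pulse_fac]
values               # 68 72 66 72
mean(values)         # 69.5
```

`as.numeric(as.character(pulse_fac))` gives the same result; the `levels()` form converts each distinct label only once.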

Duncan Murdoch

>
> Michael
>
> On Wed, Feb 29, 2012 at 5:49 AM, Aniruddha Mukherjee
> <aniruddha.mukherjee at tcs.com>  wrote:
>> Hello Berend,
>>
>> Many thanks for your prompt reply; it helped me a lot. One more thing:
>> I would be grateful if you could explain why, in my case (i.e. when
>> stringsAsFactors was TRUE by default),
>>> as.numeric(matr1$Pulse_rate)
>> displays the following
>>   [1]  4  5  7  5  9  8  6 10  3  2  5  1 10 10
>> ?
>>
>> Best regards.
>>
>>
>> From: Berend Hasselman <bhh at xs4all.nl>
>> To: Aniruddha Mukherjee <aniruddha.mukherjee at tcs.com>
>> Cc: R-help <r-help at r-project.org>
>> Date: 02/29/2012 03:57 PM
>> Subject: Re: [R] Error occurred during mean calculation of a column of a
>> data frame, which apparently contains numeric data
>>
>> On 29-02-2012, at 09:45, Aniruddha Mukherjee wrote:
>>
>>> Hello R people,
>>>
>>> How can I compute the mean of the "Pulse_rate" column of the data frame
>>> or matrix built from the following character object called "str_got"? It
>>> has 14 entries, and each entry has 8 comma-separated values. Please go
>>> through the following R commands to see how I tried to split and unlist
>>> the values to form a data frame.
>>>> str_got
>>> [1] "bp,67,2011-12-09T19:59:44.044+05:30,9830576102,68.0,124.0,58.0,66.0"
>>> "bp,67,2011-12-09T20:19:31.031+05:30,9830576102,72.0,133.0,93.0,40.0"
>>> .....
>>>>
>>> matr<-matrix(unlist(strsplit(str_got, ",")), nrows, byrow=T)
>>
>> nrows?
>> I assume this was set somewhere in your script and not shown.
>> Is it length(str_got)?
>>
>>>> matr
>>>      [,1] [,2] [,3]                            [,4]         [,5]   [,6] [,7] [,8]
>>> [1,] "bp" "67" "2011-12-09T19:59:44.044+05:30" "9830576102" "68.0"
>>> ......
>>
>>> Note: column names must be assigned before computing the desired mean
>>> value.
>>> matr1<-as.data.frame(matr)
>>
>> Use matr1<- as.data.frame(matr, stringsAsFactors=FALSE)
>>
>> If you don't use stringsAsFactors=FALSE, the column will be a factor, and
>> that is not equivalent to numeric.
>>
>> What's wrong with
>>
>> matr1$Pulse_rate<- as.numeric(matr1$Pulse_rate)
>>
>> Then you can calculate the desired mean with
>>
>> mean(matr1$Pulse_rate)
>>
>> or
>>
>> mean(matr1[,"Pulse_rate"])
>>
>> Berend
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

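Berend's suggested workflow can be sketched end to end on the two sample records shown in the original question. This is a minimal sketch under stated assumptions: the number of rows is taken to be `length(str_got)` (Berend's guess about `nrows`), and because the thread never says which field holds the pulse rate, naming column 5 `Pulse_rate` is a hypothetical choice for illustration. Calling `mean()` directly on a factor column instead returns `NA` with the warning "argument is not numeric or logical: returning NA".

```r
# The two sample records shown in the original question
str_got <- c(
  "bp,67,2011-12-09T19:59:44.044+05:30,9830576102,68.0,124.0,58.0,66.0",
  "bp,67,2011-12-09T20:19:31.031+05:30,9830576102,72.0,133.0,93.0,40.0"
)

# Split each record on commas and check every record has 8 fields
fields <- strsplit(str_got, ",", fixed = TRUE)
stopifnot(all(lengths(fields) == 8))

# One row per record (assumes nrows = length(str_got))
matr <- matrix(unlist(fields), nrow = length(fields), byrow = TRUE)

# Keep text as character so no column is silently turned into a factor;
# the columns are named V1 ... V8 by default
matr1 <- as.data.frame(matr, stringsAsFactors = FALSE)

# HYPOTHETICAL: the thread does not say which field is the pulse rate
names(matr1)[5] <- "Pulse_rate"

# Convert the column to numeric before summarising it
matr1$Pulse_rate <- as.numeric(matr1$Pulse_rate)
mean(matr1$Pulse_rate)   # 70
```

Another option, not raised in the thread, is `read.csv(text = str_got, header = FALSE)`, which splits the records itself and guesses a numeric type for columns such as the fifth one.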

