[R] Use of Factors

jim holtman jholtman at gmail.com
Fri Mar 21 03:05:03 CET 2008


Run str() on your object and you will see that those columns are
factors.  They may have gotten that way when you read them in and there
was character data in the column.  To convert back to numeric, do:

cpx_interp$HR <- as.numeric(as.character(cpx_interp$HR))
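
The as.character() step matters.  Calling as.numeric() directly on a
factor gives back the internal integer level codes, not the values you
see printed.  A minimal sketch with made-up values (hr below is
hypothetical, not your data):

```r
# Hypothetical readings stored as a factor, the way HR ended up
hr <- factor(c("62.5", "67.6", "68.4", "62.5"))

# Direct conversion returns the level codes, not the readings
codes <- as.numeric(hr)

# Going through character first recovers the printed values
values <- as.numeric(as.character(hr))

print(codes)
print(values)
print(mean(values))
```

Once the column is numeric again, the tapply(..., mean) call should
stop warning about a non-numeric argument.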


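As for how the column became a factor, this is a guess, since the post
does not show how interp_out is turned into cpx_interp.  An array holds
a single type, so assigning character SubjIDs into column 1 coerces
every cell, the interpolated values included, to character.  Converting
that to a data frame with stringsAsFactors = TRUE (the default before
R 4.0.0) then makes each column a factor.  A small sketch of that
assumed path with hypothetical values, plus one way around it by
building the data frame from typed vectors:

```r
# Hypothetical stand-in for interp_out: columns SubjId, visit, xl, HR
m <- array(NA, c(2, 4))
colnames(m) <- c("SubjId", "visit", "xl", "HR")
m[, "visit"] <- c(1, 1)
m[, "xl"]    <- c(0, 25)
m[, "HR"]    <- c(62.5, 66.25)
numeric_before <- is.numeric(m)   # the array is still numeric here

# Character IDs force the whole array to character
m[, "SubjId"] <- c("ADENPV07", "ADENPV07")
character_after <- is.character(m)

# stringsAsFactors = TRUE is spelled out to mimic the old default
df_factor <- as.data.frame(m, stringsAsFactors = TRUE)
str(df_factor)

# Alternative: one typed vector per column, so HR stays numeric
df_typed <- data.frame(SubjId = c("ADENPV07", "ADENPV07"),
                       visit  = c(1, 1),
                       xl     = c(0, 25),
                       HR     = c(62.5, 66.25))
str(df_typed)
```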

On Thu, Mar 20, 2008 at 9:26 AM, Beck, Kenneth (STP)
<Kenneth.Beck at bsci.com> wrote:
> Relatively new to R, I'm trying to do a relatively simple task. I have a
> data set that has several variables arranged by SubjID and visit, with
> multiple observations for that combination. I do linear regression on
> those multiple observations, then generate a set of interpolated values
> from the regression at fixed intervals along "x". I now want to average
> each of those across all the SubjID's. When I use either by() or
> tapply(), I get an error indicating the interpolated values are factors,
> even though they display looking like floating point numbers. The mean
> function returns a value that is obviously wrong, though the count of
> observations in the subsets is correct. I am including code snippets to
> try to demonstrate how this is all created; sorry for the length of this.
>
> Here is output when I try to use the mean function,
> mean_interp_HR=tapply(cpx_interp$HR[cpx_interp$visit==1 &
> cpx_interp$xl==0],cpx_interp$SubjId[cpx_interp$visit==1 &
> cpx_interp$xl==0],mean)
> Warning in mean.default(X[[1L]], ...) :
>  argument is not numeric or logical: returning NA
> Warning in mean.default(X[[2L]], ...) :
>  argument is not numeric or logical: returning NA
> Warning in mean.default(X[[3L]], ...) :
>  argument is not numeric or logical: returning NA
> Warning in mean.default(X[[4L]], ...) :
>  argument is not numeric or logical: returning NA
> Warning in mean.default(X[[5L]], ...) :
>  argument is not numeric or logical: returning NA
>
> Look at the data I am submitting to tapply and mean:
> > cpx_interp$HR[cpx_interp$visit==1 & cpx_interp$xl==0]
> [1] 62.5252140470478 67.6151493460742 68.3931063786315 78.6591518601803
> 59.7674671000443
> 90 Levels: 62.5252140470478 66.046907240618 69.5686004341883
> 69.8766646005142 71.9631282463843 ... 85.4270562298357
> > cpx_interp$SubjId[cpx_interp$visit==1 & cpx_interp$xl==0]
> [1] ADENPV07 ADENPVJN ADENPV0Z ADENPVM9 ADENPVMB
> Levels: ADENPV07 ADENPVJN ADENPV0Z ADENPVM9 ADENPVMB
>
> Why is the $HR variable listed as "90 levels" as if it is a factor? Why
> is it not treated as floating point to get a simple mean?
>
> Here is how the HR values are generated:
>
> # create the array
> interp_out=array(,c(18,length(cols2)))
> # create the values to interpolate to
> interp_out[,3]=c(0,25,50,75,100,125,150,175,200,
>                  0,25,50,75,100,125,150,175,200);
> # fill the visits
> interp_out[,2]=c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2)
> # fill the SubjID
> interp_out[,1]=SubjID;
> # Now fill in interpolated values for each visit.
> interp_out[1:9,4]=hrv1;interp_out[10:18,4]=hrv2;
>
> # hrv1 & hrv2 come from the following function, the "lm" parameter is
> # output from the standard lm() function:
> interpolateToXL = function(lm,maxxl){
> int_values=matrix(nrow=9,ncol=1)
> int_values[1,]=coef(lm)[1];
> if (maxxl>25)
>  int_values[2,]=coef(lm)[1]+coef(lm)[2] * 25
> if (maxxl>50)
>  int_values[3,]=coef(lm)[1]+coef(lm)[2] * 50
> if (maxxl>75)
>  int_values[4,]=coef(lm)[1]+coef(lm)[2] * 75
> if (maxxl>100)
>  int_values[5,]=coef(lm)[1]+coef(lm)[2] * 100
> if (maxxl>125)
>  int_values[6,]=coef(lm)[1]+coef(lm)[2] * 125
> if (maxxl>150)
>  int_values[7,]=coef(lm)[1]+coef(lm)[2] * 150
> if (maxxl>175)
>  int_values[8,]=coef(lm)[1]+coef(lm)[2] * 175
> if (maxxl>200)
>  int_values[9,]=coef(lm)[1]+coef(lm)[2] * 200
> return (int_values)
> }
>
>
> Ken Beck PhD
> Research Scientist
> Boston Scientific CRM (Guidant)
> 10-212
> kenneth.beck at bsci.com



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


