[R] How do I convert factors to numeric? It's an FAQ but...

David Winsemius dwinsemius at comcast.net
Fri Apr 13 15:34:39 CEST 2012


On Apr 13, 2012, at 9:08 AM, John Coulthard wrote:

>
> Dear R list people
>
> I loaded a file of numbers into R and got a dataframe of factors.   
> So I tried to convert it to numeric as per the FAQ using as.numeric().

Actually, you used as.numeric(as.character()), which would ordinarily
succeed. However, you applied it to an entire data frame when you
should have applied it to each column separately. A data frame is a
list of columns, and the last error message told you that you were
passing the function the wrong data type (a list).
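A minimal sketch of the difference, using a small hypothetical data frame `df` (not John's data) whose columns are numbers stored as factors. It also shows why calling as.numeric() directly on a factor is not enough: that returns the internal level codes rather than the values. The exact wording of the error message varies between R versions, so it is only captured here, not quoted.

```r
# Hypothetical two-column data frame of numbers stored as factors.
df <- data.frame(a = factor(c("7.2", "6.9", "8.8")),
                 b = factor(c("5.8", "9.9", "6.1")))

# A data frame is a list of columns.
is.list(df)                               # TRUE

# as.numeric() on a factor returns the level codes, not the values.
codes <- as.numeric(df$a)                 # 2 1 3

# Per-column conversion works: factor -> character labels -> numeric.
a_num <- as.numeric(as.character(df$a))   # 7.2 6.9 8.8

# Coercing the whole data frame fails, because the argument is a list
# whose elements are longer than length 1.
res <- tryCatch(as.integer(df), error = function(e) conditionMessage(e))
res
```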


>  But I'm getting errors (please see example), so what am I getting  
> wrong?
>
> Thanks for your time.
> John
>
> Example...
>
> #my data object
>> f
>   GSM187153 GSM187154 GSM187155 GSM187156 GSM187157 GSM187158 GSM187159
> 13  7.199346  7.394519  7.466155  8.035864  7.438536  7.308401  7.707994
> 14  6.910426  6.360291  6.228221   7.42918  7.120322  6.108129  7.201477
> 15   8.85921  9.152096  9.125067    6.4458  8.600319   8.97577  9.691167
> 16  5.851665  5.621529  5.673689  6.331274  6.160159   5.65945  5.595156
> 17  9.905257  8.596643   9.11741  9.872789  8.909299  9.104171  9.158998
> 18  6.176691  6.429807  6.418132  6.849236  6.162308  6.432743  6.444664
> 19  7.599871  8.795133  8.382509  5.887119  7.941895  7.666692  8.170374
> 20  9.458262   8.39701  8.402015    9.0859  8.995632  8.427601  8.265105
> 21  8.179803  9.868286 10.570601  4.905013  9.488779  9.148336  9.654022
> 22  7.456822  8.037138  7.953766  6.666418  7.674927  7.995109  7.635158


That is not a reproducible example. You should provide the unedited
output from dput(f).
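dput() writes out R code that rebuilds the object, including details such as factor levels and row names that the printed data frame does not reveal. A small sketch of the round trip, using a hypothetical object `x` standing in for f:

```r
# Hypothetical small data frame with a factor column and character row names.
x <- data.frame(v = factor(c("1.5", "2.25")), row.names = c("13", "14"))

# dput() prints code that recreates the object; capture that text here.
txt <- paste(capture.output(dput(x)), collapse = "\n")
cat(txt, "\n")

# Evaluating the text (as a reader pasting it into R would) gives an
# identical copy, so everyone is working with exactly the same object.
y <- eval(parse(text = txt))
identical(x, y)
```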

Try:

numf <- lapply(f, function(x) as.numeric(as.character(x)))  # returns a list
numf <- as.data.frame(numf)
str(numf)

'data.frame':	10 obs. of  7 variables:
  $ GSM187153: num  7.2 6.91 8.86 5.85 9.91 ...
  $ GSM187154: num  7.39 6.36 9.15 5.62 8.6 ...
  $ GSM187155: num  7.47 6.23 9.13 5.67 9.12 ...
  $ GSM187156: num  8.04 7.43 6.45 6.33 9.87 ...
  $ GSM187157: num  7.44 7.12 8.6 6.16 8.91 ...
  $ GSM187158: num  7.31 6.11 8.98 5.66 9.1 ...
  $ GSM187159: num  7.71 7.2 9.69 5.6 9.16 ...
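Two variations on the same approach, sketched on a hypothetical data frame `f2` (the helper name `to_num` is also made up for illustration). The first uses the idiom from R FAQ 7.10, as.numeric(levels(x))[x], which converts only the levels and then indexes them by the factor's codes; the question tried this on the whole data frame, but it works per column. The second assigns into `g[]` instead of calling as.data.frame() on the list, which keeps the original row names (such as "13" to "22" in f) rather than resetting them to 1, 2, 3, ...

```r
# Hypothetical data frame of numbers stored as factors, with row names
# like those in the question. Note "10.5" sorts first among the levels.
f2 <- data.frame(a = factor(c("7.2", "6.9", "10.5")),
                 b = factor(c("5.8", "9.9", "6.1")),
                 row.names = c("13", "14", "15"))

# R FAQ 7.10 idiom: convert the levels once, then index by the factor codes.
to_num <- function(x) as.numeric(levels(x))[x]

# Replacing the columns in place keeps the data frame and its row names.
g <- f2
g[] <- lapply(f2, to_num)
str(g)
rownames(g)

# By contrast, rebuilding from the list gives default row names.
rownames(as.data.frame(lapply(f2, to_num)))
```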


Tested on
 > dput(f)
structure(list(GSM187153 = structure(c(4L, 3L, 8L, 1L, 10L, 2L,
6L, 9L, 7L, 5L), .Label = c("5.851665", "6.176691", "6.910426",
"7.199346", "7.456822", "7.599871", "8.179803", "8.85921", "9.458262",
"9.905257"), class = "factor"), GSM187154 = structure(c(4L, 2L,
9L, 1L, 7L, 3L, 8L, 6L, 10L, 5L), .Label = c("5.621529", "6.360291",
"6.429807", "7.394519", "8.037138", "8.39701", "8.596643", "8.795133",
"9.152096", "9.868286"), class = "factor"), GSM187155 = structure(c(5L,
3L, 10L, 2L, 9L, 4L, 7L, 8L, 1L, 6L), .Label = c("10.570601",
"5.673689", "6.228221", "6.418132", "7.466155", "7.953766", "8.382509",
"8.402015", "9.11741", "9.125067"), class = "factor"), GSM187156 =  
structure(c(8L,
7L, 4L, 3L, 10L, 6L, 2L, 9L, 1L, 5L), .Label = c("4.905013",
"5.887119", "6.331274", "6.4458", "6.666418", "6.849236", "7.42918",
"8.035864", "9.0859", "9.872789"), class = "factor"), GSM187157 =  
structure(c(4L,
3L, 7L, 1L, 8L, 2L, 6L, 9L, 10L, 5L), .Label = c("6.160159",
"6.162308", "7.120322", "7.438536", "7.674927", "7.941895", "8.600319",
"8.909299", "8.995632", "9.488779"), class = "factor"), GSM187158 =  
structure(c(4L,
2L, 8L, 1L, 9L, 3L, 5L, 7L, 10L, 6L), .Label = c("5.65945", "6.108129",
"6.432743", "7.308401", "7.666692", "7.995109", "8.427601", "8.97577",
"9.104171", "9.148336"), class = "factor"), GSM187159 = structure(c(5L,
3L, 10L, 1L, 8L, 2L, 6L, 7L, 9L, 4L), .Label = c("5.595156",
"6.444664", "7.201477", "7.635158", "7.707994", "8.170374", "8.265105",
"9.158998", "9.654022", "9.691167"), class = "factor")), .Names =  
c("GSM187153",
"GSM187154", "GSM187155", "GSM187156", "GSM187157", "GSM187158",
"GSM187159"), class = "data.frame", row.names = c("13", "14",
"15", "16", "17", "18", "19", "20", "21", "22"))



>> class(f)
> [1] "data.frame"
>
> #all the columns in the dataframe are of class 'factor'
>> for(i in 1:ncol(f)){if(class(f[,i])!="factor"){print(class(f[,i]))}}
>>
> #but it won't convert to numeric
>> g<-as.numeric(as.character(f))
> Warning message:
> NAs introduced by coercion
>> g
> [1] NA NA NA NA NA NA NA NA NA NA
>> g<-as.numeric(levels(f))[as.integer(f)]
> Error: (list) object cannot be coerced to type 'integer'
>>
>
>
> R version 2.14.1 (2011-12-22)
> Copyright (C) 2011 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: i386-redhat-linux-gnu (32-bit)
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


