[R] How do I coerce numeric factor columns of data frame to vector?

Murray Jorgensen maj at stats.waikato.ac.nz
Tue Sep 9 03:25:27 CEST 2003


Hi Thomas et al,

Checking the code that read the frames, I see that the problem was indeed
caused by missing-value codes at the read.table() stage. However, I did
not want to revisit the reading stages for these frames. (To show why
not, I include the code that read them, which you may recognise from an
earlier thread in which I got some help from Andy Liaw.)
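A minimal sketch of the effect described above. It assumes a hypothetical missing-value code of "?", because the actual code in the file is not stated. A single non-numeric token makes read.table() keep the whole column as text. That text becomes a factor when stringsAsFactors = TRUE, which was the default in 2003 and is set explicitly here. Declaring the token with na.strings lets the column be read as numeric.

```r
# Hypothetical CSV fragment: "?" stands in for the unknown missing-value code
csv_lines <- c("1,10.5", "2,?", "3,7.25")

# Without na.strings, "?" blocks numeric conversion of column V2;
# stringsAsFactors = TRUE mimics the default of older R versions
raw <- read.table(text = csv_lines, sep = ",", header = FALSE,
                  stringsAsFactors = TRUE)
str(raw)    # V2 is a factor

# Declaring "?" as a missing-value code gives a numeric V2 containing NA
fixed <- read.table(text = csv_lines, sep = ",", header = FALSE,
                    na.strings = "?")
str(fixed)  # V2 is numeric
```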

Murray

nam.vec <- c("min.pkt.sz", "pkt.count", "bytes", "duration", "m1.psz",
  "m1.count", "m2.psz", "m2.count", "m3.psz", "m3.count", "iat.min", "iat.max",
  "m1.iat", "m1.iat.count", "m2.iat", "m2.iat.count", "m3.iat", "m3.iat.count",
  "port", "ip.address", "min.pkt.sz2", "pkt.count2", "bytes2", "m1.psz2",
  "m1.count2", "m2.psz2", "m2.count2", "m3.psz2", "m3.count2", "iat.min2",
  "iat.max2", "m1.iat2", "m1.iat.count2", "m2.iat2", "m2.iat.count2", "m3.iat2",
  "m3.iat.count2", "port2", "ip.address2", "diff.min.psz", "diff.max.psz")

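flines <- 107165    # number of lines in data.csv
slines <- 3000      # lines per sample
## 6 * slines distinct line numbers, split below into six disjoint samples
sel6 <- sample(flines, slines * 6)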
selected1 <- sort(sel6[1:3000])
selected2 <- sort(sel6[3001:6000])
selected3 <- sort(sel6[6001:9000])
selected4 <- sort(sel6[9001:12000])
selected5 <- sort(sel6[12001:15000])
selected6 <- sort(sel6[15001:18000])

select.frame <- function(selected) {
    ## read only the lines whose numbers are in 'selected'
    ## (assumed sorted, e.g. selected1 ... selected6 above)
    strvec <- rep("", length(selected))
    ## lines to skip before each selected line; the first entry skips
    ## everything before the first selected line
    skip <- diff(c(0, selected)) - 1
    fcon <- file("c:/data/perry/data.csv", open = "r")
    on.exit(close(fcon))
    for (i in seq_along(skip)) {
        ## skip to the selected line
        readLines(fcon, n = skip[i])
        strvec[i] <- readLines(fcon, n = 1)
    }
    sel.flows <- read.table(textConnection(strvec), header = FALSE, sep = ",")
    names(sel.flows) <- nam.vec
    sel.flows
}
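If the frames are not to be re-read, the factor columns can be converted in place instead. Below is a minimal sketch, assuming every factor column except those listed in `exclude` is meant to be numeric. The helper name factor_cols_to_numeric and the demo data are hypothetical and do not come from the thread. Converting through as.character() turns the level labels, not the internal codes, into numbers. Any leftover missing-value token becomes NA with a coercion warning.

```r
# Hypothetical helper (not from the thread): convert factor columns to numeric
factor_cols_to_numeric <- function(df, exclude = character(0)) {
  # factor columns, minus any the caller wants to keep as categories
  to_convert <- vapply(df, is.factor, logical(1)) & !(names(df) %in% exclude)
  # go through as.character() so the level labels, not the internal codes,
  # are converted; labels that are not numbers (e.g. a leftover missing-value
  # code) become NA with a coercion warning
  df[to_convert] <- lapply(df[to_convert],
                           function(x) as.numeric(as.character(x)))
  df
}

# Made-up stand-in for a sel.flows frame; "-" plays a missing-value code
demo <- data.frame(port = factor(c("80", "443", "80")),
                   bytes = factor(c("1500", "-", "40")),
                   ip.address = factor(c("10.0.0.1", "10.0.0.2", "10.0.0.1")))
converted <- suppressWarnings(
  factor_cols_to_numeric(demo, exclude = "ip.address"))
summary(converted)
```

For the real frames, something like `factor_cols_to_numeric(sel.flows, exclude = c("ip.address", "ip.address2"))` would leave the (presumably categorical) address columns alone.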


Thomas W Blackwell wrote:

> Michael  -
> 
> Because these columns are factors to begin with, using  as.numeric()
> alone returns the internal level codes, not the numbers shown as the
> labels.  See the section "Warning:" in  help("factor").
> 
> However, it is worth Murray asking himself WHY these columns are
> factors to start with, rather than the expected numeric values.
> One frequent source of this is using  read.table()  on a file
> which contains column headers without setting  header=T.  Then,
> the character string in the first row of each column prevents
> numeric conversion of all of the other rows.  Another possible
> difficulty is an unusual missing value code, or commas in place
> of decimal points, or anything else, somewhere in the file that
> does not convert automatically to numeric.  Maybe it's worth
> editing the original data file before Murray reads it in.
> 
> Hmmm.  I think I ought to have offered these many cents worth
> with my earlier reply.
> 
> -  tom blackwell  -  u michigan medical school  -  ann arbor  -
> 
> On Mon, 8 Sep 2003, Michael A. Miller wrote:
>
>> >>>>> "Murray" == Murray Jorgensen <maj at stats.waikato.ac.nz> writes:
>>
>>    > I have just noticed that quite a few columns of a data
>>    > frame that I am working on are numeric factors. For
>>    > summary() purposes I want them to be vectors.
>>
>>Do you want them to be vectors or do you want numeric values?  If
>>the latter, try as.numeric instead of as.vector:
>>
>> > as.vector(factor(rep(seq(4),3)))
>> [1] "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4"
>>
>> > as.numeric(factor(rep(seq(4),3)))
>> [1] 1 2 3 4 1 2 3 4 1 2 3 4
>>
>> > summary(as.vector(factor(rep(seq(4),3))))
>>   Length     Class      Mode
>>       12 character character
>>
>> > summary(as.numeric(factor(rep(seq(4),3))))
>>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>   1.00    1.75    2.50    2.50    3.25    4.00
>>
>>Mike
> 
> 
> 
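The pitfall Thomas points to can be shown on a small made-up factor whose labels do not match its level codes. In the example with factor(rep(seq(4),3)), the levels "1" to "4" happen to have codes 1 to 4, so as.numeric() appears to work. Here it does not. The two conversions recommended in help("factor") recover the labels. The levels() form converts each distinct label only once.

```r
# Made-up factor whose numeric labels differ from the level codes
f <- factor(c("10", "5", "10", "20"))
levels(f)                                 # "10" "20" "5" (sorted as text)
codes      <- as.numeric(f)               # 1 3 1 2: the internal codes
via_char   <- as.numeric(as.character(f)) # 10 5 10 20
via_levels <- as.numeric(levels(f))[f]    # 10 5 10 20, each level converted once
print(rbind(codes, via_char, via_levels))
```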

-- 
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz                                Fax 7 838 4155
Phone  +64 7 838 4773 wk    +64 7 849 6486 home    Mobile 021 1395 862



