[R] many chr2factors ?

Christoph Buser buser at stat.math.ethz.ch
Wed Jun 1 17:23:03 CEST 2005


Dear Christian

If you create your data frame by using data.frame all characters
are automatically transformed into factors unless you force them
to stay a character. Maybe that can solve your problem easily.

dat <- data.frame(a=1:10, b=letters[1:10])
str(dat)
  `data.frame':	10 obs. of  2 variables:
  $ a: int  1 2 3 4 5 6 7 8 9 10
  $ b: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10
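
Whether this conversion happens by default depends on the R version:
since R 4.0.0 the default is stringsAsFactors = FALSE. A small sketch
that sets the argument explicitly, so the result does not depend on
the default (the names dat_fac and dat_chr are only for illustration):

```r
# Ask for the conversion explicitly instead of relying on the default.
dat_fac <- data.frame(a = 1:10, b = letters[1:10], stringsAsFactors = TRUE)

# Force the character column to stay character.
dat_chr <- data.frame(a = 1:10, b = letters[1:10], stringsAsFactors = FALSE)

str(dat_fac)
str(dat_chr)
```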
 
Assuming that doesn't solve your problem because of the way your
data frame is created, you can do the conversion afterwards.

There are two problems with your code. 

First (and this causes the error): in your repeat loop you use

if(!is.character(df[,i]))
  next

Imagine that the last column of your data frame is not a
character. Then next jumps to the next cycle before the break
condition is tested, i is incremented again, and you are outside
of the range of your data frame.
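
If you want to keep the repeat construct, one possible repair is to
test the stop condition at the top of each cycle, so that next can
never skip it. This is only a sketch; the name tofac_repeat and the
small test data frame are made up for illustration:

```r
# Hypothetical variant of the repeat loop: the break test comes first,
# so it is evaluated on every cycle, even after a `next`.
tofac_repeat <- function(df) {
  i <- 0
  repeat {
    if (i >= length(df)) break        # stop once all columns are visited
    i <- i + 1
    if (!is.character(df[[i]])) next  # skip non-character columns
    df[[i]] <- as.factor(df[[i]])
  }
  df                                  # return the modified copy
}

# Assumed test data: the last column is NOT a character.
d <- data.frame(a = letters[1:3], b = 1:3, stringsAsFactors = FALSE)
d_out <- tofac_repeat(d)
str(d_out)
```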

Second: You change your data frame inside of a
function. Variables that are created or changed within a
function are local. Their life ends with the end of the
function. Therefore all changes you make will have no effect on
the global data frame you want to change. See the example
(using your original tofac):

dat1 <- structure(list(a = 1:10, b = letters[1:10]), .Names = c("a", "b"),
                  row.names = as.character(1:10), class = "data.frame")
str(data.frame(dat1))
  `data.frame':	10 obs. of  2 variables:
  $ a: int  1 2 3 4 5 6 7 8 9 10
  $ b: chr  "a" "b" "c" "d" ...
tofac(dat1)
  [1] 2
str(data.frame(dat1))
  `data.frame':	10 obs. of  2 variables:
  $ a: int  1 2 3 4 5 6 7 8 9 10
  $ b: chr  "a" "b" "c" "d" ...
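
The same behaviour can be seen in a smaller sketch with a plain
vector. The helper names set_first and set_first_ret are hypothetical
and only serve to show the difference between changing a local copy
and returning it:

```r
# Changes only the local copy; the caller's object is untouched.
set_first <- function(v) {
  v[1] <- 99L
  invisible(NULL)
}

# Same change, but the modified copy is returned to the caller.
set_first_ret <- function(v) {
  v[1] <- 99L
  v
}

x <- 1:3
set_first(x)
x                      # still the original values

y <- set_first_ret(x)  # assign the returned copy to keep the change
y
```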

You can use the following code instead:

tofac <- function(x){
  ## seq_along(x) is also safe for a data frame without columns
  for(i in seq_along(x)) {
    if(is.character(x[[i]]))
      x[[i]] <- factor(x[[i]])
  }
  ## return the modified copy
  x
}

dat1 <- tofac(dat1)
str(dat1)
  `data.frame':	10 obs. of  2 variables:
  $ a: int  1 2 3 4 5 6 7 8 9 10
  $ b: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10

The for loop avoids the problem with the index. Therefore it
also works in examples that have a non-character variable in the
last column, and by returning x at the end and assigning the
result you make sure that your changes are kept.
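
As an alternative to the explicit loop, the conversion can also be
written with lapply(), which works column by column without
converting the data frame to a matrix as apply() does. This is a
sketch, not the only way to do it; dat2 is an assumed example data
frame with a non-character last column:

```r
# Assumed sample data: character column first, integer column last.
dat2 <- data.frame(b = letters[1:5], a = 1:5, stringsAsFactors = FALSE)

# Assigning into dat2[] replaces the columns but keeps the data frame.
dat2[] <- lapply(dat2, function(col) {
  if (is.character(col)) factor(col) else col
})

str(dat2)
```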

Regards,

Christoph

--------------------------------------------------------------
Christoph Buser <buser at stat.math.ethz.ch>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)	8092 Zurich	 SWITZERLAND
phone: x-41-44-632-4673		fax: 632-1228
http://stat.ethz.ch/~buser/
--------------------------------------------------------------

christian schulz writes:
 > Hi,
 > 
 > I would like to transform
 > characters from a data.frame to factors automatically.
 > 
 >  > tofac <- function(df){
 > + i=0
 > + repeat{
 > + i <- i+1
 > + if(!is.character(df[,i]))
 > + next
 > + df[,i] <- as.factor(df[,i])
 > + print(i)
 > + if(i == length(df))
 > + break }
 > + }
 >  >
 >  > tofac(abrdat)
 > [1] 7
 > [1] 8
 > [1] 9
 > [1] 11
 > [1] 13
 > [1] 15
 > Error in "[.data.frame"(df, , i) : undefined columns selected
 > 
 > These are the correct columns, and I had the idea to put an empty
 > matrix with dimensions like df into the loop and return it!?
 > 
 > Another check?
 > abrdat2 <- apply(abrdat,2,function(x) 
 > ifelse(is.character(x),as.factor(x),x))
 > 
 > 
 > many thanks & regards,
 > christian
 > 