[R] Changing values (factors) does not change levels of that value?!

Philipp Pagel p.pagel at wzw.tum.de
Sun Nov 16 15:25:19 CET 2008


On Sun, Nov 16, 2008 at 02:52:10PM +0100, Oliver Bandel wrote:
> OK, but I thought, when touching the data, it will
> recalculate the levels. Now I see, it does not.

No, it doesn't, for the reasons given in my explanation.

> > >> x <- factor(c('A','B','C','A','C'))
> > >> y <- x[x!='C']
> > >> y
> > > [1] A B A
> > > Levels: A B C
> > >> factor(y)
> > > [1] A B A
> > > Levels: A B
> 
> Sorry, this looks to me like you throw out all the values,
> where the unwanted attribute is. (?!)

Correct, that's what my example does to create a factor with
missing levels.

> That is not what I meant.

I know, but it does not matter how you got a factor with missing
levels - both problem and solution are the same.

> Or at least it's disturbing because
> you use one value, not working on a data-frame, as I do.

Not a real difference either - a data.frame is just a collection
of vectors and/or factors. So all you need to do is apply this to
whatever column holds the factor in question:

foo$bar <- factor(foo$bar)

You may want to have a look at "An Introduction to R", especially
the section on data frames.
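Here is a minimal, self-contained sketch of the same fix applied to a data frame column. The data frame foo and its columns are hypothetical stand-ins for your own data. Subsetting the rows leaves the unused level in place, and re-creating the column with factor() drops it:

```r
# Hypothetical data frame; 'bar' is the factor column in question
foo <- data.frame(id = 1:5,
                  bar = factor(c("A", "B", "C", "A", "C")))

# Drop the rows where bar is "C"; the now-empty level "C" survives
foo <- foo[foo$bar != "C", ]
levels(foo$bar)   # "A" "B" "C"

# Re-create the factor column so only the levels actually present remain
foo$bar <- factor(foo$bar)
levels(foo$bar)   # "A" "B"
```

Newer versions of R also provide droplevels(). It does the same thing for a single factor, or for every factor column of a data frame at once.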


> After some experimentation I found out the following solution:
> 
> ========================
> weblog <- read.table("web.log") # reading the log
> 
> weblog$V8[ weblog$V8 == "-" ] <- 0  # substituting "-" by 0
> 
> # and now changing the levels-attribute to the new values !!
> attr(weblog$V8, "levels") <- levels( factor( as.vector(weblog$V8) ) )

weblog$V8 <- factor(weblog$V8)

is all you need.
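The sketch below shows the substitution and refactoring steps on a small hypothetical vector v8, which stands in for weblog$V8. One thing to keep in mind: assigning a value that is not already a level of the factor gives NA with an "invalid factor level" warning. The substitution only works here because "0" already occurs in the data.

```r
# Hypothetical stand-in for weblog$V8: "-" marks a missing value
v8 <- factor(c("512", "-", "0", "2048", "-"))

v8[v8 == "-"] <- "0"   # "0" is already a level, so no NA is produced
"-" %in% levels(v8)    # TRUE: the unused level is still there

v8 <- factor(v8)       # rebuild the factor from the values actually present
"-" %in% levels(v8)    # FALSE
```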

> But after I found that, I saw, that this was a detour from what I
> tried when I started, and now I do the following:
> 
> ========================
> weblog <- read.table("web.log") # read in the weblog
> 
> weblog$V8[ weblog$V8 == "-" ] <- 0 # substituting "-" by 0
> 
> weblog$V8 <- as.numeric( as.vector(weblog$V8) ) # changing it to numeric

Dangerous: as.numeric() applied directly to a factor returns the
internal level codes, not the values:

> x <- factor(c(0,1,3,4,5,7))
> x
[1] 0 1 3 4 5 7
Levels: 0 1 3 4 5 7
> as.numeric(x)
[1] 1 2 3 4 5 6

See "7.10 How do I convert factors to numeric?" in the R-FAQ for
details.
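Below is a short sketch of the two conversions the FAQ describes, using the same factor as above. Both go through the level labels rather than the internal codes. The second form converts each distinct level only once, then indexes the result with the factor's codes.

```r
x <- factor(c(0, 1, 3, 4, 5, 7))

as.numeric(x)                 # internal codes: 1 2 3 4 5 6
as.numeric(as.character(x))   # values: 0 1 3 4 5 7
as.numeric(levels(x))[x]      # values: 0 1 3 4 5 7
```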

As you are reading the data from a file anyway, the simplest
solution would probably be to use the colClasses argument of
read.table in order to get numeric values in the first place.
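Here is a minimal sketch of that approach. The two-column input is hypothetical and is read from a character vector via the text argument, so the example is self-contained.

Because the column contains "-", declaring it numeric alone would make read.table fail. This sketch therefore also passes na.strings = "-", so those fields are read as NA and can then be replaced by 0.

For the real file, colClasses needs one entry per column. An NA entry lets read.table guess that column's type as usual.

```r
# Hypothetical two-column stand-in for the log; "-" marks a missing size
log_lines <- c("GET 512", "GET -", "POST 2048")

weblog <- read.table(text = log_lines,
                     colClasses = c("character", "numeric"),
                     na.strings = "-")   # read "-" as NA instead of failing
weblog$V2[is.na(weblog$V2)] <- 0         # substitute 0 for the missing values
str(weblog)
```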

cu
	Philipp


-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel

