[R] problems with which

David Winsemius dwinsemius at comcast.net
Sun Aug 15 16:26:11 CEST 2010


On Aug 15, 2010, at 4:32 AM, Nicola Spotorno wrote:

> Dear all,
> I'm quite new to R and I have a problem with the function which().
> When I use it to select a subset of a data frame it works well, but
> R somehow keeps track of the original data frame, and this creates
> problems with subsequent operations.
> For example:
>
> sentences <- read.xls("frasi.tot.march.3.xls", header=TRUE)
>
> head(sentences)
>   fam subjID Cond  Code reg     total     first    second
> 1   f     30   an fDan1   1 0.2812500 0.2812500 0.0000000
> 2   f     30   an fDan1   2 1.7851562 0.5390625 1.2460938
> 3   f     30   an fDan1   3 1.2304688 0.6679688 0.5625000
> 4   f     30   an fDan1   4 0.6289062 0.4375000 0.1914062
> 5   f     30   an fDan2   1 0.1367188 0.1367188 0.0000000
> 6   f     30   an fDan2   2 0.8632812 0.6679688 0.1953125
>
> str(sentences)
> 'data.frame':    4799 obs. of  8 variables:
> $ fam   : Factor w/ 2 levels "f","uf": 1 1 1 1 1 1 1 1 1 1 ...
> $ subjID: int  30 30 30 30 30 30 30 30 30 30 ...
> $ Cond  : Factor w/ 4 levels "an","fi","le",..: 1 1 1 1 1 1 1 1 1 1 ...
> $ Code  : Factor w/ 126 levels "fAan1","fAan2",..: 72 72 72 72 73 73 73 73 74 74 ...
> $ reg   : int  1 2 3 4 1 2 3 4 1 2 ...
> $ total : num  0.281 1.785 1.23 0.629 0.137 ...
> $ first : num  0.281 0.539 0.668 0.438 0.137 ...
> $ second: num  0 1.246 0.562 0.191 0 ...
>
> # If you look at the variable "Cond" you see that it has 4 levels
>
> sentences_trial <- sentences[which(sentences$Cond!= "an"),]
>
> > str(sentences)
> 'data.frame':    4799 obs. of  8 variables:
> $ fam   : Factor w/ 2 levels "f","uf": 1 1 1 1 1 1 1 1 1 1 ...
> $ subjID: int  30 30 30 30 30 30 30 30 30 30 ...
> $ Cond  : Factor w/ 4 levels "an","fi","le",..: 1 1 1 1 1 1 1 1 1 1 ...
> $ Code  : Factor w/ 126 levels "fAan1","fAan2",..: 72 72 72 72 73 73 73 73 74 74 ...
> $ reg   : int  1 2 3 4 1 2 3 4 1 2 ...
> $ total : num  0.281 1.785 1.23 0.629 0.137 ...
> $ first : num  0.281 0.539 0.668 0.438 0.137 ...
> $ second: num  0 1.246 0.562 0.191 0 ...
>
> # Now the variable "Cond" still has 4 levels, but with which() I have
> excluded one level!

You showed us two copies of str(sentences). How can we possibly know  
what sentences_trial looks like?

> # If I apply interaction.plot at this point, the graph
> still shows 4 levels of Cond.

If you want to remove unused factor levels from a column, just apply
factor() to it again:

sentences_trial$Cond <- factor(sentences_trial$Cond)
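
A minimal sketch of that behaviour, assuming a small made-up data frame
called toy (it is not the poster's data, and only three levels are used):

```r
# Hypothetical toy data standing in for the poster's 'sentences' data frame
toy <- data.frame(
  Cond  = factor(c("an", "fi", "le", "an", "fi", "le")),
  total = c(0.28, 1.79, 1.23, 0.63, 0.14, 0.86)
)

# Row subsetting keeps every original level, even ones with no rows left
toy_trial <- toy[which(toy$Cond != "an"), ]
levels_before <- levels(toy_trial$Cond)

# Re-applying factor() to the column rebuilds it from the values present,
# so the unused level disappears
toy_trial$Cond <- factor(toy_trial$Cond)
levels_after <- levels(toy_trial$Cond)

print(levels_before)
print(levels_after)
```

Note that the assignment targets the Cond column, so toy_trial stays a
data frame and the other columns are untouched.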

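Or, to short-circuit that two-step process, wrap subset() in
droplevels(). (The drop argument of subset() is only passed on to the
indexing step and does not remove unused factor levels.)

sentences_trial <- droplevels(subset(sentences, Cond != "an"))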
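
One way to subset and discard the unused levels in a single expression
is sketched below. It assumes a made-up data frame toy (not the poster's
data) and uses droplevels(), which strips unused levels from every
factor column of a data frame:

```r
# Hypothetical toy data; column names mirror the poster's, values are invented
toy <- data.frame(
  fam   = factor(c("f", "f", "f", "uf", "uf", "uf")),
  Cond  = factor(c("an", "fi", "le", "an", "fi", "le")),
  total = c(0.28, 1.79, 1.23, 0.63, 0.14, 0.86)
)

# subset() selects the rows; droplevels() then removes levels with no rows
toy_trial <- droplevels(subset(toy, Cond != "an"))

print(levels(toy_trial$Cond))
print(table(toy_trial$Cond))
```

Because droplevels() acts on all factor columns, a column such as Code
would also lose any levels that occur only in the excluded rows.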

>
> attach(sentences_trial)
> x11()
> interaction.plot(Cond,fam,total)
>
> # Where is the problem?
>

I think I have identified it, but without a reproducible example it
remains only an attractive theory.
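
For what it is worth, the theory can be checked on invented data. The
sketch below assumes a made-up data frame toy and uses tapply() to
compute the per-cell means; as far as I know interaction.plot() builds
its cells the same way, which would explain an extra, empty level on
the x axis:

```r
# Hypothetical toy data; not the poster's file
toy <- data.frame(
  fam   = factor(rep(c("f", "uf"), each = 3)),
  Cond  = factor(rep(c("an", "fi", "le"), times = 2)),
  total = c(0.28, 1.79, 1.23, 0.63, 0.14, 0.86)
)

toy_trial <- toy[which(toy$Cond != "an"), ]

# The unused level "an" still gets a row, filled with NA
cells_before <- tapply(toy_trial$total,
                       list(toy_trial$Cond, toy_trial$fam), mean)

# After rebuilding the factor, the empty row is gone
toy_trial$Cond <- factor(toy_trial$Cond)
cells_after <- tapply(toy_trial$total,
                      list(toy_trial$Cond, toy_trial$fam), mean)

print(cells_before)
print(cells_after)
```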

David Winsemius, MD
West Hartford, CT


