[R] Strange t-test error: "grouping factor must have exactly 2 levels" while it does...

Petr PIKAL petr.pikal at precheza.cz
Fri Jul 10 12:00:53 CEST 2009


Hi

You have to look at your data.
When I applied your function to some artificial data, I got the expected result:

> myfun(visko,"konc")

 Levels = 2 

[[1]]
[1] NA

[[2]]

        Welch Two Sample t-test

data:  data[[nam[v]]] by data[[g]] 
t = -1.7778, df = 4.541, p-value = 0.1415
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -12.861362   2.535362 
sample estimates:
mean in group 1 mean in group 2 
          6.685          11.848 


[[3]]

        Welch Two Sample t-test

data:  data[[nam[v]]] by data[[g]] 
t = -2.6074, df = 3.263, p-value = 0.07327
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -10.070027   0.775027 
sample estimates:
mean in group 1 mean in group 2 
         2.3275          6.9750

Try

debug(myfun)

and see at which column it gives an error and what all the values look like
immediately before the error.
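
If stepping through with the debugger is tedious, a rough sketch of another
way to find the offending column is below. It assumes the error comes from a
column whose non-missing values all fall into one group, which is only a
guess about your data. The data frame `dat` is artificial and `groups_left`
is a hypothetical helper, not part of your function or of any package.

```r
# Artificial data: 'b' is missing for every row of group 2 (an assumption
# made only to reproduce the error; the real data may fail differently).
dat <- data.frame(
  g = rep(1:2, each = 4),
  a = c(5.1, 6.3, 7.0, 8.2, 9.9, 11.4, 12.8, 13.1),
  b = c(2.1, 2.5, 1.9, 2.8, NA, NA, NA, NA)
)

# Hypothetical helper: for each column other than the grouping column,
# count the groups that remain once rows with NA in either the column
# or the grouping variable are dropped.
groups_left <- function(data, g) {
  vars <- setdiff(names(data), g)
  sapply(vars, function(v) {
    ok <- complete.cases(data[[v]], data[[g]])
    nlevels(factor(data[[g]][ok]))
  })
}

left <- groups_left(dat, "g")
print(left)

# Columns for which the formula t.test() would not have two groups left
bad <- names(left)[left != 2]
print(bad)

# Capture the error message instead of stopping, to confirm the guess
msg <- tryCatch(t.test(dat$b ~ dat$g),
                error = function(e) conditionMessage(e))
print(msg)
```

If a column of your real data shows up in `bad`, a cross-tabulation such as
table(is.na(dat$b), dat$g) shows how its NAs are spread over the two groups.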

Regards
Petr


r-help-bounces at r-project.org wrote on 10.07.2009 11:40:30:

> Thanks for your hints, but I'm still stuck... In dataset I mentioned
> (N=134) there are only 3 NA's in variable, and 41% : 59% distribution
> of the two values. It doesn't look like it was because of the data...
> 
> I changed and simplified my function, now it prints levels before
> doing the rest. Here's a "funny" error result:
> 
> > myfun(data, 'varname')
> 
>  Levels = 2
> 
> Error in t.test.formula(data[[nam[v]]] ~ data[[g]]) :
>   grouping factor must have exactly 2 levels
> 
> ...
> 
> I'll paste simplified code, maybe it'd give someone a clue what is going
> wrong:
> 
> myfun <- function(data, g) {
> 
>    require(stats)
> 
>    data <- as.data.frame(data)
>    nam <- names(data)
>    res <- matrix(NA,ncol(data))
> 
>    cat("\n Levels =", nlevels(factor(data[[g]])),"\n\n")
> 
>    for (v in 1:ncol(data)) {
>       if (nam[v] != g) {
>          res[v] <- list(t.test(data[[nam[v]]]~data[[g]]))
>    }}
>    res
> }
> 
> What is going wrong here?
> 
> Greetz,
> Timo
> 
> 
> 2009/7/10 Marc Schwartz <marc_schwartz at me.com>:
> > On Jul 9, 2009, at 5:04 PM, Tymek W wrote:
> >
> >> Hi,
> >>
> >> Could anyone tell me what is wrong:
> >>
> >>> length(unique(mydata$myvariable))
> >>
> >> [1] 2
> >>>
> >>
> >> and in t-test:
> >>
> >> (...)
> >> Error in t.test.formula(othervariable ~ myvariable, mydata) :
> >>  grouping factor must have exactly 2 levels
> >>>
> >>
> >> I re-checked the code and still don't get what is wrong.
> >>
> >> Moreover, there is some strange behavior:
> >>
> >> /1 It seems that the error is sensitive to NAs, because it affects
> >> some variables in the data set with NAs and doesn't affect the same
> >> ones in the data set with NAs removed.
> >>
> >> /2 It seems it works differently with different ways of using
> >> variables in t.test:
> >>
> >> e.g. it happens here: t.test(x~y, dataset) and does not here:
> >> t.test(dataset[['x']]~dataset[['y']])
> >>
> >> Does anyone have any ideas?
> >>
> >> Greetz,
> >> Timo
> >
> >
> > Check the output of:
> >
> >  na.omit(cbind(mydata$othervariable, mydata$myvariable))
> >
> > which will give you some insight into what data is actually available
> > to be used in the t test. This will remove any rows that have missing
> > data. Your first test above, checking the number of levels, is before
> > missing data is removed.
> >
> > The likelihood is that once missing values have been removed, you are
> > only left with one unique grouping value in mydata$myvariable.
> >
> > For your note number 2, it should be the same for both examples, as in
> > both cases, the same basic approach is used. For example:
> >
> > DF <- data.frame(x = c(1:3, NA, NA, NA), y = rep(1:2, each = 3))
> >
> >> DF
> >   x y
> > 1  1 1
> > 2  2 1
> > 3  3 1
> > 4 NA 2
> > 5 NA 2
> > 6 NA 2
> >
> > # Remove missing data
> >> na.omit(DF)
> >  x y
> > 1 1 1
> > 2 2 1
> > 3 3 1
> >
> >> t.test(x ~ y, data = DF)
> > Error in t.test.formula(x ~ y, data = DF) :
> >  grouping factor must have exactly 2 levels
> >
> >> t.test(DF$x ~ DF$y)
> > Error in t.test.formula(DF$x ~ DF$y) :
> >  grouping factor must have exactly 2 levels
> >
> >
> > If you have a small reproducible example where the two function calls
> > behave differently, please post back with it.
> >
> > HTH,
> >
> > Marc Schwartz
> >
> >
> 
> 
> 
> -- 
> regards,
> Tymek W
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
