[R] trouble reading in datasets

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Tue Oct 26 00:13:02 CEST 1999


Clayton Springer <csprin at brandybuck.ca.sandia.gov> writes:

> Dear All,
> 
> I was trying to follow some of the examples in Venables and Ripley "Modern applied ... with S-plus"
> I have downloaded a copy of the iris data set and loaded into R. :
> 
> however I cannot use the apply command (from p47):
> 
>  > apply (iris, 2 ,mean)
> Error in sum(..., na.rm = na.rm) : invalid "mode" of argument
> 
> > apply (iris, c(2) ,mean)
> Error in sum(..., na.rm = na.rm) : invalid "mode" of argument
> 
> also
> 
> > apply (iris, c(2) ,sum)
> Error in sum(..., na.rm = na.rm) : invalid "mode" of argument
> 
> So ... any suggestions as to what have I not done here? 
> 
> Some commands that show that I did load the dataset.
> > iris
>      V1  V2  V3  V4              V5
> 1   5.1 3.5 1.4 0.2     Iris-setosa
> 

Well, it's one of the built-in datasets, so you could just have typed 

data(iris)

That wouldn't have fixed anything, though, except to give you a clue
that the problem does not lie in reading the data in. apply() works on
matrices and iris is a data frame, so R first converts it to a matrix.
A matrix must hold elements of a single type, and since the last
column is a factor, the whole thing becomes character:

> as.matrix(iris)
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species     
1   "5.1"        "3.5"       "1.4"        "0.2"       "setosa"    
2   "4.9"        "3.0"       "1.4"        "0.2"       "setosa"    
3   "4.7"        "3.2"       "1.3"        "0.2"       "setosa"    
4   "4.6"        "3.1"       "1.5"        "0.2"       "setosa"    
....
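
A quick way to see the coercion for yourself is to check the column
classes and the mode of the resulting matrix. This is only a sketch
against the built-in iris data; the variable names (col_classes, m_all,
m_num) are made up for illustration.

```r
data(iris)

# Class of each column: four numeric measurements and one factor
col_classes <- sapply(iris, class)
print(col_classes)

# Coercing the whole data frame gives a character matrix ...
m_all <- as.matrix(iris)
print(mode(m_all))

# ... whereas the four measurement columns alone give a numeric matrix
m_num <- as.matrix(iris[, 1:4])
print(mode(m_num))
```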

Now, here's a difference between R and S(-plus 3.4): S will happily
let you take the mean of a character variable as in
> mean("1")   
[1] 1
whereas R will not 
> mean("1")
Error in sum(..., na.rm = na.rm) : invalid "mode" of argument
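
If you really do have numbers stored as character strings, converting
them explicitly sidesteps the difference between the two systems. A
minimal sketch, with x and x_num as made-up names:

```r
x <- c("1", "2", "3")     # numbers stored as character strings
x_num <- as.numeric(x)    # explicit conversion to numeric
print(mean(x_num))
```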

If you get rid of the 5th column, the problem disappears:
> apply (iris[,-5], 2 ,mean)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 
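
Rather than hard-coding the column number, you could pick out the
numeric columns by testing each one. This is just one possible variant,
assuming the built-in iris data frame; is_num and col_means are names
invented for the example.

```r
data(iris)

# TRUE for the four measurement columns, FALSE for the factor
is_num <- sapply(iris, is.numeric)

# Mean of each numeric column, without going through a matrix
col_means <- sapply(iris[is_num], mean)
print(col_means)
```

This should give the same four means as the apply() call on iris[,-5].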

Actually, that isn't the end of the story, because the iris data that
comes with S is stored as a 3-way array rather than a data frame. The
way to convert from one to the other is -ahem- left as an exercise....
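
For anyone who wants to compare notes on that exercise, here is one
possible way to go from the data frame to a 3-way array. It is a sketch,
not necessarily the layout S uses: it assumes the built-in iris data
frame with the same number of rows for each species, and the name
iris_arr and the choice of dimnames are made up here.

```r
data(iris)

species <- levels(iris$Species)
n_per   <- nrow(iris) / length(species)   # assumes equal group sizes

# Empty array: rows within species x measurement x species
iris_arr <- array(NA_real_,
                  dim = c(n_per, 4, length(species)),
                  dimnames = list(NULL, names(iris)[1:4], species))

# Fill one slice per species from the matching rows of the data frame
for (s in species) {
  iris_arr[, , s] <- as.matrix(iris[iris$Species == s, 1:4])
}

# apply() now works directly, e.g. means per measurement and species
print(apply(iris_arr, c(2, 3), mean))
```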

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907


