[Rd] cor() fails with big dataframe

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Sep 16 13:57:41 CEST 2004


On Thu, 16 Sep 2004, Mayeul KAUFFMANN claimed:

> ?cor says it accepts data.frame. In fact, it does iff they have no (or

It actually says

       x: a numeric vector, matrix or data frame.
            ^^^^^^^

If you want to do the conversions as you say, you should be calling
data.matrix.
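A minimal sketch of the data.matrix() route. It uses a small made-up data frame `df` (hypothetical, not Mayeul's data) that mixes numeric and logical columns. data.matrix() converts logical columns to 0/1 integer codes, so cor() receives a numeric matrix:

```r
# Hypothetical data frame mixing numeric and logical columns,
# standing in for the 9-column data frame discussed in the thread.
df <- data.frame(
  num1 = c(0.5, 1.2, 2.3, 3.1, 4.8),
  flag = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  num2 = c(10, 8, 7, 5, 2)
)

# data.matrix() turns every column into numbers (logical -> 0/1).
m <- data.matrix(df)

# cor() on the numeric matrix returns the full 3 x 3 correlation matrix.
r <- cor(m)
print(r)
```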


On Thu, 16 Sep 2004, Mayeul KAUFFMANN wrote:

> Thanks all for your answers.
> 
> # The difference between the following two commands may puzzle even
> # intermediate users (I give an explanation below):
> > cor(x[,4],x[,5])
> [1] -0.4352342
> > cor(x[,4:5])
> Error in cor(x[, 4:5]) : missing observations in cov/cor
> In addition: Warning message:
> NAs introduced by coercion
> 
> From: "Martin Maechler" <maechler at stat.math.ethz.ch>
> To: "Mayeul KAUFFMANN" <mayeul.kauffmann at tiscali.fr>
> >     Mayeul> #I found the obvious workaround:
> >     Mayeul> COR <- matrix(rep(0, 81),9,9)
> >     Mayeul> for (i in 1:9) for (j in 1:9) {if (i>j) COR[i,j] <- cor(x[,i],x[,j])}
> >     Mayeul> #which works fine, with no warning
> >     Mayeul> #looks like a "cor()" bug.
> Martin Maechler wrote:
> > quite improbably.
> If it is wrong, can you say what is wrong and then propose an alternative
> workaround? (Or should I ask on r-help?)
> 
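One possible alternative to the double loop is sketched below. It assumes `x` holds only numeric and logical columns, and the data here are hypothetical. The idea is to convert once with data.matrix(), compute the whole matrix with a single cor() call, and then zero out the upper triangle and diagonal if only the strict lower triangle is wanted, as in the loop:

```r
set.seed(1)
# Hypothetical stand-in for the 9-column data frame `x` from the thread:
# six numeric columns followed by three logical ones.
x <- data.frame(matrix(rnorm(50 * 6), ncol = 6),
                matrix(runif(50 * 3) > 0.5, ncol = 3))

# The original workaround: fill the strict lower triangle pair by pair.
COR <- matrix(0, 9, 9)
for (i in 1:9) for (j in 1:9) if (i > j) COR[i, j] <- cor(x[, i], x[, j])

# Vectorized alternative: one conversion, one cor() call.
COR2 <- cor(data.matrix(x))
COR2[upper.tri(COR2, diag = TRUE)] <- 0  # keep the strict lower triangle only
dimnames(COR2) <- NULL                   # the loop's matrix has no dimnames
```

Calling cor(data.matrix(x)) alone is usually enough. The masking step only reproduces the exact shape of the loop's result.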
> 
> > What does
> >     sapply(x, function(u)all(is.finite(u)))
> > return ?
> 
> sapply(x2, function(u)all(is.finite(u)))
>   jntdem smldepnp lrgdepnp contigkb logdstab  majdyds  alliesr  lncaprt     GATT
>     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE
> 
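The is.finite() check cannot reveal this problem because is.finite() also returns TRUE for the logical values TRUE and FALSE. A more telling check is to look at the column classes, which flags the logical columns directly. This sketch uses a hypothetical three-column data frame:

```r
# Hypothetical data frame with one logical column among numeric ones.
x2 <- data.frame(a = c(1.5, 2.5, 3.5), b = c(TRUE, FALSE, TRUE), c = 1:3)

# is.finite() is TRUE for logical values too, so this check passes:
finite_ok <- sapply(x2, function(u) all(is.finite(u)))

# The column classes show which columns are not numeric:
classes <- sapply(x2, class)
print(classes)
logical_cols <- names(x2)[sapply(x2, is.logical)]
print(logical_cols)
```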
> _______________________________________________
> 
> But I have now found the explanation. It is not due to size.
> #Tony Plate wrote:
> #I would suspect that your dataframe has columns that result in NA's when it
> #is coerced to a matrix
> 
> That's not yet the explanation, but you are close to it.
> 
> All columns are numeric, except 3 that are logical (I thought they would
> be coerced to 0 and 1, which they are with cor(x[,4],x[,5]) but not with
> cor(x[,4:5])).
> They do not change to NAs or infinite values; they ALL change to TEXT.
> 
> ?as.matrix
>  'as.matrix' is a generic function. The method for data frames will
>      convert any non-numeric/complex column into a character vector
>      using 'format' and so return a character matrix, except that
>      all-logical data frames will be coerced to a logical matrix.
> 
> > as.matrix(x[1:3,1:9])
>   jntdem smldepnp     lrgdepnp    contigkb logdstab   majdyds alliesr
> 1 "400"  "0.01420874" "0.2156945" "TRUE"   "5.820108" "TRUE"  "TRUE"
> 2 "400"  "0.01534535" "0.2496879" "TRUE"   "5.820108" "TRUE"  "TRUE"
> 3 "400"  "0.01585586" "0.2570493" "TRUE"   "5.820108" "TRUE"  "TRUE"
>   lncaprt    GATT
> 1 "2.883204" "1"
> 2 "2.906521" "1"
> 3 "2.833357" "1"
> 
> ?cor says it accepts data.frame. In fact, it does iff they have no (or
> only: cor(x[,6:7]) works) logical columns.
> Doing cor with a logical (a dummy variable) and a numeric is perhaps not as
> sensible as doing it with two numerics.
> But it may still be useful for exploring data.
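The Pearson correlation between a 0/1 dummy and a numeric variable is the point-biserial correlation:

r = (M1 - M0) / s * sqrt(p * q)

Here M1 and M0 are the means of the numeric variable in the TRUE and FALSE groups, p is the proportion of TRUE, q = 1 - p, and s is the standard deviation computed with divisor n. The following sketch on made-up data checks that this identity matches cor():

```r
# Made-up data: a logical dummy and a numeric variable.
d <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
y <- c(3.2, 1.1, 2.8, 4.0, 1.9, 0.7, 3.5)

n  <- length(y)
p  <- mean(d)                          # proportion of TRUE
m1 <- mean(y[d]); m0 <- mean(y[!d])    # group means
s  <- sqrt(sum((y - mean(y))^2) / n)   # SD with divisor n

r_pb  <- (m1 - m0) / s * sqrt(p * (1 - p))  # point-biserial formula
r_cor <- cor(as.numeric(d), y)              # Pearson with the dummy coded 0/1
```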
> 
> Maybe one would want either to change the documentation of ?cor, or not
> to rely on as.matrix to convert the data.frame if some columns are logical.
> 
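The last suggestion can also be handled at user level. In the sketch below, the function name `cor_df` is hypothetical and not part of R. It converts logical columns to 0/1 itself instead of relying on as.matrix(), and it stops with a clear message if any other non-numeric column is present:

```r
# Hypothetical helper: not part of base R.
cor_df <- function(df, ...) {
  stopifnot(is.data.frame(df))
  ok <- vapply(df, function(u) is.numeric(u) || is.logical(u), logical(1))
  if (!all(ok))
    stop("non-numeric, non-logical columns: ",
         paste(names(df)[!ok], collapse = ", "))
  # Convert every column explicitly (logical -> 0/1), keeping column names.
  df[] <- lapply(df, as.numeric)
  cor(as.matrix(df), ...)  # extra arguments such as `use` go to cor()
}

# Usage on made-up data with one logical column:
x3 <- data.frame(a = c(1, 2, 3, 4), b = c(TRUE, FALSE, TRUE, FALSE),
                 c = c(2, 4, 5, 9))
r3 <- cor_df(x3)
```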
> 
> Cheers,
> Mayeul
> 
> ______________________________________________
> R-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


