[Rd] boxplot.formula with missing values (PR#6846)

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon May 3 13:33:53 CEST 2004


I think this *is* the correct behaviour for a formula method. The problem
I see is that boxplot.formula does not have an na.action argument and so
you may not have realised that na.action=na.omit is the default.

Note that subset= will `remove the same rows from all columns', too.
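
To see the row-wise deletion directly, here is a minimal sketch that builds the model frame by hand from the data in the report quoted below. Calling model.frame() explicitly is only an illustration of the step the formula method performs internally; the variable name rows_kept is introduced here for the example.

```r
# Data from the report: 201 rows, 4 identical columns, NAs in columns 1 and 3.
fake.data <- matrix(rep(-100:100, 4), ncol = 4)
fake.data[190:200, 1] <- NA   # 11 rows now contain an NA
fake.data[1:5, 3] <- NA       # 5 further rows now contain an NA

# With a matrix response, na.omit drops every row that has an NA in any
# column, so all four columns lose the same 16 rows.
mf <- model.frame(fake.data ~ col(fake.data), na.action = na.omit)
rows_kept <- nrow(mf)
print(rows_kept)
```

The 16 affected rows disappear from every column, which is why the boxes for columns 2 and 4 change as well.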

It really is not the intention that the formula interface is used with 
matrices, and as.vector will do what I think you intended:

	boxplot(as.vector(fake.data) ~ as.vector(col(fake.data)))
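
A short sketch of why the vector form behaves differently, again using the data from the quoted report: once the matrix is flattened, each observation is its own row of the model frame, so only the NA cells themselves can be dropped. The names y, g and bp are illustrative, and plot = FALSE is used only so that the per-group counts can be inspected without drawing.

```r
# Data from the report: 201 rows, 4 identical columns, NAs in columns 1 and 3.
fake.data <- matrix(rep(-100:100, 4), ncol = 4)
fake.data[190:200, 1] <- NA
fake.data[1:5, 3] <- NA

# Long form: one value per row, with the column index as the grouping variable.
y <- as.vector(fake.data)
g <- as.vector(col(fake.data))

# plot = FALSE returns the summary statistics without drawing anything.
bp <- boxplot(y ~ g, plot = FALSE)
print(bp$n)   # number of non-missing observations per group
```

Only groups 1 and 3 lose observations here, which matches what boxplot(as.data.frame(fake.data)) shows.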

Also, setting options(na.action=na.pass) will work as you expected.
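
As a hedged illustration of the option's effect, the sketch below inspects model.frame() rather than boxplot() itself, since how boxplot.formula treats na.action may differ between R versions. The names old, mf_pass, rows_pass and col1_na are introduced for the example only.

```r
# Data from the report: 201 rows, 4 identical columns, NAs in columns 1 and 3.
fake.data <- matrix(rep(-100:100, 4), ncol = 4)
fake.data[190:200, 1] <- NA
fake.data[1:5, 3] <- NA

# Switch the session-wide default to na.pass, keeping the old value to restore.
old <- options(na.action = "na.pass")
mf_pass <- model.frame(fake.data ~ col(fake.data))
options(old)

# No rows are dropped; the NAs stay in place inside the response matrix.
rows_pass <- nrow(mf_pass)
col1_na <- sum(is.na(mf_pass[[1]][, 1]))
print(c(rows_pass, col1_na))
```

With the NAs passed through, any missing-value handling is left to the code that computes the statistics for each group.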

I've added an na.action argument for R-devel.


On Mon, 3 May 2004 rdiaz at cnio.es wrote:

> If an array has missing values in different rows, plotting using the formula
> interface can produce errors. Example:

Well, it does not do what you expected, but the error appears to be in your
expectations.

> fake.data <- matrix(rep(-100:100, 4),
>                     ncol = 4)
> 
> par(mfrow = c(1,2))
> boxplot(fake.data ~ col(fake.data))
> abline(h = 0, lty = 2)
> boxplot(as.data.frame(fake.data))
> abline(h = 0, lty = 2)
> 
> ##### Add the missing data
> fake.data[190:200, 1] <- NA
> fake.data[1:5, 3] <- NA
> 
> ## But only columns 1 and 3 should change!! (and in opposite directions)
> par(mfrow = c(1, 2))
> boxplot(fake.data ~ col(fake.data))
> abline(h = 0, lty = 2)
> boxplot(as.data.frame(fake.data))
> abline(h = 0, lty = 2)
> 
> ### The problem is that the same rows are removed from all the columns:
> 
> bp.a <- boxplot(fake.data ~ col(fake.data))
> bp.df <- boxplot(as.data.frame(fake.data))
> 
> ### which happens during the call to
> 
> eval(m, parent.frame())
> 
> inside boxplot.formula
> 
> **********************************
> 
> This happens in at least:
> 
>          _
> platform i686-pc-linux-gnu
> arch     i686
> os       linux-gnu
> system   i686, linux-gnu
> status   Patched
> major    1
> minor    9.0
> year     2004
> month    05
> day      02
> language R
>
>          _
> platform i386-pc-linux-gnu
> arch     i386
> os       linux-gnu
> system   i386, linux-gnu
> status
> major    1
> minor    8.1
> year     2003
> month    11
> day      21
> language R
>
>          _
> platform i686-pc-linux-gnu
> arch     i686
> os       linux-gnu
> system   i686, linux-gnu
> status   Under development (unstable)
> major    2
> minor    0.0
> year     2004
> month    04
> day      30
> language R
> 
> 
> 
> 
> 
> -- 
> Ramón Díaz-Uriarte
> Bioinformatics Unit
> Centro Nacional de Investigaciones Oncológicas (CNIO)
> (Spanish National Cancer Center)
> Melchor Fernández Almagro, 3
> 28029 Madrid (Spain)
> Fax: +-34-91-224-6972
> Phone: +-34-91-224-6900
> 
> http://bioinfo.cnio.es/~rdiaz
> PGP KeyID: 0xE89B3462
> (http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)
> 
> ______________________________________________
> R-devel at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
> 
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


