# [R] how to do something like " subset(mat, ("col1">4 & "col2">4)) "

Peter Dalgaard p.dalgaard at biostat.ku.dk
Fri Sep 9 16:23:22 CEST 2005

```
Florence Combes <fcombes at gmail.com> writes:

> Dear all,
>
> I have a problem with the "subset()" function. I spent all day yesterday
> with a colleague to solve it and we did not find a satisfying solution (even
> Let's say (for a simple example) a matrix mat:
>
> R> mat
>      cola colb colc
> [1,]    1    4    7
> [2,]    2    5    8
> [3,]    3    6    9
>
> My goal is to select the lines of the matrix on the basis of the values of
> more than one column (let's say colb and colc).
> For example I want to select all the lines of the matrix for which values in
> colb and colc are more than 4.
>
> I tried several ways that did not work:
>
> R> mat2 <- subset(mat, ("colb">4 & "colc">4))
> R> mat2
> [1] 1 2 3 4 5 6 7 8 9
>
> it is a vector, not a matrix.
>
> > mat2 <- subset(mat, mat[,2:3]>4)
> > mat2
> [1] 2 3 4 5 6 8 9
>
> the same: it is a vector; so I tried:
>
> > mat2 <- as.matrix(subset(mat, mat[,("colb">4 & "colc">4)]))
> > mat2
>      [,1]
> [1,]    1
> [2,]    2
> [3,]    3
> [4,]    4
> [5,]    5
> [6,]    6
> [7,]    7
> [8,]    8
> [9,]    9
>
> not good :(
>
> Did someone have an idea of how to select only the lines 2 and 3 of mat
> by a selection on "colb" and "colc" >4 ?

Well, subset has methods for vectors and data frames, so what happens
for matrices is basically that they get converted to vectors. I don't
know what gave you the idea of quoting the names, but

"colb">4

is TRUE because the 4 is coerced to the string "4" and numbers sort
before letters!

Try something like

as.matrix(subset(as.data.frame(mat),colb>4 & colc>4))

--
O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907

```
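For readers who want to try this, here is a minimal sketch of the suggestion above, run against a matrix rebuilt from the example in the question. The variable names (`quoted`, `via_subset`, `keep`, `via_index`) are made up for illustration. The logical-indexing variant at the end is an alternative that does not appear in the original reply.

```r
# Rebuild the example matrix from the question (filled column by column).
mat <- matrix(1:9, nrow = 3,
              dimnames = list(NULL, c("cola", "colb", "colc")))

# Why the quoted version fails: "colb" > 4 compares the *string* "colb"
# with "4" (the number is coerced to character), not the column's values.
quoted <- "colb" > 4   # a single logical value, not one per row

# Approach from the reply: go through a data frame so that subset() can
# evaluate the column names, then convert back to a matrix.
via_subset <- as.matrix(subset(as.data.frame(mat), colb > 4 & colc > 4))

# Alternative (an assumption, not from the original reply): plain logical
# indexing. drop = FALSE keeps the result a matrix even if one row matches.
keep <- mat[, "colb"] > 4 & mat[, "colc"] > 4
via_index <- mat[keep, , drop = FALSE]

print(via_subset)
print(via_index)
```

Both results keep rows 2 and 3 of `mat`. The data-frame route labels the surviving rows with their original row names, while the indexing route leaves the row names empty.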