[R] how to do something like " subset(mat, ("col1">4 & "col2">4)) "

Peter Dalgaard p.dalgaard at biostat.ku.dk
Fri Sep 9 16:23:22 CEST 2005


Florence Combes <fcombes at gmail.com> writes:

> Dear all, 
> 
> I have a problem with the "subset()" function. My colleague and I spent all
> day yesterday trying to solve it and did not find a satisfying solution (not
> even in the mail archives), so I am asking for your help.
> Take, as a simple example, the matrix mat:
> 
> R> mat
>      cola colb colc
> [1,]    1    4    7
> [2,]    2    5    8
> [3,]    3    6    9
> 
> My goal is to select rows of the matrix based on the values in more than
> one column (say colb and colc).
> For example, I want to select all rows of the matrix for which the values in
> both colb and colc are greater than 4.
> 
> I tried several ways that did not work: 
> 
> R> mat2 <- subset(mat, ("colb">4 & "colc">4))
> R> mat2
> [1] 1 2 3 4 5 6 7 8 9
> 
> It is a vector, not a matrix.
> 
> R> mat2 <- subset(mat, mat[,2:3]>4)
> R> mat2
> [1] 2 3 4 5 6 8 9
> 
> The same: it is a vector, so I tried:
> 
> R> mat2 <- as.matrix(subset(mat, mat[,("colb">4 & "colc">4)]))
> R> mat2
>      [,1]
> [1,]    1
> [2,]    2
> [3,]    3
> [4,]    4
> [5,]    5
> [6,]    6
> [7,]    7
> [8,]    8
> [9,]    9
> 
> not good :(
> 
> Does anyone have an idea how to select only rows 2 and 3 of mat, based on
> the condition "colb" > 4 and "colc" > 4?


Well, subset has methods for vectors and data frames, so a matrix is handled
by the vector method and basically gets treated as a plain vector. I don't
know what gave you the idea of quoting the names, but 

"colb">4

is TRUE because the 4 is coerced to the string "4", and numbers sort before
letters! No column values are looked at.
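A small sketch that reproduces the failures, assuming `mat` is built to match the printout in the question. Later R releases also ship a matrix method for subset(), so the sketch calls subset.default() directly to get the vector behaviour seen here. The string comparison depends on the locale's collation order, although digits sort before letters in the usual ones.

```r
# Rebuild the example matrix from the question (assumed construction)
mat <- matrix(1:9, nrow = 3,
              dimnames = list(NULL, c("cola", "colb", "colc")))

# Quoted names are plain strings: 4 is coerced to "4" and two strings
# are compared, so the columns of mat are never consulted
"colb" > 4                       # TRUE
cond <- "colb" > 4 & "colc" > 4  # a single TRUE

# The vector (default) method indexes the matrix as a plain vector,
# so x[TRUE] returns all nine values and the dim attribute is lost
subset.default(mat, cond)        # 1 2 3 4 5 6 7 8 9

# A 3 x 2 logical matrix is likewise used as a plain vector index and
# recycled over the 9 elements (F T T T T T, then F T T)
subset.default(mat, mat[, 2:3] > 4)  # 2 3 4 5 6 8 9
```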

Try something like

as.matrix(subset(as.data.frame(mat),colb>4 & colc>4))
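For comparison, a sketch of the data frame route next to an alternative that is not from this reply: indexing the matrix rows directly with a logical vector built from the named columns. It assumes `mat` has the column names shown in the question; `drop = FALSE` keeps the result a matrix even when only one row matches.

```r
# Example matrix from the question (assumed construction)
mat <- matrix(1:9, nrow = 3,
              dimnames = list(NULL, c("cola", "colb", "colc")))

# Data frame route: inside subset() the bare names colb and colc are
# looked up as columns; the result is converted back to a matrix
res1 <- as.matrix(subset(as.data.frame(mat), colb > 4 & colc > 4))

# Alternative (not from the post): one logical value per row, then
# ordinary row indexing; drop = FALSE guards against a 1-row result
keep <- mat[, "colb"] > 4 & mat[, "colc"] > 4
res2 <- mat[keep, , drop = FALSE]

res2
#      cola colb colc
# [1,]    2    5    8
# [2,]    3    6    9
```

Both give the same values; the data frame version may carry the original row numbers along as row names.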


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907




More information about the R-help mailing list