[R] subsetting tables

Wed Sep 7 09:00:19 CEST 2011

Hi
> 
> Hi Eik,
> 
> greetings to Hamburg! :-) Thanks for the fast and helpful answer
> 
> 
> Eik Vettorazzi-2 wrote:
> > 
> > #compare
> > str(red[,2])
> > str(red[2,])
> > 
> 
> I understand that the first is a real vector of nums in R and the second is
> a ?? matrix/list/data.frame ?? of single ? entries? Can I
> transpose/transform it into one vector? Tried 'as.vector' but did not help.

See
?"[.data.frame"
(the data frame method of "["), in particular its drop argument:

drop
logical. If TRUE the result is coerced to the lowest possible dimension. 
The default is to drop if only one column is left, but not to drop if only 
one row is left.

 iris[1,]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa

as.vector(unlist(iris[1,]))
[1] 5.1 3.5 1.4 0.2 1.0

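But note that unlist() turns the factor column Species into its integer 
code (the 1.0 at the end). If your row contains columns that are not 
numeric, select only the numeric columns before flattening it.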
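
A small sketch of the difference, using iris as a stand-in for your table 
(the variable names are only for illustration): a single column comes back 
as a plain vector, a single row stays a data frame, and unlist() on the 
numeric columns gives one vector.

```r
col2 <- iris[, 2]            # one column: dropped to a numeric vector
row2 <- iris[2, ]            # one row: still a data.frame with 5 columns
is.numeric(col2)
is.data.frame(row2)

# Keep only the numeric columns before flattening, so the factor
# Species is not turned into its integer code
num_cols <- vapply(iris, is.numeric, logical(1))
row_vec  <- unname(unlist(iris[2, num_cols]))
row_vec
```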

> 
> 
> Eik Vettorazzi-2 wrote:
> > 
> > sum(red>.5)
> > length(which(red>.5))
> > 
> 
> Sorry for being imprecise. Yes, in this case it was mainly the sum (thanks!
> helpful function!), but in general I'd like to understand what happened with
> subset here... 
> 
> 
> Eik Vettorazzi-2 wrote:
> > 
> > 
> > and the arr.ind option of which may be useful as well.
> > 
> 
> Thanks a lot, very helpful. For other newbies, here is the line:
> 
>  tableReduced[,-1][which(tableReduced[,-1]>0.5, arr.ind=TRUE)]
> 
> I needed to exclude the first column (-1) since these were titles (factors)
> of my rows. In the first trial I forgot to add this information to the first
> notion of the table as well, i.e., I tried: 
> 
>  tableReduced[which(tableReduced[,-1]>0.5, arr.ind=TRUE)]
> 
> This will (of course, I have to admit) result in subsetting fields that are
> in one column to the left of the intended column. So, if there are any
> subsetting indices in the which-function, they also need to be put in front
> of it to make the indices match.
> 
> Just for my understanding, do you know what R did here? Where do the NA
> values come from, what is the row-title NA.1, why does it print the first
> two rows unchanged and then goes crazy?
> 
> > subset(red[,], red[,] > 0.5)
> >      Allstar hsa.let.7a hsa.let.7a.1 hsa.let.7a.2
> > 2       0.87       0.79        -0.57         1.07
> > 3       0.67      -1.14        -0.78        -0.95
> > NA        NA         NA           NA           NA
> > NA.1      NA         NA           NA           NA
> > NA.2      NA         NA           NA           NA 
> 

That is a rather unusual use of "[". I did not follow the whole thread, but 
when subsetting you need to consider what kind of result you want to get.

> str(iris>6)
 logi [1:150, 1:5] FALSE FALSE FALSE FALSE FALSE FALSE ...

Using a comparison operator on a data frame returns a logical matrix, which 
is basically a logical vector with a dim attribute. 

> which(iris>6)
 [1]  51  52  53  55  57  59  64  66  69  72  73  74  75  76  77  78  87  88  92
[20]  98 101 103 104 105 106 108 109 110 111 112 113 116 117 118 119 121 123 124
[39] 125 126 127 128 129 130 131 132 133 134 135 136 137 138 140 141 142 144 145
[58] 146 147 148 149 406 408 410 418 419 423 431 432 436

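iris has only 150 rows. which() returns positions in the logical matrix 
counted column by column, so the values up to 150 (first column) happen to 
be valid row numbers, while those from the other columns are not (406, for 
instance, is row 106 of column 3).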
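
To see which row and column each position belongs to, something like the 
following should work. It restricts the comparison to the four numeric 
columns of iris, and the names m, lin, pos and vals are just for 
illustration.

```r
m   <- iris[, 1:4] > 6              # logical matrix, 150 rows x 4 columns
lin <- which(m)                     # positions counted column by column
pos <- which(m, arr.ind = TRUE)     # same hits as (row, col) pairs

# Position 406 lies in column 3: (406 - 1) %/% 150 + 1 = 3,
# at row (406 - 1) %% 150 + 1 = 106
arrayInd(406, dim(m))

# A two-column (row, col) matrix can be used to pick the values
# themselves from the matching numeric matrix
vals <- as.matrix(iris[, 1:4])[pos]
```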

As you can see from

> tail(iris[which(iris>6),], 10)
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
149           6.2         3.4          5.4         2.3 virginica
NA             NA          NA           NA          NA      <NA>
NA.1           NA          NA           NA          NA      <NA>
NA.2           NA          NA           NA          NA      <NA>
NA.3           NA          NA           NA          NA      <NA>
NA.4           NA          NA           NA          NA      <NA>
NA.5           NA          NA           NA          NA      <NA>
NA.6           NA          NA           NA          NA      <NA>
NA.7           NA          NA           NA          NA      <NA>
NA.8           NA          NA           NA          NA      <NA>

you get NA rows for those indices that are greater than 150 (the number of 
rows in iris).
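
The same effect can be reproduced with a handful of row numbers (chosen 
only for illustration):

```r
idx <- c(149, 406, 408)   # one valid row number, two beyond nrow(iris)
out <- iris[idx, ]
rownames(out)             # the out-of-range rows are labelled NA, NA.1
```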

If you want, let's say, all rows of a data frame in which at least one 
value is bigger than some threshold, you need a small hack:

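iris1 <- iris[,-5]   # drop the factor column Species
iris[ rowSums(iris1 > 6) > 0, ]

or

# Species compares to NA (with a warning); na.rm=TRUE ignores it
iris[ rowSums(iris > 6, na.rm=TRUE) > 0, ]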
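
If the position of the non-numeric columns is not known in advance, a 
variant along these lines may be handy. threshold, num_cols, hits and vals 
are names made up for this sketch.

```r
threshold <- 6
num_cols  <- vapply(iris, is.numeric, logical(1))    # TRUE for numeric columns

# Rows in which at least one numeric value exceeds the threshold
hits <- iris[rowSums(iris[, num_cols] > threshold) > 0, ]
nrow(hits)

# The individual values above the threshold, as a plain numeric vector
vals <- as.matrix(iris[, num_cols])[iris[, num_cols] > threshold]
length(vals)
```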

Regards
Petr

> Thanks for this community with fast and reliable help. Amazing to see!
> 
> 
> --
> View this message in context:
> http://r.789695.n4.nabble.com/subsetting-tables-tp3793509p3794527.html
> Sent from the R help mailing list archive at Nabble.com.