[R] Subset rows over multiple columns

Gabor Grothendieck ggrothendieck at gmail.com
Fri Apr 14 01:34:50 CEST 2006


Try this:

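# work on a copy, converting the two item columns to character in case they are factors
tt2 <- tt
tt2[,1] <- as.character(tt2[,1])
tt2[,2] <- as.character(tt2[,2])

# mean of righta_a over the rows where item x appears in either item column
f <- function(x) with(tt2, mean(righta_a[x == itd_1 | x == itd_45]))
# apply f to every distinct item code found in the two item columns
sapply(unique(unlist(tt2[,1:2])), f)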
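
Note that f above averages righta_a for a match in either column. If, as the question seems to describe, the response to an item shown in itd_45 is recorded in righta_b, then a variant along the following lines may be closer to what is wanted. This is only a sketch under that assumption: the data frame is rebuilt from the six rows printed by tail(tt) in the question, and the names g, items, p_wide, long and p_long are made up for illustration. The second half stacks the columns by hand instead of calling reshape, so no id column is needed.

```r
# Six sample rows copied from the tail(tt) output in the quoted question
tt <- data.frame(
  itd_1    = c("R0000160", "R0000160", "R0000160", "R0000160", "R0113750", "R0000160"),
  itd_45   = c("R0208470", "R0238140", "R0259690", "R0000730", "R0000160", "R0238690"),
  righta_a = c(1, 0, 1, 1, 1, 0),
  righta_b = c(0, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Wide format: pair each item column with its own response column
# (assumption: righta_a belongs to itd_1 and righta_b belongs to itd_45)
g <- function(x) with(tt, mean(c(righta_a[itd_1 == x], righta_b[itd_45 == x])))
items <- unique(unlist(tt[, c("itd_1", "itd_45")]))
p_wide <- sapply(items, g)

# Long format by hand: stack items and answers, then average by item
long <- with(tt, data.frame(item   = c(itd_1, itd_45),
                            answer = c(righta_a, righta_b),
                            stringsAsFactors = FALSE))
p_long <- with(long, tapply(answer, item, mean))
```

Both p_wide and p_long are indexed by item code, so p_wide["R0000160"] and p_long["R0000160"] should agree. Because the stacking never refers to an id, duplicate IDs should not get in the way here.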


On 4/13/06, Doran, Harold <HDoran at air.org> wrote:
> I have a data frame where I need to subset certain rows before I compute
> the mean of another variable. However, the value that I need to subset
> by is found in multiple columns. For example, in the data below the
> value R0000160 is found in the first and second columns (itd_1 and
> itd_45).  These data are student responses to multiple choice test items
> from a computer adaptive test. So, the variable itd_1 denotes that item
> i was presented to student k in position t, and the variables
> righta_a and righta_b denote a correct (1) or incorrect (0) response to
> that item when it was presented.
>
> My goal is to get the p-value (mean of the binary variable) for each
> item irrespective of when it was presented to the student.
>
> So, in the sample case below, I would use all elements in righta_a
> (except for the second to last) and then only the second to last value
> in righta_b.
>
> > tail(tt)
>         itd_1   itd_45 righta_a righta_b
> 18407 R0000160 R0208470        1        0
> 18412 R0000160 R0238140        0        1
> 18417 R0000160 R0259690        1        1
> 18422 R0000160 R0000730        1        1
> 18450 R0113750 R0000160        1        1
> 18456 R0000160 R0238690        0        1
>
> One thing I can envision doing is using reshape so that
> itd_1 and itd_45 would be in the "long" format. This would stack
> itd_1 and itd_45 in a single column, and likewise righta_a
> and righta_b, and I could then use tapply to get what I need
> without any subsetting. That is
>
> testScores <- reshape(tt, idvar='id', varying=list(c('itd_1', 'itd_45'),
> c('righta_a', 'righta_b')), v.names=c('item','answer'),
> timevar='item_position', direction='long')
>
> with(testScores, tapply(answer, item, mean))
>
> Or I could get
>
> with(testScores, tapply(answer, list(item, item_position), mean))
>
> The only problem here is that I have some duplicate IDs in the data and
> reshape doesn't like turning data on its head in that situation, so I
> would need to tinker with those first.
>
> So, I have what I think would be a solution, but I wonder if there is
> another way to preserve the data in this "wide" format and get the
> estimates I need. Maybe it is just easier to use reshape. Any
> suggestions?
>
> Harold
> Windows XP
> R 2.2.1



