[R] sub setting a data frame with binomial responses

David L Carlson dcarlson at tamu.edu
Thu Aug 2 06:56:58 CEST 2012


If I understand you correctly, you want to exclude columns where all successes equal the number of trials, all successes equal 0, or the successes are a mixture of the number of trials and 0 with no values in between. You did not make it clear whether the number of trials can vary, but in your example it does not. Given that, all three criteria can be consolidated into a single condition: drop any column whose values are all either 0 or 5 (the number of trials).

> mydata <- structure(list(n = c(5, 5, 5, 5), x1 = c(2, 3, 1, 3), 
  x2 = c(5, 5, 5, 5), x3 = c(0, 0, 0, 0), x4 = c(5, 0, 5, 0)), 
  .Names = c("n", "x1", "x2", "x3", "x4"), row.names = c(NA, -4L), 
  class = "data.frame")
> mydata
  n x1 x2 x3 x4
1 5  2  5  0  5
2 5  3  5  0  0
3 5  1  5  0  5
4 5  3  5  0  0
> idx <- sapply(mydata[,-1], function(x) all(x %in% c(0, 5))) # TRUE if a column holds only 0s and 5s
> idx <- c(TRUE, !idx) # add TRUE to include the first column
> mydata[, idx]
  n x1
1 5  2
2 5  3
3 5  1
4 5  3
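If the number of trials can differ from row to row, one possible extension (a minimal sketch, assuming the trials column is always named n and every other column is a response) is to compare each response against that row's n instead of the constant 5. It works on a single logical matrix rather than column by column, which should also help with thousands of response columns:

```r
# Example data from the question: n = number of trials, x1..x4 = successes
mydata <- data.frame(n  = c(5, 5, 5, 5),
                     x1 = c(2, 3, 1, 3),
                     x2 = c(5, 5, 5, 5),
                     x3 = c(0, 0, 0, 0),
                     x4 = c(5, 0, 5, 0))

# Assumption: the trials column is named "n"; all other columns are responses
resp <- as.matrix(mydata[, names(mydata) != "n", drop = FALSE])

# TRUE where a response is 0 or equals that row's number of trials
# (mydata$n is recycled down each column, so row i is compared with n[i])
extreme <- resp == 0 | resp == mydata$n

# A column is "bad" if none of its values lies strictly between 0 and n
bad <- colSums(!extreme) == 0

# Keep n plus the usable response columns; drop = FALSE keeps a data frame
result <- mydata[, c("n", colnames(resp)[!bad]), drop = FALSE]
result
```

With the example data this keeps only n and x1, the same result as above, but it no longer depends on every row having 5 trials.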

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of john.james
> Sent: Wednesday, August 01, 2012 10:19 AM
> To: r-help at r-project.org
> Subject: [R] sub setting a data frame with binomial responses
> 
> Hi everyone,
> Suppose I have a data frame named “mydata”, created as below:
> > n=c(5,5,5,5)   #number of trials
> > x1=c(2,3,1,3)  #number of successes
> > x2=c(5,5,5,5)  #number of successes
> > x3=c(0,0,0,0)  #number of successes
> > x4=c(5,0,5,0)  #number of successes
> > mydata=data.frame(n,x1,x2,x3,x4)
> > mydata
>   n x1 x2 x3 x4
> 1 5  2  5  0  5
> 2 5  3  5  0  0
> 3 5  1  5  0  5
> 4 5  3  5  0  0
> But for my (binomial) modeling purposes, I cannot have a data frame that
> has columns of all successes, columns of all failures, or columns made up
> only of successes and failures.
> That is, I need to delete x2, x3 and x4 from my data frame.
> I can delete x2 and x3 as follows:
> mydata = t(subset(t(mydata), rowSums(t(mydata)) > 0))
> mydata = t(subset(t(mydata), rowSums(t(mydata)) < 20)) #where 20=4*5
> 
> How can I subset my data to remove x4, which contains only the number of
> trials or zeros as elements?
> Can I give a single logical condition in the subset code to skip all such
> rows (i.e., skipping x2, x3, and x4 at once)?
> 
> I am doing this for a very large data frame (thousands of response
> columns) in a simulation study, but here I have explained it with a
> simple case.
> 
> Thank you for your kindness!
> 
> 
> 
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/sub-setting-a-data-frame-with-binomial-responses-tp4638702.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


