[R] Deleting specific rows from a dataframe

arun smartpink111 at yahoo.com
Tue Jul 16 05:00:54 CEST 2013


You mentioned a data.frame in one place and a matrix in another. A matrix would be faster.

#Speed comparison
set.seed(1454)
dfTest<- as.data.frame(matrix(sample(LETTERS[15:18],5*1e6,replace=TRUE),ncol=5))

system.time(res<-dfTest[rowSums(dfTest=="P")==ncol(dfTest),])
#   user  system elapsed 
#  0.628   0.020   0.649 
dim(res)
#[1] 952   5


set.seed(1454)
mat1<- matrix(sample(LETTERS[15:18],5*1e6,replace=TRUE),ncol=5)
system.time(res1<-mat1[rowSums(mat1=="P")==ncol(mat1),])
# user  system elapsed 
#  0.188   0.004   0.194 
dim(res1)
#[1] 952   5
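If the data already arrive as a data frame, one option is to convert to a matrix once with `as.matrix()`, build the logical row index on the matrix, and then use that index to subset the original data frame. This is only a sketch and has not been timed here. It assumes every column is character (or factor), so that `as.matrix()` returns a character matrix. `keep_all_p` and `dfSmall` are hypothetical names.

```r
# Hypothetical helper: keep the rows of a data frame in which every column equals `value`.
# Assumes all columns are character/factor, so as.matrix() gives a character matrix.
keep_all_p <- function(df, value = "P") {
  m <- as.matrix(df)                      # convert once; the comparison then runs on a plain matrix
  keep <- rowSums(m == value) == ncol(m)  # TRUE where every entry in the row matches
  df[keep, , drop = FALSE]                # subset the original data frame, keeping its row names
}

set.seed(1454)
dfSmall <- as.data.frame(matrix(sample(LETTERS[15:18], 5e4, replace = TRUE), ncol = 5),
                         stringsAsFactors = FALSE)
resSmall <- keep_all_p(dfSmall)
all(resSmall == "P")   # every remaining cell is "P"
```

The rows it returns are the same ones `dfSmall[rowSums(dfSmall == "P") == ncol(dfSmall), ]` returns. Only the place where the comparison happens differs.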

#Other options include
system.time(res3<- dfTest[apply(sweep(dfTest,1,"P","=="),1,all),])
#   user  system elapsed 
#  5.988   0.120   6.120 
identical(res,res3)
#[1] TRUE



system.time(res2<- dfTest[apply(dfTest,1, function(x) all(length(table(x))==ncol(dfTest) | names(table(x))=="P")  ), ])
#   user  system elapsed 
#351.492   0.040 352.164 
row.names(res2)<- row.names(res3)
attr(res3,"row.names")<- attr(res2,"row.names")
identical(res2,res3)
#[1] TRUE
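A further option, not among those timed above, is to compare each column on its own and combine the per-column results with `Reduce()`. This avoids the per-row function calls that make the `apply()` versions slow. Whether it beats `rowSums()` on a given machine is untested, so treat it as a sketch. `all_p_rows` and `dfDemo` are illustrative names.

```r
# Sketch: combine per-column comparisons with Reduce() instead of a row-wise apply().
# lapply() walks the columns; each result is a logical vector marking where that column is "P".
all_p_rows <- function(df, value = "P") {
  Reduce(`&`, lapply(df, function(col) col == value))
}

set.seed(1454)
dfDemo <- as.data.frame(matrix(sample(LETTERS[15:18], 5e4, replace = TRUE), ncol = 5),
                        stringsAsFactors = FALSE)
resReduce  <- dfDemo[all_p_rows(dfDemo), ]
resRowSums <- dfDemo[rowSums(dfDemo == "P") == ncol(dfDemo), ]
identical(resReduce, resRowSums)
```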


A.K.

----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: Chirag Gupta <cxg040 at email.uark.edu>
Cc: R help <r-help at r-project.org>
Sent: Monday, July 15, 2013 9:23 PM
Subject: Re: [R] Deleting specific rows from a dataframe

Hi,
If I understand it correctly,
df1<- read.table(text="
sample1 sample2 sample3 sample4 sample5  
 a P P I P P
 b P A P P A
 c P P P P P
 d P P P P P
 e M P M A P
 f P P P P P
 g P P P A P
 h P P P P P
",sep="",header=TRUE,stringsAsFactors=FALSE)
df1[rowSums(df1=="P")==ncol(df1),]
#  sample1 sample2 sample3 sample4 sample5
#c       P       P       P       P       P
#d       P       P       P       P       P
#f       P       P       P       P       P
#h       P       P       P       P       P
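Here is why the one-liner works. `df1 == "P"` returns a logical matrix. `rowSums()` counts the `TRUE` values in each row, and a row is kept only when that count equals the number of columns. If the real data could contain missing values (the example above has none), `rowSums()` returns `NA` for those rows, and `[` then produces rows of `NA`s. Passing `na.rm = TRUE` is one way to treat a missing entry as "not P". The small data frame `df2` below, with an `NA` in row `b`, is made up for illustration.

```r
# Made-up example with a missing value in row "b"
df2 <- data.frame(s1 = c("P", NA,  "P"),
                  s2 = c("P", "P", "A"),
                  row.names = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

isP <- df2 == "P"            # logical matrix, NA where df2 is NA
counts <- rowSums(isP)       # a = 2, b = NA, c = 1
df2[counts == ncol(df2), ]   # row "a" plus an all-NA row named "NA"

# Treat NA as "not P": NA entries are dropped from the count, so row b scores 1 < 2
keep <- rowSums(isP, na.rm = TRUE) == ncol(df2)
df2[keep, ]                  # only row "a"
```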
A.K.



----- Original Message -----
From: Chirag Gupta <cxg040 at email.uark.edu>
To: r-help at r-project.org
Cc: 
Sent: Monday, July 15, 2013 9:10 PM
Subject: [R] Deleting specific rows from a dataframe

I have a data frame like shown below

  sample1 sample2 sample3 sample4 sample5
a       P       P       I       P       P
b       P       A       P       P       A
c       P       P       P       P       P
d       P       P       P       P       P
e       M       P       M       A       P
f       P       P       P       P       P
g       P       P       P       A       P
h       P       P       P       P       P

I want to keep only those rows that have "P" in every column.

Since the matrix is large (about 20,000 rows), I cannot do it in Excel.

Is there any special function that I can use?
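Because the aim is to delete rows, the same logical index can also be negated to list the rows that would be removed. This sketch re-enters the question's data by hand and uses the `rowSums()` idiom from the reply above. `dat`, `keep`, `kept` and `dropped` are illustrative names.

```r
# The question's data; the header has one fewer field, so the first column becomes row names
dat <- read.table(text = "
sample1 sample2 sample3 sample4 sample5
a P P I P P
b P A P P A
c P P P P P
d P P P P P
e M P M A P
f P P P P P
g P P P A P
h P P P P P
", header = TRUE, stringsAsFactors = FALSE)

keep    <- rowSums(dat == "P") == ncol(dat)  # TRUE only for rows that are "P" everywhere
kept    <- dat[keep, ]                       # rows c, d, f, h
dropped <- dat[!keep, ]                      # rows a, b, e, g: the ones being deleted
rownames(dropped)
```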
-- 
*Chirag Gupta*


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



