# [R] select rows with identical columns from a data frame

David Winsemius dwinsemius at comcast.net
Sun Jan 20 18:27:44 CET 2013

```
On Jan 20, 2013, at 8:26 AM, Sam Steingold wrote:

>> * Bert Gunter <thagre.oregba at trar.pbz> [2013-01-19 22:26:46 -0800]:
>>
>> But David W. and Bill Dunlap gave you solutions that also work and are
>> much faster, no?!
>
> Yes, indeed, and I am now using David's solution as it is fast
> (enough), simple and concise.

I am a bit surprised by that. I do agree that it was simple and
concise, two programming virtues that I occasionally achieve. However,
when I tested it against either of Bill Dunlap's suggestions, mine was
15-40 times slower. (So I saved Bill's code and made a mental note to
study its superiority.) I could see why the f2 version was superior,
since it progressively shrank the index candidates for further
comparison, but his first function used no such logic and was still 15
times faster.

My test included the creation of the smaller data.frame, which his did
not, but when I modified mine to return only the index vector, computing
that index turned out to be the step that consumed all the time. I
wondered if it was `which` that consumed the time, but it appears that
the inner step of x==x[[1]] was the culprit.

> x <- data.frame(lapply(structure(1:10,names=letters[1:10]),
+     function(i) sample(c(NA,1,1,1,2,2,2,3), replace=TRUE, size=1e6)))

> system.time({ keep <- x[[1]] == x[[2]]
+    for (i in seq_len(ncol(x))[-(1:2)]) {
+        keep <- keep & x[[i - 1]] == x[[i]]
+    }
+    z2 <- !is.na(keep) & keep})
   user  system elapsed
  0.179   0.056   0.240

> system.time({z <- rowSums(x==x[[1]]) })
   user  system elapsed
  3.535   0.535   4.067

> system.time({z <- x==x[[1]] })
   user  system elapsed
  3.540   0.524   4.061

--
David

>
> Thanks a lot to David, Bill, Rui, and arun for their answers (to this
> question, my many previous questions, and, I hope, my future questions).
>
>> On Sat, Jan 19, 2013 at 9:41 PM, Sam Steingold <sds at gnu.org> wrote:
>>>> * Rui Barradas <ehvconeenqnf at fncb.cg> [2013-01-18 21:02:20 +0000]:
>>>>
>>>> Try the following.
>>>>
>>>> complete.cases(f) & apply(f, 1, function(x) all(x == x[1]))
>>>
>>> thanks, this works, but is horribly slow (dim(f) is 766,950x2)
>
--

David Winsemius, MD
Alameda, CA, USA

```
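For readers who want to rerun the comparison, here is a minimal sketch that wraps the two approaches timed above as functions and checks that they select the same rows. The function names `same_loop` and `same_rowsums`, the seed, and the smaller size of 1e4 rows are assumptions made for illustration and are not from the thread. No timings are claimed here; they will vary by machine and R version.

```r
# Illustrative sketch; helper names, seed and size are assumptions.
set.seed(1)
x <- data.frame(lapply(structure(1:10, names = letters[1:10]),
                       function(i) sample(c(NA, 1, 1, 1, 2, 2, 2, 3),
                                          replace = TRUE, size = 1e4)))

# Column-by-column comparison on plain vectors (the loop timed above).
same_loop <- function(df) {
  keep <- df[[1]] == df[[2]]
  for (i in seq_len(ncol(df))[-(1:2)]) {
    keep <- keep & df[[i - 1]] == df[[i]]
  }
  !is.na(keep) & keep          # rows containing NA are dropped
}

# Whole-frame comparison: `==` is applied to the data.frame itself,
# and a row qualifies when every column matches the first one.
same_rowsums <- function(df) {
  z <- rowSums(df == df[[1]])  # NA for any row containing NA
  !is.na(z) & z == ncol(df)
}

# Both approaches should flag exactly the same rows.
stopifnot(identical(same_loop(x), unname(same_rowsums(x))))

# Timings can then be compared locally, for example:
# system.time(same_loop(x)); system.time(same_rowsums(x))
```

Either logical vector can be used directly as a row index, for example `x[same_loop(x), ]`.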