[R] Remove similar rows from matrix

Rui Barradas ruipbarradas at sapo.pt
Thu Aug 23 17:17:09 CEST 2012


Hello,

Here's another close solution.


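# Count the NAs in each row and drop rows with more than two
na_count <- rowSums(is.na(mat))
mat1 <- mat[na_count <= 2, ]
# Row-to-row differences; prepend the first row so diff_mat1 lines up with mat1
diff_mat1 <- rbind( mat1[1, ], apply(mat1, 2, diff) )
# A row repeats the previous one if every difference is zero or NA
no <- is.na(diff_mat1) | diff_mat1 == 0
yes <- !apply(no, 1, all)
mat1.1 <- mat1[yes, ]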

all.equal( mat1.1, mat2 )  # Not quite

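# NA in both results vs. NA in at least one of them
why1 <- 1*(is.na(mat1.1) & is.na(mat2))
why2 <- 1*(is.na(mat1.1) | is.na(mat2))
sum(why1); sum(why2)

# 1 marks the cells where only one of the two matrices has an NA
why2 - why1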

Why: In a sequence of "equal" rows, the first is always kept, even if
it has an NA where the others don't.
So maybe the OP could now use a similar method, but starting from the bottom,
and then, from both solutions, keep the rows with fewer NAs.
I'll give it some thought later.
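
In the meantime, here is a minimal sketch of that idea on a made-up toy matrix (not the OP's data, and not tested on it). Instead of running the method twice, it labels runs of consecutive "equal" rows and keeps, within each run, the row with the fewest NAs. The names m, new_run, run_id, n_na, keep and res are only for illustration.

```r
# Toy matrix (hypothetical data): rows 2 and 3 are "equal" up to an NA
m <- rbind(c(1, 10, 100),
           c(2, NA, 200),
           c(2, 20, 200),
           c(3, 30,  NA),
           c(4, 40, 400))

# Row-to-row differences (assumes at least 3 rows, so apply() returns a matrix)
d <- apply(m, 2, diff)

# A row starts a new run if it differs from the previous row
# in at least one position where both values are known
new_run <- c(TRUE, !apply(is.na(d) | d == 0, 1, all))
run_id  <- cumsum(new_run)

# Number of NAs per row
n_na <- rowSums(is.na(m))

# Within each run, keep the index of the row with the fewest NAs
# (ties go to the first such row)
keep <- unname(vapply(split(seq_len(nrow(m)), run_id),
                      function(idx) idx[which.min(n_na[idx])],
                      integer(1)))

res <- m[keep, , drop = FALSE]
res
```

Here row 3 is kept instead of row 2, because it has no NA. On the real data this would be applied to mat1, after dropping the rows with more than two NAs. Note that rows are only compared with their immediate neighbour, so a row with NAs can still link two rows that differ from each other.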

Hope this helps,

Rui Barradas

On 23-08-2012 13:09, PIKAL Petr wrote:
> Hi
>
> I cannot reproduce exactly what you want, but maybe you can elaborate on this to suit your needs.
>
> sel1<-rowSums(is.na(mat)) # number of NA values
> sel2<-c(0,rowSums(apply(mat,2,diff)==0, na.rm=TRUE)) # rows which are same
>
> but the first row is not considered the same, therefore I also add the first row
>
> sel<-c(rowSums(embed(sel2,2)),0)
>
> and here I select only rows which are unique and do not have any NA
> mat[(sel1*sel)==0,]
>
> This is not exactly what you want, as one of the rows starting with 328 should be included. So there has to be another trick, but I cannot come up with one.
>
> Regards
> Petr
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> On Behalf Of Tonja Krueger
>> Sent: Wednesday, August 22, 2012 10:16 AM
>> To: r-help at r-project.org
>> Subject: [R] Remove similar rows from matrix
>>
>>
>>     Hi everybody,
>>
>>     I have a matrix (mat) from which I want to remove all rows that differ
>>     from other rows in that matrix only by having one or two NAs instead
>>     of numbers.
>>
>>     I would prefer to remove the rows with more NAs, so that in the end the
>>     matrix looks like mat2.
>>
>>     Has someone done something similar before? Thanks for helping, Tonja
>>
>>
>>     Here is my example:
>>
>>     ex <- c(14, 56, 114, 132, 187, 279, 324, 328, 328, 338, 338, 338,
>> 346, 346,
>>     395, 398, 428, 428, 428, 452, 452, 452, NA, 466, 467, 525, 894, 923,
>> 968,
>>     980, 1030, 1117, 1156, NA, 1159, 1166, 1166, 1166, 1171, 1171, 1209,
>> 1211,
>>     1235, 1235, 1235, 1275, 1275, 1275, NA, 1291, 1292, 1378, 829, 851,
>> 880,
>>     893, 929, 1003, 1042, 1045, 1045, 1051, 1051, 1051, 1057, 1057,
>> 1097, 1099,
>>     1119, 1119, 1119, 1147, 1147, 1147, 1147, 1167, 1168, 1235, 494,
>> 510, 533,
>>     538, 567, 623, 657, 660, 660, 666, 666, 666, 671, 671, 699, 702, NA,
>> 722,
>>     722, NA, NA, 744, 744, 759, 760, 816, 276, 293, 312, 318, 338, NA,
>> NA, 418,
>>     418, 424, 424, NA, 429, 429, NA, NA, 468, 468, 468, 490, 490, 490,
>> 490, 508,
>>     509, 568, 674, 696, 726, 734, 774, 851, 893, 896, 896, 903, 903,
>> 903, 908,
>>     908, 944, 947, 966, 966, 966, NA, 998, 998, 998, 1014, 1015, 1091,
>> 421, 446,
>>     472, 490, 510, 582, 624, 627, 627, 633, 633, NA, 640, 640, 669, 671,
>> 685,
>>     685, 685, 716, 716, 716, 716, 736, 737, 798, NA, NA, NA, NA, NA, NA,
>> 74, NA,
>>     NA, 82, NA, 82, 86, NA, 104, NA, 114, NA, 114, 119, 119, 119, 119,
>> NA, NA,
>>     NA)
>>
>>     mat <- matrix(ex, ncol=8)
>>
>>
>>     ex2 <- c(14, 56, 114, 132, 187, 279, 324, 328, 338, 346, 395, 398,
>> 428, 452,
>>     466, 467, 525, 894, 923, 968, 980, 1030, 1117, 156, 1159, 1166,
>> 1171, 1209,
>>     1211, 1235, 1275, 1291, 1292, 1378, 829, 851, 880, 893, 929, 1003,
>> 1042,
>>     1045, 1051, 1057, 1097, 1099, 1119, 1147, 1167, 1168, 1235, 494,
>> 510, 533,
>>     538, 567, 623, 657, 660, 666, 671, 699, 702, 722, 744, 759, 760,
>> 816, 276,
>>     293, 312, 318, 338, NA, NA, 418, 424, 429, NA, NA, 468, 490, 508,
>> 509, 568,
>>     674, 696, 726, 734, 774, 851, 893, 896, 903, 908, 944, 947, 966,
>> 998, 1014,
>>     1015, 1091, 421, 446, 472, 490, 510, 582, 624, 627, 633, 640, 669,
>> 671, 685,
>>     716, 736, 737, 798, NA, NA, NA, NA, NA, NA, 74, NA, 82, 86, 104, NA,
>> 114,
>>     119, NA, NA, NA)
>>
>>     mat2 <- matrix(ex2, ncol=8)
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.



