[R] populating matrix with binary variable after matching data from data frame

William Dunlap wdunlap at tibco.com
Thu Aug 14 18:00:23 CEST 2014


This is what I got:
> x1 <- data.frame(V1=c("K","D","K","M"), V2=c("L","A","M","A"))
> X <- array(0, c(4,4), rep(list(LETTERS[1:4]), 2))
> f(X, x1, badEntryAction="omitRows")
  A B C D
A 0 0 0 0
B 0 0 0 0
C 0 0 0 0
D 1 0 0 0
> table(lapply(x1, factor, levels=LETTERS[1:4]))
   V2
V1  A B C D
  A 0 0 0 0
  B 0 0 0 0
  C 0 0 0 0
  D 1 0 0 0
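
Note that table() returns counts, so a pair listed twice in x1 would
show up as 2 rather than 1.  A rough sketch of capping the counts at 1,
using a made-up data frame with a duplicated row (the names 'dup',
'counts' and 'X01' are only for illustration):

```r
# Hypothetical data: the pair D-A appears twice, K-M is not in LETTERS[1:4]
dup <- data.frame(V1 = c("D", "D", "K"), V2 = c("A", "A", "M"),
                  stringsAsFactors = FALSE)

# Names outside the given levels become NA and are dropped by table()
counts <- table(lapply(dup, factor, levels = LETTERS[1:4]))

# unclass() turns the table into a plain matrix; pmin() caps each count at 1
X01 <- pmin(unclass(counts), 1)

counts["D", "A"]  # count of the duplicated pair
X01["D", "A"]     # presence/absence version
```

Use counts > 0 instead if you would rather have TRUE/FALSE.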

I think you should sort out how your attempts went wrong.
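
If you do want to keep a loop, something along these lines might be a
starting point.  It is only a sketch: it assumes x1 holds character
(not factor) columns, that every row of x1 should be visited (hence
seq_len(nrow(x1)) rather than nrow(x1)), and that pairs with a name
missing from x are to be skipped.

```r
x1 <- data.frame(V1 = c("K", "D", "K", "M"), V2 = c("L", "A", "M", "A"),
                 stringsAsFactors = FALSE)
x <- array(0, c(4, 4), rep(list(LETTERS[1:4]), 2))

for (k in seq_len(nrow(x1))) {      # visit every row, not just the last one
    r  <- x1$V1[k]                  # candidate row name
    cc <- x1$V2[k]                  # candidate column name
    # assign only when both names exist in x, avoiding 'subscript out of bounds'
    if (r %in% rownames(x) && cc %in% colnames(x)) {
        x[r, cc] <- 1
    }
}
x
```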

My original 'f' assumed, perhaps foolishly, that x1 had column names
"V1" and "V2".  Perhaps it should have just said i <- as.matrix(x1) and
checked that the result was a 2-column matrix of character data.  E.g.,
f <- function (x, x1, badEntryAction = c("error", "omitRows", "expandX"))
{
    badEntryAction <- match.arg(badEntryAction)
    i <- as.matrix(x1)
    stopifnot(is.character(i), ncol(i)==2)
    if (badEntryAction == "omitRows") {
        i <- i[is.element(i[, 1], dimnames(x)[[1]]) &
               is.element(i[, 2], dimnames(x)[[2]]), , drop = FALSE]
    }
    else if (badEntryAction == "expandX") {
        extraDimnames <- lapply(1:2, function(k) setdiff(i[,
            k], dimnames(x)[[k]]))
        # if you want the same dimnames on both axes,
        # take union of the 2 extraDimnames
        if ((n <- length(extraDimnames[[1]])) > 0) {
            x <- rbind(x, array(0, c(n, ncol(x)),
                       dimnames = list(extraDimnames[[1]], NULL)))
        }
        if ((n <- length(extraDimnames[[2]])) > 0) {
            x <- cbind(x, array(0, c(nrow(x), n), dimnames = list(NULL,
                extraDimnames[[2]])))
        }
    }
    x[i] <- 1
    x
}
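
As for the comment about taking the union of the extra dimnames, here
is one way it might look if you want x to stay square with the same
names on both axes.  This is a sketch, not part of 'f': the helper name
'expandSquare' is made up, and it assumes x starts out with identical
row and column names.

```r
# Hypothetical helper: grow x to cover every name in x1, keeping it square
expandSquare <- function(x, x1) {
    i <- as.matrix(x1)
    stopifnot(is.character(i), ncol(i) == 2,
              identical(rownames(x), colnames(x)))
    # existing names first, then any new names found in either column of x1
    allNames <- union(rownames(x), unique(as.vector(i)))
    out <- array(0, c(length(allNames), length(allNames)),
                 dimnames = list(allNames, allNames))
    out[rownames(x), colnames(x)] <- x   # copy the existing entries
    out[i] <- 1                          # 2-column character matrix subscript
    out
}

x1 <- data.frame(V1 = c("K", "D", "K", "M"), V2 = c("L", "A", "M", "A"),
                 stringsAsFactors = FALSE)
X <- array(0, c(4, 4), rep(list(LETTERS[1:4]), 2))
Y <- expandSquare(X, x1)
dimnames(Y)
```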

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Aug 14, 2014 at 8:15 AM, Adrian Johnson
<oriolebaltimore at gmail.com> wrote:
> Hi Bill,
> Sorry for the trouble. Neither solution worked:
> Error in `[<-`(`*tmp*`, i, value = 1) : subscript out of bounds
>
>
> My x matrix may not have all the items that x1 has.
>
> Say x only has A, B, C and D, whereas x1 has K, L, M, A and D.  However,
> x1 does not have any relationship between B and C, so B-C will be
> zero anyway.
>
> x1 :
>
> K   L
> D  A
> K  M
> M  A
> Although M associates with A, since M is not present in X - we will
> not map this association with 1. Since A and D are present in X - we
> will assign 1.
>
>
>
>   A B C D
> A 0 0 0 0
> B 0 0 0 0
> C 0 0 0 0
> D 1 0 0 0
>
>
> I tried this simple for loop but I get the same subscript error:
>
>
> for(k in nrow(x1)){
> x[x1[k,]$V1,x1[k,]$V2] <- 1
> x[x1[,k]$V1,x1[,k]$V2] <- 1
> x[x1[,k]$V2,x1[,k]$V1] <- 1
> }
>
> Error in `[<-`(`*tmp*`, hprd[x, ]$V1, hprd[x, ]$V2, value = 1) :
>   subscript out of bounds
>
> Thanks again.
>
> On Wed, Aug 13, 2014 at 6:02 PM, William Dunlap <wdunlap at tibco.com> wrote:
>> Another solution is to use table to generate your x matrix, instead of
>> trying to make one and adding to it.  If you want the table to have
>> the same dimnames on both sides, make factors out of the columns of x1
>> with the same factor levels in both.  E.g., using a *small* example:
>>
>>> X1 <- data.frame(V1=c("A","A","B"), V2=c("C","C","A"), stringsAsFactors=TRUE)
>>> X <- table(lapply(X1, factor, levels=union(levels(X1[[1]]), levels(X1[[2]]))))
>>> X
>>    V2
>> V1  A B C
>>   A 0 0 2
>>   B 1 0 0
>>   C 0 0 0
>>
>> If you don't want counts, but just a TRUE for presence and FALSE for
>> absence, use X>0.  If you want 1 for presence and 0 for absence you
>> can use pmin(X, 1).
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>>
>> On Wed, Aug 13, 2014 at 2:51 PM, William Dunlap <wdunlap at tibco.com> wrote:
>>> I may have missed something, but I didn't see the result you want for
>>> your example.  Also,
>>> none of the entries in the x1 you showed are row or column names in x,
>>> making it hard to show what you want to happen.
>>>
>>> Here is a function that gives you the choice of
>>>     *error: stop if any row of x1 is 'bad'
>>>     *omitRows: ignore rows of x1 that are 'bad'
>>>     *expandX: expand the x matrix to include all rows or columns named in x1
>>> (Row i of x1 is 'bad' if x1[i,1] is not a row name of x or x1[i,2]
>>> is not a column name of x.)
>>>
>>> f
>>> function (x, x1, badEntryAction = c("error", "omitRows", "expandX"))
>>> {
>>>     badEntryAction <- match.arg(badEntryAction)
>>>     i <- as.matrix(x1[, c("V1", "V2")])
>>>     if (badEntryAction == "omitRows") {
>>>         i <- i[is.element(i[, 1], dimnames(x)[[1]]) & is.element(i[,
>>>             2], dimnames(x)[[2]]), , drop = FALSE]
>>>     }
>>>     else if (badEntryAction == "expandX") {
>>>         extraDimnames <- lapply(1:2, function(k) setdiff(i[,
>>>             k], dimnames(x)[[k]]))
>>>         # if you want the same dimnames on both axes,
>>>         # take union of the 2 extraDimnames
>>>         if ((n <- length(extraDimnames[[1]])) > 0) {
>>>             x <- rbind(x, array(0, c(n, ncol(x)),
>>>                 dimnames = list(extraDimnames[[1]], NULL)))
>>>         }
>>>         if ((n <- length(extraDimnames[[2]])) > 0) {
>>>             x <- cbind(x, array(0, c(nrow(x), n), dimnames = list(NULL,
>>>                 extraDimnames[[2]])))
>>>         }
>>>     }
>>>     x[i] <- 1
>>>     x
>>> }
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>>
>>>
>>> On Wed, Aug 13, 2014 at 2:33 PM, Adrian Johnson
>>> <oriolebaltimore at gmail.com> wrote:
>>>> Hello again. Sorry to ask again.
>>>>
>>>> Maybe I was not clear before.
>>>>
>>>> I don't want to remove rows from the matrix, since its row names and
>>>> column names are identical.
>>>>
>>>>
>>>> I tried your suggestion and here is what I get:
>>>>
>>>>> fx <- function(x,x1){
>>>> + i <- as.matrix(x1[,c("V1","V2")])
>>>> + x[i]<-1
>>>> + x
>>>> + }
>>>>> fx(x, x1)
>>>>
>>>> Error in `[<-`(`*tmp*`, i, value = 1) : subscript out of bounds
>>>>
>>>>
>>>>
>>>>
>>>>> x[1:4,1:4]
>>>>        ABCA10 ABCA12 ABCA13 ABCA4
>>>> ABCA10      0      0      0     0
>>>> ABCA12      0      0      0     0
>>>> ABCA13      0      0      0     0
>>>> ABCA4       0      0      0     0
>>>>
>>>>
>>>>> x1[1:10,]
>>>>       V1       V2
>>>> 1   AKT3    TCL1A
>>>> 2  AKTIP    VPS41
>>>> 3  AKTIP    PDPK1
>>>> 4  AKTIP   GTF3C1
>>>> 5  AKTIP    HOOK2
>>>> 6  AKTIP    POLA2
>>>> 7  AKTIP KIAA1377
>>>> 8  AKTIP FAM160A2
>>>> 9  AKTIP    VPS16
>>>> 10 AKTIP    VPS18
>>>>
>>>>
>>>> For instance, I loop over x1. At the first row I take V1 and check
>>>> whether x has a row with that name, then check whether V2 exists in
>>>> the column names. If both match, I assign 1. If not, I go on to row 2.
>>>>
>>>> In some rows it is possible that only the element in V2 exists in the
>>>> row names. Since the element in V1 does not exist in the X matrix, I
>>>> will give 0. (Since matrix X has identical row and column names, I
>>>> feel there is no need to check the column names after checking the
>>>> row names.)
>>>>
>>>>
>>>>
>>>> Now, for instance, if I see ABCA10 in x1$V1 and ABCA10 in
>>>> x1$V2, then row 1, column 1 of matrix X should get 1.
>>>>
>>>> dput - follows..
>>>>
>>>> x <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(4L,
>>>> 4L), .Dimnames = list(c("ABCA10", "ABCA12", "ABCA13", "ABCA4"
>>>> ), c("ABCA10", "ABCA12", "ABCA13", "ABCA4")))
>>>>
>>>>
>>>> x1 <- structure(list(V1 = c("AKT3", "AKTIP", "AKTIP", "AKTIP", "AKTIP",
>>>> "AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP"), V2 = c("TCL1A",
>>>> "VPS41", "PDPK1", "GTF3C1", "HOOK2", "POLA2", "KIAA1377", "FAM160A2",
>>>> "VPS16", "VPS18")), .Names = c("V1", "V2"), row.names = c(NA,
>>>> 10L), class = "data.frame")
>>>>
>>>>
>>>>
>>>> Thanks for your time.
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Aug 13, 2014 at 12:51 PM, William Dunlap <wdunlap at tibco.com> wrote:
>>>>> You can replace the loop
>>>>>> for (i in nrow(x1)) {
>>>>>>    x[x1$V1[i], x1$V2[i]] <- 1;
>>>>>> }
>>>>> by
>>>>> f <- function(x, x1) {
>>>>>   i <- as.matrix(x1[, c("V1","V2")]) # 2-column matrix to use as a subscript
>>>>>   x[ i ] <- 1
>>>>>   x
>>>>> }
>>>>> f(x, x1)
>>>>>
>>>>> You will get an error if not all the strings in the subscript matrix
>>>>> are in the row or column names of x.  What do you want to happen in
>>>>> this case?  You can choose to first omit the bad rows in the
>>>>> subscript matrix
>>>>>     goodRows <- is.element(i[,1], dimnames(x)[[1]]) &
>>>>>                 is.element(i[,2], dimnames(x)[[2]])
>>>>>     i <- i[goodRows, , drop=FALSE]
>>>>>     x[ i ] <- 1
>>>>> or you can choose to expand x to include all the names found in x1.
>>>>>
>>>>> It would be good if you included some toy data to better illustrate
>>>>> what you want to do.
>>>>> E.g., with
>>>>>   x <- array(0, c(3,3), list(Row=paste0("R",1:3),Col=paste0("C",1:3)))
>>>>>   x1 <- data.frame(V1=c("R1","R3"), V2=c("C2","C1"))
>>>>> the above f() gives
>>>>>> f(x, x1)
>>>>>     Col
>>>>> Row  C1 C2 C3
>>>>>   R1  0  1  0
>>>>>   R2  0  0  0
>>>>>   R3  1  0  0
>>>>> Is that what you are looking for?
