[R] Problem creation tensor

Wed Jul 18 11:38:59 CEST 2012

On Tue, Jul 17, 2012 at 12:31:38PM +0200, Peppe Ricci wrote:
> Hi guys,
> 
> I need some help analyzing my data.
> I'll start by describing my data: I have 21 matrices; every matrix has
> users on the rows and items (in my case films) on the columns.
> The element at index (i, j) represents the rating given by user i to item j.
> I have one matrix for each profession.
> An example of this type of matrix is:
> 
>                     item 1    item 2    item 3    item4
>   id user 1        1          ?              ?           5
>   id user 2        ?          3              3           ?
>   id user 3        2          ?              3           2
>   id user 4        ?          ?              ?           4
>   ...
> So user 1 doesn't like item 1 but likes item 4 very much; for items 2
> and 3 he hasn't expressed a rating, etc.
> I need to construct a tensor with n users, m items and 21 occupations.
> After I have constructed the tensor I want to apply Parafac.
> I read data from a CSV file and build each matrix for each occupation.
> 
> Didier Leibovici (author of PTAk package) suggested to me:
> 
> ok that's bit clearer you have 21 matrices ( 1 for each occupations)
> of users rating their preferences (from 1 to 5 but without rating all
> of them: missing values) of  m items.
> but I suppose the users are not the same across the 21 occupations
> (one has only one occupation .... if you're talking about
> working/living occupation)
> so you can't create a tensor n users x m items x 21 occupations
> but you can build the contingencies of preferences m items x 21
> occupations x 5 ratings
> 
> One way to build your tensor m x 21 x 5 is:
> M1 is the first occupation (users x m) ...
> UserItem <-rbind(M1,M2, ...M21)
> 
> m=1682
> 
> for (j in 1:m){
>     UserItem[,j] =factor(UserItem[,j],levels=1:5)
> }
> occ=factor(c(rep(1,dim(M1)[1]),rep(2,dim(M2)[1]),
> ...,rep(21,dim(M21)[1])),levels=1:21)
> 
> Z <- array(rep(0,m*21*5),c(m,21,5),
> list(paste("item",1:m,sep=""),paste("Occ",1:21,sep=""),c("pr1","pr2","pr3","pr4","pr5")))
> for ( i in 1:m){
>   as.matrix(table(occ, UserItem[,2]))
>   Z[i,,]=table(occ, UserItem[,i])
> }
> 
> Z.CAND <- CANPARA(Z,dim=7)
> 
> I have implemented this code but I get an error at:
> 
>   for ( i in 1:m){
>         Z[i,,]=table(occ,UserItem[,i])
>   }
> 
> and error is:
> 
> Error in
> Z[i,,]=table(occ,UserItem[,i])
> the number of elements to be replaced is not a multiple of the length
> of substitution

Hi.

The problem in this code is that the command

  UserItem <- rbind(M1, M2, ..., M21)

produces a matrix, not a data.frame. As a result, the assignments

    UserItem[, j] <- factor(UserItem[, j], levels=1:5)

do not convert the columns to factors. A numeric matrix stores only the
integer codes of the factor, so the columns remain numeric.
Consequently, the table created as

  table(occ, UserItem[, i])

may not have the full size, since its columns correspond only to the
preferences that actually occur in UserItem[, i], and not to all five
possible preferences. Such a table then does not fit into the slice
Z[i, , ], which has 5 columns.

Changing 

  UserItem <- rbind(M1, M2, ..., M21)

to

  UserItem <- data.frame(rbind(M1, M2, ..., M21))

can resolve the problem, since the subsequent loop then really converts
the columns of the data.frame to factors, whose list of levels is complete,
even if some level is not used.
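
A minimal sketch of the corrected steps, again on made-up toy data
(4 users, 3 items, 2 occupations instead of 21), could look as follows.
The call to CANPARA() is left out, since it requires the PTAk package.

```r
# Hypothetical toy data: 4 users x 3 items, ratings 1..5, NA = no rating
UserItem <- data.frame(rbind(c(1,  NA, 5),
                             c(NA, 3,  1),
                             c(2,  3,  NA),
                             c(NA, NA, 5)))
occ <- factor(c(1, 1, 2, 2), levels = 1:2)
m <- ncol(UserItem)

# In a data.frame, each column can really be replaced by a factor
for (j in seq_len(m)) {
    UserItem[, j] <- factor(UserItem[, j], levels = 1:5)
}

Z <- array(0, dim = c(m, 2, 5),
           dimnames = list(paste("item", 1:m, sep = ""),
                           paste("Occ", 1:2, sep = ""),
                           paste("pr", 1:5, sep = "")))

# Every table is now 2 x 5, so it fits into the slice Z[i, , ]
for (i in seq_len(m)) {
    Z[i, , ] <- table(occ, UserItem[, i])
}
```

Missing ratings are dropped by table(), so sum(Z) is the number of ratings
that were actually given.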

For better clarity, consider the definition of the array in an equivalent
form

  Z <- array(0, dim=c(m, 21, 5),
             dimnames=list(paste("item", 1:m, sep=""),
                           paste("Occ", 1:21, sep=""),
                           c("pr1", "pr2", "pr3", "pr4", "pr5")))

which explicitly names the arguments passed to the function array().
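
If in doubt, the equivalence of the two forms can be checked directly.
The following sketch uses a small m (an assumption for the example,
the original has m = 1682) to keep the arrays small.

```r
m <- 3  # hypothetical small number of items, only for this check
nm <- list(paste("item", 1:m, sep = ""),
           paste("Occ", 1:21, sep = ""),
           c("pr1", "pr2", "pr3", "pr4", "pr5"))

# Positional form, as in the original code
Z1 <- array(rep(0, m * 21 * 5), c(m, 21, 5), nm)

# Form with named arguments; the single 0 is recycled to fill the array
Z2 <- array(0, dim = c(m, 21, 5), dimnames = nm)

same <- identical(Z1, Z2)
```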

Hope this helps.

Petr Savicky.