[R] optimized R-selection and R-replacement inside a matrix need, strings coerced to factors

Christine SINOQUET christine.sinoquet at univ-nantes.fr
Sat Feb 6 19:15:11 CET 2010


Hello,

I have two problems:

First, I need to modify some huge arrays (2000 individuals x 50 000 
variables).

To format the data, I think I should take advantage of R's optimized 
selection and replacement inside a matrix and avoid a naive use of loops.

Thank you in advance for providing information about the following problem :

file A  :
2 000 individuals in rows
50 000 columns corresponding to 50 000 variables : each value belongs to 
{0, 1, 2}


file B :
50 000 variables in rows
1st column : character (A,C,G,T) corresponding to code 0
2nd column : character corresponding to code 1

convention:
if A[i,j]=0, one wants to replace 0 with the character in B[j,1], twice
if A[i,j]=1, one wants to replace 1 with the character in B[j,1] followed by 
the character in B[j,2]
if A[i,j]=2, one wants to replace 2 with the character in B[j,2], twice
(so each variable j yields two columns in the result)
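
To make the convention concrete, here is a small sketch for a single variable. The vector `codes` and the characters `a0` and `a1` are made-up toy values standing in for one column of A and one row of B; they are not taken from the real files.

```r
# Toy values (assumptions): codes of four individuals for one variable j,
# and the two characters that B would hold for that variable
codes <- c(0, 1, 2, 0)
a0 <- "A"   # plays the role of B[j,1], the character for code 0
a1 <- "C"   # plays the role of B[j,2], the character for code 1

# First character is B[j,2] only when the code is 2, otherwise B[j,1]
first  <- ifelse(codes == 2, a1, a0)
# Second character is B[j,1] only when the code is 0, otherwise B[j,2]
second <- ifelse(codes == 0, a0, a1)

print(cbind(codes, first, second))
```

Read this way, the three cases reduce to two tests: "is the code 2?" for the first character and "is the code 0?" for the second.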

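C <- matrix("", 2000, 0) # initialization to an empty character matrix

for (j in seq_len(ncol(A))) { # loop over the variables (columns of A)

  a <- A[, j] # codes of variable j; named a to avoid masking c()
  zeros <- which(a == 0)
  ones  <- which(a == 1)
  twos  <- which(a == 2)
  rm(a)

  c1 <- matrix("Z", 2000) # first character, "Z" as placeholder
  c2 <- matrix("Z", 2000) # second character
  c1[zeros] <- B$V1[j]; c2[zeros] <- B$V1[j]
  c1[ones]  <- B$V1[j]; c2[ones]  <- B$V2[j]
  c1[twos]  <- B$V2[j]; c2[twos]  <- B$V2[j]

  C <- cbind(C, c1, c2) # C grows by two columns on every pass
}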

I suspect that a more elegant solution exists.
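
One possible loop-free formulation, given only as a sketch: it uses logical-index selection and replacement on whole matrices and preallocates the result instead of growing it with cbind. The tiny `A` and `B` below are invented stand-ins for the real files (2 individuals, 3 variables), and `B` is assumed to hold character columns, not factors.

```r
# Toy stand-ins for the real data (assumptions)
A <- matrix(c(0, 1, 2,
              2, 0, 1), nrow = 2, byrow = TRUE)  # individuals x variables
B <- data.frame(V1 = c("A", "G", "A"),
                V2 = c("C", "T", "G"),
                stringsAsFactors = FALSE)

n <- nrow(A)
p <- ncol(A)

# Repeat each variable's characters down the rows so that
# element [i, j] of b1/b2 lines up with A[i, j]
b1 <- matrix(B$V1, nrow = n, ncol = p, byrow = TRUE)
b2 <- matrix(B$V2, nrow = n, ncol = p, byrow = TRUE)

# First character: B[j,1] by default, B[j,2] where the code is 2
first <- b1
is2 <- A == 2
first[is2] <- b2[is2]

# Second character: B[j,2] by default, B[j,1] where the code is 0
second <- b2
is0 <- A == 0
second[is0] <- b1[is0]

# Preallocate the result and interleave:
# columns 2j-1 and 2j hold the two characters of variable j
C <- matrix("", nrow = n, ncol = 2 * p)
C[, seq(1, 2 * p, by = 2)] <- first
C[, seq(2, 2 * p, by = 2)] <- second

print(C)
```

The intermediate matrices are each as large as A, so on data of the real size it may be necessary to apply the same steps to blocks of columns instead of all 50 000 at once.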

_______________________
Second, when testing this naive implementation restricted to 6 individuals 
and to variable number 6 (in B), I find that the character strings are 
coerced to numbers.

coding.txt
allele0 allele1
A C
G T
A G
G C
G T
A T


c <- data.frame(x=1:6,y=c(0,1,2,0,1,2))
A <- c$y
zeros <- which(A==0);
ones <- which(A==1);
twos <- which(A==2);
rm(A)

c1 <- matrix("Z",6)
c2 <- matrix("Z",6)

B <- read.table(file="coding.txt", header=TRUE)

c1[zeros] <-  B$allele0[6]; c2[zeros]  <-B$allele0[6]
c1[ones]  <-  B$allele0[6]; c2[ones]   <-B$allele1[6]
c1[twos]  <-  B$allele1[6]; c2[twos]   <-B$allele1[6]

results obtained for c1 and c2 :
 > c1
     [,1]
[1,] "1"
[2,] "1"
[3,] "3"
[4,] "1"
[5,] "1"
[6,] "3"
 > c2
     [,1]
[1,] "1"
[2,] "3"
[3,] "3"
[4,] "1"
[5,] "3"
[6,] "3"
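
The values "1" and "3" are consistent with read.table having stored the allele columns as factors, as the subject line suggests: the assignment would then copy the internal integer level codes ("A" is level 1 of allele0, "T" is level 3 of allele1) and not the labels. A minimal sketch, assuming that is indeed the cause, which forces the columns to be read as character. The inline `text` argument merely stands in for coding.txt, and the vector `y` replaces the data frame column used above.

```r
# Read both columns as character so that no factor is created;
# the inline text is a stand-in for file = "coding.txt"
B <- read.table(text = "allele0 allele1
A C
G T
A G
G C
G T
A T", header = TRUE, colClasses = "character")

y <- c(0, 1, 2, 0, 1, 2)   # codes of the 6 individuals
c1 <- matrix("Z", 6)
c2 <- matrix("Z", 6)

# Same replacements as above, for variable number 6
c1[y == 0] <- B$allele0[6]; c2[y == 0] <- B$allele0[6]
c1[y == 1] <- B$allele0[6]; c2[y == 1] <- B$allele1[6]
c1[y == 2] <- B$allele1[6]; c2[y == 2] <- B$allele1[6]

print(cbind(c1, c2))
```

If the data frame has already been read with factor columns, wrapping each value as in as.character(B$allele0[6]) should have the same effect.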

Thanks in advance for your help.


