# [R] sample and rearrange

David Winsemius dwinsemius at comcast.net
Thu May 20 00:24:21 CEST 2010

```
On May 19, 2010, at 5:01 PM, Wu Gong wrote:

>
> It took me a day to make sense of Jim's code :(
>
> Hope my comments will help.
>
> ## Transform data to matrix
> x <- as.matrix(x)
>
> ## Apply function to each row
> ## Create a function to rearrange bases
> result <- apply(x, 1, function(eachrow){
>
> ## Split each gene to bases
> ## Exclude the first column, which is the id
> 	bases <- strsplit(eachrow[-1], '')
>
> ## Transform list to matrix
> ## Because the result of function strsplit is a list
> 	bases <- do.call(rbind,bases)
>
> ## Recombine bases by connecting all bases in each column
> 	recombine <- apply(bases, 2, paste, collapse="")
>
> ## Transpose recombine
> 	cbind(eachrow, t(recombine))
> })
>
> ## Transpose the result matrix
> result <- t(result)

It will come more quickly as you learn more. I also looked at Jim's
solution by pulling it apart, although I did not spend a whole day at
it, maybe ten minutes. I thought a three line version was more
informative, because it did not make everything scroll off the console:

> x <- read.table(textConnection("SampleID        A1      A2      A3      A4
+  GM920222        GATTGCC GATTGCC GATAGAC GATAGAC
+  GM930040        GTCATCA GAGTGCA ACTATAA GATTGCC
+  GM930040        GTCATCA GAGTGCA ACTATAA GATTGCC"), header=TRUE, as.is=TRUE)
> x <- as.matrix(x)
> t(apply(x, 1, function(.row){
+      # separate characters
+      z <- do.call(rbind, strsplit(.row[-1], ''))
+      # combine each column
+      z.col <- t(apply(z, 2, paste, collapse=''))
+      cbind(.row, z.col)
+  }))
     [,1]       [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
[1,] "GM920222" "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
[2,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
[3,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"

# I usually see if I can get the inner-most function to work:

> z <- do.call(rbind, strsplit(x[1,], ''))
Warning message:
In function (..., deparse.level = 1)  :
  number of columns of result is not a multiple of vector length (arg 2)
> z
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
SampleID "G"  "M"  "9"  "2"  "0"  "2"  "2"  "2"
A1       "G"  "A"  "T"  "T"  "G"  "C"  "C"  "G"
A2       "G"  "A"  "T"  "T"  "G"  "C"  "C"  "G"
A3       "G"  "A"  "T"  "A"  "G"  "A"  "C"  "G"
A4       "G"  "A"  "T"  "A"  "G"  "A"  "C"  "G"

# So I guess I didn't get an exact replica, since Jim had excluded the
# first element in the row. The 8-character ID forces rbind to recycle
# the 7-character sequences, hence the warning and the extra "G" in [,8].

> z <- do.call(rbind, strsplit(x[1,-1], ''))  # there ... cleaner
> z
   [,1] [,2] [,3] [,4] [,5] [,6] [,7]
A1 "G"  "A"  "T"  "T"  "G"  "C"  "C"
A2 "G"  "A"  "T"  "T"  "G"  "C"  "C"
A3 "G"  "A"  "T"  "A"  "G"  "A"  "C"
A4 "G"  "A"  "T"  "A"  "G"  "A"  "C"
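
## A minimal sketch (not from the original session; `seqs` is a made-up
## stand-in for x[1,-1]) of the two steps inside that call: strsplit()
## returns a list with one character vector per sequence, and
## do.call(rbind, ...) stacks those vectors as the rows of a matrix.
seqs <- c(A1 = "GATTGCC", A2 = "GATAGAC")
parts <- strsplit(seqs, '')    # list of length 2, named A1 and A2
is.list(parts)
z2 <- do.call(rbind, parts)    # one row per sequence, one column per base
dim(z2)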

That helped me understand what was going on in the middle of the
function. Now I wondered if the transpose could be avoided, so I tried
cbind in place of rbind:

> z <- do.call(cbind, strsplit(x[1,-1], ''))
> z
     A1  A2  A3  A4
[1,] "G" "G" "G" "G"
[2,] "A" "A" "A" "A"
[3,] "T" "T" "T" "T"
[4,] "T" "T" "A" "A"
[5,] "G" "G" "G" "G"
[6,] "C" "C" "A" "A"
[7,] "C" "C" "C" "C"
> z.col <- apply(z, 2, paste, collapse='')
> z.col
       A1        A2        A3        A4
"GATTGCC" "GATTGCC" "GATAGAC" "GATAGAC"

## Nope, that does not work; it just rebuilds the original sequences.
## So try apply on the rows instead ...
> z.col <- apply(z, 1, paste, collapse='')
> z.col
[1] "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"

## OK that worked. Now see if it works inside the whole sequence:

> x <- as.matrix(x)
> t(apply(x, 1, function(.row){
+      # separate characters
+      z <- do.call(cbind, strsplit(.row[-1], ''))
+      # combine each row (one row per base position)
+      z.col <- apply(z, 1, paste, collapse='')
+      cbind(.row, z.col)
+  }))
     [,1]       [,2]       [,3]       [,4]       [,5]       [,6]       [,7]
[1,] "GM920222" "GM920222" "GM920222" "GM920222" "GM920222" "GM920222" "GM920222"
[2,] "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040"
[3,] "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040"
     [,8]   [,9]   [,10]  [,11]  [,12]  [,13]  [,14]
[1,] "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
[2,] "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
[3,] "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"

Well, not exactly. Try again with z.col transposed:

> x <- as.matrix(x)
> t(apply(x, 1, function(.row){
+      # separate characters
+      z <- do.call(cbind, strsplit(.row[-1], ''))
+      # combine each row (one row per base position)
+      z.col <- apply(z, 1, paste, collapse='')
## and add the transpose columns:
+      cbind(.row, t(z.col))
+  }))
     [,1]       [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
[1,] "GM920222" "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
[2,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
[3,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"

So I got to the same place but didn't really achieve any savings.
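
## A minimal sketch (not from the original session) of why the t() is
## needed: cbind() treats a plain vector as a column, so a single ID is
## recycled down seven rows; t() turns z.col into a 1-row matrix first.
## The literal ID below stands in for the first element of .row.
z.col <- c("GGGG", "AAAA", "TTTT", "TTAA", "GGGG", "CCAA", "CCCC")
tall <- cbind("GM920222", z.col)     # 7 x 2: ID repeated in column 1
wide <- cbind("GM920222", t(z.col))  # 1 x 8: ID followed by the 7 strings
dim(tall)
dim(wide)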

>
> -----
> A R learner.

David "also still learning" Winsemius, MD
West Hartford, CT

```
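
For readers who want the end result without the console prompts, here is a minimal consolidated sketch of the same idea. The helper name `recombine_bases` is hypothetical (it does not appear in the thread), the sample data are retyped from the message above, and the sketch assumes every sequence in a row has the same length.

```r
# Sample data retyped from the thread (built directly instead of read.table)
x <- data.frame(
  SampleID = c("GM920222", "GM930040", "GM930040"),
  A1 = c("GATTGCC", "GTCATCA", "GTCATCA"),
  A2 = c("GATTGCC", "GAGTGCA", "GAGTGCA"),
  A3 = c("GATAGAC", "ACTATAA", "ACTATAA"),
  A4 = c("GATAGAC", "GATTGCC", "GATTGCC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one row of sequences, paste together the
# i-th base of every sequence (assumes equal-length sequences).
recombine_bases <- function(seqs) {
  bases <- do.call(cbind, strsplit(seqs, ""))  # base positions x sequences
  apply(bases, 1, paste, collapse = "")        # one string per base position
}

# Keep the ID column out of the character splitting, then bind it back on.
seq_cols <- as.matrix(x[, -1])
result <- cbind(x$SampleID, t(apply(seq_cols, 1, recombine_bases)))
result
```

Splitting off the ID column before calling `apply` means the helper only ever sees sequences, so no `-1` indexing or extra `t()` is needed inside it; the single outer `t()` is still required because `apply` returns each row's result as a column.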