[R] sample and rearrange

jim holtman jholtman at gmail.com
Wed May 19 14:53:42 CEST 2010


try this:

> x <- read.table(textConnection("SampleID        A1      A2      A3      A4
+ GM920222        GATTGCC GATTGCC GATAGAC GATAGAC
+ GM930040        GTCATCA GAGTGCA ACTATAA GATTGCC
+ GM930040        GTCATCA GAGTGCA ACTATAA GATTGCC
+ GM960023        GATTGCC GTCATCA GATTGCC GATTGCC
+ GM920224        ACTAGAA GTCATCA GTCATCA ACTAGAA
+ GM920224        ACTAGAA GTCATCA GTCATCA ACTAGAA
+ GM920034        GATTGCC GTCATCA GATTGCA GATTGCA
+ GM920096        GATTGCC GATTGCC GATTGCA GATTGCC
+ GM930029        GTCATCA GATTGCC GTCATCA GATTGCC
+ GM940031        GATTGCC GAGTGCA GATTGCA ACTAGAA
+ GM960028        GATTGCC GAGTGCA GATTGCA ACTAGAA
+ GM980007        GTCATCA GATTGCC ACTTGAA GTCATCA
+ GM970009        ACTAGAA GTCAGAA GTCAGCA ACTAGCA
+ GM930026        ACTAGAA GAGTGCA GAGTGCA ACTAGAA
+ GM920031        GATTGCC GTCATCA GATTGCC GATTGCC
+ GM990105        GATTGCC GATTGCC GTCAGCA GTCAGCA
+ GM920202        GATTGCC GATTGCC GATTGCC GATTGCC
+ GM920089        GAGTGCA GTCAGAA ACTATCA GATTGCC
+ GM980051        ACTAGAA ACTAGAA GATAGCC GATAGCC
+ GM930109        GTCATCA GAGTGCA GTTTTAA ACTAGAA
+ GM940039        GTCATCA GAGTGCA GTTTGCC ACTTTCA
+ GM050099        GAGTGCA GTCAGAA GTTATCC ACTTTCA
+ GM050099        GAGTGCA GTCAGAA GTTATCC ACTTTCA
+ GM030005        ACTAGAA GAGTGCA ACTAGAA ACTAGAA
+ GM050009        ACTAGAA GATTGCC GATTGCC ACTAGAA
+ GM990027        GATTGCC GAGTGCA GATTGCA GATTGCC"), header=TRUE, as.is=TRUE)
> x <- as.matrix(x)
> t(apply(x, 1, function(.row){
+     # separate characters
+     z <- do.call(rbind, strsplit(.row[-1], ''))
+     # combine each column
+     z.col <- t(apply(z, 2, paste, collapse=''))
+     # add the ID
+     cbind(.row[1], z.col)
+ }))
      [,1]       [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
 [1,] "GM920222" "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
 [2,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
 [3,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
 [4,] "GM960023" "GGGG" "ATAA" "TCTT" "TATT" "GTGG" "CCCC" "CACC"
 [5,] "GM920224" "AGGA" "CTTC" "TCCT" "AAAA" "GTTG" "ACCA" "AAAA"
 [6,] "GM920224" "AGGA" "CTTC" "TCCT" "AAAA" "GTTG" "ACCA" "AAAA"
 [7,] "GM920034" "GGGG" "ATAA" "TCTT" "TATT" "GTGG" "CCCC" "CAAA"
 [8,] "GM920096" "GGGG" "AAAA" "TTTT" "TTTT" "GGGG" "CCCC" "CCAC"
 [9,] "GM930029" "GGGG" "TATA" "CTCT" "ATAT" "TGTG" "CCCC" "ACAC"
[10,] "GM940031" "GGGA" "AAAC" "TGTT" "TTTA" "GGGG" "CCCA" "CAAA"
[11,] "GM960028" "GGGA" "AAAC" "TGTT" "TTTA" "GGGG" "CCCA" "CAAA"
[12,] "GM980007" "GGAG" "TACT" "CTTC" "ATTA" "TGGT" "CCAC" "ACAA"
[13,] "GM970009" "AGGA" "CTTC" "TCCT" "AAAA" "GGGG" "AACC" "AAAA"
[14,] "GM930026" "AGGA" "CAAC" "TGGT" "ATTA" "GGGG" "ACCA" "AAAA"
[15,] "GM920031" "GGGG" "ATAA" "TCTT" "TATT" "GTGG" "CCCC" "CACC"
[16,] "GM990105" "GGGG" "AATT" "TTCC" "TTAA" "GGGG" "CCCC" "CCAA"
[17,] "GM920202" "GGGG" "AAAA" "TTTT" "TTTT" "GGGG" "CCCC" "CCCC"
[18,] "GM920089" "GGAG" "ATCA" "GCTT" "TAAT" "GGTG" "CACC" "AAAC"
[19,] "GM980051" "AAGG" "CCAA" "TTTT" "AAAA" "GGGG" "AACC" "AACC"
[20,] "GM930109" "GGGA" "TATC" "CGTT" "ATTA" "TGTG" "CCAA" "AAAA"
[21,] "GM940039" "GGGA" "TATC" "CGTT" "ATTT" "TGGT" "CCCC" "AACA"
[22,] "GM050099" "GGGA" "ATTC" "GCTT" "TAAT" "GGTT" "CACC" "AACA"
[23,] "GM050099" "GGGA" "ATTC" "GCTT" "TAAT" "GGTT" "CACC" "AACA"
[24,] "GM030005" "AGAA" "CACC" "TGTT" "ATAA" "GGGG" "ACAA" "AAAA"
[25,] "GM050009" "AGGA" "CAAC" "TTTT" "ATTA" "GGGG" "ACCA" "ACCA"
[26,] "GM990027" "GGGG" "AAAA" "TGTT" "TTTT" "GGGG" "CCCC" "CAAC"


On Wed, May 19, 2010 at 8:29 AM, Laetitia Schmid <laetitia at gmt.su.se> wrote:
> Dear Wu Gong and Peter Ehlers,
> thank you very much for your help debugging my script.
>
> Now I have a general following up question:
> Is there a straightforward way to rearrange the following dataset so that
> all first letters of each column will be combined in one column, all the
> second letters in a second column, all the third ones in a third column and
> so on, resulting in 7 columns,
> i.e. for the first individual (GM920222) GGGG AAAA TTTT TTAA GGGG CCAA CCCC
> ?
>
> Thank you very much,
> Laetitia
>
> SampleID        A1      A2      A3      A4
> GM920222        GATTGCC GATTGCC GATAGAC GATAGAC
> GM930040        GTCATCA GAGTGCA ACTATAA GATTGCC
> GM930040        GTCATCA GAGTGCA ACTATAA GATTGCC
> GM960023        GATTGCC GTCATCA GATTGCC GATTGCC
> GM920224        ACTAGAA GTCATCA GTCATCA ACTAGAA
> GM920224        ACTAGAA GTCATCA GTCATCA ACTAGAA
> GM920034        GATTGCC GTCATCA GATTGCA GATTGCA
> GM920096        GATTGCC GATTGCC GATTGCA GATTGCC
> GM930029        GTCATCA GATTGCC GTCATCA GATTGCC
> GM940031        GATTGCC GAGTGCA GATTGCA ACTAGAA
> GM960028        GATTGCC GAGTGCA GATTGCA ACTAGAA
> GM980007        GTCATCA GATTGCC ACTTGAA GTCATCA
> GM970009        ACTAGAA GTCAGAA GTCAGCA ACTAGCA
> GM930026        ACTAGAA GAGTGCA GAGTGCA ACTAGAA
> GM920031        GATTGCC GTCATCA GATTGCC GATTGCC
> GM990105        GATTGCC GATTGCC GTCAGCA GTCAGCA
> GM920202        GATTGCC GATTGCC GATTGCC GATTGCC
> GM920089        GAGTGCA GTCAGAA ACTATCA GATTGCC
> GM980051        ACTAGAA ACTAGAA GATAGCC GATAGCC
> GM930109        GTCATCA GAGTGCA GTTTTAA ACTAGAA
> GM940039        GTCATCA GAGTGCA GTTTGCC ACTTTCA
> GM050099        GAGTGCA GTCAGAA GTTATCC ACTTTCA
> GM050099        GAGTGCA GTCAGAA GTTATCC ACTTTCA
> GM030005        ACTAGAA GAGTGCA ACTAGAA ACTAGAA
> GM050009        ACTAGAA GATTGCC GATTGCC ACTAGAA
> GM990027        GATTGCC GAGTGCA GATTGCA GATTGCC
> GM990066        GATTGCC GTCATCA GTCATCA GATTGCC
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



More information about the R-help mailing list