[R] Creating submatrices from a dataframe, depending on factors in sample names

Bert Gunter gunter.berton at gene.com
Mon Dec 1 18:46:03 CET 2014


I do not have the patience to study your request carefully, but does
the following help?

> a <- 1:3
> x <- outer(a,a,paste,sep=".")
> x
     [,1]  [,2]  [,3]
[1,] "1.1" "1.2" "1.3"
[2,] "2.1" "2.2" "2.3"
[3,] "3.1" "3.2" "3.3"
> x[upper.tri(x)]
[1] "1.2" "1.3" "2.3"

> x[upper.tri(x,diag=TRUE)]
[1] "1.1" "1.2" "2.2" "1.3" "2.3" "3.3"

This gives you a vector of all possible pairs of values of a (excluding
or including identical pairs, respectively), which you could then loop
over as an index to do what you want, I think.
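
A rough sketch of such a loop, under some assumptions: the matrix m, the
sample names in ids, and the result name pair_values are made up for
illustration and are not from the original post. The group label is taken
to be whatever follows the last underscore. It uses which(..., arr.ind =
TRUE) on upper.tri() to get the group pairs as index pairs rather than as
pasted strings; this is one possible way to do it, not the only one.

```r
# Stand-in symmetric similarity matrix (hypothetical names and values).
ids <- c("A_1", "B_1", "C_2", "D_2", "E_3")
m <- 100 - abs(outer(1:5, 1:5, "-"))
diag(m) <- NA
dimnames(m) <- list(ids, ids)

# Group label = everything after the last underscore.
grp <- sub(".*_", "", ids)
g <- unique(grp)

# All unordered pairs of groups, including each group paired with itself.
pairs <- which(upper.tri(diag(length(g)), diag = TRUE), arr.ind = TRUE)

pair_values <- lapply(seq_len(nrow(pairs)), function(k) {
  i <- pairs[k, 1]
  j <- pairs[k, 2]
  block <- m[grp == g[i], grp == g[j], drop = FALSE]
  # Within a group, keep each pair once; between groups, keep every cell.
  if (i == j) block[upper.tri(block)] else as.vector(block)
})
names(pair_values) <- paste0("_", g[pairs[, 1]], " vs _", g[pairs[, 2]])
```

With real data, m would be as.matrix(df) and ids would be names(df). A
group with a single sample yields an empty vector for its within-group
entry, since there are no pairs to report.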

If this is not what you want, just ignore without replying.

Cheers,
Bert


Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Mon, Dec 1, 2014 at 8:47 AM, Tim Richter-Heitmann
<trichter at uni-bremen.de> wrote:
> Hello there,
>
> this is a cross-post of a Stack Overflow question that wasn't answered but
> is very important for my work. Apologies for breaking any rules, but I do
> hope for some help from the list instead:
>
> I have a huge matrix of pairwise similarity percentages between different
> samples. The samples belong to groups. The groups are determined by
> the suffix "_n" in the row.names/header names.
> In the first step, I wanted to create submatrices consisting of all pairs
> within single groups (i.e. for all samples from "_1").
> However, I realized that I need to know all pairwise submatrices, between
> all combinations of groups. So, I want to create (a list of) vectors that are
> named "_n1 vs _n2" (or similar) for all combinations of n, as illustrated by
> the colored rectangles:
>
> http://i.stack.imgur.com/XMkxj.png
>
> Reproducible code, as provided by helpful Stack Overflow members, dealing
> with identical "_n"s.
>
>
>         df <- structure(list(
>           HQ673618_1 = c(NA, 90.8, 89.8, 89.6, 89.8, 88.9, 87.8, 88.2, 88.3),
>           HQ674317_1 = c(90.8, NA, 98.6, 97.7, 98.4, 97.4, 94.9, 96.2, 95.1),
>           EU686630_1 = c(89.8, 98.6, NA, 98.4, 98.9, 97.7, 95.4, 96.4, 95.8),
>           EU686593_2 = c(89.6, 97.7, 98.4, NA, 98.1, 96.8, 94.4, 95.6, 94.8),
>           JN166322_2 = c(89.8, 98.4, 98.9, 98.1, NA, 97.5, 95.3, 96.5, 95.9),
>           EU491340_2 = c(88.9, 97.4, 97.7, 96.8, 97.5, NA, 96.5, 97.7, 96),
>           AB694259_3 = c(87.8, 94.9, 95.4, 94.4, 95.3, 96.5, NA, 98.3, 95.9),
>           AB694258_3 = c(88.2, 96.2, 96.4, 95.6, 96.5, 97.7, 98.3, NA, 95.8),
>           AB694462_3 = c(88.3, 95.1, 95.8, 94.8, 95.9, 96, 95.9, 95.8, NA)),
>           .Names = c("HQ673618_1", "HQ674317_1", "EU686630_1",
>                      "EU686593_2", "JN166322_2", "EU491340_2",
>                      "AB694259_3", "AB694258_3", "AB694462_3"),
>           class = "data.frame",
>           row.names = c("HQ673618_1", "HQ674317_1", "EU686630_1",
>                         "EU686593_2", "JN166322_2", "EU491340_2",
>                         "AB694259_3", "AB694258_3", "AB694462_3"))
>
>
>         indx <- gsub(".*_", "", names(df))
>         sub.matrices <- lapply(unique(indx), function(x) {
>           temp <- which(indx %in% x)
>           df[temp, temp]
>         })
>         unique_values <- lapply(sub.matrices, function(x) x[upper.tri(x)])
>         names(unique_values) <- unique(indx)
>
> This code needs to be expanded to form sub.matrices for any combination of
> unique indices in indx.
>
>
> Thank you so much!
>
>
>
>
> --
> Tim Richter-Heitmann (M.Sc.)
> PhD Candidate
>
>
>
> International Max-Planck Research School for Marine Microbiology
> University of Bremen
> Microbial Ecophysiology Group (AG Friedrich)
> FB02 - Biologie/Chemie
> Leobener Straße (NW2 A2130)
> D-28359 Bremen
> Tel.: 0049(0)421 218-63062
> Fax: 0049(0)421 218-63069
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


