[R] wanting to count instances of values in each cell of a series of simulated symmetric matrices of the same size

Wed Jun 2 07:05:54 CEST 2021

Bert, 

You are obviously correct about the diagonals; I was not thinking carefully. Typically they are expected to be at or near 0.5 in an outbred population, but they can theoretically go to 1.0 in a completely inbred population. This example was subset from a made-up pedigree in which I reused parents, hence the inbreeding.
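The 0.5 to 1.0 range follows from the standard relation between an individual's kinship with itself, $\phi_{ii}$, and its inbreeding coefficient, $F_i$:

```latex
\phi_{ii} = \tfrac{1}{2}\left(1 + F_i\right), \qquad 0 \le F_i \le 1 \;\Longrightarrow\; 0.5 \le \phi_{ii} \le 1
```

Inverting it gives $F_i = 2\phi_{ii} - 1$. So the diagonal entry 0.6250 for F and G in the first example matrix corresponds to an inbreeding coefficient of 0.25, and 0.5625 in the second matrix corresponds to 0.125.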

Space is a concern, because as the matrices (breeding populations) grow, I will need to simulate more of them to achieve the same precision. However, I will certainly look into your suggestion. I am doing this on the side, so it may take a few days.
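One way to keep memory proportional to the number of distinct values, rather than to the number of simulated matrices, is to fold each matrix into a running tally as soon as it is simulated. The sketch below assumes every simulation returns a symmetric kinship matrix with the same row and column order; tallyUpperTri() and tallyToList() are hypothetical helpers, not functions from nprcgenekeepr.

```r
## Hypothetical helpers (not part of nprcgenekeepr): fold each simulated
## kinship matrix into a running tally instead of storing all matrices.
## Memory grows with the number of distinct (cell, value) pairs seen,
## not with the number of simulations.

tallyUpperTri <- function(kmat, tally = integer(0)) {
  v <- kmat[upper.tri(kmat)]                 # off-diagonal cells, column-major order
  keys <- paste(seq_along(v), v, sep = "|")  # one "cell|value" key per cell
  newKeys <- setdiff(keys, names(tally))
  tally[newKeys] <- 0L                       # first sighting of a cell/value pair
  tally[keys] <- tally[keys] + 1L            # keys are unique within one matrix
  tally
}

## Decode the keys into a list with one named count vector per cell
tallyToList <- function(tally) {
  cell  <- as.integer(sub("\\|.*$", "", names(tally)))
  value <- sub("^.*\\|", "", names(tally))
  split(setNames(unname(tally), value), cell)
}

## Usage with two tiny stand-in matrices (in practice, one per simulation)
m1 <- matrix(c(0.5, 0.25, 0.25, 0.5), nrow = 2)
m2 <- matrix(c(0.5, 0.125, 0.125, 0.5), nrow = 2)
tally <- integer(0)
for (m in list(m1, m1, m2)) {
  tally <- tallyUpperTri(m, tally)
}
tallyToList(tally)
```

Keying on a "cell|value" string lets each matrix be counted with vectorized indexing. The trade-off is that values are compared as printed to 15 significant digits, which should be harmless for kinship values like those shown here.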

Thank you for your kind attention. I will provide more definitive feedback later.

Mark
R. Mark Sharp, Ph.D.
Data Scientist and Biomedical Statistical Consultant
7526 Meadow Green St.
San Antonio, TX 78251
mobile: 210-218-2868
rmsharp using me.com

> On Jun 1, 2021, at 10:44 PM, Bert Gunter <bgunter.4567 using gmail.com> wrote:
> 
> Come again?! The diagonal values in your example are not all .5.
> 
> If space is not an issue, a straightforward approach is to collect all the matrices into a 3d array and use indexing.
> Here is a simple reprex (as you did not provide one in a convenient form, e.g., via dput()):
> 
> x <- matrix(1:9, nrow = 3); y <- x + 10
> diag(x) <- diag(y) <- 0
> print(x) ; print(y)
> ## Now you need to populate a 3 x 3 x 2 array with these matrices
> ## How you do this depends on your naming conventions
> ## You might use a loop, or ls() and assign(),
> ##  or collect your matrices into a list and use do.call() or ...
> ## You will *not* want to do this if you have lots of matrices:
> list_of_mats <- list(x,y) 
> arr <- array(do.call(c,list_of_mats), dim = c(3,3,length(list_of_mats)))
> arr
> arr[2,3,] ## all the values in the [2,3] cell of the matrices; do whatever you want with them.
> 
> Cheers,
> Bert
> 
> 
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
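Building on the 3-d array approach above, a list holding one table() per upper-triangle cell sidesteps the fact that cells can have different numbers of distinct values. A minimal sketch reusing the toy x and y, with x repeated so that some counts exceed 1; the names mats, idx and cellCounts are illustrative only:

```r
## Rebuild the small example: two 3 x 3 matrices, with x repeated so that
## some cells see the same value more than once
x <- matrix(1:9, nrow = 3); y <- x + 10
diag(x) <- diag(y) <- 0
mats <- list(x, y, x)
arr <- array(unlist(mats), dim = c(dim(x), length(mats)))

## One table() per upper-triangle cell; a list copes with cells that have
## different numbers of distinct values
idx <- which(upper.tri(arr[, , 1]), arr.ind = TRUE)
cellCounts <- lapply(seq_len(nrow(idx)),
                     function(k) table(arr[idx[k, 1], idx[k, 2], ]))
names(cellCounts) <- paste(idx[, 1], idx[, 2], sep = ",")
cellCounts[["1,2"]]
```

Here cellCounts[["1,2"]] shows the value 4 twice and 14 once. Because every matrix is stored in arr, this suits the case where space is not an issue.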
> 
> 
> On Tue, Jun 1, 2021 at 7:00 PM R. Mark Sharp via R-help <r-help using r-project.org> wrote:
> I want to capture the entire distribution of values for each cell in a sequence of symmetric matrices of the same size. The diagonal values are all 0.5, so I need only the values above (or below) the diagonal. 
> 
> A small example with three of the matrices I want to count follows:
>        F      G      H      I     J
> F 0.6250 0.3750 0.2500 0.1875 0.125
> G 0.3750 0.6250 0.2500 0.1875 0.125
> H 0.2500 0.2500 0.5000 0.1875 0.125
> I 0.1875 0.1875 0.1875 0.5000 0.250
> J 0.1250 0.1250 0.1250 0.2500 0.500
> 
>        F      G      H      I     J
> F 0.5625 0.3125 0.1875 0.1250 0.125
> G 0.3125 0.5625 0.1875 0.1250 0.125
> H 0.1875 0.1875 0.5000 0.1875 0.125
> I 0.1250 0.1250 0.1875 0.5000 0.250
> J 0.1250 0.1250 0.1250 0.2500 0.500
> 
>         F       G      H       I      J
> F 0.50000 0.25000 0.1250 0.09375 0.0625
> G 0.25000 0.50000 0.1250 0.09375 0.0625
> H 0.12500 0.12500 0.5000 0.18750 0.1250
> I 0.09375 0.09375 0.1875 0.50000 0.2500
> J 0.06250 0.06250 0.1250 0.25000 0.5000
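The three matrices above can be entered in R so that only the off-diagonal upper triangle is kept, giving one row per pair of individuals and one column per matrix. A minimal sketch; asKin(), cells and the "F-G" style row labels are hypothetical names chosen for illustration:

```r
## The three example matrices; they are symmetric, so entering the rows
## as printed gives the same matrix as entering the columns
ids <- c("F", "G", "H", "I", "J")
## Hypothetical helper: 5 x 5 matrix labelled with the individual IDs
asKin <- function(v) matrix(v, nrow = 5, dimnames = list(ids, ids))
k1 <- asKin(c(0.6250, 0.3750, 0.2500, 0.1875, 0.125,
              0.3750, 0.6250, 0.2500, 0.1875, 0.125,
              0.2500, 0.2500, 0.5000, 0.1875, 0.125,
              0.1875, 0.1875, 0.1875, 0.5000, 0.250,
              0.1250, 0.1250, 0.1250, 0.2500, 0.500))
k2 <- asKin(c(0.5625, 0.3125, 0.1875, 0.1250, 0.125,
              0.3125, 0.5625, 0.1875, 0.1250, 0.125,
              0.1875, 0.1875, 0.5000, 0.1875, 0.125,
              0.1250, 0.1250, 0.1875, 0.5000, 0.250,
              0.1250, 0.1250, 0.1250, 0.2500, 0.500))
k3 <- asKin(c(0.50000, 0.25000, 0.1250, 0.09375, 0.0625,
              0.25000, 0.50000, 0.1250, 0.09375, 0.0625,
              0.12500, 0.12500, 0.5000, 0.18750, 0.1250,
              0.09375, 0.09375, 0.1875, 0.50000, 0.2500,
              0.06250, 0.06250, 0.1250, 0.25000, 0.5000))

## Keep only the upper triangle: one row per pair, one column per matrix
pairIdx <- which(upper.tri(k1), arr.ind = TRUE)
cells <- sapply(list(k1, k2, k3), function(k) k[upper.tri(k)])
rownames(cells) <- paste(ids[pairIdx[, "row"]], ids[pairIdx[, "col"]], sep = "-")
table(cells["F-G", ])
```

With this layout, table(cells["F-G", ]) reports 0.25, 0.3125 and 0.375 once each, while the I-J pair has 0.25 in all three matrices. Calling dput(cells) would produce the pasteable form Bert asked for.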
> 
> 
> To be more specific, I have coded a solution for a single cell, with the sequence of values (one from each matrix) held in a vector. 
> 
> I used match() below, and it works with a matrix, but I do not know how to do what the if/else branches do with matrices. Since both the number of distinct values and the values themselves will differ among cells, a simple array structure does not seem appropriate. I assume I will need to use a list, but I would like to do as much as I can with matrices for speed and clarity.
> 
> #' Counts the number of occurrences of each kinship value seen for a pair of
> #' individuals.
> #'
> #' @examples
> #' \donttest{
> #' set.seed(20210529)
> #' kSamples <- sample(c(0, 0.0625, 0.125, 0.25, 0.5, 0.75), 10000, replace = TRUE,
> #'                    prob = c(0.005, 0.3, 0.15, 0.075, 0.0375, 0.01875))
> #' kVC <- list(kinshipValues = numeric(0),
> #'             kinshipCounts = numeric(0))
> #' for (kSample in kSamples) {
> #'   kVC <- countKinshipValues(kSample, kVC$kinshipValues, kVC$kinshipCounts)
> #' }
> #' kVC
> #' ## $kinshipValues
> #' ## [1] 0.2500 0.1250 0.0625 0.7500 0.5000 0.0000
> #' ##
> #' ## $kinshipCounts
> #' ## [1]  301 2592 5096 1322  592   97
> #' }
> #'
> #' @param kValue numeric value being counted (kinship value in
> #' \emph{nprcgenekeepr})
> #' @param kinshipValues vector of unique values of \code{kValue} seen
> #' thus far.
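> #' @param kinshipCounts vector of the counts of the unique values of
> #' \code{kValue} seen thus far.
> #' @return A list with elements \code{kinshipValues} and
> #' \code{kinshipCounts}, updated to include \code{kValue}.
> #' @export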
> countKinshipValues <- function(kValue, kinshipValues = numeric(0),
>                               kinshipCounts = numeric(0)) {
>   kinshipValue <- match(kValue, kinshipValues, nomatch = -1L)
>   if (kinshipValue == -1L) {
>     kinshipValues <- c(kinshipValues, kValue)
>     kinshipCounts[length(kinshipCounts) + 1] <- 1
>   } else {
>     kinshipCounts[kinshipValue] <- kinshipCounts[kinshipValue] + 1
>   }
>   list(kinshipValues = kinshipValues,
>        kinshipCounts = kinshipCounts)
> }
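Once all values for a cell are available as one vector (for example, one row of a cells-by-simulations matrix), the if/else bookkeeping in countKinshipValues() can be replaced by unique(), match() and tabulate(). A minimal sketch; countKinshipValuesVec() and countAllCells() are hypothetical names, not part of nprcgenekeepr:

```r
## Hypothetical vectorized counterpart to countKinshipValues(): counts every
## value for one cell in a single pass instead of one call per value.
## Values are kept in first-seen order, as in the incremental version.
countKinshipValuesVec <- function(kValues) {
  kinshipValues <- unique(kValues)
  kinshipCounts <- tabulate(match(kValues, kinshipValues),
                            nbins = length(kinshipValues))
  list(kinshipValues = kinshipValues, kinshipCounts = kinshipCounts)
}

## Apply it to every cell of a cells-by-simulations matrix (one row per cell);
## the result is a list because cells can differ in their number of values
countAllCells <- function(cellMat) {
  lapply(seq_len(nrow(cellMat)),
         function(i) countKinshipValuesVec(cellMat[i, ]))
}

cellMat <- rbind(c(0.25, 0.125, 0.25, 0),
                 c(0.5, 0.5, 0.5, 0.5))
countAllCells(cellMat)
```

As in the loop version, match() compares doubles exactly. That suits values built from halvings, such as the kinship values shown here, which are exactly representable in binary floating point. The counts come back as integers rather than doubles.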
> 
> Mark
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
