[R] Help on MFAmix to Reduce Dimensions for a Genetic Dataset

Fri Jul 5 16:25:54 CEST 2019

Please read the Posting Guide (again?).

1) Most attachments are stripped by the mailing list. Yours was. It is far more practical to embed a small subset of data in the email as the output of dput() than to attach files.
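For example, here is a minimal sketch of what that looks like. The matrix is invented because your attachment did not come through; substitute a few rows and columns of your own data.

```r
# Hypothetical stand-in for the poster's 'geno' matrix (values invented):
# rows are hybrids, columns are markers coded 1, 0, -1 or NA.
# matrix() fills column by column, so each line below is one marker column.
geno <- matrix(c( 1,  0, -1, NA,  1,  0,
                  0,  0,  1, -1, NA,  1,
                  1,  1,  1,  1,  1,  1,
                 -1,  1,  0,  0,  1, NA),
               nrow = 6,
               dimnames = list(paste0("hybrid", 1:6), paste0("marker", 1:4)))

# Keep the example small: a few rows and columns are usually enough to
# reproduce an error.
small <- geno[1:4, 1:3]

# Print the subset as R code; paste this output into the body of the email.
dput(small)

# Sanity check: the printed text rebuilds an identical object.
recreated <- eval(parse(text = deparse(small)))
```

Whoever reads the email can then paste the dput() output into their own session and get exactly the object you have.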

2) This is the R-help mailing list, but your question has no R code in it. That is a strong indicator that it may be off-topic. Another is that you seem to be asking how to use a specialized contributed package (PCAmixdata? You did not say...) rather than about the R programming language. (Would you ask for help on nuclear physics in an English class?)

Just from your description, I suspect you may need to familiarize yourself with factors and what they are used for in R. Both the genetic marker columns (coded 1, 0, -1) and the groups vector may be candidates for factors, though I know nearly nothing about genetic data analysis, so that could be incorrect.
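As a rough sketch of what converting the markers to factors might look like (the data frame is invented; whether categorical coding is right for your analysis is an assumption you should check):

```r
# Hypothetical marker data (values invented): one column per marker,
# one row per hybrid, coded -1, 0, 1 or NA.
geno <- data.frame(
  marker1 = c(1, 0, -1, NA, 1),
  marker2 = c(0, 0, 1, -1, NA),
  marker3 = c(-1, 1, 0, 0, 1),
  row.names = paste0("hybrid", 1:5)
)

# Convert every column to a factor with the same three levels, so the
# codes are treated as categories rather than as numbers. Missing values
# stay NA; they do not become a level.
geno_f <- geno
geno_f[] <- lapply(geno, factor, levels = c(-1, 0, 1))

str(geno_f)
table(geno_f$marker1, useNA = "ifany")
```

Giving every column the same explicit levels means a marker that happens never to show -1 in your sample still has the same three categories as the others. Assigning with `geno_f[] <-` keeps the data frame structure and row names intact.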

Also (possibly incorrect, as I don't know this package), giving every column its own unique value (seq) in an input named "groups" seems counter to the typical usage of such a vector, which assigns several columns to the same group. Have you read the vignette [1]? It does not use unique values for groups.
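To illustrate the difference, a sketch only: as I read the PCAmixdata documentation, groups holds one integer per column saying which block it belongs to, and name.groups holds one label per block. The chromosome labels below are invented, and whether that grouping makes sense for your data is for someone who knows genetics to say.

```r
# Hypothetical marker names, standing in for colnames(geno).
markers <- paste0("marker", 1:6)

# What the original post did: every column is its own group, so there are
# as many groups as columns and name.groups is just the column names.
groups_each <- seq(1, length(markers))
names_each  <- markers

# A hypothetical alternative: columns assigned to a few blocks. The
# chromosome labels below are invented purely for illustration.
chrom <- c("chr1", "chr1", "chr2", "chr2", "chr2", "chr3")
block <- factor(chrom)
groups_block <- as.integer(block)   # one integer per column: 1 1 2 2 2 3
names_block  <- levels(block)       # one name per group: "chr1" "chr2" "chr3"

# In both cases: one group index per column, one name per distinct group.
stopifnot(length(groups_block) == length(markers),
          length(names_block) == length(unique(groups_block)))
table(groups_block)
```

Note that this is also a case where a factor does the bookkeeping for you: the integer codes and the level names stay consistent with each other.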

As for your error: remove the constant columns if the analysis won't support leaving them in. You may be able to re-introduce them later, but really, if you are reducing dimensionality, aren't constant columns the most obvious candidates for removal?
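A minimal sketch of setting the constant columns aside rather than discarding them, so they can be re-attached afterwards. The data are invented, and treating NA as "not a value" when deciding whether a column is constant is my assumption; the package may count missing values differently.

```r
# Hypothetical marker data (values invented).
geno <- data.frame(
  marker1 = c(1, 0, -1, NA, 1),
  marker2 = c(1, 1, 1, 1, 1),      # the same value in every row
  marker3 = c(0, NA, 0, 0, NA),    # constant apart from missing values
  marker4 = c(-1, 1, 0, 0, 1)
)

# Flag a column as constant if it has at most one distinct non-missing value.
# Treating NA this way is an assumption; the package may count NA differently.
is_constant <- vapply(geno, function(x) length(unique(x[!is.na(x)])) <= 1,
                      logical(1))

# Set the constant columns aside (so they can be re-attached later if needed)
# and keep the rest for the dimension reduction.
geno_constant <- geno[, is_constant, drop = FALSE]
geno_reduced  <- geno[, !is_constant, drop = FALSE]

names(geno_constant)  # "marker2" "marker3"
names(geno_reduced)   # "marker1" "marker4"
```

Since the rows are not reordered, cbind(geno_reduced, geno_constant) puts the columns back together if you need them later.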

[1] https://cran.r-project.org/web/packages/PCAmixdata/vignettes/PCAmixdata.html

On July 4, 2019 10:25:19 AM PDT, "Xu, Bingze" <bingzexu using wustl.edu> wrote:
>Dear R Help Members,
>
>
>I am a student from Washington University in St. Louis and I am working
>on a genetic project in R. In this dataset, there are 497 rows and
>11,226 columns. Every row is a genetic marker series for a particular
>hybrid, and every column is a genetic marker with value 1, 0, -1 and
>NA. I am trying to reduce the dimensions of this dataset because I want
>to combine it with weather datasets for further predictions.
>
>
>However, I encountered two problems after reading the package
>instructions carefully:
>
>1. For groups and name.groups, I set them to seq(1, ncol(geno)) and
>colnames(geno) respectively, where geno is the variable holding
>test_fill. I have attached this dataset for your reference. I am
>wondering if these settings are correct.
>
>
>2. The error "There are columns in X.quali where all the categories are
>identical" keeps popping up. Since I do not know much about genetic
>data, I do not want to remove those two columns even if they are
>identical. What can I do next?
>
>
>Thank you very much and I look forward to your reply.
>
>Best wishes,
>Bingze Xu
>Graduate Student - M.S. in Business Analytics
>Washington University in St. Louis
>Email: bingzexu using wustl.edu
>Phone: +1 314-665-8323
>
>
>______________________________________________
>R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.