[R] NMDS with missing data?

David Carlson dcarlson at tamu.edu
Mon May 13 17:44:33 CEST 2013


First, do not send HTML messages. They are converted to plain text, and
your table came through as a mess, with one value per line. It appears the
variables are all numeric? If so, there are two standard approaches to
handling multiple scales and magnitudes with cluster analysis:

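1. Use z-scores. The scale() function will convert each variable into a
standard score with a mean of 0 and a standard deviation of 1. Then use
Euclidean distance in the dist() function, which adjusts for missing values
by computing each distance over the variables that are present in both
cases and scaling it up in proportion to the number of variables used.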

2. Use prcomp() on the correlation matrix of the variables to extract a set
of principal components and use the principal component scores in the
cluster analysis. This may allow you to reduce the number of variables in
the data set if the 29 variables are correlated with one another.
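
A minimal sketch of both approaches, using only base R. The data frame toy
and its values are made up for illustration and merely stand in for the real
measurements. Note one assumption in the second half: prcomp() in its default
form does not accept NAs, so the sketch falls back on complete cases there.

```r
# Toy data standing in for the real measurements (values are invented).
set.seed(1)
toy <- data.frame(
  SODIUM  = rnorm(10, mean = 146,  sd = 2),
  CK      = rnorm(10, mean = 1200, sd = 400),
  GLUCOSE = rnorm(10, mean = 10,   sd = 1)
)
toy$CK[3] <- NA                      # one missing value

# Approach 1: z-scores, then Euclidean distance.
z <- scale(toy)                      # NAs are skipped when computing mean and sd
d <- dist(z, method = "euclidean")   # pairs with an NA are left out and rescaled

# Approach 2: principal components from standardized variables
# (scale. = TRUE is equivalent to working from the correlation matrix).
complete <- na.omit(toy)             # assumption: drop incomplete cases for PCA
pca      <- prcomp(complete, scale. = TRUE)
scores   <- pca$x[, 1:2]             # keep the first two components
d_pca    <- dist(scores)
```

Either distance object can then be handed to a clustering function such as
hclust(), or to an ordination function that accepts dissimilarities.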

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

From: Elizabeth Beck [mailto:elizabethbeck0 at gmail.com] 
Sent: Friday, May 10, 2013 1:20 PM
To: dcarlson at tamu.edu
Cc: r-help at r-project.org
Subject: Re: [R] NMDS with missing data?

Hi David, 

You are right that Bray-Curtis is not suitable for my dataset, and that
my variables are very different. Given your suggestions, I am struggling
with how to transform or standardize my data given that they vary so much.
Additionally, looking at the dist() function I am not sure which distance
measure would be most appropriate. Euclidean seems to be the most widely
used, but I'm not sure if it is appropriate for my data (there is much more
help for ecology data than toxicology). Given a sample of my data below
(total of 287 obs. of 29 variables), can you suggest a starting point?

Variable    Obs 1     Obs 2     Obs 3     Obs 4
SODIUM      145       146       147       148
K           3.3       3.2       2.5       2.5
CL          102       102       103       101
HCO3        24        21        22        21
ANION       22        26        25        29
CA          2.9       2.89      2.96      2.91
P           2.45      2.68      2.59      2.91
GLUCOSE     9.8       11.1      10        10.6
CHOLEST     5.7       6.78      5.78      5.83
GGT         3         3         3         3
GLDH        3         4         6         3
CK          678       1290      1582      1479
AST         5         9         11        8
PROTEIN     34        36        35        35
ALBUMIN     15        18        17        17
GLOBULIN    19        18        18        18
A_G         0.79      1         0.94      0.94
UA          180       170       272       317
BA          6         13        10        8
CORTICO     70.97     79.1      65.84     74.9
T3          1.31      3.51      1.84      2.59
T4          12.77     18.78     15.5      20.68
THYROID     0.102376  0.186751  0.118602  0.125389

Thank you!
Elizabeth

On Thu, May 9, 2013 at 7:50 AM, David Carlson <dcarlson at tamu.edu> wrote:
Since you pass your entire data.frame to metaMDS(), your first error
probably comes from the fact that you have included ID as one of the
variables. You should look at the results of

str(dat)

You can drop cases with missing values using

> dat2 <- na.omit(dat)
> metaMDS(dat2[,-1])

This would run the analysis on all but the first column (ID), using only
the cases with complete data. But that assumes that sex and exposure are
not factors.
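
As a rough sketch of that check, the following inspects the column types and
keeps only the numeric measurement columns before dropping incomplete cases.
The small data frame and the names num_cols and measurements are hypothetical
and only illustrate the idea.

```r
# Hypothetical data: an ID, a factor, and two measurements with one NA.
dat <- data.frame(
  ID     = 1:5,
  sex    = factor(c("F", "M", "F", "M", "F")),
  SODIUM = c(145, 146, NA, 148, 147),
  K      = c(3.3, 3.2, 2.5, 2.5, 3.0)
)

str(dat)                              # shows which columns are factors

num_cols <- sapply(dat, is.numeric)   # TRUE for numeric columns only
num_cols["ID"] <- FALSE               # ID is numeric but not a measurement

measurements <- na.omit(dat[, num_cols])   # complete cases, numeric columns

# With vegan installed, the ordination call would then look something like
# metaMDS(measurements, distance = "euclidean")
```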

Or you could use one of the distance functions in dist() which adjust for
missing values. However dist() does not have an option to use Bray-Curtis
(the default in metaMDS()). Bray-Curtis is designed for comparing species
counts or proportions so it is not clear that it is an appropriate
dissimilarity measure for your data. Further, your data seem to contain a
mixture of measurement scales and/or magnitudes so some variable
standardization or transformations are probably necessary before you can get
any useful results from MDS.
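
To make the adjustment for missing values concrete, here is a small worked
example with two made-up cases and three variables. It follows the behaviour
described in the dist() help page: the variable with the NA is left out and
the sum of squared differences is scaled up by (variables total) / (variables
used) before the square root is taken.

```r
# Two hypothetical cases on three variables; one value is missing.
x <- rbind(a = c(0, 0,  0),
           b = c(3, NA, 4))

d <- as.numeric(dist(x, method = "euclidean"))

# By hand: only columns 1 and 3 are usable, so the sum of squares is
# 3^2 + 4^2 = 25, scaled by 3/2 because 2 of the 3 columns were used.
by_hand <- sqrt((3^2 + 4^2) * 3 / 2)

all.equal(d, by_hand)
```

The rescaling keeps distances comparable between pairs with different numbers
of usable variables, but it only makes sense once the variables are on a
common scale.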

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Elizabeth Beck
Sent: Wednesday, May 8, 2013 3:39 PM
To: r-help at r-project.org
Subject: [R] NMDS with missing data?

Hi,
I'm trying to run NMDS (non-metric multidimensional scaling) with R vegan
(metaMDS) but I have a few NAs in my data set. I've tried to run it 2 ways.

The first way with my entire data set which includes variables such as ID,
sex, exposure, treatment, sodium, potassium, chloride....

mydata.mds<-metaMDS(dat)

I get the following error:

Error in if (any(autotransform, noshare > 0, wascores) && any(comm < 0)) { :
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In Ops.factor(left, right) : < not meaningful for factors
2: In Ops.factor(left, right) : < not meaningful for factors
3: In Ops.factor(left, right) : < not meaningful for factors
4: In Ops.factor(left, right) : < not meaningful for factors
5: In Ops.factor(left, right) : < not meaningful for factors

The second way with only those last biochemical variables (29 in total).

mydata.mds<-metaMDS(measurements)

I get this error:

Error in if (any(autotransform, noshare > 0, wascores) && any(comm < 0)) { :
  missing value where TRUE/FALSE needed

My go-to "na.rm=TRUE" does nothing. Any ideas on how to account for NAs,
and if so, which of the above options should I be using?
Thanks!
Elizabeth
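
For what it is worth, both messages can be reproduced in base R without
vegan. The sketch below only mimics the condition quoted in the error; the
vector comm and the factor sex are made-up stand-ins, not the real data.

```r
# A numeric vector with an NA and no negative values.
comm <- c(145, 3.3, NA)

any(comm < 0)                    # NA: the comparison with NA is undecided
cond <- TRUE && any(comm < 0)    # still NA

# if() cannot branch on NA, which is what the error message reports.
outcome <- tryCatch(
  if (cond) "negative values" else "no negative values",
  error = function(e) "error"
)

# Comparing a factor with a number gives NA plus the Ops.factor warning.
sex <- factor(c("F", "M"))
cmp <- suppressWarnings(sex < 0)
```

This is why the NAs and the factor columns have to be dealt with before the
data reach metaMDS().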
