[R] organizing my data before doing a cluster analysis

David L Carlson dcarlson at tamu.edu
Tue Jan 26 18:55:37 CET 2016


Your question involves a number of basic features of R. First, don't use HTML formatting in your email because the r-help list strips it out. Second, use the R function dput() to paste data into your email, since we can transfer that to R easily. I've converted your table to an R data frame called fishing:

> dput(fishing)
structure(list(Trip = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), Depth = c(14L, 8L, 22L), Species1 = c(1L, 
0L, 1L), Species2 = c(1L, 1L, 0L), Species3 = c(0L, 1L, 1L)), .Names = c("Trip", 
"Depth", "Species1", "Species2", "Species3"), class = "data.frame", row.names = c(NA, 
-3L))
> fishing
  Trip Depth Species1 Species2 Species3
1    A    14        1        1        0
2    B     8        0        1        1
3    C    22        1        0        1
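
For anyone receiving data posted this way, here is a minimal sketch of the round trip: the dput() output is pasted back into R and assigned to a name. The name fishing and the closing str() check are just illustrative choices.

```r
# Paste the dput() output after an assignment arrow to rebuild the object
fishing <- structure(list(Trip = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), Depth = c(14L, 8L, 22L), Species1 = c(1L,
0L, 1L), Species2 = c(1L, 1L, 0L), Species3 = c(0L, 1L, 1L)), .Names = c("Trip",
"Depth", "Species1", "Species2", "Species3"), class = "data.frame", row.names = c(NA,
-3L))

# Check that column types and dimensions came through as expected
str(fishing)
```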

To keep just the species columns (3 to 5) and transpose them so that each species is a row:

> fishclus <- t(fishing[, 3:5])
> fishclus
         [,1] [,2] [,3]
Species1    1    0    1
Species2    1    1    0
Species3    0    1    1

Now you are ready to use cluster analysis on fishclus. Depth is not part of that matrix, so it will not be clustered, but it is still available in fishing. Once you have decided what kind of cluster analysis, what distance measure, and how many groups to create, you can compare those groups to your Depth variable using boxplots or something similar.
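
As one possible illustration of those steps (not a recommendation), the sketch below uses the binary (Jaccard-type) distance from dist(), average-linkage hclust(), and two groups from cutree(). All three choices are assumptions made for the example, as are the names d, hc, groups, occ and depths. With only three trips every pair of species is equally distant, so the grouping here is arbitrary; the point is only the mechanics.

```r
# Rebuild the example data so this sketch runs on its own
fishing <- data.frame(Trip = c("A", "B", "C"),
                      Depth = c(14L, 8L, 22L),
                      Species1 = c(1L, 0L, 1L),
                      Species2 = c(1L, 1L, 0L),
                      Species3 = c(0L, 1L, 1L))
fishclus <- t(fishing[, 3:5])
colnames(fishclus) <- as.character(fishing$Trip)  # label columns by trip

# Assumed choices: binary distance for presence/absence data,
# average linkage, and k = 2 groups
d <- dist(fishclus, method = "binary")
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)

# One row per species occurrence, carrying that trip's depth
# and the species' cluster membership
occ <- which(fishclus == 1, arr.ind = TRUE)
depths <- data.frame(Species = rownames(fishclus)[occ[, "row"]],
                     Depth = fishing$Depth[occ[, "col"]],
                     Group = unname(groups[occ[, "row"]]))

# Compare depth across the species groups
if (interactive()) boxplot(Depth ~ Group, data = depths)
```

Depth never enters the distance calculation; it is only attached afterwards to describe the groups.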

You would benefit by learning more about R before going much further. Go to this webpage:

https://cran.r-project.org/other-docs.html

If you have trouble deciding which one(s), here are some suggestions:

https://cran.r-project.org/doc/contrib/usingR.pdf
https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
https://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Michael
Sent: Tuesday, January 26, 2016 10:49 AM
To: r-help at r-project.org
Subject: [R] organizing my data before doing a cluster analysis

I have been reading about the different cluster analysis methods available in R.  I have a problem getting my data in the correct format so I can use these methods.  I explain below.


I am trying to cluster different fish species to see which fish are caught with each other on commercial fishing trips.  I gave each fish species a 1 if it was caught on a trip and a 0 if it was not.  I also have the depth where the fish were caught on each trip.  So my data looks like this:

          Depth   Species 1   Species 2   Species 3
Trip A       14           1           1           0
Trip B        8           0           1           1
Trip C       22           1           0           1

I looked at the cluster analysis examples in R and they have the data in a format where the columns are the variables and the rows are the objects you want to cluster.  When I transpose my data I get depth as a row.  I show an example below:

                     Trip A      Trip B      Trip C
Depth               14            8            22
Species 1           1            0             1
Species 2           1            1             0
Species 3           0            1             1

So the R cluster program will treat depth as an object that will be clustered.  I don't know how to still incorporate depth into the analysis, and also not have it be treated as an object that will be clustered.  Any help would be greatly appreciated.


Mike


