[R] transpose dataset to PC-ORD?

Dave Roberts droberts at montana.edu
Tue May 23 23:46:13 CEST 2006


Daniel,

     I can help somewhat I think.  PC-ORD also allows data input in what 
it calls "database" format, where each row is

sample, taxon, abundance

There are as many rows per sample as there are non-zero species, and only three 
columns.  To get your taxon data.frame (currently samples as rows, 
species as columns, called data in your example) in that format try

dematrify(data,filename='whatever.csv')

with the function pasted below (watch out for email-altered line 
breaks).  That will create a CSV file you can import into PC-ORD.
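
To make the "database" layout concrete, here is a minimal sketch that 
does not depend on the function below. The toy data are the three 
sites and three species from the example in your message; the object 
names m, idx and long are only illustrative.

```r
# Toy community matrix: samples as rows, species as columns
data <- data.frame(sp1 = c(1, 0, 0),
                   sp2 = c(0, 1, 3),
                   sp3 = c(0, 2, 0),
                   row.names = c("site1", "site2", "site3"))

m   <- as.matrix(data)
# row/column positions of every non-zero abundance
idx <- which(m > 0, arr.ind = TRUE)

# one record per non-zero cell: sample, taxon, abundance
long <- data.frame(sample    = rownames(m)[idx[, "row"]],
                   taxon     = colnames(m)[idx[, "col"]],
                   abundance = m[idx],
                   stringsAsFactors = FALSE)
long <- long[order(long$sample, long$taxon), ]
rownames(long) <- NULL
print(long)
```

The result has four rows (site1/sp1, site2/sp2, site2/sp3, site3/sp2), 
one for each non-zero cell, and the zeros are simply left out.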

     Just to encourage you a little, you really should try the Ecology 
packages in R.  See packages vegan, ade4, and labdsv, for example, and 
take a look at

http://ecology.msu.montana.edu/labdsv/R

Dave R.
*********************************************************************
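dematrify <- function (df,filename=NULL,sep=",")
{
     # locate the non-zero cells (row and column indices)
     tmp <- which(df>0,arr.ind=TRUE)
     stack <- NULL
     # take the labels from df itself rather than from the row names of tmp
     samples <- row.names(df)[tmp[,1]]
     taxon <- names(df)[tmp[,2]]
     abund <- rep(NA,nrow(tmp))
     for (i in seq_len(nrow(tmp))) {
         abund[i] <- df[samples[i],taxon[i]]
         # one "sample,taxon,abundance" record per non-zero cell
         stack <- rbind(stack,
             paste(samples[i],sep,taxon[i],sep,abund[i],"\n",sep=""))
     }
     if (is.null(filename)) {
         # no file requested: return the records as a data.frame
         tmp2 <- cbind(samples,taxon,abund)
         tmp2 <- data.frame(tmp2[order(tmp2[,1]),])
         return(tmp2)
     }
     else {
         stack <- sort(stack)
         sink(file=filename)
         # sep="" keeps cat() from putting a space at the start of each later line
         cat(stack,sep="")
         sink()
     }
}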
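
An alternative to sink() and cat() for the file-writing step is 
write.table() with both row and column names switched off, which also 
keeps R from adding its own header line or row IDs. This is only a 
sketch: the data frame long and the temporary file are made up for 
illustration, and I am assuming PC-ORD wants no header line in this 
format, so check that against its documentation.

```r
# Illustrative records in sample, taxon, abundance form
long <- data.frame(sample    = c("site1", "site2", "site2", "site3"),
                   taxon     = c("sp1", "sp2", "sp3", "sp2"),
                   abundance = c(1, 1, 2, 3))

outfile <- tempfile(fileext = ".csv")

# row.names=FALSE and col.names=FALSE suppress the row IDs and header
write.table(long, file = outfile, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

written <- readLines(outfile)
print(written)
```

Each element of written is one comma-separated record, with nothing 
added before or after the data.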

Daniel Gruner wrote:
> Hello:
> 
> I need to take a species-sample matrix and transpose it to the format 
> used by PC-ORD for analysis. Unfortunately, the number of species is 
> very large (>5000), and so this operation cannot be performed simply 
> in an application like Excel, which has a 255 column limit. So, I 
> wrote relatively simple code in R that I hoped would do this 
> (appended below). But there are glitches.
> 
> The format needed for PC-ORD (where "NA" shows an empty cell):
> 
> NA,3,sites,NA
> NA,3,species,NA
> NA,Q,Q,Q
> NA,sp1,sp2,sp3
> site1,1,0,0
> site2,0,1,2
> site3,0,3,0
> 
> The first row gives the number of samples (rows), the second row the 
> number of species (columns), the third row the variable type 
> (Q = quantitative), and the fourth row the column headers (species 
> names). So, one can create a transposable matrix in 
> a spreadsheet where 5000+ species are the rows:
> 
> NA,NA,NA,NA,site1,site2,site3
> 3,3,Q,sp1,1,0,0
> sites,species,Q,sp2,0,1,3
> NA,NA,Q,sp3,0,2,0
> 
> 
> It is important that the data file written out is totally clean and 
> ready to go for PC-ORD, because I cannot open and edit it in a 
> spreadsheet. However, the code performs the transpose operation and 
> writes the file, but now the former row IDs are the first row in the 
> new file (NA,1,2,3), and the 4 leading spaces are "X, X.1, X.2, 
> X.3".  I'd like to delete the first row and delete the first 4 values 
> of column1, without deleting the column.
> 
> NA,1,2,3
> X,3,islands,NA
> X.1,3,species,NA
> X.2,Q,Q,Q
> X.3,sp1,sp2,sp3
> site1,1,0,0
> site2,0,1,2
> site3,0,3,0
> 
> I have tried various tricks that I will not list/belabor here 
> (various col.names, row.names, header, Extract, etc commands). Any 
> further hints on code that will either stop R from adding these, or 
> strip them at the end?
> 
> (PS, yes, I can learn how to do my multivariate analyses in R and skip 
> PC-ORD, but I am time limited on this one, and it seems that this 
> code could be very useful in numerous ways)
> 
> Many thanks for the help,
> Dan Gruner
> (Windows XP, R vers2.2)
> 
> 
> 
> ##transpose datasets to convert to PC-ORD format
> 
> data<-read.csv("data.csv", header=TRUE, as.is=T,
>     strip.white=T, na.strings="NA")
> data<-as.matrix(data)
> data.trans <- t(data)
> write.csv(data.trans, file = "datatransp.csv",
>     quote = F, na = "")
> 
> 
> 
> *******************************
> 
> Daniel S. Gruner, Postdoctoral Scholar
> Bodega Marine Lab, University of California -- Davis
> PO Box 247, 2099 Westside Rd
> Bodega Bay, CA 94923-0247
> (o) 707.875.2022  (f) 707.875.2009   (m) 707.338.5722
> email:  dsgruner_at_ucdavis.edu
> http://www.bml.ucdavis.edu/facresearch/gruner.html
> http://www.hawaii.edu/ant/
> 


-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
David W. Roberts                                     office 406-994-4548
Professor and Head                                      FAX 406-994-3190
Department of Ecology                         email droberts at montana.edu
Montana State University
Bozeman, MT 59717-3460


