[R] variables - data-structure

Peter Dalgaard p.dalgaard at biostat.ku.dk
Sat Dec 18 15:25:06 CET 2004


Helmut Kudrnovsky <hellik at web.de> writes:

> dear R-friends,
> 
> i've got a large dataset of vegetation-samples with about 500
> variables (=species) in the following format:
> 
> 1 spec1
> 1 spec23
> 1 spec54
> 1 spec63
> 2 spec1
> 2 spec2
> 2 spec253
> 2 spec300
> 2 spec423
> 3 spec20
> 3 spec88
> 3 spec121
> 3 spec200
> 3 spec450
> .
> .
> 
> this means:  sample 1 (grassland) with the species (=spec) 1, 23, 54, 63
> 
> is it possible to get the following data-structure for further analysis?
> 
> 		1	2	3	......
> spec1		1	1	0
> spec2		0	1	0
> spec3
> ...
> spec253	0	1	0
> ...
> spec450	0	0	1
> 
> with thanks from the snowy tirol
> helli

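Should be fairly easy. You could, for instance, generate a
table(species,area), with a few complications if the same combination
can occur more than once (the cells then hold counts rather than 0/1).
Or use matrix indexing, with species and area as integer row and column
indices: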

M <- matrix(0,nspec,narea)     # nspec species (rows) by narea areas (columns)
M[cbind(species,area)] <- 1    # each row of the index matrix picks one cell
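
A minimal sketch of the matrix-indexing idea on made-up data. The toy data
frame dd and the helper names spec_f and area_f are illustrative
assumptions, not part of the original dataset. Converting both columns to
factors gives the integer codes that serve as row and column indices:

```r
# Toy data in the long format from the question (illustrative values only)
dd <- data.frame(area    = c(1, 1, 2, 2, 3),
                 species = c("spec1", "spec23", "spec1", "spec2", "spec20"),
                 stringsAsFactors = FALSE)

# Factors supply integer codes that can be used as matrix indices
spec_f <- factor(dd$species)
area_f <- factor(dd$area)

# One row per species, one column per area, all zeros to start
M <- matrix(0, nlevels(spec_f), nlevels(area_f),
            dimnames = list(species = levels(spec_f), area = levels(area_f)))

# A two-column index matrix addresses one cell per (species, area) pair
M[cbind(as.integer(spec_f), as.integer(area_f))] <- 1
M
```

A repeated (species, area) pair simply sets the same cell to 1 again, so
duplicates do not inflate the entries with this approach.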

Upon reading, the sort order of the species may be a little
problematic, since the names sort as character strings (spec121 comes
before spec2):

dd <- read.table(stdin())

0: 1 spec1
1: 1 spec23
2: 1 spec54
3: 1 spec63
4: 2 spec1
6: 2 spec2
7: 2 spec253
8: 2 spec300
9: 2 spec423
10: 3 spec20
11: 3 spec88
12: 3 spec121
13: 3 spec200
14: 3 spec450
15: 
# ctrl-D terminates input

names(dd) <- c("area","species")
with(dd, table(species,area))

         area
species   1 2 3
  spec1   1 1 0
  spec121 0 0 1
  spec2   0 1 0
  spec20  0 0 1
  spec200 0 0 1
  spec23  1 0 0
  spec253 0 1 0
  spec300 0 1 0
  spec423 0 1 0
  spec450 0 0 1
  spec54  1 0 0
  spec63  1 0 0
  spec88  0 0 1

To fix up, use something like

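 # unique(as.character(...)) works whether read.table() returned species
 # as a factor or as plain character (the default since R 4.0.0)
 specn <- paste("spec", 
                sort(as.numeric(substring(unique(as.character(dd$species)),5))),
                sep="")
 # rebuild the factor with levels ordered by species number
 dd <- transform(dd, species=factor(species,levels=specn))
 with(dd, table(species,area))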
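
If the same species can be recorded more than once in an area, table()
returns counts larger than 1. The following is only a sketch on made-up
data (the data frame and the names num, tab and pa are illustrative
assumptions) of how the counts might be reduced to presence/absence:

```r
# Toy data with a duplicated (area, species) pair (illustrative only)
dd <- data.frame(area    = c(1, 1, 1, 2),
                 species = c("spec1", "spec1", "spec10", "spec2"),
                 stringsAsFactors = FALSE)

# Order the factor levels by the numeric part of the species name
num <- sort(unique(as.numeric(substring(dd$species, 5))))
dd$species <- factor(dd$species, levels = paste("spec", num, sep = ""))

# Cross-tabulate; the duplicated pair yields a count of 2
tab <- with(dd, table(species, area))

# Reduce counts to presence/absence: 1 if seen at least once, else 0
pa <- (tab > 0) + 0
pa
```

Adding 0 to the logical result is just one way to convert TRUE/FALSE to
1/0 while keeping the dimension names.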


-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907



