[R] multicolumn sort on dataframe?

Bambang Suryobroto suryobroto at ipb.ac.id
Mon Mar 29 08:22:05 CEST 2004


Dear list,

I'm migrating to R and slowly learning it. I want to extend this multicolumn
sorting subject to counting the frequencies of duplicated rows.

The motivation is to count the frequencies of individuals with the same
haplotype in a population genetic study. A sample of the table (ex.dta) is as
follows:

IDNUM DYS19 DYS388 DYS390 DYS393 DYS394 DYS395
TG002   200    129    203    133    251    119
TG053   200    129    203    133    251    119
TG020   200    129    207    133    251    127
TG066    NA     NA     NA     NA     NA     NA
TG104   200    129    203    133    251    119
TG018    NA     NA    199    133     NA    119
TG060   200    129    203    133    251    119
TG058    NA     NA     NA    133     NA     NA
TG009   200    129    203    133    251    119
TG106   200    129    211    137    251    123

I did the following:

> ex <- read.table( "ex.dta" , header=T, row.names=1 )
> one <- rep( 1,10 )
> aggregate( one , by=ex , sum )
  DYS19 DYS388 DYS390 DYS393 DYS394 DYS395 x
1   200    129    203    133    251    119 5
2   200    129    211    137    251    123 1
3   200    129    207    133    251    127 1

and got exactly what I wanted. However, as the table grows larger, the
script takes longer to complete. For a 300x6 table, after about 10
minutes Windows complained that it was low on virtual memory and increased
the paging file while denying requests from other applications. Eventually
R crashed, leaving Windows crippled.

Did I miss something? Is there any way other than the two-line script
above?
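
One possible alternative, given only as a sketch and not as a tested
replacement on the full data: paste the marker columns of each row into a
single key and count the keys with table(), so that only haplotypes that
actually occur are tallied. The names complete, key, counts, haplo and freq
are made up for this illustration, and the sample table is typed inline so
the block runs on its own. It assumes that, as in the aggregate() output
above, individuals with any NA are to be left out.

```r
# Rebuild the sample table inline so the sketch is self-contained
ex <- read.table(text = "
IDNUM DYS19 DYS388 DYS390 DYS393 DYS394 DYS395
TG002   200    129    203    133    251    119
TG053   200    129    203    133    251    119
TG020   200    129    207    133    251    127
TG066    NA     NA     NA     NA     NA     NA
TG104   200    129    203    133    251    119
TG018    NA     NA    199    133     NA    119
TG060   200    129    203    133    251    119
TG058    NA     NA     NA    133     NA     NA
TG009   200    129    203    133    251    119
TG106   200    129    211    137    251    123
", header = TRUE, row.names = 1)

# Assumption: keep only individuals typed at every locus
complete <- ex[complete.cases(ex), ]

# Collapse each row into one haplotype key, e.g. "200_129_203_133_251_119"
key <- do.call(paste, c(complete, sep = "_"))

# Count how many individuals carry each observed haplotype
counts <- table(key)

# Keep the first row of each haplotype and attach its frequency
first <- !duplicated(key)
haplo <- complete[first, ]
haplo$freq <- as.vector(counts[key[first]])
rownames(haplo) <- NULL

print(haplo)
```

The work here grows with the number of rows, not with the number of
possible combinations of allele sizes across the six markers.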

Context:
R 1.8.1 on WinXP Pro
Rgui.exe --max-mem-size=400M
Celeron 1GHz, 256 MB ram, free harddisk space 3.3 GB

All best,

Bambang Suryobroto, D.Sc
Head, Laboratory of Zoology
Department of Biology
Faculty of Mathematics and Natural Sciences
Bogor Agricultural University
Jalan Pajajaran, Bogor 16143
INDONESIA
Tel: +62-251-328391
Fax: +62-251-345011



