[R] difficult data manipulation question

markleeds at verizon.net
Mon Jul 3 22:37:58 CEST 2006


Hi everyone,

Suppose I have a matrix in which some column names are identical,
for example, TEMP:

  "AAA", "BBB", "CCC", "DDD","AAA", "BBB"
    0      2      1     2      0      0
    2      3      7     6      0      1
    1.5    4      9     9      6      0
    1.0    6      10    11     3      3


I haven't even checked yet whether identical column names are allowed
in a matrix, but I hope they are.
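
On the question of whether repeated column names are allowed: base R does not require matrix column names to be unique. A minimal sketch, rebuilding the example above (only the name TEMP comes from the post; dup_flags is a name made up here for illustration):

```r
# Rebuild the example matrix, row by row, with repeated column names.
TEMP <- matrix(c(0,   2,  1,  2, 0, 0,
                 2,   3,  7,  6, 0, 1,
                 1.5, 4,  9,  9, 6, 0,
                 1.0, 6, 10, 11, 3, 3),
               nrow = 4, byrow = TRUE,
               dimnames = list(NULL,
                               c("AAA", "BBB", "CCC", "DDD", "AAA", "BBB")))

# The names are stored exactly as given; nothing is renamed or dropped.
colnames(TEMP)

# duplicated() marks the second and later occurrences of each name.
dup_flags <- duplicated(colnames(TEMP))
dup_flags
```

Note that data.frame() behaves differently: with its default check.names = TRUE it would rename the repeats, so this only applies while TEMP stays a matrix.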

Assuming that they are, I would like to take the matrix and make a new matrix with the following requirements.

1) Whenever a column name is unique, just take that column for the new matrix.

2) Whenever a column name is not unique, take the column
that has the most non-zero elements (in the case of
ties, I don't care which one is picked).

So, in this case, the resulting matrix would just be the first 4 columns: the first AAA column has 3 non-zero elements against 2 for the second, and the first BBB column has 4 against 2.
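
One possible way to express the two requirements above in base R is sketched below. It is only an illustration under the assumption that TEMP is a numeric matrix with column names; nonzero, keep and RESULT are names invented here. The idea is to work with column positions rather than names, so the repeated names never have to be looked up.

```r
# Rebuild the example matrix from the post.
TEMP <- matrix(c(0,   2,  1,  2, 0, 0,
                 2,   3,  7,  6, 0, 1,
                 1.5, 4,  9,  9, 6, 0,
                 1.0, 6, 10, 11, 3, 3),
               nrow = 4, byrow = TRUE,
               dimnames = list(NULL,
                               c("AAA", "BBB", "CCC", "DDD", "AAA", "BBB")))

# Number of non-zero entries in every column, in column order.
nonzero <- colSums(TEMP != 0)

# Group the column positions by name; within each group keep the position
# with the largest non-zero count (which.max takes the first one on ties).
keep <- tapply(seq_len(ncol(TEMP)), colnames(TEMP),
               function(idx) idx[which.max(nonzero[idx])])

# Select the chosen columns, restoring their original left-to-right order.
RESULT <- TEMP[, sort(as.vector(keep)), drop = FALSE]
RESULT
```

A name that occurs only once forms a group of size one, so requirement 1 falls out of the same step as requirement 2.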

I realize (or at least I think) that
sum(TEMP[, columnname] != 0) will give me the
number of non-zero elements in a column with the name columnname,
but how to use this to deal with the non-uniqueness and solve my particular problem is beyond me. Plus, I think the command will
bomb because columnname will not always be unique?
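
As a hedged illustration of the counting step: in base R, indexing a matrix by a repeated column name does not raise an error, it returns the first matching column only, so the count for the later duplicates has to be taken by position. The names count_first_aaa, aaa_cols and counts_aaa below are made up for this sketch.

```r
# Rebuild the example matrix from the post.
TEMP <- matrix(c(0,   2,  1,  2, 0, 0,
                 2,   3,  7,  6, 0, 1,
                 1.5, 4,  9,  9, 6, 0,
                 1.0, 6, 10, 11, 3, 3),
               nrow = 4, byrow = TRUE,
               dimnames = list(NULL,
                               c("AAA", "BBB", "CCC", "DDD", "AAA", "BBB")))

# Indexing by name picks the first "AAA" column; the comparison gives a
# logical vector, and sum() counts its TRUE values.
count_first_aaa <- sum(TEMP[, "AAA"] != 0)

# To reach every "AAA" column, look the positions up explicitly ...
aaa_cols <- which(colnames(TEMP) == "AAA")

# ... and count the non-zero entries in each of them by position.
counts_aaa <- colSums(TEMP[, aaa_cols, drop = FALSE] != 0)
counts_aaa
```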
Thanks for any help. I realize this is not a trivial problem so I really appreciate it.

                                          Mark


