[R] Recoding multiple columns consistently

Uwe Ligges ligges at statistik.uni-dortmund.de
Wed Aug 29 09:59:18 CEST 2007



Ron Crump wrote:
> Hi,
> 
> I have a dataframe that contains pedigree information;
> that is individual, sire and dam identities as separate
> columns. It also has date of birth.
> 
> These identifiers are not numeric, or not sequential.
> 
> Obviously, an identifier can appear in one or two columns,
> depending on whether it was a parent or not. These should
> be consistent.
> 
> Not all identifiers appear in the individual column - it
> is possible for a parent not to have its own record if its
> parents were not known.
> 
> Missing parental (sire and/or dam) identifiers can occur.
> 
> I need to export the data for use in another program that
> requires the pedigree to be coded as integers, increasing
> with date of birth (therefore sire and dam always have
> lower identifiers than their offspring) and with missing
> values coded as 0.
> 
> How would I go about doing this?
> 
> And a second, simpler related question, if I have a column with
> n different values (may be strings or non-sequential integers)
> identifying levels (possibly with repeated occurrences), how
> can I recode them to be sequential from 1 to n?


as.integer(factor(x))
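
As a small illustration of the second question, here is a sketch with a made-up vector x (the values are hypothetical). factor() numbers the distinct values in sorted order, while match() against unique() numbers them in order of first appearance. Either way, repeated values share one code and the codes run from 1 to n.

```r
# Hypothetical example vector with repeated, non-sequential values
x <- c("B7", "A3", "B7", "Z1", "A3")

# Codes follow the sorted order of the distinct values (A3, B7, Z1)
codes_sorted <- as.integer(factor(x))

# Codes follow the order of first appearance (B7, A3, Z1)
codes_first <- match(x, unique(x))

print(codes_sorted)
print(codes_first)
```

Which variant is preferable depends on whether the order of the codes matters to the receiving program.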


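For the first question you can do as follows, for example:
order() the identifiers by date of birth, make them unique() and assign
them to a new "levels" object. Then turn each identifier column into an
ordered factor with those levels:
   factor(some_column, levels=levels, ordered = TRUE)
and as.numeric(factor_object) gives you the integer codes.
Missing parents, and any identifier that is not in "levels", come out as
NA. So parents without a record of their own have to be added to "levels"
ahead of their offspring, and the remaining NAs replaced by 0.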
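
The steps for the pedigree could be sketched roughly as below. The data frame and its column names (id, sire, dam, dob) are made up for illustration, and the sketch assumes that parents without a record of their own are older than every recorded animal, so they are numbered first. The helper name recode is hypothetical as well.

```r
# Hypothetical pedigree; column names (id, sire, dam, dob) are assumptions
ped <- data.frame(
  id   = c("K9", "C2", "M5", "A7"),
  sire = c("C2", "X1", "C2", NA),
  dam  = c("A7", NA,   "A7", "Q4"),
  dob  = as.Date(c("2005-03-01", "2003-06-15", "2006-01-20", "2002-11-02")),
  stringsAsFactors = FALSE
)

# Parents that never appear in the id column have no date of birth;
# assumption: they are older than all recorded animals, so they go first
parents  <- c(ped$sire, ped$dam)
parents  <- unique(parents[!is.na(parents)])
founders <- setdiff(parents, ped$id)

# All identifiers: founders first, then individuals by date of birth
lev <- c(founders, ped$id[order(ped$dob)])

# Same levels for every column, so the codes are consistent across columns
recode <- function(x) {
  code <- as.integer(factor(x, levels = lev, ordered = TRUE))
  code[is.na(code)] <- 0L   # missing parents are coded as 0
  code
}

out <- data.frame(id   = recode(ped$id),
                  sire = recode(ped$sire),
                  dam  = recode(ped$dam))
out <- out[order(out$id), ]
print(out)
```

Because all three columns are coded against the same levels, an identifier gets the same integer wherever it appears. match(x, lev) would give the same codes as the factor() call here.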

Uwe Ligges





> I can solve both problems in fortran, so could use loops to
> do it in R, but feel there should be a quicker, more elegant,
> "more R" solution.
> 
> Thanks for your help.
> 
> Ron.
> 


