[R] Converting unique strings to unique numbers

Kate Ignatius kate.ignatius at gmail.com
Fri May 29 20:30:26 CEST 2015


I found this helpful.  However, the second to fourth columns come out
all zero.  Was this the intention?

That is:

X0001 0 0 0  2  1 BYX859
X0001 0 0 0  1  1 BYX894
X0001 0 0 0  2  2 BYX862
X0001 0 0 0  2  2 BYX863
X0001 0 0 0  2  2 BYX864
X0001 0 0 0  2  2 BYX865
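
One possible cause, assuming the file was read with read.table() while
stringsAsFactors = TRUE was still the default (R before 4.0.0): columns
2 to 4 are then factors, and in those R versions c() on factors returns
their integer codes, so match() ends up comparing the labels against
numbers and finds nothing.  The sketch below is a variant of f() that
converts the columns to character first.  The name f2, the cols
argument, and the way the example data is rebuilt are illustrative
assumptions, not part of the original reply.

```r
# Rebuild the example pedigree from the original post.
txt <- "X0001 BYX859      0      0  2  1 BYX859
X0001 BYX894      0      0  1  1 BYX894
X0001 BYX862 BYX894 BYX859  2  2 BYX862
X0001 BYX863 BYX894 BYX859  2  2 BYX863
X0001 BYX864 BYX894 BYX859  2  2 BYX864
X0001 BYX865 BYX894 BYX859  2  2 BYX865"

# stringsAsFactors = TRUE is set explicitly to mimic the assumed setup.
ped <- read.table(text = txt, stringsAsFactors = TRUE)

# Hypothetical variant of f(): same idea, but robust to factor columns.
f2 <- function(data, cols = 2:4) {
    # Work on the labels themselves, not on factor codes.
    data[cols] <- lapply(data[cols], as.character)
    # IDs in order of first appearance, with the "0" placeholder dropped.
    uniqStrings <- setdiff(unique(unlist(data[cols], use.names = FALSE)), "0")
    # Replace each ID by its position in uniqStrings; "0" stays 0.
    data[cols] <- lapply(data[cols], match, table = uniqStrings, nomatch = 0L)
    data
}

res <- f2(ped)
print(res)
```

On the example data this should number the IDs 1 to 6 in column 2 and
fill columns 3 and 4 with the matching parent numbers, leaving the other
columns unchanged.  If the columns are already character, f() and f2()
should agree.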

On Fri, May 29, 2015 at 1:31 PM, William Dunlap <wdunlap at tibco.com> wrote:
> match() will do what you want.  E.g., run your data through
> the following function.
>
> f <- function (data)
> {
>     uniqStrings <- unique(c(data[, 2], data[, 3], data[, 4]))
>     uniqStrings <- setdiff(uniqStrings, "0")
>     for (j in 2:4) {
>         data[[j]] <- match(data[[j]], uniqStrings, nomatch = 0L)
>     }
>     data
> }
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, May 29, 2015 at 9:58 AM, Kate Ignatius <kate.ignatius at gmail.com>
> wrote:
>>
>> I have a pedigree file as so:
>>
>> X0001 BYX859      0      0  2  1 BYX859
>> X0001 BYX894      0      0  1  1 BYX894
>> X0001 BYX862 BYX894 BYX859  2  2 BYX862
>> X0001 BYX863 BYX894 BYX859  2  2 BYX863
>> X0001 BYX864 BYX894 BYX859  2  2 BYX864
>> X0001 BYX865 BYX894 BYX859  2  2 BYX865
>>
>> And I was hoping to change all unique string values to numbers.
>>
>> That is:
>>
>> BYX859 = 1
>> BYX894 = 2
>> BYX862 = 3
>> BYX863 = 4
>> BYX864 = 5
>> BYX865 = 6
>>
>> But only in columns 2 - 4.  Essentially I would like the data to look like
>> this:
>>
>> X0001 1 0 0  2  1 BYX859
>> X0001 2 0 0  1  1 BYX894
>> X0001 3 2 1  2  2 BYX862
>> X0001 4 2 1  2  2 BYX863
>> X0001 5 2 1  2  2 BYX864
>> X0001 6 2 1  2  2 BYX865
>>
>> Is this possible with factors?
>>
>> Thanks!
>>
>> K.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>


