[R] Mapping from one vector to another

Thu Jul 17 18:45:27 CEST 2014

Jeff,

Even though the solutions from the previous responders are good enough
for my current situation, the principle you just raised will definitely
benefit my future work. Thanks a lot for sharing your insights!

Gang

On Thu, Jul 17, 2014 at 12:06 PM, Jeff Newmiller
<jdnewmil at dcn.davis.ca.us> wrote:
> You ask about generic methods for introducing alternate values for factors,
> and some of the other responses address this quite efficiently.
>
> However, a factor has meaning only within one vector at a time, since
> another vector may have additional values or missing values relative to
> the first vector. For example, you used the "sample" function which
> is not guaranteed to select at least one of each of the four letters in L4.
> Or, what if the data has values the mapping doesn't address?
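
A minimal sketch of the two situations described above, using base R only. The names `seen` and `keys` and the extra category "E" are made up for illustration and do not come from the thread.

```r
# Hypothetical data: a sample that happens to miss "D" and contains an
# unexpected "E" that the mapping does not cover.
seen <- c("B", "B", "A", "C", "E")
keys <- LETTERS[1:4]   # categories the mapping knows about

# Levels are derived from the values present in this one vector,
# so "D" is absent and "E" is included.
f <- factor(seen)
levels(f)

# Categories in the data that the mapping does not address
unmapped <- setdiff(unique(seen), keys)

# Categories in the mapping that never occur in the data
unused <- setdiff(keys, unique(seen))
```

Checking both differences before building a factor is one way to notice such gaps early; other checks would work as well.
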
>
> For any work in which I am dealing with categorical data in multiple
> places (e.g. your "d" data frame and whatever data structure you use
> to define your mapping) I prefer NOT to work with factors until all of
> my categories of data are moved into one vector (typically a column
> in a data frame). Rather, I work with character vectors during the
> data manipulation phase and only convert to factor when I start
> analyzing or displaying the data.
>
> With this in mind, I use a general flow something like:
>
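> # fac is the character vector from the original post
> d <- data.frame( x = 1, y = 1:10, fac = fac, stringsAsFactors=FALSE )
> # lookup table: keep the key as character here too
> mp <- data.frame( fac=LETTERS[1:4], value=c(8,11,3,2), stringsAsFactors=FALSE )
> # all.x=TRUE keeps rows of d whose fac has no entry in mp (value becomes NA)
> d2 <- merge( d, mp, all.x=TRUE )
> d2$fac <- factor( d2$fac ) # optional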
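
A small sketch of how the result of this flow might be checked, assuming inputs shaped like the ones above. The category "E" and the shortened data frame are invented for illustration, and `y` is assumed to record the original row order.

```r
# Assumed inputs in the spirit of the example above; "E" is a made-up
# category that the lookup table does not cover.
d  <- data.frame(x = 1, y = 1:5, fac = c("B", "E", "A", "D", "B"),
                 stringsAsFactors = FALSE)
mp <- data.frame(fac = LETTERS[1:4], value = c(8, 11, 3, 2),
                 stringsAsFactors = FALSE)

# Left join: every row of d is kept, matched or not
d2 <- merge(d, mp, by = "fac", all.x = TRUE)

# merge() sorts on the key by default, so restore the original order
# (y happens to hold the original row positions in this example)
d2 <- d2[order(d2$y), ]

# Rows whose category had no entry in the lookup table
d2[is.na(d2$value), ]
```

The unmatched row keeps its `fac` value and gets `NA` in `value`, which makes incomplete mappings visible instead of silently dropping data.
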
>
> If you are actually in the analysis phase and are not pulling data from
> multiple external sources, then you may have already confirmed the
> completeness and range of values you have to work with. In that case one
> of the other, more efficient methods may still be a better choice for
> this specific task.
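
The other responses are not quoted in this message, so the following is only an assumption about what such a method could look like: a named-vector lookup, a common base R idiom. The shortened `fac` vector is illustrative.

```r
# Mapping from the original question, stored as a named numeric vector
map <- c(A = 8, B = 11, C = 3, D = 2)

fac <- c("B", "B", "D", "A", "C")   # illustrative subset of values
d <- data.frame(x = 1, y = seq_along(fac), fac = fac,
                stringsAsFactors = FALSE)

# Index by name; as.character() guards against fac being a factor,
# which would otherwise index by integer code rather than by label.
d$var <- unname(map[as.character(d$fac)])
```

This keeps the row order of `d` unchanged, and any value of `fac` that is missing from `map` would come back as `NA`.
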
>
> Hadley Wickham's "tidy data" [1] principles address this concern more
> thoroughly than I have.
>
> [1] Google this phrase... paper seems to be a work in progress.
>
>
> On Thu, 17 Jul 2014, Gang Chen wrote:
>
>> Suppose I have the following dataframe:
>>
>> L4 <- LETTERS[1:4]
>> fac <- sample(L4, 10, replace = TRUE)
>> (d <- data.frame(x = 1, y = 1:10, fac = fac))
>>
>>    x  y fac
>> 1  1  1   B
>> 2  1  2   B
>> 3  1  3   D
>> 4  1  4   A
>> 5  1  5   C
>> 6  1  6   D
>> 7  1  7   C
>> 8  1  8   B
>> 9  1  9   B
>> 10 1 10   B
>>
>> I'd like to add another column 'var' that is defined based on the
>> following mapping of column 'fac':
>>
>> A -> 8
>> B -> 11
>> C -> 3
>> D -> 2
>>
>> How can I achieve this in an elegant way (with a generic approach for
>> any length)?
>>
>> Thanks,
>> Gang
>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------