[R] How to combine character month and year columns into one column

Marc Schwartz marc_schwartz at me.com
Tue Sep 23 20:38:39 CEST 2014


Hi David,

My initial reaction (not that the decision is mine to make) is that, from a technical perspective, indexing by name is obviously common.

There are two considerations, off the top of my head:

1. There would be a difference, of course, between:

> month.abb["1"]
<NA> 
  NA 

and

> month.abb["01"]
   01 
"Jan" 


Thus, is this approach overly fragile, and is it likely to create more problems (bugs, head scratching, etc.) than it solves?
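
One way to sidestep the "1" versus "01" mismatch without touching the built-in constant would be to normalise the key before indexing. The following is only a minimal sketch: month_key and month_lookup are hypothetical names used for illustration and are not part of base R.

```r
# Local named copy; the built-in month.abb itself is left untouched.
# 'month_key' and 'month_lookup' are hypothetical names for illustration.
month_key <- setNames(base::month.abb, sprintf("%02d", 1:12))

month_lookup <- function(x) {
  # Coerce to integer, then zero-pad, so "1", "01" and 1 all map to "01".
  # Values that cannot be coerced become NA (with a coercion warning).
  key <- sprintf("%02d", as.integer(x))
  unname(month_key[key])
}

month_lookup(c("1", "01", "12"))
```

Because the key is rebuilt from the integer value, the lookup no longer depends on how the month string happened to be padded.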


2. From a consistency standpoint, I don't see an indication that other built-in constants have similar name attributes, not that I did an exhaustive review. So I suspect that if there were reasonable justification for it here, it would also need to at least be considered for other constants, which increases the scope of work a good bit.
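
As a quick, non-exhaustive check of the point above, the constants documented under ?Constants can be tested for a names attribute. This is a sketch only; the list name constants is chosen for illustration.

```r
# Built-in constants documented under ?Constants, taken explicitly from
# the base namespace so that any modified copies in the workspace are ignored.
constants <- list(LETTERS    = base::LETTERS,
                  letters    = base::letters,
                  month.abb  = base::month.abb,
                  month.name = base::month.name,
                  pi         = base::pi)

# TRUE where the object carries no names attribute.
sapply(constants, function(x) is.null(names(x)))
```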


If there is a desire for this, one could file an RFE at https://bugs.r-project.org to gauge the reactions from R Core, unless they comment here first.

Regards,

Marc


On Sep 23, 2014, at 12:47 PM, David Winsemius <dwinsemius at comcast.net> wrote:

> Marc;
> 
> Feature request:
> 
> Would it make sense to construct month.abb as a named vector so that the operation that was attempted would have succeeded? Adding alphanumeric names c("01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12") would allow extraction by month values obtained from substring or regex operations, which are always of class character.
> 
> Example:
> 
>> names(month.abb) <- c("01", "02", "03", "04", "05", "06",
> + "07", "08", "09", "10", "11", "12")
>> month.abb
>   01    02    03    04    05    06    07    08    09    10    11    12 
> "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec" 
> 
> 
>> month.abb[ substr(Sys.Date(), 6,7) ]
>   09 
> "Sep" 
> 
> -- 
> David.
> 
> On Sep 23, 2014, at 9:03 AM, Marc Schwartz wrote:
> 
>> On Sep 23, 2014, at 10:41 AM, Kuma Raj <pollaroid at gmail.com> wrote:
>> 
>>> Dear R users,
>>> 
>>> I have data with month and year columns, both of which are character,
>>> and wanted to create a new column like Jan-1999
>>> with the following code. The result is all NA for the month part. What
>>> is wrong with the code, and what is the right way to combine the two?
>>> 
>>> ddf$MonthDay <- paste(month.abb[ddf$month], ddf$Year, sep="-" )
>>> 
>>> 
>>> Thanks
>>> 
>>>> dput(ddf)
>>> structure(list(month = c("01", "02", "03", "04", "05", "06",
>>> "07", "08", "09", "10", "11", "12"), Year = c("1999", "1999",
>>> "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999",
>>> "1999", "1999"), views = c(42, 49, 44, 38, 37, 35, 38, 39, 38,
>>> 39, 38, 46), MonthDay = c("NA-1999", "NA-1999", "NA-1999", "NA-1999",
>>> "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999",
>>> "NA-1999", "NA-1999")), .Names = c("month", "Year", "views",
>>> "MonthDay"), row.names = 109:120, class = "data.frame")
>>>> 
>>> 
>> 
>> 
>> 
>> Since you are trying to use ddf$month as an index into month.abb, you will either need to coerce ddf$month to numeric in your code, or adjust how the data frame is created.
>> 
>> In the case of the former approach:
>> 
>>> paste(month.abb[as.numeric(ddf$month)], ddf$Year, sep="-" )
>> [1] "Jan-1999" "Feb-1999" "Mar-1999" "Apr-1999" "May-1999" "Jun-1999"
>> [7] "Jul-1999" "Aug-1999" "Sep-1999" "Oct-1999" "Nov-1999" "Dec-1999"
>> 
>> 
>> Regards,
>> 
>> Marc Schwartz
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> David Winsemius
> Alameda, CA, USA
> 


