[R] How to sum some columns based on their names

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Mon Oct 13 15:30:27 CEST 2014


Learn regular expressions. There are many websites and books that describe how they work, and R has a number of functions that use them...

?regexp
?grep

For example...

grep("^[^0-9]*(6574|85|7584)[^0-9]*$",names(dta))

where dta is your data frame. You can read that regular expression as zero or more characters that are not digits at the beginning of the string, followed by any of three specified sequences of digits, followed by zero or more non-digit characters at the end of the string.
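
As a quick check of that reading, here is a minimal sketch that runs the pattern against the column names from the dput() output in the quoted question. The variable names nms, pat, idx and matched are only illustrative; they are not part of the original code.

```r
# Column names copied from the dput() output in the quoted question
nms <- c("date", "f014card", "f1534card", "f3564card", "f6574card",
         "f7584card", "f85card", "m014card", "m1534card", "m3564card",
         "m6574card", "m7584card", "m85card")

pat <- "^[^0-9]*(6574|85|7584)[^0-9]*$"

# grep() returns the integer positions of the matching names
idx <- grep(pat, nms)

# value = TRUE returns the matching names themselves instead of positions
matched <- grep(pat, nms, value = TRUE)

print(idx)
print(matched)
```

Names such as f3564card are skipped because the digits right after the leading non-digit characters are not one of the three listed sequences.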

You can then use that function call as the column index to select only the matching columns. The sapply function can apply the sum function to each of those columns:

sapply(dta[,grep("^[^0-9]*(6574|85|7584)[^0-9]*$",names(dta))],sum)
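
For the second part of the question (names with the prefix "f"), one possible variation is to anchor the pattern on a literal f. The sketch below uses a few of the columns posted in the question. The pattern, the names f_cols, f_sub, col_totals and row_totals, and the use of rowSums are my assumptions about what is wanted, since "sum columns" could mean one total per column or one total per row.

```r
# A few of the columns posted in the question, enough to show the idea
dta <- data.frame(
  f3564card = c(1, 6, 1, 5, 5, 4, 4, 7, 6, 4, 6),
  f6574card = c(3, 6, 4, 5, 5, 2, 10, 3, 4, 2, 4),
  f7584card = c(13, 6, 1, 4, 10, 6, 8, 12, 10, 4, 3),
  f85card   = c(5, 3, 1, 0, 2, 10, 7, 9, 1, 7, 3),
  m6574card = c(3, 4, 8, 8, 8, 10, 7, 6, 7, 7, 5)
)

# Assumed pattern: same digit groups, but the name must start with "f"
f_cols <- grep("^f(6574|85|7584)[^0-9]*$", names(dta))

# drop = FALSE keeps a data frame even if only one column matches
f_sub <- dta[, f_cols, drop = FALSE]

# One total per matching column
col_totals <- sapply(f_sub, sum)

# One total per row, added across the matching columns
row_totals <- rowSums(f_sub)

print(col_totals)
print(row_totals)
```

If a single grand total is wanted instead, sum(col_totals) gives it.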
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

On October 13, 2014 5:57:45 AM PDT, Kuma Raj <pollaroid at gmail.com> wrote:
>I want to sum columns based on their names. As an example how could I
>sum columns which contain 6574, 7584 and 85 as column names?  In
>addition, how could I sum those which contain 6574, 7584 and 85 in
>their names and have a prefix "f". My data contains several variables
>with
>
>dput(df1)
>structure(list(date = structure(c(1230768000, 1230854400, 1230940800,
>1231027200, 1231113600, 1231200000, 1231286400, 1231372800, 1231459200,
>1231545600, 1231632000), class = c("POSIXct", "POSIXt"), tzone =
>"UTC"),
>    f014card = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), f1534card = c(0,
>    1, 1, 0, 0, 1, 0, 0, 1, 0, 1), f3564card = c(1, 6, 1, 5,
>    5, 4, 4, 7, 6, 4, 6), f6574card = c(3, 6, 4, 5, 5, 2, 10,
>    3, 4, 2, 4), f7584card = c(13, 6, 1, 4, 10, 6, 8, 12, 10,
>  4, 3), f85card = c(5, 3, 1, 0, 2, 10, 7, 9, 1, 7, 3), m014card = c(0,
>    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), m1534card = c(0, 0, 1, 0,
>    0, 0, 0, 1, 1, 1, 0), m3564card = c(12, 7, 4, 7, 12, 13,
>    12, 7, 12, 2, 11), m6574card = c(3, 4, 8, 8, 8, 10, 7, 6,
>    7, 7, 5), m7584card = c(8, 10, 5, 4, 12, 7, 14, 11, 9, 1,
> 11), m85card = c(1, 4, 3, 0, 3, 4, 5, 5, 4, 5, 0)), .Names = c("date",
>"f014card", "f1534card", "f3564card", "f6574card", "f7584card",
>"f85card", "m014card", "m1534card", "m3564card", "m6574card",
>"m7584card", "m85card"), class = "data.frame", row.names = c("1",
>"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.


