[R] sum some columns for each row

Wed Jul 15 01:05:37 CEST 2015

I took the first two rows of the data frame as a test, as follows:

> dat <- read.table("TOV_43_Protein_Clusters_abundance1.tab",
+                   header = TRUE, sep = "\t")
> dat1 <- dat[1:2,]
> str(dat1)
'data.frame':    2 obs. of  44 variables:
 $ X      : Factor w/ 1075762 levels "","POV_Cluster_1000001",..: 305266 625028
 $ X109DCM: Factor w/ 46 levels "","1","10","109DCM",..: 1 1
 $ X109SUR: Factor w/ 41 levels "","1","10","109SUR",..: 1 1
 $ X18DCM : Factor w/ 31 levels "","1","10","11",..: 1 1
 $ X18SUR : Factor w/ 25 levels "","1","10","11",..: 1 1
 $ X22SUR : Factor w/ 50 levels "","1","10","11",..: 1 2
 $ X23DCM : Factor w/ 46 levels "","1","10","11",..: 1 1
 $ X25DCM : Factor w/ 42 levels "","1","10","11",..: 1 1
 $ X25SUR : Factor w/ 47 levels "","1","10","11",..: 1 1
 $ X30DCM : Factor w/ 34 levels "","1","10","11",..: 1 1
 $ X31SUR : Factor w/ 43 levels "","1","10","11",..: 1 1
 $ X32DCM : Factor w/ 15 levels "","1","10","11",..: 1 1
 $ X32SUR : Factor w/ 58 levels "","1","10","11",..: 1 1
 $ X34DCM : Factor w/ 53 levels "","1","10","11",..: 1 35
 $ X34SUR : Factor w/ 47 levels "","1","10","11",..: 10 14
 $ X36DCM : Factor w/ 48 levels "","1","10","11",..: 2 43
 $ X36SUR : Factor w/ 45 levels "","1","10","11",..: 23 38
 $ X38DCM : Factor w/ 40 levels "","1","10","11",..: 3 23
 $ X38SUR : Factor w/ 44 levels "","1","10","11",..: 7 41
 $ X39DCM : Factor w/ 38 levels "","1","10","11",..: 34 38
 $ X39SUR : Factor w/ 40 levels "","1","10","11",..: 13 40
 $ X41DCM : Factor w/ 47 levels "","1","10","11",..: 13 40
 $ X41SUR : Factor w/ 40 levels "","1","10","11",..: 1 1
 $ X42DCM : Factor w/ 48 levels "","1","10","11",..: 2 3
 $ X42SUR : Factor w/ 41 levels "","1","10","11",..: 2 1
 $ X46SUR : Factor w/ 31 levels "","1","10","11",..: 2 2
 $ X52DCM : Factor w/ 49 levels "","1","10","11",..: 13 23
 $ X64DCM : Factor w/ 35 levels "","1","10","11",..: 1 2
 $ X64SUR : Factor w/ 36 levels "","1","10","11",..: 1 1
 $ X65DCM : Factor w/ 38 levels "","1","10","11",..: 1 1
 $ X65SUR : Factor w/ 35 levels "","1","10","11",..: 1 1
 $ X66DCM : Factor w/ 27 levels "","1","10","11",..: 1 1
 $ X66SUR : Factor w/ 35 levels "","1","10","11",..: 1 1
 $ X67SUR : Factor w/ 38 levels "","1","10","11",..: 1 1
 $ X68DCM : Factor w/ 33 levels "","1","10","11",..: 1 1
 $ X68SUR : Factor w/ 36 levels "","1","10","11",..: 1 1
 $ X70MES : Factor w/ 23 levels "","1","10","11",..: 1 1
 $ X70SUR : Factor w/ 37 levels "","1","10","11",..: 1 1
 $ X72DCM : Factor w/ 40 levels "","1","10","11",..: 13 27
 $ X72SUR : Factor w/ 38 levels "","1","10","11",..: 1 1
 $ X76DCM : Factor w/ 44 levels "","1","10","11",..: 1 1
 $ X76SUR : Factor w/ 34 levels "","1","10","11",..: 1 1
 $ X82DCM : Factor w/ 29 levels "","1","10","11",..: 1 1
 $ X85DCM : Factor w/ 30 levels "","1","10","11",..: 1 1

Thank you!!
Dawn
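Every column in the str() output above is a factor whose levels are the cell text ("", "1", "10", ...), so the integer codes shown are level positions, not abundances. Calling as.numeric() directly on such a factor returns those codes. The sketch below uses a made-up factor (not Dawn's data) to show the difference; going through as.character() first recovers the numbers, and empty cells become NA.

```r
# Made-up factor resembling one column of str(dat1): numbers stored as text,
# with "" for empty cells. Levels sort as text: "", "1", "10", "17", "2".
f <- factor(c("", "1", "10", "2", "17"))

codes  <- as.numeric(f)                # level positions, not the numbers
values <- as.numeric(as.character(f))  # the numbers; "" becomes NA

print(codes)
print(values)
```

This is the conversion described in the R FAQ on turning factors into numbers; the same idea applies column by column to a whole data frame.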

On Tue, Jul 14, 2015 at 3:48 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:

> I suspect your data frame "dat" has non-numeric data in some of the
> columns that have ABC in their names. Each column of a data frame can be
> numeric or not, but the data frame as a whole cannot be numeric. If your
> data file has odd characters in some of the otherwise-numeric columns, the
> whole column will be read in as a factor or as character strings. Look at
> the output of str(dat) for columns that don't show "num". If you can find
> the column, and then one of the bad rows, you can use a text editor to fix
> them manually, or show us examples of the bad data and we can suggest ways
> to fix it in R.
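A minimal sketch of Jeff's point, using made-up tab-separated text with one stray "n/a" entry. Reading it with stringsAsFactors = TRUE (the default before R 4.0.0) turns that whole column into a factor, while blank cells in otherwise numeric columns simply become NA. The last lines show one possible way to list the non-numeric columns and the rows whose entries do not parse as numbers; the names txt, bad_cols and bad_rows are illustrative only.

```r
# Made-up tab-separated data; "n/a" in column B is the stray non-numeric entry.
txt <- "A\tB\tC
1\t2\t3
4\tn/a\t6
7\t8\t"

# stringsAsFactors = TRUE mimics the pre-R-4.0.0 default.
dat <- read.table(text = txt, header = TRUE, sep = "\t",
                  stringsAsFactors = TRUE)
str(dat)  # B is a factor; the blank cell in C is read as NA

# Columns that did not come in as numbers
bad_cols <- names(dat)[!vapply(dat, is.numeric, logical(1))]

# Rows of column B whose non-empty entries do not parse as numbers
b <- as.character(dat$B)
bad_rows <- which(is.na(suppressWarnings(as.numeric(b))) & b != "")

print(bad_cols)
print(bad_rows)
```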
>
> On July 14, 2015 2:35:38 PM PDT, Dawn <dawn1313 at gmail.com> wrote:
> >Hi,
> >
> >I used a small set of data (several columns and rows) and it works fine
> >using the following command:
> >abc <- rowSums(test[,grep("ABC",names(test),fixed=T)],na.rm=T)
> >
> >But when I used the real, big data table, I got this error:
> >Error in rowSums(dat[, grep("ABC", names(dat), fixed = T)], na.rm = T) :
> >  'x' must be numeric
> >Using as.numeric() didn't work either:
> >> as.numeric(dat)
> >Error: (list) object cannot be coerced to type 'double'
> >
> >Thanks!
> >Dawn
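The as.numeric(dat) error arises because a data frame is a list of columns, so it cannot be coerced to a number as a whole; the conversion has to be done one column at a time. A minimal sketch, assuming (as in the str() output at the top of this thread) a text column X of cluster names plus factor columns holding numbers as text; the data frame itself is made up. It also passes drop = FALSE, because when grep() matches only one column, dat[, ...] collapses to a plain vector and rowSums() then stops with an error.

```r
# Made-up stand-in: a name column plus factor columns holding numbers as text.
dat <- data.frame(
  X      = c("Cluster_1", "Cluster_2"),
  X34DCM = factor(c("", "35")),
  X34SUR = factor(c("10", "14")),
  stringsAsFactors = FALSE
)

# as.numeric(dat) fails: a data frame is a list, not a vector.
# Convert every column except the name column X, one column at a time.
num_cols <- setdiff(names(dat), "X")
dat[num_cols] <- lapply(dat[num_cols],
                        function(col) as.numeric(as.character(col)))
str(dat)

# drop = FALSE keeps a one-column selection as a data frame for rowSums().
dcm <- rowSums(dat[, grep("DCM", names(dat), fixed = TRUE), drop = FALSE],
               na.rm = TRUE)
print(dcm)
```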
> >
> >
> >
> >
> >On Fri, Jul 10, 2015 at 4:35 PM, Dawn <dawn1313 at gmail.com> wrote:
> >
> >> Thank you all, and sorry for the messed-up data. It worked!
> >>
> >> Best,
> >> Dawn
> >>
> >> On Fri, Jul 10, 2015 at 4:15 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
> >>
> >>> Hi Dawn,
> >>> Your data are a bit messed up, but try the following:
> >>>
> >>> rowSums(dat[,grep("ABC",names(dat),fixed=TRUE)],na.rm=TRUE)
> >>> rowSums(dat[,grep("XYZ",names(dat),fixed=TRUE)],na.rm=TRUE)
> >>>
> >>> I'm assuming that you want to discard the NA values.
> >>>
> >>> Jim
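A small, self-contained check of the grep()-plus-row-sum approach, using made-up values. The task is one total per row, which is what rowSums() returns (colSums() would instead give one total per column). The names carry an X prefix because data.frame() and read.table() pass names that start with a digit through make.names(), which is why Dawn's str() output shows names such as X109DCM; fixed = TRUE still matches the ABC/XYZ part.

```r
# Made-up values; NA stands for an empty cell.
dat <- data.frame(
  X109ABC = c(NA, 1, 3),
  X109XYZ = c(17, NA, 3),
  X18ABC  = c(1, 4, NA),
  X18XYZ  = c(3, 20, 6)
)

abc_cols <- grep("ABC", names(dat), fixed = TRUE)
xyz_cols <- grep("XYZ", names(dat), fixed = TRUE)

# One sum per row; na.rm = TRUE lets empty cells contribute nothing.
abc <- rowSums(dat[, abc_cols, drop = FALSE], na.rm = TRUE)
xyz <- rowSums(dat[, xyz_cols, drop = FALSE], na.rm = TRUE)

print(abc)  # values: 1 5 3
print(xyz)  # values: 20 20 9
```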
> >>>
> >>> On Fri, Jul 10, 2015 at 6:52 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> >>> > Hello,
> >>> >
> >>> > Please use dput() (see ?dput) to give a data example; as posted, it's
> >>> > completely unreadable. If your data.frame is named 'dat', use
> >>> >
> >>> > dput(head(dat, 30))  # paste the output of this in your mail
> >>> >
> >>> >
> >>> > And don't post in HTML; use plain text only, as the posting guide says.
> >>> >
> >>> > Rui Barradas
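A short sketch of what Rui is asking for, using a made-up data frame: dput() prints R code that rebuilds the object, so anyone on the list can paste it into their own session. The round trip through a temporary file with dget() is there only to confirm that the printed code recreates the same data frame.

```r
# Made-up data frame standing in for head(dat, 30)
dat <- data.frame(
  X       = c("Cluster_1", "Cluster_2", "Cluster_3"),
  X109ABC = c(NA, 1, 3),
  X109XYZ = c(17, NA, 3),
  stringsAsFactors = FALSE
)

# Prints a structure(list(...), class = "data.frame", ...) expression
dput(head(dat, 30))

# Round trip: write the dput() text to a file and read it back with dget()
tf <- tempfile(fileext = ".R")
dput(dat, file = tf)
copy <- dget(tf)
print(all.equal(copy, dat))
unlink(tf)
```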
> >>> >
> >>> >
> >>> > On 09-07-2015 18:12, Dawn wrote:
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> I have a big data frame, as follows:
> >>> >>
> >>> >>      109ABC  109XYZ  18ABC  18XYZ  22XYZ  23ABC  25ABC  25XYZ
> >>> >>      30ABC   31XYZ   32ABC  32XYZ  34DCM  34XYZ  36ABC  36SUR
> >>> >>      38DCM   38XYZ   39DCM  39SUR  41DCM  41SUR  42DCM  42SUR
> >>> >>      46SUR   52DCM   64ABC  64XYZ  65ABC  65XYZ  66ABC  66XYZ
> >>> >>      67XYZ   68ABC   68SUR  70MES  70SUR  72ABC  72XYZ  76ABC
> >>> >>      76XYZ   82ABC   85ABC  POV
> >>> >> Cluster_1    17 1 3 10 14 5 2 2 1 1 1 2 2                        TT:61
> >>> >> Cluster_2    1 4 20 6 5 3 6 9 9 6 10 1 3 1 4                    TT:88
> >>> >> Cluster_3    3 3 6 4 17 17 18 13 17 19 22 11 5 21 8 5 18 4 7 9  TT:227
> >>> >> ........
> >>> >>
> >>> >> I want to get two columns: one is, for each row, the sum of all the
> >>> >> columns whose names include ABC, and the other is, for each row, the
> >>> >> sum of all the columns whose names include XYZ.
> >>> >>
> >>> >> Can anyone help? Thank you!
> >>> >> Dawn
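For the request itself, one possible sketch (made-up values, and assuming the count columns are already numeric) loops over the two name patterns with sapply() and binds the results next to the cluster names. The total column mirrors the TT: values above, which are the sum of every count in the row, and serves only as a cross-check; the names patterns, sums and result are illustrative.

```r
# Made-up values; NA marks an empty cell. Column X holds the cluster names.
dat <- data.frame(
  X       = c("Cluster_1", "Cluster_2"),
  X109ABC = c(17, NA),
  X109XYZ = c(1, 4),
  X18ABC  = c(NA, 20),
  X18XYZ  = c(3, NA),
  X34DCM  = c(10, 6),
  stringsAsFactors = FALSE
)

patterns <- c(ABC = "ABC", XYZ = "XYZ")

# One row-sum vector per pattern; sapply() binds them into a 2-column matrix
# whose column names come from the names of 'patterns'.
sums <- sapply(patterns, function(p)
  rowSums(dat[, grep(p, names(dat), fixed = TRUE), drop = FALSE],
          na.rm = TRUE))

result <- data.frame(
  X     = dat$X,
  sums,
  total = rowSums(dat[, -1], na.rm = TRUE)  # every count column, like TT:
)
print(result)
```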
> >>> >>
> >>> >>
> >>> >> ______________________________________________
> >>> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> >> PLEASE do read the posting guide
> >>> >> http://www.R-project.org/posting-guide.html
> >>> >> and provide commented, minimal, self-contained, reproducible code.
> >>> >>
> >>> >
> >>>
> >>
> >>
>
>
