[R] sum some columns for each row

David Winsemius dwinsemius at comcast.net
Wed Jul 15 03:28:26 CEST 2015


On Jul 14, 2015, at 4:49 PM, Dawn wrote:

> I attached the file

Well, you may have attached it, but you evidently did not read the posting guide about which filetypes are accepted by the mailserver.

> .... including the first two rows. Please help me make it
> a numeric data frame. Hopefully the following command works:
> 
> dcm <- rowSums(dat1[,grep("DCM",names(dat1),fixed=T)],na.rm=T)

How do you expect that to deliver anything meaningful if all of your columns are factor class?
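
A minimal sketch of the point, using a made-up two-row data frame (the name `toy`, its column names, and its values are illustrative, not your data). rowSums() rejects factor columns, and as.numeric() applied directly to a factor returns the internal level codes, so the conversion probably needs to go through as.character() first:

```r
# Toy stand-in for dat1: every column was read in as a factor.
toy <- data.frame(
  X34DCM = factor(c("", "17")),
  X36DCM = factor(c("1", "3")),
  X36SUR = factor(c("10", "4"))
)

# Select the DCM columns; drop = FALSE keeps a data frame even if only one matches.
dcm_cols <- toy[, grep("DCM", names(toy), fixed = TRUE), drop = FALSE]

# rowSums() on factor columns stops with an error ('x' must be numeric).
res <- try(rowSums(dcm_cols, na.rm = TRUE), silent = TRUE)
inherits(res, "try-error")

# as.numeric() on a factor yields level codes, not the numbers the labels spell:
# the levels sort as "10", "4", so the codes are 1 and 2 rather than 10 and 4.
codes <- as.numeric(toy$X36SUR)

# Convert via character so the labels themselves are parsed; "" becomes NA.
dcm_num <- as.data.frame(lapply(dcm_cols, function(x) as.numeric(as.character(x))))
dcm <- rowSums(dcm_num, na.rm = TRUE)
dcm
```

Whether the empty strings should count as missing values or as zeros is a decision only you can make; with na.rm = TRUE they simply drop out of the sum.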

That was the reason for this error in an earlier posting of yours:

But when I used the real big data table, "Error in rowSums(dat[,
grep("ABC", names(dat), fixed = T)], na.rm = T) :
 'x' must be numeric"
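
One way to find out which entries keep a column from being numeric is to treat the column as character and list the strings that fail conversion. The following is only a sketch: `bad_values` is a hypothetical helper name and `toy` is made-up data. The level "109DCM" in your str() output for X109DCM may indicate that header text is repeated somewhere among the data rows, which is the situation the toy example imitates:

```r
# Hypothetical helper: returns the distinct non-empty strings in x
# that as.numeric() cannot parse.
bad_values <- function(x) {
  x <- as.character(x)
  num <- suppressWarnings(as.numeric(x))
  unique(x[is.na(num) & !is.na(x) & nzchar(x)])
}

# Made-up columns; the first has a header label sitting among the data.
toy <- data.frame(X109DCM = c("", "1", "109DCM", "10"),
                  X18DCM  = c("2", "", "11", "3"),
                  stringsAsFactors = FALSE)

# One entry per column: the offending strings, or character(0) if none.
lapply(toy, bad_values)
```

Applied to the real data frame, the non-empty results would show which columns, and which strings, need attention before any conversion to numeric.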

You are not paying attention to the responses you have received so far.

I think Bert Gunter's suggestion that you need to work through more introductory tutorials is on point. 

-- 
David.
> 
> Thank you very much!
> Dawn
> 
> On Tue, Jul 14, 2015 at 4:36 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> 
>> Well it is pretty obvious that all of your columns have non-numeric data
>> in them, but you are the only one who can tell which ones should have been
>> numeric, and you are also the one who can peruse your data file in a text
>> editor.
>> ---------------------------------------------------------------------------
>> Jeff Newmiller                        The     .....       .....  Go Live...
>> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>                                      Live:   OO#.. Dead: OO#..  Playing
>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>> ---------------------------------------------------------------------------
>> Sent from my phone. Please excuse my brevity.
>> 
>> On July 14, 2015 4:05:37 PM PDT, Dawn <dawn1313 at gmail.com> wrote:
>>> I used two rows to test the data frame, as follows.
>>> 
>>>> dat <- read.table("TOV_43_Protein_Clusters_abundance1.tab", header=TRUE, sep = "\t")
>>>> dat1 <- dat[1:2,]
>>>> str(dat1)
>>> 'data.frame':    2 obs. of  44 variables:
>>> $ X      : Factor w/ 1075762 levels "","POV_Cluster_1000001",..: 305266 625028
>>> $ X109DCM: Factor w/ 46 levels "","1","10","109DCM",..: 1 1
>>> $ X109SUR: Factor w/ 41 levels "","1","10","109SUR",..: 1 1
>>> $ X18DCM : Factor w/ 31 levels "","1","10","11",..: 1 1
>>> $ X18SUR : Factor w/ 25 levels "","1","10","11",..: 1 1
>>> $ X22SUR : Factor w/ 50 levels "","1","10","11",..: 1 2
>>> $ X23DCM : Factor w/ 46 levels "","1","10","11",..: 1 1
>>> $ X25DCM : Factor w/ 42 levels "","1","10","11",..: 1 1
>>> $ X25SUR : Factor w/ 47 levels "","1","10","11",..: 1 1
>>> $ X30DCM : Factor w/ 34 levels "","1","10","11",..: 1 1
>>> $ X31SUR : Factor w/ 43 levels "","1","10","11",..: 1 1
>>> $ X32DCM : Factor w/ 15 levels "","1","10","11",..: 1 1
>>> $ X32SUR : Factor w/ 58 levels "","1","10","11",..: 1 1
>>> $ X34DCM : Factor w/ 53 levels "","1","10","11",..: 1 35
>>> $ X34SUR : Factor w/ 47 levels "","1","10","11",..: 10 14
>>> $ X36DCM : Factor w/ 48 levels "","1","10","11",..: 2 43
>>> $ X36SUR : Factor w/ 45 levels "","1","10","11",..: 23 38
>>> $ X38DCM : Factor w/ 40 levels "","1","10","11",..: 3 23
>>> $ X38SUR : Factor w/ 44 levels "","1","10","11",..: 7 41
>>> $ X39DCM : Factor w/ 38 levels "","1","10","11",..: 34 38
>>> $ X39SUR : Factor w/ 40 levels "","1","10","11",..: 13 40
>>> $ X41DCM : Factor w/ 47 levels "","1","10","11",..: 13 40
>>> $ X41SUR : Factor w/ 40 levels "","1","10","11",..: 1 1
>>> $ X42DCM : Factor w/ 48 levels "","1","10","11",..: 2 3
>>> $ X42SUR : Factor w/ 41 levels "","1","10","11",..: 2 1
>>> $ X46SUR : Factor w/ 31 levels "","1","10","11",..: 2 2
>>> $ X52DCM : Factor w/ 49 levels "","1","10","11",..: 13 23
>>> $ X64DCM : Factor w/ 35 levels "","1","10","11",..: 1 2
>>> $ X64SUR : Factor w/ 36 levels "","1","10","11",..: 1 1
>>> $ X65DCM : Factor w/ 38 levels "","1","10","11",..: 1 1
>>> $ X65SUR : Factor w/ 35 levels "","1","10","11",..: 1 1
>>> $ X66DCM : Factor w/ 27 levels "","1","10","11",..: 1 1
>>> $ X66SUR : Factor w/ 35 levels "","1","10","11",..: 1 1
>>> $ X67SUR : Factor w/ 38 levels "","1","10","11",..: 1 1
>>> $ X68DCM : Factor w/ 33 levels "","1","10","11",..: 1 1
>>> $ X68SUR : Factor w/ 36 levels "","1","10","11",..: 1 1
>>> $ X70MES : Factor w/ 23 levels "","1","10","11",..: 1 1
>>> $ X70SUR : Factor w/ 37 levels "","1","10","11",..: 1 1
>>> $ X72DCM : Factor w/ 40 levels "","1","10","11",..: 13 27
>>> $ X72SUR : Factor w/ 38 levels "","1","10","11",..: 1 1
>>> $ X76DCM : Factor w/ 44 levels "","1","10","11",..: 1 1
>>> $ X76SUR : Factor w/ 34 levels "","1","10","11",..: 1 1
>>> $ X82DCM : Factor w/ 29 levels "","1","10","11",..: 1 1
>>> $ X85DCM : Factor w/ 30 levels "","1","10","11",..: 1 1
>>> 
>>> 
>>> Thank you!!
>>> Dawn
>>> 
>>> On Tue, Jul 14, 2015 at 3:48 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>>> 
>>>> I suspect your data frame "dat" has non-numeric data in some of the
>>>> columns that have ABC in their names. Any column of a data frame can be
>>>> numeric or not, but the data frame as a unit cannot be numeric. If your
>>>> data file has odd characters in one of the otherwise-numeric columns, the
>>>> whole column will be read in as a factor or character strings. Look at the
>>>> output of str(dat) for columns that don't show "num". If you can find the
>>>> column, and then one of the bad rows, you can use a text editor to fix them
>>>> manually, or show us examples of the bad data and we can suggest ways to
>>>> fix it in R.
>>>> 
>>>> 
>>>> On July 14, 2015 2:35:38 PM PDT, Dawn <dawn1313 at gmail.com> wrote:
>>>>> Hi,
>>>>> 
>>>>> I used a small set of data (several columns and rows) and it works fine
>>>>> using the following command:
>>>>> abc <- rowSums(test[,grep("ABC",names(test),fixed=T)],na.rm=T)
>>>>> 
>>>>> But when I used the real big data table, "Error in rowSums(dat[,
>>>>> grep("ABC", names(dat), fixed = T)], na.rm = T) :
>>>>> 'x' must be numeric"
>>>>> Then it didn't work either using as.numeric():
>>>>>> as.numeric(dat)
>>>>> Error: (list) object cannot be coerced to type 'double'
>>>>> 
>>>>> Thanks!
>>>>> Dawn
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, Jul 10, 2015 at 4:35 PM, Dawn <dawn1313 at gmail.com> wrote:
>>>>> 
>>>>>> Thank you all, and sorry for the messy data. It has worked!
>>>>>> 
>>>>>> Best,
>>>>>> Dawn
>>>>>> 
>>>>>> On Fri, Jul 10, 2015 at 4:15 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Dawn,
>>>>>>> Your data are a bit messed up, but try the following:
>>>>>>> 
>>>>>>> colSums(dat[,grep("ABC",names(dat),fixed=TRUE)],na.rm=TRUE)
>>>>>>> colSums(dat[,grep("XYZ",names(dat),fixed=TRUE)],na.rm=TRUE)
>>>>>>> 
>>>>>>> I'm assuming that you want to discard the NA values.
>>>>>>> 
>>>>>>> Jim
>>>>>>> 
>>>>>>> On Fri, Jul 10, 2015 at 6:52 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> Please use ?dput to give a data example; as posted, it's completely
>>>>>>>> unreadable. If your data.frame is named 'dat' use
>>>>>>>> 
>>>>>>>> dput(head(dat, 30))  # paste the output of this in your mail
>>>>>>>> 
>>>>>>>> 
>>>>>>>> And don't post in HTML; use plain text only, as the posting guide says.
>>>>>>>> 
>>>>>>>> Rui Barradas
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Em 09-07-2015 18:12, Dawn escreveu:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I have a big dataframe as follows
>>>>>>>>> 
>>>>>>>>>     109ABC    109XYZ    18ABC    18XYZ    22XYZ    23ABC
>>>>> 25ABC
>>>>>>>>> 25XYZ
>>>>>>>>>    30ABC    31XYZ    32ABC    32XYZ    34DCM    34XYZ
>>> 36ABC
>>>>>>> 36SUR
>>>>>>>>> 38DCM    38XYZ    39DCM    39SUR    41DCM    41SUR    42DCM
>>>>> 42SUR
>>>>>>>>> 46SUR    52DCM    64ABC    64XYZ    65ABC    65XYZ    66ABC
>>>>> 66XYZ
>>>>>>>>> 67XYZ    68ABC    68SUR    70MES    70SUR    72ABC    72XYZ
>>>>> 76ABC
>>>>>>>>> 76XYZ    82ABC    85ABC    POV
>>>>>>>>> Cluster_1
>>>>> 17
>>>>>>> 1
>>>>>>>>> 3    10    14    5    2    2        1    1    1    2
>>>>>>>>>                         2                            TT:61
>>>>>>>>> Cluster_2                    1
>>> 4
>>>>> 20
>>>>>>>>> 6    5    3    6    9    9    6        10        1    3    1
>>>>>>>>>                             4
>>> TT:88
>>>>>>>>> Cluster_3    3        3                            6        4
>>>>>  17
>>>>>>>>> 17    18    13    17    19    22    11    5    21    8    5
>>> 18
>>>>>  4
>>>>>>>>> 7                                        9
>>>>>>>>> TT:227
>>>>>>>>> ........
>>>>>>>>> 
>>>>>>>>> I want to get two new columns: one with the sum, for each row, of all
>>>>>>>>> columns whose names include ABC, and the other with the sum, for each
>>>>>>>>> row, of all columns whose names include XYZ.
>>>>>>>>> 
>>>>>>>>> Can anyone help? Thank you!
>>>>>>>>> Dawn
>>>>>>>>> 
>>>>>>>>>        [[alternative HTML version deleted]]
>>>>>>>>> 
>>>>>>>>> ______________________________________________
>>>>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 

David Winsemius
Alameda, CA, USA


