[R] Dput Help in R

Thu Dec 31 18:30:03 CET 2015

> On Dec 30, 2015, at 11:26 PM, SHIVI BHATIA <shivi.bhatia at safexpress.com> wrote:
> 
> Hi Duncan,
> Please find the dput from the data.
> 
> ab<-read.csv("collection_last.csv",header=TRUE)
> y<-ab[1:10,]
> 

What follows is (possibly) partial output from a dput() call. In any case, I am unable to repair it.
> 
> ab<- "2,458", "2,461", "2,462", "2,463", "2,464", "2,465", "2,468",
> "2,469", "2,470", "2,473", "2,474", "2,475", "2,476", "2,477",
> "2,478", "2,479", "2,480", "2,483", "2,484,267", "2,485",
> 
snipped
> "99,581", "99,834", "990", "992", "992,489", "993", "994",
> "994,195", "995", "996", "998", "999"), class = "factor"),

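It is useful in showing that these items (presumably the column named "Final") are factors. Notice the commas in the values you might think were numeric. You will need to remove the commas (probably with `gsub`) from the character values before using `as.numeric`; calling `as.numeric` directly on a factor returns its internal integer codes, not the numbers in the labels.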
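
The comma removal described above could be sketched as follows. The vector `final` is a hypothetical stand-in built from a few of the labels visible in the dput output; it is not the poster's actual column.

```r
# Hypothetical stand-in for the "Final" column: a factor with commas in its labels
final <- factor(c("2,458", "990", "99,581", "2,484,267"))

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the numbers shown in the labels
codes <- as.numeric(final)

# Convert to character first, strip the commas, then coerce to numeric
final_num <- as.numeric(gsub(",", "", as.character(final), fixed = TRUE))
print(final_num)
```

Here `fixed = TRUE` treats the comma as a literal string rather than a regular expression, which is all that is needed for this pattern.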

I haven't quite figured out how a dataframe could have a factor column that was so much longer than the adjacent columns named "Month" and "Year". I would suggest redoing the read.csv with stringsAsFactors=FALSE so that you can then work on "pure" text before the coercion to numeric.
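
A minimal sketch of that re-read, assuming the amount columns arrive as quoted text with thousands separators. The inline `csv_text` is made-up data standing in for "collection_last.csv", and the choice of columns in `amount_cols` is an assumption for illustration.

```r
# Made-up sample standing in for collection_last.csv (values are illustrative only)
csv_text <- 'DOC_AMOUNT,RECEIPT_AMT,Final
"2,458","1,000","1,458"
"990","990","0"'

# stringsAsFactors = FALSE keeps the text columns as character, not factor
kp <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Columns assumed to hold amounts; strip the commas, then coerce each to numeric
amount_cols <- c("DOC_AMOUNT", "RECEIPT_AMT", "Final")
kp[amount_cols] <- lapply(kp[amount_cols], function(x) {
  as.numeric(gsub(",", "", x, fixed = TRUE))
})

str(kp)
```

With the real file, the `text = csv_text` argument would be replaced by the file name, and arithmetic on these columns should then work without the Ops.factor warnings.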

-- David.

> Month = structure(c(11L, 11L, 7L, 2L, 2L, 12L, 11L, 11L,
>                    11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
>                    11L, 11L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L),
>                    .Label = c("Apr", "Aug", "Dec", "Feb", "Jan", "Jul", "Jun",
>                    "Mar", "May", "Nov", "Oct", "Sep"), class = "factor"),
> Year = c(2010L, 2010L,
> 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
> 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
> 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
> 2011L)), .Names = c("DOC_TYPE", "DOC_NO", "DOC_DT", "SFX_CODE",
> "CUSTOMER", "DOC_AMOUNT", "OS_ASON_RPT_DT", "OS_DAYS", "BILLING_BRANCH",
> "COLL_BR", "RECEIPT_NO", "RECEIPT_DT", "Applied.Date", "RECEIPT_AMT",
> "TDS_AMT", "REBATE", "Final", "Month", "Year"), row.names = c(NA,
> 30L), class = "data.frame")
> 
> 
> Not sure if this would help.
> 
> -----Original Message-----
> From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
> Sent: Wednesday, December 30, 2015 10:23 PM
> To: SHIVI BHATIA <shivi.bhatia at safexpress.com>; r-help at r-project.org
> Subject: Re: [R] Dput Help in R
> 
> On 30/12/2015 5:56 AM, SHIVI BHATIA wrote:
>> Dear Team,
>> 
>> 
>> 
>> I am facing an error while performing a manipulation using a dplyr
>> package.
>> In the code below, I am using mutate to build a new calculated column:
>> 
>> 
>> 
>> kp<-read.csv("collection_last.csv",header=TRUE)
>> 
>> mutate(kp,dif=DOC_AMOUNT-RECEIPT_AMT+TDS_AMT+REBATE)
>> 
>> 
>> 
>> However it gives an error:-
>> 
>> Warning messages:
>> 1: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L,  :
>>   '-' not meaningful for factors
>> 2: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L,  :
>>   '+' not meaningful for factors
>> 3: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L,  :
>>   '+' not meaningful for factors
>> 
>> 
>> 
>> This error occurs when some of my variables are factors, so I tried
>> to change them to numeric using the expression:
>> 
>> kp$DOC_TYPE=as.numeric(kp$DOC_TYPE)
>> 
>> The variable type now shows as "double". To expedite help on this, I
>> was trying to create a reproducible example, but I am struggling to
>> create one. The data has approximately 1 million rows and 21 columns,
>> so dput() does not capture the row-level detail required to share, and
>> even dput(head(kp$DOC_TYPE)) does not help.
>> 
>> I have read many Stack Overflow and R-help posts before composing this
>> email, so I need help creating a reproducible example to share with
>> the experts in the community. Apologies if this is a repeat.
>> 
>> 
>> 
>> PLEASE HELP AS I AM HIGHLY STRUGGLING TO BUILD ANY OUTCOME.
> 
> If you are working with a dataframe or matrix named x, just use
> 
> y <- x[1:10,]
> 
> to extract the first 10 rows.  The error will probably occur with this
> subset as well, and dput() will give you a reasonably sized amount of
> output.  If the error doesn't happen, just take a bigger subset, and
> possibly leave off the beginning, e.g.
> 
> y <- x[101:110,]
> 
> for 10 lines starting at line 101.
> 
> Duncan Murdoch

David Winsemius
Alameda, CA, USA