[R] How to define proper breaks in RFM analysis

Sat Oct 14 00:54:35 CEST 2017

Hemant's problem is that the indicators are not distributed uniformly.
With a uniform distribution, categorization gives a reasonably optimal
separation of cases. One approach would be to drop categorization and
calculate the overall score as the mean of the standardized indicator
scores. Whether this is an option I do not know. I did offer an
"eyeball" set of breaks in a previous email, but apparently this was
not sufficient.
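As a rough sketch of the standardized-score idea, here is how it might look on the twelve rows from David's message quoted below. The sign flip on Recency and the equal weighting of the three indicators are assumptions made for illustration, not something specified in this thread.

```r
# Twelve-row sample copied from David's message quoted below.
dat <- data.frame(
  Recency   = c(9, 3, 31, 11, 31, 6, 0, 31, 44, 11, 7, 31),
  Frequency = c(5, 15, 1, 8, 2, 2, 2, 1, 1, 2, 2, 1),
  Monetary  = c(9.996, 16.308, 18.44, 13.5075, 7.99, 15.67,
                32.57, 24.34, 21.86, 14.095, 11.185, 10.77)
)

# Standardize each indicator to mean 0 and standard deviation 1.
z <- scale(dat)

# Assumption: a small Recency is "good", so flip its sign to make
# larger values mean "better" on every indicator before averaging.
z[, "Recency"] <- -z[, "Recency"]

# Overall score: unweighted mean of the three standardized indicators.
dat$score <- rowMeans(z)

# Customers ordered from highest to lowest overall score.
dat[order(dat$score, decreasing = TRUE), ]
```

The score is continuous, so no breaks are needed; if three groups are still wanted, they could be cut from the score afterwards.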

Jim

On Sat, Oct 14, 2017 at 4:27 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
>> On Oct 13, 2017, at 2:51 AM, PIKAL Petr <petr.pikal at precheza.cz> wrote:
>>
>> Hi
>>
>> You expect us to solve your problem but you ignore advice already received.
>>
>> Your data are unreadable, use dput(yourdata) instead. see ?dput
>>
>>> test <- read.table("clipboard", header=TRUE)
>> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
>>  line 115 did not have 6 elements
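A minimal sketch of the dput() suggestion above, on a made-up three-row excerpt (the object names `small` and `restored` are hypothetical). dput() prints an R expression that can be pasted into an email and evaluated by the reader to rebuild the object, column types included.

```r
# Hypothetical three-row excerpt of the posted data.
small <- data.frame(
  user_id = c("194849", "194978", "198614"),
  Recency = c(9L, 3L, 31L),
  stringsAsFactors = FALSE
)

# Prints a structure(...) expression; this is what should go in the email.
dput(small)

# Round trip: capture the printed text and rebuild the object from it,
# which is what a reader of the email would do.
txt <- paste(capture.output(dput(small)), collapse = "\n")
restored <- eval(parse(text = txt))

# Check that the rebuilt object matches the original.
all.equal(small, restored)
```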
>
> I didn't have such a problem: (illustrated with a more minimal example)
>
> dat <-  scan( what=list("",1,"",1L,1L,1),
>              text="194849 6.99 8/22/2017 9 5 9.996
> 194978 14.78 8/28/2017 3 15 16.308
> 198614 18.44 7/31/2017 31 1 18.44
> 234569 34.99 8/20/2017 11 8 13.5075
> 252686 7.99 7/31/2017 31 2 7.99
> 291719 21.26 8/25/2017 6 2 15.67
> 291787 46.1 8/31/2017 0 2 32.57
> 292630 24.34 7/31/2017 31 1 24.34
> 295204 21.86 7/18/2017 44 1 21.86
> 295989 8.98 8/20/2017 11 2 14.095
> 298883 14.38 8/24/2017 7 2 11.185
> 308824 10.77 7/31/2017 31 1 10.77")
>
> names(dat) <- c("user_id", "subtotal_amount", "created_at", "Recency", "Frequency", "Monetary")
> dat <- data.frame(dat,stringsAsFactors=FALSE)
>
> I suspect read.table would also have worked for me, but I was expecting difficulties based on Petr's posting.
>
>
> #And ended up with this result (on the original copied data):
>> str(dat)
> 'data.frame':   500 obs. of  6 variables:
>  $ user_id        : chr  "194849" "194978" "198614" "234569" ...
>  $ subtotal_amount: num  6.99 14.78 18.44 34.99 7.99 ...
>  $ created_at     : chr  "8/22/2017" "8/28/2017" "7/31/2017" "8/20/2017" ...
>  $ Recency        : int  9 3 31 11 31 6 0 31 44 11 ...
>  $ Frequency      : int  5 15 1 8 2 2 2 1 1 2 ...
>  $ Monetary       : num  10 16.31 18.44 13.51 7.99 ...
>
> ...  but the following criticism seems, well, _critical_ (as in essential for one to address if a reasonable proposal is to be offered.)
>
>
>> What is an „ideal interval“? Can you define it? Should it be such as to provide an equal number of observations in each bin?
>
> That is the crucial question for you to answer, Hemant. Read the ?quantile help page if your answer is "yes" or even "maybe".
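If the answer is "yes", one possible sketch uses quantile() to find the cut points and cut() to assign the bins. The choice of three bins follows Hemant's request; the variable names are made up for illustration.

```r
# Recency values from the twelve-row sample quoted above.
recency <- c(9, 3, 31, 11, 31, 6, 0, 31, 44, 11, 7, 31)

# Cut points at the 0, 1/3, 2/3 and 1 quantiles.
brks <- quantile(recency, probs = seq(0, 1, length.out = 4))

# include.lowest = TRUE keeps the minimum value in the first bin.
r_bin <- cut(recency, breaks = brks, include.lowest = TRUE, labels = 1:3)

# Number of customers per bin.
table(r_bin)
```

With heavily tied data the counts will not come out equal, and quantile() can even return duplicated break points, which cut() rejects with an error. That may be part of what "not working" means here.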
>>
>> Or maybe you could normalise your values and use quartile method.
>
> Well, maybe not so much on that last one, Petr. Normalization should not affect the classification based on quartiles. It doesn't change the ordering of the values.
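The point can be checked on the sample data. This is only a sketch: `tertile` is a hypothetical helper, and scale() stands in for whatever normalization was meant.

```r
# Monetary values from the twelve-row sample quoted above.
monetary <- c(9.996, 16.308, 18.44, 13.5075, 7.99, 15.67,
              32.57, 24.34, 21.86, 14.095, 11.185, 10.77)

# Hypothetical helper: assign each value to one of three quantile-based bins.
tertile <- function(x) {
  brks <- quantile(x, probs = seq(0, 1, length.out = 4))
  cut(x, breaks = brks, include.lowest = TRUE, labels = FALSE)
}

raw_bins    <- tertile(monetary)
scaled_bins <- tertile(as.numeric(scale(monetary)))

# Compare the bin assignments before and after standardizing.
identical(raw_bins, scaled_bins)
```

Centering and scaling is a strictly increasing linear transformation, so the break points move along with the data and every value stays in the same bin.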
>
> --
> David.
>
>>
>> Cheers
>> Petr
>>
>> From: Hemant Sain [mailto:hemantsain55 at gmail.com]
>> Sent: Friday, October 13, 2017 8:51 AM
>> To: PIKAL Petr <petr.pikal at precheza.cz>
>> Cc: r-help mailing list <r-help at r-project.org>
>> Subject: Re: [R] How to define proper breaks in RFM analysis
>>
>> Hey,
>> I want to define 3 ideal breaks (bins) for each variable; one of those variables was attached in the previous email.
>> I don't want to use the quartile method, because it does not work well for this data set: the distribution is not normal.
>> So I would like you to suggest another method for defining 3 breaks with ideal intervals for Recency, Frequency and Monetary, so that I can calculate the RFM score.
>> I'm attaching some of the data set again.
>> Please look into it and help me with the R code.
>> Thanks
>>
>>
>>
>> Data
>>
>> user_id  subtotal_amount  created_at  Recency  Frequency  Monetary
>> 194849   6.99             8/22/2017
>>
> snipped
>
>>
>>
>> On 13 October 2017 at 10:35, PIKAL Petr <petr.pikal at precheza.cz> wrote:
>> Hi
>>
>> Your statement about attaching data is problematic. We cannot do much with it. Instead use output from dput(yourdata) to show us what exactly your data look like.
>>
>> We also do not know how you want to split your data. It would help if you also showed what the bins should be, with the respective data. Unless you provide this information you will probably not get any sensible answer.
>>
>> Cheers
>> Petr
>>
>>
>>> -----Original Message-----
>>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Hemant Sain
>>> Sent: Thursday, October 12, 2017 10:18 AM
>>> To: r-help mailing list <r-help at r-project.org>
>>> Subject: [R] How to define proper breaks in RFM analysis
>>>
>>> Hello,
>>> I'm working on RFM analysis and I want to define my own breaks, but my
>>> frequency variable is not normally distributed, so using quartiles does
>>> not give optimal results.
>>> I'm looking for a better approach that defines the breaks dynamically.
>>> I can pick them easily by eye after visualization, but I want the model
>>> to define the breaks automatically according to the data set.
>>> I'm attaching sample data for reference.
>>>
>>> Thanks
>>>
>>>                           *Freq*
>>> 5
>>> 15
>>> 1
> snipped
>> .
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law