[R] RFM analysis

Wed Oct 11 23:54:49 CEST 2017

Hi Hemant,
Let's take it one step at a time. Save the code below as "qdrfm.R" in
your R working directory. It includes the comments I added last time
and fixes a bug in the recency scoring.

qdrfm<-function(x,rbreaks=3,fbreaks=3,mbreaks=3,
 date.format="%Y-%m-%d",weights=c(1,1,1),finish=NA) {

 # if no finish date is specified, use current date
 if(is.na(finish)) finish<-Sys.Date()
 x$rscore<-as.numeric(finish-as.Date(x[,3],date.format))
 cat("Range of purchase recency",range(x$rscore),"\n")
 cat("Range of purchase frequency",range(table(x[,1])),"\n")
 cat("Range of purchase amount",range(by(x[,2],x[,1],sum)),"\n")
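 # sort the customer IDs so that they line up with the
 # by() and table() results below, which are ordered by ID
 custIDs<-sort(unique(x[,1]))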
 ncust<-length(custIDs)
 # initialize a data frame to hold the output
 rfmout<-data.frame(custID=custIDs,rscore=rep(0,ncust),
  fscore=rep(0,ncust),mscore=rep(0,ncust))
 # categorize the minimum number of days
 # since last purchase for each customer
 rfmout$rscore<-cut(by(x$rscore,x[,1],min),breaks=rbreaks,labels=FALSE)
 # categorize the number of purchases
 # recorded for each customer
 rfmout$fscore<-cut(table(x[,1]),breaks=fbreaks,labels=FALSE)
 # categorize the amount purchased
 # by each customer
 rfmout$mscore<-cut(by(x[,2],x[,1],sum),breaks=mbreaks,labels=FALSE)
 # calculate the RFM score from the
 # optionally weighted average of the above
 rfmout$cscore<-round((weights[1]*rfmout$rscore+
  weights[2]*rfmout$fscore+
  weights[3]*rfmout$mscore)/sum(weights),2)
 return(rfmout[order(rfmout$cscore),])
}

Now you can load the function into your workspace like this:

source("qdrfm.R")

Load your data:

df<-read.csv("df.csv")
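
Note that qdrfm() reads the columns by position: customer ID in column 1,
purchase amount in column 2 and purchase date in column 3. A made-up data
frame with that layout (the column names and values are invented for
illustration, not taken from your file) would look like this:

```r
# toy data in the column order qdrfm() expects;
# names and values are invented for illustration
toy<-data.frame(custID=c(101,102,101,103),
 amount=c(5.97,12.50,20.00,7.25),
 date=c("2017-05-01","2017-06-15","2017-07-31","2017-07-04"),
 stringsAsFactors=FALSE)
# the third column must parse with the date.format argument
toy.dates<-as.Date(toy[,3],"%Y-%m-%d")
str(toy)
```

If any of the parsed dates come out as NA, the date.format argument does
not match the way the dates are written in the file.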

Run the function with the defaults except for the finish date:

df.rfm<-qdrfm(df,finish=as.Date("2017-08-31"))
Range of purchase recency 31 122
Range of purchase frequency 1 4
Range of purchase amount 5.97 127.65
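
With the defaults, each break argument is a single number, and cut() then
splits the observed range into that many equal-width intervals, so no
value can fall outside them. A small illustration with made-up recency
values spanning the range printed above:

```r
# made-up recency values covering the printed range 31 to 122
recency<-c(31,60,75,95,122)
# breaks=3 asks cut() for three equal-width intervals
# over the range of the data
rcodes<-cut(recency,breaks=3,labels=FALSE)
rcodes
# [1] 1 1 2 3 3
```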

Your problem is now apparent. The breaks you asked for do not cover the
ranges printed above, so if I use them I will generate NA values in all
three scores:

df.rfm2<-qdrfm(df,rbreaks=c(10,30,50),fbreaks=c(1,2,3),
 mbreaks=c(8,14,400),finish=as.Date("2017-08-31"))
head(df.rfm2)
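
To see where the NA values come from, here is a small sketch that applies
cut() directly to a few made-up values inside the ranges printed above.
By default cut() uses intervals that are open on the left and closed on
the right, so anything at or below the first break, or above the last
one, becomes NA:

```r
# recency: only values in (10,50] get a score
r.codes<-cut(c(31,45,122),breaks=c(10,30,50),labels=FALSE)
r.codes
# [1]  2  2 NA
# frequency: 1 is not inside (1,2], and 4 is above 3
f.codes<-cut(c(1,2,4),breaks=c(1,2,3),labels=FALSE)
f.codes
# [1] NA  1 NA
# amount: 5.97 is below the first break of 8
m.codes<-cut(c(5.97,10,127.65),breaks=c(8,14,400),labels=FALSE)
m.codes
# [1] NA  1  2
```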

As I wrote before, the breaks _must_ cover the range of values if you
want a sensible analysis:

df.rfm3<-qdrfm(df,rbreaks=c(0,75,150),fbreaks=c(0,2,5),
 mbreaks=c(0,75,150),finish=as.Date("2017-08-31"))
head(df.rfm3)
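
A quick way to check a set of breaks before running the analysis is to
compare them with the range of the values. The helper below is only a
sketch, and its name (breaks.cover) is made up for this example. It
assumes the default behaviour of cut(), where the lowest break itself is
excluded:

```r
breaks.cover<-function(values,breaks) {
 # every value must be above the lowest break
 # and no larger than the highest break
 min(values,na.rm=TRUE) > min(breaks) &&
  max(values,na.rm=TRUE) <= max(breaks)
}
# ranges reported above: recency 31 to 122, frequency 1 to 4
breaks.cover(c(31,122),c(10,30,50))
# [1] FALSE
breaks.cover(c(31,122),c(0,75,150))
# [1] TRUE
breaks.cover(c(1,4),c(1,2,3))
# [1] FALSE
breaks.cover(c(1,4),c(0,2,5))
# [1] TRUE
```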

Looking at df.rfm3, it seems that the recency score is the only one
that discriminates between customers. This suggests to me that the data
distributions are causing a problem. First, you have 946 customers in a
dataset of 1000 rows, meaning that almost all of them made only one
transaction. Second, your purchase amounts are concentrated in the 0-20
range. If I change the breaks to reflect this, I get a much better
separation of customers:

df.rfm4<-qdrfm(df,rbreaks=c(0,75,150),fbreaks=c(0,1,5),
 mbreaks=c(0,10,150),finish=as.Date("2017-08-31"))
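
If you want to look at the distributions yourself before choosing breaks,
something along these lines should work. It is a sketch on made-up data
(the toy data frame and its values are invented), using the same column
positions as qdrfm():

```r
# invented data: customer ID in column 1, amount in column 2
toy<-data.frame(custID=c(1,2,2,3,4,5),
 amount=c(5.97,8.00,12.50,3.25,19.99,127.65))
# purchases per customer
freq<-table(toy[,1])
# how many customers made 1, 2, ... purchases
table(freq)
# total amount per customer
amt<-tapply(toy[,2],toy[,1],sum)
# quartiles show where the amounts are concentrated
quantile(amt)
```

Breaks placed where the counts and quartiles change are more likely to
separate the customers than evenly spaced ones.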

Maybe this will get you going.

Jim

On Wed, Oct 11, 2017 at 4:43 PM, Hemant Sain <hemantsain55 at gmail.com> wrote:
> Also, try setting the finish date to 2017-08-31,
> and help me with the complete running R code.
>
> On 11 October 2017 at 10:36, Hemant Sain <hemantsain55 at gmail.com> wrote:
>>
>> Hey Jim,
>> I'm attaching the actual dataset I'm working on, and I want RFM breaks of
>> r=(10,30,50), f=(1,2,3), m=(8,14,400).
>>