[R] RFM analysis
Jim Lemon
drjimlemon at gmail.com
Wed Oct 11 23:54:49 CEST 2017
Hi Hemant,
Let's take it one step at a time. Save this code as "qdrfm.R" in your
R working directory. It includes the comments I added last time and
fixes a bug in the recency scoring.
qdrfm<-function(x,rbreaks=3,fbreaks=3,mbreaks=3,
 date.format="%Y-%m-%d",weights=c(1,1,1),finish=NA) {
 # x is read by column position: customer ID,
 # purchase amount, purchase date
 # if no finish date is specified, use current date
 if(is.na(finish)) finish<-Sys.Date()
 # days between each purchase and the finish date
 x$rscore<-as.numeric(finish-as.Date(x[,3],date.format))
 cat("Range of purchase recency",range(x$rscore),"\n")
 cat("Range of purchase freqency",range(table(x[,1])),"\n")
 cat("Range of purchase amount",range(by(x[,2],x[,1],sum)),"\n")
 # sort the IDs so that they line up with the results
 # of by() and table(), which are ordered by customer ID
 custIDs<-sort(unique(x[,1]))
 ncust<-length(custIDs)
 # initialize a data frame to hold the output
 rfmout<-data.frame(custID=custIDs,rscore=rep(0,ncust),
  fscore=rep(0,ncust),mscore=rep(0,ncust))
 # categorize the minimum number of days
 # since last purchase for each customer
 rfmout$rscore<-cut(by(x$rscore,x[,1],min),breaks=rbreaks,labels=FALSE)
 # categorize the number of purchases
 # recorded for each customer
 rfmout$fscore<-cut(table(x[,1]),breaks=fbreaks,labels=FALSE)
 # categorize the amount purchased
 # by each customer
 rfmout$mscore<-cut(by(x[,2],x[,1],sum),breaks=mbreaks,labels=FALSE)
 # calculate the RFM score from the
 # optionally weighted average of the above
 rfmout$cscore<-round((weights[1]*rfmout$rscore+
  weights[2]*rfmout$fscore+
  weights[3]*rfmout$mscore)/sum(weights),2)
 # return the customers ordered by combined score
 return(rfmout[order(rfmout$cscore),])
}
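As a worked example of the last step, here is the weighted average for
one customer. The three category scores and the weights are made up for
illustration; they are not from your data.

```r
# hypothetical category scores for a single customer
rscore<-3
fscore<-1
mscore<-2
# weight recency twice as heavily as frequency and amount
weights<-c(2,1,1)
# same formula as in qdrfm: weighted sum divided by the sum of weights
cscore<-round((weights[1]*rscore+weights[2]*fscore+
 weights[3]*mscore)/sum(weights),2)
print(cscore)
# with the default weights c(1,1,1) this is the plain mean
cscore.default<-round((rscore+fscore+mscore)/3,2)
print(cscore.default)
```

Dividing by sum(weights) keeps the combined score on the same scale as
the category scores, whatever weights are chosen.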
Now you can load the function into your workspace like this:
source("qdrfm.R")
Load your data:
df<-read.csv("df.csv")
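qdrfm reads the columns by position: customer ID first, purchase amount
second and purchase date third. As a rough sketch, a made-up data frame
of that shape (the column names and values are hypothetical, not taken
from your file) gives the three ranges that the function prints:

```r
# hypothetical transactions; only the column order matters to qdrfm
toy<-data.frame(custID=c(101,102,101,103),
 amount=c(5.5,20,4.5,12.25),
 purchdate=c("2017-05-01","2017-06-15","2017-07-31","2017-07-01"))
finish<-as.Date("2017-08-31")
# days from each purchase to the finish date
toy.recency<-as.numeric(finish-as.Date(toy[,3],"%Y-%m-%d"))
# number of purchases per customer
toy.freq<-table(toy[,1])
# total amount purchased per customer
toy.amount<-by(toy[,2],toy[,1],sum)
cat("recency",range(toy.recency),"\n")
cat("frequency",range(toy.freq),"\n")
cat("amount",range(toy.amount),"\n")
```

If your file has the columns in a different order, rearrange them
before calling the function.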
Run the function with the defaults except for the finish date:
df.rfm<-qdrfm(df,finish=as.Date("2017-08-31"))
Range of purchase recency 31 122
Range of purchase freqency 1 4
Range of purchase amount 5.97 127.65
Your problem is now apparent. If I use the breaks you asked for, I
generate NA values in all three scores, because each set of breaks
misses part of the range printed above:
df.rfm2<-qdrfm(df,rbreaks=c(10,30,50),fbreaks=c(1,2,3),
mbreaks=c(8,14,400),finish=as.Date("2017-08-31"))
head(df.rfm2)
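The NA values come from the way cut() treats a vector of breaks. By
default the intervals are open on the left and closed on the right, and
anything outside them is returned as NA. A small sketch with made-up
values (not your data):

```r
# frequencies of 1 to 4 purchases, cut with breaks 1,2,3
freq<-1:4
fcut<-cut(freq,breaks=c(1,2,3),labels=FALSE)
# 1 is not inside (1,2] and 4 is beyond 3, so both become NA
print(fcut)
# recency values between 31 and 122 days, cut with breaks 10,30,50
rec<-c(31,50,51,122)
rcut<-cut(rec,breaks=c(10,30,50),labels=FALSE)
# everything above 50 days becomes NA
print(rcut)
```

Note that a customer with exactly one purchase is lost when the lowest
frequency break is 1.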
As I wrote before, the breaks _must_ cover the range of values if you
want a sensible analysis:
df.rfm3<-qdrfm(df,rbreaks=c(0,75,150),fbreaks=c(0,2,5),
mbreaks=c(0,75,150),finish=as.Date("2017-08-31"))
head(df.rfm3)
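One way to check this before running the analysis is to compare the
breaks with the ranges that qdrfm prints. The helper below is only a
sketch: covers() is a hypothetical name, not part of qdrfm, and it
assumes the default cut() behavior of intervals open on the left.

```r
# TRUE if all values fall inside (min(breaks), max(breaks)]
covers<-function(values,breaks) {
 min(values) > min(breaks) && max(values) <= max(breaks)
}
# the ranges printed by qdrfm for this dataset
recency.range<-c(31,122)
freq.range<-c(1,4)
amount.range<-c(5.97,127.65)
# the breaks first requested
bad<-c(covers(recency.range,c(10,30,50)),
 covers(freq.range,c(1,2,3)),
 covers(amount.range,c(8,14,400)))
print(bad)
# breaks that span the whole range
good<-c(covers(recency.range,c(0,75,150)),
 covers(freq.range,c(0,2,5)),
 covers(amount.range,c(0,75,150)))
print(good)
```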
Looking at df.rfm3, it seems that the recency score is the only one
discriminating users. This suggests to me that the data distributions
are causing a problem. First, you have 946 users in a dataset of 1000
rows, meaning that almost all made only one transaction. Second, your
purchase amounts are concentrated in the 0-20 range. Therefore if I
change the breaks to reflect this, I get a much better separation of
customers:
df.rfm4<-qdrfm(df,rbreaks=c(0,75,150),fbreaks=c(0,1,5),
mbreaks=c(0,10,150),finish=as.Date("2017-08-31"))
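To see why the lower frequency break matters, it helps to tabulate how
many customers made each number of purchases. The IDs below are made up
to mimic a dataset where most customers appear only once; they are not
your data.

```r
# hypothetical customer IDs: seven single purchasers, two repeat buyers
ids<-c(1,2,3,4,5,6,7,8,8,9,9,9)
purchases<-table(ids)
# how many customers made 1, 2 or 3 purchases
freq.dist<-table(purchases)
print(freq.dist)
# breaks 0,2,5 lump one and two purchases into the same category
f.coarse<-cut(purchases,breaks=c(0,2,5),labels=FALSE)
print(table(f.coarse))
# breaks 0,1,5 separate the single purchasers from the rest
f.fine<-cut(purchases,breaks=c(0,1,5),labels=FALSE)
print(table(f.fine))
```

The same kind of tabulation on the purchase amounts, or a call to
quantile(), will suggest where to put the amount breaks.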
Maybe this will get you going.
Jim
On Wed, Oct 11, 2017 at 4:43 PM, Hemant Sain <hemantsain55 at gmail.com> wrote:
> Also try to put finish date as 2017-08-31.
> and help me with the complete running r code.
>
> On 11 October 2017 at 10:36, Hemant Sain <hemantsain55 at gmail.com> wrote:
>>
>> Hey Jim,
>> i'm attaching you the actual dataset i'm working on and i want RFM breaks
>> as
>> r=(10,30,50), f=(1,2,3),m=(8,14,400).
>>