[R] How to get items for both LHS and RHS for only specific columns in arules?

Kim C. minorthreatx at hotmail.com
Fri Jan 16 15:28:31 CET 2015


Sorry, I see that the formatting of my e-mail went all wrong and was completely unreadable. You can find a readable version in the attachment and below (if it works this time).
----------------------------------------------------------
Hi all, 

I have a question about the arules package in R. I hope the example tables are readable in your email; otherwise you can view them in question.txt in the attachment.

Within the apriori function in the arules package, I want the outcome to contain only these two items in the LHS: HouseOwnerFlag=0 and HouseOwnerFlag=1. The RHS should only contain items from the column Product. For instance:

  lhs                   rhs
1 {HouseOwnerFlag=0} => {Product=SV 16xDVD M360 Black}
2 {HouseOwnerFlag=1} => {Product=Adventure Works 26" 720p}
3 {HouseOwnerFlag=0} => {Product=Litware Wall Lamp E3015 Silver}
4 {HouseOwnerFlag=1} => {Product=Contoso Coffee Maker 5C E0900}

So now I use the following: 
rules <- apriori(sales, parameter=list(support =0.01, confidence =0.8, minlen=2), appearance = list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1")))

Then I use this to ensure that only the Product column is on the RHS: 
inspect( subset( rules, subset = rhs %pin% "Product=" ) )

The outcome is like this (for the sake of readability, I omitted the columns for support, lift and confidence):
  lhs                                                                    rhs
1 {ProductKey=153, IncomeGroup=Moderate, BrandName=Adventure Works}   => {Product=SV 16xDVD M360 Black}
2 {ProductKey=176, MaritalStatus=M, ProductCategoryName=TV and Video} => {Product=Adventure Works 26" 720p}
3 {BrandName=Southridge Video, NumberChildrenAtHome=0}                => {Product=Litware Wall Lamp E3015 Silver}
4 {HouseOwnerFlag=1, BrandName=Southridge Video, ProductKey=170}      => {Product=Contoso Coffee Maker 5C E0900}

So apparently the LHS is able to contain every possible column, not just HouseOwnerFlag like I specified.  I see that I can put default="rhs" in the apriori function to prevent this, like so: 
rules <- apriori(sales, parameter=list(support =0.001, confidence =0.5, minlen=2), appearance = list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1"), default="rhs")) 

Then upon inspecting (without the subset part, just inspect(rules)), there are far fewer rules (7) than before, but the LHS does indeed only contain HouseOwnerFlag:

    lhs                  rhs                                 
1 {HouseOwnerFlag=0} => {MaritalStatus=S}                   
2 {HouseOwnerFlag=1} => {Gender=M}                      
3 {HouseOwnerFlag=0} => {NumberChildrenAtHome=0}          
4 {HouseOwnerFlag=1} => {Gender=M}                        

However, nothing from the column Product appears on the RHS, so there is no use inspecting it with subset, as of course it would return an empty result. I tested it several times with different support values to see whether Product would appear, but the same 7 rules remain.
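One possible explanation, stated here as an assumption rather than a confirmed diagnosis: the confidence of a rule {HouseOwnerFlag=h} => {Product=p} is the share of transactions with HouseOwnerFlag=h that also contain Product=p. With many distinct products, that share may never reach the 0.5 threshold, whatever the support setting. The sketch below checks this with base R on a made-up data frame (the name `toy` and its values are invented for illustration):

```r
# Hypothetical toy data; `toy` and its values are invented for illustration.
toy <- data.frame(
  HouseOwnerFlag = factor(c(0, 0, 0, 0, 1, 1, 1, 1)),
  Product        = factor(c("A", "B", "C", "D", "A", "B", "C", "C"))
)

# Row-wise proportions: entry [h, p] is the confidence of the rule
# {HouseOwnerFlag=h} => {Product=p}, i.e. P(Product = p | HouseOwnerFlag = h).
conf <- prop.table(table(toy$HouseOwnerFlag, toy$Product), margin = 1)
print(conf)

# Highest confidence any Product rule can reach for each HouseOwnerFlag value.
max_conf <- apply(conf, 1, max)
print(max_conf)
```

If the largest value in a row stays below the confidence passed to apriori, no Product rule for that HouseOwnerFlag value can be returned.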

So my question is, how can I specify both the LHS (HouseOwnerFlag) and RHS (Product)? What am I doing wrong?
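One way this might be approached, offered as a sketch under stated assumptions and not as a confirmed answer: the appearance list of apriori also accepts an rhs element together with default="none", so items can be pinned to each side and everything else excluded. The toy data frame, its values, and the names sales_df and product_items are invented for illustration. The low confidence value is also an assumption, chosen because a single product may rarely dominate either HouseOwnerFlag group.

```r
library(arules)

# Hypothetical toy data standing in for the real `sales` object;
# every column is a factor so it converts cleanly to transactions.
sales_df <- data.frame(
  HouseOwnerFlag = factor(c(0, 0, 0, 1, 1, 1, 1, 0)),
  Gender         = factor(c("M", "F", "M", "M", "M", "F", "M", "F")),
  Product        = factor(c("A", "A", "B", "C", "C", "C", "B", "A"))
)
sales <- as(sales_df, "transactions")

# Collect every item label that comes from the Product column.
product_items <- grep("^Product=", itemLabels(sales), value = TRUE)

rules <- apriori(
  sales,
  # confidence = 0.1 is an illustrative assumption, not a recommendation
  parameter  = list(support = 0.01, confidence = 0.1, minlen = 2),
  appearance = list(
    lhs     = c("HouseOwnerFlag=0", "HouseOwnerFlag=1"),  # allowed on the LHS only
    rhs     = product_items,                              # allowed on the RHS only
    default = "none"                                      # all other items excluded
  ),
  control = list(verbose = FALSE)
)

inspect(rules)
```

Building product_items from itemLabels avoids typing every product name by hand, which matters when the Product column has many levels.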

You can reproduce this problem by downloading the test dataset from the attachment or via this link:
https://www.dropbox.com/s/tax5xalac5xgxtf/testdf.txt?dl=0 
Mind you, I only took the first 20 rows from a huge dataset (12 million rows), so the output here won't have the same product names as the example I displayed above. But the problem remains the same. (If you would like to have the entire dataset, I can of course email it.) I want to be able to get only HouseOwnerFlag=0 and/or HouseOwnerFlag=1 on the LHS and the column Product on the RHS.

I asked this question on another forum before, but unfortunately got no response at all. Since this mailing list is dedicated to R only, I thought you guys might be able to help me.

Thanks in advance! I look forward to hearing from you.

Kim

From: minorthreatx at hotmail.com
To: r-help at r-project.org
Date: Thu, 15 Jan 2015 13:50:54 +0100
Subject: [R] How to get items for both LHS and RHS for only specific columns in arules?


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: question.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20150116/38a4bea8/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: testdf.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20150116/38a4bea8/attachment-0001.txt>

