[R] grouping followed by finding frequent patterns in R

Bert Gunter gunter.berton at gene.com
Sun Mar 10 15:55:55 CET 2013


1. Please cc to the list, as I have here, unless your comments are off topic.

2. Use dput() (?dput) to include **small** amounts of data in your
message, as attachments are generally stripped from r-help.

3. I have no experience with itemsets or the arules package, but a
quick glance at the docs there said that your data argument must be in
a specific form coercible into an S4 "transactions" class. I suspect
that neither your initial data frame nor the list deriving from split
is, but maybe someone familiar with the package can tell you for sure.
That's why you need to cc to the list.
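
To make points 2 and 3 concrete, here is a minimal sketch rather than a tested fix. The sample values are made up to match the shape of the posted data, and the support threshold is only illustrative. It assumes that arules can coerce a named list of character vectors with no repeated items into "transactions"; whether that also resolves the "trio library" error is something I cannot confirm.

```r
# Small made-up sample in the same shape as the posted data (CIN, TRN_TYP).
dat <- data.frame(
  CIN     = c(9079954, 9079954, 9079954, 9079954, 12441087, 12441087, 12441087),
  TRN_TYP = c(1, 2, 4, 4, 19, 14, 19)
)

# Point 2: dput() prints a text representation that can be pasted into a message.
dput(dat)

# Point 3: build one basket per CIN. Items are converted to character and
# de-duplicated within each basket, because the coercion error quoted below
# complains about duplicated items.
baskets <- lapply(split(as.character(dat$TRN_TYP), dat$CIN), unique)
str(baskets)

# Only attempted if arules is installed; the support value is illustrative.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(baskets, "transactions")
  itemsets <- eclat(trans, parameter = list(support = 0.5))
  inspect(itemsets)
}
```

De-duplicating inside each basket, rather than over whole rows of the data frame, is the design choice here: duplicated() on a data frame with other columns can leave repeated CIN/TRN_TYP pairs behind.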

-- Bert


On Sun, Mar 10, 2013 at 7:04 AM, Dhiman Biswas <crazydhimu at gmail.com> wrote:
> Dear Bert,
>
> My intention is to mine frequent itemsets of TRN_TYP for individual CIN out
> of that data.
> But the problem is using eclat after splitting gives the following error:
>
> Error in eclat(list) : internal error in trio library
>
> PS: I have attached my dataset.
>
>
> On Sat, Mar 9, 2013 at 8:27 PM, Bert Gunter <gunter.berton at gene.com> wrote:
>>
>> I **suggest** that you explain what you wish to accomplish using a
>> reproducible example rather than telling us what packages you think
>> you should use. I believe you are making things too complicated; e.g.
>> what do you mean by "frequent patterns"?  Moreover, "basket format" is
>> rather unclear -- and may well be unnecessary. But using lists, it
>> could be simply accomplished by
>>
>> ?split  ## as in
>> the_list <- with(yourdata, split(TRN_TYP, CIN))
>>
>> or possibly
>>
>> the_list <- with(yourdata, tapply(TRN_TYP, CIN, FUN = table))
>>
>> Of course, these may be irrelevant and useless, but without knowing
>> your purpose ...?
>>
>> -- Bert
>>
>> On Sat, Mar 9, 2013 at 4:37 AM, Dhiman Biswas <crazydhimu at gmail.com> wrote:
>> > I have data in the following form:
>> > CIN TRN_TYP
>> > 9079954    1
>> > 9079954    2
>> > 9079954    3
>> > 9079954    4
>> > 9079954    5
>> > 9079954    4
>> > 9079954    5
>> > 9079954    6
>> > 9079954    7
>> > 9079954    8
>> > 9079954    9
>> > 9079954    9
>> > .                    .
>> > .                    .
>> > .                    .
>> > There are 100 different CINs (9079954, 12441087, 15246633, ...), each
>> > with its respective TRN_TYP values.
>> >
>> > First of all, I want this data to be grouped into basket format:
>> > 9079954   1, 2, 3, 4, 5, ....
>> > 12441087  19, 14, 21, 3, 7, ...
>> > .
>> > .
>> > .
>> > and then apply eclat from arules package to find frequent patterns.
>> >
>> > 1) I ran the following code:
>> > file<-read.csv("D:/R/Practice/Data_Input_NUM.csv")
>> > file <- file[!duplicated(file),]
>> > eclat(split(file$TRN_TYP,file$CIN))
>> >
>> > but it gave me the following error:
>> > Error in asMethod(object) : can not coerce list with transactions with
>> > duplicated items
>> >
>> > 2) I ran this code:
>> > file<-read.csv("D:/R/Practice/Data_Input_NUM.csv")
>> > # Data_Input_NUM has many other columns as well, so I am selecting
>> > # only CIN and TRN_TYP
>> > file_new<-file[,c(3,6)]
>> > file_new <- file_new[!duplicated(file_new),]
>> > eclat(split(file_new$TRN_TYP,file_new$CIN))
>> >
>> > but again:
>> > Error in eclat(split(file_new$TRN_TYP, file_new$CIN)) :
>> >   internal error in trio library
>> >
>> > PLEASE HELP
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>> Internal Contact Info:
>> Phone: 467-7374
>> Website:
>>
>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
>





