[R] Data transformation & cleaning

Wed Sep 28 09:36:55 CEST 2011

Seems your questions belong to rule mining for frequent item sets.
check arules package

Weidong Gu

On Tue, Sep 27, 2011 at 11:13 PM, pip56789 <pde3p at virginia.edu> wrote:
> Hi,
>
> I have a few methodological and implementation questions for ya'll. Thank
> you in advance for your help. I have a dataset that reflects people's
> preference choices. I want to see if there's any kind of clustering effect
> among certain preference choices (e.g. do people who pick choice A also pick
> choice D).
>
> I have a data set that has one record per user ID, per preference choice.
> It's a "long" form of a data set that looks like this:
>
> ID | Page
> 123 | Choice A
> 123 | Choice B
> 456 | Choice A
> 456 | Choice B
> ...
>
> I thought that I should do the following
>
> 1. Make the data set "wide", counting the observations so the data looks
> like this:
> ID | Count of Preference A | Count of Preference B
> 123 | 1 | 1
> ...
>
> Using
> table1 <- dcast(data,ID ~ Page,fun.aggregate=length,value_var='Page' )
>
> 2. Create a correlation matrix of preferences
> cor(table2[,-1])
>
> How would I restrict my correlation to show preferences that met a minimum
> sample threshold? Can you confirm if the two following commands do the same
> thing? What would I do from here (or am I taking the wrong approach)
> table1 <- dcast(data,Page ~ Page,fun.aggregate=length,value_var='Page' )
> table2 <- with(data, table(Page,Page))
>
>
> many thanks,
> Peter
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Data-transformation-cleaning-tp3849889p3849889.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>