[R] Duplicated function with conditional statement

David Winsemius dwinsemius at comcast.net
Fri Jul 26 23:16:34 CEST 2013


On Jul 26, 2013, at 2:06 PM, David Winsemius wrote:

> 
> On Jul 26, 2013, at 11:51 AM, Uwe Ligges wrote:
> 
>> 
>> 
>> On 25.07.2013 21:05, vanessa van der vaart wrote:
>>> Hi everybody,
>>> I have a question about the R function duplicated(). I have spent days trying to
>>> figure this out, but I can't find a solution yet. I hope somebody can help
>>> me.
>>> This is my data:
>>> 
>>> subj=c(1,1,1,2,2,3,3,3,4,4)
>>> response=c('sample','sample','buy','sample','buy','sample',
>>> 'sample','buy','sample','buy')
>>> product=c(1,2,3,2,2,3,2,1,1,4)
>>> tt=data.frame(subj, response, product)
>>> 
>>> the data look like this:
>>> 
>>>    subj response product
>>> 1     1   sample       1
>>> 2     1   sample       2
>>> 3     1      buy       3
>>> 4     2   sample       2
>>> 5     2      buy       2
>>> 6     3   sample       3
>>> 7     3   sample       2
>>> 8     3      buy       1
>>> 9     4   sample       1
>>> 10    4      buy       4
>>> 
>>> I want to create a new column based on the values in the response and product
>>> columns. If the value in product is duplicated, then the value in the new column
>>> is 1; otherwise it is 0.
>> 
>> 
>> According to your description:
>> 
> 
> Agree that the description did not match the output. I tried to match the output using a rule that could be expressed as: 
> 
> if( the current "product" value matches the "product" value of an earlier "buy" row ){1}else{0}
> 

So this delivers the specified output:

# Use numeric row positions: comparing rownames() as strings sorts "10" before "2"
tt$rown <- seq_len(nrow(tt))
as.numeric( sapply(tt$rown, function(i) {
     tt$product[i] %in% tt$product[ tt$rown < i & tt$response == "buy" ] } ) )

# [1] 0 0 0 0 0 1 1 0 1 0
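
For longer data, the same rule can probably be written without a row-by-row loop. The following is only a sketch under the assumption that the rule really is "an earlier 'buy' row carries the same product"; the names buy_rows, first_buy and literal are made up for illustration. It rebuilds the example data, looks up the first "buy" row for each product with match(), and also computes Uwe's literal reading of the description for contrast.

```r
# Rebuild the example data from the original post
tt <- data.frame(
  subj     = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  response = c("sample", "sample", "buy", "sample", "buy",
               "sample", "sample", "buy", "sample", "buy"),
  product  = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4)
)

# Row positions of all "buy" rows
buy_rows <- which(tt$response == "buy")

# For every row, the position of the first "buy" row with the same product
# (NA if that product is never bought); match() returns the first hit
first_buy <- buy_rows[match(tt$product, tt$product[buy_rows])]

# 1 when that first "buy" row comes strictly before the current row, else 0
tt$newcolumn <- as.integer(!is.na(first_buy) & first_buy < seq_len(nrow(tt)))
tt$newcolumn
# [1] 0 0 0 0 0 1 1 0 1 0

# Literal reading of the description (duplicated product AND response is "buy")
literal <- as.integer(duplicated(tt$product) & tt$response == "buy")
literal
# [1] 0 0 0 0 1 0 0 1 0 0
```

Only the first vector agrees with the requested newcolumn; the literal reading flags rows 5 and 8 instead.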

> -- 
> David.
> 
>> tt$newcolumn <- as.integer(duplicated(tt$product) & tt$response=="buy")
>> 
>> which is different from what you show us below, from which I cannot derive any systematic rule.
>> 
>> Uwe Ligges
>> 
>>> But I want to add a condition: the value in the product column
>>> will only be considered duplicated if the value in the response column is
>>> 'buy'.
>>> For illustration, the table should look like this:
>>> 
>>>    subj response product newcolumn
>>> 1     1   sample       1         0
>>> 2     1   sample       2         0
>>> 3     1      buy       3         0
>>> 4     2   sample       2         0
>>> 5     2      buy       2         0
>>> 6     3   sample       3         1
>>> 7     3   sample       2         1
>>> 8     3      buy       1         0
>>> 9     4   sample       1         1
>>> 10    4      buy       4         0
>>> 
>>> 
>>> Can somebody help me?
>>> Any help will be appreciated.
>>> I am new to this mailing list, so forgive me in advance if I did not ask
>>> the question appropriately.
>>> 
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> 
> 
> David Winsemius
> Alameda, CA, USA
> 

David Winsemius
Alameda, CA, USA


