[R] Duplicated function with conditional statement

arun smartpink111 at yahoo.com
Sun Jul 28 03:11:24 CEST 2013


Hi,
Maybe this is what you wanted.
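# using tt1 (defined in my earlier message, quoted below)
indx <- which(tt1$response == "buy")   # positions of the "buy" rows
tt1$newcolumn <- 0
flagged <- lapply(seq_along(indx), function(i) {
  # rows from the i-th "buy" up to the next "buy" (or to the end of the data)
  x1 <- if (i == length(indx)) seq(indx[i], nrow(tt1)) else
        if ((indx[i+1] - indx[i]) == 1) indx[i] else seq(indx[i]+1, indx[i+1]-1)
  x2 <- tt1[unique(c(indx[1:i], x1)), ]
  x3 <- subset(x2, response == "sample")
  x4 <- subset(x2, response == "buy")
  # "buy" rows repeating the product of an earlier "buy"
  x5 <- row.names(x4)[duplicated(x4$product)]
  # "sample" rows whose product was already bought
  x6 <- if (nrow(x3) != 0) row.names(x3)[x3$product %in% x4$product]
  sort(c(x5, x6))
})
tt1[unique(unlist(flagged)), "newcolumn"] <- 1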


 tt1
   subj response product newcolumn
1     1   sample       1         0
2     1   sample       2         0
3     1      buy       3         0
4     2   sample       2         0
5     2      buy       2         0
6     3   sample       3         1
7     3   sample       2         1
8     3      buy       1         0
9     4   sample       1         1
10    4      buy       4         0
11    5      buy       4         1
12    5   sample       2         1
13    5      buy       2         1
14    6      buy       4         1
15    6   sample       5         0
16    6   sample       5         0
17    7   sample       4         1
18    7      buy       3         1
19    7      buy       4         1
20    8      buy       5         0
21    8   sample       4         1
22    8      buy       2         1
A.K.
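The rule that emerges from this thread (a row gets 1 when its product already appeared in an earlier row whose response is "buy", across all subjects) can also be sketched without looping over the blocks between "buy" rows. This is a minimal sketch, not A.K.'s code: `flagEarlierBuy` is a hypothetical helper name, and it assumes the rows are already in chronological order. It uses `match()` to find the row of the first "buy" of each product and flags every later row with that product.

```r
# Hypothetical helper (not from the thread): returns 1 for a row whose product
# already appeared in an EARLIER row with response "buy", otherwise 0.
# Assumes the rows are in chronological order.
flagEarlierBuy <- function(response, product) {
  buyRows  <- which(response == "buy")
  # row number of the first "buy" of each row's product (NA if never bought)
  firstBuy <- buyRows[match(product, product[buyRows])]
  # FALSE & NA is FALSE, so never-bought products give 0
  as.integer(!is.na(firstBuy) & seq_along(product) > firstBuy)
}

tt1 <- data.frame(
  subj     = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8),
  response = c("sample", "sample", "buy", "sample", "buy", "sample", "sample",
               "buy", "sample", "buy", "buy", "sample", "buy", "buy", "sample",
               "sample", "sample", "buy", "buy", "buy", "sample", "buy"),
  product  = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5, 5, 4, 3, 4, 5, 4, 2)
)
tt1$newcolumn <- flagEarlierBuy(tt1$response, tt1$product)
print(tt1$newcolumn)
```

On this `tt1` the sketch gives the same `newcolumn` values as the table printed above. Because it only needs the first "buy" of each product, it avoids building the intermediate subsets.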





________________________________
From: vanessa van der vaart <vanessa.vaart at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Cc: David Winsemius <dwinsemius at comcast.net>; R help <r-help at r-project.org> 
Sent: Saturday, July 27, 2013 6:55 PM
Subject: Re: [R] Duplicated function with conditional statement



Dear all,
Thank you all for your help. It has been a great help, but it is not exactly what I am looking for. Apparently I haven't explained the condition very clearly. I hope this explanation works.

If the value in column 'product' duplicates the value in a previous row (this applies to rows with response == 'buy' as well as response == 'sample'), and that previous row has response == 'buy', then the new value is 1; otherwise it is 0.
So, in that case,
if the value is duplicated, but only from a previous row where response == 'sample', then it is not considered duplicated, and the new column is 0.

Thank you very much in advance,
I really appreciate it.
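Read literally, the condition above says: for each row, look only at earlier rows whose response is "buy", and set the new column to 1 if the current product occurs among them. A minimal base-R sketch of that reading, assuming the rows are in chronological order and using the 10-row `tt` from the first message in this thread:

```r
# Example data from the first message in this thread
tt <- data.frame(
  subj     = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  response = c("sample", "sample", "buy", "sample", "buy",
               "sample", "sample", "buy", "sample", "buy"),
  product  = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4)
)

# For each row i, keep only rows 1..(i-1) whose response is "buy"
# and check whether the current product occurs among them.
tt$newcolumn <- sapply(seq_len(nrow(tt)), function(i) {
  earlier     <- seq_len(i - 1)   # empty for the first row
  earlierBuys <- tt$product[earlier][tt$response[earlier] == "buy"]
  as.integer(tt$product[i] %in% earlierBuys)
})
print(tt)
```

This reproduces the column requested in the first message (1 only in rows 6, 7 and 9). It rescans the earlier rows for every row, which is simple but quadratic in the number of rows.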



On Sat, Jul 27, 2013 at 3:45 AM, arun <smartpink111 at yahoo.com> wrote:


>
>On some slightly different datasets:
>tt1<-structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5,
>6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L,
>2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
>1L, 2L, 1L), .Label = c("buy", "sample"), class = "factor"),
>    product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5,
>    5, 4, 3, 4, 5, 4, 2)), .Names = c("subj", "response", "product"
>), class = "data.frame", row.names = c(NA, 22L))
>
>tt2<- structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5,
>6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L,
>2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
>1L, 2L, 2L), .Label = c("buy", "sample"), class = "factor"),
>    product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 4, 5, 1, 4,
>    2, 3, 3, 2, 5, 3, 4)), .Names = c("subj", "response", "product"
>), class = "data.frame", row.names = c(NA, 22L))
>
>tt3<- structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5,
>6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L,
>2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L,
>1L, 1L, 2L), .Label = c("buy", "sample"), class = "factor"),
>    product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 1, 3, 5, 2,
>    2, 2, 2, 4, 3, 2, 5)), .Names = c("subj", "response", "product"
>), class = "data.frame", row.names = c(NA, 22L))
>
>
>#Tried David's solution:
>tt1$rown <- rownames(tt1)
>as.numeric ( apply(tt1, 1, function(x) {
>    x['product'] %in% tt1[ rownames(tt1) < x['rown'] & tt1$response == "buy", "product"]  } ) )
>  #gave inconsistent results, even though the first 10 rows are the same as in `tt`
># [1] 0 1 1 1 1 1 1 0 1 0 1 0 0 1 0 0 1 0 1 0 1 1
>
>#similarly for `tt2` and `tt3`.
>
>
>##Created this function.  It seems to work in the tested cases, though it is not tested extensively.
>fun1 <- function(dat, colName, newColumn) {
>  indx <- which(dat[, colName] == "buy")
>  dat[, newColumn] <- 0
>  dat[unlist(lapply(seq_along(indx), function(i) {
>    x1 <- if (i == length(indx)) {
>      seq(indx[i], nrow(dat))
>    } else if ((indx[i+1] - indx[i]) == 1) {
>      indx[i]
>    } else {
>      seq(indx[i] + 1, indx[i+1] - 1)
>    }
>    x2 <- dat[unique(c(indx[i:1], x1)), ]
>    x3 <- subset(x2, response == "sample")
>    x4 <- subset(x2, response == "buy")
>    if (nrow(x3) != 0) {
>      row.names(x3)[x3$product %in% x4$product]
>    }
>  })), newColumn] <- 1
>  dat
>}
>fun1(tt,"response","newCol")
>#   subj response product rown newCol
>#1     1   sample       1    1      0
>#2     1   sample       2    2      0
>#3     1      buy       3    3      0
>#4     2   sample       2    4      0
>#5     2      buy       2    5      0
>#6     3   sample       3    6      1
>#7     3   sample       2    7      1
>#8     3      buy       1    8      0
>#9     4   sample       1    9      1
>#10    4      buy       4   10      0
>
>fun1(tt1,"response","newCol")
>#   subj response product newCol
>#1     1   sample       1      0
>#2     1   sample       2      0
>#3     1      buy       3      0
>#4     2   sample       2      0
>#5     2      buy       2      0
>#6     3   sample       3      1
>#7     3   sample       2      1
>#8     3      buy       1      0
>#9     4   sample       1      1
>#10    4      buy       4      0
>#11    5      buy       4      0
>#12    5   sample       2      1
>#13    5      buy       2      0
>#14    6      buy       4      0
>#15    6   sample       5      0
>#16    6   sample       5      0
>#17    7   sample       4      1
>#18    7      buy       3      0
>#19    7      buy       4      0
>#20    8      buy       5      0
>#21    8   sample       4      1
>#22    8      buy       2      0
>#Also
> fun1(tt2,"response","newCol")
> fun1(tt3,"response","newCol")
>A.K.
>
>P.S.  Below is OP's clarification regarding the conditional statement in a private message:
>
>I am sorry I didn't state the question very clearly. Let me restate the
>conditional statement; I hope it is understandable. I will explain by
>example.
>
>As you can see, almost every number is duplicated, but only in rows 6, 7, and 9 is the value in the new column 1.
>
>In row 4, the value is duplicated (2 already occurred in row 2), but
>since a value is considered duplicated only if it duplicates a row
>where the response is 'buy', the value in the new column for row 4 is
>still zero.
>
>In row 6, the value in the product column is 3. 3 already occurred
>in row 3, where the response is 'buy', so the value in the new column
>should be 1.
>
>I hope the conditional statement is understandable now.
>
>
>----- Original Message -----
>From: David Winsemius <dwinsemius at comcast.net>
>To: David Winsemius <dwinsemius at comcast.net>
>Cc: R-help at r-project.org; Uwe Ligges <ligges at statistik.tu-dortmund.de>
>Sent: Friday, July 26, 2013 5:16 PM
>Subject: Re: [R] Duplicated function with conditional statement
>
>
>On Jul 26, 2013, at 2:06 PM, David Winsemius wrote:
>
>>
>> On Jul 26, 2013, at 11:51 AM, Uwe Ligges wrote:
>>
>>>
>>>
>>> On 25.07.2013 21:05, vanessa van der vaart wrote:
>>>> Hi everybody,
>>>> I have a question about the R function duplicated(). I have spent days trying to
>>>> figure this out, but I can't find any solution yet. I hope somebody can help
>>>> me.
>>>> this is my data:
>>>>
>>>> subj=c(1,1,1,2,2,3,3,3,4,4)
>>>> response=c('sample','sample','buy','sample','buy','sample',
>>>>            'sample','buy','sample','buy')
>>>> product=c(1,2,3,2,2,3,2,1,1,4)
>>>> tt=data.frame(subj, response, product)
>>>>
>>>> the data look like this:
>>>>
>>>>    subj response product
>>>> 1     1   sample       1
>>>> 2     1   sample       2
>>>> 3     1      buy       3
>>>> 4     2   sample       2
>>>> 5     2      buy       2
>>>> 6     3   sample       3
>>>> 7     3   sample       2
>>>> 8     3      buy       1
>>>> 9     4   sample       1
>>>> 10    4      buy       4
>>>>
>>>> I want to create a new column based on the values in the response and product
>>>> columns. If the value in product is duplicated, then the value in the new column
>>>> is 1; otherwise it is 0.
>>>
>>>
>>> According to your description:
>>>
>>
>> I agree that the description did not match the output. I tried to match the output using a rule that could be expressed as:
>>
>> if( a "buy"-associated "product" value precedes the current "product" value){1}else{0}
>>
>
>So this delivers the specified output:
>
>tt$rown <- rownames(tt)
>as.numeric ( apply(tt, 1, function(x) {
>     x['product'] %in% tt[ rownames(tt) < x['rown'] & tt$response == "buy", "product"]  } ) )
>
># [1] 0 0 0 0 0 1 1 0 1 0
>
>> --
>> David.
>>
>>> tt$newcolumn <- as.integer(duplicated(tt$product) & tt$response=="buy")
>>>
>>> which is different from what you show us below, from which I cannot derive any systematic rule.
>>>
>>> Uwe Ligges
>>>
>>>> but I want to add a conditional statement: the value in the product column
>>>> is only considered duplicated if the value in the response column is
>>>> 'buy'.
>>>> For illustration, the table should look like this:
>>>>
>>>>    subj response product newcolumn
>>>> 1     1   sample       1         0
>>>> 2     1   sample       2         0
>>>> 3     1      buy       3         0
>>>> 4     2   sample       2         0
>>>> 5     2      buy       2         0
>>>> 6     3   sample       3         1
>>>> 7     3   sample       2         1
>>>> 8     3      buy       1         0
>>>> 9     4   sample       1         1
>>>> 10    4      buy       4         0
>>>>
>>>>
>>>> Can somebody help me?
>>>> Any help will be appreciated.
>>>> I am new to this mailing list, so forgive me in advance if I did not ask
>>>> the question appropriately.
>>>>
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>
>David Winsemius
>Alameda, CA, USA
>
>
>
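Two pitfalls from earlier in the thread can be checked directly. Uwe's one-liner flags a "buy" row whose product occurred in any earlier row, which is a different rule from the requested one. David's version compares row names, which are character strings, so they sort alphabetically: "10" comes before "2", and once a data frame has 10 or more rows the set of "earlier" rows is wrong (for example, row "2" treats rows "10" to "19" as earlier). A small sketch, reusing the example data from the first message; the names `wanted` and `uwe` are only illustrative:

```r
tt <- data.frame(
  response = c("sample", "sample", "buy", "sample", "buy",
               "sample", "sample", "buy", "sample", "buy"),
  product  = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4)
)
wanted <- c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L)  # column requested in the first message

# Uwe's reading: the current row is a "buy" and its product occurred in any earlier row
uwe <- as.integer(duplicated(tt$product) & tt$response == "buy")
print(uwe)                     # flags rows 5 and 8 instead of 6, 7 and 9
print(identical(uwe, wanted))  # FALSE

# Row names are character, so they compare alphabetically rather than numerically
print("10" < "2")                          # TRUE
print(as.integer("10") < as.integer("2"))  # FALSE
```

Comparing numeric row positions (for example `seq_len(nrow(tt))`) instead of `rownames()` avoids the second problem.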


