[R] Duplicate names in the pivot column

phil using philipsmith.ca
Sun Mar 29 17:45:40 CEST 2020


Thank you very much, Jim and Jeff. Both of your solutions work 
splendidly.

Philip

On 2020-03-29 02:25, Jim Lemon wrote:
> Hi Phil,
> Sorry it's not in the environment you are using, but perhaps this will 
> help:
> 
> taby<-table(df$y)
> ynames<-names(taby)
> for(yval in 1:length(taby)) {
>  if(taby[yval] > 1) {
>   cat(paste(ynames[yval],1:taby[yval],sep=""),"\n")
>   df$y[which(df$y == ynames[yval])]<-paste(ynames[yval],1:taby[yval],sep="")
>  }
> }
> 
> Jim
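Note on the approach above: `table(df$y)` counts each name over the whole column rather than within each time period. With the reprex data, every name is therefore renamed (A becomes A1/A2, B becomes B1 to B4), and the two periods no longer share column names after pivoting. A minimal sketch of the same numbering idea applied per period, using only base R's `ave()`, is below. It assumes `time` identifies the period and `y` is the pivot column, as in the reprex; the `z` values are fixed instead of sampled so the result is reproducible.

```r
# Sketch: Jim's numbering scheme (B1, B2, ...) applied within each time
# period, so a name that occurs only once per period keeps its original name.
# Assumes the layout of the reprex's df; z is fixed here for illustration.
df <- data.frame(time = rep(c(1, 2), each = 6),
                 y = rep(c("A", "B", "C", "B", "D", "C"), 2),
                 z = 1:12,
                 stringsAsFactors = FALSE)

# Position of each row among the rows sharing the same (time, y) pair: 1, 2, ...
occurrence <- ave(seq_along(df$y), df$time, df$y, FUN = seq_along)

# Number of rows sharing the same (time, y) pair
group_size <- ave(seq_along(df$y), df$time, df$y, FUN = length)

# Only names duplicated within a period get a numeric suffix
df$y <- ifelse(group_size > 1, paste0(df$y, occurrence), df$y)
print(df$y)
```

Names that occur once in a period (A, D) are left alone, and every (time, y) pair is now unique, so `pivot_wider()` should no longer need list-columns.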
> 
> On Sun, Mar 29, 2020 at 12:19 PM <phil using philipsmith.ca> wrote:
>> 
>> I have a problem involving inefficient coding. My code works, but in my
>> actual application it takes a very long time to execute. I have included
>> a reprex here that uses the same code, but with a much smaller-scale
>> application.
>> 
>> The data frame I am working with (df in my reprex) is in long form and I
>> want to change it to wide form. My problem is that the pivot column,
>> column 2 in my reprex, has some duplicate strings within a time period,
>> so pivot_wider() cannot identify the values uniquely and returns
>> list-columns (df1 in my reprex). I want to find all the duplicates and tag
>> them so they are no longer duplicates. My code succeeds (df3 in my
>> reprex), but in the real application there can be over 100 "cases" per
>> time period, and the nested for loops, which compare every pair of rows
>> within a period, grind on far too long.
>> 
>> I encounter this problem frequently in the datasets I use, so I am
>> looking for a general solution that is as efficient as possible. Any
>> help will be much appreciated.
>> 
>> Philip
>> 
>> ``` r
>> library(tidyverse)
>> df <- data.frame(time=c(1,1,1,1,1,1,2,2,2,2,2,2),
>>                  y=c("A","B","C","B","D","C","A","B","C","B","D","C"),
>>                  z=sample(1:100,12,replace=TRUE),stringsAsFactors=FALSE)
>> df1 <- pivot_wider(df,id_cols=1,names_from=y,values_from=z)
>> #> Warning: Values in `z` are not uniquely identified; output will contain list-cols.
>> #> * Use `values_fn = list(z = list)` to suppress this warning.
>> #> * Use `values_fn = list(z = length)` to identify where the duplicates arise
>> #> * Use `values_fn = list(z = summary_fun)` to summarise duplicates
>> fixcol <- function(dfm,cases,per,s,tag) {
>>    # dfm is the data frame
>>    # s is the target column number, containing character names
>>    # tag is a string to be added to a duplicate name
>>    # cases is the number of rows for a single time period
>>    # per is the number of time periods
>>    # all time periods must have the same number of rows
>>    for (k in 1:per) {
>>      for (i in (1+(k-1)*cases):(k*cases-1)) {
>>        for (j in (i+1):(k*cases)) {
>>          if (dfm[j,s]==dfm[i,s]) { # found a duplicate
>>            dfm[j,s] <- paste0(dfm[i,s],tag) # fix the duplicate
>>          }
>>        }
>>      }
>>    }
>>    return(dfm)
>> }
>> df2 <- fixcol(df,6,2,2,"_dup")
>> df3 <- pivot_wider(df2,id_cols=1,names_from=y,values_from=z)
>> ```
>> 
>> Created on 2020-03-28 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)
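A vectorized alternative to `fixcol()` is sketched below, assuming the period is identified by a column (here `time`) rather than by the `cases`/`per` row counts, so periods need not have the same number of rows. It swaps the pairwise loops for base R's `make.unique()`, applied within each period through `ave()`; `tag_duplicates()` is a hypothetical helper name, not an existing function. One difference from `fixcol()`: `make.unique()` numbers the tags (`B_dup1`, `B_dup2`, ...) instead of stacking them (`B_dup`, `B_dup_dup`). It has not been benchmarked here, but it avoids the row-by-row data-frame indexing inside nested loops.

```r
# Sketch: tag duplicate names within each period using make.unique().
# tag_duplicates() is a hypothetical helper; the tag is appended (with a
# counter) to the 2nd, 3rd, ... occurrence of a name within a period.
tag_duplicates <- function(dfm, id_col, name_col, tag = "_dup") {
  dfm[[name_col]] <- ave(dfm[[name_col]], dfm[[id_col]],
                         FUN = function(v) make.unique(v, sep = tag))
  dfm
}

# Same layout as the reprex's df; z is fixed here for illustration
df <- data.frame(time = rep(c(1, 2), each = 6),
                 y = rep(c("A", "B", "C", "B", "D", "C"), 2),
                 z = 1:12,
                 stringsAsFactors = FALSE)

df2 <- tag_duplicates(df, "time", "y")
print(df2$y)
# df2 can then be widened as in the reprex, e.g.
# tidyr::pivot_wider(df2, id_cols = time, names_from = y, values_from = z)
```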


