[R] Match strings across two differently sized dataframes and copy corresponding row to dataframe

jim holtman jholtman at gmail.com
Thu Jun 30 17:36:58 CEST 2011


?merge
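
As a rough sketch of what that could look like (the toy data below is made up for illustration; only the column names ImproveOne, ImproveCat, Improve and ImproveCat1 come from your post), either match() or merge() should do the lookup in one step. Unlike charmatch(), match() only does exact matching, and it returns one index per row of mydata, so the lengths always line up. This assumes each string occurs only once in comments; if it can occur more than once, match() takes the first hit while merge() returns one row per hit.

```r
# Hypothetical stand-ins for the two data frames in the question.
comments <- data.frame(
  ImproveOne = c("more staff", "better food", "shorter waits"),
  ImproveCat = c(1, 2, 3),
  stringsAsFactors = FALSE
)
mydata <- data.frame(
  Improve = c("shorter waits", "more staff", "nothing",
              "better food", "more staff"),
  stringsAsFactors = FALSE
)

# Option 1: match() gives, for each row of mydata, the row of comments
# holding the same string (NA where there is none). Row order is kept.
idx <- match(mydata$Improve, comments$ImproveOne)
mydata$ImproveCat1 <- comments$ImproveCat[idx]

# Option 2: merge() as a left join. all.x = TRUE keeps rows of mydata
# that have no match, with NA for the code. By default merge() sorts
# the result on the key column, so the original row order is not kept.
merged <- merge(mydata["Improve"], comments,
                by.x = "Improve", by.y = "ImproveOne", all.x = TRUE)
```

In this toy case "more staff" appears twice in mydata and both rows get the same code, which may be the kind of repetition behind the 1567 versus 1512 mismatch in the original attempt.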

On Thu, Jun 30, 2011 at 9:35 AM, Chris Beeley <chris.beeley at gmail.com> wrote:
> Hello-
>
> Sorry, this is a bit of a noob question, but I can't seem to progress
> it any further.
>
> I have two dataframes which contain a series of strings which exactly
> match. The problem is one has more rows than the other (more cases
> have been added) and they have been sorted so that they are not in the
> same order. The smaller dataframe, though, contains another column
> which has codes classifying the strings.
>
> So, for every row of the larger dataframe, I want to look up the
> string in the smaller dataframe, and then use that row number to copy
> across the code for the string into the larger dataframe. Here's my
> idea so far:
>
> # comments is the smaller dataframe with the codes, mydata is the
> # larger dataframe to which I would like to copy it.
>
> # this is the match between the strings one way
> commvec=charmatch(comments$ImproveOne, mydata$Improve)
> # this is the match the other way
> datavec=charmatch(mydata$Improve, comments$ImproveOne)
>
> mydata$ImproveCat1=NA # produce a variable to hold the copied codes
>
> # for all the non-missing row numbers identified in the larger dataframe,
> # copy the corresponding code from the smaller dataframe (which lives
> # in comments$ImproveCat)
> mydata$ImproveCat1[datavec[!is.na(datavec)]]=
>   comments$ImproveCat[commvec[!is.na(commvec)]]
>
> However, the last command doesn't work because the variables are not
> the same length. They nearly are, though; I'm not sure whether that's
> a coincidence or shows I'm close:
>
> length(mydata$ImproveCat1[datavec[!is.na(datavec)]]) # yields 1567
>
> length(comments$ImproveCat[commvec[!is.na(commvec)]]) # yields 1512
>
> I'm sorry, I did try to construct an example dataframe, but ironically
> I can't make that work either! Sorry!
>
> Any help gratefully received.
>
> Many thanks!
>
> Chris Beeley
> Institute of Mental Health, UK
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


