[R] using match-type function to return correctly ordered data from a dataframe

William Dunlap wdunlap at tibco.com
Sat Oct 27 18:26:00 CEST 2012


Is the following what you want?

  > dfLETTERS <- data.frame(LETTER=LETTERS[1:5], lData=c("Ay","Bee","Cee","Dee","Eee"), row.names=sprintf("LRow%d",1:5))
  > z <- c("D", "B", "A", "B")
  > dfLETTERS[match(z, dfLETTERS$LETTER), ]
          LETTER lData
  LRow4        D   Dee
  LRow2        B   Bee
  LRow1        A    Ay
  LRow2.1      B   Bee
  > # or when z includes things not in the list to match:
  > dfLETTERS[match(c("E",NA,"notALetter","A"), dfLETTERS$LETTER), ]
        LETTER lData
  LRow5      E   Eee
  NA      <NA>  <NA>
  NA.1    <NA>  <NA>
  LRow1      A    Ay

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
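A small extension of Bill's approach, sketched here on the assumption that unmatched values should be dropped instead of coming back as NA rows. The idea is to keep only the positions from match() that are not NA. The data frame rebuilds dfLETTERS from the example above, with stringsAsFactors = FALSE added so LETTER is character in any R version. The names idx and kept are illustrative only.

```r
# Rebuild Bill's dfLETTERS example (stringsAsFactors = FALSE is an addition)
dfLETTERS <- data.frame(LETTER = LETTERS[1:5],
                        lData = c("Ay", "Bee", "Cee", "Dee", "Eee"),
                        row.names = sprintf("LRow%d", 1:5),
                        stringsAsFactors = FALSE)

# Query that contains values absent from dfLETTERS$LETTER
z <- c("E", NA, "notALetter", "A")

# match() gives, for each element of z, its row position in dfLETTERS
# (NA when there is no match), so the positions follow the order of z
idx <- match(z, dfLETTERS$LETTER)

# Keep only the rows that matched, still in the order of z
kept <- dfLETTERS[idx[!is.na(idx)], ]
print(kept)
```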


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Markus Weisner
> Sent: Saturday, October 27, 2012 9:01 AM
> To: Jeff Newmiller
> Cc: r-help at r-project.org
> Subject: Re: [R] using match-type function to return correctly ordered data from a
> dataframe
> 
> Hi Jeff. I believe my Function #1 actually does use "%in%" to select the
> data.  I use "%in%" all the time but, as far as I can tell, it can only
> return a vector of logical values.  As a result, it does keep the order of
> the dataframe from which you are selecting data.  It does not, however,
> appear to let you return the data in the order of the values you
> specified.
> 
> To try and clarify my order assertion, take for example a dataframe that
> has a column "LETTER" with a record for each alphabetical letter.  The
> dataframe is ordered so that "A" is record 1 and "Z" is record 26.  Say
> that I want to pull records from this dataframe based on a list of letters
> and I want it to return those records in the order of the letters I passed
> it.  I could use something like the following code to pull records ...
> 
> myDataFrame[myDataFrame$LETTER %in% myPassedListOfLetters, ]
> 
> If I pass it the list, myPassedListOfLetters <- c("C", "B", "A"), I will
> receive the data back in the order "A", "B", "C".  What I am trying to
> figure out is how to get the data back in the order of the list that I
> specified I want the data in ("C", "B", "A").
> 
> Hope that clarifies what I am trying to figure out a bit.  Thanks for your
> help!
> Best,
> Markus
> 
> 
> 
> 
> On Fri, Oct 26, 2012 at 11:00 PM, Jeff Newmiller
> <jdnewmil at dcn.davis.ca.us> wrote:
> 
> > Have you actually read
> >
> > ?"%in%"
> >
> > ?
> >
> > Although a valuable tool, not all answers are most effectively obtained by
> > Googling.
> >
> > Also, your repeated assertions that the answers are not maintained in
> > order are poorly framed. They DO stay in order according to the zipcode
> > database order. That said, your desire for numeric indexes is only as far
> > away as your help file.
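The help page Jeff points to documents %in% as a wrapper around match(), essentially match(x, table, nomatch = 0) > 0, so calling match() directly gives the numeric indexes the original post asks for. A short check, using a hypothetical lookup vector table_zips in place of zipcode$zip:

```r
myzips <- c("74432", "72537", "06026", "01085", "65793")
# Hypothetical lookup vector standing in for zipcode$zip
table_zips <- c("01085", "06026", "65793", "72537", "74432")

# %in% answers "is it there?" with TRUE/FALSE ...
found <- myzips %in% table_zips
# ... and gives the same result as the match()-based definition
same <- identical(found, match(myzips, table_zips, nomatch = 0L) > 0L)

# match() itself answers "where is it?" with positions in table_zips,
# returned in the order of myzips
pos <- match(myzips, table_zips)
print(pos)   # [1] 5 4 2 1 3
```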
> > ---------------------------------------------------------------------------
> > Jeff Newmiller                        DCN:<jdnewmil at dcn.davis.ca.us>
> > Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> > ---------------------------------------------------------------------------
> > Sent from my phone. Please excuse my brevity.
> >
> > Markus Weisner <r at themarkus.com> wrote:
> >
> > >I am regularly running into a problem where I can't seem to figure out
> > >how to maintain the correct data order when selecting data out of a
> > >dataframe.  The code below shows an example of trying to pull data from
> > >a dataframe using ordered zip codes.  My problem is returning the pulled
> > >data in the correct order.  This is a very simple example, but it
> > >illustrates a problem that I run into regularly.
> > >
> > >In the past, I have used fairly complicated solutions to pull this off.
> > >There has got to be a simpler and more straightforward method ...
> > >probably some function that I missed in all my googling.
> > >
> > >Thanks in advance for anybody's help figuring this out.
> > >~Markus
> > >
> > >
> > >### Function Definitions ###
> > >
> > ># FUNCTION #1 (returns wrong order)
> > >getLatitude1 = function(myzips) {
> > >
> > >  # load libraries and data
> > >  library(zipcode)
> > >  data(zipcode)
> > >
> > >  # get latitude values
> > >  # problem: this code does not maintain the order of myzips
> > >  mylats = zipcode[zipcode$zip %in% myzips, "latitude"]
> > >
> > >  # return data
> > >  return(mylats)
> > >}
> > >
> > ># FUNCTION #2 (also returns wrong order)
> > >getLatitude2 = function(myzips) {
> > >
> > >  # load libraries and data
> > >  library(zipcode)
> > >  data(zipcode)
> > >
> > >  # convert myzips to DF
> > >  myzips = as.data.frame(as.character(myzips))
> > >
> > >  # merge in zipcode data based on zip
> > >  results = merge(myzips, zipcode[,c("zip", "latitude")],
> > >                  by.x = "as.character(myzips)", by.y = "zip", all.x = TRUE)
> > >
> > >  # return data
> > >  return(results$latitude)
> > >}
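Function #2 loses the order because merge() sorts its result on the key columns by default (sort = TRUE), and even with sort = FALSE the row order of the result is documented as unspecified. One common workaround, sketched below, is to carry each query value's original position through the merge and reorder on it afterwards. The lookup table is a hypothetical stand-in for zipcode[, c("zip", "latitude")] with placeholder latitudes (not real values), and query/ord are illustrative names.

```r
# Query zips as in the original post
myzips <- c("74432", "72537", "06026", "01085", "65793")
# Hypothetical lookup table; latitudes are placeholders, not real data
lookup <- data.frame(zip = c("01085", "06026", "65793", "72537", "74432"),
                     latitude = c(10, 20, 30, 40, 50),
                     stringsAsFactors = FALSE)

# Tag each query value with its original position before merging
query <- data.frame(zip = myzips, ord = seq_along(myzips),
                    stringsAsFactors = FALSE)

# merge() returns rows sorted on 'zip'; restore the query order via 'ord'
merged <- merge(query, lookup, by = "zip", all.x = TRUE)
merged <- merged[order(merged$ord), ]
print(merged$latitude)  # [1] 50 40 20 10 30
```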
> > >
> > >
> > >### Code ###
> > >
> > ># specify a set of zip codes
> > >myzips = c("74432", "72537", "06026", "01085", "65793")
> > >
> > ># create a DF
> > >myzips.df = data.frame(zip=myzips, latitude=NA, longitude=NA)
> > >
> > ># look at data to determine what should be returned and in what order
> > >library(zipcode)
> > >data(zipcode)
> > >zipcode[zipcode$zip %in% myzips,]
> > >
> > ># test function #1 (function definition above)
> > >myzips.df$latitude = getLatitude1(myzips.df$zip) #returns wrong order
> > >
> > ># test function #2 (function definition above)
> > >myzips.df$latitude = getLatitude2(myzips.df$zip) #also returns wrong order
> > >
> > >
> > >
> > ># need "myzips %in% zipcode$zip" to return array/df indices rather
> > ># than logical values
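Putting the replies together, a sketch of a match()-based version of the lookup. getLatitude3 is a hypothetical name, and the small zipcode data frame is a stand-in with placeholder latitudes so the example runs without the zipcode package; with the real data set the same one-line body should apply, assuming its zip and latitude columns. The extra zip "00000" is also hypothetical, added to show what happens with a value that has no match.

```r
# Hypothetical stand-in for the zipcode data set (placeholder latitudes)
zipcode <- data.frame(zip = c("01085", "06026", "65793", "72537", "74432"),
                      latitude = c(10, 20, 30, 40, 50),
                      stringsAsFactors = FALSE)

# Look up latitudes in the order of 'myzips'; match() supplies the row
# index for each zip, or NA when the zip is not in the lookup table
getLatitude3 <- function(myzips, lookup = zipcode) {
  lookup$latitude[match(myzips, lookup$zip)]
}

myzips.df <- data.frame(zip = c("74432", "72537", "06026", "01085",
                                "65793", "00000"),
                        stringsAsFactors = FALSE)
myzips.df$latitude <- getLatitude3(myzips.df$zip)
print(myzips.df)
```

Because match() returns NA for a zip absent from the table, the result always has one element per input and lines up with the rows of myzips.df. match() also converts factors to character, so this should work when myzips.df$zip is a factor, as data.frame() produced by default before R 4.0.0.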
> > >
> > >
> > >______________________________________________
> > >R-help at r-project.org mailing list
> > >https://stat.ethz.ch/mailman/listinfo/r-help
> > >PLEASE do read the posting guide
> > >http://www.R-project.org/posting-guide.html
> > >and provide commented, minimal, self-contained, reproducible code.
> >
> >
> 



