[R] using match-type function to return correctly ordered data from a dataframe

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Sat Oct 27 08:00:24 CEST 2012


Have you actually read

?"%in%"

?

Although Google is a valuable tool, not all answers are most effectively obtained by Googling.

Also, your repeated assertions that the answers are not maintained in order are poorly framed. They DO stay in order: the order of the rows in the zipcode database. That said, the numeric indexes you want are only as far away as that help file, which also documents match().
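
A minimal sketch of the difference, assuming a toy lookup table in place of the zipcode data (zipdb and its latitude values are placeholders made up for illustration, not real coordinates):

```r
# Toy stand-in for the zipcode data frame; the latitude values are
# placeholders for illustration, not real coordinates.
zipdb <- data.frame(
  zip = c("01085", "06026", "65793", "72537", "74432"),
  latitude = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

myzips <- c("74432", "72537", "06026", "01085", "65793")

# %in% yields a logical vector over the rows of zipdb, so the result
# follows the row order of zipdb, not the order of myzips.
by_in <- zipdb[zipdb$zip %in% myzips, "latitude"]

# match() yields, for each element of myzips, the row index of its first
# match in zipdb$zip (NA if there is none), so the result follows myzips.
idx <- match(myzips, zipdb$zip)
by_match <- zipdb[idx, "latitude"]
```

Here by_in comes back in table order, while by_match comes back in the order of myzips. A zip code that is absent from the table gives NA in idx and therefore NA in by_match, which keeps the result the same length as myzips.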
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

Markus Weisner <r at themarkus.com> wrote:

>I am regularly running into a problem where I can't seem to figure out how
>to maintain correct data order when selecting data out of a dataframe. The
>code below shows an example of trying to pull data from a dataframe using
>ordered zip codes.  My problem is returning the pulled data in the correct
>order.  This is a very simple example, but it illustrates a regular problem
>that I am running into.
>
>In the past, I have used fairly complicated solutions to pull this off.
>There has got to be a simpler and more straightforward method ... probably
>some function that I missed in all my googling.
>
>Thanks in advance for anybody's help figuring this out.
>~Markus
>
>
>### Function Definitions ###
>
># FUNCTION #1 (returns wrong order)
>getLatitude1 = function(myzips) {
>
>  # load libraries and data
>  library(zipcode)
>  data(zipcode)
>
>  # get latitude values
>  # problem is that this code does not maintain order
>  mylats = zipcode[zipcode$zip %in% myzips, "latitude"]
>
>  # return data
>  return(mylats)
>}
>
># FUNCTION #2 (also returns wrong order)
>getLatitude2 = function(myzips) {
>
>  # load libraries and data
>  library(zipcode)
>  data(zipcode)
>
>  # convert myzips to DF
>  myzips = as.data.frame(as.character(myzips))
>
>  # merge in zipcode data based on zip
>  results = merge(myzips, zipcode[,c("zip", "latitude")],
>                  by.x = "as.character(myzips)", by.y = "zip", all.x = TRUE)
>
>  # return data
>  return(results$latitude)
>}
>
>
>### Code ###
>
># specify a set of zip codes
>myzips = c("74432", "72537", "06026", "01085", "65793")
>
># create a DF
>myzips.df = data.frame(zip=myzips, latitude=NA, longitude=NA)
>
># look at data to determine what should be returned and in what order
>library(zipcode)
>data(zipcode)
>zipcode[zipcode$zip %in% myzips,]
>
># test function #1 (function definition above)
>myzips.df$latitude = getLatitude1(myzips.df$zip) # returns wrong order
>
># test function #2 (function definition above)
>myzips.df$latitude = getLatitude2(myzips.df$zip) # also returns wrong order
>
>
>
># need "myzips %in% zipcode$zip" to return array/df indices rather than logical



