[R] dplyr's arrange function

David Winsemius dwinsemius at comcast.net
Thu Jun 16 01:16:11 CEST 2016


> On Jun 15, 2016, at 2:08 PM, Muhuri, Pradip (AHRQ/CFACT) <Pradip.Muhuri at ahrq.hhs.gov> wrote:
> 
> Hello,
> 
> I am using dplyr's arrange() function to sort one of many data frames on a character variable (named "prevalence").
> 
> Issue: I am not getting the desired output (row 7 is the problem; it should be the very last row in the sorted data frame) because the sorted field is character, not numeric.
> 
> The reproducible example and the output are appended below.
> 
> Is there any work-around to convert or treat this character variable (named "prevalence" in the data frame below) as numeric before calling arrange() from the dplyr package?
> 
> Any hints will be appreciated.
> 
> Thanks,
> 
> Pradip Muhuri
> 
> # Reproducible Example 
> 
> library("readr")
> library("dplyr")   # needed for arrange() and desc()
> testdata <- read_csv(
> "indicator,  prevalence
> 1. Health check-up, 77.2 (1.19)
> 2. Blood cholesterol checked,  84.5 (1.14)
> 3. Recieved flu vaccine, 50.0 (1.33)
> 4. Blood pressure checked, 88.7 (0.88)
> 5. Aspirin use-problems, 11.7 (1.02)
> 6.Colonoscopy, 60.2 (1.41)
> 7. Sigmoidoscopy,  6.1 (0.61)
> 8. Blood stool test, 14.6 (1.00)
> 9.Mammogram,  72.6 (1.82)
> 10. Pap Smear test, 73.3 (2.37)")
> 
> # Sort on the character variable in descending order
> arrange(testdata, desc(prevalence))
> 
> # Results from Console
> 
>                      indicator  prevalence
>                          (chr)       (chr)
> 1     4. Blood pressure checked 88.7 (0.88)
> 2  2. Blood cholesterol checked 84.5 (1.14)
> 3            1. Health check-up 77.2 (1.19)
> 4            10. Pap Smear test 73.3 (2.37)
> 5                   9.Mammogram 72.6 (1.82)
> 6                 6.Colonoscopy 60.2 (1.41)
> 7              7. Sigmoidoscopy  6.1 (0.61)
> 8       3. Recieved flu vaccine 50.0 (1.33)
> 9           8. Blood stool test 14.6 (1.00)
> 10      5. Aspirin use-problems 11.7 (1.02)
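
Row 7 lands where it does because a character column is compared as text, one character at a time, so "6.1 (0.61)" outranks "50.0 (1.33)" in a descending sort simply because "6" > "5". A small base-R illustration using three of the values above (the regex that keeps only the text before the first space is my own assumption about the "estimate (SE)" layout):

```r
# Text comparison: the first characters "6", "5", "1" decide the order.
x <- c("50.0 (1.33)", "6.1 (0.61)", "14.6 (1.00)")
text_order <- sort(x, decreasing = TRUE)
print(text_order)

# Numeric comparison: strip everything from the first space, then convert.
num_order <- sort(as.numeric(sub(" .*$", "", x)), decreasing = TRUE)
print(num_order)
```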

Even though the prevalence column is not really mixed numeric/alpha, it can still be sorted quite easily with the very handy gtools::mixedorder function:

> require(gtools)
Loading required package: gtools
> testdata[ mixedorder(testdata$prevalence), ]
                      indicator  prevalence
7              7. Sigmoidoscopy  6.1 (0.61)
5       5. Aspirin use-problems 11.7 (1.02)
8           8. Blood stool test 14.6 (1.00)
3       3. Recieved flu vaccine 50.0 (1.33)
6                 6.Colonoscopy 60.2 (1.41)
9                   9.Mammogram 72.6 (1.82)
10           10. Pap Smear test 73.3 (2.37)
1            1. Health check-up 77.2 (1.19)
2  2. Blood cholesterol checked 84.5 (1.14)
4     4. Blood pressure checked 88.7 (0.88)

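The mixedorder function breaks each string into numeric and non-numeric pieces and compares the numeric pieces by value rather than character by character. Note that the result above is in ascending order; reverse it if you want the largest prevalence first.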
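
If you would rather do what was literally asked, i.e. treat the column as numeric before sorting, a minimal sketch is to pull out the leading estimate and order on that. It assumes every value has the form "<estimate> (<standard error>)"; prevalence_estimate() is a hypothetical helper, not part of dplyr or gtools, and the data frame is rebuilt here in base R so the snippet runs on its own:

```r
# Hypothetical helper: extract the leading estimate from strings such as
# "77.2 (1.19)" and return it as a numeric vector.
prevalence_estimate <- function(x) {
  # Keep only the text before the first space, then convert.
  as.numeric(sub(" .*$", "", trimws(x)))
}

testdata <- data.frame(
  indicator = c("1. Health check-up", "2. Blood cholesterol checked",
                "3. Recieved flu vaccine", "4. Blood pressure checked",
                "5. Aspirin use-problems", "6.Colonoscopy",
                "7. Sigmoidoscopy", "8. Blood stool test",
                "9.Mammogram", "10. Pap Smear test"),
  prevalence = c("77.2 (1.19)", "84.5 (1.14)", "50.0 (1.33)",
                 "88.7 (0.88)", "11.7 (1.02)", "60.2 (1.41)",
                 "6.1 (0.61)", "14.6 (1.00)", "72.6 (1.82)",
                 "73.3 (2.37)"),
  stringsAsFactors = FALSE
)

# Order rows by the numeric estimate, largest first.
sorted <- testdata[order(prevalence_estimate(testdata$prevalence),
                         decreasing = TRUE), ]
print(sorted)
```

With dplyr loaded, the same helper can go straight inside arrange(), e.g. arrange(testdata, desc(prevalence_estimate(prevalence))), which leaves the original character column untouched for display.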

> 
> 
> Pradip K. Muhuri,  AHRQ/CFACT
> 5600 Fishers Lane # 7N142A, Rockville, MD 20857
> Tel: 301-427-1564
> 

-- 

David Winsemius
Alameda, CA, USA


