[R] < symbols in a data frame

Marc Schwartz marc_schwartz at me.com
Wed Jul 9 19:29:42 CEST 2014


On Jul 9, 2014, at 12:19 PM, Sam Albers <tonightsthenight at gmail.com> wrote:

> Hello,
> 
> I have recently received a dataset from a metal analysis company. The
> dataset is filled with less-than symbols. What I am looking for is an
> efficient way to subset for any whole numbers from the dataset. The column
> is automatically formatted as a factor because of the "<" symbols, making it
> difficult to deal with the numbers in a useful way.
> 
> So in sum any ideas on how I could subset the example below for only whole
> numbers?
> 
> Thanks in advance!
> 
> Sam
> 
> #code
> 
> metals <-
> structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
> 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L),
> .Label = c("Antimony",
> "Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
> "Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
> "Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
> "Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek = structure(c(3L,
> 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
> 4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
> "<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
> "1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
> "22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
> "516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
> "77", "89", "951"), class = "factor")), .Names = c("Parameter",
> "Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")


Sam,

You can use ?gsub to remove the '<' characters from the column and then use ?subset to select the records you wish.

Note that gsub() returns a character vector, so you need to coerce the result to numeric with as.numeric():

> as.numeric(gsub("<", "", metals$Cedar.Creek))
 [1]  100  100  500  100   10 1000  100  516  550   10  200  500  100
[14]  500  100  951 1000 1000  100
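
The detour through gsub() (or as.character()) matters because the column is a factor: calling as.numeric() directly on a factor returns the internal level codes rather than the printed values. A small sketch of the difference, using a made-up five-value factor (x, codes and values are illustrative names, not objects from the post):

```r
# Stand-in for a few entries of the Cedar.Creek column.
x <- factor(c("<100", "<500", "516", "550", "<10"))

# as.numeric() on the factor itself gives the integer level codes,
# which depend on how the levels happen to be sorted.
codes <- as.numeric(x)

# gsub() coerces the factor to character first, so the result can be
# converted to the numbers actually shown in the data.
values <- as.numeric(gsub("<", "", x))
values
# [1] 100 500 516 550  10
```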


For example:

> subset(metals, as.numeric(gsub("<", "", Cedar.Creek)) == 100)
   Parameter Cedar.Creek
1   Antimony        <100
2    Arsenic        <100
4  Beryllium        <100
7     Cobalt        <100
13  Selenium        <100
15  Thallium        <100
19  Antimony        <100


> subset(metals, as.numeric(gsub("<", "", Cedar.Creek)) <= 500)
    Parameter Cedar.Creek
1    Antimony        <100
2     Arsenic        <100
3      Barium        <500
4   Beryllium        <100
5     Cadmium         <10
7      Cobalt        <100
10    Mercury         <10
11 Molybdenum        <200
12     Nickel        <500
13   Selenium        <100
14     Silver        <500
15   Thallium        <100
19   Antimony        <100
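
If the aim is to keep only the values that were reported without a "<" prefix, which is one way to read the original question, grepl() can be used inside subset() instead. A minimal sketch, assuming a shortened stand-in for the data (metals_demo and measured are hypothetical names, not objects from the post):

```r
# Shortened stand-in for the 'metals' data frame.
metals_demo <- data.frame(
  Parameter   = factor(c("Antimony", "Barium", "Copper", "Lead", "Tin")),
  Cedar.Creek = factor(c("<100", "<500", "516", "550", "951"))
)

# grepl() is TRUE where the value contains "<"; negating it keeps only
# the rows reported without a less-than flag.
measured <- subset(metals_demo, !grepl("<", Cedar.Creek, fixed = TRUE))

# The remaining values contain no "<", so they convert cleanly
# once the factor has been turned into character.
measured$CC.Num <- as.numeric(as.character(measured$Cedar.Creek))
measured
```

With the stand-in data this leaves the Copper, Lead and Tin rows.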


You can also just create a new column that is numeric and go from there:

metals$CC.Num <- as.numeric(gsub("<", "", metals$Cedar.Creek))

> str(metals)
'data.frame':	19 obs. of  3 variables:
 $ Parameter  : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6 7 8 9 10 11 ...
 $ Cedar.Creek: Factor w/ 45 levels "<1","<10","<100",..: 3 3 7 3 2 4 3 34 36 2 ...
 $ CC.Num     : num  100 100 500 100 10 1000 100 516 550 10 ...


> metals
    Parameter Cedar.Creek CC.Num
1    Antimony        <100    100
2     Arsenic        <100    100
3      Barium        <500    500
4   Beryllium        <100    100
5     Cadmium         <10     10
6    Chromium       <1000   1000
7      Cobalt        <100    100
8      Copper         516    516
9        Lead         550    550
10    Mercury         <10     10
11 Molybdenum        <200    200
12     Nickel        <500    500
13   Selenium        <100    100
14     Silver        <500    500
15   Thallium        <100    100
16        Tin         951    951
17   Vanadium       <1000   1000
18       Zinc       <1000   1000
19   Antimony        <100    100
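
One thing to keep in mind with the numeric column is that "<1000" and a measured 1000 become indistinguishable once the symbol is stripped. A possible way around that, sketched here under the assumption that you want to keep both pieces of information, is to store a logical flag next to the number (CC.Below and metals_demo are hypothetical names chosen for this example):

```r
# Shortened stand-in for the 'metals' data frame.
metals_demo <- data.frame(
  Parameter   = c("Antimony", "Copper", "Tin", "Zinc"),
  Cedar.Creek = c("<100", "516", "951", "<1000"),
  stringsAsFactors = TRUE
)

# Numeric value with the "<" stripped, as in the post.
metals_demo$CC.Num <- as.numeric(gsub("<", "", metals_demo$Cedar.Creek))

# TRUE where the original entry carried a "<" prefix.
metals_demo$CC.Below <- grepl("<", metals_demo$Cedar.Creek, fixed = TRUE)

# Example: measured values (no "<") of at least 500.
high_measured <- subset(metals_demo, !CC.Below & CC.Num >= 500)
high_measured
```

Here the Zinc row is excluded even though its stripped value is 1000, because the flag records that it was reported as "<1000".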



Regards,

Marc Schwartz


