[R] < symbols in a data frame

MacQueen, Don macqueen1 at llnl.gov
Thu Jul 10 01:17:56 CEST 2014


After reading the metals data frame, I would do this:

metals$result <- as.numeric(gsub('<','',metals$Cedar.Creek))
metals$flag <- ifelse(grepl('<',metals$Cedar.Creek),'<','')
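
For anyone who wants to try this without the full dataset, here is a minimal sketch on a three-row stand-in for the metals data frame. The toy values and the column names "censored" and "result" are illustrative assumptions, and the flag is stored as a logical instead of a character, which is a variation on the two lines above rather than the same code.

```r
# Toy data standing in for the metals data frame (values are illustrative).
toy <- data.frame(
  Parameter   = c("Antimony", "Copper", "Zinc"),
  Cedar.Creek = c("<100", "516", "<10"),
  stringsAsFactors = FALSE
)

# Record which readings carry the "<" symbol before stripping it.
toy$censored <- grepl("<", toy$Cedar.Creek, fixed = TRUE)

# Strip the symbol and convert to numeric. as.character() is a no-op here,
# but it keeps the line safe if the column arrives as a factor.
toy$result <- as.numeric(sub("<", "", as.character(toy$Cedar.Creek), fixed = TRUE))

toy
```

Keeping the flag in its own column means the numeric column can be used directly while the detection-limit information stays available.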

Also, assuming you got your data into R using read.table(),
read.csv(), or similar, I would include
   stringsAsFactors=FALSE

as another argument to the function call. You don't need factors at this
point, and with character columns there is no factor-to-character
conversion step before gsub().
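
A small sketch of what that looks like, reading from an inline string rather than a file so it runs on its own. The CSV contents and the name "metals_chr" are made up for illustration. In R 4.0.0 and later, stringsAsFactors already defaults to FALSE, so the explicit argument mainly matters on older versions.

```r
# Inline CSV standing in for the lab's file (contents are illustrative).
csv_text <- "Parameter,Cedar.Creek
Antimony,<100
Copper,516
Lead,<10"

# Read the values as character instead of factor.
metals_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Cedar.Creek comes in as character because of the "<" entries.
str(metals_chr)
```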

-Don
-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 7/9/14 11:02 AM, "Sam Albers" <tonightsthenight at gmail.com> wrote:

>Thanks for all the responses. It is sometimes difficult to outline
>exactly what you need. These responses were helpful to get there.
>Speaking to Bert's point a bit, I needed a column to identify where
>the < symbol was used. If I knew more about R I think I might be
>embarrassed to post my solution to that problem but here is how I used
>Sarah's solution but still kept the info about detection limits. I'm
>sure there is a more elegant way:
>
>metals <-
>structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
>8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label
>= c("Antimony",
>"Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
>"Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
>"Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
>"Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek =
>structure(c(3L,
>3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
>4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
>"<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
>"1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
>"22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
>"516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
>"77", "89", "951"), class = "factor")), .Names = c("Parameter",
>"Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")
>
>
>
>metals$temp1<-metals$Cedar.Creek
>metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
>metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
>metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)
>
>metals$temp2<-metals$temp1==metals$Cedar.Creek
>metals$Detection<-factor(ifelse(metals$temp2=="TRUE","Measured","Limit"))
>metals[,c(1,2,5)]
>
>
>Thanks again!
>
>Sam
>
>On Wed, Jul 9, 2014 at 10:41 AM, Bert Gunter <gunter.berton at gene.com>
>wrote:
>> Well, ?grep and ?regex are clearly apropos here -- dealing with
>> character data is an essential skill for handling input from diverse
>> sources with various formatting conventions. I suggest you go through
>> one of the many regular expression tutorials on the web to learn more.
>>
>> But this may not be the important issue here at all. If "<k" means the
>> value is left censored at k -- i.e. we know it's less than k but not
>> how much less -- then Sarah's proposal is not what you want to do.
>> Exactly what you do want to do depends on context, and as it concerns
>> statistical methodology, is not something that should be discussed
>> here. Consult a local statistician if this is a correct guess.
>> Otherwise ignore.
>>
>> ... and please post in plain text in future (as requested) as HTML can
>> get garbled.
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>> (650) 467-7374
>>
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>> Clifford Stoll
>>
>>
>>
>>
>> On Wed, Jul 9, 2014 at 10:26 AM, Sarah Goslee <sarah.goslee at gmail.com>
>>wrote:
>>> Hi Sam,
>>>
>>> I'd take the similar tack of removing the < instead. Note that if you
>>> import the data frame using the stringsAsFactors=FALSE argument, you
>>> don't need the first step.
>>>
>>> metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
>>> metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
>>> metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)
>>>
>>> R> str(metals)
>>> 'data.frame':    19 obs. of  2 variables:
>>>  $ Parameter  : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6
>>> 7 8 9 10 11 ...
>>>  $ Cedar.Creek: num  100 100 500 100 10 1000 100 516 550 10 ...
>>>
>>> Sarah
>>>
>>>
>>> On Wed, Jul 9, 2014 at 1:19 PM, Sam Albers
>>><tonightsthenight at gmail.com> wrote:
>>>> Hello,
>>>>
>>>> I have recently received a dataset from a metal analysis company. The
>>>> dataset is filled with less-than symbols. What I am looking for is an
>>>> efficient way to subset for any whole numbers from the dataset. The
>>>> column is automatically formatted as a factor because of the "<"
>>>> symbols, making it difficult to deal with the numbers in a useful way.
>>>>
>>>> So in sum, any ideas on how I could subset the example below for only
>>>> whole numbers?
>>>>
>>>> Thanks in advance!
>>>>
>>>> Sam
>>>>
>>>> #code
>>>>
>>>> metals <-
>>>>
>>>>
>>>> structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
>>>> 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label
>>>> = c("Antimony",
>>>> "Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
>>>> "Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
>>>> "Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
>>>> "Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek =
>>>>structure(c(3L,
>>>> 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
>>>> 4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
>>>> "<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
>>>> "1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
>>>> "22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
>>>> "516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
>>>> "77", "89", "951"), class = "factor")), .Names = c("Parameter",
>>>> "Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")
>>>>
>>>
>>> --
>>> Sarah Goslee
>>> http://www.functionaldiversity.org
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>>http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
