[R] Cannot get "==" operator to return TRUE

G See gsee000 at gmail.com
Fri Feb 3 17:09:11 CET 2012


Hi Sarah,

Thank you very much for the response.

In fact, it does work on Mac even without including the space:

> Symbol <- "GOOG"
>     require(XML)
Loading required package: XML
> URL <- paste("http://earnings.com/company.asp?client=cb&ticker=", Symbol, sep="")
> x <- readHTMLTable(URL, stringsAsFactors=FALSE)
> table.loc <- tail(grep("Earnings Releases", x), 1) + 1
> if (identical(numeric(0), table.loc)) return(NULL)
> rdata <- x[[table.loc]]
> header <- rdata[1, ]
> rdata <- rdata[-1, ]
> colnames(rdata) <- header
> #format ticker column
> rdata[, 1] <- gsub("\r\n\t\t\t", "", rdata[, 1])
> rdata <- na.omit(rdata)
>
> any(is.na(rdata))
[1] FALSE
>     rdata[rdata == "n/a"] <- NA
> any(is.na(rdata))
[1] TRUE

Garrett

On Fri, Feb 3, 2012 at 9:57 AM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> Is that exactly what you're doing, in a clean session?
>
> x <- rdata[27, 4]
>
>> x == "n/a "
> [1] TRUE
>> x == "n/a"
> [1] FALSE
>
> Because as long as the space is included, the test should be TRUE.
>
> (I renamed the dput object rdata, because df() is a base function.)
>
> df[df == "n/a"] <- NA
> shouldn't work on Mac, or any other system, because no elements of
> your data frame are "n/a", but are instead "n/a "
>
> If it were my data, I'd get rid of the spaces at the end of the values before
> trying to do anything, either before reading it into R, or with gsub() after.
>
> Sarah
>
> On Fri, Feb 3, 2012 at 10:25 AM, G See <gsee000 at gmail.com> wrote:
>> I have a data.frame named "df". The dput of df is at the bottom of this e-mail.
>> What I'd like to do is replace the "n/a " values with NA.  On Mac OSX, it works
>> to do this:
>> df[df == "n/a"] <- NA
>>
>> However, it does not work on Ubuntu.  See below.
>>
>> Thanks in advance,
>> Garrett
>>
>>> x <- df[27, 4] # complete data.frame dput is below
>>> dput(x)
>> "n/a "
>>> x == "n/a "
>> [1] FALSE
>>> x == "n/a"
>> [1] FALSE
>>> str(x)
>>  chr "n/a "
>>> is.na(x)
>> [1] FALSE
>>> grep("n/a ", x)
>> integer(0)
>>> grep("n/a", x)
>> [1] 1
>>
>>
>>> sessionInfo()
>> R version 2.14.1 (2011-12-22)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] XML_3.4-3                  qmao_1.1.10
>> [3] FinancialInstrument_0.10.9 quantmod_0.3-17
>> [5] TTR_0.21-0                 Defaults_1.1-1
>> [7] xts_0.8-3                  zoo_1.7-6
>>
>> loaded via a namespace (and not attached):
>> [1] grid_2.14.1    lattice_0.20-0 tools_2.14.1
>>>
>>
>>
>> ### More detail ###
>> ## Here is the complete data.frame
>>> dput(df)
>> structure(list(SYMBOL = c("GOOG ", "GOOG ", "GOOG ", "GOOG ",
>> "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ",
>> "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ",
>> "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ",
>> "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG "), PERIOD = c("Q4 2011",
>> "Q3 2011", "Q2 2011", "Q1 2011", "Q4 2010", "Q3 2010", "Q2 2010",
>> "Q1 2010", "Q4 2009", "Q3 2009", "Q2 2009", "Q1 2009", "Q4 2008",
>> "Q3 2008", "Q2 2008", "Q1 2008", "Q4 2007", "Q3 2007", "Q2 2007",
>> "Q1 2007", "Q4 2006", "Q3 2006", "Q2 2006", "Q1 2006", "Q4 2005",
>> "Q3 2005", "Q2 2005", "Q1 2005", "Q4 2004", "Q3 2004"),
>>    `EVENT TITLE` = c("Q4 2011 Google Earnings Release", "Q3 2011
>> Google Inc Earnings Release",
>>    "Q2 2011 Google Inc Earnings Release", "Q1 2011 Google Inc
>> Earnings Release",
>>    "Q4 2010 Google Earnings Release", "Q3 2010 Google Earnings Release",
>>    "Q2 2010 Google Earnings Release", "Q1 2010 Google Earnings Release",
>>    "Q4 2009 Google Earnings Release", "Q3 2009 Google Earnings Release",
>>    "Q2 2009 Google Earnings Release", "Q1 2009 Google Earnings Release",
>>    "Q4 2008 Google Earnings Release", "Q3 2008 Google Earnings Release",
>>    "Q2 2008 Google Earnings Release", "Q1 2008 Google Earnings Release",
>>    "Q4 2007 Google Earnings Release", "Q3 2007 Google Earnings Release",
>>    "Q2 2007 Google Earnings Release", "Q1 2007 Google Earnings Release",
>>    "Q4 2006 Google Earnings Release", "Q3 2006 Google Earnings Release",
>>    "Q2 2006 Google Earnings Release", "Q1 2006 Google Earnings Release",
>>    "Q4 2005 Google Earnings Release", "Q3 2005 Google Earnings Release",
>>    "Q2 2005 Google Earnings Release", "Q1 2005 Google Earnings Release",
>>    "Q4 2004 Google Earnings Release", "Q3 2004 Google Earnings Release"
>>    ), `EPS ESTIMATE` = c("$ 10.49 ", "$ 8.74 ", "$ 7.85 ",
>>    "$ 8.10 ", "$ 8.09 ", "$ 6.68 ", "$ 6.52 ", "$ 6.60 ",
>>    "$ 6.50 ", "$ 5.42 ", "$ 5.09 ", "$ 4.93 ", "$ 4.95 ",
>>    "$ 4.76 ", "$ 4.74 ", "$ 4.52 ", "$ 4.44 ", "$ 3.78 ",
>>    "$ 3.59 ", "$ 3.30 ", "$ 2.92 ", "$ 2.42 ", "$ 2.22 ",
>>    "$ 1.97 ", "n/a ", "n/a ", "n/a ", "n/a ", "n/a ",
>>    "n/a "), `EPS ACTUAL` = c("$ 9.50 ", "$ 9.72 ", "$ 8.74 ",
>>    "$ 8.08 ", "$ 8.75 ", "$ 7.64 ", "$ 6.45 ", "$ 6.76 ",
>>    "$ 6.79 ", "$ 5.89 ", "$ 5.36 ", "$ 5.16 ", "$ 5.10 ",
>>    "$ 4.92 ", "$ 4.63 ", "$ 4.84 ", "$ 4.43 ", "$ 3.91 ",
>>    "$ 3.56 ", "$ 3.68 ", "$ 3.18 ", "$ 2.62 ", "$ 2.49 ",
>>    "$ 2.29 ", "n/a ", "n/a ", "n/a ", "n/a ", "n/a ",
>>    "n/a "), `PREV. YEAR ACTUAL` = c("$ 8.75 ", "$ 7.64 ",
>>    "$ 6.45 ", "$ 6.76 ", "$ 6.79 ", "$ 5.89 ", "$ 5.36 ",
>>    "$ 5.16 ", "$ 5.10 ", "$ 4.92 ", "$ 4.63 ", "$ 4.84 ",
>>    "$ 4.43 ", "$ 3.91 ", "$ 3.56 ", "$ 3.68 ", "$ 3.18 ",
>>    "$ 2.62 ", "$ 2.49 ", "$ 2.29 ", "n/a ", "n/a ", "n/a ",
>>    "n/a ", "n/a ", "n/a ", "n/a ", "n/a ", "n/a ", "n/a "
>>    ), TIME = c("2012-01-19 15:15:00 CST", "2011-10-13 15:15:00 CDT",
>>    "2011-07-14 15:15:00 CDT", "2011-04-14 15:15:00 CDT", "2011-01-20
>> 15:15:00 CST",
>>    "2010-10-14 15:15:00 CDT", "2010-07-15 15:15:00 CDT", "2010-04-15
>> 15:15:00 CDT",
>>    "2010-01-21 15:15:00 CST", "2009-10-15 15:15:00 CDT", "2009-07-16
>> 15:15:00 CDT",
>>    "2009-04-16 15:15:00 CDT", "2009-01-22 15:15:00 CST", "2008-10-16
>> 15:15:00 CDT",
>>    "2008-07-17 15:15:00 CDT", "2008-04-17 15:15:00 CDT", "2008-01-31
>> 15:15:00 CST",
>>    "2007-10-18 15:15:00 CDT", "2007-07-19 15:15:00 CDT", "2007-04-19
>> 15:15:00 CDT",
>>    "2007-01-31 15:15:00 CST", "2006-10-19 15:15:00 CDT", "2006-07-20
>> 15:15:00 CDT",
>>    "2006-04-20 15:15:00 CDT", "2006-01-31 15:15:00 CST", "2005-10-20
>> 15:15:00 CDT",
>>    "2005-07-21 15:15:00 CDT", "2005-04-21 15:15:00 CDT", "2005-02-01
>> 15:15:00 CST",
>>    "2004-10-21 15:15:00 CDT")), .Names = c("SYMBOL", "PERIOD",
>> "EVENT TITLE", "EPS ESTIMATE", "EPS ACTUAL", "PREV. YEAR ACTUAL",
>> "TIME"), row.names = 2:31, na.action = structure(31L, .Names = "32",
>> class = "omit"), class = "data.frame")
>>
>
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org



More information about the R-help mailing list