[R] Need fresh eyes to see what I'm missing

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Tue Sep 14 20:45:29 CEST 2021


Rich,

You have helped us understand the problem. Suppose, at this point, that we
are now sure how missing values are marked. What you show is not the same as
the CSV sample you posted earlier, but I will assume you know that "Eqp" is
the one and only way they signaled bad data.

One choice is to fix the original data before reading it into R. Placing
exactly NA in those places, perhaps using a global substitution of some
sort, would probably do it.

But as Bert noted, R is a very powerful environment and you can use it.

One argument you can give read.csv() is na.strings, which tells it that
"Eqp" is to be treated as NA. The substitution is then made as the file is
read in, and you should see that the column is properly read in as a column
of doubles.
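
A minimal sketch of that approach follows. The column names (sampdate, tz,
fps) and the numeric readings are made up for illustration, and the sample is
comma-separated, which may not match your actual file. The strip.white
argument is included only as a precaution, since the lines you posted seem to
have a blank after "Eqp".

```r
# Hypothetical sample data; names and numeric values are assumptions.
csv_text <- "sampdate,tz,fps
2020-11-24 10:55,PST,1.25
2020-11-24 11:00,PST,Eqp
2020-11-24 11:05,PST,Eqp
2020-11-24 11:10,PST,1.31"

# Treat both the usual "NA" and the "Eqp" flag as missing values.
# strip.white = TRUE removes blanks around fields before the comparison.
vel <- read.csv(text = csv_text,
                na.strings = c("NA", "Eqp"),
                strip.white = TRUE)

str(vel)               # fps should now be numeric, not character
sum(is.na(vel$fps))    # number of readings flagged as equipment failures
```

With a real file you would pass the file name instead of text=, and sep="\t"
if the fields are tab-separated.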

Alternatively, suppose you read in this data and make sure the column
involved is read as character strings. You can then use any number of tools
in base R or dplyr to replace "Eqp" with NA, such as in a pipeline ... %>%
mutate(fps=ifelse(fps=="Eqp", NA, fps)) %>% ...

The above is one of many ways, and of course afterward you will probably
want to convert the character column to floating point. I note dplyr can do
both in the same mutate() call, as it applies the assignments in order:

	mutate(fps=ifelse(fps=="Eqp", NA, fps), fps=as.double(fps))
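
For anyone who prefers base R, the same two steps might look like the sketch
below. The data frame is invented for illustration. The trimws() call is my
own precaution: a value such as "Eqp " with a trailing blank would not
compare equal to "Eqp".

```r
# Hypothetical data frame; fps was deliberately kept as character.
vel <- data.frame(
  sampdate = c("2020-11-24 10:55", "2020-11-24 11:00", "2020-11-24 11:05"),
  fps = c("1.25", "Eqp ", "Eqp"),
  stringsAsFactors = FALSE
)

vel$fps <- trimws(vel$fps)        # drop stray blanks such as "Eqp "
vel$fps[vel$fps == "Eqp"] <- NA   # mark equipment failures as missing
vel$fps <- as.double(vel$fps)     # character to double; NA stays NA

str(vel)
```

Doing the replacement before the conversion avoids the "NAs introduced by
coercion" warning you would get from calling as.double() on "Eqp" directly.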

The point is that in many cases, the data must be carefully examined,
cleaned and set up. In some cases, it may also be useful to treat some
columns as factors, such as the hours and minutes. If you continue on your
road and reach ggplot() to make graphs, factors may be useful in various
kinds of fine tuning.
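
As a rough illustration of the factor idea, the sketch below pulls the hour
and minute out of timestamps formatted like yours. The sample values are made
up, and tz = "UTC" is used only to keep the parsing free of daylight-saving
surprises; your data is labeled PST, so pick whatever zone suits your
analysis.

```r
# Hypothetical timestamps in the same layout as the posted data.
stamps <- c("2020-11-24 10:55", "2020-11-24 11:00", "2020-11-24 11:05")

# Parse to date-times, then extract hour and minute as factors.
when     <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M", tz = "UTC")
hour_f   <- factor(format(when, "%H"))
minute_f <- factor(format(when, "%M"))

levels(hour_f)   # distinct hours present in the data
table(hour_f)    # how many readings fall in each hour
```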

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Rich Shepard
Sent: Tuesday, September 14, 2021 1:59 PM
To: r-help using r-project.org
Subject: Re: [R] Need fresh eyes to see what I'm missing

On Tue, 14 Sep 2021, Bert Gunter wrote:

> **Don't do this.*** You will make errors. Use fit-for-purpose tools.
> That's what R is for. Also, be careful **how** you "download", as that 
> already may bake in problems.

Bert,

Haven't had downloading errors saving displayed files.

The problem with the velocities data is shown here:
2020-11-24 11:00	PST	Eqp 
2020-11-24 11:05	PST	Eqp 
2020-11-24 11:10	PST	Eqp 
2020-11-24 11:15	PST	Eqp 
2020-11-24 11:20	PST	Eqp 
2020-11-24 11:25	PST	Eqp 
2020-11-24 11:30	PST	Eqp 
2020-11-24 11:35	PST	Eqp 
2020-11-24 11:40	PST	Eqp 
2020-11-24 11:45	PST	Eqp 
2020-11-24 11:50	PST	Eqp 
2021-01-08 16:26	PST	Eqp

Equipment failure during the period shown.

What's the best way to replace these lines? Just remove them or change them
to NA?

Regards,

Rich

______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


