[R] Cleaning

Ashta sewashm at gmail.com
Thu Nov 12 02:44:33 CET 2015


Hi Sarah,

I used the following to clean my data, but the program crashed several times.


test <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN" ,]



What is the difference between these two?

test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN" ,]
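
For reference, a minimal sketch of where the two forms can differ. It assumes Var1 is a character column that may contain missing values; the NA row and the vector name `valid` are illustrative additions, not part of the original data. With `==`, a missing Var1 gives NA in the logical index and R returns an all-NA row for it, whereas `%in%` always gives TRUE or FALSE. Chaining `%in%` with `|` also gives the same result as a single `%in%` against a vector of values, which scales to the 20 or so valid codes.

```r
# Toy data modeled on the frequency table in this thread, plus one NA
# (the NA is an assumption, added only to show the difference).
dat <- data.frame(Var1 = c(":", "]", "MSN", "YYZ", "\\\\", "+", "?>", NA),
                  stringsAsFactors = FALSE)

# Hypothetical vector of valid codes; extend it to all valid values.
valid <- c("YYZ", "MSN")

# `==` yields NA where Var1 is NA; indexing rows by NA returns an all-NA row.
test_eq <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", , drop = FALSE]

# `%in%` never yields NA, so rows with a missing Var1 are dropped.
test_in <- dat[dat$Var1 %in% valid, , drop = FALSE]

# The chained form from the question selects the same rows as test_in.
test_or <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN", , drop = FALSE]

nrow(test_eq)  # includes the extra NA row
nrow(test_in)  # only the MSN and YYZ rows
```

If the real data has no missing values in Var1, the `==` and `%in%` versions should select the same rows.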




On Wed, Nov 11, 2015 at 6:38 PM, Sarah Goslee <sarah.goslee at gmail.com>
wrote:

> Please keep replies on the list so others may participate in the
> conversation.
>
> If you have a character vector containing the potential values, you
> might look at %in% for one approach to subsetting your data.
>
> Var1 %in% myvalues
>
> Sarah
>
> On Wed, Nov 11, 2015 at 7:10 PM, Ashta <sewashm at gmail.com> wrote:
> > Thank you Sarah for your prompt response!
> >
> > I have the list of valid values of the variable Var1; there are around 20.
> > How can I modify this one to include all 20 valid values?
> >
> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >
> > Is there an efficient way of doing it?
> >
> > Thank you again
> >
> >
> >
> > On Wed, Nov 11, 2015 at 6:02 PM, Sarah Goslee <sarah.goslee at gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> On Wed, Nov 11, 2015 at 6:51 PM, Ashta <sewashm at gmail.com> wrote:
> >> > Hi all,
> >> >
> >> > I have a data frame with a huge number of rows and columns.
> >> >
> >> > When I looked at the data, it has several garbage values that need to be
> >> > cleaned. As a sample, I am showing you the frequency distribution
> >> > of one variable.
> >> >
> >> >     Var1 Freq
> >> > 1    :    3
> >> > 2    ]    6
> >> > 3    MSN 1040
> >> > 4    YYZ  300
> >> > 5    \\    4
> >> > 6    +     3
> >> > 7    ?>   15
> >>
> >> Please use dput() to provide your data. I made a guess at what you had
> >> in R, but could be wrong.
> >>
> >>
> >> > and continues.
> >> >
> >> > I want to keep only those rows that contain a valid value,
> >> > in this case MSN and YYZ. I tried the following:
> >> >
> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN" ,]
> >> >
> >> > but I am not getting the desired result.
> >>
> >> What are you getting? How does it differ from the desired result?
> >>
> >> >  I have
> >> >
> >> > Any help or idea?
> >>
> >> I get:
> >>
> >> > dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ", "\\\\",
> >> + "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names = c("X",
> >> + "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
> >> >
> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >> > test
> >>   X Var1 Freq
> >> 3 3  MSN 1040
> >> 4 4  YYZ  300
> >>
> >> Which seems reasonable to me.
> >>
> >>
> >> >
> >> >         [[alternative HTML version deleted]]
> >>
> >> Please don't post in HTML either: it introduces all sorts of errors to
> >> your message.
> >>
> >> Sarah
> >>
>
