[R] Cleaning up messy Excel data

jim holtman jholtman at gmail.com
Tue Feb 28 22:42:16 CET 2012


First of all, when reading in the CSV file, use 'as.is = TRUE' to
prevent the column from being converted to a factor.
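A minimal sketch of the reading step. It assumes a small made-up sample, passed through read.csv()'s text= argument, in place of the real Excel export, and the column name `value` is hypothetical. Since R 4.0.0 the default is stringsAsFactors = FALSE, so as.is = TRUE mainly matters on older versions, but it is harmless either way. If a column has already been read as a factor, go through as.character(): as.numeric() on a factor returns the internal level codes, not the values.

```r
# Hypothetical sample data standing in for the exported CSV file
csv_text <- "id,value
1,118
2,5.7
3,<2.0
4,3.7"

# as.is = TRUE keeps text columns as character instead of factor
dat <- read.csv(text = csv_text, as.is = TRUE)
class(dat$value)   # "character"

# A column that was already read as a factor:
f <- factor(c("118", "5.7", "<2.0", "3.7"))
as.numeric(f)      # level codes, NOT the original numbers
as.character(f)    # the original strings, safe to clean up further
```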

Now that the column is character, you can use the pattern-matching
functions (sub, gsub, grepl, ...; see ?regex) to search for and change
your data. E.g.,
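Before changing anything, it can help to search for the entries that are not plain numbers. A minimal sketch, assuming the example values from the question; grepl() with the anchored pattern "^<" flags values that start with "<", and coercing with as.numeric() shows which entries would otherwise become NA.

```r
x <- c("118", "5.7", "<2.0", "3.7")   # example values from the question

# grepl() flags entries that begin with "<" (e.g. below a reporting limit)
below_limit <- grepl("^<", x)
below_limit            # FALSE FALSE TRUE FALSE

# suppressWarnings() hides the "NAs introduced by coercion" warning;
# any NA here marks a value that is still not a plain number
num <- suppressWarnings(as.numeric(x))
x[is.na(num)]          # "<2.0"
```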

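as.numeric(sub("<.*", "0", yourCol))

should do it for you: sub() turns "<2.0" into "0", and as.numeric()
then converts the whole column to numbers.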
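A small self-contained version of the replace-then-convert idea, assuming `yourCol` holds the example values from the question. sub() on its own still returns character strings, so the as.numeric() step is what produces a numeric vector.

```r
yourCol <- c("118", "5.7", "<2.0", "3.7")   # stands in for the data column

# Replace "<" and everything after it with "0", then convert to numeric
cleaned <- as.numeric(sub("<.*", "0", yourCol))
cleaned   # c(118, 5.7, 0, 3.7)
```

Note that the unanchored pattern "<.*" also matches a "<" in the middle of a value (for example, "5<" would become "50"); anchoring it as "^<.*" restricts the replacement to values that start with "<".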

On Tue, Feb 28, 2012 at 4:27 PM, Noah Silverman <noahsilverman at ucla.edu> wrote:
> Unfortunately, some data I need to work with was delivered in a rather messy Excel file.  I want to import it into R and clean up some things so that I can do my analysis.  Pulling in a CSV from Excel is the easy part.
>
> My current challenge is dealing with some text mixed in the values.
> e.g.   118   5.7   <2.0   3.7
>
> Because this column in Excel contains a "<2.0" value, R reads the column as a factor.  Ideally, I want to convert it to a normal numeric vector and code the "<2.0" as 0.
>
> Can anyone suggest an easy way to do this?
>
> Thanks!
>
>
> --
> Noah Silverman
> UCLA Department of Statistics
> 8117 Math Sciences Building
> Los Angeles, CA 90095
>
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


