[R] about data problem

Martin Maechler maechler at stat.math.ethz.ch
Wed Sep 21 13:01:20 CEST 2016


>>>>> Joe Ceradini <joeceradini at gmail.com>
>>>>>     on Tue, 20 Sep 2016 17:06:17 -0600 writes:

    > read.csv("your_data.csv", stringsAsFactors=FALSE)
    > (I'm just reiterating what Jianling said...)

If you do not have very many columns, and want to become more
efficient and knowledgeable, I strongly recommend using the
'colClasses' argument to read.csv or read.table instead (the two
functions are the same apart from the defaults of their arguments!)
and setting "numeric" for the numeric columns.

This has a similar effect to the *combination* of
 1)  stringsAsFactors = FALSE
 2)  foo <- as.numeric(foo) # for respective columns
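
A minimal sketch of the two approaches, assuming a small inline CSV
that mimics the Discharge data from the original post. The names
'csv_text' and 'bad_text' and the "7.5o" typo are made up for
illustration; they are not from the poster's file.

```r
# Hypothetical stand-in for "your_data.csv"
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.58
3,3,2010,6.82"

# Approach A: declare the class of the numeric column up front.
# With a named colClasses vector, unnamed columns are still auto-detected.
df_a <- read.csv(text = csv_text, colClasses = c(Discharge = "numeric"))

# Approach B: keep strings as strings, then convert explicitly.
df_b <- read.csv(text = csv_text, stringsAsFactors = FALSE)
df_b$Discharge <- as.numeric(df_b$Discharge)

str(df_a)

# Assumed typo: the letter "o" in place of a digit in the second data row.
bad_text <- sub("7.58", "7.5o", csv_text, fixed = TRUE)

# Approach A refuses to read the file and signals an error ...
res <- tryCatch(
  read.csv(text = bad_text, colClasses = c(Discharge = "numeric")),
  error = function(e) e
)

# ... while approach B turns the bad entry into NA (with a warning),
# which can be used to locate the offending rows.
bad_df <- read.csv(text = bad_text, stringsAsFactors = FALSE)
bad_rows <- which(is.na(suppressWarnings(as.numeric(bad_df$Discharge))))
bad_rows
```

On clean data both approaches should give a numeric Discharge column.
They differ on dirty data, which is why the effect is only "similar":
colClasses fails loudly, whereas as.numeric() produces NA values.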

Martin


    > Joe

    > On Tue, Sep 20, 2016 at 4:56 PM, lily li <chocold12 at gmail.com> wrote:

    >> Is there an argument in read.csv that I can use to avoid converting numeric
    >> to factor? Thanks a lot.
    >> 
    >> 
    >> 
    >> On Tue, Sep 20, 2016 at 4:42 PM, lily li <chocold12 at gmail.com> wrote:
    >> 
    >> > Thanks. Then what should I do to solve the problem?
    >> >
    >> > On Tue, Sep 20, 2016 at 4:30 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
    >> >
    >> >> I suppose you can do what works for your data, but I wouldn't recommend
    >> >> na.rm=TRUE because it hides problems rather than clarifying them.
    >> >>
    >> >> If in fact your data includes true NA values (the letters NA or simply
    >> >> nothing between the commas are typical ways this information may be
    >> >> indicated), then read.csv will NOT change from integer to factor
    >> >> (particularly if you have specified which markers represent NA using the
    >> >> na.strings argument documented under read.table)... so you probably DO
    >> >> have unexpected garbage still in your data which could be obscuring
    >> >> valuable information that could affect your conclusions.
    >> >> --
    >> >> Sent from my phone. Please excuse my brevity.
    >> >>
    >> >> On September 20, 2016 3:11:42 PM PDT, lily li <chocold12 at gmail.com> wrote:
    >> >> >I reread the data and used 'na.rm = T' when reading it. This time it
    >> >> >has no such problem. It seems that the existence of NAs converts the
    >> >> >integer to factor. Thanks for your help.
    >> >> >
    >> >> >
    >> >> >On Tue, Sep 20, 2016 at 4:09 PM, Jianling Fan <fanjianling at gmail.com> wrote:
    >> >> >
    >> >> >> Add the "stringsAsFactors = F"  when you read the data, and then
    >> >> >> convert them to numeric.
    >> >> >>
    >> >> >> On 20 September 2016 at 16:00, lily li <chocold12 at gmail.com> wrote:
    >> >> >> > Yes, it is stored as a factor. I can't find any problem in the
    >> >> >> > original data. Rereading the data doesn't help either. I use read.csv
    >> >> >> > to read in the data; do you think it is better to use read.table?
    >> >> >> > Thanks again.
    >> >> >> >
    >> >> >> > On Tue, Sep 20, 2016 at 3:55 PM, Greg Snow <538280 at gmail.com> wrote:
    >> >> >> >
    >> >> >> >> This indicates that your Discharge column has been stored/converted as
    >> >> >> >> a factor (run str(df) to verify and check other columns).  This
    >> >> >> >> usually happens when functions like read.table are left to try to
    >> >> >> >> figure out what each column is and it finds something in that column
    >> >> >> >> that cannot be converted to a number (possibly an oh instead of a
    >> >> >> >> zero, an el instead of a one, or just a letter or punctuation mark
    >> >> >> >> accidentally in the file).  You can either find the error in your
    >> >> >> >> original data, fix it, and reread the data, or specify that the column
    >> >> >> >> should be numeric using the colClasses argument to read.table or other
    >> >> >> >> function.
    >> >> >> >>
    >> >> >> >>
    >> >> >> >>
    >> >> >> >> On Tue, Sep 20, 2016 at 3:46 PM, lily li <chocold12 at gmail.com> wrote:
    >> >> >> >> > Hi R users,
    >> >> >> >> >
    >> >> >> >> > I have a problem in reading data.
    >> >> >> >> > For example, part of my dataframe is like this:
    >> >> >> >> >
    >> >> >> >> > df
    >> >> >> >> >  month day year Discharge
    >> >> >> >> >      3   1 2010       6.4
    >> >> >> >> >      3   2 2010      7.58
    >> >> >> >> >      3   3 2010      6.82
    >> >> >> >> >      3   4 2010      8.63
    >> >> >> >> >      3   5 2010      8.16
    >> >> >> >> >      3   6 2010      7.58
    >> >> >> >> >
    >> >> >> >> > Then if I type summary(df), why does it convert the Discharge data to
    >> >> >> >> > levels? I also met the same problem when reading some other csv files.
    >> >> >> >> > How can I solve this problem? Thanks.
    >> >> >> >> >
    >> >> >> >> > Discharge
    >> >> >> >> > 7.58     :2
    >> >> >> >> > 6.4       :1
    >> >> >> >> > 6.82     :1
    >> >> >> >> > 8.63     :1
    >> >> >> >> > 8.16     :1
    >> >> >> >> >
    >> >> >> >> >         [[alternative HTML version deleted]]
    >> >> >> >> >
    >> >> >> >> > ______________________________________________
    >> >> >> >> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
    >> >> >> >> > https://stat.ethz.ch/mailman/listinfo/r-help
    >> >> >> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    >> >> >> >> > and provide commented, minimal, self-contained, reproducible code.
    >> >> >> >>
    >> >> >> >>
    >> >> >> >>
    >> >> >> >> --
    >> >> >> >> Gregory (Greg) L. Snow Ph.D.
    >> >> >> >> 538280 at gmail.com
    >> >> >> >>
    >> >> >> >
    >> >> >>
    >> >> >>
    >> >> >>
    >> >> >> --
    >> >> >> Jianling Fan
    >> >> >> 樊建凌
    >> >> >>
    >> >> >
    >> >>
    >> >>
    >> >
    >> 




    > -- 
    > Cooperative Fish and Wildlife Research Unit
    > Zoology and Physiology Dept.
    > University of Wyoming
    > JoeCeradini at gmail.com / 914.707.8506
    > wyocoopunit.org



