[R] Different behaviour of data()

Jan_Svatos at eurotel.cz
Thu Jan 3 13:46:18 CET 2002


Thanks to Prof. Ripley for the quick and useful answer.
Yes, I will either transform the data-acquiring tool so that it delivers
these columns as numbers rather than character strings,
or read them as character and then handle them with as.factor().
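
A minimal sketch of the second option, assuming the IMSI column has already been read as character. The second identifier below is made up for illustration and is not from the real data.

```r
# IMSI values held as character strings; the last one is a hypothetical value.
imsi <- c("230020100010125", "230020100010125", "230020100010126")

# Convert to a factor so that each distinct IMSI becomes one level.
imsi_f <- as.factor(imsi)

# Count the distinct subscribers and the number of records for each.
print(nlevels(imsi_f))
print(table(imsi_f))
```

Because the levels are built from the strings themselves, no digits are lost in the conversion.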

Jan


- - - Original message: - - -
From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
Sent: 1/3/02 1:20:11 PM
To: <Jan_Svatos at eurotel.cz> <r-help at stat.math.ethz.ch>
Subject: Re: [R] Different behaviour of data()

This is nothing to do with data().  data uses read.table to read .csv
files, and that *is* in its help file!

Also, these fields are neither numeric nor integers but strings, so you can't
expect the standard methods to make sense of them. What `Writing R
Extensions' recommends you should do is to read them in once, correctly,
then dump them as .rda files. *Then* data() will work as you expected.
If you use compression the files might be much smaller, too.
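
A minimal sketch of that workflow, assuming the semicolon-separated layout shown in the question. The two sample rows are written to a temporary directory so the example is self-contained; the file locations and the choice of integer for the last three columns are assumptions for illustration.

```r
# Write the two sample rows quoted in the question to a temporary file
# (a stand-in for the real alerts.csv).
csv_file <- file.path(tempdir(), "alerts.csv")
writeLines(c(
  '"IMSI";"DialedDigits";"Cnt";"Pri";"Dur"',
  '"230020100010125";"+28491628975809";3;332;2391',
  '"230020100010125";"+28491723744868";1;12;75'
), csv_file)

# Read once, stating the column types explicitly:
# identifiers as character, the remaining columns as integer.
alerts <- read.table(csv_file, header = TRUE, sep = ";",
                     colClasses = c("character", "character",
                                    "integer", "integer", "integer"))

# Dump the data frame to a compressed .rda file ...
rda_file <- file.path(tempdir(), "alerts.rda")
save(alerts, file = rda_file, compress = TRUE)

# ... and restore it later; the column types come back unchanged.
rm(alerts)
load(rda_file)
str(alerts)
```

Once the .rda file replaces the .csv in the directory that data() searches, data(alerts) should restore the object with these types instead of re-parsing the text.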

I'm not clear why type.convert is not objecting to overflowing integers,
but that will depend on the implementation of strtol on your platform. We
might manage to improve it.  But in any case I think you ought to read
these fields as character.
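
To illustrate why character is the safe choice here, a small sketch using the IMSI from the question plus one made-up neighbouring value (an assumption, not real data):

```r
# Largest value R's integer type can hold: 2^31 - 1.
int_max <- .Machine$integer.max
print(int_max)

# The IMSI from the question plus one hypothetical neighbour.
imsi <- c("230020100010125", "230020100010126")

# Both values lie far beyond the integer range ...
print(as.numeric(imsi) > int_max)

# ... but as character strings all 15 digits survive and stay distinct.
print(nchar(imsi))
print(length(unique(imsi)))
```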


On Thu, 3 Jan 2002 Jan_Svatos at eurotel.cz wrote:

> Dear List,
>
> I frequently use the
>
> data()
>
> function to load csv files (with separator ";") into an R session,
> typically
>
> data(myfile)
>
> loads myfile.csv from my working/data directory into R.
> Now, in version 1.4.0, everything works as expected, but with one
> difference:
> The values read in older versions in "num" mode are now read in "int"
> mode,
> converting the values larger than 2147483647 (2^{31}-1) into that value.
>
> This has a consequence when reading such kind of data:
>
> <example>
>
> File
> alerts.csv
> looks like:
>
> "IMSI";"DialedDigits";"Cnt";"Pri";"Dur"
> "230020100010125";"+28491628975809";3;332;2391
> "230020100010125";"+28491723744868";1;12;75
> etc...
> with first row being the colnames of resulting dataframe.
>
> <R-1.3.1>
> In 1.3.1 session:
> >data(alerts); str(alerts$IMSI)
> gives
>
> num [1:2793] 2.3e+14 2.3e+14 2.3e+14 2.3e+14 2.3e+14 ...
>
> >str(as.character(alerts$IMSI))
> gives
> chr [1:2793] "230020100010125" "230020100010125" "230020100010125" ...
>
> and
> >n<-length(unique(alerts$IMSI)); n
> gives 125, (i.e. reads the data as they are)
>
> </R-1.3.1>
>
> <R-1.4.0>
>
> while the same on 1.4.0 gives
>
> int [1:2793] 2147483647  2147483647 2147483647 ...
>
> and
> >n<-length(unique(alerts$IMSI)); n
> gives 1. (i.e. reflects the conversion of the data in int mode, which
> destroys the info about
> IMSI numbers, which are always 15 digit numbers)
>
> </R-1.4.0>
> </example>
>
> I was unable to find in http://cran.r-project.org/src/base/NEWS
> any comment on this new behaviour of data().
> What I found was:
>
> ---
> read.table() has new arguments `nrows' and `colClasses'.  If the
>            latter is NA (the default), conversion is attempted to
>            logical, integer, numeric or complex, not just to numeric
> ---
>
> Should I use read.table() with colClasses specified (instead of data())?
>
> Why not, but this involves lots of "hand-made" changes to my R-scripts,
> which is unpleasant and involves risk of some typos and so on.
>
> Is there some more "systematic" way to solve this problem?
>
> >version
>
> platform i386-pc-mingw32
> arch     x86
> os       Win32
> system   x86, Win32
> status
> major    1
> minor    4.0
> year     2001
> month    12
> day      19
> language R
>
> Thanks In Advance,
> Jan
>
> -------------------------------------------------
> designed for _monospaced_ font
> -------------------------------------------------
> /- Jan Svatos,  PhD         Sokolovska 855/225 -/
> /- Data Analyst,            Prague 9           -/
> /- Eurotel Praha            190 00             -/
> /- jan_svatos at eurotel.cz    Czechia            -/
> -------------------------------------------------
>
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


