[R] textConnection appears to be slow

james.holtman@convergys.com james.holtman at convergys.com
Fri Jun 21 11:50:52 CEST 2002


I was trying to read in a file and delete lines that did not have the
correct number of fields.  I was reading the file as one character string
per line using 'scan' with sep='\n'.  I was then using 'count.fields' with
'textConnection' on the object I had just read in.

I thought at first the system was locked up, but further testing showed
that 'textConnection' was a very slow way to feed data to 'count.fields'
compared with having 'count.fields' read the file directly.

Is this a characteristic of using 'textConnection' on large objects?
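For reference, the filtering goal described above can be reached without 'textConnection' at all, by taking the field counts from the file and using them to index the lines. The following is only a minimal sketch under stated assumptions: the temporary file and its three sample lines are hypothetical stand-ins for the real log, 'readLines' is used in place of 'scan', and quoting, comments and blank-line skipping are switched off in 'count.fields' so that its result lines up one-to-one with the lines read.

```r
# Hypothetical sample file standing in for the real log (assumption)
path <- tempfile()
writeLines(c("a b c", "d e", "f g h"), path)

expected <- 3   # number of fields a well-formed line should have

# One string per line, blank lines included
lines <- readLines(path)

# Field count per line, read straight from the file.
# quote = "" and comment.char = "" turn off quote and comment handling;
# blank.lines.skip = FALSE keeps one count per physical line.
n <- count.fields(path, quote = "", comment.char = "",
                  blank.lines.skip = FALSE)

# Guard the assumption that counts and lines are aligned
stopifnot(length(n) == length(lines))

# Keep only the lines with the expected number of fields
good <- lines[!is.na(n) & n == expected]
```

With the real data, 'path' would be the log file name and 'expected' would be 11.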

==============================================================

> unix.time(x.1 <- scan('iostat.zigzag.020620', what='', sep='\n'))
Read 117163 items
[1] 4.00 0.07 4.08   NA   NA
> str(x.1)
 chr [1:117163] "000035 atf233       0.0    0.8    0.0    5.9  0.0  0.0 9.3   0   0 " ...
#
# count.fields just reading the file directly; this appears to work fine (<4 seconds)
#
> unix.time(x.2 <- count.fields('iostat.zigzag.020620'))
[1] 3.35 0.04 3.39   NA   NA
> str(x.2)
 int [1:117163] 11 11 11 11 11 11 11 11 11 11 ...
> sum(x.2 != 11)    # determine number of 'bad' records
[1] 3
#
# processing times get longer with larger objects
#
> unix.time(x.3 <- count.fields(textConnection(x.1[1:3000])))
[1] 0.94 0.00 0.94   NA   NA
> unix.time(x.3 <- count.fields(textConnection(x.1[1:7000])))
[1] 13.61  0.02 13.64    NA    NA
> unix.time(x.3 <- count.fields(textConnection(x.1[1:10000])))
[1] 31.61  0.00 31.75    NA    NA
>
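
If the lines are already in memory, as x.1 is above, whitespace-separated fields can also be counted by splitting the strings directly, which sidesteps the connection entirely. This is a rough sketch rather than a drop-in replacement: 'count_fields_mem' is a hypothetical helper name, it assumes a recent R (it relies on 'trimws' and 'lengths'), it ignores the quote and comment handling that 'count.fields' performs, and it has not been timed against the runs shown here.

```r
# Hypothetical helper: count whitespace-separated fields in each string
count_fields_mem <- function(x) {
  # Strip leading/trailing whitespace so it does not create empty fields,
  # then split on runs of whitespace
  parts <- strsplit(trimws(x), "[[:space:]]+")
  # Number of pieces per input string (0 for an empty string)
  lengths(parts)
}

# Hypothetical sample lines (assumption, not taken from the real file)
x <- c("000035 atf233  0.0  0.8", "bad line", "  leading 1 2 3")
n <- count_fields_mem(x)

# Keep only the lines with the expected number of fields
kept <- x[n == 4]
```

For the data in this post the call would be count_fields_mem(x.1), followed by x.1[n == 11].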


platform "i386-pc-mingw32"
arch     "i386"
os       "mingw32"
system   "i386, mingw32"
status   ""
major    "1"
minor    "5.1"
year     "2002"
month    "06"
day      "17"
language "R"


