Summary: [R] read.table on Mac OS X, CARBON vs. DARWIN

David R. Bickel dbickel at mail.mcg.edu
Sat Feb 23 01:02:36 CET 2002


Adding that line didn't work for me. I get the same problem as before 
(version 1.4.0):

'temp' is a two-line text file with three tab-delimited columns.

UNDER DARWIN:
 > read.table('temp')
               V1   V2   V3
1 AFFX-BioB-5_at -214 -139
2 AFFX-BioB-M_at  -49  -11
 > read.table('temp',as.is=TRUE)
stack imbalance in internal type.convert, 26 then 25
stack imbalance in Internal, 25 then 24
stack imbalance in if, 19 then 18
stack imbalance in <-, 17 then 16
stack imbalance in {, 15 then 14
stack imbalance in for, 8 then 7
stack imbalance in {, 6 then 5
               V1   V2   V3
1 AFFX-BioB-5_at -214 -139
2 AFFX-BioB-M_at  -49  -11
Error: unprotect(): stack imbalance

UNDER CARBON:
 > read.table('temp')
               V1   V2   V3
1 AFFX-BioB-5_at -214 -139
2 AFFX-BioB-M_at  -49  -11
 > read.table('temp',as.is=TRUE)
               V1   V2   V3
1 AFFX-BioB-5_at -214 -139
2 AFFX-BioB-M_at  -49  -11
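
In case anyone wants to try this on their own build, here is a minimal sketch that recreates a file like 'temp'. The two rows are retyped from the output above, and the tempfile() path is an assumption standing in for the original file, so treat it as an approximation of the test rather than the exact input.

```r
# Approximate reconstruction of 'temp': two lines, three tab-delimited
# columns, no header. The path from tempfile() is a stand-in, not the
# original file.
tmp <- tempfile()
writeLines(c("AFFX-BioB-5_at\t-214\t-139",
             "AFFX-BioB-M_at\t-49\t-11"), tmp)

# Same call as in the transcript, with as.is = TRUE so that character
# columns are not converted to factors.
d <- read.table(tmp, as.is = TRUE)

# Inspect the column types that type.convert produced.
str(d)

unlink(tmp)
```

On a build without the problem, this should return a two-row data frame with a character first column and no stack imbalance messages.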


On Friday, February 22, 2002, at 09:00 X, Meinhard Ploner wrote:

> Thanks a lot, James!!
> The problem is fixed. On the version 1.4.0 Mac/darwin (the latest 
> available version for this system) the function read.table (which is 
> called from read.delim etc., too) has the bug you explained.
>
> Inserting the row
> 	nlines <- nlines+1
> after
> 	 lines <- c(lines, line)
> removes this bug.
> M.
>
>
> On Friday, February 22, 2002, at 02:33 PM, james.holtman at convergys.com wrote:
>
>>
>> If you cannot get the latest 1.4.1, here is a patch (adds one line to
>> read.table) that will fix it on your current system.
>>
>>> The 'read.table' function appears to be up to 10X slower in R 1.4.0 than
>>> R 1.3.1 for some of the data sets I read in.  I was comparing the source
>>> code for the 2 versions and see that it was rewritten in R 1.4.0.
>>>
>>> I think I found out what part of the problem might be.  I was comparing
>>> R 1.3.1 and R 1.4.0 code and it appears that a statement is missing in
>>> some of the code for R 1.4.  This is the section of code at the beginning
>>> of read.table.  The loop starting with 'while (nlines < 5)' will read in
>>> the entire file, because there is no increment of 'nlines' in the loop.  I
>>> traced through the code and this is what was happening.  It then does a
>>> 'pushBack' of the entire file.  In tracing through the code, this is where
>>> it appears to be taking the time.  With the change noted below, the speed
>>> was similar to R 1.3.1 and the results were the same.
>>>
>>> Here is the current code with what I think is the additional statement
>>> needed:
>>>
>>> =================part of read.table========
>>>
>>>     nlines <- 0
>>>     lines <- NULL
>>>     while (nlines < 5) {
>>>         line <- readLines(file, 1, ok = TRUE)
>>>         if (length(line) == 0)
>>>             break
>>>         if (blank.lines.skip && length(grep("^[ \\t]*$", line)))
>>>             next
>>>         if (length(comment.char) && nchar(comment.char)) {
>>>             pattern <- paste("^[ \\t]*", substring(comment.char,
>>>                 1, 1), sep = "")
>>>             if (length(grep(pattern, line)))
>>>                 next
>>>         }
>>>         lines <- c(lines, line)
>>>        #
>>>        #  additional line required
>>>        #
>>>        nlines <- nlines+1
>>>     }
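
To see the effect of the increment without patching read.table itself, here is a standalone sketch of the same look-ahead loop. It is not the actual read.table source: peek_lines() and its max_lines argument are hypothetical names for illustration, and the regular expressions are simplified with grepl().

```r
# Standalone sketch of the look-ahead loop quoted above.
# peek_lines() is a hypothetical helper; max_lines stands in for the
# hard-coded 5 in the quoted code.
peek_lines <- function(con, max_lines = 5, comment.char = "#",
                       blank.lines.skip = TRUE) {
    nlines <- 0
    lines <- NULL
    while (nlines < max_lines) {
        line <- readLines(con, 1, ok = TRUE)
        if (length(line) == 0)
            break                       # end of input
        if (blank.lines.skip && grepl("^[ \t]*$", line))
            next                        # skip blank lines
        if (nchar(comment.char) &&
            grepl(paste0("^[ \t]*", substring(comment.char, 1, 1)), line))
            next                        # skip comment lines
        lines <- c(lines, line)
        nlines <- nlines + 1            # without this, the loop runs to end of file
    }
    lines
}

# One comment line, one blank line, then 100 data lines.
con <- textConnection(c("# header comment", "", paste("row", 1:100)))
first <- peek_lines(con)
rest <- readLines(con)                  # lines the loop did not consume
close(con)

length(first)
length(rest)
```

With the increment in place the loop stops after five data lines and leaves the remaining 95 on the connection. If the increment is removed, the only exit is the end-of-file break, which matches the slowdown described above.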
>>
>>> --
>>
>>
>>
>>
>> Meinhard Ploner <meinhardploner at gmx.net> on 02/22/2002 03:17:34
>>
>> To:   james.holtman at convergys.com
>> cc:
>> Subject:  Re: [R] read.table on Mac OS X, CARBON vs. DARWIN
>>
>>
>> Yes. Thanks a lot.
>> I had the 1.4.0 because on Fink the latest version (1.4.1) is not
>> available. However, I will download it from the CRAN.
>> Meinhard
>>
>>
>> On Thursday, February 21, 2002, at 10:29 PM, james.holtman at convergys.com wrote:
>>
>>> read.table did have a bug in it in 1.4.0.  It was fixed in 1.4.1.  Is that
>>> what you are running with?
>>
>>
>>
>>
>>

http://www.mcg.edu/research/biostat/bickel.html

David R. Bickel, PhD
Assistant Professor
Medical College of Georgia
Office of Biostatistics and Bioinformatics
1120 Fifteenth St., AE-3037
Augusta, GA 30912-4900

Tel.: 706-721-4697; Fax: 706-721-6294
E-mail: dbickel at mail.mcg.edu or bickel at prueba.info