[R] the quote problem with readLines()

jim holtman jholtman at gmail.com
Wed Mar 18 15:51:41 CET 2009


The amount of data that you want to read in (136M numbers) will
require about 1GB of memory for a single copy (8 bytes per number for
floating point; truncating the printed precision does not reduce this
number of bytes).  R also makes temporary copies while it works on a
vector, so plan for several times that amount.  If you want to read it
all in, find a 64-bit version of R and probably at least 4GB of memory
for your process.  A 32-bit version might have just enough space if you
can allocate all the 4GB of memory to that process.
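
As a rough illustration of the arithmetic above, the estimate can be
reproduced in R.  This is only a sketch: the variable names are made up
for the example, and the line count is the one reported in this thread.

```r
# Rough memory estimate for one numeric (double) vector in R.
n_values <- 136047472        # number of lines reported for the file
bytes_per_double <- 8        # R stores numeric values as 8-byte doubles
est_bytes <- n_values * bytes_per_double
est_gib <- est_bytes / 2^30  # convert bytes to GiB
cat(sprintf("about %.2f GiB for one copy\n", est_gib))

# Cross-check on a small vector: object.size() reports the 8 bytes per
# element plus a small fixed header.
x <- numeric(1e6)
print(object.size(x), units = "MB")
```

Any operation that copies the vector (sorting, for example) needs that
much again, which is why the total budget is several times the size of
one copy.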

So if you want to have it all in memory, invest in a larger computer.
If you want to run on the system you have, then you will probably have
to sample your data so that you can get a portion that will fit in
memory to run your test, or see if there is a way of processing
portions of the file and then combining for a final result.
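
One possible way to process the file in portions and combine the results
is sketched below.  It is not a tested solution for your data: the
function name summarise_in_chunks and its arguments (breaks, chunk_lines,
sample_frac) are hypothetical, and it assumes one value per line.  It
reads a block of lines, converts the strings with as.numeric(), adds the
block's counts to a running histogram, and keeps a small random fraction
of the values.

```r
# Hypothetical helper: stream a one-value-per-line file in chunks and
# combine per-chunk results into overall bin counts, a mean, and a
# random subsample that is small enough to keep in memory.
summarise_in_chunks <- function(path, breaks, chunk_lines = 1000000L,
                                sample_frac = 0.01) {
  con <- file(path, open = "r")
  on.exit(close(con))
  nbins <- length(breaks) - 1
  counts <- numeric(nbins)
  n <- 0
  total <- 0
  kept <- numeric(0)
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0) break            # end of file
    # readLines() returns character strings; convert them explicitly.
    x <- suppressWarnings(as.numeric(lines))
    x <- x[!is.na(x)]                        # drop blank or malformed lines
    if (length(x) == 0) next
    # Bin index per value; values outside range(breaks) are not counted.
    bin <- findInterval(x, breaks, rightmost.closed = TRUE)
    counts <- counts + tabulate(bin, nbins = nbins)
    n <- n + length(x)
    total <- total + sum(x)
    # Keep each value independently with probability sample_frac.
    kept <- c(kept, x[runif(length(x)) < sample_frac])
  }
  list(breaks = breaks, counts = counts, n = n,
       mean = total / n, sample = kept)
}

# Small self-check on synthetic data in a temporary file.
tmp <- tempfile()
set.seed(1)
writeLines(as.character(rnorm(5000)), tmp)
res <- summarise_in_chunks(tmp, breaks = seq(-6, 6, by = 0.5),
                           chunk_lines = 1000L, sample_frac = 0.1)
res$n
```

The combined counts can be plotted with barplot(res$counts), and
res$sample can be passed to ks.test() or density() in place of the full
vector.  The breaks have to be chosen to cover the range of the data,
since values outside them are silently dropped in this sketch.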

On Wed, Mar 18, 2009 at 9:58 AM, Dongyan Song <yzhskdls at hotmail.com> wrote:
>
> Hi,
>
> Thank you for your concern!
>
> The file has 136,047,472 lines, with one value on each line, and is 1.7G in
> size. I am running Linux (OpenSuse) with 4G of memory in total. The error
> message is "Error: cannot allocate vector of size 2.0 Gb". Worse, even if
> I read all the data into R after truncating the numbers' precision, e.g.
> from 1.234567e+00 to 1.2, I cannot work with them: I cannot run ks.test,
> a histogram, or a kernel density estimator, which are what I want to do
> with these numbers. Those commands also give error messages such as
> "Error: cannot allocate vector of size 809.1 Mb". I can read half of the
> file, but I want to know the overall distribution of the numbers. The
> values in the file are not ordered, and it is not easy to pick numbers at
> random or to sort them.
>
> Is this information enough? Thank you again!
>
> Best,
> Dongyan
>
>
>
> jholtman wrote:
>>
>> readLines is doing exactly what you are asking:
>>
>> Value
>> A character vector of length the number of lines read.
>>
>> You still have to convert the character strings to numeric.  Exactly
>> how large is "quite large"?  What system are you running on?  How much
>> memory do you have?  What is the error message that you are getting?
>> Exactly what does your file look like?  Have you tried reading in
>> portions of the file?  How big will it be if you could read it in?
>> Will it take up more than 25% of real memory?  There is still some
>> information you need to provide so an assessment can be made.
>>
>> On Tue, Mar 17, 2009 at 8:50 AM, Dongyan Song <yzhskdls at hotmail.com>
>> wrote:
>>>
>>> Dear all,
>>>
>>> I read a file containing only numbers with the readLines function, as below:
>>> > f <- file("data.txt")
>>> > a <- readLines(f)
>>> but all the values in a are quoted strings ("...."), and I cannot do
>>> calculations with them since they are not numeric. How should I get rid
>>> of those quotes? Thank you for your help!
>>> I have to use readLines instead of scan, read.table or matrix because the
>>> file is quite large, and the other functions cannot allocate enough
>>> memory to read it.
>>>
>>> Best,
>>> Dongyan
>>> --
>>> View this message in context:
>>> http://www.nabble.com/the-quote-problem-with-readLines%28%29-tp22558454p22558454.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>
>
> -----
> Dongyan Song, Msc
> Medical informatics, Uppsala University, Sweden



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?