[R] the quote problem with readLines()

Dongyan Song yzhskdls at hotmail.com
Wed Mar 18 16:16:13 CET 2009


Hi Jim,

Thank you very much! I will try to sample them then.

Best,
Dongyan


jholtman wrote:
> 
> The amount of data that you want to read in (136M numbers) will
> require about 1GB of memory (8 bytes per number for floating point -
> truncation does not reduce this number of bytes).  So if you want to
> read it all in, then find a 64-bit version of R and probably at least
> 4GB of memory for your process.  A 32-bit version might have just
> enough space if you can allocate all the 4GB of memory to that
> process.
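A quick sketch of the arithmetic behind the "about 1GB" estimate, using the line count reported elsewhere in this thread. The variable names are only illustrative, and this counts the raw numeric vector alone, not any copies R makes while working with it.

```r
# Rough memory estimate for holding every value as an R double.
n_values <- 136047472        # number of lines reported for the file
bytes_per_double <- 8        # R stores numerics as 8-byte doubles

total_bytes <- n_values * bytes_per_double
total_gib <- total_bytes / 2^30
cat(sprintf("%.2f GiB for the raw vector\n", total_gib))

# Rounding changes the value, not the storage type: still a double.
rounded <- round(1.234567, 1)
cat("storage type after rounding:", typeof(rounded), "\n")
```

Functions such as sort() or density() usually need working copies on top of this, which is likely why allocations fail well before the vector alone would fill memory.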
> 
> So if you want to have it all in memory, invest in a larger computer.
> If you want to run on the system you have, then you will probably have
> to sample your data so that you can get a portion that will fit in
> memory to run your test, or see if there is a way of processing
> portions of the file and then combining for a final result.
> 
> On Wed, Mar 18, 2009 at 9:58 AM, Dongyan Song <yzhskdls at hotmail.com> wrote:
>>
>> Hi,
>>
>> Thank you for your concern!
>>
>> The file has 136,047,472 lines, with one value on each line, and is 1.7 GB
>> in size. I am running Linux (openSUSE) with 4 GB of memory in total. The
>> error message is "Error: cannot allocate vector of size 2.0 Gb". Worse,
>> even if I read all the data into R after truncating the numbers'
>> precision, e.g. from 1.234567e+00 to 1.2, I cannot manipulate these
>> numbers: I cannot run ks.test, a histogram, or a kernel density
>> estimator, which is what I want to do with them. After I enter those
>> commands, the computer also gives error messages like "Error: cannot
>> allocate vector of size 809.1 Mb". I can read half of the file, but I
>> want to know the overall distribution of the numbers; the values in the
>> file are not ordered, and it is not easy to randomly pick some of them
>> or to sort them.
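For the overall distribution, one possibility is to accumulate histogram counts chunk by chunk, so the full vector is never in memory at once. This is only a sketch: the function name chunked_counts is hypothetical, the breaks are assumed to be chosen in advance (for example after a first pass to find the minimum and maximum), and bins are left-closed here, which differs from the default of hist().

```r
# Hypothetical helper: bin counts over a one-value-per-line file,
# accumulated one chunk at a time.
chunked_counts <- function(path, breaks, chunk_size = 1e6) {
  con <- file(path, open = "r")
  on.exit(close(con))
  nbins <- length(breaks) - 1
  counts <- numeric(nbins)

  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break      # end of file
    x <- as.numeric(lines)
    # Bin index for each value; bins are [b_i, b_{i+1}), with the last
    # bin also including its right endpoint.
    idx <- findInterval(x, breaks, rightmost.closed = TRUE)
    # tabulate() ignores indices outside 1..nbins, so values outside
    # the breaks are silently dropped.
    counts <- counts + tabulate(idx, nbins = nbins)
  }
  counts
}

# Small demonstration on a temporary file.
tmp <- tempfile()
writeLines(as.character(1:10), tmp)
counts <- chunked_counts(tmp, breaks = c(0, 5, 10), chunk_size = 3)
```

The resulting counts can be drawn with barplot(), and they only need as much memory as one chunk plus the count vector.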
>>
>> Is this information enough? Thank you again!
>>
>> Best,
>> Dongyan
>>
>>
>>
>> jholtman wrote:
>>>
>>> readLines is doing exactly what you are asking:
>>>
>>> Value
>>> A character vector of length the number of lines read.
>>>
>>> You still have to convert the character strings to numeric.  Exactly
>>> how large is "quite large"?  What system are you running on?  How much
>>> memory do you have?  What is the error message that you are getting?
>>> Exactly what does your file look like?  Have you tried reading in
>>> portions of the file?  How big will it be if you could read it in?
>>> Will it take up more than 25% of real memory?  There is still some
>>> information you need to provide so an assessment can be made.
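A minimal sketch of the conversion step mentioned above, on a small temporary file with made-up values. The quotes shown when printing the result of readLines() are only how R displays character strings; as.numeric() converts them, and scan() can read numbers directly.

```r
# Write a few sample values, one per line, to a temporary file.
tmp <- tempfile()
writeLines(c("1.234567e+00", "2.5", "-3e-02"), tmp)

a <- readLines(tmp)      # character vector; prints with quotes
x <- as.numeric(a)       # numeric vector; usable in calculations

# scan() reads the same file straight into a numeric vector.
y <- scan(tmp, what = numeric(), quiet = TRUE)

total <- sum(x)
```

Note that the character vector from readLines() typically takes more memory than the numeric vector, so for a very large file the conversion is better done chunk by chunk.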
>>>
>>> On Tue, Mar 17, 2009 at 8:50 AM, Dongyan Song <yzhskdls at hotmail.com>
>>> wrote:
>>>>
>>>> Dear all,
>>>>
>>>> I read a file containing only numbers with the readLines function, as below,
>>>>> f <- file("data.txt")
>>>>> a <- readLines(f)
>>>> but all the values in a are in the format "....", and I cannot do
>>>> calculations with them since they are not numeric. I wonder how I should
>>>> skip those quotes. Thank you for your help!
>>>> I have to use the readLines function instead of scan, read.table or matrix,
>>>> because the file is quite large, and the other functions cannot
>>>> allocate enough memory to read the input file.
>>>>
>>>> Best,
>>>> Dongyan
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/the-quote-problem-with-readLines%28%29-tp22558454p22558454.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> --
>>> Jim Holtman
>>> Cincinnati, OH
>>> +1 513 646 9390
>>>
>>> What is the problem that you are trying to solve?
>>>
>>>
>>>
>>
>>
>> -----
>> Dongyan Song, Msc
>> Medical informatics, Uppsala University, Sweden
>> --
>> View this message in context:
>> http://www.nabble.com/the-quote-problem-with-readLines%28%29-tp22558454p22579163.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
> 
> What is the problem that you are trying to solve?
> 
> 
> 


-----
Dongyan Song, Msc
Medical informatics, Uppsala University, Sweden
-- 
View this message in context: http://www.nabble.com/the-quote-problem-with-readLines%28%29-tp22558454p22581029.html
Sent from the R help mailing list archive at Nabble.com.



