[R] Efficiency question: replacing all NAs with a zero

Gabor Grothendieck ggrothendieck at gmail.com
Tue Mar 30 02:45:13 CEST 2010


It's going to be pretty hard to do anything useful if you can't even do
simple operations like that without overflowing memory, but anyway try
this (untested):

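# Write DF to disk, free it, then read it back through sed, which turns
# every "NA" into 0 (including any "NA" inside column names or strings).
write.table(DF, "DF.csv", sep = ",", quote = FALSE)
rm(DF)
DF <- read.csv(pipe("sed s/NA/0/g DF.csv"))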
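
If sed is not available (for example on a stock Windows install), a
similar round trip can probably be done in R alone, since write.table
has an na argument that controls how missing values are written. The
following is only a sketch under assumptions: the small DF and the
temporary file name are stand-ins invented for illustration, and it has
not been tried on a data frame of the size discussed here.

```r
# Sketch only: a small stand-in for the large data frame DF.
DF <- data.frame(a = c(1, NA, 3), b = c(NA, 5, NA))

csv_file <- tempfile(fileext = ".csv")  # hypothetical scratch file

# na = "0" makes write.table emit "0" wherever a value is NA, so no
# text substitution is needed afterwards.
write.table(DF, csv_file, sep = ",", quote = FALSE,
            row.names = FALSE, na = "0")
rm(DF)
invisible(gc())  # run garbage collection so the freed memory can be reused

# Read the zero-filled data back in and remove the scratch file.
DF <- read.csv(csv_file)
unlink(csv_file)
```

Unlike the sed substitution, this only touches cells that were really
missing, so column names or strings that happen to contain "NA" are
left alone.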


On Mon, Mar 29, 2010 at 8:33 PM, Dimitri Liakhovitski <ld7631 at gmail.com> wrote:
> Just tried it. It's definitely faster - but I get the same error:
> " Reached total allocation of 1535Mb:"
>
> On Mon, Mar 29, 2010 at 8:27 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> See if this works for you:
>>
>> DF[is.na(DF)] <- 0
>>
>> On Mon, Mar 29, 2010 at 8:21 PM, Dimitri Liakhovitski <ld7631 at gmail.com> wrote:
>>> Dear R'ers,
>>>
>>> I have a very large data frame (over 4000 rows and 2,500 columns). My
>>> task is very simple - I have to replace all NAs with a zero. My code
>>> works fine on smaller data frames - but I have to deal with a huge one
>>> and there are many NAs in each column.
>>> R runs out of memory on me ("Reached total allocation of 1535Mb: see
>>> help(memory.size)"). Is there any other, more efficient way of doing
>>> it?
>>> Thanks a lot for any hints!
>>> Dimitri
>>>
>>>
>>> # Building an example frame:
>>> frame<-data.frame(a=rnorm(1:100),b=rnorm(1:100),c=rnorm(1:100),d=rnorm(1:100),e=rnorm(1:100),f=rnorm(1:100),g=rnorm(1:100))
>>> set.seed(1234)
>>> for(i in names(frame)){
>>>        i.for.NA<-sample(1:100,60)
>>>        frame[[i]][i.for.NA]<-NA
>>> }
>>>
>>> # Replacing all NAs in "frame" with zeros - is of course fast in this
>>> example, because this data frame is very small
>>> system.time({
>>> frame<-lapply(frame,function(x){
>>>        x[is.na(x)]<-0
>>>        return(x)
>>> })})
>>>
>>>
>>> --
>>> Dimitri Liakhovitski
>>> Ninah.com
>>> Dimitri.Liakhovitski at ninah.com
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
>
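
For reference, the in-memory replacement quoted above
(DF[is.na(DF)] <- 0) builds one logical matrix covering the whole data
frame before assigning. A column-at-a-time loop keeps only one
column-sized index alive at a time, which may lower peak memory use,
though there is no guarantee it avoids the allocation error reported in
this thread. The sketch below uses a small made-up DF and assumes that
only numeric columns should be zero-filled.

```r
# Sketch only: a small stand-in for the large data frame DF.
set.seed(1234)
DF <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
for (nm in names(DF)) DF[[nm]][sample(100, 60)] <- NA

# Replace NAs one numeric column at a time, so only one column-sized
# logical index exists at any moment.
for (j in seq_along(DF)) {
  if (is.numeric(DF[[j]])) {
    na_idx <- is.na(DF[[j]])
    DF[[j]][na_idx] <- 0
  }
}
```

DF stays a data frame throughout, because each column is modified in
place with [[<- instead of rebuilding the object with lapply.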
