[R] how to change values in a data frame to NA inside a function

ripley@stats.ox.ac.uk ripley at stats.ox.ac.uk
Tue Feb 25 08:58:03 CET 2003


You are mis-using <<-.  I don't know what you think it does, so please
look it up.  Using <<- in R/S programming is normally a sign of incorrect
thinking (but not quite always).  (Also, it behaves differently in R and
in S, which can be a cause of confusion to those who know only one of the
definitions.)
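For contrast, a sketch of one of the cases where <<- is reasonable: updating state held in a closure. The names make_counter and tick are made up for illustration. <<- searches the enclosing environments for an existing binding of count and rebinds it there. Only if none is found does it assign in the global environment.

```r
# Hypothetical example: a counter whose state lives in a closure.
make_counter <- function() {
  count <- 0
  function() {
    # <<- finds `count` in the enclosing make_counter() frame and
    # rebinds it there; it would only fall back to the global
    # environment if no enclosing `count` existed.
    count <<- count + 1
    count
  }
}

tick <- make_counter()
tick()  # 1
tick()  # 2
exists("count", envir = globalenv(), inherits = FALSE)  # FALSE
```

Here the binding lives in the frame created by make_counter(), so nothing leaks into the global workspace.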

On Tue, 25 Feb 2003, Petr Pikal wrote:

> Thank you for your answers. It works OK, but my real question is
> why my function behaves differently when used on a vector and on a data
> frame (or matrix or list).

> I attached a full version below with some foo data, but basically
> the function returns the correct index when applied to any
> type (list, data frame, matrix, vector); it changes the values of the
> operand only if the operand is a vector.

Not so.  It always alters an object called `y'.  It just so happens that
your vector argument was called `y' and the other cases you tried were
not.
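To make this concrete, here is a cut-down, hypothetical function (set_na_bad is not from the thread) that flags non-positive values and assigns with <<- the way dropout() does. For a replacement such as y[index] <<- NA, R fetches and rebinds y starting in the function's enclosing environment (here the global one), so the caller's object is never the target unless it happens to be that global y:

```r
# Hypothetical cut-down version of the pattern in the thread (not Petr's
# actual function): flag non-positive values, then "replace" with <<-.
set_na_bad <- function(y) {
  index <- y <= 0
  # For y[index] <<- NA, R fetches and rebinds `y` starting in the
  # enclosing environment (global here), not this call's argument.
  y[index] <<- NA
  invisible(index)
}

y  <- c(1, -2, 3)
df <- data.frame(v = c(-1, 5, 6))

set_na_bad(y)     # seems to work, because the caller's vector is named y
print(y)          # 1 NA 3

set_na_bad(df$v)  # df is not changed ...
print(df$v)       # -1 5 6
print(y)          # ... but the global y is modified again: NA NA 3
```

The first call only appears to work because the caller's vector is itself named y; the second leaves df alone and overwrites the global y again, using an index computed from df$v.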
 
> Why, please?

(Because that is what you asked it to do ....)

I can see a way to do what I think is your intention (to change the
object which was passed as the y argument from the parent environment), 
but it is convoluted and against the spirit of a functional language, so I 
won't describe it.
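The functional alternative is for the function to return the modified vector and let the caller assign it. A minimal sketch, with a hypothetical helper na_where() and a simple y <= 0 test standing in for the real dropout index:

```r
# Hypothetical helper (not from the thread): return a modified copy
# instead of trying to change the caller's object with <<-.
na_where <- function(y, index) {
  y[index] <- NA   # changes only the local copy of y
  y
}

df  <- data.frame(y = c(3, 0, 7, -1))
idx <- df$y <= 0                 # stand-in for dropout(df$y)
df$y <- na_where(df$y, idx)      # the caller decides where the result goes
df$y                             # 3 NA 7 NA
```

This is essentially the idx <- dropout(df$y); df$y[idx] <- NA idiom from the thread, wrapped so that the same call works for a vector, a data frame column, a matrix column (mat[,1] <- ...) or a list element (mylist$y <- ...).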

> On 21 Feb 2003 at 10:23, Spencer Graves wrote: 
> 
> > Thomas Blackwell's solution will also work if dropout(df$y) returns a 
> > logical vector of length equal to length(df$y).  This also allows more 
> > general conditions, e.g., 
> >  
> >    select1 <- df[,1] > 0 
> >    select2 <- select1 & (df[,2] > 0) 
> >  
> >    df[select2, "y"] <- NA	 
> >  
> > Spencer Graves 
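A worked instance of the compound selection above, on made-up data (the column names a and y are illustrative only). The two logical vectors are combined with &, and the result indexes the rows in the assignment:

```r
# Made-up data; columns a and y are illustrative only.
df <- data.frame(a = c(-1, 2, 3, 4), y = c(10, -5, 7, 8))

select1 <- df[, 1] > 0               # column 1 positive
select2 <- select1 & (df[, 2] > 0)   # ... and column 2 positive
df[select2, "y"] <- NA               # overwrite only the selected rows

df$y                                 # 10 -5 NA NA
```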
> >  
> > Thomas W Blackwell wrote: 
> > > Petr  - 
> > >  
> > > Does your function return "index" or return "y" after modifying y? 
> > > In the email, it looks as though it returns "index".  If so, the 
> > > following should work: 
> > >  
> > >  
> > >>df$y[ dropout(df$y) ] <- NA 
> > >  
> > >  
> > > -  tom blackwell  -  u michigan medical school  -  ann arbor  - 
> > >  
> > >  
> > >  
> > > On Fri, 21 Feb 2003, Petr Pikal wrote: 
> > >  
> > >  
> > >>Dear all 
> > >> 
> > >>I have a function in which I would like to change some values to NA 
> > >>according to some condition. 
> > >> 
> > >>dropout<-function(y, nahr=FALSE,...) { 
> > >> 
> > >><some stuff for computing an index> 
> > >> 
> > >>if (nahr) y[index]<<-NA 
> > >>invisible(index) 
> > >> 
> > >>} 
> > >> 
> > >>If y is a vector, everything works OK, but if it is part of a data frame, 
> > >>calling 
> > >> 
> > >>dropout(df$y) or dropout(df[,number]) makes no change. 
> > >> 
> > >>Can you please tell me what is wrong with my code? 
> > >> 
> > >>By the way 
> > >> 
> > >>idx<-dropout(df$y) 
> > >>df$y[idx]<-NA 
> > >> 
> > >>works OK 
> > >> 
> > >>Thanks a lot in advance 
> > >> 
> > >>Best regards. 
> > >> 
> > >>Petr Pikal 
> 
> 
> #foo data 
> 
> x<-seq(0,100,.1) 
> y<-sin(x)+rnorm(length(x),mean=0,sd=1) 
> y1<-y-c(rep(0,200),exp(x[20:50]),rep(0,770)) 
> y<-y1+50 
> y<-y*(y>0) 
> y[600:700]<-0 
> df<-data.frame(y) 
> mat<-as.matrix(df) 
> mylist<-as.list(df) 
> 
> #vector 
> 
> plot(x,y) 
> ddd<-dropout(y) 
> points(x[ddd],y[ddd],col=2) 
> ddd<-dropout(y,nahr=T) 
> plot(x,y) 
> rm(ddd) 
> 
> #data frame 
> 
> plot(x,df$y) 
> ddd<-dropout(df$y) 
> points(x[ddd],df$y[ddd],col=2) 
> ddd<-dropout(df$y,nahr=T) 
> plot(x,df$y) 
> rm(ddd) 
> 
> #matrix 
> 
> plot(x,mat[,1]) 
> ddd<-dropout(mat[,1]) 
> points(x[ddd],mat[ddd,1],col=2) 
> ddd<-dropout(mat[,1],nahr=T) 
> plot(x,mat[,1]) 
> rm(ddd) 
> 
> #list 
> 
> plot(x,mylist$y) 
> ddd<-dropout(mylist$y) 
> points(x[ddd],mylist$y[ddd],col=2) 
> ddd<-dropout(mylist$y,nahr=T) 
> plot(x,mylist$y) 
> 
> #this is full function 
> 
> dropout<-function(y,span=21, mez=NULL, p=0.99995, 
> nahradit=FALSE, ...) { 
> 
> ### this part just computes the logical index vector of length(y),
> ### with TRUE values where a dropout occurs
> 
> # check that span is odd (if not, make it odd and at least 3)
> if(span/2-span%/%2<.4|span<2) span<-ceiling(span+floor(1/span)+.1)
> 
> n<-length(y) 
> s<-span%/%2 
> 
> 
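> idx1<-y==0                          # exact zeros count as dropouts
> prumer<-median(y[!idx1],na.rm=T)    # "prumer" = average; here the median of non-zero values
> 
> if (is.null(mez))                   # "mez" = limit, the half-width of the accepted band
> { 
> mez<-mad(y[!idx1],na.rm=T)          # no limit given: derive one from the MAD
> dm<-prumer-mez*qnorm(p)             # lower limit
> hm<-prumer+mez*qnorm(p)             # upper limit
> } else { 
> 
> dm<-prumer-mez 
> hm<-prumer+mez 
> } 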
> 
> 
> idx2<-y<dm 
> idx3<-y>hm 
> 
> idx<-as.logical(idx1+idx2+idx3) 
> z <- embed(idx,span) 
> sumy<-rowSums(z)>0 
> index<-c(rep(sumy[1],s),sumy,rep(sumy[n-span+1],s)) 
> 
> ### index is a returned logical vector and it is OK 
> 
> if (nahradit) y[index]<<-NA 
> ### this is the ghastly line which does not work as I expected :-( 
> 
> invisible(index) 
> 
> } 
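The windowing step in the full function (embed(), rowSums(), then padding) is the least obvious part. The sketch below isolates it in a hypothetical helper, spread_flags() (not part of the thread), assuming an odd span as the span check above enforces:

```r
# Hypothetical helper (not from the thread) reproducing the windowing step
# of dropout(): position k is flagged if any of idx[(k - s):(k + s)] is TRUE,
# where s = span %/% 2; the s positions at each end reuse the edge windows.
spread_flags <- function(idx, span) {
  stopifnot(span %% 2 == 1, length(idx) >= span)
  n <- length(idx)
  s <- span %/% 2
  z <- embed(idx, span)        # row i holds idx[i:(i + span - 1)], reversed
  sumy <- rowSums(z) > 0       # TRUE if any flag falls inside window i
  # window i is centred on position i + s, so pad s values at each end
  c(rep(sumy[1], s), sumy, rep(sumy[n - span + 1], s))
}

idx <- rep(FALSE, 12)
idx[6] <- TRUE                 # a single raw flag at position 6
which(spread_flags(idx, 3))    # 5 6 7
which(spread_flags(idx, 5))    # 4 5 6 7 8
```

Because the first and last window results are simply repeated, a flag among the first (or last) span points also marks the first (or last) s points, even ones more than s positions away from it.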
> 
> 
> Thank you 
> 
> Best regards
> Petr Pikal
> petr.pikal at precheza.cz
> p.pik at volny.cz
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> http://www.stat.math.ethz.ch/mailman/listinfo/r-help
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



