[R] dataframe operation

Gabor Grothendieck ggrothendieck at gmail.com
Wed Jan 24 22:21:09 CET 2007


Here is a slight variation on Marc's idea:

isna <- is.na(DF)
DF[] <- replace(100 * col(isna), isna, NA)
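
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. Two things in it are assumptions rather than part of the snippet above: the construction of DF is reconstructed from the sample data in the original question, and the intermediate matrix `vals` is a hypothetical name introduced so that an arbitrary vector b can be used in place of the hard-coded 100 * column number. The trick is that col(isna) holds each cell's column number, so it can index b directly.

```r
# Sample data reconstructed from the original question (an assumption;
# the thread never shows how DF was created).
DF <- data.frame(column1 = c(10, NA, 12),
                 column2 = c(12, 0, NA),
                 column3 = c(0, 1, 50))
b <- c(100, 200, 300)   # one replacement value per column

isna <- is.na(DF)       # logical matrix marking the NA cells

# col(isna) holds each cell's column number, so indexing b with it
# yields the replacement value for every cell; matrix() restores the shape.
vals <- matrix(b[col(isna)], nrow = nrow(DF))

# Put the NAs back where they were and write the result into DF.
# Assigning to DF[] keeps DF a data frame with its column names.
DF[] <- replace(vals, isna, NA)
print(DF)
```

The NA cells stay NA and every other cell in column k becomes b[k]. With b equal to (100, 200, 300) this should match the two-line version above.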

On 1/24/07, Marc Schwartz <marc_schwartz at comcast.net> wrote:
> On Wed, 2007-01-24 at 14:16 -0600, Marc Schwartz wrote:
> > On Wed, 2007-01-24 at 14:10 -0600, Marc Schwartz wrote:
> > > On Wed, 2007-01-24 at 20:27 +0100, Indermaur Lukas wrote:
> > > > Hi,
> > > > I have a data frame "a" which looks like:
> > > >
> > > > column1, column2, column3
> > > > 10, 12, 0
> > > > NA, 0, 1
> > > > 12, NA, 50
> > > >
> > > > I want to replace every value in column1 to column3 that is not "NA" with the corresponding value of vector "b" (100, 200, 300).
> > > >
> > > > Any idea how I can do it?
> > > >
> > > > I appreciate any hint.
> > > > Regards,
> > > > Lukas
> > > >
> > >
> > > Here is one possibility:
> > >
> > > > sapply(seq(along = colnames(DF)),
> > >          function(x) ifelse(is.na(DF[[x]]), 100 * x, DF[[x]]))
> > >      [,1] [,2] [,3]
> > > [1,]   10   12    0
> > > [2,]  100    0    1
> > > [3,]   12  200   50
> > >
> > >
> > > Note that the returned object will be a matrix, so if you need a data
> > > frame, just coerce the result with as.data.frame().
> >
> > OK....that's what I get for pulling the trigger too fast.
> >
> > Just reverse the logic in the function:
> >
> > > sapply(seq(along = colnames(DF)),
> >          function(x) ifelse(!is.na(DF[[x]]), 100 * x, DF[[x]]))
> >      [,1] [,2] [,3]
> > [1,]  100  200  300
> > [2,]   NA  200  300
> > [3,]  100   NA  300
> >
> >
> > I misread the query initially.
>
> Here is another possibility, which may be faster depending upon the
> actual size and dims of your initial data frame.
>
> Preallocate a matrix of replacement values:
>
> Mat <- matrix(rep(seq(along = colnames(DF)) * 100, each = nrow(DF)),
>              ncol = ncol(DF))
>
> > Mat
>     [,1] [,2] [,3]
> [1,]  100  200  300
> [2,]  100  200  300
> [3,]  100  200  300
>
>
> Now do the replacement:
>
> > ifelse(!is.na(DF), Mat, NA)
>  column1 column2 column3
> 1     100     200     300
> 2      NA     200     300
> 3     100      NA     300
>
>
> In doing some testing, the above may be about 10 times faster than using
> sapply() in my first solution, again depending upon the structure of
> your DF.
>
> HTH,
>
> Marc
>


