[R] tapply

Martin Maechler maechler at stat.math.ethz.ch
Wed Jun 22 10:13:27 CEST 2005


>>>>> "AndyL" == Liaw, Andy <andy_liaw at merck.com>
>>>>>     on Tue, 21 Jun 2005 13:30:54 -0400 writes:

    AndyL> Try:
    >> (x <- factor(1:2, levels=1:5))
    AndyL> [1] 1 2
    AndyL> Levels: 1 2 3 4 5
    >> (x <- x[, drop=TRUE])
    AndyL> [1] 1 2
    AndyL> Levels: 1 2

or  
    (x <- factor(1:2, levels=1:5))
    (x2 <- factor(x))

which also drops the unused levels.
Martin
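
To connect the two idioms above back to the original tapply() symptom, here is a minimal sketch on made-up data. The names `claim` and `saved` are hypothetical and not taken from the poster's data set. It assumes only base R: tapply() returns one cell per factor level, so levels with no observations come back as NA until they are dropped.

```r
# Hypothetical toy data: a factor that still carries levels "c" and "d"
# from a larger parent data set, although only "a" and "b" occur.
claim <- factor(c("a", "a", "b"), levels = c("a", "b", "c", "d"))
saved <- c(10, 20, 5)

# tapply() reports every level, so the unused ones appear as NA.
with_unused <- tapply(saved, claim, sum)

# Either idiom from the thread drops the unused levels first.
dropped_a <- tapply(saved, claim[, drop = TRUE], sum)
dropped_b <- tapply(saved, factor(claim), sum)

print(with_unused)
print(dropped_a)
print(dropped_b)
```

Both `dropped_a` and `dropped_b` contain only the groups that actually occur in the data.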

    AndyL> Andy

    >> From: Weiwei Shi [mailto:helprhelp at gmail.com] 
    >> 
    >> Even before I tried, I already realize it must be true when I read
    >> this reply! Great job! thanks, Andy.
    >> 
    >> > str(z)
    >> `data.frame':   235 obs. of  2 variables:
    >> $ CLAIMNUM : Factor w/ 1907 levels "0","10000001849",..: 1083 1083
    >> 1083 1582 1582 1084 1681 1681 1391 1391 ...
    >> $ SIU.SAVED: int  475 3000 3000 0 0 4352 0 0 4500 3000 ...
    >> 
    >> So, I have another general question: how to avoid this when I 
    >> do the matching?
    >> In my case, claimnum does not have to be a factor.  I think I can do
    >> as.integer on it to de-factor it. But, I want to know how to do it w/
    >> keeping it as a factor? btw, what's your way to drop those levels?  :)
    >> 
    >> weiwei 
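
On the "as.integer to de-factor" idea in the message above, a short sketch on made-up claim numbers (the vector `claimnum` is hypothetical). as.integer() on a factor returns the internal level codes, not the numbers spelled out by the labels, so the usual route is to go through as.character() first. as.numeric() is used here because values of this size do not fit in R's 32-bit integer type.

```r
# Hypothetical claim numbers stored as a factor.
claimnum <- factor(c("389100307391", "381300302513", "389100307391"))

# as.integer() gives the internal level codes, not the claim numbers.
codes <- as.integer(claimnum)

# Converting via character recovers the values the labels spell out.
values <- as.numeric(as.character(claimnum))

print(codes)
print(format(values, scientific = FALSE))
```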
    >> 
    >> 
    >> On 6/21/05, Liaw, Andy <andy_liaw at merck.com> wrote:
    >> > What does str(z) say?  I suspect the second column is a factor, which,
    >> > after the subsetting, has some empty levels.  If so, just drop those
    >> > levels.
    >> > 
    >> > Andy
    >> > 
    >> > > From: Weiwei Shi
    >> > >
    >> > > hi
    >> > > i tried all the methods suggested above:
    >> > > ave and rowsum with "with" function works for my situation. I think
    >> > > the problem might not be due to tapply.
    >> > > My data z comes from
    >> > > z<-y[y[[1]] %in% x[[2]], c(1,9)]
    >> > >
    >> > > while z is supposed to have no entries for those non-matched
    >> > > between x and y.
    >> > >
    >> > > however, when I run tapply, the result also includes those
    >> > > non-matched entries. I used the is.na function to remove those
    >> > > entries from z first and then ran tapply again, but the result is
    >> > > the same: the NAs and the non-matched results are still there.
    >> > > That's what I mean by "it doesn't work".
    >> > >
    >> > > Is there something I missed here so that z "implicitly" has some
    >> > > "trace" back to y dataset?
    >> > >
    >> > > thanks,
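
The "trace back to y" described above can be reproduced on a small made-up example. The data frames `x` and `y` and their column names below are hypothetical stand-ins for the poster's data. Row-subsetting a data frame keeps every level of its factor columns, and droplevels() (available in current versions of R, added after this thread was written) removes the unused ones in one step.

```r
# Hypothetical stand-ins for the poster's y (full data) and x (keys to match).
y <- data.frame(id = factor(c("p", "q", "r", "s")), val = 1:4)
x <- data.frame(key = c("q", "s"), stringsAsFactors = FALSE)

# Keep only the rows of y whose id occurs in x.
z <- y[y$id %in% x$key, ]

# The rows are gone, but the factor still remembers all four levels.
n_before <- nlevels(z$id)

# droplevels() drops unused levels from every factor column of z.
z2 <- droplevels(z)
n_after <- nlevels(z2$id)

print(c(before = n_before, after = n_after))
```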
    >> > >
    >> > > On 6/20/05, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
    >> > > > On 6/20/05, Weiwei Shi <helprhelp at gmail.com> wrote:
    >> > > > > hi,
    >> > > > > i have another question on tapply:
    >> > > > > i have a dataset z like this:
    >> > > > > 5540 389100307391      2600
    >> > > > > 5541 389100307391      2600
    >> > > > > 5542 389100307391      2600
    >> > > > > 5543 389100307391      2600
    >> > > > > 5544 389100307391      2600
    >> > > > > 5546 381300302513        NA
    >> > > > > 5547 387000307470        NA
    >> > > > > 5548 387000307470        NA
    >> > > > > 5549 387000307470        NA
    >> > > > > 5550 387000307470        NA
    >> > > > > 5551 387000307470        NA
    >> > > > > 5552 387000307470        NA
    >> > > > >
    >> > > > > I want to sum the column 3 by column 2.
    >> > > > > I removed NA by calling:
    >> > > > > tapply(z[[3]], z[[2]], sum, na.rm=TRUE)
    >> > > > > but it does not work.
    >> > > > >
    >> > > > > then, i used
    >> > > > > z1<-z[!is.na(z[[3]]),]
    >> > > > > and repeat
    >> > > > > still doesn't work.
    >> > > > >
    >> > > > > please help.
    >> > > > >
    >> > > >
    >> > > > Depending on what you want you may be able to use rowsum:
    >> > > >
    >> > > > - display only groups that have at least one non-NA with the sum
    >> > > >   being the sum of the non-NAs:
    >> > > >
    >> > > >         with(na.omit(z), rowsum(V3, V2))
    >> > > >
    >> > > > - display all groups with the sum being NA if any member is NA:
    >> > > >
    >> > > >         rowsum(z$V3, z$V2)
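
The two rowsum() variants above can be compared on a small made-up data frame. The column names V2 and V3 follow the ones used in the reply, but the values are hypothetical. rowsum() groups by the values that actually occur in the grouping vector, so only groups present in the data get a row.

```r
# Hypothetical data: group "b" has only NA values in V3.
z <- data.frame(V2 = c("a", "a", "b", "b", "c"),
                V3 = c(1, 2, NA, NA, 3))

# All groups; a group's sum is NA if any of its members is NA.
all_groups <- rowsum(z$V3, z$V2)

# Only groups with at least one non-NA value, summing the non-NA values.
non_na <- with(na.omit(z), rowsum(V3, V2))

print(all_groups)
print(non_na)
```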
    >> > > >
    >> > >
    >> > >
    >> > > --
    >> > > Weiwei Shi, Ph.D
    >> > >
    >> > > "Did you always know?"
    >> > > "No, I did not. But I believed..."
    >> > > ---Matrix III
    >> > >
    >> > > ______________________________________________
    >> > > R-help at stat.math.ethz.ch mailing list
    >> > > https://stat.ethz.ch/mailman/listinfo/r-help
    >> > > PLEASE do read the posting guide!
    >> > > http://www.R-project.org/posting-guide.html
    >> > >
    >> > >
    >> > >
    >> > 
    >> > 
    >> > 
    >> > 
    >> 
    >> 




