[R] subsetting a data set

Petr Pikal petr.pikal at precheza.cz
Fri Sep 8 11:26:53 CEST 2006


Hi

If you use summary, aggregate will probably not work, and tapply has
to be called differently:

tapply(seq(along=Max[,1]), list(Max$Status), function(i, x) 
summary(x[i]), x=Max[,one.column])
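
To make the index trick above concrete, here is a minimal sketch on made-up data. The data frame GQ1 and its values are assumptions for illustration only, with Max taken to be a numeric column as clarified later in the thread.

```r
# Toy stand-in for the poster's data; the values are invented.
GQ1 <- data.frame(
  Status = factor(c("Expert", "Ecol", "Stake", "Ecol", "Expert", "Stake")),
  Max    = c(10, 4, 7, 6, 12, 3)
)

# tapply() splits the row indices by Status; the function then uses each
# index subset i to pick the matching values out of x.
res <- tapply(seq_along(GQ1$Max), list(GQ1$Status),
              function(i, x) summary(x[i]), x = GQ1$Max)

res[["Ecol"]]   # summary of Max for the Ecol rows only
```

Because summary returns several numbers per group, the result is a list-like array with one element per Status level rather than a plain vector.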

or you can use by

by(Max[,1:5], list(Max$Status), summary)
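
A small sketch of the by approach, with the data frame and the grouping list passed as separate arguments. The data frame GQ1, its columns and its values are invented for illustration.

```r
# Toy data; the column names Max and Min and all values are made up.
GQ1 <- data.frame(
  Status = factor(c("Expert", "Ecol", "Stake", "Ecol", "Expert", "Stake")),
  Max    = c(10, 4, 7, 6, 12, 3),
  Min    = c(2, 1, 3, 2, 4, 1)
)

# First argument: the columns to summarise; second: the grouping factor(s);
# third: the function applied to each group's sub data frame.
out <- by(GQ1[, c("Max", "Min")], list(Status = GQ1$Status), summary)

out   # prints one summary table per Status level
```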

or, if you do not like the output, something like this:

lll <- lapply(as.list(Max[,your.columns]), function(x) 
sapply(split(x,Max$Status),summary))
do.call("rbind",lll)
or
do.call("data.frame",lll)
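
As a rough illustration of what the two do.call variants return, here is the same idea on invented data. GQ1 and its columns are assumptions, not the poster's real data set.

```r
# Toy data; names and values are made up.
GQ1 <- data.frame(
  Status = factor(c("Expert", "Ecol", "Stake", "Ecol", "Expert", "Stake")),
  Max    = c(10, 4, 7, 6, 12, 3),
  Min    = c(2, 1, 3, 2, 4, 1)
)

# For each column: split its values by Status and summarise each piece.
# sapply() simplifies the per-group summaries to a matrix
# (rows = statistics, columns = Status levels).
lll <- lapply(as.list(GQ1[, c("Max", "Min")]),
              function(x) sapply(split(x, GQ1$Status), summary))

stacked <- do.call("rbind", lll)       # matrices stacked on top of each other
wide    <- do.call("data.frame", lll)  # matrices side by side as columns

dim(stacked)
names(wide)
```

The rbind form keeps one column per Status level and repeats the statistic names down the rows; the data.frame form keeps one row per statistic and prefixes each column with the variable name.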

HTH
Petr
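
For the original question quoted further down (a summary for Ecol and Expert combined, and a new data set holding only those rows), a minimal sketch follows. The toy data frame is an assumption; the point is that a single row can never be both "Ecol" and "Expert", so the conditions have to be joined with | (or), or written with %in%.

```r
# Toy data; values are invented.
GQ1 <- data.frame(
  Status = factor(c("Expert", "Ecol", "Stake", "Ecol", "Expert", "Stake")),
  Max    = c(10, 4, 7, 6, 12, 3)
)

# & requires both conditions to hold for the same row, which never happens.
none <- sum(GQ1$Status == "Ecol" & GQ1$Status == "Expert")

# | selects rows where either condition holds.
summary(GQ1$Max[GQ1$Status == "Ecol" | GQ1$Status == "Expert"])

# %in% is a compact equivalent when matching several levels.
keep <- GQ1$Status %in% c("Ecol", "Expert")

# New data frame with all variables but only the selected rows;
# droplevels() removes the now-unused "Stake" level.
sub <- droplevels(GQ1[keep, ])
```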

On 8 Sep 2006 at 10:03, Graham Smith wrote:

Date sent:      	Fri, 8 Sep 2006 10:03:51 +0100
From:           	"Graham Smith" <myotisone at gmail.com>
To:             	"Petr Pikal" <petr.pikal at precheza.cz>
Copies to:      	r-help at stat.math.ethz.ch
Subject:        	Re: [R] subsetting a data set

> Petr,
> 
> Thanks, I shall have a look at these options.
> 
> Sorry about the confusion with the "Max", in my example "Max" is the
> name of the variable that I am summarising. I chose a poor example to
> cut and paste from R, not thinking about the obvious confusion this
> would cause.
> 
> Thanks again
> 
> Graham
> 
> On 08/09/06, Petr Pikal <petr.pikal at precheza.cz> wrote:
> >
> > Hi
> >
> > I am not sure if your Max is the same as max, so I am not sure what
> > exactly you want from your data. However, you should consult ?tapply,
> > ?by, ?aggregate and maybe also ?"[" together with chapter 2 of the
> > intro manual in the docs directory.
> >
> > aggregate(data[, some.columns], list(data$factor1, data$factor2),
> > max)
> >
> > will give you the maximum of the specified columns, with the data
> > split according to both factors.
> >
> > Also, combining summary with max is unusual, and I wonder what your
> > output is in this case. I believe it is the same number six times.
> > However, R is case sensitive, and maybe Max does something different
> > from max. In my case it throws an error.
> >
> > HTH
> > Petr
> >
> > On 8 Sep 2006 at 8:06, Graham Smith wrote:
> >
> > Date sent:              Fri, 8 Sep 2006 08:06:16 +0100
> > From:                   "Graham Smith" < myotisone at gmail.com>
> > To:                     r-help at stat.math.ethz.ch
> > Subject:                [R] subsetting a data set
> >
> > > I have a data set called GQ1, which has 20 variables, one of which
> > > is a factor called Status with three levels: "Expert", "Ecol" and
> > > "Stake".
> > >
> > > I have managed to evaluate some of the data split by status using
> > > commands like:
> > >
> > > summary (Max[Status=="Ecol"])
> > >
> > > BUT how do I produce a summary for Ecol and Expert combined? The
> > > only example I can find suggests I could use
> > >
> > > summary (Max[Status=="Ecol"& Status=="Expert"]) but that doesn't
> > > work.
> > >
> > > Additionally, in the same vein, I cannot work out how to create
> > > a new data set that would contain all the data for all the
> > > variables but only for the rows where Status equals Ecol, or where
> > > Status equals Ecol and Expert.
> > >
> > > I know this is yet again a very simple problem, but I really can't
> > > find the solution in the help or the books I have.
> > >
> > > Many thanks,
> > >
> > > Graham
> > >
> > >  [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html and provide commented,
> > > minimal, self-contained, reproducible code.
> >
> > Petr Pikal
> > petr.pikal at precheza.cz
> >
> >
> 

Petr Pikal
petr.pikal at precheza.cz


