[R] how to subset unique factor combinations from a data frame.

Petr PIKAL petr.pikal at precheza.cz
Tue Jan 4 12:05:39 CET 2011


Hi

r-help-bounces at r-project.org wrote on 04.01.2011 11:19:02:

> Hi,
> 
> Sorry that my example is not clear. I will give an example of what each
> variable holds. I hope this clearly explains the case.
> 
> Names of the dataframe (df) and description
> 
> Year :- Year is calendar year, from 1980 to 2010
> 
> Country :- is the country name, total no. (levels) of countries is ~ 190
> 
> Commodity :- Crude oil, Sugar, Rubber, Coffee .... No. (levels) of
> commodities is 20
> 
> Attribute :- Production, Consumption, Stock, Import, Export... Levels ~ 20
> 
> Unit :- this is actually not a factor. It describes the unit of Attribute.
> Say the unit for Coffee (commodity) - Production (attribute) is 60 kgs.
> While the unit for Crude oil - Production is 1000 barrels
> 
> Value :-  value 
> 
> > tail(df, n = 10)   # example data
> 
> Year   Country          Commodity      Attribute        Unit        Value
> 1991   United Kingdom   Wheat, Durum   Total Supply     (1000 MT)   70
> 1991   United Kingdom   Wheat, Durum   TY Exports       (1000 MT)   0
> 1991   United Kingdom   Wheat, Durum   TY Imp. from U   (1000 MT)   0
> 1991   United Kingdom   Wheat, Durum   TY Imports       (1000 MT)   60
> 1991   United Kingdom   Wheat, Durum   Yield            (MT/HA)     5
> 
> I hope this is clear. Any suggestions?

The suggestion is still the same: use aggregate or some similar function,
maybe from the plyr package. No matter how precisely you describe your data,
if you do not show the code you used and how it failed to deliver the
desired result, you will get only vague responses.
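
For illustration, here is a minimal sketch of what a small reproducible example with aggregate could look like. The data frame below is invented (only the column names follow the post), and n_rows is a name chosen here, so treat it as one possible approach rather than the definitive answer.

```r
# Toy data: column names follow the post, all values are made up.
df <- data.frame(
  Year      = c(1990, 1991, 1990, 1991, 1991),
  Country   = c("Brazil", "Brazil", "India", "India", "Brazil"),
  Commodity = c("Coffee", "Coffee", "Sugar", "Sugar", "Coffee"),
  Attribute = c("Production", "Production", "Production",
                "Production", "Exports"),
  Unit      = c("(1000 60 KG BAGS)", "(1000 60 KG BAGS)", "(1000 MT)",
                "(1000 MT)", "(1000 60 KG BAGS)"),
  Value     = c(100, 110, 50, 55, 20),
  stringsAsFactors = TRUE
)

# Listing Unit among the grouping variables keeps it in the result.
# FUN = length simply counts the rows in each combination.
combos <- aggregate(Value ~ Commodity + Attribute + Unit,
                    data = df, FUN = length)

# Rename the aggregated column so it is clear that it holds a count.
names(combos)[names(combos) == "Value"] <- "n_rows"
print(combos)
```

Because Unit is constant within each Commodity/Attribute pair in this toy data, adding it to the grouping does not create extra rows; it only carries the column through.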

Regards
Petr


> 
> Regards,
> 
> SNVK
> 
> -----Original Message-----
> From: Petr PIKAL [mailto:petr.pikal at precheza.cz] 
> Sent: Tuesday, January 04, 2011 4:06 PM
> To: SNV Krishna
> Cc: r-help at r-project.org
> Subject: Re: [R] how to subset unique factor combinations from a data
> frame.
> 
> Hi
> 
> r-help-bounces at r-project.org wrote on 04.01.2011 05:21:25:
> 
> > Hi All
> > 
> > I have these questions and request members' expert views on them.
> > 
> > a) I have a dataframe (df) with five factors (identity variables) and a
> > value (measured value). The id variables are Year, Country, Commodity,
> > Attribute, Unit. Value is the value for each combination of these.
> > 
> > I would like to get just the unique combinations of Commodity, Attribute
> > and Unit. I just need the unique factor combinations in a dataframe or a
> > table. I know aggregate and subset but don't know how to use them in this
> > context.
> 
> aggregate(df$Value, list(df$Commodity, df$Attribute, df$Unit), length)  # or any other summary function
> 
> > 
> > b) Is it possible to include non-aggregated columns with the aggregate
> > function?
> > 
> > Say, in the above case, aggregate(Value ~ Commodity + Attribute, data = df,
> > FUN = count). The use of count(Value) is just a roundabout way to return
> > the combinations of Commodity & Attribute, and I would like to include the
> > 'Unit' column in the returned data frame.
> 
> Hm. Maybe xtabs? But without any example it is only a guess.
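
A rough sketch of the xtabs idea, on invented data with the same column names as the post. It counts rows per Commodity/Attribute cell, and it also shows that xtabs by itself does not carry an extra column such as Unit along.

```r
# Toy data: values are made up for illustration only.
df <- data.frame(
  Commodity = c("Coffee", "Coffee", "Sugar", "Sugar", "Coffee"),
  Attribute = c("Production", "Production", "Production",
                "Production", "Exports"),
  Value     = c(100, 110, 50, 55, 20)
)

# With an empty left-hand side, xtabs() counts the rows in each cell.
counts <- xtabs(~ Commodity + Attribute, data = df)
print(counts)

# Flatten the table to a data frame and keep only combinations that occur.
present <- subset(as.data.frame(counts), Freq > 0)
print(present)
```

The cross-tabulation includes every pairing of levels, so cells that never occur show up with a count of zero; the last step filters those out.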
> 
> > 
> > c) Is it possible to subset based on unique combinations, something
> > like this:
> > 
> > > subset(df, unique(Commodity), select = c(Commodity, Attribute, Unit))
> > 
> > I know this is not correct as it returns an error 'subset needs a
> > logical evaluation'. I am trying various ways to accomplish the task.
> > 
> 
> Probably the sqldf package has tools for doing it, but I do not use it, so
> you have to try it yourself.
> 
> df[df$Commodity == something, c("Commodity", "Attribute", "Unit")]
> 
> can be another way.
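
To make the indexing form concrete, here is a small sketch on invented data, with "Coffee" standing in for the placeholder something. It also shows why the subset() call from the question fails: the second argument has to be a logical condition, one TRUE/FALSE per row.

```r
# Toy data: values are made up for illustration only.
df <- data.frame(
  Commodity = c("Coffee", "Coffee", "Sugar"),
  Attribute = c("Production", "Exports", "Production"),
  Unit      = c("(1000 60 KG BAGS)", "(1000 60 KG BAGS)", "(1000 MT)"),
  Value     = c(100, 20, 50)
)

keep <- c("Commodity", "Attribute", "Unit")

# Bracket indexing: the row condition must name the data frame explicitly.
coffee1 <- df[df$Commodity == "Coffee", keep]

# subset() evaluates the condition inside df, so bare column names work,
# but the condition must still be logical (unique(Commodity) is not).
coffee2 <- subset(df, Commodity == "Coffee",
                  select = c(Commodity, Attribute, Unit))

print(coffee1)
```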
> 
> Anyway, your explanation is ambiguous. Let's say you have three rows with
> the same Commodity. Which row do you want to select?
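
One possible reading, sketched on invented data: if only the distinct combinations are wanted, unique() on the three columns sidesteps the question of which row to keep; if whole rows are wanted, duplicated() keeps the first occurrence of each combination. Whether "first" is the right choice is an assumption made here, not something the original post states.

```r
# Toy data: values are made up for illustration only.
df <- data.frame(
  Year      = c(1990, 1991, 1990, 1991),
  Commodity = c("Coffee", "Coffee", "Sugar", "Sugar"),
  Attribute = c("Production", "Production", "Production", "Production"),
  Unit      = c("(1000 60 KG BAGS)", "(1000 60 KG BAGS)",
                "(1000 MT)", "(1000 MT)"),
  Value     = c(100, 110, 50, 55)
)

keys <- c("Commodity", "Attribute", "Unit")

# Only the distinct combinations of the three columns.
combos <- unique(df[keys])

# Whole rows: duplicated() flags every repeat of a combination,
# so negating it keeps the first row in which each combination appears.
first_rows <- df[!duplicated(df[keys]), ]

print(combos)
print(first_rows)
```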
> 
> Regards
> Petr
> 
> 
> > will be grateful for any ideas and help
> > 
> > Regards,
> > 
> > SNVK
> > 
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 


