[R] How to get the proportions of data with respect to two variables in R?

jim holtman jholtman at gmail.com
Sun Dec 1 17:19:11 CET 2013


Here is an example using data.table to get the proportion (in percent) of each Length/Width combination within each Class:

> input <- read.table(text = "'ID' 'Class' 'Length' 'Width'
+ 2 2 13.5 4.5
+ 2 2 13.5 4.5
+ 2 2 13.5 4.5
+ 2 2 13.5 4.5
+ 3 2 13.5 4.0
+ 3 2 13.5 4.0
+ 3 2 13.5 4.0
+ 3 2 13.5 4.0
+ 4 2 10.0 4.5
+ 4 2 10.0 4.5
+ 4 2 10.0 4.5
+ 4 2 10.0 4.5
+ 5 3 23.0 4.5
+ 5 3 23.0 4.5
+ 5 3 23.0 4.5
+ 5 3 23.0 4.5
+ 6 3 76.5 4.5
+ 6 3 76.5 4.5
+ 6 3 76.5 4.5
+ 6 3 76.5 4.5
+ 6 3 76.5 4.5
+ 7 1 10.0 3.0
+ 7 1 10.0 3.0
+ 7 1 10.0 3.0
+ 7 1 10.0 3.0
+ 8 2 13.5 5.5
+ 8 2 13.5 5.5
+ 8 2 13.5 5.5
+ 8 2 13.5 5.5", header = TRUE)
>
> # remove duplicates
> input <- subset(input, !duplicated(input))
> require(data.table)
> input <- data.table(input)
>
> # create counts by Class/Length/Width
> counts <- input[
+     , list(count = .N)
+     , keyby = 'Class,Length,Width'
+     ]
>
> # add proportion
> counts$prop <- ave(counts$count
+             , counts$Class
+             , FUN = function(x) round(x / sum(x) * 100, 1)
+             )
>
> counts
   Class Length Width count prop
1:     1   10.0   3.0     1  100
2:     2   10.0   4.5     1   25
3:     2   13.5   4.0     1   25
4:     2   13.5   4.5     1   25
5:     2   13.5   5.5     1   25
6:     3   23.0   4.5     1   50
7:     3   76.5   4.5     1   50
>
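
For comparison, here is a rough base-R sketch of the same idea that reports proportions on a 0-1 scale, as in the requested sample output, instead of percent. It is only an illustration under some assumptions: the data is a shortened version of the sample input, and the names `vehicles` and `by_class` are mine, not from the original posts.

```r
# Shortened sample data (one row per observation; vehicle IDs repeat).
input <- read.table(text = "ID Class Length Width
2 2 13.5 4.5
2 2 13.5 4.5
3 2 13.5 4.0
4 2 10.0 4.5
5 3 23.0 4.5
6 3 76.5 4.5
6 3 76.5 4.5
7 1 10.0 3.0
8 2 13.5 5.5", header = TRUE)

# Keep one row per vehicle so repeated observations do not inflate counts.
vehicles <- unique(input)

# Count vehicles per Class/Length/Width combination.
counts <- aggregate(ID ~ Class + Length + Width, data = vehicles, FUN = length)
names(counts)[names(counts) == "ID"] <- "count"

# Proportion of each model within its class, on a 0-1 scale.
counts$prop <- counts$count / ave(counts$count, counts$Class, FUN = sum)

# Sort for readability, then split into one table per class.
counts <- counts[order(counts$Class, counts$Length, counts$Width), ]
by_class <- split(counts[c("Length", "Width", "prop")], counts$Class)
by_class
```

The list elements are named by class code (1 = motorcycles, 2 = cars, 3 = trucks in the original question), so `by_class[["2"]]` should give the table for cars.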


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Sun, Dec 1, 2013 at 3:05 AM, umair durrani <umairdurrani at outlook.com> wrote:
> Thanks for your answers Arun. Unfortunately the code didn't work and I am getting the error: arguments must have same length. Here are sample input and output:
> INPUT:
> Vehicle ID Vehicle Class Vehicle Length Vehicle Width
> 2 2 13.5 4.5
> 2 2 13.5 4.5
> 2 2 13.5 4.5
> 2 2 13.5 4.5
> 3 2 13.5 4.0
> 3 2 13.5 4.0
> 3 2 13.5 4.0
> 3 2 13.5 4.0
> 4 2 10.0 4.5
> 4 2 10.0 4.5
> 4 2 10.0 4.5
> 4 2 10.0 4.5
> 5 3 23.0 4.5
> 5 3 23.0 4.5
> 5 3 23.0 4.5
> 5 3 23.0 4.5
> 6 3 76.5 4.5
> 6 3 76.5 4.5
> 6 3 76.5 4.5
> 6 3 76.5 4.5
> 6 3 76.5 4.5
> 7 1 10.0 3.0
> 7 1 10.0 3.0
> 7 1 10.0 3.0
> 7 1 10.0 3.0
> 8 2 13.5 5.5
> 8 2 13.5 5.5
> 8 2 13.5 5.5
> 8 2 13.5 5.5
> Note that in this input: Total number of cars=4, trucks=2, motorcycles=1
> Sample Output
> Group: cars
> VehicleLength VehicleWidth Proportion
> 13.5 4.5 0.25
> 13.5 4.0 0.25
> 13.5 5.5 0.25
> 10.0 4.5 0.25
>
> Group: trucks
> VehicleLength VehicleWidth Proportion
> 23.0 4.5 0.5
> 76.5 4.5 0.5
>
> Group: motorcycles
> VehicleLength VehicleWidth Proportion
> 10.0 3.0 1.0
>
> Umair Durrani
>
> email: umairdurrani at outlook.com
>
>
>> Date: Sat, 30 Nov 2013 23:41:28 -0800
>> From: smartpink111 at yahoo.com
>> Subject: Re: [R] How to get the proportions of data with respect to two variables in R?
>> To: r-help at r-project.org
>> CC: umairdurrani at outlook.com
>>
>> Hi,
>> It is better to provide a reproducible example.
>> Maybe this helps:
>> set.seed(252)
>> dat1 <- data.frame(`Vehicle ID`=sample(150,150,replace=FALSE),`Vehicle Class`=rep(1:4,c(20,40,30,60)), `Vehicle length`= sample(15:25,150,replace=TRUE), `Vehicle width`= sample(4:10,150,replace=TRUE),check.names=FALSE)
>> cars <- subset(dat1,`Vehicle Class`==2)
>>  by(cars,INDICES=cars$`Vehicle length`,FUN=table(cars$`Vehicle width`))
>> #Error in FUN(X[[1L]], ...) : could not find function "FUN"
>>
>> by(cars$`Vehicle width`,INDICES=cars$`Vehicle length`, table)
>>  by(dat1$`Vehicle width`,list(dat1$`Vehicle Class`,dat1$`Vehicle length`), table)
>>
>>
>> #Also, you may check
>>
>> ftable(dat1[2:4])
>> prop.table(ftable(dat1[2:4]),1)
>>
>>
>> A.K.
>>
>>
>>
>>
>>
>> On Sunday, December 1, 2013 12:08 AM, umair durrani <umairdurrani at outlook.com> wrote:
>> I have 4 columns: Vehicle ID, Vehicle Class, Vehicle Length and Vehicle Width. Every vehicle has a unique vehicle ID (e.g. 2, 4, 5,...) and the data was collected every 0.1 seconds which means that vehicle IDs are repeated in Vehicle ID column for the number of times they were observed. There are three vehicle classes i.e. 1=motorcycles, 2=cars, 3=trucks in the Vehicle Class column and the lengths and widths are in their respective columns against every vehicle ID. I want to subset the data by vehicle class and then find the proportions of each vehicle model (unique length and width) within every class. For example, for the Vehicle Class = 2 i.e. car, I want to find different models of cars (unique length and width) and their proportions with respect to total number of cars. Here is what I have done so far:
>>
>> To subset data by Vehicle Class
>> cars <- subset(b, b$'Vehicle class'==2)
>> trucks <- subset(b, b$'Vehicle class'==3)
>> motorcycles <- subset(b, b$'Vehicle class'==1)
>>
>> To find the number of cars
>> numofcars <- length(unique(cars$'Vehicle ID')) # 2830
>> numoftrucks <- length(unique(trucks$'Vehicle ID')) # 137
>> numofmotorcycles <- length(unique(motorcycles$'Vehicle ID'))# 45
>>
>> The above code worked but I could not find the proportions by using the code below:
>> by (cars, INDICES=cars$'Vehicle Length', FUN=table(class$'Vehicle width'))
>>
>> R gives an error stating that it could not find 'FUN'. Please help me in finding the proportions of each model within all classes of vehicles.
>>
>> Umair Durrani
>>
>> email: umairdurrani at outlook.com
>>
>>     [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


