[R] How to avoid the three loops in R?

David L Carlson dcarlson at tamu.edu
Fri Aug 1 16:47:27 CEST 2014


Here's another approach:

# First put the data in a format that is easier to transmit using dput():
dta <- structure(list(Country = structure(c(1L, 3L, 2L, 1L, 3L, 2L, 
1L, 2L, 1L, 3L, 2L, 1L, 3L), .Label = c("AE", "CN", "DE"), class = "factor"), 
    Product = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 1L, 1L, 1L, 2L, 
    2L), Price = c(20L, 20L, 28L, 28L, 28L, 22L, 28L, 28L, 20L, 
    20L, 28L, 28L, 28L), Year_Month = c(201204L, 201204L, 201204L, 
    201204L, 201204L, 201204L, 201204L, 201204L, 201205L, 201205L, 
    201205L, 201205L, 201205L)), .Names = c("Country", "Product", 
"Price", "Year_Month"), class = "data.frame", row.names = c(NA, 
-13L))
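# Then get the sum and count of Price per Product/Year_Month, merge them
# back, and compute the mean of the other rows in each group: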
> grp <- aggregate(Price~Product+Year_Month, dta, function(x) c(sum(x),
+      length(x)))
> dtmrg <- merge(dta, grp, by=c("Product", "Year_Month"))
> newdta <- with(dtmrg, data.frame(Country, Product, Price=Price.x,
+      Year_Month, Price_average_in_other_area=(Price.y[,1]-Price.x)/
+      (Price.y[,2]-1)))
> newdta
   Country Product Price Year_Month Price_average_in_other_area
1       AE       1    20     201204                          24
2       DE       1    20     201204                          24
3       CN       1    28     201204                          20
4       AE       1    20     201205                          24
5       DE       1    20     201205                          24
6       CN       1    28     201205                          20
7       AE       2    28     201204                          25
8       DE       2    28     201204                          25
9       CN       2    22     201204                          28
10      AE       2    28     201205                          28
11      DE       2    28     201205                          28
12      AE       3    28     201204                          28
13      CN       3    28     201204                          28
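
A variation on the same idea, offered only as a sketch: ave() returns group statistics already aligned with the original rows, so the merge() step (and the row reordering it causes) can be skipped. The data frame is rebuilt here from plain vectors on the assumption that it matches dta above; the names grp_sum and grp_n are my own and not part of the code above.

```r
# Toy data, assumed equivalent to the dput() structure above.
dta <- data.frame(
  Country = c("AE","DE","CN","AE","DE","CN","AE","CN","AE","DE","CN","AE","DE"),
  Product = c(1,1,1,2,2,2,3,3,1,1,1,2,2),
  Price = c(20,20,28,28,28,22,28,28,20,20,28,28,28),
  Year_Month = c(rep(201204, 8), rep(201205, 5))
)

# Group sum and group size, one value per original row.
grp_sum <- ave(dta$Price, dta$Product, dta$Year_Month, FUN = sum)
grp_n   <- ave(dta$Price, dta$Product, dta$Year_Month, FUN = length)

# Mean of the other rows in the group; NA when a row has no companions.
dta$Price_average_in_other_area <-
  ifelse(grp_n > 1, (grp_sum - dta$Price) / (grp_n - 1), NA)
dta
```

The rows stay in their original order, so the new column can be compared directly against the input.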

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of John McKown
Sent: Friday, August 1, 2014 9:07 AM
To: Lingyi Ma
Cc: r-help
Subject: Re: [R] How to avoid the three loops in R?

On Fri, Aug 1, 2014 at 6:41 AM, Lingyi Ma <lingyi.ma at gmail.com> wrote:
> I have the following data set:
>
>   Country  Product   Price  Year_Month
>      AE         1           20    201204
>      DE         1           20    201204
>      CN         1           28    201204
>      AE         2           28    201204
>      DE         2           28    201204
>      CN         2           22    201204
>      AE         3           28    201204
>      CN         3           28    201204
>      AE         1           20    201205
>      DE         1           20    201205
>      CN         1           28    201205
>      AE         2           28    201205
>      DE         2           28    201205
>
> I want to create one more column, "The average price of the
> product in other areas".
> In other words, for each month and each product, I calculate the average
> price of that product in the other areas.
>
> I want something like:
>
>   Country  Product   Price  Year_Month    Price_average_In_Other_area
>      AE         1           20    201204              14
>      AE         2           28    201204              25

The output above looks wrong. The Price_average_In_Other_area for AE,
product 1 should be 24?
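
A quick check of that figure, as a sketch using only the three Product 1 rows for 201204 from the data quoted above (the names prices and other_avg_AE are just for illustration):

```r
# Product 1, Year_Month 201204: AE = 20, DE = 20, CN = 28
prices <- c(AE = 20, DE = 20, CN = 28)

# Average over every country except AE: (20 + 28) / 2
other_avg_AE <- mean(prices[names(prices) != "AE"])
other_avg_AE
```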

My possible solution:

# Initialize data.frame & call it "x".
Country <- c("AE","DE","CN","AE","DE","CN","AE","CN","AE","DE","CN","AE","DE");
Product <- c(1,1,1,2,2,2,3,3,1,1,1,2,2);
Price <- c(20,20,28,28,28,22,28,28,20,20,28,28,28);
Year_Month <- c(201204,201204,201204,201204,201204,201204,201204,201204,201205,201205,201205,201205,201205);
x <- data.frame(Country,Product,Price,Year_Month,stringsAsFactors=FALSE);
#
#
library("dplyr");
#
# Get the total Price and the number of rows for each
# Product & Year_Month.
y <- summarize(group_by(x, Product, Year_Month),
               sumPrice=sum(Price), NoPrice=length(Price));
#
# Merge the above data back into the original data.frame, based on
# Product and Year_Month (similar to SQL inner join).
x <- merge(x=x,y=y);
#
# Now calculate the "other area" average by subtracting the cost in this area
# from the total cost in all areas and dividing by the number of areas, minus one.
# Please note that if a Product and Year_Month is unique, i.e. no other areas
# for this Product & Year_Month, this will try to divide by zero.
# Since the numerator is also zero in that case (0/0), this gives "NaN".
x$Price_average_In_Other_area <- (x$sumPrice-x$Price)/(x$NoPrice-1);
# Possible alternative that returns NA in that case
x$Avg_other <- ifelse(x$NoPrice>1,(x$sumPrice-x$Price)/(x$NoPrice-1),NA);
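
If the merge step turns out to be slow on a large table, the same calculation can probably be done in one grouped pass. This is only a sketch, assuming a dplyr version that provides %>%, mutate() and n(); the names x0 and x2 and the extra "US" row are hypothetical, added to show the single-row case.

```r
library(dplyr)

# Original data, rebuilt because x was overwritten by merge() above.
x0 <- data.frame(
  Country = c("AE","DE","CN","AE","DE","CN","AE","CN","AE","DE","CN","AE","DE"),
  Product = c(1,1,1,2,2,2,3,3,1,1,1,2,2),
  Price = c(20,20,28,28,28,22,28,28,20,20,28,28,28),
  Year_Month = c(rep(201204, 8), rep(201205, 5))
)
# Hypothetical extra row: a Product sold in only one area that month.
x0 <- rbind(x0, data.frame(Country = "US", Product = 4, Price = 30,
                           Year_Month = 201204))

x2 <- x0 %>%
  group_by(Product, Year_Month) %>%
  # Plain if/else, since n() is a single value per group; the scalar
  # NA_real_ is recycled over a one-row group.
  mutate(Avg_other = if (n() > 1) (sum(Price) - Price) / (n() - 1)
                     else NA_real_) %>%
  ungroup()
x2
```

Note that ifelse() would not work as a drop-in here: its result has the length of the test, and n() > 1 is a single value within each group.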



>
> Please avoid the three for loops; I have tried them and they never finish.
> I have 1070427 rows.  Is there a better way to speed up my program?

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown
