r - Alternative to calculating Elasticity using For Loop -


I've written a piece of code to calculate elasticity for 200k products. The elasticity is being calculated all right, but it has been more than 15 hours and the process is still running. I can see new objects being created. Is there a faster alternative for doing this without using a for loop?

Below is the code:

    sku_list <- unique(transact_data4$productid)
    elasticity_values <- data.frame()

    for (i in 1:length(sku_list)) {
      # Fit a separate linear model for each product
      test_sku <- subset(transact_data4, productid == sku_list[i])
      m1 <- lm(formula = sales ~ price, data = test_sku)
      coeffs <- as.data.frame(m1[[1]])
      gradient <- coeffs[2, 1]
      gradient_final <- ifelse(is.na(gradient), -1, gradient)
      mean_price <- mean(test_sku$price)
      mean_sales <- mean(test_sku$sales)
      elasticity <- gradient_final * mean_price / mean_sales
      sku_elasticity <- cbind(sku_list[i], elasticity)
      elasticity_values <- rbind(elasticity_values, sku_elasticity)
    }
    colnames(elasticity_values)[colnames(elasticity_values) == "V1"] <- "productid"

Here is a sample dataset:

    transact_data <- data.frame(productid = c('a', 'a', 'a', 'a', 'a', 'a',
                                              'b', 'b', 'b', 'b', 'b', 'b'),
                                price = c(10, 10.5, 11, 12, 10, 9,
                                          10, 11, 13, 11, 12.5, 11),
                                sales = c(100, 93, 90, 85, 99, 110,
                                          101, 95, 80, 103, 82, 102),
                                stringsAsFactors = FALSE)

Result:

      productid         elasticity
    1         a -0.913344887348354
    2         b  -1.03051724343462

Is there a faster way to achieve this without using a for loop? Because the sample is small (only 2 productid values), it runs fast. I'm trying to run it on 200k productid values.

Thank you.

Code

    library(dplyr)

    transact_data %>% group_by(productid) %>%
      do(mod = lm(sales ~ price, data = .),
         mean.price = mean(.$price),
         mean.sales = mean(.$sales)) %>%
      summarise(productid  = productid,
                elasticity = ifelse(is.na(coef(mod)[2]), -1, coef(mod)[2]) *
                             mean.price / mean.sales)

    #   productid elasticity
    # 1         a -0.9133449
    # 2         b -1.0305172

Explanation

With library(dplyr) you can conveniently do grouped calculations:

  • %>% is the chaining operator; it makes the code more readable by passing the left argument as the first argument of the function on the right
  • group_by tells the next commands that we want to group by the column productid
  • do is used to calculate the model and the needed mean values; within do, use the dot . to refer to the whole data.frame of the current group
  • summarise summarises the calculation, computing the elasticity
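To make the first two points concrete, here is a minimal sketch on the sample data from the question. It compares a nested call with the equivalent piped call; the names `nested` and `piped` are only illustrative and not part of the original answer.

```r
library(dplyr)

# Sample data from the question
transact_data <- data.frame(
  productid = c('a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'),
  price = c(10, 10.5, 11, 12, 10, 9, 10, 11, 13, 11, 12.5, 11),
  sales = c(100, 93, 90, 85, 99, 110, 101, 95, 80, 103, 82, 102),
  stringsAsFactors = FALSE
)

# Nested form: has to be read from the inside out
nested <- summarise(group_by(transact_data, productid),
                    mean.price = mean(price),
                    mean.sales = mean(sales))

# Piped form: each result becomes the first argument of the next call
piped <- transact_data %>%
  group_by(productid) %>%
  summarise(mean.price = mean(price),
            mean.sales = mean(sales))

# Both forms should describe the same per-product means
same_result <- isTRUE(all.equal(as.data.frame(nested), as.data.frame(piped)))
```

Both forms compute one row of means per productid; the piped version just reads top to bottom.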

For further info, check vignette("introduction").
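If fitting 200k separate lm objects is itself too slow, one possible variation (not part of the original answer) is to skip lm entirely. For a simple regression of sales on price, the least-squares slope equals cov(price, sales) / var(price), so the elasticity can be computed directly in summarise. This is a sketch under the assumption that only the slope is needed; products with a single row or a constant price yield NA or NaN and fall back to -1, as in the question, though near-constant prices may behave differently from lm's rank check.

```r
library(dplyr)

# Sample data from the question
transact_data <- data.frame(
  productid = c('a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'),
  price = c(10, 10.5, 11, 12, 10, 9, 10, 11, 13, 11, 12.5, 11),
  sales = c(100, 93, 90, 85, 99, 110, 101, 95, 80, 103, 82, 102),
  stringsAsFactors = FALSE
)

elasticity_values <- transact_data %>%
  group_by(productid) %>%
  summarise(
    # Closed-form OLS slope of sales ~ price for this product
    gradient   = cov(price, sales) / var(price),
    # Same -1 fallback as in the question when the slope is undefined
    elasticity = ifelse(is.na(gradient), -1, gradient) *
                 mean(price) / mean(sales)
  ) %>%
  select(productid, elasticity)
```

On the sample data this should agree with the lm-based result shown above.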

By the way, it is no big surprise that your code is quite slow: you use loops and, on top of that, you grow the data inside the loop. Check http://www.burns-stat.com/pages/tutor/r_inferno.pdf for a tutorial on common pitfalls.
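For readers who prefer to stay in base R, the two pitfalls can be avoided roughly as follows. This is only a sketch: `sku_elasticity` here is a hypothetical helper function, not something from the question, and the data are split once instead of calling subset() on the full table for every product.

```r
# Sample data from the question
transact_data <- data.frame(
  productid = c('a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'),
  price = c(10, 10.5, 11, 12, 10, 9, 10, 11, 13, 11, 12.5, 11),
  sales = c(100, 93, 90, 85, 99, 110, 101, 95, 80, 103, 82, 102),
  stringsAsFactors = FALSE
)

# Hypothetical helper: elasticity for the rows of one product
sku_elasticity <- function(d) {
  gradient <- coef(lm(sales ~ price, data = d))[["price"]]
  if (is.na(gradient)) gradient <- -1  # same fallback as in the question
  gradient * mean(d$price) / mean(d$sales)
}

# Split the data once, giving one data.frame per product
by_sku <- split(transact_data, transact_data$productid)

# vapply returns a numeric vector of known length, so nothing is grown with rbind
elasticity <- vapply(by_sku, sku_elasticity, numeric(1))

elasticity_values <- data.frame(productid  = names(elasticity),
                                elasticity = unname(elasticity),
                                stringsAsFactors = FALSE)
```

The result has one row per product, and the per-product work is unchanged from the loop; only the repeated subsetting and the repeated rbind are gone.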

