R - Alternative to calculating Elasticity using For Loop
I've written a piece of code to calculate elasticity for 200k products. The elasticity is being calculated all right, but it's been more than 15 hours and the process is still running. I can see new objects being created. Is there a faster alternative for doing this without using a loop?
Below is the code:
    sku_list <- unique(transact_data4$productid)
    elasticity_values <- data.frame()

    for (i in 1:length(sku_list)) {
      # Fit sales ~ price for one product at a time
      test_sku <- subset(transact_data4, productid == sku_list[i])
      m1 <- lm(formula = sales ~ price, data = test_sku)
      coeffs <- as.data.frame(m1[[1]])
      gradient <- coeffs[2, 1]
      # Fall back to -1 when the slope cannot be estimated
      gradient_final <- ifelse(is.na(gradient), -1, gradient)
      mean_price <- mean(test_sku$price)
      mean_sales <- mean(test_sku$sales)
      elasticity <- gradient_final * mean_price / mean_sales
      sku_elasticity <- cbind(sku_list[i], elasticity)
      # The result grows by one row on every iteration
      elasticity_values <- rbind(elasticity_values, sku_elasticity)
    }

    colnames(elasticity_values)[colnames(elasticity_values) == "V1"] <- "productid"
Here is a sample dataset:
    transact_data <- data.frame(
      productid = c('a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'),
      price = c(10, 10.5, 11, 12, 10, 9, 10, 11, 13, 11, 12.5, 11),
      sales = c(100, 93, 90, 85, 99, 110, 101, 95, 80, 103, 82, 102),
      stringsAsFactors = FALSE
    )
Result:

      productid         elasticity
    1         a -0.913344887348354
    2         b  -1.03051724343462
Is there a faster way to achieve this without using a loop? Because the sample is small (only 2 productid values), it runs fast. I'm trying to run this on 200k productid values.
Thank you.
Code
    library(dplyr)

    transact_data %>%
      group_by(productid) %>%
      do(mod = lm(sales ~ price, data = .),
         mean.price = mean(.$price),
         mean.sales = mean(.$sales)) %>%
      summarise(productid = productid,
                elasticity = ifelse(is.na(coef(mod)[2]), -1, coef(mod)[2]) * mean.price / mean.sales)

    #   productid elasticity
    # 1         a -0.9133449
    # 2         b -1.0305172
Explanation
With library(dplyr), grouped calculations can be done conveniently:
%>% is a chaining operator. It makes the code more readable by inputting the left argument as the first argument of the function on the right.
group_by tells the next commands that we want to group by the column productid.
do is used to calculate the model and the needed mean values. Within do, use the dot . to refer to the whole data.frame (here, the rows of the current group).
summarise summarises the calculation, computing the elasticity.
For further info, check vignette("introduction").
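As a possible further speed-up, not part of the answer above: for a model with a single predictor, the least-squares slope equals cov(price, sales) / var(price), so the lm call can be skipped entirely. The sketch below assumes the sample data from the question; the names elasticity_closed_form and slope are made up for illustration. On the sample data it should reproduce the same elasticities as the lm version, and products whose slope cannot be estimated (a single row, or a constant price) still fall back to -1.

```r
library(dplyr)

# Sample data from the question
transact_data <- data.frame(
  productid = c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b"),
  price = c(10, 10.5, 11, 12, 10, 9, 10, 11, 13, 11, 12.5, 11),
  sales = c(100, 93, 90, 85, 99, 110, 101, 95, 80, 103, 82, 102),
  stringsAsFactors = FALSE
)

# elasticity_closed_form and slope are illustrative names, not from the answer
elasticity_closed_form <- transact_data %>%
  group_by(productid) %>%
  summarise(
    # Closed-form OLS slope for one predictor; NA or NaN when it is undefined
    slope = cov(price, sales) / var(price),
    mean_price = mean(price),
    mean_sales = mean(sales)
  ) %>%
  # Same fallback as the original code: use -1 when the slope is missing
  mutate(elasticity = ifelse(is.na(slope), -1, slope) * mean_price / mean_sales) %>%
  select(productid, elasticity)

print(elasticity_closed_form)
```

Whether this is noticeably faster on 200k products depends on the data, so it is worth timing both versions on a subset first.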
By the way, it is no big surprise that your code is quite slow: you use loops and, on top of that, grow the data inside the loop. Check http://www.burns-stat.com/pages/tutor/r_inferno.pdf for a tutorial on common pitfalls.
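The point about growing the result inside the loop can be illustrated with a base R sketch that splits the data once and lets vapply allocate the result vector up front. It assumes the sample data from the question; sku_elasticity and by_sku are hypothetical names chosen here, and the per-product calculation mirrors the question's loop body.

```r
# Sample data from the question
transact_data <- data.frame(
  productid = c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b"),
  price = c(10, 10.5, 11, 12, 10, 9, 10, 11, 13, 11, 12.5, 11),
  sales = c(100, 93, 90, 85, 99, 110, 101, 95, 80, 103, 82, 102),
  stringsAsFactors = FALSE
)

# Hypothetical helper: elasticity for the rows of a single product
sku_elasticity <- function(d) {
  gradient <- coef(lm(sales ~ price, data = d))[2]
  gradient_final <- ifelse(is.na(gradient), -1, gradient)
  unname(gradient_final * mean(d$price) / mean(d$sales))
}

# Split once instead of calling subset() on the full data per product
by_sku <- split(transact_data, transact_data$productid)

# vapply fills a numeric vector of known length, so nothing is grown with rbind
elasticity <- vapply(by_sku, sku_elasticity, numeric(1))

elasticity_values <- data.frame(
  productid = names(elasticity),
  elasticity = unname(elasticity),
  stringsAsFactors = FALSE
)
print(elasticity_values)
```

This still fits one lm per product, so it mainly removes the repeated subset and rbind costs rather than the model-fitting cost.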