How to target promotions with conversion prediction model to maximize Net Incremental Revenue?

Every time a company chooses to promote some product through tactics like offering discounts or running a digital ad campaign, there is a certain cost as well as some potential revenue earning opportunity associated with it. If the company is not careful in choosing the right set of customers to be receiving the promotion, it can end up losing a lot of money without earning much in return.

Dataset

The dataset that I have used in this project was originally used as a take-home assignment provided by Starbucks for their job candidates. The data for this exercise consists of about 120,000 data points split in a 2:1 ratio among training and test files. In the experiment simulated by the data, an advertising promotion was tested to see if it would bring more customers to purchase a specific product priced at $10. Since it costs the company 0.15 to send out each promotion, it would be best to limit that promotion only to those that are most receptive to the promotion. Each data point includes one column indicating whether or not an individual was sent a promotion for the product, and one column indicating whether or not that individual eventually purchased that product. Each individual also has seven additional features associated with them, which are provided abstractly as V1-V7.

Goal

Our goal is to maximize the following metrics:

  • Incremental Response Rate (IRR)

IRR depicts how many more customers purchased the product with the promotion, as compared to if they didn't receive the promotion. Mathematically, it's the ratio of the number of purchasers in the promotion group to the total number of customers in the purchasers group (treatment) minus the ratio of the number of purchasers in the non-promotional group to the total number of customers in the non-promotional group (control).

$$ IRR = (\frac{N_{Treat\_Purchase}}{N_{Treat}}) - (\frac{N_{Control\_Purchase}}{N_{Control}}) $$
  • Net Incremental Revenue (NIR)

NIR depicts how much is made (or lost) by sending out the promotion. Mathematically, this is 10 times the total number of purchasers that received the promotion minus 0.15 times the number of promotions sent out, minus 10 times the number of purchasers who were not given the promotion.

$$ NIR = (R_{Treat} \cdot N_{Treat\_Purchase} - C_{Treat} \cdot N_{Treat}) - (R_{Control} \cdot N_{Control\_Purchase}) $$

In this case,

$$ NIR = (10 \cdot N_{Treat\_Purchase} - 0.15 \cdot N_{Treat}) - (10 \cdot N_{Control\_Purchase}) $$

We can make use of V1 to V7 variables available in training dataset for each person to decide whether to send promotion to that person or not. We can use various approaches that model the problem differently and predict the likelihood of person purchasing the product after receiving promotion.

How To test our Strategy?

From past data, we know there are four possible outomes:

Table of actual promotion vs. predicted promotion customers:

Actual
PredictedYesNo
YesIII
NoIIIIV

The metrics are only being compared for the individuals we predict should obtain the promotion – that is, quadrants I and II. Since the first set of individuals that receive the promotion (in the training set) receive it randomly, we can expect that quadrants I and II will have approximately equivalent participants.

Comparing quadrant I to II then gives an idea of how well your promotion strategy will work in the future.

When we feel like we have a good optimization strategy, we can complete the promotion_strategy function to be passed to the test_results function.

In [1]:
# load in packages
from itertools import combinations

from test_results import test_results, score
import numpy as np
import time
import pandas as pd
import scipy as sp
import sklearn as sk
import matplotlib.pyplot as plt
import seaborn as sb
%matplotlib inline
from xgboost import XGBClassifier
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.metrics import f1_score
import hyperopt
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from hyperopt import space_eval
import pickle
from sklearn.metrics import f1_score
import numpy as np
import gc
import math
def f1_eval(y_pred, dtrain):
    y_true = dtrain.get_label()
    err = 1-f1_score(y_true, np.round(y_pred))
    return 'f1_err', err
import decimal
from sklearn.model_selection import KFold
def float_range(start, stop, step):
    start = decimal.Decimal(start)
    stop = decimal.Decimal(stop)
    step = decimal.Decimal(step)
    while (start < stop):
        yield float(start)
        start += step
# load in the data
train_data = pd.read_csv('./training.csv')
train_data.head()
Using TensorFlow backend.
Out[1]:
ID Promotion purchase V1 V2 V3 V4 V5 V6 V7
0 1 No 0 2 30.443518 -1.165083 1 1 3 2
1 3 No 0 3 32.159350 -0.645617 2 3 2 2
2 4 No 0 2 30.431659 0.133583 1 1 4 2
3 5 No 0 0 26.588914 -0.212728 2 1 4 2
4 8 Yes 0 3 28.044332 -0.385883 1 1 2 2
In [2]:
data_dir = "./data"
In [3]:
train_data.describe()
Out[3]:
ID purchase V1 V2 V3 V4 V5 V6 V7
count 84534.000000 84534.000000 84534.000000 84534.000000 84534.000000 84534.000000 84534.000000 84534.000000 84534.000000
mean 62970.972413 0.012303 1.500662 29.973600 0.000190 1.679608 2.327643 2.502898 1.701694
std 36418.440539 0.110234 0.868234 5.010626 1.000485 0.466630 0.841167 1.117349 0.457517
min 1.000000 0.000000 0.000000 7.104007 -1.684550 1.000000 1.000000 1.000000 1.000000
25% 31467.250000 0.000000 1.000000 26.591501 -0.905350 1.000000 2.000000 2.000000 1.000000
50% 62827.500000 0.000000 2.000000 29.979744 -0.039572 2.000000 2.000000 3.000000 2.000000
75% 94438.750000 0.000000 2.000000 33.344593 0.826206 2.000000 3.000000 4.000000 2.000000
max 126184.000000 1.000000 3.000000 50.375913 1.691984 2.000000 4.000000 4.000000 2.000000
In [4]:
# Cells for you to work and document as necessary - 
# definitely feel free to add more cells as you need

Exploratory analysis

In [5]:
train_data["Promotion"].value_counts()
Out[5]:
Yes    42364
No     42170
Name: Promotion, dtype: int64

If we treat the act of giving promotion as a treatment given by the company to its customers and those that were not given promotion as the control group, then we can see that there is nearly equal number of customers that belong to both the groups.

Checking for the distribution of values of target variable of our interest

In [6]:
train_data["purchase"].value_counts()
Out[6]:
0    83494
1     1040
Name: purchase, dtype: int64

It is clear from this numbers that there is a high imbalance in the number of customers who chose to purchase the product vs those who didn't. We need to take care of this while using this dataset for trainin the machine learning algorithm by using some technique like oversampling from under represented (minority) value 1 for target variable purchase. SMOTE is one useful technique that generates balanced dataset for training purpose while also introducing some variations in the input variables while oversampling the data with minority target value.

Approach 1

Predicting if the customer will make the purchase only after receiving at the promotion.

We are given dataset that includes customers that have been given and not given the promotion. As it costs the company 1.5$ to promote to the customer, it will try to avoid promoting it to the customers who are:

  • Not likely to purchase even after receiving the promotion
  • Are going to purchase even without receiving the promotion

Company is interested in giving the promotion to the customers who are likely to make the purchase only after receiving the promotion. The job of the preditive model is to predict whether the given customer falls into this category. If yes, then our algorithm will suggest the company to give promotion to that customer, otherwise it won't suggest to give promotion to that customer.

A statistical model can be trained to decide whether to give customer the promotion or not by training it with dataset where each customer is labeled as 1 in the output variable if he has been shown promotion and has purchased to product, and 0 for the rest of the scenarios. We can name this new variable as response as it indicates whether the customer resonded positively to our promotion.

In [95]:
train_data_1 = train_data.copy()
In [96]:
train_data_1["response"] = (train_data_1["Promotion"] == "Yes") & (train_data_1["purchase"] == 1)
In [97]:
features = ["V"+str(x) for x in range(1,8)]
In [98]:
X = train_data_1[features]
In [99]:
Y = train_data_1["response"]
In [100]:
Y.value_counts()
Out[100]:
False    83813
True       721
Name: response, dtype: int64
In [20]:
X_train, X_valid, Y_train, Y_valid = train_test_split(X, Y, test_size=0.2, random_state=42)

Generating balanced training dataset using Synthetic Minority Over-sampling Technique (SMOTE)

In [21]:
sm = SMOTE(random_state=42, ratio=1.0)
In [22]:
X_balanced_train, Y_balanced_train = sm.fit_resample(X_train, Y_train)

Converting back to dataframe and series

In [23]:
X_balanced_train = pd.DataFrame(X_balanced_train, columns=features)
In [24]:
X_balanced_train.columns
Out[24]:
Index(['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7'], dtype='object')
In [25]:
Y_balanced_train = pd.Series(Y_balanced_train)
In [29]:
cv = GridSearchCV(estimator=XGBClassifier(), param_grid={
        "max_depth": range(5,8,1),
        "min_child_weight": [5, 10, 20, 50],
        "gamma": [0, 0.1, 0.2],
        "random_state": [42],
        "n_estimators": [1000]
        },         
        scoring="f1", cv=3)


start_time = time.time()
fit_params= {
            "eval_set": [(X_valid, Y_valid)],
            "eval_metric": f1_eval,
            "early_stopping_rounds":20,
            "verbose": 0
        }
cv.fit(X_balanced_train, Y_balanced_train, **fit_params)
elapsed_time = (time.time() - start_time) / 60
print('Elapsed computation time: {:.3f} mins'.format(elapsed_time))
Elapsed computation time: 6.387 mins
In [30]:
cv.best_params_
Out[30]:
{'gamma': 0.2,
 'max_depth': 7,
 'min_child_weight': 5,
 'n_estimators': 1000,
 'random_state': 42}

This will help us deciding number of estimators.

In [31]:
xgb = XGBClassifier(n_estimators=1000)
best_params_xgb = cv.best_params_
xgb.set_params(**best_params_xgb)
xgb.fit(X=X_balanced_train, y=Y_balanced_train.values.ravel(), eval_set=[(X_valid, Y_valid)], eval_metric=f1_eval, early_stopping_rounds=10, verbose=10)
[0]	validation_0-error:0.290235	validation_0-f1_err:0.972261
Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 10 rounds.
[10]	validation_0-error:0.14497	validation_0-f1_err:0.974553
Stopping. Best iteration:
[7]	validation_0-error:0.170521	validation_0-f1_err:0.971689

Out[31]:
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0.2,
              learning_rate=0.1, max_delta_step=0, max_depth=7,
              min_child_weight=5, missing=None, n_estimators=1000, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)
In [32]:
optimal_n_estimators = xgb.best_ntree_limit

We have found out optimal max_depth and number of estimators for XGBoost algorithm for our case. Train the XGBoost on entire training dataset for using it in promotion strategy.

In [33]:
X_balanced, Y_balanced = sm.fit_sample(X,Y)
X_balanced = pd.DataFrame(X_balanced, columns=features)
Y_balanced = pd.Series(Y_balanced)
In [34]:
xgb = XGBClassifier(max_depth=best_params_xgb["max_depth"],
                    gamma=best_params_xgb["gamma"],
                    min_child_weight=best_params_xgb["min_child_weight"],
                    n_estimators=optimal_n_estimators,
                    random_state=42)
xgb.fit(X_balanced, Y_balanced)
Out[34]:
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0.2,
              learning_rate=0.1, max_delta_step=0, max_depth=7,
              min_child_weight=5, missing=None, n_estimators=8, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)
In [35]:
pickle.dump(xgb, open(data_dir + '/xgb_best_approach_1.pkl', 'wb'))
In [36]:
model = pickle.load(open(data_dir + "/xgb_best_approach_1.pkl", 'rb'))
In [37]:
def promotion_strategy(df):
    '''
    INPUT 
    df - a dataframe with *only* the columns V1 - V7 (same as train_data)

    OUTPUT
    promotion_df - np.array with the values
                   'Yes' or 'No' related to whether or not an 
                   individual should recieve a promotion 
                   should be the length of df.shape[0]
                
    Ex:
    INPUT: df
    
    V1	V2	  V3	V4	V5	V6	V7
    2	30	-1.1	1	1	3	2
    3	32	-0.6	2	3	2	2
    2	30	0.13	1	1	4	2
    
    OUTPUT: promotion
    
    array(['Yes', 'Yes', 'No'])
    indicating the first two users would recieve the promotion and 
    the last should not.
    '''
    
    test = df
    
    preds = model.predict(test)
    promotion = []
    for pred in preds:
        if pred:
            promotion.append('Yes')
        else:
            promotion.append('No')
    promotion = np.array(promotion)
    
    
    return promotion
In [38]:
test_results(promotion_strategy)
Nice job!  See how well your strategy worked on our test data below!

Your irr with this strategy is 0.0206.

Your nir with this strategy is 259.25.
We came up with a model with an irr of 0.0188 and an nir of 189.45 on the test set.

 How did you do?
Out[38]:
(0.020606371512773836, 259.25)

Approach 2

  • Indicating whether a person has received promotion as input variable and training a single model to predict whether the person will make purchase or not.
  • For the purpose of deciding whether to send promotion to the person, we can first calculate the probability of person making purchase after receiving promotion and without receiving promotion by senting promotion input variable as 1 or 0 respectively and calculating the difference between the two probabailities. If the difference turns out to be greater than some threshold value, in that case we can send promotion to the person.
  • The value of threshold can be decided using Hyper Parameter optimization technique while using NIR formula for calcualting the return value of the objective function to be minimized i.e. the score can be -NIR.
  • Here I have used log loss as evaluation metric while tuning hyper parameters of XGBoost as I am more interested in calculating accurate probabilities of person giving certain response than just predicting right response.
In [64]:
train_data_1 = train_data.copy()
In [65]:
train_data_1["response"] = train_data_1["purchase"] == 1
In [66]:
train_data_1["response"].unique()
Out[66]:
array([False,  True])
In [67]:
features = ["V"+str(x) for x in range(1,8)] + ["Promotion"]
In [68]:
# X = pd.concat([train_data_1[features],pd.get_dummies(train_data_1["Promotion"])], axis=1)
In [69]:
X = pd.get_dummies(train_data_1[features])
In [70]:
X.shape
Out[70]:
(84534, 9)
In [71]:
features=X.columns
In [72]:
Y = train_data_1["response"]
In [94]:
Y.value_counts()
Out[94]:
False    83494
True      1040
Name: response, dtype: int64
In [73]:
X_train, X_valid, Y_train, Y_valid = train_test_split(X, Y, test_size=0.2, random_state=42)

Generating balanced training dataset using Synthetic Minority Over-sampling Technique (SMOTE)

In [74]:
sm = SMOTE(random_state=42, ratio=1.0)
In [75]:
X_balanced_train, Y_balanced_train = sm.fit_resample(X_train, Y_train)

Converting back to dataframe and series

In [76]:
X_balanced_train = pd.DataFrame(X_balanced_train, columns=features)
In [77]:
X_balanced_train.columns
Out[77]:
Index(['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'Promotion_No',
       'Promotion_Yes'],
      dtype='object')
In [78]:
Y_balanced_train = pd.Series(Y_balanced_train)
In [80]:
cv = GridSearchCV(estimator=XGBClassifier(), param_grid={
        "max_depth": range(5,8,1),
        "min_child_weight": [5, 10, 20, 50],
        "gamma": [0, 0.1, 0.2],
        "random_state": [42],
        "n_estimators": [1000]
        },
        scoring="f1",
         cv=3)


start_time = time.time()
fit_params= {
            "eval_set": [(X_valid, Y_valid)],
            "eval_metric": "logloss",
            "early_stopping_rounds":20,
            "verbose": 0
        }
cv.fit(X_balanced_train, Y_balanced_train, **fit_params)
elapsed_time = (time.time() - start_time) / 60
print('Elapsed computation time: {:.3f} mins'.format(elapsed_time))
Elapsed computation time: 144.219 mins
In [81]:
cv.best_params_
Out[81]:
{'gamma': 0.1,
 'max_depth': 6,
 'min_child_weight': 10,
 'n_estimators': 1000,
 'random_state': 42}
In [82]:
# This will help us deciding number of estimators
xgb = XGBClassifier(n_estimators=1000)
best_params_xgb = cv.best_params_
xgb.set_params(**best_params_xgb)
xgb.fit(X=X_balanced_train, y=Y_balanced_train.values.ravel(), eval_set=[(X_valid, Y_valid)], eval_metric="logloss", early_stopping_rounds=10, verbose=10)
[0]	validation_0-logloss:0.651736
Will train until validation_0-logloss hasn't improved in 10 rounds.
[10]	validation_0-logloss:0.467548
[20]	validation_0-logloss:0.361545
[30]	validation_0-logloss:0.298454
[40]	validation_0-logloss:0.25701
[50]	validation_0-logloss:0.230457
[60]	validation_0-logloss:0.211893
[70]	validation_0-logloss:0.19325
[80]	validation_0-logloss:0.179203
[90]	validation_0-logloss:0.163888
[100]	validation_0-logloss:0.152161
[110]	validation_0-logloss:0.139522
[120]	validation_0-logloss:0.132642
[130]	validation_0-logloss:0.124634
[140]	validation_0-logloss:0.117031
[150]	validation_0-logloss:0.110356
[160]	validation_0-logloss:0.104467
[170]	validation_0-logloss:0.100311
[180]	validation_0-logloss:0.094953
[190]	validation_0-logloss:0.090794
[200]	validation_0-logloss:0.088276
[210]	validation_0-logloss:0.085795
[220]	validation_0-logloss:0.084368
[230]	validation_0-logloss:0.082699
[240]	validation_0-logloss:0.080333
[250]	validation_0-logloss:0.079036
[260]	validation_0-logloss:0.077853
[270]	validation_0-logloss:0.076522
[280]	validation_0-logloss:0.075776
[290]	validation_0-logloss:0.075155
[300]	validation_0-logloss:0.074299
[310]	validation_0-logloss:0.073751
[320]	validation_0-logloss:0.073337
[330]	validation_0-logloss:0.073036
[340]	validation_0-logloss:0.072558
[350]	validation_0-logloss:0.072308
[360]	validation_0-logloss:0.072077
[370]	validation_0-logloss:0.071781
[380]	validation_0-logloss:0.07163
[390]	validation_0-logloss:0.071283
[400]	validation_0-logloss:0.071203
[410]	validation_0-logloss:0.071086
[420]	validation_0-logloss:0.07099
[430]	validation_0-logloss:0.070879
[440]	validation_0-logloss:0.070729
[450]	validation_0-logloss:0.070651
[460]	validation_0-logloss:0.070625
[470]	validation_0-logloss:0.070538
[480]	validation_0-logloss:0.070495
[490]	validation_0-logloss:0.070458
[500]	validation_0-logloss:0.070361
[510]	validation_0-logloss:0.070321
[520]	validation_0-logloss:0.070278
[530]	validation_0-logloss:0.070228
Stopping. Best iteration:
[527]	validation_0-logloss:0.070218

Out[82]:
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0.1,
              learning_rate=0.1, max_delta_step=0, max_depth=6,
              min_child_weight=10, missing=None, n_estimators=1000, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)
In [83]:
optimal_n_estimators = xgb.best_ntree_limit

We have found out optimal max_depth and number of estimators for XGBoost algorithm for our case. Train the XGBoost on entire training dataset for using it in promotion strategy.

In [84]:
X_balanced, Y_balanced = sm.fit_sample(X,Y)
X_balanced = pd.DataFrame(X_balanced, columns=features)
Y_balanced = pd.Series(Y_balanced)
In [85]:
xgb = XGBClassifier(max_depth=best_params_xgb["max_depth"],
                    gamma=best_params_xgb["gamma"],
                    min_child_weight=best_params_xgb["min_child_weight"],
                    n_estimators=optimal_n_estimators,
                    random_state=42)
xgb.fit(X_balanced, Y_balanced)
Out[85]:
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0.1,
              learning_rate=0.1, max_delta_step=0, max_depth=6,
              min_child_weight=10, missing=None, n_estimators=528, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)
In [86]:
pickle.dump(xgb, open(data_dir + '/xgb_best_approach_2.pkl', 'wb'))
In [87]:
model = pickle.load(open(data_dir + "/xgb_best_approach_2.pkl", 'rb'))

We define diff as difference in the probabilities of person purchasing the product with and without receiving promotion. We have to choose the threshold value and if the diff is higher than that threshold value than we can choose to show promotion to that person. To decide the value of threshold that maximizes the NIR for given prediction model, I evaluate the thresholds in range from 0 to 0.1 by calculating the mean of NIR on 10 folds of the validation dataset and choose threshold value with maximum NIR.

In [88]:
def evaluate(X, Y, diff_threshold, after_promotion_purchase_prob_threshold):
    def score(df, promo_pred_col = 'Promotion'):
        n_treat       = df.loc[df[promo_pred_col] == 'Yes',:].shape[0]
        n_control     = df.loc[df[promo_pred_col] == 'No',:].shape[0]
        n_treat_purch = df.loc[df[promo_pred_col] == 'Yes', 'purchase'].sum()
        n_ctrl_purch  = df.loc[df[promo_pred_col] == 'No', 'purchase'].sum()
        nir = 10 * n_treat_purch - 0.15 * n_treat - 10 * n_ctrl_purch
        return nir
    
    nir_scores = []
    kf = KFold(n_splits=10, random_state=42)
    for train_index, test_index in kf.split(X):
        X_train, X_valid = X.loc[train_index], X.loc[test_index]
        Y_train, Y_valid = Y.loc[train_index], Y.loc[test_index]
        
        # As we have already trained the hyper parameters for XGBoost, we need not train it again here
        # we can use the trained model, to calculate score for given threshold value
        model = pickle.load(open(data_dir + "/xgb_best_approach_2.pkl", 'rb'))
        
        X_valid_with_promo = X_valid.copy()
        # predict probability of purchase with promotion
        X_valid_with_promo["Promotion_Yes"] = 1
        X_valid_with_promo["Promotion_No"] = 0
        probs_with_promotion = model.predict_proba(X_valid_with_promo)[:, 1]

        # predict probability of purchase without promotion
        X_valid_with_promo["Promotion_Yes"] = 0
        X_valid_with_promo["Promotion_No"] = 1

        probs_without_promotion = model.predict_proba(X_valid_with_promo)[:, 1]

        # calculate the difference as diff
        diff = probs_with_promotion - probs_without_promotion

        # if diff is above threshold choose to promote else don't
        promos = (probs_with_promotion > after_promotion_purchase_prob_threshold) & (diff > diff_threshold)
        val_data = X_valid.copy()
        val_data["Promotion"] = "No"
        val_data.loc[val_data["Promotion_Yes"] == 1, "Promotion"] = "Yes"
        val_data["purchase"] = Y_valid.copy()
        score_df = val_data.iloc[np.where(promos)]
        nir = score(score_df)
        nir_scores.append(nir)
    return np.asscalar(np.mean(nir_scores))
In [89]:
(X_valid.index == Y_valid.index).all()
Out[89]:
True
In [90]:
evaluated_point_scores = {}

def objective_threshold(params):
    if (str(params) in evaluated_point_scores):
        return evaluated_point_scores[str(params)]
    else:
        print(params)
        diff_threshold = params["diff_threshold"]
        after_promotion_purchase_prob_threshold = params["after_promotion_purchase_prob_threshold"]
        nir_score = evaluate(X=X_valid, Y=Y_valid, 
                             diff_threshold=diff_threshold, 
                             after_promotion_purchase_prob_threshold=after_promotion_purchase_prob_threshold)
        print("nir: " + str(nir_score))        
        evaluated_point_scores[str(params)] = -nir_score
        return -nir_score

param_space = {
    "diff_threshold": hp.choice("diff_threshold", list(float_range("0.02", "0.04", "0.001"))),
    "after_promotion_purchase_prob_threshold": hp.choice("after_promotion_purchase_prob_threshold", list(float_range("0.0", "1.0", "0.1")))
}

start_time = time.time()
best_params_threshold = space_eval(
    param_space, 
    fmin(objective_threshold, 
         param_space, 
         algo=hyperopt.tpe.suggest,
         max_evals=200))
print(best_params_threshold)
elapsed_time = (time.time() - start_time) / 60
print('Elapsed computation time: {:.3f} mins'.format(elapsed_time))
best_diff_threshold = best_params_threshold["diff_threshold"]
best_after_promotion_purchase_prob_threshold = best_params_threshold["after_promotion_purchase_prob_threshold"]
{'after_promotion_purchase_prob_threshold': 0.6, 'diff_threshold': 0.037}
  0%|          | 0/200 [00:00<?, ?it/s, best loss: ?]
/Users/atharva/.pyenv/versions/anaconda3-2019.03/lib/python3.7/site-packages/ipykernel_launcher.py:13: FutureWarning: 
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  del sys.path[0]

/Users/atharva/.pyenv/versions/anaconda3-2019.03/lib/python3.7/site-packages/ipykernel_launcher.py:14: FutureWarning: 
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  

nir: 0.985                                           
{'after_promotion_purchase_prob_threshold': 0.2, 'diff_threshold': 0.037}
nir: 2.91                                                         
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.02}
nir: 13.084999999999999                                          
{'after_promotion_purchase_prob_threshold': 0.2, 'diff_threshold': 0.031}      
nir: 2.91                                                                      
{'after_promotion_purchase_prob_threshold': 0.8, 'diff_threshold': 0.031}      
nir: 0.0                                                                       
{'after_promotion_purchase_prob_threshold': 0.1, 'diff_threshold': 0.027}      
nir: 13.01                                                                     
{'after_promotion_purchase_prob_threshold': 0.6, 'diff_threshold': 0.023}      
nir: 0.985                                                                     
{'after_promotion_purchase_prob_threshold': 0.2, 'diff_threshold': 0.026}      
nir: 2.91                                                                      
{'after_promotion_purchase_prob_threshold': 0.4, 'diff_threshold': 0.02}       
nir: 1.97                                                                      
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.031}      
nir: 12.725                                                                    
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.025}       
nir: 12.51                                                                      
{'after_promotion_purchase_prob_threshold': 0.4, 'diff_threshold': 0.038}       
nir: 1.97                                                                       
{'after_promotion_purchase_prob_threshold': 0.8, 'diff_threshold': 0.021}       
nir: 0.0                                                                        
{'after_promotion_purchase_prob_threshold': 0.8, 'diff_threshold': 0.039}       
nir: 0.0                                                                        
{'after_promotion_purchase_prob_threshold': 0.3, 'diff_threshold': 0.039}       
nir: 1.97                                                                       
{'after_promotion_purchase_prob_threshold': 0.9, 'diff_threshold': 0.028}       
nir: 0.0                                                                        
{'after_promotion_purchase_prob_threshold': 0.6, 'diff_threshold': 0.027}       
nir: 0.985                                                                      
{'after_promotion_purchase_prob_threshold': 0.3, 'diff_threshold': 0.028}       
nir: 1.97                                                                       
{'after_promotion_purchase_prob_threshold': 0.8, 'diff_threshold': 0.036}       
nir: 0.0                                                                        
{'after_promotion_purchase_prob_threshold': 0.1, 'diff_threshold': 0.03}        
nir: 13.01                                                                      
{'after_promotion_purchase_prob_threshold': 0.1, 'diff_threshold': 0.022}       
nir: 13.01                                                                      
{'after_promotion_purchase_prob_threshold': 0.5, 'diff_threshold': 0.033}       
nir: 0.985                                                                      
{'after_promotion_purchase_prob_threshold': 0.7, 'diff_threshold': 0.03}        
nir: 0.0                                                                        
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.034}       
nir: 13.219999999999999                                                         
{'after_promotion_purchase_prob_threshold': 0.7, 'diff_threshold': 0.029}       
nir: 0.0                                                                        
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.024}       
nir: 12.315000000000001                                                         
{'after_promotion_purchase_prob_threshold': 0.5, 'diff_threshold': 0.034}       
nir: 0.985                                                                      
{'after_promotion_purchase_prob_threshold': 0.9, 'diff_threshold': 0.035}       
nir: 0.0                                                                        
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.032}       
nir: 12.830000000000002                                                         
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.037}       
nir: 14.535                                                                     
{'after_promotion_purchase_prob_threshold': 0.4, 'diff_threshold': 0.037}       
nir: 1.97                                                           
{'after_promotion_purchase_prob_threshold': 0.9, 'diff_threshold': 0.023}
nir: 0.0                                                            
{'after_promotion_purchase_prob_threshold': 0.5, 'diff_threshold': 0.026}
nir: 0.985                                                          
{'after_promotion_purchase_prob_threshold': 0.7, 'diff_threshold': 0.025}
nir: 0.0                                                            
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.038}
nir: 12.715                                                         
{'after_promotion_purchase_prob_threshold': 0.6, 'diff_threshold': 0.031}
nir: 0.985                                                          
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.021}
nir: 12.430000000000001                                             
{'after_promotion_purchase_prob_threshold': 0.8, 'diff_threshold': 0.032}
nir: 0.0                                                            
{'after_promotion_purchase_prob_threshold': 0.3, 'diff_threshold': 0.029}
nir: 1.97                                                           
{'after_promotion_purchase_prob_threshold': 0.2, 'diff_threshold': 0.036}
nir: 2.91                                                           
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.027}
nir: 13.945000000000002                                             
{'after_promotion_purchase_prob_threshold': 0.9, 'diff_threshold': 0.033}
nir: 0.0                                                            
{'after_promotion_purchase_prob_threshold': 0.4, 'diff_threshold': 0.035}
nir: 1.97                                                           
{'after_promotion_purchase_prob_threshold': 0.1, 'diff_threshold': 0.024}
nir: 13.01                                                          
{'after_promotion_purchase_prob_threshold': 0.8, 'diff_threshold': 0.037}
nir: 0.0                                                            
{'after_promotion_purchase_prob_threshold': 0.5, 'diff_threshold': 0.023}
nir: 0.985                                                          
{'after_promotion_purchase_prob_threshold': 0.7, 'diff_threshold': 0.026}
nir: 0.0                                                            
{'after_promotion_purchase_prob_threshold': 0.6, 'diff_threshold': 0.025}
nir: 0.985                                                          
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.03}
nir: 13.560000000000002                                             
{'after_promotion_purchase_prob_threshold': 0.9, 'diff_threshold': 0.039}
nir: 0.0                                                            
{'after_promotion_purchase_prob_threshold': 0.3, 'diff_threshold': 0.036}
nir: 1.97                                                           
{'after_promotion_purchase_prob_threshold': 0.5, 'diff_threshold': 0.027}
nir: 0.985                                                          
{'after_promotion_purchase_prob_threshold': 0.7, 'diff_threshold': 0.037}
nir: 0.0                                                            
{'after_promotion_purchase_prob_threshold': 0.8, 'diff_threshold': 0.033}
nir: 0.0                                                            
{'after_promotion_purchase_prob_threshold': 0.4, 'diff_threshold': 0.029}
nir: 1.97                                                           
{'after_promotion_purchase_prob_threshold': 0.6, 'diff_threshold': 0.032}
nir: 0.985                                                          
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.035}
nir: 13.325                                                         
{'after_promotion_purchase_prob_threshold': 0.1, 'diff_threshold': 0.037}
nir: 13.01                                                          
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.028}
nir: 14.184999999999999                                             
{'after_promotion_purchase_prob_threshold': 0.2, 'diff_threshold': 0.03}
nir: 2.91                                                           
{'after_promotion_purchase_prob_threshold': 0.9, 'diff_threshold': 0.037}
nir: 0.0                                                            
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.022}
nir: 12.805000000000001                                             
{'after_promotion_purchase_prob_threshold': 0.5, 'diff_threshold': 0.039}
nir: 0.985                                                          
{'after_promotion_purchase_prob_threshold': 0.8, 'diff_threshold': 0.026}
nir: 0.0                                                            
{'after_promotion_purchase_prob_threshold': 0.7, 'diff_threshold': 0.02}
nir: 0.0                                                            
{'after_promotion_purchase_prob_threshold': 0.3, 'diff_threshold': 0.025}
nir: 1.97                                                           
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.023}
nir: 13.0                                                           
{'after_promotion_purchase_prob_threshold': 0.6, 'diff_threshold': 0.038}
nir: 0.985                                                          
{'after_promotion_purchase_prob_threshold': 0.1, 'diff_threshold': 0.036}
nir: 13.01                                                          
{'after_promotion_purchase_prob_threshold': 0.2, 'diff_threshold': 0.027}
nir: 2.91                                                           
{'after_promotion_purchase_prob_threshold': 0.9, 'diff_threshold': 0.034}
nir: 0.0                                                            
{'after_promotion_purchase_prob_threshold': 0.5, 'diff_threshold': 0.037}
nir: 0.985                                                          
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.029}
nir: 13.409999999999997                                              
{'after_promotion_purchase_prob_threshold': 0.3, 'diff_threshold': 0.024}
nir: 1.97                                                            
{'after_promotion_purchase_prob_threshold': 0.7, 'diff_threshold': 0.035}
nir: 0.0                                                             
{'after_promotion_purchase_prob_threshold': 0.4, 'diff_threshold': 0.028}
nir: 1.97                                                            
{'after_promotion_purchase_prob_threshold': 0.9, 'diff_threshold': 0.03}
nir: 0.0                                                             
{'after_promotion_purchase_prob_threshold': 0.7, 'diff_threshold': 0.021}
nir: 0.0                                                             
{'after_promotion_purchase_prob_threshold': 0.1, 'diff_threshold': 0.034}
nir: 13.01                                                           
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.036}
nir: 13.415000000000001                                              
{'after_promotion_purchase_prob_threshold': 0.2, 'diff_threshold': 0.029}
nir: 2.91                                                            
{'after_promotion_purchase_prob_threshold': 0.5, 'diff_threshold': 0.032}
nir: 0.985                                                           
{'after_promotion_purchase_prob_threshold': 0.8, 'diff_threshold': 0.024}
nir: 0.0                                                             
{'after_promotion_purchase_prob_threshold': 0.6, 'diff_threshold': 0.03}
nir: 0.985                                                           
{'after_promotion_purchase_prob_threshold': 0.1, 'diff_threshold': 0.026}
nir: 13.01                                                           
{'after_promotion_purchase_prob_threshold': 0.2, 'diff_threshold': 0.039}
nir: 2.91                                                            
{'after_promotion_purchase_prob_threshold': 0.9, 'diff_threshold': 0.038}
nir: 0.0                                                             
{'after_promotion_purchase_prob_threshold': 0.3, 'diff_threshold': 0.021}
nir: 1.97                                                            
{'after_promotion_purchase_prob_threshold': 0.4, 'diff_threshold': 0.036}
nir: 1.97                                                            
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.033}
nir: 12.934999999999999                                              
{'after_promotion_purchase_prob_threshold': 0.1, 'diff_threshold': 0.032}
nir: 13.01                                                           
{'after_promotion_purchase_prob_threshold': 0.2, 'diff_threshold': 0.024}
nir: 2.91                                                            
{'after_promotion_purchase_prob_threshold': 0.5, 'diff_threshold': 0.022}
nir: 0.985                                                           
{'after_promotion_purchase_prob_threshold': 0.8, 'diff_threshold': 0.03}
nir: 0.0                                                             
{'after_promotion_purchase_prob_threshold': 0.6, 'diff_threshold': 0.02}
nir: 0.985                                                           
{'after_promotion_purchase_prob_threshold': 0.1, 'diff_threshold': 0.031}
nir: 13.01                                                           
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.026}
nir: 13.825                                                          
{'after_promotion_purchase_prob_threshold': 0.9, 'diff_threshold': 0.027}
nir: 0.0                                                             
{'after_promotion_purchase_prob_threshold': 0.5, 'diff_threshold': 0.029}
nir: 0.985                                                           
{'after_promotion_purchase_prob_threshold': 0.8, 'diff_threshold': 0.034}
nir: 0.0                                                             
{'after_promotion_purchase_prob_threshold': 0.3, 'diff_threshold': 0.033}
nir: 1.97                                                            
{'after_promotion_purchase_prob_threshold': 0.4, 'diff_threshold': 0.032}
nir: 1.97                                                            
{'after_promotion_purchase_prob_threshold': 0.6, 'diff_threshold': 0.024}
nir: 0.985                                                           
{'after_promotion_purchase_prob_threshold': 0.8, 'diff_threshold': 0.025}
nir: 0.0                                                             
{'after_promotion_purchase_prob_threshold': 0.7, 'diff_threshold': 0.038}
nir: 0.0                                                             
{'after_promotion_purchase_prob_threshold': 0.3, 'diff_threshold': 0.02}
nir: 1.97                                                            
{'after_promotion_purchase_prob_threshold': 0.6, 'diff_threshold': 0.021}
nir: 0.985                                                           
{'after_promotion_purchase_prob_threshold': 0.4, 'diff_threshold': 0.031}
nir: 1.97                                                            
100%|██████████| 200/200 [01:52<00:00,  1.78it/s, best loss: -14.535]
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.037}
Elapsed computation time: 1.872 mins
In [101]:
def promotion_strategy(df):
    '''
    INPUT 
    df - a dataframe with *only* the columns V1 - V7 (same as train_data)

    OUTPUT
    promotion_df - np.array with the values
                   'Yes' or 'No' related to whether or not an 
                   individual should recieve a promotion 
                   should be the length of df.shape[0]
                
    Ex:
    INPUT: df
    
    V1	V2	  V3	V4	V5	V6	V7
    2	30	-1.1	1	1	3	2
    3	32	-0.6	2	3	2	2
    2	30	0.13	1	1	4	2
    
    OUTPUT: promotion
    
    array(['Yes', 'Yes', 'No'])
    indicating the first two users would recieve the promotion and 
    the last should not.
    '''
    X = df.copy()
    # predict probability of purchase with promotion

    X["Promotion_No"] = 0
    X["Promotion_Yes"] = 1
    probs_with_promotion = model.predict_proba(X)[:, 1]


    # predict probability of purchase without promotion
    
    X["Promotion_No"] = 1
    X["Promotion_Yes"] = 0
    probs_without_promotion = model.predict_proba(X)[:, 1]

    # calculate the difference as diff
    diff = probs_with_promotion - probs_without_promotion        

    should_promote = pd.DataFrame() 
    should_promote["promo"] = (probs_with_promotion > best_after_promotion_purchase_prob_threshold) & (diff > best_diff_threshold)
    
    should_promote.loc[diff >= best_diff_threshold, "promo"] = "Yes"
    should_promote.loc[diff < best_diff_threshold, "promo"] = "No"    
    return should_promote["promo"].to_numpy(dtype="str")
In [102]:
test_results(promotion_strategy)
Nice job!  See how well your strategy worked on our test data below!

Your irr with this strategy is 0.0188.

Your nir with this strategy is 98.30.
We came up with a model with an irr of 0.0188 and an nir of 189.45 on the test set.

 How did you do?
Out[102]:
(0.018826981638688733, 98.30000000000001)

Approach 3 (Can be tried on this)

We can try the two models approach that is commonly recommended on literature related to uplift measurement. In this approach, we create one model for people who have received the promotion and another model for those who haven't received it. Each model predicts whether the person would purchase the product. The difference between the probability predicted by first model and second model is to be considered for deciding wether to promote to to that person or not.

Caveat here is that the error of prediction can get doubled as we are using two separate models. Also the scale of the probabilities predicted by two models may not be the same.

In [ ]: