A significant number of hotel bookings are called off because of cancellations or no-shows, typically due to a change of plans or scheduling conflicts. Cancelling is often made easier by the option to do so free of charge or at a low cost, which is convenient for guests but undesirable and potentially revenue-diminishing for hotels. Losses are particularly high for last-minute cancellations.
New technologies and online booking channels have dramatically changed customers' booking options and behavior. This adds a further dimension to the challenge of handling cancellations, which are no longer driven only by traditional booking and guest characteristics.
The cancellation of bookings impacts a hotel on various fronts, from lost revenue to the added cost and effort of reselling rooms at short notice.
The increasing number of cancellations calls for a Machine Learning based solution that can help predict which bookings are likely to be canceled. INN Hotels Group, a chain of hotels in Portugal, is facing problems with a high number of booking cancellations and has reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict in advance which bookings will be canceled, and help formulate profitable cancellation and refund policies.
The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.
Data Dictionary
# Importing libraries
import warnings
warnings.filterwarnings("ignore")
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter("ignore", ConvergenceWarning)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", 200)
pd.set_option("display.float_format", lambda x: "%.5f" % x)
from sklearn.model_selection import train_test_split
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import (
f1_score,
accuracy_score,
recall_score,
precision_score,
confusion_matrix,
roc_auc_score,
precision_recall_curve,
roc_curve,
make_scorer,
)
# Importing data set from google drive
from google.colab import drive
drive.mount('/content/drive')
data = pd.read_csv('/content/drive/MyDrive/DSBA/Logistic Regression/INN Hotels Project/INNHotelsGroup.csv')
# First 5 rows
data.head()
# Last 5 rows
data.tail()
# Shape of data
data.shape
# Data types of the columns in dataset
data.info()
There are 14 numeric (float and int type) and 5 string (object type) columns in the data.
# Checking duplicate values
data.duplicated().sum()
There are no duplicate values in the dataset.
# Dropping Booking_ID column as it is just a unique identifier with no predictive value
data = data.drop(["Booking_ID"], axis=1)
data.head()
# Statistical summary of data
data.describe(include="all")
The most popular meal plan is Meal Plan 1.
Room_Type 1 is the most reserved room type.
The average price per room is ~103 euros.
Bookings include more week nights than weekend nights.
The average lead time from booking to stay is ~85 days.
The maximum number of previous cancellations by a guest is 13.
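As a quick check, the figures quoted above can be read straight off the summary table; a minimal slice (using the column names already present in this dataset) is shown below.
# Pulling the mean and maximum of the columns referenced in the observations above
data[["avg_price_per_room", "lead_time", "no_of_previous_cancellations"]].describe().loc[["mean", "max"]]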
Univariate Analysis
# Functions for histogram
def histogram_boxplot(data, feature, figsize=(15, 10), kde=False, bins=None):
"""
Boxplot and histogram combined
data: dataframe
feature: dataframe column
figsize: size of figure (default (15,10))
kde: whether to show the density curve (default False)
bins: number of bins for histogram (default None)
"""
f2, (ax_box2, ax_hist2) = plt.subplots(
nrows=2,
sharex=True,
gridspec_kw={"height_ratios": (0.25, 0.75)},
figsize=figsize,
)
sns.boxplot(
data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
)
sns.histplot(
data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins
) if bins else sns.histplot(
data=data, x=feature, kde=kde, ax=ax_hist2
)
ax_hist2.axvline(
data[feature].mean(), color="green", linestyle="--"
)
ax_hist2.axvline(
data[feature].median(), color="black", linestyle="-"
)
# Plotting Lead_time
histogram_boxplot(data, "lead_time")
There seem to be a lot of outliers in the data for lead_time.
# Plotting Average price per room
histogram_boxplot(data, "avg_price_per_room")
There seem to be a lot of outliers in the data for average price per room.
# Checking bookings with an average room price of 0
data[data["avg_price_per_room"] == 0]
# Checking which market segments the zero-price bookings belong to
data.loc[data["avg_price_per_room"] == 0, "market_segment_type"].value_counts()
# Calculating the 25th percentile (Q1)
Q1 = data["avg_price_per_room"].quantile(0.25)
# Calculating the 75th percentile (Q3)
Q3 = data["avg_price_per_room"].quantile(0.75)
# Calculating IQR
IQR = Q3 - Q1
# Calculating value of upper whisker
Upper_Whisker = Q3 + 1.5 * IQR
Upper_Whisker
# Capping rooms priced at 500 or more to the upper whisker value
data.loc[data["avg_price_per_room"] >= 500, "avg_price_per_room"] = Upper_Whisker
# Plotting previous booking cancellations
histogram_boxplot(data, "no_of_previous_cancellations")
# Plotting previous booking not cancellations
histogram_boxplot(data, "no_of_previous_bookings_not_canceled")
# Functions for labeled barplot
def labeled_barplot(data, feature, perc=False, n=None):
"""
Barplot with percentage at the top
data: dataframe
feature: dataframe column
perc: whether to display percentages instead of count (default is False)
n: displays the top n category levels (default is None, i.e., display all levels)
"""
total = len(data[feature])
count = data[feature].nunique()
if n is None:
plt.figure(figsize=(count + 2, 6))
else:
plt.figure(figsize=(n + 2, 6))
plt.xticks(rotation=90, fontsize=15)
ax = sns.countplot(
data=data,
x=feature,
palette="Paired",
order=data[feature].value_counts().index[:n],
)
for p in ax.patches:
if perc == True:
label = "{:.1f}%".format(
100 * p.get_height() / total
)
else:
label = p.get_height()
x = p.get_x() + p.get_width() / 2
y = p.get_height()
ax.annotate(
label,
(x, y),
ha="center",
va="center",
size=12,
xytext=(0, 5),
textcoords="offset points",
)
plt.show()
# Plotting number of adults
labeled_barplot(data, "no_of_adults", perc=True)
#Plotting number of children
labeled_barplot(data, "no_of_children", perc=True)
# Replacing the rare values of 9 and 10 children with 3
data["no_of_children"] = data["no_of_children"].replace([9, 10], 3)
# Plotting number of week nights
labeled_barplot(data, "no_of_week_nights", perc=True)
# Plotting number of weekend nights
labeled_barplot(data, "no_of_weekend_nights", perc=True)
# Plotting Required car parking space
labeled_barplot(data, "required_car_parking_space", perc=True)
# Plotting Type of meal plan
labeled_barplot(data, "type_of_meal_plan", perc=True)
# Plotting Room Type Reserved
labeled_barplot(data, "room_type_reserved", perc=True)
# Plotting Arrival month
labeled_barplot(data, "arrival_month", perc=True)
October seems to be the busiest month, with the most arrivals.
# Plotting Market Segment Type
labeled_barplot(data, "market_segment_type", perc=True)
Online seems to be the most common market segment through which guests book the hotel.
# Plotting Number of special requests
labeled_barplot(data, "no_of_special_requests", perc=True)
# Plotting Booking status
labeled_barplot(data, "booking_status", perc=True)
# Encoding Canceled bookings as 1 and Not_Canceled as 0
data["booking_status"] = data["booking_status"].apply(
lambda x: 1 if x == "Canceled" else 0
)
Bivariate Analysis
# Correlation heatmap of the numeric columns
cols_list = data.select_dtypes(include=np.number).columns.tolist()
plt.figure(figsize=(12, 7))
sns.heatmap(
data[cols_list].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral"
)
plt.show()
## Function to plot distributions wrt target
def distribution_plot_wrt_target(data, predictor, target):
fig, axs = plt.subplots(2, 2, figsize=(12, 10))
target_uniq = data[target].unique()
    axs[0, 0].set_title("Distribution of " + predictor + " for target=" + str(target_uniq[0]))
sns.histplot(
data=data[data[target] == target_uniq[0]],
x=predictor,
kde=True,
ax=axs[0, 0],
color="teal",
stat="density",
)
    axs[0, 1].set_title("Distribution of " + predictor + " for target=" + str(target_uniq[1]))
sns.histplot(
data=data[data[target] == target_uniq[1]],
x=predictor,
kde=True,
ax=axs[0, 1],
color="orange",
stat="density",
)
axs[1, 0].set_title("Boxplot w.r.t target")
sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")
axs[1, 1].set_title("Boxplot (without outliers) w.r.t target")
sns.boxplot(
data=data,
x=target,
y=predictor,
ax=axs[1, 1],
showfliers=False,
palette="gist_rainbow",
)
plt.tight_layout()
plt.show()
## Functions for Stacked barplot
def stacked_barplot(data, predictor, target):
"""
Print the category counts and plot a stacked bar chart
data: dataframe
predictor: independent variable
target: target variable
"""
count = data[predictor].nunique()
sorter = data[target].value_counts().index[-1]
tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
by=sorter, ascending=False
)
print(tab1)
print("-" * 120)
tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
by=sorter, ascending=False
)
tab.plot(kind="bar", stacked=True, figsize=(count + 5, 5))
plt.legend(
loc="lower left", frameon=False,
)
plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
plt.show()
# Plotting Prices of rooms across various market segments
plt.figure(figsize=(10, 6))
sns.boxplot(
data=data, x="market_segment_type", y="avg_price_per_room", palette="gist_rainbow"
)
plt.show()
The average price per room seems to be higher for the Online market segment.
# Plotting Booking status across various market segments
stacked_barplot(data, "market_segment_type", "booking_status")
# Plotting Booking status and Number of special requests
stacked_barplot(data, "no_of_special_requests", "booking_status")
Bookings with more special requests seem less likely to be canceled.
# Plotting Number of special requests and Average price per room
plt.figure(figsize=(10, 5))
sns.boxplot(data=data, x="no_of_special_requests", y="avg_price_per_room", palette="gist_rainbow")
plt.show()
Average price per room seems to increase with number of special requests from guests.
# Distribution between Average price per room and booking status
distribution_plot_wrt_target(data, "avg_price_per_room", "booking_status")
# Distribution between Lead time and booking status
distribution_plot_wrt_target(data, "lead_time", "booking_status")
# Combining adults and children into a family-size feature
family_data = data[(data["no_of_children"] >= 0) & (data["no_of_adults"] > 1)].copy()  # copy to avoid SettingWithCopyWarning
family_data.shape
family_data["no_of_family_members"] = (
family_data["no_of_adults"] + family_data["no_of_children"]
)
# Plotting Number of families and booking status
stacked_barplot(family_data, "no_of_family_members", "booking_status")
# Combining week nights and weekend nights into total stay length
stay_data = data[(data["no_of_week_nights"] > 0) & (data["no_of_weekend_nights"] > 0)].copy()  # copy to avoid SettingWithCopyWarning
stay_data.shape
stay_data["total_days"] = (
stay_data["no_of_week_nights"] + stay_data["no_of_weekend_nights"]
)
stacked_barplot(stay_data, "total_days", "booking_status")
# Plotting Repeated guests and booking status
stacked_barplot(data, "repeated_guest", "booking_status")
Repeated guests rarely seem to cancel their bookings.
# Busiest months at hotel with grouping
monthly_data = data.groupby(["arrival_month"])["booking_status"].count()
monthly_data = pd.DataFrame(
{"Month": list(monthly_data.index), "Guests": list(monthly_data.values)}
)
plt.figure(figsize=(10, 5))
sns.lineplot(data=monthly_data, x="Month", y="Guests")
plt.show()
# Percentage of bookings canceled each month
stacked_barplot(data, "arrival_month", "booking_status")
Booking cancellations seem to be highest in July and lowest in January.
# Plotting average price per room by arrival month
plt.figure(figsize=(10, 5))
sns.lineplot(data=data, x="arrival_month", y="avg_price_per_room")
plt.show()
Average price per room seems to increase during busier months.
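As a cross-check of this observation, the monthly average price can also be tabulated directly; a small sketch using the same columns:
# Average room price by arrival month (tabular view of the line plot above)
data.groupby("arrival_month")["avg_price_per_room"].mean().round(2)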
Outlier Check
# checking for outliers using boxplot by dropping booking status
numeric_columns = data.select_dtypes(include=np.number).columns.tolist()
numeric_columns.remove("booking_status")
plt.figure(figsize=(15, 12))
for i, variable in enumerate(numeric_columns):
plt.subplot(4, 4, i + 1)
plt.boxplot(data[variable], whis=1.5)
plt.tight_layout()
plt.title(variable)
plt.show()
Data Preparation for modeling
# Encoding categorical values and splitting data into test & train
X = data.drop(["booking_status"], axis=1)
Y = data["booking_status"]
X = pd.get_dummies(X, drop_first=True)
X.head()
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.4, random_state=1)
print("Shape of Training set : ", X_train.shape)
print("Shape of test set : ", X_test.shape)
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
# Model evaluation criteria
def model_performance_classification_statsmodels(
model, predictors, target, threshold=0.5
):
"""
Function to compute different metrics to check classification model performance
model: classifier
predictors: independent variables
target: dependent variable
threshold: threshold for classifying the observation as class 1
"""
# checking which probabilities are greater than threshold
pred_temp = model.predict(predictors) > threshold
# rounding off the above values to get classes
pred = np.round(pred_temp)
acc = accuracy_score(target, pred) # to compute Accuracy
recall = recall_score(target, pred) # to compute Recall
precision = precision_score(target, pred) # to compute Precision
f1 = f1_score(target, pred) # to compute F1-score
# creating a dataframe of metrics
df_perf = pd.DataFrame(
{"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
index=[0],
)
return df_perf
## function to plot the confusion_matrix of a classification model
def confusion_matrix_statsmodels(model, predictors, target, threshold=0.5):
"""
To plot the confusion_matrix with percentages
model: classifier
predictors: independent variables
target: dependent variable
threshold: threshold for classifying the observation as class 1
"""
y_pred = model.predict(predictors) > threshold
cm = confusion_matrix(target, y_pred)
labels = np.asarray(
[
["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
for item in cm.flatten()
]
).reshape(2, 2)
plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=labels, fmt="")
plt.ylabel("True label")
plt.xlabel("Predicted label")
Checking for Multicollinearity
def checking_vif(predictors):
vif = pd.DataFrame()
vif["feature"] = predictors.columns
# calculating VIF for each feature
vif["VIF"] = [
variance_inflation_factor(predictors.values, i)
for i in range(len(predictors.columns))
]
return vif
checking_vif(X_train)
# Re-creating X with a constant term for the statsmodels logistic regression
X = data.drop(["booking_status"], axis=1)
Y = data["booking_status"]
X = sm.add_constant(X)
X = pd.get_dummies(X, drop_first=True)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.4, random_state=1)
logit = sm.Logit(y_train, X_train.astype(float))
lg = logit.fit(
disp=False
)
print(lg.summary())
print("Training performance:")
model_performance_classification_statsmodels(lg, X_train, y_train)
Dropping predictors with high p-values (backward elimination)
cols = X_train.columns.tolist()
# setting an initial max p-value
max_p_value = 1
while len(cols) > 0:
# defining the train set
x_train_aux = X_train[cols]
# fitting the model
model = sm.Logit(y_train, x_train_aux).fit(disp=False)
# getting the p-values and the maximum p-value
p_values = model.pvalues
max_p_value = max(p_values)
# name of the variable with maximum p-value
feature_with_p_max = p_values.idxmax()
if max_p_value > 0.05:
cols.remove(feature_with_p_max)
else:
break
selected_features = cols
print(selected_features)
X_train1 = X_train[selected_features]
X_test1 = X_test[selected_features]
logit1 = sm.Logit(y_train, X_train1.astype(float))
lg1 = logit1.fit(
disp=False
)
print(lg1.summary())
All p-values are now less than 0.05, so the remaining predictors are statistically significant.
print("Training performance:")
model_performance_classification_statsmodels(lg1, X_train1, y_train)
# Converting coefficients to odds
odds = np.exp(lg1.params)
perc_change_odds = (np.exp(lg1.params) - 1) * 100
pd.set_option("display.max_columns", None)
pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index=X_train1.columns).T
Each additional adult multiplies the odds of cancellation by ~1.1 (an increase of about 12.8%).
Each additional child multiplies the odds of cancellation by ~1.1 (an increase of about 16.1%).
Each additional weekend night multiplies the odds of cancellation by ~1.1 (an increase of about 11.1%).
Each additional week night multiplies the odds of cancellation by ~1.0 (an increase of about 4.0%).
Requiring a car parking space is associated with much lower odds of cancellation (roughly an 80% decrease).
Each additional day of lead time multiplies the odds of cancellation by ~1.0 (an increase of about 1.5%).
A later arrival year multiplies the odds of cancellation by ~1.5 (an increase of about 58.9%).
A later arrival month multiplies the odds of cancellation by ~0.9 (a decrease of about 4.5%).
Each additional euro in the average room price multiplies the odds of cancellation by ~1.0 (an increase of about 1.9%).
Each additional special request multiplies the odds of cancellation by ~0.2 (a decrease of about 77.2%).
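To make the odds interpretation concrete, here is a small worked example with a hypothetical coefficient (the value 0.015 is illustrative only, not taken from the fitted model):
# Worked example: converting a logistic regression coefficient into an odds ratio
# and a percentage change in odds (the coefficient value below is hypothetical)
beta_lead_time = 0.015
odds_ratio = np.exp(beta_lead_time)  # ~1.015, the multiplicative change in odds per extra day of lead time
pct_change = (odds_ratio - 1) * 100  # ~1.5% increase in the odds of cancellation per extra day
print(round(odds_ratio, 3), round(pct_change, 2))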
Model performance on Training set
# Creating confusion matrix (default threshold of 0.5)
confusion_matrix_statsmodels(lg1, X_train1, y_train)
print("Training performance:")
log_reg_model_train_perf = model_performance_classification_statsmodels(lg1, X_train1, y_train)
log_reg_model_train_perf
The F1 score on the training set is about 0.68, so further work, such as threshold tuning, is needed.
ROC-AUC on Training set
logit_roc_auc_train = roc_auc_score(y_train, lg1.predict(X_train1))
fpr, tpr, thresholds = roc_curve(y_train, lg1.predict(X_train1))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.01])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
# Optimal threshold as per the ROC curve (the point that maximizes TPR - FPR)
fpr, tpr, thresholds = roc_curve(y_train, lg1.predict(X_train1))
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold_auc_roc = thresholds[optimal_idx]
print(optimal_threshold_auc_roc)
# Creating confusion matrix at the ROC-optimal threshold
confusion_matrix_statsmodels(
    lg1, X_train1, y_train, threshold=optimal_threshold_auc_roc
)
# checking model performance for this model
log_reg_model_train_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg1, X_train1, y_train, threshold=optimal_threshold_auc_roc
)
print("Training performance:")
log_reg_model_train_perf_threshold_auc_roc
# Precision-Recall curve
y_scores = lg1.predict(X_train1)
prec, rec, tre = precision_recall_curve(y_train, y_scores,)
def plot_prec_recall_vs_tresh(precisions, recalls, thresholds):
plt.plot(thresholds, precisions[:-1], "b--", label="precision")
plt.plot(thresholds, recalls[:-1], "g--", label="recall")
plt.xlabel("Threshold")
plt.legend(loc="upper left")
plt.ylim([0, 1])
plt.figure(figsize=(10, 7))
plot_prec_recall_vs_tresh(prec, rec, tre)
plt.show()
# Setting the threshold based on the precision-recall curve
optimal_threshold_curve = 0.42
# Creating confusion matrix at this threshold
confusion_matrix_statsmodels(
    lg1, X_train1, y_train, threshold=optimal_threshold_curve
)
log_reg_model_train_perf_threshold_curve = model_performance_classification_statsmodels(
lg1, X_train1, y_train, threshold=optimal_threshold_curve
)
print("Training performance:")
log_reg_model_train_perf_threshold_curve
Recall improves at this threshold, and the F1 score rises to about 0.70.
Model performance on Test set
confusion_matrix_statsmodels(lg1, X_test1, y_test)
log_reg_model_test_perf = model_performance_classification_statsmodels(lg1, X_test1, y_test)
print("Test performance:")
log_reg_model_test_perf
The F1 score on the test set is slightly lower than on the training set.
# ROC curve on the test set
logit_roc_auc_test = roc_auc_score(y_test, lg1.predict(X_test1))
fpr, tpr, thresholds = roc_curve(y_test, lg1.predict(X_test1))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_test)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.01])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
# Using the model with the ROC-optimal threshold (~0.37)
confusion_matrix_statsmodels(lg1, X_test1, y_test, threshold=optimal_threshold_auc_roc)
# checking model performance for this model
log_reg_model_test_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg1, X_test1, y_test, threshold=optimal_threshold_auc_roc
)
print("Test performance:")
log_reg_model_test_perf_threshold_auc_roc
# Using the model with the precision-recall threshold (0.42)
confusion_matrix_statsmodels(lg1, X_test1, y_test, threshold=optimal_threshold_curve)
log_reg_model_test_perf_threshold_curve = model_performance_classification_statsmodels(
lg1, X_test1, y_test, threshold=optimal_threshold_curve
)
print("Test performance:")
log_reg_model_test_perf_threshold_curve
The F1 score increases a little on the test data at the 0.42 threshold.
models_train_comp_df = pd.concat(
[
log_reg_model_train_perf.T,
log_reg_model_train_perf_threshold_auc_roc.T,
log_reg_model_train_perf_threshold_curve.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Logistic Regression-default Threshold",
"Logistic Regression-0.37 Threshold",
"Logistic Regression-0.42 Threshold",
]
print("Training performance comparison:")
models_train_comp_df
The logistic regression with the 0.37 threshold seems to perform best on the training set.
# test performance comparison
models_test_comp_df = pd.concat(
[
log_reg_model_test_perf.T,
log_reg_model_test_perf_threshold_auc_roc.T,
log_reg_model_test_perf_threshold_curve.T,
],
axis=1,
)
models_test_comp_df.columns = [
"Logistic Regression-default Threshold",
"Logistic Regression-0.37 Threshold",
"Logistic Regression-0.42 Threshold",
]
print("Test performance comparison:")
models_test_comp_df
The logistic regression with the 0.37 threshold seems to perform best on the test set.
X = data.drop(["booking_status"], axis=1)
Y = data["booking_status"]
X = pd.get_dummies(X, drop_first=True)  # creating dummy variables for the categorical columns
# Splitting data in train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.4, random_state=1)
# defining a function to compute different metrics to check performance of a classification model built using sklearn
def model_performance_classification_sklearn(model, predictors, target):
"""
Function to compute different metrics to check classification model performance
model: classifier
predictors: independent variables
target: dependent variable
"""
# predicting using the independent variables
pred = model.predict(predictors)
acc = accuracy_score(target, pred) # to compute Accuracy
recall = recall_score(target, pred) # to compute Recall
precision = precision_score(target, pred) # to compute Precision
f1 = f1_score(target, pred) # to compute F1-score
# creating a dataframe of metrics
df_perf = pd.DataFrame(
{"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
index=[0],
)
return df_perf
def confusion_matrix_sklearn(model, predictors, target):
"""
To plot the confusion_matrix with percentages
model: classifier
predictors: independent variables
target: dependent variable
"""
y_pred = model.predict(predictors)
cm = confusion_matrix(target, y_pred)
labels = np.asarray(
[
["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
for item in cm.flatten()
]
).reshape(2, 2)
plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=labels, fmt="")
plt.ylabel("True label")
plt.xlabel("Predicted label")
# Building a decision tree classifier with default parameters
model = DecisionTreeClassifier(random_state=1)
model.fit(X_train, y_train)
# model performance on training data
confusion_matrix_sklearn(model, X_train, y_train)
decision_tree_perf_train = model_performance_classification_sklearn(
model, X_train, y_train
)
decision_tree_perf_train
# model performance on test data
confusion_matrix_sklearn(model, X_test, y_test)
decision_tree_perf_test = model_performance_classification_sklearn(model, X_test, y_test)
decision_tree_perf_test
feature_names = list(X_train.columns)
importances = model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
The default (unpruned) decision tree scores almost perfectly on the training data, which points to overfitting rather than genuinely good predictive performance.
Yes, pruning is required.
# Pre Pruning
estimator = DecisionTreeClassifier(random_state=1, class_weight="balanced")
parameters = {
"max_depth": np.arange(2, 7, 2),
"max_leaf_nodes": [50, 75, 150, 250],
"min_samples_split": [10, 30, 50, 70],
}
# Using F1 as the scoring metric for the grid search
scorer = make_scorer(f1_score)
grid_obj = GridSearchCV(estimator, parameters, scoring=scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
estimator = grid_obj.best_estimator_
estimator.fit(X_train, y_train)
# Performance of the pre-pruned tree on the training set
confusion_matrix_sklearn(estimator, X_train, y_train)
decision_tree_tune_perf_train = model_performance_classification_sklearn(estimator, X_train, y_train)
decision_tree_tune_perf_train
# Performance of the pre-pruned tree on the test set
confusion_matrix_sklearn(estimator, X_test, y_test)
decision_tree_tune_perf_test = model_performance_classification_sklearn(estimator, X_test, y_test)
decision_tree_tune_perf_test
Decision Tree Visual
plt.figure(figsize=(20, 10))
out = tree.plot_tree(
estimator,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
for o in out:
arrow = o.arrow_patch
if arrow is not None:
arrow.set_edgecolor("black")
arrow.set_linewidth(1)
plt.show()
print(tree.export_text(estimator, feature_names=feature_names, show_weights=True))
# Features in tree building
importances = estimator.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
# Cost Complexity Pruning
clf = DecisionTreeClassifier(random_state=1, class_weight="balanced")
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = abs(path.ccp_alphas), path.impurities
pd.DataFrame(path)
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
clfs = []
for ccp_alpha in ccp_alphas:
clf = DecisionTreeClassifier(
random_state=1, ccp_alpha=ccp_alpha, class_weight="balanced"
)
    clf.fit(X_train, y_train)  # fitting a decision tree on the training data for each effective alpha
clfs.append(clf)
print(
"Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
clfs[-1].tree_.node_count, ccp_alphas[-1]
)
)
F1 score vs Alpha for Training and Test Sets
f1_train = []
for clf in clfs:
pred_train = clf.predict(X_train)
values_train = f1_score(y_train, pred_train)
f1_train.append(values_train)
f1_test = []
for clf in clfs:
pred_test = clf.predict(X_test)
values_test = f1_score(y_test, pred_test)
f1_test.append(values_test)
index_best_model = np.argmax(f1_test)
best_model = clfs[index_best_model]
print(best_model)
# Checking performance on Training set
confusion_matrix_sklearn(best_model, X_train, y_train)
decision_tree_post_perf_train = model_performance_classification_sklearn(
best_model, X_train, y_train
)
decision_tree_post_perf_train
# Checking performance on Test set
confusion_matrix_sklearn(best_model, X_test, y_test)
decision_tree_post_test = model_performance_classification_sklearn(
best_model, X_test, y_test
)
decision_tree_post_test
plt.figure(figsize=(20, 10))
out = tree.plot_tree(
best_model,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
for o in out:
arrow = o.arrow_patch
if arrow is not None:
arrow.set_edgecolor("black")
arrow.set_linewidth(1)
plt.show()
print(tree.export_text(best_model, feature_names=feature_names, show_weights=True))
importances = best_model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
Comparing Decision Tree models
# training performance comparison
models_train_comp_df = pd.concat(
[
decision_tree_perf_train.T,
decision_tree_tune_perf_train.T,
decision_tree_post_perf_train.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Decision Tree sklearn",
"Decision Tree (Pre-Pruning)",
"Decision Tree (Post-Pruning)",
]
print("Training performance comparison:")
models_train_comp_df
# testing performance comparison
models_test_comp_df = pd.concat(
[
decision_tree_perf_test.T,
decision_tree_tune_perf_test.T,
decision_tree_post_test.T,
],
axis=1,
)
models_test_comp_df.columns = [
"Decision Tree sklearn",
"Decision Tree (Pre-Pruning)",
"Decision Tree (Post-Pruning)",
]
print("Test performance comparison:")
models_test_comp_df
The post-pruned tree has a higher F1 score, but the gap between its precision and recall is large.
The pre-pruned tree has a more balanced precision and recall.
The hotel should therefore use the pre-pruned model.
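As a minimal sketch of how the chosen pre-pruned model could be used operationally (assuming the tuned `estimator` from the pre-pruning step above; the file name is illustrative):
# Persisting the pre-pruned tree and scoring bookings with it (sketch)
import joblib
joblib.dump(estimator, "inn_hotels_prepruned_tree.joblib")  # save the chosen model
cancel_probability = estimator.predict_proba(X_test)[:, 1]  # predicted probability of cancellation per booking
X_test.assign(cancel_probability=cancel_probability).head()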
Insights -
Lead time and average price per room are positively correlated with canceled bookings.
The number of special requests is negatively correlated with canceled bookings.
The decision tree appears to be a better model than logistic regression for predicting cancellations.
Recommendations -
The hotel needs to pay attention to lead time, average price per room, and the number of special requests to protect brand equity.
Customers with long lead times can be sent reminders and asked about any special requests before their stay.
A list of special amenities can be shared after booking to reduce the chance of cancellation.
Depending on the lead time, the average price per room can be adjusted to attract more customers.
Depending on the number of special requests, the hotel can adjust the average price per room to manage resources and maintain brand equity.
The hotel can set cancellation and refund policies based on lead time.