INN Hotels Project

Context

A significant number of hotel bookings are called-off due to cancellations or no-shows. The typical reasons for cancellations include change of plans, scheduling conflicts, etc. This is often made easier by the option to do so free of charge or preferably at a low cost which is beneficial to hotel guests but it is a less desirable and possibly revenue-diminishing factor for hotels to deal with. Such losses are particularly high on last-minute cancellations.

The new technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.

The cancellation of bookings impact a hotel on various fronts:

  • Loss of resources (revenue) when the hotel cannot resell the room.
  • Additional costs of distribution channels by increasing commissions or paying for publicity to help sell these rooms.
  • Lowering prices last minute, so the hotel can resell a room, resulting in reducing the profit margin.
  • Human resources to make arrangements for the guests.

Objective

The increasing number of cancellations calls for a Machine Learning based solution that can help in predicting which booking is likely to be canceled. INN Hotels Group has a chain of hotels in Portugal, they are facing problems with the high number of booking cancellations and have reached out to your firm for data-driven solutions. You as a data scientist have to analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict which booking is going to be canceled in advance, and help in formulating profitable policies for cancellations and refunds.

Data Description

The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.

Data Dictionary

  • Booking_ID: unique identifier of each booking
  • no_of_adults: Number of adults
  • no_of_children: Number of Children
  • no_of_weekend_nights: Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel
  • no_of_week_nights: Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel
  • type_of_meal_plan: Type of meal plan booked by the customer:
    • Not Selected – No meal plan selected
    • Meal Plan 1 – Breakfast
    • Meal Plan 2 – Half board (breakfast and one other meal)
    • Meal Plan 3 – Full board (breakfast, lunch, and dinner)
  • required_car_parking_space: Does the customer require a car parking space? (0 - No, 1- Yes)
  • room_type_reserved: Type of room reserved by the customer. The values are ciphered (encoded) by INN Hotels.
  • lead_time: Number of days between the date of booking and the arrival date
  • arrival_year: Year of arrival date
  • arrival_month: Month of arrival date
  • arrival_date: Date of the month
  • market_segment_type: Market segment designation.
  • repeated_guest: Is the customer a repeated guest? (0 - No, 1- Yes)
  • no_of_previous_cancellations: Number of previous bookings that were canceled by the customer prior to the current booking
  • no_of_previous_bookings_not_canceled: Number of previous bookings not canceled by the customer prior to the current booking
  • avg_price_per_room: Average price per day of the reservation; prices of the rooms are dynamic. (in euros)
  • no_of_special_requests: Total number of special requests made by the customer (e.g. high floor, view from the room, etc)
  • booking_status: Flag indicating if the booking was canceled or not.

Importing necessary libraries and data

In [ ]:
# Importing libraries
import warnings
warnings.filterwarnings("ignore")
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter("ignore", ConvergenceWarning)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", 200)
pd.set_option("display.float_format", lambda x: "%.5f" % x)
from sklearn.model_selection import train_test_split
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import (
    f1_score,
    accuracy_score,
    recall_score,
    precision_score,
    confusion_matrix,
    roc_auc_score,
    plot_confusion_matrix,
    precision_recall_curve,
    roc_curve,
    make_scorer,
)
In [ ]:
# Importing data set from google drive

from google.colab import drive
drive.mount('/content/drive')
data = pd.read_csv('/content/drive/MyDrive/DSBA/Logistic Regression/INN Hotels Project/INNHotelsGroup.csv')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

Data Overview

  • Observations
  • Sanity checks
In [ ]:
# First 5 rows

data.head()
Out[ ]:
Booking_ID no_of_adults no_of_children no_of_weekend_nights no_of_week_nights type_of_meal_plan required_car_parking_space room_type_reserved lead_time arrival_year arrival_month arrival_date market_segment_type repeated_guest no_of_previous_cancellations no_of_previous_bookings_not_canceled avg_price_per_room no_of_special_requests booking_status
0 INN00001 2 0 1 2 Meal Plan 1 0 Room_Type 1 224 2017 10 2 Offline 0 0 0 65.00000 0 Not_Canceled
1 INN00002 2 0 2 3 Not Selected 0 Room_Type 1 5 2018 11 6 Online 0 0 0 106.68000 1 Not_Canceled
2 INN00003 1 0 2 1 Meal Plan 1 0 Room_Type 1 1 2018 2 28 Online 0 0 0 60.00000 0 Canceled
3 INN00004 2 0 0 2 Meal Plan 1 0 Room_Type 1 211 2018 5 20 Online 0 0 0 100.00000 0 Canceled
4 INN00005 2 0 1 1 Not Selected 0 Room_Type 1 48 2018 4 11 Online 0 0 0 94.50000 0 Canceled
In [ ]:
# Last 5 rows

data.tail()
Out[ ]:
Booking_ID no_of_adults no_of_children no_of_weekend_nights no_of_week_nights type_of_meal_plan required_car_parking_space room_type_reserved lead_time arrival_year arrival_month arrival_date market_segment_type repeated_guest no_of_previous_cancellations no_of_previous_bookings_not_canceled avg_price_per_room no_of_special_requests booking_status
36270 INN36271 3 0 2 6 Meal Plan 1 0 Room_Type 4 85 2018 8 3 Online 0 0 0 167.80000 1 Not_Canceled
36271 INN36272 2 0 1 3 Meal Plan 1 0 Room_Type 1 228 2018 10 17 Online 0 0 0 90.95000 2 Canceled
36272 INN36273 2 0 2 6 Meal Plan 1 0 Room_Type 1 148 2018 7 1 Online 0 0 0 98.39000 2 Not_Canceled
36273 INN36274 2 0 0 3 Not Selected 0 Room_Type 1 63 2018 4 21 Online 0 0 0 94.50000 0 Canceled
36274 INN36275 2 0 1 2 Meal Plan 1 0 Room_Type 1 207 2018 12 30 Offline 0 0 0 161.67000 0 Not_Canceled
In [ ]:
# Shape of data

data.shape 
Out[ ]:
(36275, 19)
In [ ]:
# Data types of the columns in dataset

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 19 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   Booking_ID                            36275 non-null  object 
 1   no_of_adults                          36275 non-null  int64  
 2   no_of_children                        36275 non-null  int64  
 3   no_of_weekend_nights                  36275 non-null  int64  
 4   no_of_week_nights                     36275 non-null  int64  
 5   type_of_meal_plan                     36275 non-null  object 
 6   required_car_parking_space            36275 non-null  int64  
 7   room_type_reserved                    36275 non-null  object 
 8   lead_time                             36275 non-null  int64  
 9   arrival_year                          36275 non-null  int64  
 10  arrival_month                         36275 non-null  int64  
 11  arrival_date                          36275 non-null  int64  
 12  market_segment_type                   36275 non-null  object 
 13  repeated_guest                        36275 non-null  int64  
 14  no_of_previous_cancellations          36275 non-null  int64  
 15  no_of_previous_bookings_not_canceled  36275 non-null  int64  
 16  avg_price_per_room                    36275 non-null  float64
 17  no_of_special_requests                36275 non-null  int64  
 18  booking_status                        36275 non-null  object 
dtypes: float64(1), int64(13), object(5)
memory usage: 5.3+ MB

There are 14 numeric (float and int type) and 5 string (object type) columns in the data.

In [ ]:
# Checking duplicate values

data.duplicated().sum()
Out[ ]:
0

There are no duplicate values in the dataset.

In [ ]:
# Dropping Booking_ID column

data = data.drop(["Booking_ID"], axis=1)
In [ ]:
data.head()
Out[ ]:
no_of_adults no_of_children no_of_weekend_nights no_of_week_nights type_of_meal_plan required_car_parking_space room_type_reserved lead_time arrival_year arrival_month arrival_date market_segment_type repeated_guest no_of_previous_cancellations no_of_previous_bookings_not_canceled avg_price_per_room no_of_special_requests booking_status
0 2 0 1 2 Meal Plan 1 0 Room_Type 1 224 2017 10 2 Offline 0 0 0 65.00000 0 Not_Canceled
1 2 0 2 3 Not Selected 0 Room_Type 1 5 2018 11 6 Online 0 0 0 106.68000 1 Not_Canceled
2 1 0 2 1 Meal Plan 1 0 Room_Type 1 1 2018 2 28 Online 0 0 0 60.00000 0 Canceled
3 2 0 0 2 Meal Plan 1 0 Room_Type 1 211 2018 5 20 Online 0 0 0 100.00000 0 Canceled
4 2 0 1 1 Not Selected 0 Room_Type 1 48 2018 4 11 Online 0 0 0 94.50000 0 Canceled

Exploratory Data Analysis (EDA)

  • EDA is an important part of any project involving data.
  • It is important to investigate and understand the data better before building a model with it.
  • A few questions have been mentioned below which will help you approach the analysis in the right manner and generate insights from the data.
  • A thorough analysis of the data, in addition to the questions mentioned below, should be done.

Leading Questions:

  1. What are the busiest months in the hotel?
  2. Which market segment do most of the guests come from?
  3. Hotel rates are dynamic and change according to demand and customer demographics. What are the differences in room prices in different market segments?
  4. What percentage of bookings are canceled?
  5. Repeating guests are the guests who stay in the hotel often and are important to brand equity. What percentage of repeating guests cancel?
  6. Many guests have special requirements when booking a hotel room. Do these requirements affect booking cancellation?
In [ ]:
# Statistical summary of data

data.describe(include="all") 
Out[ ]:
no_of_adults no_of_children no_of_weekend_nights no_of_week_nights type_of_meal_plan required_car_parking_space room_type_reserved lead_time arrival_year arrival_month arrival_date market_segment_type repeated_guest no_of_previous_cancellations no_of_previous_bookings_not_canceled avg_price_per_room no_of_special_requests booking_status
count 36275.00000 36275.00000 36275.00000 36275.00000 36275 36275.00000 36275 36275.00000 36275.00000 36275.00000 36275.00000 36275 36275.00000 36275.00000 36275.00000 36275.00000 36275.00000 36275
unique NaN NaN NaN NaN 4 NaN 7 NaN NaN NaN NaN 5 NaN NaN NaN NaN NaN 2
top NaN NaN NaN NaN Meal Plan 1 NaN Room_Type 1 NaN NaN NaN NaN Online NaN NaN NaN NaN NaN Not_Canceled
freq NaN NaN NaN NaN 27835 NaN 28130 NaN NaN NaN NaN 23214 NaN NaN NaN NaN NaN 24390
mean 1.84496 0.10528 0.81072 2.20430 NaN 0.03099 NaN 85.23256 2017.82043 7.42365 15.59700 NaN 0.02564 0.02335 0.15341 103.42354 0.61966 NaN
std 0.51871 0.40265 0.87064 1.41090 NaN 0.17328 NaN 85.93082 0.38384 3.06989 8.74045 NaN 0.15805 0.36833 1.75417 35.08942 0.78624 NaN
min 0.00000 0.00000 0.00000 0.00000 NaN 0.00000 NaN 0.00000 2017.00000 1.00000 1.00000 NaN 0.00000 0.00000 0.00000 0.00000 0.00000 NaN
25% 2.00000 0.00000 0.00000 1.00000 NaN 0.00000 NaN 17.00000 2018.00000 5.00000 8.00000 NaN 0.00000 0.00000 0.00000 80.30000 0.00000 NaN
50% 2.00000 0.00000 1.00000 2.00000 NaN 0.00000 NaN 57.00000 2018.00000 8.00000 16.00000 NaN 0.00000 0.00000 0.00000 99.45000 0.00000 NaN
75% 2.00000 0.00000 2.00000 3.00000 NaN 0.00000 NaN 126.00000 2018.00000 10.00000 23.00000 NaN 0.00000 0.00000 0.00000 120.00000 1.00000 NaN
max 4.00000 10.00000 7.00000 17.00000 NaN 1.00000 NaN 443.00000 2018.00000 12.00000 31.00000 NaN 1.00000 13.00000 58.00000 540.00000 5.00000 NaN

Most popular meal plan is Meal Plan 1

Room_Type 1 is the most reserved room type

Average price of room seems to be ~103 euros

Bookings for week nights seem to be more than weekend nights

Average time from booking to staying at the hotel seems to be ~85 days

Number of previous cancellations seem to be 13

Univariate Analysis

In [ ]:
# Functions for histogram

def histogram_boxplot(data, feature, figsize=(15, 10), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (15,10))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  
        sharex=True,  
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    ) 
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  
In [ ]:
# Plotting Lead_time

histogram_boxplot(data, "lead_time")

There seem to be lot of outliers in the data for lead_time.

In [ ]:
# Plotting Average room per person

histogram_boxplot(data, "avg_price_per_room")

There seem to be lot of outliers in the data for average price per room.

In [ ]:
data[data["avg_price_per_room"] == 0]
Out[ ]:
no_of_adults no_of_children no_of_weekend_nights no_of_week_nights type_of_meal_plan required_car_parking_space room_type_reserved lead_time arrival_year arrival_month arrival_date market_segment_type repeated_guest no_of_previous_cancellations no_of_previous_bookings_not_canceled avg_price_per_room no_of_special_requests booking_status
63 1 0 0 1 Meal Plan 1 0 Room_Type 1 2 2017 9 10 Complementary 0 0 0 0.00000 1 Not_Canceled
145 1 0 0 2 Meal Plan 1 0 Room_Type 1 13 2018 6 1 Complementary 1 3 5 0.00000 1 Not_Canceled
209 1 0 0 0 Meal Plan 1 0 Room_Type 1 4 2018 2 27 Complementary 0 0 0 0.00000 1 Not_Canceled
266 1 0 0 2 Meal Plan 1 0 Room_Type 1 1 2017 8 12 Complementary 1 0 1 0.00000 1 Not_Canceled
267 1 0 2 1 Meal Plan 1 0 Room_Type 1 4 2017 8 23 Complementary 0 0 0 0.00000 1 Not_Canceled
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
35983 1 0 0 1 Meal Plan 1 0 Room_Type 7 0 2018 6 7 Complementary 1 4 17 0.00000 1 Not_Canceled
36080 1 0 1 1 Meal Plan 1 0 Room_Type 7 0 2018 3 21 Complementary 1 3 15 0.00000 1 Not_Canceled
36114 1 0 0 1 Meal Plan 1 0 Room_Type 1 1 2018 3 2 Online 0 0 0 0.00000 0 Not_Canceled
36217 2 0 2 1 Meal Plan 1 0 Room_Type 2 3 2017 8 9 Online 0 0 0 0.00000 2 Not_Canceled
36250 1 0 0 2 Meal Plan 2 0 Room_Type 1 6 2017 12 10 Online 0 0 0 0.00000 0 Not_Canceled

545 rows × 18 columns

In [ ]:
data.loc[data["avg_price_per_room"] == 0, "market_segment_type"].value_counts()
Out[ ]:
Complementary    354
Online           191
Name: market_segment_type, dtype: int64
In [ ]:
# Calculating 25th quantile
Q1 = data["avg_price_per_room"].quantile(0.25)

# Calculating 75th quantile
Q3 = data["avg_price_per_room"].quantile(0.75)  ## Complete the code to calculate 75th quantile for average price per room

# Calculating IQR
IQR = Q3 - Q1

# Calculating value of upper whisker
Upper_Whisker = Q3 + 1.5 * IQR
Upper_Whisker
Out[ ]:
179.55
In [ ]:
# assigning the outliers the value of upper whisker
data.loc[data["avg_price_per_room"] >= 500, "avg_price_per_room"] = Upper_Whisker
In [ ]:
# Plotting previous booking cancellations

histogram_boxplot(data, "no_of_previous_cancellations")
In [ ]:
# Plotting previous booking not cancellations

histogram_boxplot(data, "no_of_previous_bookings_not_canceled")
In [ ]:
# Functions for labeled barplot

def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """

    total = len(data[feature])  
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 2, 6))
    else:
        plt.figure(figsize=(n + 2, 6))

    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n],
    )

    for p in ax.patches:
        if perc == True:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  
        else:
            label = p.get_height()  

        x = p.get_x() + p.get_width() / 2  
        y = p.get_height()  

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  

    plt.show()  
In [ ]:
# Plotting number of adults

labeled_barplot(data, "no_of_adults", perc=True)
In [ ]:
#Plotting number of children 

labeled_barplot(data, "no_of_children", perc=True) 
In [ ]:
# replacing 9, and 10 children with 3
data["no_of_children"] = data["no_of_children"].replace([9, 10], 3)
In [ ]:
# Plotting number of week nights

labeled_barplot(data, "no_of_week_nights", perc=True)
In [ ]:
# Plotting number of weekend nights

labeled_barplot(data, "no_of_weekend_nights", perc=True)
In [ ]:
# Plotting Required car parking space

labeled_barplot(data, "required_car_parking_space", perc=True)
In [ ]:
# Plotting Type of meal plan

labeled_barplot(data, "type_of_meal_plan", perc=True)
In [ ]:
# Plotting Room Type Reserved

labeled_barplot(data, "room_type_reserved", perc=True) 
In [ ]:
# Plotting Arrival month

labeled_barplot(data, "arrival_month", perc=True) 

October seems to be the busiest month with most number of arrivals.

In [ ]:
# Plotting Market Segment Type

labeled_barplot(data, "market_segment_type", perc=True)

Online seem to be the most commonly used segment through which guests come to the hotel.

In [ ]:
# Plotting Number of special requests

labeled_barplot(data, "no_of_special_requests", perc=True)
In [ ]:
# Plotting Booking status

labeled_barplot(data, "booking_status", perc=True)
In [ ]:
# Encoding canceled bookings to 2 and Not_canceled as 0

data["booking_status"] = data["booking_status"].apply(
    lambda x: 1 if x == "Canceled" else 0
)

Bivariate Analysis

In [ ]:
cols_list = data.select_dtypes(include=np.number).columns.tolist()

plt.figure(figsize=(12, 7))
sns.heatmap(
    data[cols_list].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral"
)
plt.show()
In [ ]:
## Function to plot distributions wrt target


def distribution_plot_wrt_target(data, predictor, target):

    fig, axs = plt.subplots(2, 2, figsize=(12, 10))

    target_uniq = data[target].unique()

    axs[0, 0].set_title("Distribution of target for target=" + str(target_uniq[0]))
    sns.histplot(
        data=data[data[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="teal",
        stat="density",
    )

    axs[0, 1].set_title("Distribution of target for target=" + str(target_uniq[1]))
    sns.histplot(
        data=data[data[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="orange",
        stat="density",
    )

    axs[1, 0].set_title("Boxplot w.r.t target")
    sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")

    axs[1, 1].set_title("Boxplot (without outliers) w.r.t target")
    sns.boxplot(
        data=data,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="gist_rainbow",
    )

    plt.tight_layout()
    plt.show()
In [ ]:
## Functions for Stacked barplot

def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 5))
    plt.legend(
        loc="lower left", frameon=False,
    )
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()

Data Preprocessing

  • Missing value treatment (if needed)
  • Feature engineering (if needed)
  • Outlier detection and treatment (if needed)
  • Preparing data for modeling
  • Any other preprocessing steps (if needed)
In [ ]:
# Plotting Prices of rooms across various market segments

plt.figure(figsize=(10, 6))
sns.boxplot(
    data=data, x="market_segment_type", y="avg_price_per_room", palette="gist_rainbow"
)
plt.show()

Average room price per room seems to be higher for Online market segment type.

In [ ]:
# Plotting Booking status across various market segments

stacked_barplot(data, "market_segment_type", "booking_status")
booking_status           0      1    All
market_segment_type                     
All                  24390  11885  36275
Online               14739   8475  23214
Offline               7375   3153  10528
Corporate             1797    220   2017
Aviation                88     37    125
Complementary          391      0    391
------------------------------------------------------------------------------------------------------------------------
In [ ]:
# Plotting Booking status and Number of special requests

stacked_barplot(data, "no_of_special_requests", "booking_status") 
booking_status              0      1    All
no_of_special_requests                     
All                     24390  11885  36275
0                       11232   8545  19777
1                        8670   2703  11373
2                        3727    637   4364
3                         675      0    675
4                          78      0     78
5                           8      0      8
------------------------------------------------------------------------------------------------------------------------

Customers seem to confirm booking with increased special requests.

In [ ]:
# Plotting Number of special requests and Average prie per room

plt.figure(figsize=(10, 5))
sns.boxplot(data=data, x="no_of_special_requests", y="avg_price_per_room", palette="gist_rainbow")  
plt.show()

Average price per room seems to increase with number of special requests from guests.

In [ ]:
# Distribution between Average price per room and booking status

distribution_plot_wrt_target(data, "avg_price_per_room", "booking_status")
In [ ]:
# Distribution between Lead time and booking status

distribution_plot_wrt_target(data, "lead_time", "booking_status")
In [ ]:
# Combining children and adults to families

family_data = data[(data["no_of_children"] >= 0) & (data["no_of_adults"] > 1)]
family_data.shape
Out[ ]:
(28441, 18)
In [ ]:
family_data["no_of_family_members"] = (
    family_data["no_of_adults"] + family_data["no_of_children"]
)
In [ ]:
# Plotting Number of families and booking status

stacked_barplot(family_data, "no_of_family_members", "booking_status")
booking_status            0     1    All
no_of_family_members                    
All                   18456  9985  28441
2                     15506  8213  23719
3                      2425  1368   3793
4                       514   398    912
5                        11     6     17
------------------------------------------------------------------------------------------------------------------------
In [ ]:
# Combining week days and weekend stays

stay_data = data[(data["no_of_week_nights"] > 0) & (data["no_of_weekend_nights"] > 0)]
stay_data.shape
Out[ ]:
(17094, 18)
In [ ]:
stay_data["total_days"] = (
    stay_data["no_of_week_nights"] + stay_data["no_of_weekend_nights"]
)
In [ ]:
stacked_barplot(stay_data, "total_days", "booking_status")
booking_status      0     1    All
total_days                        
All             10979  6115  17094
3                3689  2183   5872
4                2977  1387   4364
5                1593   738   2331
2                1301   639   1940
6                 566   465   1031
7                 590   383    973
8                 100    79    179
10                 51    58    109
9                  58    53    111
14                  5    27     32
15                  5    26     31
13                  3    15     18
12                  9    15     24
11                 24    15     39
20                  3     8     11
19                  1     5      6
16                  1     5      6
17                  1     4      5
18                  0     3      3
21                  1     3      4
22                  0     2      2
23                  1     1      2
24                  0     1      1
------------------------------------------------------------------------------------------------------------------------
In [ ]:
# Plotting Repeated guests and booking status

stacked_barplot(data, "repeated_guest", "booking_status")
booking_status      0      1    All
repeated_guest                     
All             24390  11885  36275
0               23476  11869  35345
1                 914     16    930
------------------------------------------------------------------------------------------------------------------------

Repeated guests seem to confirm bookings at the hotel.

In [ ]:
# Busiest months at hotel with grouping 

monthly_data = data.groupby(["arrival_month"])["booking_status"].count()
monthly_data = pd.DataFrame(
    {"Month": list(monthly_data.index), "Guests": list(monthly_data.values)}
)
plt.figure(figsize=(10, 5))
sns.lineplot(data=monthly_data, x="Month", y="Guests")
plt.show()
In [ ]:
# Percentage of bookings canceled each month

stacked_barplot(data, "arrival_month", "booking_status")
booking_status      0      1    All
arrival_month                      
All             24390  11885  36275
10               3437   1880   5317
9                3073   1538   4611
8                2325   1488   3813
7                1606   1314   2920
6                1912   1291   3203
4                1741    995   2736
5                1650    948   2598
11               2105    875   2980
3                1658    700   2358
2                1274    430   1704
12               2619    402   3021
1                 990     24   1014
------------------------------------------------------------------------------------------------------------------------

Cancelations of bookings seem to be more in July and least in January.

In [ ]:
# Plotting averge price per room and arrival month

plt.figure(figsize=(10, 5))
sns.lineplot(data=data, x="arrival_month", y="avg_price_per_room") 
plt.show()

Average price per room seems to increase during busier months.

Outlier Check

In [ ]:
# checking for outliers using boxplot by dropping booking status

numeric_columns = data.select_dtypes(include=np.number).columns.tolist()
numeric_columns.remove("booking_status")

plt.figure(figsize=(15, 12))

for i, variable in enumerate(numeric_columns):
    plt.subplot(4, 4, i + 1)
    plt.boxplot(data[variable], whis=1.5)
    plt.tight_layout()
    plt.title(variable)

plt.show()

EDA

  • It is a good idea to explore the data once again after manipulating it.

Data Preparation for modeling

In [ ]:
# Encoding categorical values and splitting data into test & train

X = data.drop(["booking_status"], axis=1)
Y = data["booking_status"]
X = pd.get_dummies(X, drop_first=True)
X.head()  
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.4, random_state=1)
In [ ]:
print("Shape of Training set : ", X_train.shape)
print("Shape of test set : ", X_test.shape)
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Shape of Training set :  (21765, 27)
Shape of test set :  (14510, 27)
Percentage of classes in training set:
0   0.66855
1   0.33145
Name: booking_status, dtype: float64
Percentage of classes in test set:
0   0.67808
1   0.32192
Name: booking_status, dtype: float64
In [ ]:
# Model evaluation criteria

def model_performance_classification_statsmodels(
    model, predictors, target, threshold=0.5
):
    """
    Function to compute different metrics to check classification model performance

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """

    # checking which probabilities are greater than threshold
    pred_temp = model.predict(predictors) > threshold
    # rounding off the above values to get classes
    pred = np.round(pred_temp)

    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score

    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )

    return df_perf
In [ ]:
## function to plot the confusion_matrix of a classification model

def confusion_matrix_statsmodels(model, predictors, target, threshold=0.5):
    """
    To plot the confusion_matrix with percentages

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    y_pred = model.predict(predictors) > threshold
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)

    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")

Checking for Multicollinearity

In [ ]:
def checking_vif(predictors):
    vif = pd.DataFrame()
    vif["feature"] = predictors.columns

    # calculating VIF for each feature
    vif["VIF"] = [
        variance_inflation_factor(predictors.values, i)
        for i in range(len(predictors.columns))
    ]
    return vif
In [ ]:
checking_vif(X_train)
Out[ ]:
feature VIF
0 const 39673160.35796
1 no_of_adults 1.34473
2 no_of_children 2.11320
3 no_of_weekend_nights 1.06826
4 no_of_week_nights 1.09707
5 required_car_parking_space 1.03999
6 lead_time 1.39009
7 arrival_year 1.43715
8 arrival_month 1.27403
9 arrival_date 1.00724
10 repeated_guest 1.75958
11 no_of_previous_cancellations 1.41474
12 no_of_previous_bookings_not_canceled 1.66353
13 avg_price_per_room 2.06716
14 no_of_special_requests 1.24634
15 type_of_meal_plan_Meal Plan 2 1.27441
16 type_of_meal_plan_Meal Plan 3 1.02977
17 type_of_meal_plan_Not Selected 1.26974
18 room_type_reserved_Room_Type 2 1.10531
19 room_type_reserved_Room_Type 3 1.00376
20 room_type_reserved_Room_Type 4 1.36749
21 room_type_reserved_Room_Type 5 1.02637
22 room_type_reserved_Room_Type 6 2.06781
23 room_type_reserved_Room_Type 7 1.12051
24 market_segment_type_Complementary 4.36213
25 market_segment_type_Corporate 15.68286
26 market_segment_type_Offline 59.41340
27 market_segment_type_Online 65.92889

Building a Logistic Regression model

In [ ]:
X = data.drop(["booking_status"], axis=1)
Y = data["booking_status"]
X = sm.add_constant(X)
X = pd.get_dummies(X, drop_first=True) 
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.4, random_state=1)
In [ ]:
logit = sm.Logit(y_train, X_train.astype(float))
lg = logit.fit(
    disp=False
) 

print(lg.summary())
                           Logit Regression Results                           
==============================================================================
Dep. Variable:         booking_status   No. Observations:                21765
Model:                          Logit   Df Residuals:                    21737
Method:                           MLE   Df Model:                           27
Date:                Mon, 20 Jun 2022   Pseudo R-squ.:                  0.3303
Time:                        06:48:26   Log-Likelihood:                -9258.4
converged:                      False   LL-Null:                       -13825.
Covariance Type:            nonrobust   LLR p-value:                     0.000
========================================================================================================
                                           coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------------------
const                                 -943.2369    131.051     -7.197      0.000   -1200.093    -686.381
no_of_adults                             0.1282      0.041      3.155      0.002       0.049       0.208
no_of_children                           0.1553      0.068      2.295      0.022       0.023       0.288
no_of_weekend_nights                     0.1040      0.021      4.863      0.000       0.062       0.146
no_of_week_nights                        0.0369      0.013      2.772      0.006       0.011       0.063
required_car_parking_space              -1.6247      0.151    -10.728      0.000      -1.922      -1.328
lead_time                                0.0157      0.000     54.630      0.000       0.015       0.016
arrival_year                             0.4662      0.065      7.179      0.000       0.339       0.594
arrival_month                           -0.0451      0.007     -6.431      0.000      -0.059      -0.031
arrival_date                             0.0001      0.002      0.060      0.952      -0.004       0.004
repeated_guest                          -2.2784      0.618     -3.688      0.000      -3.489      -1.067
no_of_previous_cancellations             0.2803      0.091      3.090      0.002       0.102       0.458
no_of_previous_bookings_not_canceled    -0.1293      0.131     -0.987      0.324      -0.386       0.127
avg_price_per_room                       0.0192      0.001     23.944      0.000       0.018       0.021
no_of_special_requests                  -1.4803      0.033    -45.362      0.000      -1.544      -1.416
type_of_meal_plan_Meal Plan 2            0.1560      0.072      2.166      0.030       0.015       0.297
type_of_meal_plan_Meal Plan 3           27.6681   7.12e+05   3.89e-05      1.000   -1.39e+06    1.39e+06
type_of_meal_plan_Not Selected           0.2734      0.057      4.779      0.000       0.161       0.386
room_type_reserved_Room_Type 2          -0.2864      0.141     -2.026      0.043      -0.563      -0.009
room_type_reserved_Room_Type 3          -0.0287      1.312     -0.022      0.983      -2.600       2.542
room_type_reserved_Room_Type 4          -0.3208      0.058     -5.576      0.000      -0.434      -0.208
room_type_reserved_Room_Type 5          -0.7809      0.223     -3.495      0.000      -1.219      -0.343
room_type_reserved_Room_Type 6          -0.9978      0.164     -6.078      0.000      -1.320      -0.676
room_type_reserved_Room_Type 7          -1.4153      0.322     -4.393      0.000      -2.047      -0.784
market_segment_type_Complementary      -51.4621   1.02e+06  -5.05e-05      1.000      -2e+06       2e+06
market_segment_type_Corporate           -1.2413      0.276     -4.505      0.000      -1.781      -0.701
market_segment_type_Offline             -2.2625      0.263     -8.600      0.000      -2.778      -1.747
market_segment_type_Online              -0.4730      0.259     -1.823      0.068      -0.981       0.035
========================================================================================================
In [ ]:
print("Training performance:")
model_performance_classification_statsmodels(lg, X_train, y_train)
Training performance:
Out[ ]:
Accuracy Recall Precision F1
0 0.80639 0.63876 0.74131 0.68622

Dropping high P values

In [ ]:
cols = X_train.columns.tolist()

# setting an initial max p-value
max_p_value = 1

while len(cols) > 0:
    # defining the train set
    x_train_aux = X_train[cols]

    # fitting the model
    model = sm.Logit(y_train, x_train_aux).fit(disp=False)

    # getting the p-values and the maximum p-value
    p_values = model.pvalues
    max_p_value = max(p_values)

    # name of the variable with maximum p-value
    feature_with_p_max = p_values.idxmax()

    if max_p_value > 0.05:
        cols.remove(feature_with_p_max)
    else:
        break

selected_features = cols
print(selected_features)
['const', 'no_of_adults', 'no_of_children', 'no_of_weekend_nights', 'no_of_week_nights', 'required_car_parking_space', 'lead_time', 'arrival_year', 'arrival_month', 'repeated_guest', 'no_of_previous_cancellations', 'avg_price_per_room', 'no_of_special_requests', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Corporate', 'market_segment_type_Offline']
In [ ]:
X_train1 = X_train[selected_features]
X_test1 = X_test[selected_features]
In [ ]:
logit1 = sm.Logit(y_train, X_train1.astype(float)) 
lg1 = logit1.fit(
    disp=False
)

print(lg1.summary())
                           Logit Regression Results                           
==============================================================================
Dep. Variable:         booking_status   No. Observations:                21765
Model:                          Logit   Df Residuals:                    21743
Method:                           MLE   Df Model:                           21
Date:                Mon, 20 Jun 2022   Pseudo R-squ.:                  0.3292
Time:                        06:58:41   Log-Likelihood:                -9273.4
converged:                       True   LL-Null:                       -13825.
Covariance Type:            nonrobust   LLR p-value:                     0.000
==================================================================================================
                                     coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------------
const                           -937.6790    130.617     -7.179      0.000   -1193.685    -681.673
no_of_adults                       0.1211      0.040      3.011      0.003       0.042       0.200
no_of_children                     0.1493      0.068      2.212      0.027       0.017       0.282
no_of_weekend_nights               0.1057      0.021      4.952      0.000       0.064       0.148
no_of_week_nights                  0.0392      0.013      2.954      0.003       0.013       0.065
required_car_parking_space        -1.6232      0.151    -10.715      0.000      -1.920      -1.326
lead_time                          0.0157      0.000     54.940      0.000       0.015       0.016
arrival_year                       0.4632      0.065      7.156      0.000       0.336       0.590
arrival_month                     -0.0458      0.007     -6.555      0.000      -0.059      -0.032
repeated_guest                    -2.5891      0.564     -4.587      0.000      -3.695      -1.483
no_of_previous_cancellations       0.2487      0.081      3.067      0.002       0.090       0.408
avg_price_per_room                 0.0196      0.001     24.825      0.000       0.018       0.021
no_of_special_requests            -1.4819      0.033    -45.482      0.000      -1.546      -1.418
type_of_meal_plan_Meal Plan 2      0.1443      0.072      2.007      0.045       0.003       0.285
type_of_meal_plan_Not Selected     0.2796      0.057      4.902      0.000       0.168       0.391
room_type_reserved_Room_Type 2    -0.2816      0.141     -1.994      0.046      -0.558      -0.005
room_type_reserved_Room_Type 4    -0.3203      0.057     -5.584      0.000      -0.433      -0.208
room_type_reserved_Room_Type 5    -0.7988      0.223     -3.589      0.000      -1.235      -0.363
room_type_reserved_Room_Type 6    -1.0150      0.164     -6.193      0.000      -1.336      -0.694
room_type_reserved_Room_Type 7    -1.4495      0.322     -4.506      0.000      -2.080      -0.819
market_segment_type_Corporate     -0.7700      0.110     -7.011      0.000      -0.985      -0.555
market_segment_type_Offline       -1.7800      0.056    -31.769      0.000      -1.890      -1.670
==================================================================================================

All P values are less than 0.5, hence this model is accurate.

In [ ]:
print("Training performance:")
model_performance_classification_statsmodels(lg1, X_train1, y_train)
Training performance:
Out[ ]:
Accuracy Recall Precision F1
0 0.80657 0.63945 0.74140 0.68666
In [ ]:
# Converting coefficients to odds

odds = np.exp(lg1.params)
perc_change_odds = (np.exp(lg1.params) - 1) * 100
pd.set_option("display.max_columns", None)
pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index=X_train1.columns).T
Out[ ]:
const no_of_adults no_of_children no_of_weekend_nights no_of_week_nights required_car_parking_space lead_time arrival_year arrival_month repeated_guest no_of_previous_cancellations avg_price_per_room no_of_special_requests type_of_meal_plan_Meal Plan 2 type_of_meal_plan_Not Selected room_type_reserved_Room_Type 2 room_type_reserved_Room_Type 4 room_type_reserved_Room_Type 5 room_type_reserved_Room_Type 6 room_type_reserved_Room_Type 7 market_segment_type_Corporate market_segment_type_Offline
Odds 0.00000 1.12876 1.16108 1.11150 1.04000 0.19726 1.01586 1.58920 0.95528 0.07509 1.28240 1.01981 0.22720 1.15519 1.32265 0.75457 0.72592 0.44988 0.36239 0.23469 0.46300 0.16863
Change_odd% -100.00000 12.87602 16.10765 11.15004 4.00042 -80.27353 1.58634 58.92016 -4.47225 -92.49133 28.24029 1.98082 -77.28039 15.51860 32.26549 -24.54274 -27.40770 -55.01209 -63.76104 -76.53095 -53.70018 -83.13674

Odds of booking getting canceled for no of adults is 1.1 times or 12.8 %

Odds of booking getting canceled for no of children is 1.1 times or 16.1 %

Odds of booking getting canceled for weekends is 1.1 times or 11.1 %

Odds of booking getting canceled for weekdays is 1.0 times or 4.0 %

Odds of requiring car parking space is 0.1 and about 80% chance of booking getting canceled.

Odds of booking getting canceled for lead time is 1.0 times or 1.5 %

Odds of booking getting canceled based on arrival year is 1.5 times or 58.9 %

Odds of booking getting canceled based on arrival month is 0.9 times or -4.47 %

Odds of booking getting canceled for average price per room is 1.0 times or 1.9 %

Odds of booking getting canceled based on no of special requests is 0.2 times or -77.2 %

Model performance evaluation

Model performance on Training set

In [ ]:
# Creatimg confusion matrix

confusion_matrix_statsmodels(lg1, X_train1, y_train)
In [ ]:
print("Training performance:")
log_reg_model_train_perf = model_performance_classification_statsmodels(lg1, X_train1, y_train) ## Complete the code to check performance on X_train1 and y_train
log_reg_model_train_perf
Training performance:
Out[ ]:
Accuracy Recall Precision F1
0 0.80657 0.63945 0.74140 0.68666

F1 is 0.68 so further analysis needs to be done.

ROC-AUC on Training set

In [ ]:
logit_roc_auc_train = roc_auc_score(y_train, lg1.predict(X_train1))
fpr, tpr, thresholds = roc_curve(y_train, lg1.predict(X_train1))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.01])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
In [ ]:
# Optimal threshold as per AUC-ROC curve

fpr, tpr, thresholds = roc_curve(y_train, lg1.predict(X_train1))

optimal_idx = np.argmax(tpr - fpr)
optimal_threshold_auc_roc = thresholds[optimal_idx]
print(optimal_threshold_auc_roc)
0.36408133806810755
In [ ]:
# creating confusion matrix

confusion_matrix_statsmodels(
    lg1, X_train1, y_train, 
)
In [ ]:
# checking model performance for this model

log_reg_model_train_perf_threshold_auc_roc = model_performance_classification_statsmodels(
    lg1, X_train1, y_train, threshold=optimal_threshold_auc_roc
)
print("Training performance:")

log_reg_model_train_perf_threshold_auc_roc
Training performance:
Out[ ]:
Accuracy Recall Precision F1
0 0.79076 0.74549 0.66428 0.70255
In [ ]:
# Precision-Recall curve

y_scores = lg1.predict(X_train1)
prec, rec, tre = precision_recall_curve(y_train, y_scores,)


def plot_prec_recall_vs_tresh(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="precision")
    plt.plot(thresholds, recalls[:-1], "g--", label="recall")
    plt.xlabel("Threshold")
    plt.legend(loc="upper left")
    plt.ylim([0, 1])


plt.figure(figsize=(10, 7))
plot_prec_recall_vs_tresh(prec, rec, tre)
plt.show()
In [ ]:
# setting the threshold
optimal_threshold_curve = 0.42
In [ ]:
confusion_matrix_statsmodels(
    lg1, X_train1, y_train 
) 
In [ ]:
log_reg_model_train_perf_threshold_curve = model_performance_classification_statsmodels(
    lg1, X_train1, y_train, threshold=optimal_threshold_curve
)
print("Training performance:")
log_reg_model_train_perf_threshold_curve
Training performance:
Out[ ]:
Accuracy Recall Precision F1
0 0.80119 0.70183 0.69941 0.70062

Recall helped with the model and F1 seems better with value 0.70.

Model performance on Test set

In [ ]:
confusion_matrix_statsmodels(lg1, X_test1, y_test)
In [ ]:
log_reg_model_test_perf = model_performance_classification_statsmodels(lg1, X_test1, y_test) 

print("Test performance:")
log_reg_model_test_perf 
Test performance:
Out[ ]:
Accuracy Recall Precision F1
0 0.80386 0.63220 0.72360 0.67482

F1 seems to be less in this case.

In [ ]:
# ROC curve

logit_roc_auc_train = roc_auc_score(y_test, lg1.predict(X_test1))
fpr, tpr, thresholds = roc_curve(y_test, lg1.predict(X_test1))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.01])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
In [ ]:
# Using model with threshold=0.37

confusion_matrix_statsmodels(lg1, X_test1, y_test)
In [ ]:
# checking model performance for this model
log_reg_model_test_perf_threshold_auc_roc = model_performance_classification_statsmodels(
    lg1, X_test1, y_test, threshold=optimal_threshold_auc_roc
)
print("Test performance:")
log_reg_model_test_perf_threshold_auc_roc
Test performance:
Out[ ]:
Accuracy Recall Precision F1
0 0.79104 0.74181 0.65489 0.69564
In [ ]:
# Using model with threshold=0.42

confusion_matrix_statsmodels(lg1, X_test1, y_test)
In [ ]:
log_reg_model_test_perf_threshold_curve = model_performance_classification_statsmodels(
    lg1, X_test1, y_test, threshold=optimal_threshold_curve
)
print("Test performance:")
log_reg_model_test_perf_threshold_curve
Test performance:
Out[ ]:
Accuracy Recall Precision F1
0 0.80159 0.70285 0.68768 0.69518

F1 seems to increase a little for test data after threshold of 0.42.

Final Model Summary

In [ ]:
models_train_comp_df = pd.concat(
    [
        log_reg_model_train_perf.T,
        log_reg_model_train_perf_threshold_auc_roc.T,
        log_reg_model_train_perf_threshold_curve.T,
    ],
    axis=1,
)
models_train_comp_df.columns = [
    "Logistic Regression-default Threshold",
    "Logistic Regression-0.37 Threshold",
    "Logistic Regression-0.42 Threshold",
]

print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
Out[ ]:
Logistic Regression-default Threshold Logistic Regression-0.37 Threshold Logistic Regression-0.42 Threshold
Accuracy 0.80657 0.79076 0.80119
Recall 0.63945 0.74549 0.70183
Precision 0.74140 0.66428 0.69941
F1 0.68666 0.70255 0.70062

Logistic regression-0.37 seem to have better values.

In [ ]:
# test performance comparison

models_test_comp_df = pd.concat(
    [
        log_reg_model_test_perf.T,
        log_reg_model_test_perf_threshold_auc_roc.T,
        log_reg_model_test_perf_threshold_curve.T,
    ],
    axis=1,
)
models_test_comp_df.columns = [
    "Logistic Regression-default Threshold",
    "Logistic Regression-0.37 Threshold",
    "Logistic Regression-0.42 Threshold",
]

print("Test performance comparison:")
models_test_comp_df
Test performance comparison:
Out[ ]:
Logistic Regression-default Threshold Logistic Regression-0.37 Threshold Logistic Regression-0.42 Threshold
Accuracy 0.80386 0.79104 0.80159
Recall 0.63220 0.74181 0.70285
Precision 0.72360 0.65489 0.68768
F1 0.67482 0.69564 0.69518

Logistic regression-0.37 seem to have better values.

Building a Decision Tree model

In [ ]:
X = data.drop(["booking_status"], axis=1)
Y = data["booking_status"]

X = pd.get_dummies(X, drop_first=True) ## Complete the code to create dummies for X

# Splitting data in train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.4, random_state=1)
In [ ]:
# defining a function to compute different metrics to check performance of a classification model built using sklearn
def model_performance_classification_sklearn(model, predictors, target):
    """
    Function to compute different metrics to check classification model performance

    model: classifier
    predictors: independent variables
    target: dependent variable
    """

    # predicting using the independent variables
    pred = model.predict(predictors)

    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score

    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )

    return df_perf
In [ ]:
def confusion_matrix_sklearn(model, predictors, target):
    """
    To plot the confusion_matrix with percentages

    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    y_pred = model.predict(predictors)
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)

    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
In [ ]:
model = DecisionTreeClassifier(random_state=1)
model.fit(X_train, y_train)
Out[ ]:
DecisionTreeClassifier(random_state=1)
In [ ]:
# model performance on training data

confusion_matrix_sklearn(model, X_train, y_train)
In [ ]:
decision_tree_perf_train = model_performance_classification_sklearn(
    model, X_train, y_train
)
decision_tree_perf_train
Out[ ]:
Accuracy Recall Precision F1
0 0.99403 0.98655 0.99538 0.99095
In [ ]:
# model performance on test data

confusion_matrix_sklearn(model, X_train, y_train)
In [ ]:
decision_tree_perf_test = model_performance_classification_sklearn(model, X_train, y_train) 
decision_tree_perf_test
Out[ ]:
Accuracy Recall Precision F1
0 0.99403 0.98655 0.99538 0.99095
In [ ]:
feature_names = list(X_train.columns)
importances = model.feature_importances_
indices = np.argsort(importances)

plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()

Decision tree clearly seem to have good F1 values, hence it is a good model to use for predictions.

Do we need to prune the tree?

Yes Pruning is required

In [ ]:
# Pre Pruning

estimator = DecisionTreeClassifier(random_state=1, class_weight="balanced")
parameters = {
    "max_depth": np.arange(2, 7, 2),
    "max_leaf_nodes": [50, 75, 150, 250],
    "min_samples_split": [10, 30, 50, 70],
}

acc_scorer = make_scorer(f1_score)

grid_obj = GridSearchCV(estimator, parameters, scoring=acc_scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
estimator = grid_obj.best_estimator_
estimator.fit(X_train, y_train)
Out[ ]:
DecisionTreeClassifier(class_weight='balanced', max_depth=6, max_leaf_nodes=50,
                       min_samples_split=10, random_state=1)

Model Performance Comparison and Conclusions

In [ ]:
# Performance on training set

confusion_matrix_sklearn(model, X_train, y_train)
In [ ]:
decision_tree_tune_perf_train = model_performance_classification_sklearn(model, X_train, y_train) 
decision_tree_tune_perf_train
Out[ ]:
Accuracy Recall Precision F1
0 0.99403 0.98655 0.99538 0.99095
In [ ]:
# Performance on Test set

confusion_matrix_sklearn(model, X_test, y_test)
In [ ]:
decision_tree_tune_perf_test = model_performance_classification_sklearn(model, X_test, y_test) 
decision_tree_tune_perf_test
Out[ ]:
Accuracy Recall Precision F1
0 0.87078 0.80411 0.79644 0.80026

Decision Tree Visual

In [ ]:
plt.figure(figsize=(20, 10))
out = tree.plot_tree(
    estimator,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=None,
)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
In [ ]:
print(tree.export_text(estimator, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50
|   |--- no_of_special_requests <= 0.50
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- lead_time <= 90.50
|   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |--- avg_price_per_room <= 193.00
|   |   |   |   |   |   |--- weights: [1495.77, 119.17] class: 0
|   |   |   |   |   |--- avg_price_per_room >  193.00
|   |   |   |   |   |   |--- weights: [0.75, 21.12] class: 1
|   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |--- lead_time <= 65.50
|   |   |   |   |   |   |--- weights: [811.46, 176.50] class: 0
|   |   |   |   |   |--- lead_time >  65.50
|   |   |   |   |   |   |--- weights: [123.40, 152.36] class: 1
|   |   |   |--- lead_time >  90.50
|   |   |   |   |--- lead_time <= 117.50
|   |   |   |   |   |--- avg_price_per_room <= 92.72
|   |   |   |   |   |   |--- weights: [180.24, 191.58] class: 1
|   |   |   |   |   |--- avg_price_per_room >  92.72
|   |   |   |   |   |   |--- weights: [71.05, 253.43] class: 1
|   |   |   |   |--- lead_time >  117.50
|   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |--- weights: [90.49, 3.02] class: 0
|   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |--- weights: [183.98, 117.66] class: 0
|   |   |--- market_segment_type_Online >  0.50
|   |   |   |--- lead_time <= 9.50
|   |   |   |   |--- avg_price_per_room <= 200.38
|   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |--- weights: [369.46, 250.42] class: 0
|   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |--- weights: [232.59, 30.17] class: 0
|   |   |   |   |--- avg_price_per_room >  200.38
|   |   |   |   |   |--- weights: [0.75, 37.71] class: 1
|   |   |   |--- lead_time >  9.50
|   |   |   |   |--- avg_price_per_room <= 105.27
|   |   |   |   |   |--- lead_time <= 25.50
|   |   |   |   |   |   |--- weights: [172.76, 132.75] class: 0
|   |   |   |   |   |--- lead_time >  25.50
|   |   |   |   |   |   |--- weights: [423.30, 1099.71] class: 1
|   |   |   |   |--- avg_price_per_room >  105.27
|   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |--- weights: [363.47, 2056.12] class: 1
|   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |--- weights: [23.93, 0.00] class: 0
|   |--- no_of_special_requests >  0.50
|   |   |--- no_of_special_requests <= 1.50
|   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |--- lead_time <= 102.50
|   |   |   |   |   |   |--- weights: [604.29, 7.54] class: 0
|   |   |   |   |   |--- lead_time >  102.50
|   |   |   |   |   |   |--- weights: [68.81, 19.61] class: 0
|   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |--- lead_time <= 63.00
|   |   |   |   |   |   |--- weights: [14.21, 1.51] class: 0
|   |   |   |   |   |--- lead_time >  63.00
|   |   |   |   |   |   |--- weights: [0.00, 9.05] class: 1
|   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |--- lead_time <= 6.50
|   |   |   |   |   |--- no_of_week_nights <= 10.00
|   |   |   |   |   |   |--- weights: [562.41, 57.32] class: 0
|   |   |   |   |   |--- no_of_week_nights >  10.00
|   |   |   |   |   |   |--- weights: [0.75, 3.02] class: 1
|   |   |   |   |--- lead_time >  6.50
|   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |--- weights: [2246.65, 1264.14] class: 0
|   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |--- weights: [121.16, 1.51] class: 0
|   |   |--- no_of_special_requests >  1.50
|   |   |   |--- lead_time <= 90.50
|   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |--- weights: [1345.45, 0.00] class: 0
|   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |--- weights: [150.33, 46.76] class: 0
|   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |--- weights: [45.62, 0.00] class: 0
|   |   |   |--- lead_time >  90.50
|   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |--- weights: [157.80, 49.78] class: 0
|   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |--- weights: [86.75, 85.99] class: 0
|   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |--- weights: [59.83, 0.00] class: 0
|--- lead_time >  151.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- no_of_special_requests <= 0.50
|   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |--- lead_time <= 163.50
|   |   |   |   |   |   |--- weights: [3.74, 21.12] class: 1
|   |   |   |   |   |--- lead_time >  163.50
|   |   |   |   |   |   |--- weights: [222.12, 55.82] class: 0
|   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |--- avg_price_per_room <= 30.53
|   |   |   |   |   |   |--- weights: [8.23, 1.51] class: 0
|   |   |   |   |   |--- avg_price_per_room >  30.53
|   |   |   |   |   |   |--- weights: [0.75, 81.46] class: 1
|   |   |   |--- no_of_adults >  1.50
|   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |--- lead_time <= 215.50
|   |   |   |   |   |   |--- weights: [32.91, 85.99] class: 1
|   |   |   |   |   |--- lead_time >  215.50
|   |   |   |   |   |   |--- weights: [46.37, 7.54] class: 0
|   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |--- avg_price_per_room <= 84.19
|   |   |   |   |   |   |--- weights: [110.69, 518.93] class: 1
|   |   |   |   |   |--- avg_price_per_room >  84.19
|   |   |   |   |   |   |--- weights: [17.20, 819.13] class: 1
|   |   |--- no_of_special_requests >  0.50
|   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |--- lead_time <= 180.50
|   |   |   |   |   |--- lead_time <= 159.50
|   |   |   |   |   |   |--- weights: [7.48, 7.54] class: 1
|   |   |   |   |   |--- lead_time >  159.50
|   |   |   |   |   |   |--- weights: [32.91, 4.53] class: 0
|   |   |   |   |--- lead_time >  180.50
|   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |--- weights: [15.71, 193.09] class: 1
|   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |--- weights: [5.98, 0.00] class: 0
|   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |--- no_of_week_nights <= 5.50
|   |   |   |   |   |   |--- weights: [97.23, 4.53] class: 0
|   |   |   |   |   |--- no_of_week_nights >  5.50
|   |   |   |   |   |   |--- weights: [0.75, 1.51] class: 1
|   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |--- no_of_week_nights <= 9.00
|   |   |   |   |   |   |--- weights: [213.90, 116.16] class: 0
|   |   |   |   |   |--- no_of_week_nights >  9.00
|   |   |   |   |   |   |--- weights: [0.75, 10.56] class: 1
|   |--- avg_price_per_room >  100.04
|   |   |--- arrival_month <= 11.50
|   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |--- weights: [0.00, 2742.50] class: 1
|   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |--- weights: [20.19, 0.00] class: 0
|   |   |--- arrival_month >  11.50
|   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |--- weights: [31.41, 0.00] class: 0
|   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |--- arrival_date <= 24.50
|   |   |   |   |   |--- weights: [3.74, 0.00] class: 0
|   |   |   |   |--- arrival_date >  24.50
|   |   |   |   |   |--- weights: [2.99, 22.63] class: 1

In [ ]:
# Features in tree building

importances = estimator.feature_importances_
indices = np.argsort(importances)

plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
In [ ]:
# Cost Complexity Pruning

clf = DecisionTreeClassifier(random_state=1, class_weight="balanced")
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = abs(path.ccp_alphas), path.impurities
In [ ]:
pd.DataFrame(path)
Out[ ]:
ccp_alphas impurities
0 0.00000 0.00824
1 -0.00000 0.00824
2 0.00000 0.00824
3 0.00000 0.00824
4 0.00000 0.00824
... ... ...
1429 0.00926 0.32787
1430 0.00980 0.33767
1431 0.01253 0.35020
1432 0.03447 0.41913
1433 0.08087 0.50000

1434 rows × 2 columns

In [ ]:
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
In [ ]:
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(
        random_state=1, ccp_alpha=ccp_alpha, class_weight="balanced"
    )
    clf.fit(X_train, y_train) ## Complete the code to fit decision tree on training data
    clfs.append(clf)
print(
    "Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
        clfs[-1].tree_.node_count, ccp_alphas[-1]
    )
)
Number of nodes in the last tree is: 1 with ccp_alpha: 0.08086677213026633

F1 score vs Alpha for Training and Test Sets

In [ ]:
f1_train = []
for clf in clfs:
    pred_train = clf.predict(X_train)
    values_train = f1_score(y_train, pred_train)
    f1_train.append(values_train)

f1_test = []
for clf in clfs:
    pred_test = clf.predict(X_test)
    values_test = f1_score(y_test, pred_test)
    f1_test.append(values_test)
In [ ]:
index_best_model = np.argmax(f1_test)
best_model = clfs[index_best_model]
print(best_model)
DecisionTreeClassifier(ccp_alpha=0.00014254380289357126,
                       class_weight='balanced', random_state=1)
In [ ]:
# Checking performance on Training set

confusion_matrix_sklearn(best_model, X_train, y_train)
In [ ]:
decision_tree_post_perf_train = model_performance_classification_sklearn(
    best_model, X_train, y_train
)
decision_tree_post_perf_train
Out[ ]:
Accuracy Recall Precision F1
0 0.89653 0.90241 0.80789 0.85254
In [ ]:
# Checking performance on Test set

confusion_matrix_sklearn(best_model, X_test, y_test)
In [ ]:
decision_tree_post_test = model_performance_classification_sklearn(
    best_model, X_test, y_test
) 
decision_tree_post_test
Out[ ]:
Accuracy Recall Precision F1
0 0.86775 0.85207 0.76421 0.80575
In [ ]:
plt.figure(figsize=(20, 10))

out = tree.plot_tree(
    best_model,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=None,
)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
In [ ]:
print(tree.export_text(best_model, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50
|   |--- no_of_special_requests <= 0.50
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- lead_time <= 90.50
|   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |--- avg_price_per_room <= 193.00
|   |   |   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |   |   |--- lead_time <= 16.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 68.50
|   |   |   |   |   |   |   |   |   |--- weights: [177.25, 9.05] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  68.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 29.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |--- arrival_date >  29.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.50, 6.03] class: 1
|   |   |   |   |   |   |   |--- lead_time >  16.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 135.00
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 39.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  39.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [42.63, 13.58] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [18.70, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  135.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 10.56] class: 1
|   |   |   |   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |   |   |   |--- weights: [1035.82, 1.51] class: 0
|   |   |   |   |   |--- avg_price_per_room >  193.00
|   |   |   |   |   |   |--- weights: [0.75, 21.12] class: 1
|   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |--- lead_time <= 65.50
|   |   |   |   |   |   |--- no_of_weekend_nights <= 3.50
|   |   |   |   |   |   |   |--- arrival_date <= 27.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 11.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [144.34, 15.09] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_date >  11.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 59.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  59.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |--- weights: [323.09, 21.12] class: 0
|   |   |   |   |   |   |   |--- arrival_date >  27.50
|   |   |   |   |   |   |   |   |--- lead_time <= 1.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.50, 37.71] class: 1
|   |   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [5.98, 1.51] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  1.50
|   |   |   |   |   |   |   |   |   |--- weights: [51.60, 3.02] class: 0
|   |   |   |   |   |   |--- no_of_weekend_nights >  3.50
|   |   |   |   |   |   |   |--- weights: [1.50, 15.09] class: 1
|   |   |   |   |   |--- lead_time >  65.50
|   |   |   |   |   |   |--- avg_price_per_room <= 99.98
|   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 63.25
|   |   |   |   |   |   |   |   |   |--- weights: [15.71, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  63.25
|   |   |   |   |   |   |   |   |   |--- lead_time <= 77.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 15.09] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  77.00
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [5.98, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.99, 9.05] class: 1
|   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |--- lead_time <= 71.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 4.00
|   |   |   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [14.21, 3.02] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.02] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  4.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 4.53] class: 1
|   |   |   |   |   |   |   |   |--- lead_time >  71.50
|   |   |   |   |   |   |   |   |   |--- weights: [59.08, 6.03] class: 0
|   |   |   |   |   |   |--- avg_price_per_room >  99.98
|   |   |   |   |   |   |   |--- lead_time <= 81.00
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 123.25
|   |   |   |   |   |   |   |   |   |--- lead_time <= 68.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.24, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  68.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [5.24, 110.12] class: 1
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  123.25
|   |   |   |   |   |   |   |   |   |--- weights: [6.73, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  81.00
|   |   |   |   |   |   |   |   |--- weights: [10.47, 1.51] class: 0
|   |   |   |--- lead_time >  90.50
|   |   |   |   |--- lead_time <= 117.50
|   |   |   |   |   |--- avg_price_per_room <= 92.72
|   |   |   |   |   |   |--- avg_price_per_room <= 75.38
|   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 58.75
|   |   |   |   |   |   |   |   |   |--- weights: [5.98, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  58.75
|   |   |   |   |   |   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [3.74, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.24, 0.00] class: 0
|   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 11.50
|   |   |   |   |   |   |   |   |   |--- weights: [25.43, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_date >  11.50
|   |   |   |   |   |   |   |   |   |--- weights: [22.44, 15.09] class: 0
|   |   |   |   |   |   |--- avg_price_per_room >  75.38
|   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |--- weights: [51.60, 1.51] class: 0
|   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 86.68
|   |   |   |   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.50, 16.59] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 4.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.50, 15.09] class: 1
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  86.68
|   |   |   |   |   |   |   |   |   |--- weights: [23.93, 3.02] class: 0
|   |   |   |   |   |--- avg_price_per_room >  92.72
|   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |--- weights: [5.24, 98.05] class: 1
|   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 108.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 14.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 96.61
|   |   |   |   |   |   |   |   |   |   |--- weights: [11.97, 21.12] class: 1
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  96.61
|   |   |   |   |   |   |   |   |   |   |--- weights: [11.97, 1.51] class: 0
|   |   |   |   |   |   |   |   |--- arrival_date >  14.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [3.74, 104.09] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [3.74, 3.02] class: 0
|   |   |   |   |   |   |   |--- avg_price_per_room >  108.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 109.50
|   |   |   |   |   |   |   |   |   |--- weights: [27.67, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  109.50
|   |   |   |   |   |   |   |   |   |--- weights: [6.73, 25.64] class: 1
|   |   |   |   |--- lead_time >  117.50
|   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |--- avg_price_per_room <= 122.00
|   |   |   |   |   |   |   |--- weights: [90.49, 0.00] class: 0
|   |   |   |   |   |   |--- avg_price_per_room >  122.00
|   |   |   |   |   |   |   |--- weights: [0.00, 3.02] class: 1
|   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |--- arrival_date <= 7.50
|   |   |   |   |   |   |   |   |--- weights: [38.89, 1.51] class: 0
|   |   |   |   |   |   |   |--- arrival_date >  7.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 27.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 89.88
|   |   |   |   |   |   |   |   |   |   |--- weights: [23.18, 18.10] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  89.88
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 6.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.50, 33.19] class: 1
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  6.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |--- arrival_date >  27.50
|   |   |   |   |   |   |   |   |   |--- weights: [12.71, 1.51] class: 0
|   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |--- arrival_date <= 20.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 74.12
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 65.88
|   |   |   |   |   |   |   |   |   |   |--- weights: [3.74, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  65.88
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.24, 9.05] class: 1
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  74.12
|   |   |   |   |   |   |   |   |   |--- lead_time <= 146.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [20.94, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  146.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 3.02] class: 1
|   |   |   |   |   |   |   |--- arrival_date >  20.50
|   |   |   |   |   |   |   |   |--- weights: [53.10, 1.51] class: 0
|   |   |--- market_segment_type_Online >  0.50
|   |   |   |--- lead_time <= 9.50
|   |   |   |   |--- avg_price_per_room <= 200.38
|   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |--- lead_time <= 3.50
|   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |   |   |--- weights: [35.90, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 74.57
|   |   |   |   |   |   |   |   |   |   |--- weights: [23.93, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  74.57
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 75.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 4.53] class: 1
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  75.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 26.50
|   |   |   |   |   |   |   |   |   |--- weights: [20.94, 7.54] class: 0
|   |   |   |   |   |   |   |   |--- arrival_date >  26.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.50, 18.10] class: 1
|   |   |   |   |   |   |--- lead_time >  3.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 99.38
|   |   |   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |   |   |--- weights: [23.93, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [41.88, 27.15] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.50, 7.54] class: 1
|   |   |   |   |   |   |   |--- avg_price_per_room >  99.38
|   |   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 23.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [21.69, 102.58] class: 1
|   |   |   |   |   |   |   |   |   |--- arrival_date >  23.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 12.07] class: 1
|   |   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [2.24, 0.00] class: 0
|   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |--- weights: [126.39, 1.51] class: 0
|   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 24.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 132.05
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [19.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  132.05
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [11.22, 16.59] class: 1
|   |   |   |   |   |   |   |   |--- arrival_date >  24.50
|   |   |   |   |   |   |   |   |   |--- weights: [23.93, 1.51] class: 0
|   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |--- weights: [33.65, 0.00] class: 0
|   |   |   |   |--- avg_price_per_room >  200.38
|   |   |   |   |   |--- weights: [0.75, 37.71] class: 1
|   |   |   |--- lead_time >  9.50
|   |   |   |   |--- avg_price_per_room <= 105.27
|   |   |   |   |   |--- lead_time <= 25.50
|   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |   |--- weights: [34.40, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |--- weights: [23.93, 3.02] class: 0
|   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 56.15
|   |   |   |   |   |   |   |   |   |   |--- weights: [3.74, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  56.15
|   |   |   |   |   |   |   |   |   |   |--- weights: [51.60, 129.73] class: 1
|   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |--- weights: [59.08, 0.00] class: 0
|   |   |   |   |   |--- lead_time >  25.50
|   |   |   |   |   |   |--- avg_price_per_room <= 60.07
|   |   |   |   |   |   |   |--- lead_time <= 84.50
|   |   |   |   |   |   |   |   |--- weights: [37.39, 4.53] class: 0
|   |   |   |   |   |   |   |--- lead_time >  84.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 27.00
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 6.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [5.98, 1.51] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  6.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 131.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 15.09] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  131.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.24, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_date >  27.00
|   |   |   |   |   |   |   |   |   |--- weights: [6.73, 0.00] class: 0
|   |   |   |   |   |   |--- avg_price_per_room >  60.07
|   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.50, 42.24] class: 1
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [80.77, 408.81] class: 1
|   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |--- weights: [12.71, 0.00] class: 0
|   |   |   |   |--- avg_price_per_room >  105.27
|   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |--- weights: [26.18, 6.03] class: 0
|   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 16.59] class: 1
|   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 <= 0.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 195.43
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 24.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  24.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  195.43
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 123.70] class: 1
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 >  0.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 11.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [5.98, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_date >  11.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [4.49, 10.56] class: 1
|   |   |   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 168.06
|   |   |   |   |   |   |   |   |   |--- lead_time <= 22.00
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.50, 15.09] class: 1
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [11.97, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  22.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [13.46, 67.88] class: 1
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  168.06
|   |   |   |   |   |   |   |   |   |--- weights: [9.72, 3.02] class: 0
|   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |--- weights: [23.93, 0.00] class: 0
|   |--- no_of_special_requests >  0.50
|   |   |--- no_of_special_requests <= 1.50
|   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |--- lead_time <= 102.50
|   |   |   |   |   |   |--- weights: [604.29, 7.54] class: 0
|   |   |   |   |   |--- lead_time >  102.50
|   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |--- weights: [62.82, 12.07] class: 0
|   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |--- no_of_adults <= 2.50
|   |   |   |   |   |   |   |   |--- weights: [2.24, 7.54] class: 1
|   |   |   |   |   |   |   |--- no_of_adults >  2.50
|   |   |   |   |   |   |   |   |--- weights: [3.74, 0.00] class: 0
|   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |--- lead_time <= 63.00
|   |   |   |   |   |   |--- weights: [14.21, 1.51] class: 0
|   |   |   |   |   |--- lead_time >  63.00
|   |   |   |   |   |   |--- weights: [0.00, 9.05] class: 1
|   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |--- lead_time <= 6.50
|   |   |   |   |   |--- no_of_week_nights <= 10.00
|   |   |   |   |   |   |--- weights: [562.41, 57.32] class: 0
|   |   |   |   |   |--- no_of_week_nights >  10.00
|   |   |   |   |   |   |--- weights: [0.75, 3.02] class: 1
|   |   |   |   |--- lead_time >  6.50
|   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |--- avg_price_per_room <= 118.54
|   |   |   |   |   |   |   |--- lead_time <= 45.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [50.86, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [71.80, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 16.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [25.43, 4.53] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  16.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 7.54] class: 1
|   |   |   |   |   |   |   |--- lead_time >  45.50
|   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 61.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [25.43, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  61.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 71.08
|   |   |   |   |   |   |   |   |   |   |--- weights: [78.53, 7.54] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  71.08
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |--- avg_price_per_room >  118.54
|   |   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 19.50
|   |   |   |   |   |   |   |   |   |--- no_of_adults <= 2.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 5.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  5.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 6.03] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_adults >  2.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 17.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [40.39, 25.64] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  17.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [4.49, 13.58] class: 1
|   |   |   |   |   |   |   |   |--- arrival_date >  19.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 27.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 121.20
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [17.20, 7.54] class: 0
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  121.20
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |--- arrival_date >  27.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [62.07, 33.19] class: 0
|   |   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [12.71, 9.05] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [26.18, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 159.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  159.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 100.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [46.37, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  100.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 18.10] class: 1
|   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |--- weights: [121.16, 1.51] class: 0
|   |   |--- no_of_special_requests >  1.50
|   |   |   |--- lead_time <= 90.50
|   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |--- weights: [1345.45, 0.00] class: 0
|   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |--- avg_price_per_room <= 92.42
|   |   |   |   |   |   |   |--- weights: [45.62, 25.64] class: 0
|   |   |   |   |   |   |--- avg_price_per_room >  92.42
|   |   |   |   |   |   |   |--- weights: [104.70, 21.12] class: 0
|   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |--- weights: [45.62, 0.00] class: 0
|   |   |   |--- lead_time >  90.50
|   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |--- avg_price_per_room <= 202.95
|   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |--- weights: [8.97, 10.56] class: 1
|   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |--- lead_time <= 150.50
|   |   |   |   |   |   |   |   |   |--- weights: [148.83, 24.14] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  150.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 4.53] class: 1
|   |   |   |   |   |   |--- avg_price_per_room >  202.95
|   |   |   |   |   |   |   |--- weights: [0.00, 10.56] class: 1
|   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |--- avg_price_per_room <= 152.10
|   |   |   |   |   |   |   |--- avg_price_per_room <= 73.53
|   |   |   |   |   |   |   |   |--- weights: [11.22, 3.02] class: 0
|   |   |   |   |   |   |   |--- avg_price_per_room >  73.53
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 90.42
|   |   |   |   |   |   |   |   |   |--- lead_time <= 107.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.99, 22.63] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  107.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [10.47, 9.05] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  90.42
|   |   |   |   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [5.98, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [19.45, 10.56] class: 0
|   |   |   |   |   |   |--- avg_price_per_room >  152.10
|   |   |   |   |   |   |   |--- weights: [11.22, 1.51] class: 0
|   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |--- weights: [59.83, 0.00] class: 0
|--- lead_time >  151.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- no_of_special_requests <= 0.50
|   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |--- lead_time <= 163.50
|   |   |   |   |   |   |--- arrival_month <= 5.00
|   |   |   |   |   |   |   |--- weights: [2.99, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_month >  5.00
|   |   |   |   |   |   |   |--- weights: [0.75, 21.12] class: 1
|   |   |   |   |   |--- lead_time >  163.50
|   |   |   |   |   |   |--- lead_time <= 341.00
|   |   |   |   |   |   |   |--- lead_time <= 173.00
|   |   |   |   |   |   |   |   |--- arrival_date <= 3.50
|   |   |   |   |   |   |   |   |   |--- weights: [38.14, 9.05] class: 0
|   |   |   |   |   |   |   |   |--- arrival_date >  3.50
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 12.07] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.24, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  173.00
|   |   |   |   |   |   |   |   |--- arrival_month <= 5.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 7.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 4.53] class: 1
|   |   |   |   |   |   |   |   |   |--- arrival_date >  7.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [5.24, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  5.50
|   |   |   |   |   |   |   |   |   |--- weights: [164.54, 6.03] class: 0
|   |   |   |   |   |   |--- lead_time >  341.00
|   |   |   |   |   |   |   |--- weights: [11.97, 24.14] class: 1
|   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |--- avg_price_per_room <= 30.53
|   |   |   |   |   |   |--- weights: [8.23, 1.51] class: 0
|   |   |   |   |   |--- avg_price_per_room >  30.53
|   |   |   |   |   |   |--- weights: [0.75, 81.46] class: 1
|   |   |   |--- no_of_adults >  1.50
|   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |--- lead_time <= 215.50
|   |   |   |   |   |   |--- lead_time <= 167.50
|   |   |   |   |   |   |   |--- weights: [15.71, 3.02] class: 0
|   |   |   |   |   |   |--- lead_time >  167.50
|   |   |   |   |   |   |   |--- arrival_date <= 9.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 74.62
|   |   |   |   |   |   |   |   |   |--- weights: [17.20, 1.51] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  74.62
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 16.59] class: 1
|   |   |   |   |   |   |   |--- arrival_date >  9.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 64.87] class: 1
|   |   |   |   |   |--- lead_time >  215.50
|   |   |   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |   |   |--- weights: [0.00, 7.54] class: 1
|   |   |   |   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |   |   |   |--- weights: [46.37, 0.00] class: 0
|   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |--- avg_price_per_room <= 84.19
|   |   |   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |   |   |--- lead_time <= 211.00
|   |   |   |   |   |   |   |   |--- no_of_previous_bookings_not_canceled <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [54.60, 6.03] class: 0
|   |   |   |   |   |   |   |   |--- no_of_previous_bookings_not_canceled >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 4.53] class: 1
|   |   |   |   |   |   |   |--- lead_time >  211.00
|   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 80.38
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [10.47, 245.89] class: 1
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  80.38
|   |   |   |   |   |   |   |   |   |   |--- weights: [11.22, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |--- weights: [20.19, 0.00] class: 0
|   |   |   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |   |   |--- weights: [2.99, 229.30] class: 1
|   |   |   |   |   |--- avg_price_per_room >  84.19
|   |   |   |   |   |   |--- no_of_adults <= 2.50
|   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |--- weights: [9.72, 810.08] class: 1
|   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 9.05] class: 1
|   |   |   |   |   |   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [3.74, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_adults >  2.50
|   |   |   |   |   |   |   |--- weights: [3.74, 0.00] class: 0
|   |   |--- no_of_special_requests >  0.50
|   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |--- lead_time <= 180.50
|   |   |   |   |   |--- lead_time <= 159.50
|   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |--- weights: [5.98, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |--- weights: [1.50, 7.54] class: 1
|   |   |   |   |   |--- lead_time >  159.50
|   |   |   |   |   |   |--- weights: [32.91, 4.53] class: 0
|   |   |   |   |--- lead_time >  180.50
|   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |   |   |--- weights: [9.72, 6.03] class: 0
|   |   |   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |   |   |--- weights: [5.98, 187.06] class: 1
|   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |--- weights: [5.98, 0.00] class: 0
|   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |--- weights: [97.97, 6.03] class: 0
|   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |--- no_of_week_nights <= 9.00
|   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 76.48
|   |   |   |   |   |   |   |   |--- weights: [41.13, 3.02] class: 0
|   |   |   |   |   |   |   |--- avg_price_per_room >  76.48
|   |   |   |   |   |   |   |   |--- arrival_date <= 28.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 6.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 152.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 4.53] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  152.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  6.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 6.03] class: 1
|   |   |   |   |   |   |   |   |--- arrival_date >  28.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [10.47, 4.53] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [4.49, 16.59] class: 1
|   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |--- arrival_date <= 14.50
|   |   |   |   |   |   |   |   |--- weights: [7.48, 1.51] class: 0
|   |   |   |   |   |   |   |--- arrival_date >  14.50
|   |   |   |   |   |   |   |   |--- weights: [9.72, 22.63] class: 1
|   |   |   |   |   |--- no_of_week_nights >  9.00
|   |   |   |   |   |   |--- weights: [0.75, 10.56] class: 1
|   |--- avg_price_per_room >  100.04
|   |   |--- arrival_month <= 11.50
|   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |--- weights: [0.00, 2742.50] class: 1
|   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |--- weights: [20.19, 0.00] class: 0
|   |   |--- arrival_month >  11.50
|   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |--- weights: [31.41, 0.00] class: 0
|   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |--- arrival_date <= 24.50
|   |   |   |   |   |--- weights: [3.74, 0.00] class: 0
|   |   |   |   |--- arrival_date >  24.50
|   |   |   |   |   |--- weights: [2.99, 22.63] class: 1

In [ ]:
importances = best_model.feature_importances_
indices = np.argsort(importances)

plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()

Comparing Decision Tree models

In [ ]:
# training performance comparison

models_train_comp_df = pd.concat(
    [
        decision_tree_perf_train.T,
        decision_tree_tune_perf_train.T,
        decision_tree_post_perf_train.T,
    ],
    axis=1,
)
models_train_comp_df.columns = [
    "Decision Tree sklearn",
    "Decision Tree (Pre-Pruning)",
    "Decision Tree (Post-Pruning)",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
Out[ ]:
Decision Tree sklearn Decision Tree (Pre-Pruning) Decision Tree (Post-Pruning)
Accuracy 0.99403 0.99403 0.89653
Recall 0.98655 0.98655 0.90241
Precision 0.99538 0.99538 0.80789
F1 0.99095 0.99095 0.85254
In [ ]:
# testing performance comparison

models_test_comp_df = pd.concat(
    [
        decision_tree_perf_test.T,
        decision_tree_tune_perf_test.T,
        decision_tree_post_test.T,
    ],
    axis=1,
)
models_test_comp_df.columns = [
    "Decision Tree sklearn",
    "Decision Tree (Pre-Pruning)",
    "Decision Tree (Post-Pruning)",
]
print("Test performance comparison:")
models_test_comp_df 
Test performance comparison:
Out[ ]:
Decision Tree sklearn Decision Tree (Pre-Pruning) Decision Tree (Post-Pruning)
Accuracy 0.99403 0.87078 0.86775
Recall 0.98655 0.80411 0.85207
Precision 0.99538 0.79644 0.76421
F1 0.99095 0.80026 0.80575

Post-Pruned has higher F1 value and difference between precision and recall is high.

Pre-Prunned - Difference between precision and recall are normal.

Hotel should use Pre-prunned model

Actionable Insights and Recommendations

  • What profitable policies for cancellations and refunds can the hotel adopt?
  • What other recommedations would you suggest to the hotel?

Insights -

Lead time and Average price per room have positive correlation with cancelled bookings.

Number of special requests have negative correlation with cancelled bookings.

Decision tree seems to be the better model to determine predictions of cancellations than logistic regression.

Recommendations -

Hotel needs to work on lead time, average price per room and number of special requests to keep up brand equity.

Customers can be notified of lead time and they can be asked for any special requets before stay.

Special amenities list can be provided after booking to reduce chances of cancellation.

Depending on the Lead time, average price per room can be adjusted to attract more customers to the hotel.

Depending on the number of special requests, hotel can adjust the average price per room to maintain resources and brand equity.

Hotel can come up with policies on cancellations or refunda based on lead time.