Predicting Hotel Bookings on Expedia

Kelsey Heng
Towards Data Science
4 min read · Oct 16, 2019

Photo: https://designanthologymag.com/story/st-regis-hong-kong

Travellers come in many shapes and forms. Thanks to the internet, travelling is no longer limited to businesspeople or the privileged, and competition for consumers has increased accordingly.

Expedia is an online booking platform with 600 million users on its site monthly and accommodations in more than 200 countries. To retain customers and merchants, Expedia looked at consumer behaviour and came up with a data strategy: personalization. Personalization has become a key strategy for two reasons: an average user visits the site 4–9 times before booking, and makes a reservation from the first page of the search results. It is therefore essential that similar hotels show up on each search.

Using the dataset published by Expedia on Kaggle, I will attempt to build a machine learning model that predicts the group of hotels in which a customer will make a reservation. The dataset contains a log of customer behaviour, including what they searched for, whether a reservation was made, and whether the search was part of a travel package. All features in the dataset were encoded for privacy purposes.

Expedia used in-house algorithms to form hotel clusters, in which similar hotels are grouped based on factors such as price, star rating and distance to the city centre. For this project, I used 1% of the dataset provided and only the top ten clusters, which still left me with 290,000 rows of data.
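The subsetting step can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code used in the project: the column name `hotel_cluster`, the helper `sample_top_clusters`, the order of the two steps (sample first, then filter) and the synthetic rows are all assumptions made for the example.

```python
import random
from collections import Counter

def sample_top_clusters(rows, frac=0.01, top_n=10, seed=0):
    """Keep a random `frac` of the rows, then only the `top_n` most frequent clusters."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * frac))
    sampled = rng.sample(rows, k)
    # Count how often each cluster appears in the sample.
    counts = Counter(row["hotel_cluster"] for row in sampled)
    top = {cluster for cluster, _ in counts.most_common(top_n)}
    kept = [row for row in sampled if row["hotel_cluster"] in top]
    return kept, top

# Synthetic stand-in for the booking log (assumed layout, not real data).
gen = random.Random(42)
rows = [
    {"hotel_cluster": gen.randrange(100), "is_booking": gen.randrange(2)}
    for _ in range(100_000)
]
subset, top_clusters = sample_top_clusters(rows)
print(len(subset), sorted(top_clusters))
```

With a dataframe library, the same idea is usually a random sample followed by a filter on the most frequent values of the cluster column.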

With the features given, I derived a couple of insights in an attempt to build a model with higher accuracy. The features that turned out to be important to the model were a mixture of hotel-related and user-related ones.

Three machine learning models, namely a Keras neural network, a decision tree and a boosted tree, were compared to find the most suitable one. Surprisingly, the neural network did not do as well as the other two.

A mixture of evaluation metrics was used across the different models.

1. Keras Neural Network

Accuracy and loss did not improve after many rounds of training. Accuracy stayed at around the 30% mark, which is not ideal.

2. Decision Tree

Despite having a higher accuracy than the neural network, the decision tree was not an ideal model. Precision differed significantly between hotel clusters, ranging from 40% to 90%.
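To illustrate why per-cluster precision matters alongside overall accuracy, here is a small sketch that computes both. The helper `precision_per_class` and the toy labels are hypothetical and are not taken from the project; they only show how a single accuracy figure can hide differences between classes.

```python
from collections import Counter

def precision_per_class(y_true, y_pred):
    """Precision per predicted class: correct predictions / all predictions of that class."""
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / n for label, n in predicted.items()}

# Toy labels standing in for hotel cluster ids (not the article's data).
y_true = [1, 1, 1, 2, 2, 3, 3, 3]
y_pred = [1, 1, 2, 2, 2, 3, 1, 3]

per_class = precision_per_class(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

for label in sorted(per_class):
    print(f"cluster {label}: precision {per_class[label]:.2f}")
print(f"overall accuracy: {accuracy:.2f}")
```

In this toy case the overall accuracy is 0.75, while precision is 1.00 for cluster 3 and about 0.67 for clusters 1 and 2.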

3. Boosted Tree

Compared with the other models, the boosted tree had the highest accuracy. A k-fold evaluation also showed no variation across folds, making this the best model to work with.
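For readers unfamiliar with k-fold evaluation, the idea can be sketched as below. This is a simplified illustration under stated assumptions, not the project's code: the helpers `k_fold_indices` and `cross_val_accuracy` are hypothetical, the data is synthetic, and a majority-class baseline stands in for the boosted tree so the example has no dependencies.

```python
import random
from collections import Counter
from statistics import pstdev

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(X, y, fit, predict, k=5, seed=0):
    """Train on k-1 folds, score accuracy on the held-out fold, repeat per fold."""
    folds = k_fold_indices(len(y), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        preds = predict(model, [X[j] for j in test_idx])
        hits = sum(p == y[j] for p, j in zip(preds, test_idx))
        scores.append(hits / len(test_idx))
    return scores

# Stand-in model (assumption): always predict the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def predict_majority(model, X_test):
    return [model] * len(X_test)

# Synthetic labels; the features are unused by the baseline.
gen = random.Random(1)
y = [gen.choice([0, 0, 0, 1, 2]) for _ in range(500)]
X = [[0.0]] * len(y)

scores = cross_val_accuracy(X, y, fit_majority, predict_majority, k=5)
print([round(s, 3) for s in scores], "spread:", round(pstdev(scores), 3))
```

A small spread between the fold scores suggests the model's accuracy does not depend much on which rows happen to land in the training set.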

Conclusion

Given more time, I would like to use all of the data provided to build a model with higher accuracy. Looking ahead, it would be very useful for a booking platform to be highly personalized for each customer.

With that, I have come to the end of my immersive programme with Metis. This is just the beginning of my data science journey. I will continue to work on data science projects and hopefully land a job soon.

The code for this project can be found on my GitHub. I can be contacted via LinkedIn if you would like to connect.

Neuroscience researcher turned analytics consultant. Huge love for data storytelling, turning numbers into fun facts!