Level of Education of Students Involved


Faculty Sponsor

Jon Baegley


Computer Science, Data Analytics

Presentation Type

Poster Presentation

Symposium Date

Spring 4-27-2023


For my capstone project, I will predict the prices of properties that are marketed and booked for short-term stays through Airbnb. The properties are located in Berlin, Germany. I chose this location because it is one of the busiest and most popular destinations for short-term lodging. What excites me about this project is that it uses real-world data from a major city in a well-known country.

I will predict the price of each property from attributes such as its location, the amenities it provides, its host, and its past reviews. To achieve this, I would like to employ several data mining techniques: exploratory data analysis (EDA), feature engineering, and model building. For the model-building part, once the data is well understood and suitably prepared, I would like to fit a regression model and see how well it fits the data. The model’s performance will be judged by the Mean Absolute Error (MAE), Root Mean Square Error (RMSE), R², and adjusted R² values.

The data set consists of about 22,553 rows and 72 columns. Analysing and processing data of this size will test my technical capabilities, which I am sure will improve as the project progresses. In the exploratory data analysis, I will look at every attribute and its contribution to predicting the price of a property, and at whether combinations of selected attributes give the analysis a deeper meaning. Some of the basic steps in the EDA include eliminating null values, normalizing the data, and checking for multicollinearity. Depending on the model’s performance, I will then decide what modifications are needed to obtain accurate results.

I found the data as a CSV file on the company’s website; it covers the city of Berlin, Germany. My primary reason for choosing this as my capstone project is that it covers most of the technologies and technical aspects I was taught throughout the master’s program.
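As an illustration of the four evaluation metrics named in this abstract, the sketch below computes MAE, RMSE, R², and adjusted R² using only the Python standard library. It is a minimal example under stated assumptions, not the code used in the project: the function name `regression_metrics`, the sample prices, and the choice of two predictors are all hypothetical. Adjusted R² follows the usual definition 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Return MAE, RMSE, R2 and adjusted R2 for one set of predictions.

    n_features is the number of predictors (p) used by the model; it is
    only needed for the adjusted R2 correction.
    """
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n - n_features - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")

    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean Absolute Error: average size of the errors, in price units.
    mae = sum(abs(e) for e in errors) / n

    # Root Mean Square Error: penalises large errors more than MAE does.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R2: share of the variance in the actual prices explained by the model.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are equal")
    r2 = 1 - ss_res / ss_tot

    # Adjusted R2: R2 corrected for the number of predictors.
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

    return {"mae": mae, "rmse": rmse, "r2": r2, "adjusted_r2": adjusted_r2}


# Hypothetical nightly prices (not taken from the Berlin data set).
actual = [50.0, 80.0, 120.0, 60.0, 90.0]
predicted = [55.0, 75.0, 110.0, 65.0, 95.0]

metrics = regression_metrics(actual, predicted, n_features=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MAE and RMSE are expressed in the same units as the price, so they are easy to interpret, while adjusted R² is the more cautious figure to compare when models use different numbers of attributes.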

Biographical Information about Author(s)

I have a strong interest in working with data and uncovering the valuable insights it carries. With this intention, I explored an Airbnb data set of property listings in Berlin. My primary goal was to predict the price of the listings. To achieve this, I built a regression model.
