Nshuti David
March 12, 2025
Have you ever wondered how companies determine the price of used heavy machinery like bulldozers? Just like real estate or used cars, bulldozer prices fluctuate over time based on various factors such as age, condition, and market demand.
In this post, we walk through a machine learning project that tackles this challenge: predicting the future sale price of bulldozers from historical auction data. Because the goal is to use past sales to predict future ones, we treat it as a time series regression problem.
By the end of this article, you’ll understand:
✅ How time series regression works in pricing prediction.
✅ The importance of feature engineering for structured datasets.
✅ How machine learning models can provide valuable business insights.
For businesses in the construction and heavy equipment industry, accurately predicting the resale value of machines can lead to better investment decisions, optimized pricing strategies, and improved financial planning.
Because bulldozer prices change over time, the order of the sales matters: the model learns from past auctions and is evaluated on later ones, rather than on a random sample drawn from the same period.
For this project, we used data from the Kaggle Bluebook for Bulldozers competition:
🔗 Kaggle Dataset
Train.csv → Contains historical sales data until the end of 2011.
Valid.csv → Contains bulldozer sales from Jan 2012 – Apr 2012 (used for validation).
Test.csv → Contains sales from May 2012 – Nov 2012 (used for final predictions).

📅 Sale Date – Helps track seasonal trends in bulldozer pricing.
🚜 Machine Attributes – Brand, model, manufacturing year, condition.
💰 Sale Price – The target variable we aim to predict.
Since this is a time series problem, we also extract date-based features such as the sale year, sale month, and whether the sale happened during a peak season.
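The date features described above can be sketched with pandas as follows. This is a minimal illustration on a toy table, not the project's actual code: the column names (saledate, SalePrice), the helper add_date_features, and especially the definition of "peak season" as March to May are assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the auction data; column names are assumptions.
df = pd.DataFrame({
    "saledate": ["2009-03-15", "2011-11-02", "2010-07-21"],
    "SalePrice": [42000, 18500, 67000],
})

def add_date_features(frame, date_col="saledate", peak_months=(3, 4, 5)):
    """Return a copy of frame, sorted by sale date, with calendar features added."""
    out = frame.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    # Sort chronologically so later time-based splits are straightforward.
    out = out.sort_values(date_col).reset_index(drop=True)
    out["saleYear"] = out[date_col].dt.year
    out["saleMonth"] = out[date_col].dt.month
    out["saleDayOfWeek"] = out[date_col].dt.dayofweek
    # "Peak season" here is a placeholder definition (March to May),
    # chosen only for illustration.
    out["isPeakSeason"] = out["saleMonth"].isin(list(peak_months))
    return out

features = add_date_features(df)
print(features[["saleYear", "saleMonth", "isPeakSeason"]])
```

Sorting by date before extracting features keeps the rows in chronological order, which makes it harder to accidentally train on sales that happened after the ones being predicted.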
Before training our model, we perform data cleaning and transformation, which includes:
✔ Handling missing values in the dataset.
✔ Converting categorical variables into numerical representations.
✔ Extracting useful features from timestamps.
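One possible way to carry out the first two steps is sketched below. It assumes median imputation for numeric columns and integer category codes for text columns; the article does not say which strategies were used, so treat these choices, the preprocess helper, and the toy column values as assumptions.

```python
import pandas as pd

# Toy data with gaps; the values are invented for illustration.
df = pd.DataFrame({
    "YearMade": [2004.0, None, 1998.0, 2007.0],
    "UsageBand": ["Low", None, "High", "Medium"],
    "state": ["Texas", "Ohio", "Texas", None],
})

def preprocess(frame):
    """Fill missing values and turn text columns into integer codes."""
    out = frame.copy()
    for col in frame.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            if out[col].isna().any():
                # Keep a flag so the model can still see that the value was missing.
                out[col + "_is_missing"] = out[col].isna()
                out[col] = out[col].fillna(out[col].median())
        else:
            out[col + "_is_missing"] = out[col].isna()
            # pandas encodes a missing category as -1; adding 1 maps it to 0
            # and keeps every code non-negative.
            out[col] = out[col].astype("category").cat.codes + 1
    return out

clean = preprocess(df)
print(clean)
```

Integer codes work reasonably well with tree-based models, which split on thresholds rather than assuming the codes are evenly spaced; a linear model would usually need one-hot encoding instead.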
We experiment with different regression models, including a Random Forest Regressor.
We use Root Mean Squared Log Error (RMSLE) as the evaluation metric:
🔗 More on RMSLE
💡 Why RMSLE?
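A common reason for choosing RMSLE, given here as general background rather than the author's stated rationale, is that it measures relative error instead of absolute error: being 10% off on a cheap machine counts about as much as being 10% off on an expensive one. The metric is the square root of the mean of (log(1 + predicted) - log(1 + actual))². A minimal sketch with NumPy, where the function name rmsle and the example prices are made up for illustration:

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root Mean Squared Log Error for non-negative targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # log1p(x) computes log(1 + x), which stays defined when x is 0.
    log_diff = np.log1p(y_pred) - np.log1p(y_true)
    return float(np.sqrt(np.mean(log_diff ** 2)))

# The same 10% overestimate on a cheap and an expensive machine.
cheap = rmsle([10_000], [11_000])
expensive = rmsle([100_000], [110_000])
print(cheap, expensive)
```

Both calls return nearly the same score even though the absolute errors differ by a factor of ten, which suits a dataset where sale prices span a wide range.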
📈 Key Takeaways from Our Model:
✔ The year of manufacture has a strong impact on bulldozer prices.
✔ Machines lose value as they age, but some brands hold their value better than others.
✔ Certain auction houses sell machines at higher/lower prices due to demand differences.
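Takeaways like these are often read off a tree model's feature importances. The sketch below shows the mechanics on synthetic data where price depends on the year of manufacture by construction; the data, the column names, and the choice of scikit-learn's impurity-based feature_importances_ are assumptions, not the project's actual analysis.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Synthetic features: one informative column and one pure-noise column.
X = pd.DataFrame({
    "YearMade": rng.integers(1985, 2012, size=n),
    "noise": rng.normal(size=n),
})
# Synthetic price: newer machines cost more, by construction.
y = 5000 + 2000 * (X["YearMade"] - 1985) + rng.normal(0, 1000, size=n)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, y)

# Impurity-based importances sum to 1 across features.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Impurity-based importances can overstate high-cardinality features, so on real data it may be worth cross-checking them with permutation importance on the validation set.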
🔹 Final Model Performance:
The Random Forest Regressor gave the best results, with a lower RMSLE than the other models we tried.
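The overall workflow (train on sales up to 2011, validate on 2012 sales, score with RMSLE) might look roughly like the sketch below. It runs on synthetic data, so the printed score says nothing about the real project; the column names, the depreciation formula, and the hyperparameters are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_log_error

rng = np.random.default_rng(42)
n = 1000

# Synthetic sales between 2005 and 2012 (the upper bound of integers is exclusive).
data = pd.DataFrame({
    "saleYear": rng.integers(2005, 2013, size=n),
    "YearMade": rng.integers(1985, 2005, size=n),
})
age = data["saleYear"] - data["YearMade"]
# Invented depreciation curve with multiplicative noise.
data["SalePrice"] = 80000 * np.exp(-0.08 * age) * rng.lognormal(0, 0.1, size=n)

# Time-based split: train on the past, validate on the most recent year.
train = data[data["saleYear"] <= 2011]
valid = data[data["saleYear"] == 2012]
feature_cols = ["saleYear", "YearMade"]

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(train[feature_cols], train["SalePrice"])

preds = model.predict(valid[feature_cols])
valid_rmsle = float(np.sqrt(mean_squared_log_error(valid["SalePrice"], preds)))
print(f"Validation RMSLE: {valid_rmsle:.3f}")
```

Splitting by date rather than at random mirrors how the Train.csv and Valid.csv files are organized and gives a more honest estimate of how the model would perform on future auctions.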
The techniques used in this project can be applied to many industries:
Predicting house prices based on location, size, and market trends.
Using historical stock prices to predict future market trends.
Determining the resale value of second-hand vehicles based on age and mileage.
Forecasting future crop prices using weather conditions and past data.
In all these cases, time series regression helps businesses make data-driven decisions and optimize pricing strategies.
This project shows how machine learning can be used for predictive pricing in the heavy equipment industry. By applying time series regression and feature engineering, we built a model that forecasts bulldozer prices from historical auction data.
✅ Want to explore the full project?
🔗 GitHub Repository
🚀 Stay tuned for more machine learning insights!
Copyright by Q-TECH LTD. All rights reserved.