2.2 Data Cleaning Log
Task: Merge GPS logs and weather data.
Source: bus_gps_logs_raw.csv (4.5M rows), historical_weather.csv (1K rows)
Date: Week 3, Day 2
Steps:
- Handle Null Values: Dropped ~1% of rows in bus_gps_logs_raw.csv where bus_id was null.
- Timestamp Standardization: Converted all timestamp columns to a single, consistent format (YYYY-MM-DD HH:MM:SS) to enable merging.
- Merge Data: Joined each GPS log entry to the weather record with the nearest timestamp, linking bus delays to the weather conditions at that time.
- Calculate Metrics: Created a new column, delay_minutes, by subtracting the scheduled arrival time from the actual arrival time.
Result: A new, clean dataset, merged_bus_data.csv, with all the necessary fields for analysis.
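The steps above can be sketched in plain Python. This is an illustration only, not the script actually used for this log. The column names timestamp, scheduled_arrival, actual_arrival and condition are assumptions; only bus_id and delay_minutes come from the log. The sketch also assumes timestamps are already in the standardized format and that at least one weather record exists. In pandas, the same nearest-timestamp join is available through merge_asof with direction="nearest".

```python
from bisect import bisect_left
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # the standardized YYYY-MM-DD HH:MM:SS format


def parse_ts(text):
    """Parse a timestamp that is already in the standardized format."""
    return datetime.strptime(text, TS_FORMAT)


def clean_and_merge(gps_rows, weather_rows):
    """Drop null bus_id rows, attach the nearest weather record, add delay_minutes.

    Rows are dicts. All column names except bus_id and delay_minutes
    are hypothetical and chosen for illustration.
    """
    # Handle Null Values: drop GPS rows where bus_id is missing or empty.
    gps_rows = [r for r in gps_rows if r.get("bus_id") not in (None, "")]

    # Sort weather records by time so bisect can locate neighbours.
    weather = sorted(weather_rows, key=lambda w: parse_ts(w["timestamp"]))
    weather_times = [parse_ts(w["timestamp"]) for w in weather]

    merged = []
    for row in gps_rows:
        ts = parse_ts(row["timestamp"])

        # Merge Data: choose the closer of the two weather records around ts.
        i = bisect_left(weather_times, ts)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(weather)]
        nearest = min(candidates, key=lambda j: abs(weather_times[j] - ts))

        # Calculate Metrics: actual arrival minus scheduled arrival, in minutes.
        delay = parse_ts(row["actual_arrival"]) - parse_ts(row["scheduled_arrival"])

        merged.append({
            **row,
            "weather": weather[nearest]["condition"],
            "delay_minutes": delay.total_seconds() / 60,
        })
    return merged
```

With 4.5M GPS rows and about 1K weather rows, the binary search keeps each lookup cheap, since only the small weather table needs sorting.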