Task: Merge GPS logs and weather data.
Source: bus_gps_logs_raw.csv (4.5M rows), historical_weather.csv (1K rows)
Date: Week 3, Day 2
Steps:

  1. Handle Null Values: Dropped ~1% of rows in bus_gps_logs_raw.csv where bus_id was null.
  2. Timestamp Standardization: Converted all timestamp columns to a single, consistent format (YYYY-MM-DD HH:MM:SS) to enable merging.
  3. Merge Data: Joined each GPS log entry to the weather record with the nearest timestamp, so that bus delays can be linked to the weather conditions at that time.
  4. Calculate Metrics: Created a new column, delay_minutes, by subtracting the scheduled arrival time from the actual arrival time.

Result: A new, clean dataset, merged_bus_data.csv, with all the necessary fields for analysis.
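
Steps 1, 3 and 4 can be sketched roughly as below. This is a minimal standard-library illustration, not the code actually used for this task. The column names timestamp, scheduled_arrival, actual_arrival and condition, the output field weather_condition, and the sample rows are all assumptions made for the example; only bus_id and delay_minutes come from the notes. The sketch also assumes step 2 has already been applied, so every timestamp is in YYYY-MM-DD HH:MM:SS form.

```python
from bisect import bisect_left
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # target format from step 2


def parse_ts(text):
    """Parse a timestamp that is already in YYYY-MM-DD HH:MM:SS form."""
    return datetime.strptime(text, FMT)


def nearest_weather(ts, weather_times, weather_rows):
    """Return the weather row whose timestamp is closest to ts.

    weather_times must be sorted ascending, aligned with weather_rows,
    and contain at least one entry.
    """
    i = bisect_left(weather_times, ts)
    if i == 0:
        return weather_rows[0]
    if i == len(weather_times):
        return weather_rows[-1]
    before, after = weather_times[i - 1], weather_times[i]
    # On a tie, prefer the earlier weather record.
    return weather_rows[i - 1] if ts - before <= after - ts else weather_rows[i]


def merge_logs(gps_rows, weather_rows):
    """Apply steps 1, 3 and 4 to in-memory rows (lists of dicts)."""
    weather_rows = sorted(weather_rows, key=lambda r: parse_ts(r["timestamp"]))
    weather_times = [parse_ts(r["timestamp"]) for r in weather_rows]

    merged = []
    for row in gps_rows:
        # Step 1: skip rows with a missing bus_id.
        if not row.get("bus_id"):
            continue
        # Step 3: attach the weather record nearest in time.
        weather = nearest_weather(parse_ts(row["timestamp"]), weather_times, weather_rows)
        # Step 4: delay = actual arrival - scheduled arrival, in minutes.
        delay = parse_ts(row["actual_arrival"]) - parse_ts(row["scheduled_arrival"])
        merged.append({
            **row,
            "weather_condition": weather["condition"],  # hypothetical column names
            "delay_minutes": delay.total_seconds() / 60,
        })
    return merged


# Made-up sample rows, for illustration only.
gps = [
    {"bus_id": "B12", "timestamp": "2024-01-08 08:05:00",
     "scheduled_arrival": "2024-01-08 08:00:00",
     "actual_arrival": "2024-01-08 08:07:30"},
    {"bus_id": "", "timestamp": "2024-01-08 08:40:00",
     "scheduled_arrival": "2024-01-08 08:35:00",
     "actual_arrival": "2024-01-08 08:41:00"},
]
weather = [
    {"timestamp": "2024-01-08 08:00:00", "condition": "rain"},
    {"timestamp": "2024-01-08 09:00:00", "condition": "clear"},
]
result = merge_logs(gps, weather)
```

With these sample rows, the second GPS row is dropped for its empty bus_id, and the first is matched to the 08:00 weather record with a delay_minutes of 7.5. For the full 4.5M-row file, a pandas-based version would likely be more practical: pandas.merge_asof with direction="nearest" performs the same kind of nearest-key join on sorted frames.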