Analyzing Location Data with Python: A Streamlit Application for Visualizing Vacation Destinations

Understanding the places we've visited over time can provide fascinating insights into our travel habits, favorite destinations, and the extent of our adventures. This blog post delves into the concept of analyzing geolocation data using a Python-based application built with Streamlit, showcasing how to turn raw data into interactive visual insights.

In this article, we will explore a Python script that reads location history data (from JSON files), processes it, and then visualizes the results on a map. We’ll also discuss the key concepts and techniques involved, including working with location-based APIs, handling data with Pandas, and leveraging Streamlit for creating interactive web applications.

Extracting Your Location Data from Google Takeout

Google Takeout is a service that allows you to export various types of data from your Google account, including your location history. Here’s a brief guide on how to extract your location data:

  1. Visit Google Takeout: Go to takeout.google.com and sign in with your Google account.
  2. Select Data to Export: Deselect everything, then choose only the "Location History" option from the list of Google services.
  3. Choose the Export Format: Opt for a JSON file as the export format.
  4. Download the Data: Once the export is ready, download the zip file containing your location history data.

After extracting the zip file, you will find a JSON file that contains detailed information about your movements, which you can then use with the Python script presented here.
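For reference, the files in the export contain a top-level timelineObjects array whose placeVisit entries carry the coordinates and timestamps the script relies on. A trimmed example of that structure (the field values here are illustrative, not real data):

{
  "timelineObjects": [
    {
      "placeVisit": {
        "location": {
          "latitudeE7": 407128000,
          "longitudeE7": -740060000
        },
        "duration": {
          "startTimestamp": "2023-07-14T09:30:00Z",
          "endTimestamp": "2023-07-14T12:00:00Z"
        }
      }
    }
  ]
}

Note that coordinates are stored as integers scaled by 1e7, which is why the parsing code later divides by 1e7.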

The Core Idea: Analyzing Travel History

The script's primary goal is to analyze and visualize location history data. Users can upload a file containing their location history, specify a distance threshold, and filter the results by date. The application processes this data to extract meaningful information about the locations visited and then displays the results in a table and on a map.
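As a minimal sketch, the Streamlit controls for those three inputs might look like this (the widget labels and default values are assumptions for illustration, not the script's exact code):

import streamlit as st

# Hypothetical input widgets; names and defaults are illustrative
uploaded_file = st.file_uploader("Upload your location history (JSON)", type="json")
distance_threshold = st.number_input("Distance threshold (km)", min_value=0.0, value=100.0)
start_date = st.date_input("Show trips from")
end_date = st.date_input("Show trips until")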

Loading Location Data

The first step in this application is to load the location history data from a JSON file. This is achieved through the load_location_history function, which reads the file and returns a list of timeline objects.

import json

def load_location_history(file_path):
    """Read a Google Takeout location history file and return its timeline objects."""
    with open(file_path, 'r', encoding='utf-8') as file:
        data = json.load(file)
    return data['timelineObjects']

This function is straightforward but critical as it serves as the foundation for the rest of the application. The data is expected to be in a specific format, with each timeline object containing information about a place visit, including its latitude, longitude, and timestamp.
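In a Streamlit context the file arrives as an in-memory upload rather than a filesystem path. A hedged variant that accepts the file-like object returned by st.file_uploader could look like this (this helper is an assumption, not part of the original script):

import json

def load_location_history_from_upload(uploaded_file):
    """Parse an uploaded Takeout JSON file (hypothetical helper for Streamlit uploads)."""
    data = json.load(uploaded_file)  # st.file_uploader returns a file-like object
    return data['timelineObjects']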

Geocoding with OpenStreetMap

One interesting aspect of the script is its use of the Nominatim API from OpenStreetMap to reverse-geocode latitude and longitude into a human-readable address (i.e., city and country). This is done in the get_location_info function.

import time

import requests

def get_location_info(lat, lon):
    """Reverse-geocode coordinates to a (city, country) pair via Nominatim."""
    url = f"https://nominatim.openstreetmap.org/reverse?format=json&lat={lat}&lon={lon}"
    headers = {'User-Agent': 'VacationAnalyzer/1.0'}  # Nominatim requires an identifying User-Agent
    response = requests.get(url, headers=headers)
    time.sleep(1)  # Respect Nominatim's one-request-per-second rate limit

    if response.status_code == 200:
        data = response.json()
        address = data.get('address', {})
        # Nominatim reports the locality under different keys depending on place size
        city = address.get('city') or address.get('town') or address.get('village') or 'Unknown'
        country = address.get('country', 'Unknown')
        return city, country
    else:
        return 'Unknown', 'Unknown'

This method uses the latitude and longitude coordinates to fetch the corresponding city and country from the API. The use of time.sleep(1) ensures that the script respects the API's rate limits, which is a good practice to prevent getting blocked by the service.
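Because each reverse-geocoding call costs at least a second, repeated coordinates are worth caching. One possible approach, using functools.lru_cache from the standard library (an addition for illustration, not part of the original script):

from functools import lru_cache

@lru_cache(maxsize=None)
def get_location_info_cached(lat, lon):
    """Memoized wrapper so each unique coordinate pair hits Nominatim only once."""
    return get_location_info(lat, lon)

Rounding the latitude and longitude to a few decimal places before the lookup would also collapse near-identical points into a single request.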

Filtering and Analyzing Data

Once the location data is loaded, the next step is to analyze it, filtering out locations based on the user-defined distance threshold and avoiding duplicates. The analyze_trips function handles this by comparing each place visit's distance from the user's home location.

from datetime import datetime

def analyze_trips(timeline_objects, home_lat, home_lon, existing_trips, distance_threshold):
    trips = []
    # Track (lat, lon, date) triples already recorded so reruns don't create duplicates
    existing_locations_dates = {(trip['lat'], trip['lon'], trip['date']) for trip in existing_trips}

    for obj in timeline_objects:
        if 'placeVisit' in obj:
            try:
                place = obj['placeVisit']
                location = place['location']
                # Takeout stores coordinates as integers scaled by 1e7
                lat = location['latitudeE7'] / 1e7
                lon = location['longitudeE7'] / 1e7
                start_time = datetime.fromisoformat(place['duration']['startTimestamp'].replace('Z', '+00:00'))
                start_date_str = start_time.date().strftime('%Y-%m-%d')

                if (lat, lon, start_date_str) in existing_locations_dates:
                    continue

                distance_from_home = calculate_distance(lat, lon, home_lat, home_lon)

                if distance_from_home > distance_threshold:
                    city, country = get_location_info(lat, lon)
                    google_maps_link = f"https://www.google.com/maps/search/?api=1&query={lat},{lon}"
                    trips.append({
                        'date': start_date_str,
                        'lat': lat,
                        'lon': lon,
                        'city': city,
                        'country': country,
                        'google_maps_link': google_maps_link,
                        'distance_from_home': distance_from_home
                    })
            except (KeyError, ValueError):
                # Skip timeline objects with missing or malformed fields
                continue
    return trips

This function checks whether a place visit has already been recorded to avoid duplicates, filters locations by distance, and gathers the relevant details for each trip. The calculate_distance function determines the distance between the user's home and the visited location; a simple Euclidean approximation works for short distances, but accuracy degrades over larger ones, where a great-circle formula is more appropriate, as sketched below.
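For reference, a more accurate calculate_distance could use the haversine formula, which accounts for the Earth's curvature. This sketch returns kilometers; the original script's implementation may differ:

from math import radians, sin, cos, asin, sqrt

def calculate_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers via the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km is roughly the mean Earth radius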

Visualizing the Data

Once the data is filtered and processed, it’s time to visualize it. This script uses Folium to create an interactive map, adding markers for each location the user visited.
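The mapping code below assumes a filtered_df DataFrame holding the trips within the user's selected date range. One plausible way to build it from the trips list (the column names follow the dictionaries produced by analyze_trips; start_date and end_date are hypothetical widget values from the Streamlit sidebar):

import pandas as pd

# Build a DataFrame from the trip dictionaries returned by analyze_trips
df = pd.DataFrame(trips)

# Keep only trips inside the user's selected date range; string comparison
# works here because dates are stored as 'YYYY-MM-DD'
filtered_df = df[(df['date'] >= str(start_date)) & (df['date'] <= str(end_date))]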

import folium

# Center the map on the user's home location
m = folium.Map(location=[home_lat, home_lon], zoom_start=4)

folium.Marker(
    [home_lat, home_lon],
    popup="Home",
    icon=folium.Icon(color="green", icon="home"),
).add_to(m)

# Add one marker per trip in the filtered DataFrame
for _, row in filtered_df.iterrows():
    folium.Marker(
        [row['lat'], row['lon']],
        popup=f"{row['city']}, {row['country']} - {row['date']}",
        icon=folium.Icon(color="red", icon="info-sign"),
    ).add_to(m)

This block of code initializes the map centered around the user's home location and then iteratively adds markers for each location. The result is a visually appealing and interactive map that users can explore to see the extent of their travels.
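To embed the Folium map in the Streamlit page, one common option is the streamlit-folium package (an assumption here; the script could also render the map's HTML directly):

from streamlit_folium import st_folium

# Render the Folium map inside the Streamlit app
st_folium(m, width=700, height=450)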

Conclusion

This Python script exemplifies how to use various libraries and APIs to analyze and visualize personal geolocation data. The key takeaways include using APIs for geocoding, applying filtering techniques to clean data, and employing visualization tools like Folium to present insights in an engaging manner.

By leveraging tools like Streamlit, this script not only processes data but also provides an interactive interface for users, making it easier to explore and understand their travel history. This approach can be expanded to more complex analyses or integrated with other datasets to provide even richer insights.
