| name | geopandas |
| description | Open source project that makes working with geospatial data in Python easier. Extends the datatypes used by pandas to allow spatial operations on geometric types. Built on top of Shapely, Fiona, and Pyproj. Use for reading and writing spatial formats (Shapefile, GeoJSON, GeoPackage, KML), performing spatial joins, coordinate system transformations (reprojecting), geometric analysis (buffers, centroids, convex hulls), thematic mapping (choropleth maps), calculating spatial relationships (contains, overlaps, touches, within), and working with OpenStreetMap data or satellite-derived vector data. |
| version | 0.14 |
| license | BSD-3-Clause |
GeoPandas - Geospatial Data Analysis
GeoPandas enables you to perform spatial joins, geometric manipulations, and coordinate transformations using the familiar Pandas API. It treats "geometry" as just another column in a DataFrame, but one that knows how to calculate areas, distances, and intersections.
When to Use
- Reading and writing spatial formats (Shapefile, GeoJSON, GeoPackage, KML).
- Performing spatial joins (e.g., "which points fall inside this polygon?").
- Coordinate system transformations (reprojecting from Lat/Lon to Meters).
- Geometric analysis (calculating buffers, centroids, convex hulls).
- Thematic mapping (Choropleth maps).
- Calculating spatial relationships (contains, overlaps, touches, within).
- Working with OpenStreetMap data or satellite-derived vector data.
Reference Documentation
Official docs: https://geopandas.org/
Interactive tutorials: https://geopandas.org/en/stable/gallery/index.html
Search patterns: gpd.read_file, gdf.to_crs, gpd.sjoin, gdf.buffer, gdf.explore
Core Principles
The GeoDataFrame
A GeoDataFrame is a pandas.DataFrame that has at least one GeoSeries column (usually named geometry). Each row represents a feature (point, line, or polygon).
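As a minimal sketch of that structure (the city names and coordinates are made-up sample data), the geometry column is a GeoSeries, the other columns remain ordinary pandas Series, and pandas-style filtering still returns a GeoDataFrame:

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical sample data; coordinates are (lon, lat) in EPSG:4326
gdf = gpd.GeoDataFrame(
    {"city": ["Berlin", "Paris"]},
    geometry=[Point(13.40, 52.52), Point(2.35, 48.86)],
    crs="EPSG:4326",
)

# The active geometry column is a GeoSeries
geom_is_geoseries = isinstance(gdf.geometry, gpd.GeoSeries)
# Attribute columns are plain pandas Series
attr_is_plain = isinstance(gdf["city"], pd.Series) and not isinstance(
    gdf["city"], gpd.GeoSeries
)
# Boolean filtering works as in pandas and keeps the GeoDataFrame type
subset = gdf[gdf["city"] == "Berlin"]
print(type(subset).__name__, gdf.geometry.name)
```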
Coordinate Reference Systems (CRS)
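Data without a CRS is just numbers on a grid. To perform real-world calculations (like area in km²), the data must have a CRS defined, and measurements need a projected CRS (e.g., a UTM zone) rather than a geographic one such as WGS84 (EPSG:4326), whose units are degrees.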
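To illustrate why the choice of CRS matters, the sketch below compares two 1° x 1° cells. EPSG:6933 (a global equal-area projection) is an assumption chosen for illustration only; a local projected CRS is usually the better choice for real work.

```python
import geopandas as gpd
from shapely.geometry import box

# Two 1 degree x 1 degree cells: one at the equator, one starting at 60 N
cells = gpd.GeoSeries([box(0, 0, 1, 1), box(0, 60, 1, 61)], crs="EPSG:4326")

# Area in "square degrees": identical for both cells (GeoPandas warns here)
area_deg = cells.area

# Reproject to an equal-area CRS (assumed: EPSG:6933) and convert m^2 to km^2
area_km2 = cells.to_crs(epsg=6933).area / 1e6
ratio = area_km2.iloc[1] / area_km2.iloc[0]
print(area_deg.tolist(), area_km2.round(0).tolist(), round(ratio, 2))
```

Both cells have the same area in degrees, yet on the ground the cell at 60°N covers roughly half the area of the equatorial one.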
Predicates and Set Operations
Spatial analysis relies on binary predicates (intersects, within, contains) and set-theoretic operations (union, intersection, difference).
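The following sketch shows both families of operations on plain Shapely geometries (the shapes are arbitrary toy data). GeoSeries and GeoDataFrame expose the same method names in vectorized form.

```python
from shapely.geometry import Point, box

a = box(0, 0, 2, 2)   # square from (0, 0) to (2, 2)
b = box(1, 1, 3, 3)   # overlaps a in the unit square (1, 1)-(2, 2)
c = box(2, 0, 4, 2)   # shares only the edge x = 2 with a
p = Point(0.5, 0.5)

# Binary predicates return booleans
p_within_a = p.within(a)           # True
a_contains_p = a.contains(p)       # True
a_intersects_b = a.intersects(b)   # True
a_touches_c = a.touches(c)         # True: boundaries meet, interiors do not
a_intersects_c = a.intersects(c)   # True: touching counts as intersecting
a_touches_b = a.touches(b)         # False: the interiors overlap

# Set operations return new geometries
inter_area = a.intersection(b).area   # 1.0
union_area = a.union(b).area          # 4 + 4 - 1 = 7.0
diff_area = a.difference(b).area      # 4 - 1 = 3.0
```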
Quick Reference
Installation
pip install geopandas pyarrow pyproj fiona shapely
Standard Imports
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import Point, LineString, Polygon
Basic Pattern - Load and Plot
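import geopandas as gpd
# Note: gpd.datasets is deprecated in 0.14 and removed in 1.0; on newer versions, pass a file path or URL to gpd.read_file instead
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
europe = world[world.continent == 'Europe']
europe = europe.to_crs(epsg=3035)  # ETRS89-LAEA, an equal-area CRS for Europe
europe.plot(column='pop_est', legend=True, cmap='viridis')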
Critical Rules
✅ DO
- Always check the CRS - Verify gdf.crs before any spatial operation.
- Project for measurements - Use a projected CRS (meters/feet) like UTM before calculating area or distance.
- Use Spatial Indexing - For large datasets, query gdf.sindex directly or use sjoin, which uses the spatial index internally, to speed up queries.
- Validate Geometries - Use gdf.is_valid to find broken polygons (self-intersections).
- Simplify for visualization - Use gdf.simplify() to speed up plotting of complex borders.
- Use .explore() - For quick interactive maps in Jupyter (uses Leaflet/Folium).
❌ DON'T
- Measure Area in Degrees - Never calculate .area on a Lat/Lon CRS (EPSG:4326). The result is in "square degrees", which has no consistent real-world size.
- Iterate with loops - Avoid looping over rows; use vectorized spatial operations.
- Ignore Topology - "touches" and "intersects" are different: touches is true only when geometries share boundary points but no interior points, while intersects is true whenever they share any point at all.
- Forget to set the Active Geometry - If a GeoDataFrame has multiple geometry columns, specify which one to use via gdf.set_geometry().
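The active-geometry rule can be sketched as follows. The column name buffer_100 and the point coordinates are hypothetical; the point is that methods such as .area always operate on whichever column is currently active.

```python
import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[Point(0, 0), Point(10, 0)],
    crs="EPSG:3857",
)
# Add a second geometry column (hypothetical name)
gdf["buffer_100"] = gdf.buffer(100)

active_before = gdf.geometry.name   # "geometry"
area_before = gdf.area.sum()        # points have zero area

# set_geometry returns a new GeoDataFrame with a different active column
gdf_buf = gdf.set_geometry("buffer_100")
active_after = gdf_buf.geometry.name
area_after = gdf_buf.area.sum()     # now measures the buffer polygons
```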
Anti-Patterns (NEVER)
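# BAD: Euclidean distance on raw lon/lat coordinates (the result is in degrees)
def dist(p1, p2):
    return np.sqrt((p1.x - p2.x)**2 + (p1.y - p2.y)**2)

# GOOD: project to a metric CRS, then use the vectorized distance method.
# other_point must be in the same CRS. Web Mercator distorts distances away
# from the equator, so prefer a local CRS such as UTM when accuracy matters.
gdf = gdf.to_crs(epsg=3857)
distances = gdf.distance(other_point)

# BAD: nested Python loops test every polygon against every point
for i, poly in countries.iterrows():
    for j, pt in cities.iterrows():
        if poly.geometry.contains(pt.geometry):
            print("Found")

# GOOD: a single spatial join, which uses the spatial index internally
cities_with_country = gpd.sjoin(cities, countries, predicate='within')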
Geometry Creation and Manipulation
Creating from Coordinates
df = pd.DataFrame({'City': ['NY', 'London'], 'Lat': [40.7, 51.5], 'Lon': [-74.0, -0.1]})
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.Lon, df.Lat))
gdf.set_crs(epsg=4326, inplace=True)
Geometric Operations
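gdf['centroid'] = gdf.centroid   # stored as an extra column; 'geometry' stays active
gdf['envelope'] = gdf.envelope   # axis-aligned bounding rectangles
# Buffer in a metric CRS: 500 means 500 m in UTM zone 33N
stations_buffered = metro_stations.to_crs(epsg=32633).buffer(500)
# Merge all geometries into one shape (GeoPandas >= 1.0; on 0.14 use gdf.unary_union)
merged = gdf.union_all()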
Spatial Queries
Spatial Joins (sjoin)
schools_in_districts = gpd.sjoin(schools, districts, how="inner", predicate="within")
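A small runnable sketch of the same join on toy data (district and school names are invented) shows how the how argument changes the result: an inner join drops points that match no polygon, while a left join keeps them with missing attributes.

```python
import geopandas as gpd
from shapely.geometry import Point, box

districts = gpd.GeoDataFrame(
    {"district": ["West", "East"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs="EPSG:3857",
)
schools = gpd.GeoDataFrame(
    {"school": ["A", "B", "C"]},
    # School C lies outside both districts
    geometry=[Point(2, 5), Point(15, 5), Point(50, 50)],
    crs="EPSG:3857",
)

# Inner join: only schools that fall within some district
inner = gpd.sjoin(schools, districts, how="inner", predicate="within")
# Left join: every school is kept; unmatched rows get NaN attributes
left = gpd.sjoin(schools, districts, how="left", predicate="within")

print(inner[["school", "district"]])
print(left[["school", "district"]])
```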
Overlays (Set Operations)
forest_in_park = gpd.overlay(forests, parks, how='intersection')
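To make the how options concrete, here is a sketch with two overlapping squares (toy data with hypothetical column names). Unlike sjoin, overlay cuts the geometries themselves, so the areas change.

```python
import geopandas as gpd
from shapely.geometry import box

forests = gpd.GeoDataFrame(
    {"forest_id": [1]}, geometry=[box(0, 0, 2, 2)], crs="EPSG:3857"
)
parks = gpd.GeoDataFrame(
    {"park_id": [7]}, geometry=[box(1, 1, 3, 3)], crs="EPSG:3857"
)

# Part of the forest that lies inside the park (attributes from both layers)
inter = gpd.overlay(forests, parks, how="intersection")
# Part of the forest that lies outside the park
diff = gpd.overlay(forests, parks, how="difference")
# All pieces: shared part, forest-only part, park-only part
union = gpd.overlay(forests, parks, how="union")

print(inter.area.sum(), diff.area.sum(), len(union))
```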
Coordinate Reference Systems (CRS)
Reprojection
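print(gdf.crs)                       # inspect the current CRS first
gdf_meters = gdf.to_crs(epsg=3857)   # reproject to Web Mercator (units: meters)
gdf_2 = gdf_2.to_crs(gdf_1.crs)      # align one layer to another layer's CRS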
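When no suitable projected CRS is known in advance, one option is to let GeoPandas suggest a UTM zone with estimate_utm_crs(). The sketch below uses two made-up points near Berlin; treat it as an illustration rather than a recommendation for every dataset, since UTM is only accurate within a single zone.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical points near Berlin, given as (lon, lat)
gdf = gpd.GeoDataFrame(
    {"name": ["p1", "p2"]},
    geometry=[Point(13.40, 52.52), Point(13.45, 52.50)],
    crs="EPSG:4326",
)

# Pick the UTM zone that covers the data, then reproject to it
utm = gdf.estimate_utm_crs()
gdf_utm = gdf.to_crs(utm)

# Distances are now in meters
dist_m = gdf_utm.geometry.iloc[0].distance(gdf_utm.geometry.iloc[1])

# Reprojecting back recovers the original lon/lat values
back = gdf_utm.to_crs(gdf.crs)
print(utm.to_epsg(), round(dist_m))
```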
Visualization
Static and Interactive Maps
fig, ax = plt.subplots(figsize=(10, 10))
base = countries.plot(ax=ax, color='white', edgecolor='black')
cities.plot(ax=base, marker='o', color='red', markersize=5)
cities.explore(column='population', cmap='magma', m=None)
Practical Workflows
1. Proximity Analysis (Point-in-Buffer)
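def find_entities_near_road(roads, entities, distance_m=1000):
    """Find all entities within distance_m meters (default 1 km) of any road."""
    # Web Mercator keeps this simple; a local projected CRS (e.g., UTM) measures more accurately
    roads_m = roads.to_crs(epsg=3857)
    entities_m = entities.to_crs(epsg=3857)
    road_buffer = roads_m.buffer(distance_m)
    buffer_gdf = gpd.GeoDataFrame(geometry=road_buffer, crs=roads_m.crs)
    nearby = gpd.sjoin(entities_m, buffer_gdf, predicate='within')
    # An entity inside several overlapping buffers appears once per buffer; keep one row each
    return nearby[~nearby.index.duplicated()]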
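An alternative that avoids building buffer polygons is gpd.sjoin_nearest with max_distance, which also reports the distance to the matched road. This is a different technique from the buffer workflow above, sketched here on toy data that is already in a metric CRS (names and coordinates are invented).

```python
import geopandas as gpd
from shapely.geometry import LineString, Point

# Toy data in UTM zone 33N, so coordinates are meters
roads = gpd.GeoDataFrame(
    {"road": ["R1"]},
    geometry=[LineString([(0, 0), (10_000, 0)])],
    crs="EPSG:32633",
)
entities = gpd.GeoDataFrame(
    {"name": ["near", "far"]},
    # 500 m and 5000 m away from the road, respectively
    geometry=[Point(5_000, 500), Point(5_000, 5_000)],
    crs="EPSG:32633",
)

# Keep only entities whose nearest road is at most 1000 m away
nearby = gpd.sjoin_nearest(
    entities, roads, how="inner", max_distance=1000, distance_col="dist_m"
)
print(nearby[["name", "road", "dist_m"]])
```

If two roads are exactly equidistant from an entity, sjoin_nearest returns one row per tie, so duplicates may still need handling.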
2. Clipping Data to a Boundary
def clip_data(data, boundary):
    """Clip a large vector dataset to a specific boundary polygon."""
    return gpd.clip(data, boundary)
3. Calculating Percentage Area Coverage
def calculate_land_use_pct(region, land_use_layer):
    """Calculate what % of 'region' is covered by each land use type."""
    # Assumes region is in a projected CRS (so areas are meaningful)
    # and that land_use_layer has a 'class' column
    land_use_layer = land_use_layer.to_crs(region.crs)
    intersections = gpd.overlay(land_use_layer, region, how='intersection')
    intersections['area'] = intersections.area
    total_area = region.area.sum()
    return intersections.groupby('class')['area'].sum() / total_area * 100
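A worked example with numbers that can be checked by hand may help. The sketch repeats the same overlay and groupby steps on a 10 x 10 region with two invented land use rectangles, one of which extends past the region boundary.

```python
import geopandas as gpd
from shapely.geometry import box

crs = "EPSG:32633"  # any projected CRS works; coordinates below are meters
region = gpd.GeoDataFrame({"region": ["R"]}, geometry=[box(0, 0, 10, 10)], crs=crs)
land_use = gpd.GeoDataFrame(
    {"class": ["forest", "urban"]},
    # forest: 10 x 4, fully inside; urban: 10 x 4, half of it outside the region
    geometry=[box(0, 0, 10, 4), box(0, 8, 10, 12)],
    crs=crs,
)

# Cut the land use polygons to the region, then sum area per class
pieces = gpd.overlay(land_use, region, how="intersection")
pieces["area"] = pieces.area
pct = pieces.groupby("class")["area"].sum() / region.area.sum() * 100
print(pct)
```

The forest covers 40 of 100 square meters (40%), and only the 20 square meters of urban land inside the region are counted (20%).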
Performance Optimization
Using Spatial Index (sindex)
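# Step 1: cheap bounding-box lookup through the spatial index
spatial_index = countries.sindex
possible_matches_index = list(spatial_index.intersection(target_point.bounds))
possible_matches = countries.iloc[possible_matches_index]
# Step 2: exact geometric test on the few remaining candidates
precise_match = possible_matches[possible_matches.intersects(target_point)]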
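The same two-step filter can also be expressed with sindex.query, which accepts a geometry directly and can apply the exact predicate itself. The grid below is synthetic stand-in data for a large polygon layer.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# A 10 x 10 grid of unit squares standing in for a large polygon layer
cells = gpd.GeoDataFrame(
    {"cell": range(100)},
    geometry=[box(x, y, x + 1, y + 1) for y in range(10) for x in range(10)],
)
target = Point(3.5, 7.5)

# Bounding-box candidates only (fast, may include false positives)
candidate_idx = cells.sindex.query(target)
# Let the index apply the exact predicate as well
exact_idx = cells.sindex.query(target, predicate="intersects")
match = cells.iloc[exact_idx]
print(match["cell"].tolist())
```

Both calls return integer positions, which is why the result is passed to iloc rather than loc.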
Reading Large Files (Parquet)
gdf.to_parquet("large_data.parquet")
gdf = gpd.read_parquet("large_data.parquet")
Common Pitfalls and Solutions
CRS Mismatch
if cities.crs != districts.crs:
    cities = cities.to_crs(districts.crs)
Invalid Geometries (Self-intersections)
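invalid = gdf[~gdf.is_valid]   # inspect the rows with broken geometries
# buffer(0) is a common quick fix; check the result, since it can alter or drop parts of badly broken polygons
gdf['geometry'] = gdf['geometry'].buffer(0)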
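As a hedged alternative to buffer(0), GeoSeries.make_valid() (available in GeoPandas 0.12 and later) repairs geometries while keeping their full extent. The sketch uses a classic self-intersecting "bowtie" polygon as test data.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# A "bowtie": the ring crosses itself at (1, 1)
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
gdf = gpd.GeoDataFrame({"id": [1]}, geometry=[bowtie])

# Count invalid rows before repairing
n_invalid = int((~gdf.is_valid).sum())

# Repair: the bowtie is split into two valid triangles of area 1 each
repaired = gdf.geometry.make_valid()
repaired_area = repaired.area.iloc[0]
print(n_invalid, repaired.geom_type.iloc[0], repaired_area)
```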
Memory Exhaustion with Buffers
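# resolution = segments per quarter circle (default 16); fewer segments mean fewer vertices and less memory
buffered = gdf.buffer(100, resolution=4)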
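The trade-off can be measured on a single point. The sketch below compares vertex counts and area error for the default resolution and a coarser one; the 100 m radius and the CRS are arbitrary choices for illustration.

```python
import math

import geopandas as gpd
from shapely.geometry import Point

pts = gpd.GeoSeries([Point(0, 0)], crs="EPSG:32633")

fine = pts.buffer(100)                  # default resolution=16
coarse = pts.buffer(100, resolution=4)  # 4 segments per quarter circle

# Vertex counts of the resulting polygons
n_fine = len(fine.iloc[0].exterior.coords)
n_coarse = len(coarse.iloc[0].exterior.coords)

# Relative area shortfall compared with a true circle of radius 100
true_area = math.pi * 100 ** 2
err_fine = 1 - fine.area.iloc[0] / true_area
err_coarse = 1 - coarse.area.iloc[0] / true_area
print(n_fine, n_coarse, round(err_fine, 4), round(err_coarse, 4))
```

The coarse buffer has far fewer vertices at the cost of a slightly smaller area, which is often acceptable for proximity filters but may not be for precise area reporting.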
GeoPandas bridges the gap between traditional GIS software and the Python data science stack. It makes spatial analysis as easy as writing a line of Pandas code.