| name | geopandas |
| description | Python library for working with geospatial vector data including shapefiles, GeoJSON, and GeoPackage files. Use when working with geographic data for spatial analysis, geometric operations, coordinate transformations, spatial joins, overlay operations, choropleth mapping, or any task involving reading/writing/analyzing vector geographic data. Supports PostGIS databases, interactive maps, and integration with matplotlib/folium/cartopy. Use for tasks like buffer analysis, spatial joins between datasets, dissolving boundaries, clipping data, calculating areas/distances, reprojecting coordinate systems, creating maps, or converting between spatial file formats. |
| license | BSD-3-Clause license |
| metadata | {"skill-author":"K-Dense Inc."} |
GeoPandas
GeoPandas extends pandas to enable spatial operations on geometric types. It combines the capabilities of pandas and shapely for geospatial data analysis.
Installation
uv pip install geopandas
Optional Dependencies
uv pip install folium
uv pip install mapclassify
uv pip install pyarrow
uv pip install psycopg2
uv pip install geoalchemy2
uv pip install contextily
uv pip install cartopy
Quick Start
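import geopandas as gpd
# Read a vector file (GeoJSON here) into a GeoDataFrame
gdf = gpd.read_file("data.geojson")
print(gdf.head())
print(gdf.crs)
print(gdf.geometry.geom_type)
gdf.plot()
# Reproject before measuring. EPSG:3857 inflates areas away from the equator,
# so prefer a local or equal-area projected CRS when accuracy matters
gdf_projected = gdf.to_crs("EPSG:3857")
gdf_projected['area'] = gdf_projected.geometry.area
gdf.to_file("output.gpkg")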
Core Concepts
Data Structures
- GeoSeries: Vector of geometries with spatial operations
- GeoDataFrame: Tabular data structure with geometry column
See data-structures.md for details.
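As a minimal sketch of how the two structures relate, the following builds a GeoDataFrame from in-memory points and pulls out its GeoSeries. The names, coordinates, and the `cities` variable are made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point

# Illustrative data: names and coordinates are placeholders, not a real dataset.
cities = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"], "population": [100, 250, 75]},
    geometry=[Point(0, 0), Point(1, 1), Point(2, 0.5)],
    crs="EPSG:4326",
)

# The active geometry column is a GeoSeries; the other columns are ordinary pandas Series.
geoms = cities.geometry
print(type(geoms).__name__)
print(cities.total_bounds)  # [minx, miny, maxx, maxy]

# Ordinary pandas filtering keeps the GeoDataFrame type.
large = cities[cities["population"] > 90]
print(len(large))
```

Because filtering returns a GeoDataFrame, spatial methods remain available on `large` without any conversion.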
Reading and Writing Data
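GeoPandas reads and writes multiple formats. File formats such as Shapefile, GeoJSON, and GeoPackage go through read_file/to_file; PostGIS and Parquet use dedicated functions (read_postgis/to_postgis, read_parquet/to_parquet).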
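# Load only features that intersect a bounding box (xmin, ymin, xmax, ymax define your extent)
gdf = gpd.read_file("data.gpkg", bbox=(xmin, ymin, xmax, ymax))
# Arrow-based writing; needs the pyogrio engine with pyarrow installed
gdf.to_file("output.gpkg", use_arrow=True)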
See data-io.md for comprehensive I/O operations.
Coordinate Reference Systems
Always check and manage CRS for accurate spatial operations:
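# Inspect the current CRS
print(gdf.crs)
# Reproject: transforms the coordinates into the target CRS
gdf_projected = gdf.to_crs("EPSG:3857")
# Assign a CRS to data that has none; this labels the coordinates without transforming them
gdf = gdf.set_crs("EPSG:4326")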
See crs-management.md for CRS operations.
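To illustrate why the choice of CRS matters, the sketch below labels a small cell with set_crs, reprojects it with to_crs, and compares its area in Web Mercator against an equal-area projection. The cell location and the use of EPSG:6933 as the equal-area reference are assumptions made for the example.

```python
import geopandas as gpd
from shapely.geometry import box

# A 0.1 x 0.1 degree cell at 60 degrees north (illustrative geometry).
cell = gpd.GeoSeries([box(10.0, 60.0, 10.1, 60.1)])

# set_crs only labels the coordinates; it does not move them.
cell = cell.set_crs("EPSG:4326")

# to_crs transforms the coordinates into the target CRS.
web_mercator = cell.to_crs("EPSG:3857")
equal_area = cell.to_crs("EPSG:6933")  # assumption: equal-area CRS chosen for comparison

ratio = web_mercator.area.iloc[0] / equal_area.area.iloc[0]
print(f"Web Mercator area / equal-area area: {ratio:.2f}")
```

At this latitude the Web Mercator figure is roughly four times the equal-area one, so EPSG:3857 is a poor basis for area statistics outside the tropics.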
Common Operations
Geometric Operations
Buffer, simplify, centroid, convex hull, affine transformations:
# Distances and tolerances are expressed in CRS units (meters in most projected CRS)
buffered = gdf.geometry.buffer(10)
simplified = gdf.geometry.simplify(tolerance=5, preserve_topology=True)
centroids = gdf.geometry.centroid
See geometric-operations.md for all operations.
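The operations listed above can be tried on in-memory geometries. This is a sketch with made-up shapes; EPSG:32633 is only an assumed stand-in for any meter-based projected CRS.

```python
import math

import geopandas as gpd
from shapely.geometry import Point, Polygon

# Illustrative geometries in an assumed meter-based projected CRS.
shapes = gpd.GeoSeries(
    [Point(0, 0), Polygon([(0, 0), (100, 0), (100, 100), (0, 100)])],
    crs="EPSG:32633",
)

buffered = shapes.buffer(10)    # every geometry grown by 10 m
centroids = shapes.centroid     # one point per geometry
hulls = shapes.convex_hull      # smallest convex geometry around each input

# A buffered point approximates a circle, so its area is close to pi * r**2.
print(buffered.area.iloc[0], math.pi * 10**2)
```

The buffer is a polygon approximation of a circle, so its area falls slightly below the exact value.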
Spatial Analysis
Spatial joins, overlay operations, dissolve:
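# Attribute join driven by a spatial predicate
joined = gpd.sjoin(gdf1, gdf2, predicate='intersects')
# Match each feature to its nearest neighbor within 1000 CRS units
nearest = gpd.sjoin_nearest(gdf1, gdf2, max_distance=1000)
# Geometric intersection of two layers
intersection = gpd.overlay(gdf1, gdf2, how='intersection')
# Merge features that share a 'region' value, summing the other columns
dissolved = gdf.dissolve(by='region', aggfunc='sum')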
See spatial-analysis.md for analysis operations.
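As a sketch of dissolve and overlay on toy data, the example below merges zones by region and then cuts them to a study area. The layers, column names, and CRS are assumptions chosen so the results are easy to check by hand.

```python
import geopandas as gpd
from shapely.geometry import box

# Two placeholder layers in the same (assumed) projected CRS.
zones = gpd.GeoDataFrame(
    {"region": ["north", "north", "south"], "population": [10, 20, 5]},
    geometry=[box(0, 1, 1, 2), box(1, 1, 2, 2), box(0, 0, 2, 1)],
    crs="EPSG:32633",
)
study_area = gpd.GeoDataFrame(
    {"name": ["study"]}, geometry=[box(0.5, 0.5, 1.5, 1.5)], crs="EPSG:32633"
)

# Merge zones that share a region value and sum their numeric attribute.
regions = zones.dissolve(by="region", aggfunc="sum")

# Keep only the parts of each zone that fall inside the study area.
clipped = gpd.overlay(zones, study_area, how="intersection")

print(regions["population"].to_dict())
print(clipped.area.sum())
```

The overlay result carries attributes from both inputs, so each clipped piece still knows its region and the study area it belongs to.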
Visualization
Create static and interactive maps:
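# Static choropleth with matplotlib
gdf.plot(column='population', cmap='YlOrRd', legend=True)
# Interactive map (requires folium), saved as HTML
gdf.explore(column='population', legend=True).save('map.html')
# Draw several layers on one matplotlib axes
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
gdf1.plot(ax=ax, color='blue')
gdf2.plot(ax=ax, color='red')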
See visualization.md for mapping techniques.
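Detailed Documentation
- data-structures.md: GeoSeries and GeoDataFrame
- data-io.md: reading and writing data
- crs-management.md: CRS operations
- geometric-operations.md: geometric operations
- spatial-analysis.md: spatial joins, overlay, dissolve
- visualization.md: mapping techniques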
Common Workflows
Load, Transform, Analyze, Export
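# Load and inspect
gdf = gpd.read_file("data.shp")
print(gdf.crs)
# Reproject to a meter-based CRS before measuring (EPSG:3857 inflates areas away from the equator)
gdf = gdf.to_crs("EPSG:3857")
gdf['area'] = gdf.geometry.area
# Buffer a copy by 100 CRS units so the original geometries are preserved
buffered = gdf.copy()
buffered['geometry'] = gdf.geometry.buffer(100)
# Write both versions as layers of one GeoPackage
gdf.to_file("results.gpkg", layer='original')
buffered.to_file("results.gpkg", layer='buffered')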
Spatial Join and Aggregate
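# Tag each point with the index of the polygon that contains it
points_in_polygons = gpd.sjoin(points_gdf, polygons_gdf, predicate='within')
# Named aggregation: sum of 'value' and number of points per polygon
aggregated = points_in_polygons.groupby('index_right').agg(
    value=('value', 'sum'),
    count=('value', 'size'),
)
# Inner merge on the polygon index; polygons without points are dropped
result = polygons_gdf.merge(aggregated, left_index=True, right_index=True)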
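A self-contained sketch of the join-then-aggregate pattern, using toy data so the numbers can be verified by eye. The `zone` and `value` columns, the output column names, and the choice of a left join (to keep empty polygons) are assumptions for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Placeholder polygons and points sharing one CRS.
polygons_gdf = gpd.GeoDataFrame(
    {"zone": ["a", "b", "c"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:32633",
)
points_gdf = gpd.GeoDataFrame(
    {"value": [1.0, 2.0, 4.0]},
    geometry=[Point(0.5, 0.5), Point(0.2, 0.8), Point(1.5, 0.5)],
    crs="EPSG:32633",
)

joined = gpd.sjoin(points_gdf, polygons_gdf, predicate="within")

# Named aggregation: total of `value` and number of points per polygon.
stats = joined.groupby("index_right").agg(
    value_sum=("value", "sum"),
    point_count=("value", "size"),
)

# A left join keeps polygons that contain no points; fill their stats with 0.
result = polygons_gdf.join(stats)
result[["value_sum", "point_count"]] = result[["value_sum", "point_count"]].fillna(0)
print(result[["zone", "value_sum", "point_count"]])
```

Joining on the polygon index keeps the polygon geometries, so `result` can be plotted as a choropleth directly.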
Multi-Source Data Integration
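roads = gpd.read_file("roads.shp")
buildings = gpd.read_file("buildings.geojson")
# engine: an SQLAlchemy engine connected to the PostGIS database, created beforehand
parcels = gpd.read_postgis("SELECT * FROM parcels", con=engine, geom_col='geom')
# Bring every layer into the same CRS before combining them
buildings = buildings.to_crs(roads.crs)
parcels = parcels.to_crs(roads.crs)
# 50 is in units of roads.crs, so use a meter-based projected CRS; union_all() needs GeoPandas 1.0+
buildings_near_roads = buildings[buildings.geometry.distance(roads.union_all()) < 50]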
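An alternative way to express the distance filter is sjoin_nearest with max_distance, which also reports which road was closest. This is a different technique from the union-and-distance approach above, shown here as a sketch on invented layers in an assumed meter-based CRS.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point

# Placeholder layers in an assumed meter-based CRS.
roads = gpd.GeoDataFrame(
    {"road": ["main", "side"]},
    geometry=[LineString([(0, 0), (1000, 0)]), LineString([(0, 500), (1000, 500)])],
    crs="EPSG:32633",
)
buildings = gpd.GeoDataFrame(
    {"building": ["near", "far"]},
    geometry=[Point(100, 30), Point(500, 250)],
    crs="EPSG:32633",
)

# Keep buildings whose nearest road is at most 50 m away and record that distance.
near_roads = gpd.sjoin_nearest(
    buildings, roads, max_distance=50, distance_col="dist_m"
)
print(near_roads[["building", "road", "dist_m"]])
```

A building that is equally close to two roads appears once per road in the result, so deduplicate on the index if one row per building is needed.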
Performance Tips
- Use spatial indexing: GeoPandas creates spatial indexes automatically for most operations
- Filter during read: Use bbox, mask, or where parameters to load only needed data
- Use Arrow for I/O: Add use_arrow=True for 2-4x faster reading/writing
- Simplify geometries: Use .simplify() to reduce complexity when precision isn't critical
- Batch operations: Vectorized operations are much faster than iterating rows
- Use appropriate CRS: Projected CRS for area/distance, geographic for visualization
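The spatial-index and vectorization tips can be sketched as follows. The grid of points and the query window are made up; the example queries the index directly and then shows that the vectorized predicate selects the same rows without a Python loop.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# A 10 x 10 grid of placeholder points.
grid = gpd.GeoDataFrame(
    geometry=[Point(x, y) for x in range(10) for y in range(10)],
    crs="EPSG:32633",
)
window = box(2.5, 2.5, 5.5, 5.5)

# Query the spatial index for matching rows, then select them by position.
hits = grid.sindex.query(window, predicate="intersects")
selected = grid.iloc[hits]

# The vectorized predicate gives the same rows without iterating over the frame.
same = grid[grid.intersects(window)]
print(len(selected), len(same))
```

Querying the index directly is mainly useful when the same layer is probed many times; for one-off filters the vectorized form is usually simpler.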
Best Practices
- Always check CRS before spatial operations
- Use projected CRS for area and distance calculations
- Match CRS before spatial joins or overlays
- Validate geometries with .is_valid before operations
- Use .copy() when modifying geometry columns to avoid side effects
- Preserve topology when simplifying for analysis
- Use GeoPackage format for modern workflows: it is a single file and avoids Shapefile's 10-character field names and 2 GB size limit
- Set max_distance in sjoin_nearest for better performance
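The validation and copy practices above might look like this in code. It is a sketch that assumes a GeoPandas version providing GeoSeries.make_valid(); the parcel shapes and names are invented.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# A self-intersecting "bowtie" ring next to a valid square (illustrative shapes).
parcels = gpd.GeoDataFrame(
    {"parcel": ["bowtie", "square"]},
    geometry=[
        Polygon([(0, 0), (1, 1), (1, 0), (0, 1)]),
        Polygon([(2, 0), (3, 0), (3, 1), (2, 1)]),
    ],
    crs="EPSG:32633",
)
print(parcels.is_valid.tolist())

# Work on a copy so the original geometries stay untouched.
repaired = parcels.copy()
repaired["geometry"] = repaired.geometry.make_valid()
print(repaired.is_valid.tolist())
```

Repairing can change the geometry type (here a polygon becomes a multipolygon), so check geom_type afterwards if later steps expect a single type.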