| name | numpy |
| description | This skill should be used when the user asks to "use NumPy", "write NumPy code", "optimize NumPy arrays", "vectorize with NumPy", or needs guidance on NumPy best practices, array operations, broadcasting, memory management, or scientific computing with Python. |
NumPy Best Practices
NumPy is the fundamental package for scientific computing with Python. It provides N-dimensional array objects, vectorized math operations, broadcasting, linear algebra, Fourier transforms, and random number generation. This skill covers best practices for writing correct, efficient, and maintainable NumPy code.
Import Convention
Always import NumPy with the standard alias:
import numpy as np
Never use from numpy import * — it pollutes the namespace and makes code harder to read.
Array Creation
Choose the right creation function
| Use case | Function |
|---|---|
| Known values | np.array([1, 2, 3]) |
| Zeros | np.zeros(shape) |
| Ones | np.ones(shape) |
| Uninitialized (fill later) | np.empty(shape) |
| Integer range | np.arange(start, stop, step) |
| Evenly spaced floats | np.linspace(start, stop, num) |
| Identity matrix | np.eye(n) |
| Like existing array | np.zeros_like(arr), np.ones_like(arr) |
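As a rough illustration of the table above, the sketch below contrasts np.arange and np.linspace for float ranges and shows np.empty being filled before use. The variable names are illustrative only.

```python
import numpy as np

# With a float step, the length of np.arange depends on floating-point
# rounding, so prefer it for integer ranges.
stepped = np.arange(0.0, 1.0, 0.1)

# np.linspace takes the number of points directly and includes the endpoint.
spaced = np.linspace(0.0, 1.0, num=11)

# np.empty returns uninitialized memory: write every element before reading.
buffer = np.empty((2, 3))
buffer[:] = 7.0

# zeros_like copies the shape and dtype of an existing array.
template = np.zeros_like(spaced)

print(spaced.size, template.shape, template.dtype)
```

When the number of points matters more than the exact step, np.linspace is usually the safer choice.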
Specify dtype explicitly
Always specify dtype when the intended type differs from NumPy's default (float64 for floats; the platform integer for integers, which is int64 on most systems but int32 on Windows before NumPy 2.0):
weights = np.ones(1000, dtype=np.float32)
indices = np.arange(100, dtype=np.int32)
flags = np.zeros(50, dtype=np.bool_)
Do not rely on implicit upcasting — declare the dtype the data actually needs.
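To make the cost of the default dtype concrete, this small sketch compares the memory footprint of the same array stored as float64 and as float32. The array size is arbitrary.

```python
import numpy as np

n = 1_000_000
default_weights = np.ones(n)                    # float64 by default
compact_weights = np.ones(n, dtype=np.float32)  # half the memory

print(default_weights.nbytes)  # 8 bytes per element
print(compact_weights.nbytes)  # 4 bytes per element
```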
Use np.random.default_rng() for random numbers
The legacy np.random.* functions (e.g., np.random.rand) rely on hidden global state and are frozen: they remain available but receive no new features. Prefer the Generator API for new code:
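# Preferred: Generator API with an explicit, local generator
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
integers = rng.integers(0, 10, size=50)
# Legacy (avoid in new code): seeds hidden global state
np.random.seed(42)
samples = np.random.randn(100, 3)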
Pass seed to default_rng for reproducibility in tests and experiments.
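A minimal sketch of what reproducibility means in practice: two generators built from the same seed yield identical draws, while a different seed does not. The helper make_samples is hypothetical and exists only for this illustration.

```python
import numpy as np

def make_samples(seed):
    """Hypothetical helper: draw five normal samples from a fresh generator."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=5)

first = make_samples(42)
second = make_samples(42)
other = make_samples(43)

print(np.array_equal(first, second))  # same seed, same stream
print(np.array_equal(first, other))   # different seed, different stream
```

Passing the generator (or the seed) into functions explicitly, instead of relying on global state, keeps tests independent of each other.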
Vectorization Over Loops
Replace Python loops with vectorized NumPy operations wherever possible. NumPy operations execute in optimized C code, making them orders of magnitude faster.
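# Slow: Python-level loop, one interpreter step per element
result = []
for x in data:
    result.append(x ** 2 + 2 * x + 1)
result = np.array(result)
# Fast: the same polynomial evaluated on the whole array at once
result = data ** 2 + 2 * data + 1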
Use np.vectorize only as a convenience wrapper for scalar functions — it does not improve performance since it still calls Python per element.
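The following sketch shows one way to replace a branching scalar function with a true array expression. The function clip_negative_scalar is a made-up example; np.where stands in for the if/else.

```python
import numpy as np

def clip_negative_scalar(x):
    """Hypothetical scalar function: square positives, zero out the rest."""
    return x * x if x > 0 else 0.0

data = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

# Convenience wrapper: still one Python call per element.
wrapped = np.vectorize(clip_negative_scalar)(data)

# Array expression: the branch becomes a mask and the arithmetic runs in C.
vectorized = np.where(data > 0, data * data, 0.0)

print(np.array_equal(wrapped, vectorized))
```

Note that np.where evaluates both branches for every element, so it suits cheap, side-effect-free expressions.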
Broadcasting Rules
Broadcasting allows operations on arrays of different shapes without copying data. Apply broadcasting instead of explicit tile or repeat calls.
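Broadcasting rules (shapes are right-aligned and compared dimension by dimension, starting from the trailing one; missing leading dimensions are treated as 1):
- Dimensions are equal: compatible.
- One dimension is 1: that dimension is stretched.
- Otherwise: ValueError.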
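The rules above can be checked without allocating any data by using np.broadcast_shapes (available since NumPy 1.20). The shapes below are arbitrary examples.

```python
import numpy as np

# Shapes are right-aligned; a missing leading dimension counts as 1.
row_bias = np.broadcast_shapes((4, 3), (3,))    # (4, 3)
outer = np.broadcast_shapes((3, 1), (1, 5))     # (3, 5)

# Trailing dimensions 3 and 4 are neither equal nor 1: incompatible.
error_message = None
try:
    np.broadcast_shapes((4, 3), (4,))
except ValueError as exc:
    error_message = str(exc)

print(row_bias, outer)
print(error_message is not None)
```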
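# Add a per-column bias to every row: (4, 3) + (3,) -> (4, 3)
matrix = np.zeros((4, 3))
bias = np.array([1.0, 2.0, 3.0])
result = matrix + bias
# Build a table of pairwise sums: (3, 1) + (3,) -> (3, 3)
a = np.array([0.0, 10.0, 20.0])
b = np.array([1.0, 2.0, 3.0])
outer = a[:, np.newaxis] + b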
Avoid broadcasting that produces very large intermediate arrays. In memory-constrained cases, process the data in chunks with an explicit loop so that only one slice of the intermediate exists at a time.
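One possible way to bound the size of a broadcast intermediate is to loop over blocks of rows. The sketch below assumes a nearest-center distance problem; the function name nearest_distance and its chunk_size parameter are hypothetical.

```python
import numpy as np

def nearest_distance(points, centers, chunk_size=1024):
    """Distance from each point to its nearest center, computed in chunks.

    A full broadcast would allocate (len(points), len(centers), dim) values;
    chunking caps the first dimension at chunk_size.
    """
    out = np.empty(len(points), dtype=np.float64)
    for start in range(0, len(points), chunk_size):
        block = points[start:start + chunk_size]
        diff = block[:, np.newaxis, :] - centers[np.newaxis, :, :]
        out[start:start + chunk_size] = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return out

rng = np.random.default_rng(0)
points = rng.normal(size=(5000, 3))
centers = rng.normal(size=(10, 3))
chunked = nearest_distance(points, centers, chunk_size=512)
print(chunked.shape)
```

Each iteration is still fully vectorized, so the loop runs only len(points) / chunk_size times.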
Views vs Copies
Basic indexing (slices) returns a view — modifying it modifies the original:
x = np.arange(10)
y = x[2:5]
y[0] = 99  # x[2] is now 99 as well
Advanced indexing (integer arrays, boolean masks) returns a copy:
x = np.arange(10)
idx = [1, 3, 5]
y = x[idx]  # independent copy; modifying y leaves x unchanged
Check ownership with arr.base:
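x = np.arange(10)
x[2:5].base is x           # True: the slice is a view of x
x[[1, 3, 5]].base is None  # True: the indexed result owns its data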
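As a complement to arr.base, np.shares_memory answers the view-or-copy question directly for any pair of arrays. A short sketch with illustrative variable names:

```python
import numpy as np

x = np.arange(10)
view = x[2:5]            # basic slice
picked = x[[1, 3, 5]]    # integer-array (advanced) index

print(np.shares_memory(x, view))    # True
print(np.shares_memory(x, picked))  # False

view[0] = 99     # writes through to x
picked[0] = -1   # does not touch x
print(x[2], x[1])  # 99 1
```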
When to force a copy
Call .copy() explicitly when an independent array is needed:
backup = original.copy()
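Use .ravel() (view when possible) over .flatten() (always copies) when write access to the parent is acceptable. reshape(-1) behaves like .ravel(): both return a view when the memory layout allows it and a copy otherwise (for example, after a transpose), so confirm with np.shares_memory when a view is required.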
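The difference between the flattening methods shows up once the input is not contiguous. This sketch uses a small transposed array as the non-contiguous case.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)

flat_view = grid.ravel()          # contiguous input: a view
flat_copy = grid.flatten()        # always a copy
transposed_flat = grid.T.ravel()  # non-contiguous input: ravel has to copy

print(np.shares_memory(grid, flat_view))        # True
print(np.shares_memory(grid, flat_copy))        # False
print(np.shares_memory(grid, transposed_flat))  # False
```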
Indexing and Selection
Boolean indexing for filtering
arr = np.array([1, -2, 3, -4, 5])
positive = arr[arr > 0]  # array([1, 3, 5]), a new array
arr[arr < 0] = 0         # in-place: arr is now [1, 0, 3, 0, 5]
np.where for conditional selection
cleaned = np.where(arr > 0, arr, 0)
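A brief sketch of how np.where differs from mask assignment: np.where builds a new array and leaves its input alone, while assigning through a boolean mask mutates the array. np.flatnonzero is used here to recover the positions of the matching elements.

```python
import numpy as np

arr = np.array([1, -2, 3, -4, 5])

cleaned = np.where(arr > 0, arr, 0)           # new array; arr is unchanged
negative_positions = np.flatnonzero(arr < 0)  # indices where the mask is True

in_place = arr.copy()
in_place[in_place < 0] = 0                    # mutates in_place directly

print(cleaned, negative_positions)
```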
Avoid loops for aggregations
Use axis-aware aggregation functions instead of looping over rows or columns:
matrix = np.arange(12).reshape(3, 4)
row_sums = matrix.sum(axis=1)  # array([ 6, 22, 38]), one value per row
col_max = matrix.max(axis=0)   # array([ 8,  9, 10, 11]), one value per column
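Aggregations often feed straight back into a broadcast. A minimal sketch, assuming the goal is to center each row: keepdims=True keeps the reduced axis as length 1 so the result lines up with the original array.

```python
import numpy as np

matrix = np.arange(12, dtype=np.float64).reshape(3, 4)

row_means = matrix.mean(axis=1, keepdims=True)  # shape (3, 1), not (3,)
centered = matrix - row_means                   # broadcasts across columns
col_argmax = matrix.argmax(axis=0)              # row index of each column's max

print(row_means.shape, col_argmax)
```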
Data Types and Precision
Choose the smallest sufficient dtype
| Scenario | Recommended dtype |
|---|---|
| ML model weights | np.float32 |
| High-precision scientific | np.float64 |
| Small integer counts (up to 32,767) | np.int16 |
| Large integer counts | np.int32 or np.int64 |
| Boolean flags | np.bool_ |
| Complex numbers | np.complex64 or np.complex128 |
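Use arr.astype(np.float32, copy=False) when the data may already have the target dtype: in that case copy=False returns the input array itself and avoids an unnecessary allocation. If the dtype differs, astype still allocates a new array; it never casts in place.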
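The behaviour of copy=False can be verified directly. In this sketch the first cast is a no-op that returns the same object, while the second has to allocate because the dtype changes.

```python
import numpy as np

already_f32 = np.ones(4, dtype=np.float32)
f64 = np.ones(4, dtype=np.float64)

same = already_f32.astype(np.float32, copy=False)  # dtype matches: no copy
converted = f64.astype(np.float32, copy=False)     # dtype differs: new array

print(same is already_f32)               # True
print(np.shares_memory(f64, converted))  # False
```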
Watch for integer overflow
Integer arithmetic on NumPy arrays wraps around silently on overflow:
x = np.array([200], dtype=np.uint8)
x + 100  # array([44], dtype=uint8), not 300
Cast to a wider type before operations that risk overflow.
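A small sketch of the widening advice, using np.iinfo to look up the limit of the narrow type. The values are chosen so that the uint8 sum exceeds 255.

```python
import numpy as np

counts = np.array([200, 250], dtype=np.uint8)

wrapped = counts + counts                    # stays uint8, wraps modulo 256
widened = counts.astype(np.uint16) + counts  # uint16 + uint8 -> uint16

print(np.iinfo(np.uint8).max)  # 255
print(wrapped)                 # [144 244]
print(widened)                 # [400 500]
```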
Saving and Loading Arrays
| Format | Function | Use case |
|---|---|---|
| Single array (binary) | np.save / np.load | Fast, preserves dtype and shape |
| Multiple arrays | np.savez / np.savez_compressed | Archive multiple arrays |
| Text (CSV etc.) | np.savetxt / np.loadtxt | Human-readable interchange |
np.save("data.npy", arr)
arr_loaded = np.load("data.npy")
np.savez("dataset.npz", X=X_train, y=y_train)
npz = np.load("dataset.npz")
X_train = npz["X"]
Prefer .npy/.npz over text formats for large arrays — binary I/O is faster and lossless.
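The round trip below is a sketch of the lossless claim: dtype and shape survive a save and load. It writes to a temporary directory, and the array contents are placeholders.

```python
import tempfile
from pathlib import Path

import numpy as np

X = np.arange(6, dtype=np.float32).reshape(2, 3)
y = np.array([0, 1], dtype=np.int8)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dataset.npz"
    np.savez_compressed(path, X=X, y=y)

    # The loaded archive holds an open file; the context manager closes it.
    with np.load(path) as npz:
        X_loaded = npz["X"]
        y_loaded = npz["y"]

print(X_loaded.dtype, X_loaded.shape, y_loaded.dtype)
```

Arrays read from the archive are materialized on access, so they remain usable after the file is closed.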
Reshaping and Shape Manipulation
Use -1 as a wildcard dimension — NumPy infers the correct size:
flat = arr.reshape(-1)    # shape (n,)
col = arr.reshape(-1, 1)  # shape (n, 1), a column vector
row = arr.reshape(1, -1)  # shape (1, n), a row vector
Use np.newaxis (equivalent to None) to insert a dimension for broadcasting:
a = np.array([1, 2, 3])
a_col = a[:, np.newaxis]  # shape (3, 1)
a_row = a[np.newaxis, :]  # shape (1, 3)
Linear Algebra
Use np.linalg for matrix operations:
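A = np.array([[1, 2], [3, 4]], dtype=np.float64)
B = np.array([[0, 1], [1, 0]], dtype=np.float64)  # illustrative second matrix
b = np.array([5, 6], dtype=np.float64)            # illustrative right-hand side
C = A @ B                        # matrix product
vals, vecs = np.linalg.eig(A)    # eigenvalues and eigenvectors
inv_A = np.linalg.inv(A)         # explicit inverse
rank = np.linalg.matrix_rank(A)
det = np.linalg.det(A)
x = np.linalg.solve(A, b)        # solves A @ x = b; prefer over inv(A) @ b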
Avoid np.matrix: the NumPy documentation no longer recommends it and the class may be removed in the future. Use 2-D ndarray with @ instead.
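For linear systems, a common recommendation is to call np.linalg.solve directly instead of forming the inverse. The sketch below solves a small, well-conditioned system both ways and checks the residual. The right-hand side b is an arbitrary example.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 6.0])

x_solve = np.linalg.solve(A, b)  # factorizes A; no explicit inverse is formed
x_inv = np.linalg.inv(A) @ b     # same answer here, generally slower and less accurate

residual = A @ x_solve - b
print(x_solve)                   # approximately [-4.   4.5]
print(np.allclose(residual, 0.0))
```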
Quick Reference
| Task | Idiomatic code |
|---|---|
| Import | import numpy as np |
| Array from list | np.array([1, 2, 3]) |
| Shape / ndim / size | arr.shape, arr.ndim, arr.size |
| Reshape | arr.reshape(rows, -1) |
| Flatten (view when possible) | arr.reshape(-1) or arr.ravel() |
| Flatten (copy) | arr.flatten() |
| Transpose | arr.T or arr.transpose() |
| Boolean mask | arr[arr > 0] |
| Axis aggregation | arr.sum(axis=0) |
| Matrix multiply | A @ B |
| Copy | arr.copy() |
| Check view | arr.base is not None |
| Cast dtype | arr.astype(np.float32, copy=False) |
| Random (modern) | np.random.default_rng(seed) |
| Save / load | np.save / np.load |
Additional Resources
Reference Files
For deeper guidance, consult:
- references/performance-and-memory.md: Vectorization patterns, memory layout, dtype selection, profiling, and avoiding common performance traps
- references/array-operations.md: Broadcasting in depth, advanced indexing, ufuncs, structured arrays, and I/O patterns