benchmark
| name | benchmark |
| description | Run performance benchmarks for transform changes. Use when the user asks to benchmark, measure performance, compare speed, or when changes affect apply methods, functional layer, get_params, or core pipeline code. |
Any change touching `apply_*`, `functional.py`, `get_params`, `get_params_dependent_on_data`, `composition.py`, or `transforms_interface.py` must include benchmark results.
Always benchmark all 9 combinations:
| Size | Channels | Use case |
|---|---|---|
| 256×256 | 1 | Grayscale classification |
| 256×256 | 3 | RGB classification |
| 256×256 | 5 | Multispectral |
| 512×512 | 1 | Depth maps |
| 512×512 | 3 | Detection/segmentation (YOLO, U-Net) |
| 512×512 | 5 | Multispectral segmentation |
| 1024×1024 | 1 | Medical imaging |
| 1024×1024 | 3 | High-res segmentation |
| 1024×1024 | 5 | Satellite imagery |
Skip channel counts the transform explicitly doesn't support. Always include the channel axis: grayscale inputs are
(H, W, 1), not (H, W).
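The channel-axis rule above can be enforced before timing anything. The following is a minimal sketch using only NumPy; `ensure_channel_axis` is a hypothetical helper name chosen for illustration, not part of the library.

```python
import numpy as np


def ensure_channel_axis(img: np.ndarray) -> np.ndarray:
    """Return img with an explicit trailing channel axis: (H, W) -> (H, W, 1)."""
    if img.ndim == 2:
        # Adds a length-1 channel axis as a view, without copying pixel data.
        return img[..., np.newaxis]
    return img


gray = np.zeros((256, 256), dtype=np.uint8)
rgb = np.zeros((256, 256, 3), dtype=np.uint8)
print(ensure_channel_axis(gray).shape)
print(ensure_channel_axis(rgb).shape)
```

Inputs that already carry a channel axis pass through unchanged, so the helper is safe to apply to every benchmark image.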
If the optimization changes dtype conversion or a @uint8_io / @float32_io wrapped function, benchmark the hot dtype
and add correctness tests for the other supported dtype. For example, a uint8-only speedup in a @uint8_io function
still needs a float32 regression test that verifies wrapper round-tripping.
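A regression check of this kind might look like the sketch below. Both `old_func` and `new_func` are hypothetical stand-ins (a simple brightness scale with a lookup-table fast path for uint8), and the sketch does not use the real `@uint8_io` / `@float32_io` wrappers; it only illustrates testing the dtype that was not optimized.

```python
import numpy as np


def old_func(img: np.ndarray, factor: float) -> np.ndarray:
    """Hypothetical reference implementation (stand-in for the pre-change code)."""
    out = img.astype(np.float32) * factor
    if img.dtype == np.uint8:
        return np.clip(np.rint(out), 0, 255).astype(np.uint8)
    return np.clip(out, 0.0, 1.0)


def new_func(img: np.ndarray, factor: float) -> np.ndarray:
    """Hypothetical optimized version: lookup table for uint8, direct math for float32."""
    if img.dtype == np.uint8:
        lut = np.clip(np.rint(np.arange(256, dtype=np.float32) * factor), 0, 255).astype(np.uint8)
        return lut[img]
    return np.clip(img * np.float32(factor), 0.0, 1.0)


def check_dtypes(shape=(64, 64, 3), factor=1.3, seed=0):
    """Compare old and new implementations on both supported dtypes."""
    rng = np.random.default_rng(seed)
    img_u8 = rng.integers(0, 256, shape, dtype=np.uint8)
    img_f32 = rng.random(shape, dtype=np.float32)

    # Hot dtype: the optimized path must match the reference exactly.
    np.testing.assert_array_equal(new_func(img_u8, factor), old_func(img_u8, factor))

    # Other dtype: output must stay float32 and match within float tolerance.
    out = new_func(img_f32, factor)
    assert out.dtype == np.float32
    np.testing.assert_allclose(out, old_func(img_f32, factor), rtol=1e-6)


check_dtypes()
```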
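```python
# Direct function benchmark: old implementation vs. new implementation.
import timeit

import numpy as np

SIZES = {"small": (256, 256), "medium": (512, 512), "large": (1024, 1024)}
CHANNELS = [1, 3, 5]
N = 100  # calls per measurement

# old_func, new_func, and params are placeholders: bind them to the original
# implementation, the optimized implementation, and the keyword arguments under test.
for size_name, (h, w) in SIZES.items():
    for ch in CHANNELS:
        shape = (h, w, ch)
        img = np.random.randint(0, 256, shape, dtype=np.uint8)
        old_t = timeit.timeit(lambda img=img: old_func(img, **params), number=N)
        new_t = timeit.timeit(lambda img=img: new_func(img, **params), number=N)
        print(f"{size_name} {h}x{w}x{ch}: old={old_t:.4f}s new={new_t:.4f}s speedup={old_t/new_t:.2f}x")
```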
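Single `timeit.timeit` measurements can be noisy on a busy machine. One optional variation, sketched here as an assumption rather than a requirement of this checklist, is to repeat each measurement with `timeit.repeat` and keep the minimum. `best_time` is a hypothetical helper name, and `np.flip` stands in for the function under test.

```python
import timeit

import numpy as np


def best_time(func, *args, number=20, repeat=5, **kwargs):
    """Return the fastest total time in seconds over `repeat` runs of `number` calls."""
    timings = timeit.repeat(lambda: func(*args, **kwargs), number=number, repeat=repeat)
    # The minimum is the run least disturbed by other processes.
    return min(timings)


img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
t = best_time(np.flip, img, axis=1)  # np.flip is only a stand-in workload
print(f"256x256x3: best of 5 runs = {t:.4f}s (20 calls each)")
```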
```python
# Compose-level benchmark: a single transform wrapped in A.Compose.
import timeit

import numpy as np
import albumentations as A

SIZES = {"small": (256, 256), "medium": (512, 512), "large": (1024, 1024)}
CHANNELS = [1, 3, 5]

# A.YourTransform is a placeholder: substitute the transform being benchmarked.
transform = A.Compose([A.YourTransform(p=1.0)])

for size_name, (h, w) in SIZES.items():
    for ch in CHANNELS:
        shape = (h, w, ch)
        img = np.random.randint(0, 256, shape, dtype=np.uint8)
        t = timeit.timeit(lambda img=img: transform(image=img), number=100)
        print(f"{size_name} {h}x{w}x{ch}: {t:.4f}s (100 calls)")
```
Run the benchmark on `main` (the original code) and save the output to a JSON file.

Save benchmark results as JSON for automated comparison:
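```python
import json

# all_results and N come from the benchmark loop: all_results is a list of
# (transform_name, (h, w), ch, elapsed) tuples, N is the number of calls timed.
results = {}
for transform_name, (h, w), ch, elapsed in all_results:
    key = f"{transform_name}_{h}x{w}x{ch}"
    results[key] = {"time": elapsed, "iterations": N}

# Rename the file per run (e.g. bench_old.json, bench_new.json) for the comparison step.
with open("benchmark_results.json", "w") as f:
    json.dump(results, f, indent=2)
```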
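The JSON snippet iterates over `all_results` without showing how that list is built. One way to collect it is sketched below; `collect` is a hypothetical helper, the grid is trimmed for brevity (use the full 9-combination grid in practice), and the `hflip` lambda is only a stand-in workload.

```python
import timeit

import numpy as np

N = 10  # calls per measurement (kept small for this sketch)
SIZES = [(256, 256), (512, 512)]  # trimmed grid, for illustration only
CHANNELS = [1, 3]


def collect(transform_name, func):
    """Time func on every size/channel combination and return result tuples."""
    all_results = []
    for h, w in SIZES:
        for ch in CHANNELS:
            img = np.random.randint(0, 256, (h, w, ch), dtype=np.uint8)
            elapsed = timeit.timeit(lambda: func(img), number=N)
            # Same tuple layout the JSON-writing loop unpacks.
            all_results.append((transform_name, (h, w), ch, elapsed))
    return all_results


all_results = collect("hflip", lambda img: np.ascontiguousarray(img[:, ::-1]))
for name, (h, w), ch, elapsed in all_results:
    print(f"{name}_{h}x{w}x{ch}: {elapsed:.4f}s")
```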
```python
import json

with open("bench_old.json") as f:
    old = json.load(f)
with open("bench_new.json") as f:
    new = json.load(f)

# Differences within ±5% are reported as SAME.
for key in sorted(old):
    if key in new:
        speedup = old[key]["time"] / new[key]["time"]
        indicator = "FASTER" if speedup > 1.05 else "SLOWER" if speedup < 0.95 else "SAME"
        print(f"{key}: {old[key]['time']:.4f}s -> {new[key]['time']:.4f}s {speedup:.2f}x {indicator}")
```
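The comparison loop skips keys that appear in only one file, so a combination that was never re-measured goes unnoticed. A small function-based variant, sketched here with hypothetical names `classify` and `compare`, keeps the same 5% thresholds and also returns the unmatched keys.

```python
def classify(old_time, new_time, tolerance=0.05):
    """Return (speedup, label) using the same thresholds as the comparison loop."""
    speedup = old_time / new_time
    if speedup > 1 + tolerance:
        label = "FASTER"
    elif speedup < 1 - tolerance:
        label = "SLOWER"
    else:
        label = "SAME"
    return speedup, label


def compare(old, new):
    """Classify keys present in both result dicts and list keys found in only one."""
    shared = sorted(set(old) & set(new))
    unmatched = sorted(set(old) ^ set(new))
    rows = {key: classify(old[key]["time"], new[key]["time"]) for key in shared}
    return rows, unmatched


# Illustrative numbers only, not measured results.
old = {"t_256x256x1": {"time": 0.02}, "t_256x256x3": {"time": 0.05}}
new = {"t_256x256x1": {"time": 0.01}, "t_512x512x1": {"time": 0.04}}
rows, unmatched = compare(old, new)
print(rows)
print("unmatched keys:", unmatched)
```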
Benchmark (uint8, 100 iterations):
Function direct:
256x256x1 — Before: 0.0200s After: 0.0100s Speedup: 2.00x
256x256x3 — Before: 0.0500s After: 0.0300s Speedup: 1.67x
...
Compose single:
256x256x1 — 0.0120s
256x256x3 — 0.0340s
...
When benchmarking batch optimizations (kernel pre-computation, 4D indexing, pre-allocated loops), time the transform across several batch sizes:

```python
import timeit

import numpy as np
import albumentations as A

BATCH_SIZES = [4, 8, 16]
SIZES = {"small": (256, 256), "medium": (512, 512)}

# A.YourTransform is a placeholder: substitute the transform being benchmarked.
transform = A.Compose([A.YourTransform(p=1.0)])

for batch_size in BATCH_SIZES:
    for size_name, (h, w) in SIZES.items():
        # Grayscale batch: benefits from the reshape trick
        images = [np.random.randint(0, 256, (h, w, 1), dtype=np.uint8) for _ in range(batch_size)]
        t = timeit.timeit(lambda: transform(images=images), number=50)
        print(f"batch={batch_size} {size_name} {h}x{w}x1: {t:.4f}s")

        # RGB batch: baseline
        images_rgb = [np.random.randint(0, 256, (h, w, 3), dtype=np.uint8) for _ in range(batch_size)]
        t = timeit.timeit(lambda: transform(images=images_rgb), number=50)
        print(f"batch={batch_size} {size_name} {h}x{w}x3: {t:.4f}s")
```
Input layouts: images (H,W,C), image batches (N,H,W,C), volumes (D,H,W,C), volume batches (N,D,H,W,C).
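The four layouts above can be materialized as dummy inputs for a benchmark. This is a sketch under assumptions: the dictionary keys mirror what are assumed to be the corresponding target names (`image`, `images`, `volume`, `volumes`), and the sizes are arbitrary small values chosen to keep the example fast.

```python
import numpy as np

H, W, C = 64, 64, 3  # arbitrary small spatial size and channel count
N, D = 4, 8  # batch size and volume depth, also arbitrary

# Keys are assumed target names; verify them against the library before use.
LAYOUTS = {
    "image": (H, W, C),
    "images": (N, H, W, C),
    "volume": (D, H, W, C),
    "volumes": (N, D, H, W, C),
}

rng = np.random.default_rng(0)
inputs = {name: rng.integers(0, 256, shape, dtype=np.uint8) for name, shape in LAYOUTS.items()}

for name, arr in inputs.items():
    print(f"{name}: shape={arr.shape} dtype={arr.dtype}")
```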