add-transform
// Full checklist for adding a new transform to AlbumentationsX. Use when the user asks to add, implement, or create a new transform/augmentation.
| name | add-transform |
| description | Full checklist for adding a new transform to AlbumentationsX. Use when the user asks to add, implement, or create a new transform/augmentation. |
Follow this checklist in order. Do not skip steps.
Put the transform in the most specific matching subpackage:
- albumentations/augmentations/geometric/: spatial transforms (flip, rotate, warp, etc.)
- albumentations/augmentations/pixel/: pixel-level (color, brightness, noise, etc.)
- albumentations/augmentations/dropout/: masking/dropout
- albumentations/augmentations/blur/: blurring
- albumentations/augmentations/crops/: cropping
- albumentations/augmentations/mixing/: multi-image mixing
- albumentations/augmentations/transforms3d/: 3D/volume
- albumentations/augmentations/other/: everything else

Add the pure function in the corresponding functional.py file (no class state, no RNG):
def my_transform(img: np.ndarray, param1: float, param2: int) -> np.ndarray:
    ...
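As one possible concrete instance of the stub above, here is a minimal sketch of a pure, lookup-table-based function. The name `gamma_lut` and its `gamma` parameter are hypothetical, and NumPy indexing stands in for `cv2.LUT` only so that the sketch runs without OpenCV; the checklist itself prefers the `cv2.LUT` route.

```python
import numpy as np


def gamma_lut(img: np.ndarray, gamma: float) -> np.ndarray:
    """Hypothetical pure function: gamma-correct a uint8 image through a 256-entry table."""
    if img.dtype != np.uint8:
        raise TypeError("this sketch only handles uint8 input")
    # Build the table once; it depends only on gamma, never on random state.
    levels = np.arange(256, dtype=np.float64) / 255.0
    table = np.clip(np.rint(255.0 * levels**gamma), 0, 255).astype(np.uint8)
    # Table lookup by indexing (stand-in for cv2.LUT); output keeps the input shape.
    return table[img]
```

The function takes an array and plain scalars, returns an array, and holds no state, so any randomness (for example the choice of `gamma`) has to be decided by the caller.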
- Accept np.ndarray, return np.ndarray.
- No random state: sampling belongs in get_params / get_params_dependent_on_data.
- Prefer cv2 over numpy for performance (see benchmarking rules).
- Use cv2.LUT for lookup-based pixel ops (fastest).
- Use the @uint8_io / @float32_io decorators if dtype conversion is needed.
- Do not add docstrings to apply or apply_to_* methods; the transform class docstring and transforms_interface are sufficient.

class MyTransform(DualTransform):  # or ImageOnlyTransform / NoOp
    """First paragraph (120–160 chars): elevator pitch covering what the transform does, how it works in one
    sentence, and when to use it. No "Parameters: x, y", "Targets:", return type, or "Supports uint8/float32";
    no "Used by X". Two lines, wrap at 120.

    More detail about what the transform does.

    Args:
        param_range: (min, max) tuple controlling X. Default: (0.1, 0.3).
        fill: Padding value for image. Default: 0.
        fill_mask: Padding value for masks. Default: 0.
        p: Probability. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints, volume, mask3d

    Image types:
        uint8, float32

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
        >>> bboxes = np.array([[10, 10, 50, 50]], dtype=np.float32)
        >>> bbox_labels = [1]
        >>> keypoints = np.array([[20, 30]], dtype=np.float32)
        >>> keypoint_labels = [0]
        >>>
        >>> transform = A.Compose([
        ...     A.MyTransform(param_range=(0.1, 0.3), p=1.0)
        ... ], bbox_params=A.BboxParams(coord_format='pascal_voc', label_fields=['bbox_labels']),
        ...    keypoint_params=A.KeypointParams(coord_format='xy', label_fields=['keypoint_labels']))
        >>>
        >>> result = transform(
        ...     image=image, mask=mask,
        ...     bboxes=bboxes, bbox_labels=bbox_labels,
        ...     keypoints=keypoints, keypoint_labels=keypoint_labels,
        ... )
    """

    class InitSchema(BaseTransformInitSchema):
        param_range: Annotated[tuple[float, float], AfterValidator(nondecreasing)]
        # NO default values here (except discriminator fields)

    def __init__(self, param_range: tuple[float, float], p: float = 0.5):
        super().__init__(p=p)
        self.param_range = param_range

    def apply(self, img: ImageType, param1: float, **params: Any) -> ImageType:
        # NO default values for param1 here
        return fpixel.my_transform(img, param1)

    def get_params(self) -> dict[str, Any]:
        return {
            "param1": self.py_random.uniform(*self.param_range),
        }
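The split between sampling and applying can be illustrated without the library. The following is a minimal sketch, assuming that `self.py_random` behaves like a seeded `random.Random`; `SketchTransform` is a toy stand-in that does not inherit from any albumentations base class, and the `seed` argument is hypothetical.

```python
import random
from typing import Any


class SketchTransform:
    """Toy stand-in for a transform class; not the real albumentations interface."""

    def __init__(self, param_range: tuple[float, float], seed: int = 137):
        self.param_range = param_range
        # Assumption: plays the role of the seeded self.py_random on a real transform.
        self.py_random = random.Random(seed)

    def get_params(self) -> dict[str, Any]:
        # All sampling happens here, once per call.
        return {"param1": self.py_random.uniform(*self.param_range)}

    def apply(self, pixel_values: list[float], param1: float, **params: Any) -> list[float]:
        # Deterministic: the same inputs always produce the same output.
        return [value * (1.0 + param1) for value in pixel_values]

    def __call__(self, pixel_values: list[float]) -> list[float]:
        return self.apply(pixel_values, **self.get_params())
```

Because `apply` never touches the random generator, the sampled value can be reused for other targets without the results drifting apart, and two instances built with the same seed produce the same output.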
- Do not override get_transform_init_args_names(): the base class auto-infers init arg names from __init__ via MRO introspection. Do not define this method.
- Range parameters use the _range suffix: brightness_range, not brightness_limit.
- fill, not fill_value; fill_mask, not fill_mask_value.
- border_mode, not mode or pad_mode.
- No default values in InitSchema (except Pydantic discriminator fields).
- Range parameters are typed tuple[T, T], never T | tuple[T, T]: no union with a scalar. Users always pass a tuple.
- No default values for apply_* method arguments (other than self, **params).
- Sample random values in get_params or get_params_dependent_on_data, never in apply_*.
- Use self.py_random for simple random ops, self.random_generator only when numpy arrays are needed.
- Never call the np.random.* or random.* modules directly.
- Use ImageType for image/mask/volume type hints, np.ndarray only for bboxes/keypoints.
- Avoid terse variable names such as x, y, dx, dy, cx, cy. Prefer pixel_cols, norm_x, center_col, run_starts, col_x, etc. Names should read like documentation.
- Images are (H, W, C), image batches are (N, H, W, C), volumes are (D, H, W, C), and volume batches are (N, D, H, W, C).
- Grayscale images are (H, W, 1), not (H, W). Do not add functional-layer compatibility branches for 2D grayscale images in code reached through Compose.
- Use ndim to distinguish image vs batch vs volume paths when needed, not to infer whether channels exist.
- Image-processing helpers belong in functional.py, never in the transform class file.

Batch processing (apply_to_images)

Override apply_to_images only if you can beat the default per-image loop. Priority patterns:
Pre-compute expensive setup once per batch (kernels, LUTs, gradient maps):
def apply_to_images(self, images: ImageType, *args: Any, **params: Any) -> ImageType:
    kernel = create_kernel(params["size"])  # once, not N times
    return self._apply_to_batch(images, lambda img: convolve(img, kernel))
Direct 4D indexing for simple array ops:
def apply_to_images(self, images: ImageType, channels_to_drop: list[int], **params: Any) -> ImageType:
    result = images.copy()
    result[:, :, :, channels_to_drop] = self.fill
    return result
Pre-allocated loop as fallback when params vary per image:
def apply_to_images(self, images: ImageType, *args: Any, **params: Any) -> ImageType:
    result = np.empty_like(images)
    for i, image in enumerate(images):
        result[i] = self.apply(image, **params)
    return result
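A batched override is only safe if it matches the per-image path. The sketch below checks that for the channel-drop case, using free functions with hypothetical names (`drop_channels_batched`, `drop_channels_single`, `drop_channels_loop`) in place of transform methods; it compares results only and makes no claim about speed.

```python
import numpy as np


def drop_channels_batched(images: np.ndarray, channels_to_drop: list[int], fill: float) -> np.ndarray:
    """Direct 4D indexing on an (N, H, W, C) batch."""
    result = images.copy()
    result[:, :, :, channels_to_drop] = fill
    return result


def drop_channels_single(image: np.ndarray, channels_to_drop: list[int], fill: float) -> np.ndarray:
    """Per-image version operating on (H, W, C)."""
    result = image.copy()
    result[:, :, channels_to_drop] = fill
    return result


def drop_channels_loop(images: np.ndarray, channels_to_drop: list[int], fill: float) -> np.ndarray:
    """Pre-allocated loop fallback: one output buffer, filled image by image."""
    result = np.empty_like(images)
    for i, image in enumerate(images):
        result[i] = drop_channels_single(image, channels_to_drop, fill)
    return result


# Small deterministic batch: 2 images of 4x5 pixels with 3 channels.
batch = np.arange(2 * 4 * 5 * 3, dtype=np.uint8).reshape(2, 4, 5, 3)
np.testing.assert_array_equal(
    drop_channels_batched(batch, [0, 2], fill=0),
    drop_channels_loop(batch, [0, 2], fill=0),
)
```

A check of this shape is a reasonable companion to any `apply_to_images` override, since the default loop defines the expected result.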
DO NOT reshape (N, H, W, 1) to (H, W, N) to call cv2 once: this is 2–4× slower in practice (the transpose forces a non-contiguous copy, and cv2 then processes the channels sequentially).
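The copy mentioned in the note above can be observed directly in NumPy. This is a small sketch with an arbitrary batch size; it only inspects memory layout and does not measure the slowdown itself.

```python
import numpy as np

batch = np.zeros((8, 32, 32, 1), dtype=np.uint8)  # (N, H, W, 1)

# Move the batch axis into the channel position: (N, H, W, 1) -> (H, W, N).
stacked_view = np.transpose(batch[..., 0], (1, 2, 0))

# The transpose is only a strided view over the original buffer.
view_is_contiguous = bool(stacked_view.flags["C_CONTIGUOUS"])
view_shares_memory = np.shares_memory(stacked_view, batch)

# Turning it into a contiguous array requires copying every pixel.
stacked_copy = np.ascontiguousarray(stacked_view)
copy_shares_memory = np.shares_memory(stacked_copy, batch)

print(stacked_view.shape, view_is_contiguous, view_shares_memory, copy_shares_memory)
```

The view is not C-contiguous, so any consumer that needs a contiguous buffer pays for a full copy before the actual processing starts.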
Add to albumentations/__init__.py:
from albumentations.augmentations.<module>.transforms import MyTransform
Add to albumentations/augmentations/<module>/__init__.py if one exists.
Add to tests/test_transforms.py or tests/test_<category>.py:
@pytest.mark.parametrize(
    ("param_range", "expected_..."),
    [
        ((0.1, 0.3), ...),
        ((0.5, 0.8), ...),
    ],
)
def test_my_transform(param_range, expected_...):
    image = TestDataFactory.create_image((100, 100, 3), dtype=np.uint8, seed=137)
    aug = A.MyTransform(param_range=param_range, p=1.0)
    result = aug(image=image)
    # use np.testing assertions, not plain assert
    np.testing.assert_...
Also add it to the parametrized lists in tests/utils.py:
- get_dual_transforms() if it's a DualTransform
- get_image_only_transforms() if it's ImageOnlyTransform

Check edge cases: uint8, float32, single channel, multichannel.
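The edge-case sweep can be sketched as a plain loop over dtypes and channel counts. Here `invert` is a hypothetical function under test and `check_invert_edge_cases` is an illustrative helper, not part of the repository's test suite; in the real tests the cases would normally be expressed through `pytest.mark.parametrize`.

```python
import numpy as np


def invert(img: np.ndarray) -> np.ndarray:
    """Hypothetical function under test: invert a uint8 or float32 image."""
    max_value = 255 if img.dtype == np.uint8 else 1.0
    return (max_value - img).astype(img.dtype)


def check_invert_edge_cases() -> int:
    """Run the checks for every dtype/channel combination and return how many ran."""
    rng = np.random.default_rng(137)  # local generator used for test data only
    checked = 0
    for dtype in (np.uint8, np.float32):
        for num_channels in (1, 3, 5):  # single channel and multichannel
            shape = (16, 16, num_channels)
            if dtype == np.uint8:
                image = rng.integers(0, 256, size=shape, dtype=np.uint8)
            else:
                image = rng.random(size=shape, dtype=np.float32)
            result = invert(image)
            # np.testing assertions, not plain assert
            np.testing.assert_equal(result.dtype, image.dtype)
            np.testing.assert_array_equal(result.shape, image.shape)
            # Inverting twice should return the original image.
            np.testing.assert_allclose(invert(result), image, atol=1e-6)
            checked += 1
    return checked
```

Calling `check_invert_edge_cases()` exercises six combinations; grayscale input is passed as (H, W, 1), in line with the shape rules above.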
- No get_transform_init_args_names() override (auto-inferred from __init__)
- _range suffix on range params
- fill / fill_mask (not fill_value / fill_mask_value)
- No defaults in InitSchema
- No defaults in apply_* method args
- Randomness only in get_params / get_params_dependent_on_data
- Uses self.py_random or self.random_generator (not np.random / random)
- ImageType for image type hints
- Override apply_to_images if expensive setup can be shared across the batch
- Docstring has Args, Targets, Image types, Examples sections
- Exported in albumentations/__init__.py
- Tests added (using np.testing assertions)
- pre-commit run --all-files passes
- uv run pytest -m "not slow" passes