| name | opencv |
| description | Open Source Computer Vision Library (OpenCV) for real-time image processing, video analysis, object detection, face recognition, and camera calibration. Use when working with images, videos, cameras, edge detection, contours, feature detection, image transformations, object tracking, optical flow, or any computer vision task. |
| version | 4.9.0 |
| license | Apache-2.0 |
OpenCV - Computer Vision and Image Processing
OpenCV (Open Source Computer Vision Library) is the de facto standard library for computer vision tasks. It provides 2500+ optimized algorithms for real-time image and video processing, from basic operations like reading images to advanced tasks like face recognition and 3D reconstruction.
When to Use
- Reading, writing, and displaying images and videos from files or cameras.
- Image preprocessing (resizing, cropping, rotating, color conversion).
- Edge detection (Canny, Sobel) and contour finding.
- Feature detection and matching (SIFT, ORB, AKAZE).
- Object detection (Haar Cascades, HOG, DNN module for YOLO/SSD).
- Face detection and recognition.
- Image segmentation (thresholding, watershed, GrabCut).
- Video analysis (motion detection, object tracking, optical flow).
- Camera calibration and 3D reconstruction.
- Image stitching and panorama creation.
- Real-time applications requiring fast performance.
Reference Documentation
Official docs: https://docs.opencv.org/4.x/
GitHub: https://github.com/opencv/opencv
Tutorials: https://docs.opencv.org/4.x/d9/df8/tutorial_root.html
Search patterns: cv2.imread, cv2.cvtColor, cv2.Canny, cv2.findContours, cv2.VideoCapture
Core Principles
Image as NumPy Array
OpenCV represents images as NumPy arrays with shape (height, width, channels). This allows seamless integration with NumPy operations and other scientific Python libraries.
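As a minimal sketch of this layout, the snippet below builds a small synthetic image with NumPy instead of loading a file. The array size and pixel values are arbitrary choices for illustration.

```python
import numpy as np

# A synthetic color image: 4 rows (height), 6 columns (width), 3 channels
img = np.zeros((4, 6, 3), dtype=np.uint8)

height, width, channels = img.shape

# Indexing is [row, column], i.e. [y, x], not [x, y]
img[1, 5] = (255, 0, 0)  # set the pixel at y=1, x=5

# A region of interest is an ordinary NumPy slice: rows first, then columns
roi = img[0:2, 3:6]

print(height, width, channels)  # 4 6 3
print(roi.shape)                # (2, 3, 3)
```

Note that a grayscale image has no channel axis at all: its shape is just (height, width).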
BGR Color Space (Not RGB!)
OpenCV uses BGR (Blue-Green-Red) instead of RGB by default. This is critical to remember when displaying images or integrating with other libraries.
In-Place vs Copy Operations
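Drawing functions such as cv2.rectangle() and cv2.drawContours() modify the input array in place, while most processing functions (cv2.GaussianBlur(), cv2.cvtColor(), cv2.resize()) return a new array. NumPy slices of an image are views, not copies. Knowing when a copy is made matters for both correctness and performance.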
C++ Performance in Python
OpenCV is written in optimized C++, making it extremely fast even when called from Python. Avoid Python loops when OpenCV vectorized operations exist.
Quick Reference
Installation
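# Install exactly ONE of these packages. They all provide the same cv2 module and conflict if installed together.
pip install opencv-python            # main modules, with GUI support
pip install opencv-contrib-python    # main modules plus the contrib (extra) modules
pip install opencv-python-headless   # no GUI support, for servers and containers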
Standard Imports
import cv2
import numpy as np
import matplotlib.pyplot as plt
Basic Pattern - Read, Process, Display
import cv2
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imshow('Grayscale', gray)
cv2.waitKey(0)
cv2.destroyAllWindows()
Basic Pattern - Video Processing
import cv2
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    cv2.imshow('Video', gray)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
Critical Rules
✅ DO
- Check Image Loaded - Always verify img is not None after cv2.imread() to catch file errors.
- Use cv2.cvtColor() for Color Conversion - Don't manually rearrange channels; use the provided conversion codes.
- Release Resources - Always call cap.release() and cv2.destroyAllWindows() when done with video/windows.
- Copy Before Modifying - Use img.copy() if you need to preserve the original image.
- Use Appropriate Data Types - Keep images as uint8 (0-255) for display, convert to float32 (0-1) for mathematical operations.
- Validate VideoCapture - Check cap.isOpened() before reading frames.
- Use BGR2RGB for Matplotlib - Convert BGR to RGB when displaying with matplotlib.
- Vectorize Operations - Use OpenCV's built-in functions instead of Python loops over pixels.
❌ DON'T
- Don't Assume RGB - OpenCV uses BGR by default; convert to RGB for matplotlib or PIL.
- Don't Forget waitKey() - Without cv2.waitKey(), windows won't display properly.
- Don't Mix PIL and OpenCV Directly - Convert between them explicitly (OpenCV uses BGR, PIL uses RGB).
- Don't Process Video in Memory - Process frame-by-frame to avoid memory issues with large videos.
- Don't Use Python Loops for Pixels - This is often orders of magnitude slower than vectorized operations.
- Don't Hardcode Paths - Use os.path.join() or pathlib for cross-platform compatibility.
Anti-Patterns (NEVER)
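import cv2
import numpy as np
import matplotlib.pyplot as plt

# ❌ BAD: no check that the image loaded; cvtColor raises an error if img is None
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# ✅ GOOD: fail early with a clear message
img = cv2.imread('image.jpg')
if img is None:
    raise FileNotFoundError("Image not found")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# ❌ BAD: Python loop over every pixel
for i in range(img.shape[0]):
    for j in range(img.shape[1]):
        img[i, j] = img[i, j] * 0.5

# ✅ GOOD: one vectorized NumPy operation
img = (img * 0.5).astype(np.uint8)

# ❌ BAD: matplotlib expects RGB, so red and blue appear swapped
plt.imshow(img)

# ✅ GOOD: convert BGR to RGB first
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img_rgb)

# ❌ BAD: the capture is never released
cap = cv2.VideoCapture('video.mp4')
while cap.read()[0]:
    pass

# ✅ GOOD: release the capture even if an error occurs
cap = cv2.VideoCapture('video.mp4')
try:
    while cap.read()[0]:
        pass
finally:
    cap.release()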
Image I/O and Display
Reading and Writing Images
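import cv2
# Read as 3-channel BGR (the default)
img = cv2.imread('image.jpg')
# cv2.imread() returns None instead of raising when the file is missing or unreadable
if img is None:
    raise FileNotFoundError("Error: Could not load image")
print(f"Image shape: {img.shape}")
# Read as single-channel grayscale
gray = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
# Read unchanged, keeping the alpha channel if present
img_alpha = cv2.imread('image.png', cv2.IMREAD_UNCHANGED)
# Write with default settings
cv2.imwrite('output.jpg', img)
# JPEG quality 0-100 (higher means better quality and a larger file)
cv2.imwrite('output.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, 95])
# PNG compression 0-9 (higher means a smaller file and a slower write)
cv2.imwrite('output.png', img, [cv2.IMWRITE_PNG_COMPRESSION, 9])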
Display Images
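import cv2
# Show a window and wait for any key press
cv2.imshow('Window Name', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
# Show a window for up to 3 seconds (waitKey takes milliseconds)
cv2.imshow('Image', img)
cv2.waitKey(3000)
cv2.destroyAllWindows()
# Show several windows at once
cv2.imshow('Original', img)
cv2.imshow('Gray', gray)
cv2.waitKey(0)
cv2.destroyAllWindows()
# Display with matplotlib, which expects RGB
import matplotlib.pyplot as plt
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img_rgb)
plt.axis('off')
plt.show()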
Video Capture
import cv2
# Open the default camera (device index 0) ...
# cap = cv2.VideoCapture(0)
# ... or open a video file
cap = cv2.VideoCapture('video.mp4')
if not cap.isOpened():
    raise SystemExit("Error: Could not open video")
# Query stream properties (the frame count is not meaningful for live cameras)
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print(f"Video: {width}x{height} @ {fps} fps, {total_frames} frames")
while True:
    ret, frame = cap.read()
    if not ret:
        print("End of video or error")
        break
    cv2.imshow('Frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
Writing Videos
import cv2
cap = cv2.VideoCapture('input.mp4')
# Keep fps as a float: int() would turn 29.97 into 29 and shift the playback speed
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
# The frame size is given as (width, height) and must match every frame written
out = cv2.VideoWriter('output.mp4', fourcc, fps, (width, height))
while True:
    ret, frame = cap.read()
    if not ret:
        break
    processed = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # By default the writer expects 3-channel frames, so convert back to BGR
    processed = cv2.cvtColor(processed, cv2.COLOR_GRAY2BGR)
    out.write(processed)
cap.release()
out.release()
cv2.destroyAllWindows()
Image Transformations
Resizing and Cropping
import cv2
img = cv2.imread('image.jpg')
resized = cv2.resize(img, (800, 600))
scaled = cv2.resize(img, None, fx=0.5, fy=0.5)
resized_linear = cv2.resize(img, (800, 600), interpolation=cv2.INTER_LINEAR)
resized_cubic = cv2.resize(img, (800, 600), interpolation=cv2.INTER_CUBIC)
resized_area = cv2.resize(img, (400, 300), interpolation=cv2.INTER_AREA)
height, width = img.shape[:2]
cropped = img[100:400, 200:600]
crop_size = 300
center_x, center_y = width // 2, height // 2
x1 = center_x - crop_size // 2
y1 = center_y - crop_size // 2
center_cropped = img[y1:y1+crop_size, x1:x1+crop_size]
Rotation and Flipping
import cv2
flipped_h = cv2.flip(img, 1)
flipped_v = cv2.flip(img, 0)
flipped_both = cv2.flip(img, -1)
rotated_90 = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)
rotated_180 = cv2.rotate(img, cv2.ROTATE_180)
rotated_90_ccw = cv2.rotate(img, cv2.ROTATE_90_COUNTERCLOCKWISE)
height, width = img.shape[:2]
center = (width // 2, height // 2)
angle = 45
M = cv2.getRotationMatrix2D(center, angle, scale=1.0)
rotated = cv2.warpAffine(img, M, (width, height))
M_scaled = cv2.getRotationMatrix2D(center, 30, scale=0.8)
rotated_scaled = cv2.warpAffine(img, M_scaled, (width, height))
Color Space Conversions
import cv2
img = cv2.imread('image.jpg')
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
gray_bgr = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
b, g, r = cv2.split(img)
merged = cv2.merge([b, g, r])
Image Filtering and Enhancement
Blurring and Smoothing
import cv2
blurred = cv2.GaussianBlur(img, (5, 5), 0)
median = cv2.medianBlur(img, 5)
bilateral = cv2.bilateralFilter(img, 9, 75, 75)
avg_blur = cv2.blur(img, (5, 5))
box = cv2.boxFilter(img, -1, (5, 5))
Edge Detection
import cv2
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, threshold1=50, threshold2=150)
sobelx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
sobely = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
sobel = cv2.magnitude(sobelx, sobely)
laplacian = cv2.Laplacian(gray, cv2.CV_64F)
scharrx = cv2.Scharr(gray, cv2.CV_64F, 1, 0)
scharry = cv2.Scharr(gray, cv2.CV_64F, 0, 1)
Morphological Operations
import cv2
import numpy as np
kernel = np.ones((5, 5), np.uint8)
eroded = cv2.erode(img, kernel, iterations=1)
dilated = cv2.dilate(img, kernel, iterations=1)
opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
gradient = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)
tophat = cv2.morphologyEx(img, cv2.MORPH_TOPHAT, kernel)
blackhat = cv2.morphologyEx(img, cv2.MORPH_BLACKHAT, kernel)
Thresholding
import cv2
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
ret, thresh_inv = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
ret, thresh_trunc = cv2.threshold(gray, 127, 255, cv2.THRESH_TRUNC)
ret, thresh_tozero = cv2.threshold(gray, 127, 255, cv2.THRESH_TOZERO)
ret, thresh_otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
adaptive_mean = cv2.adaptiveThreshold(
gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 2
)
adaptive_gaussian = cv2.adaptiveThreshold(
gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2
)
Contours and Shape Detection
Finding and Drawing Contours
import cv2
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
img_contours = img.copy()
cv2.drawContours(img_contours, contours, -1, (0, 255, 0), 2)
cv2.drawContours(img_contours, contours, 0, (255, 0, 0), 3)
for i, contour in enumerate(contours):
    area = cv2.contourArea(contour)
    perimeter = cv2.arcLength(contour, True)
    if area > 1000:
        cv2.drawContours(img_contours, [contour], -1, (0, 0, 255), 2)
        x, y, w, h = cv2.boundingRect(contour)
        cv2.rectangle(img_contours, (x, y), (x+w, y+h), (255, 0, 0), 2)
Shape Approximation
import cv2
for contour in contours:
    # Approximate the contour with a tolerance of 2% of its perimeter
    epsilon = 0.02 * cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, epsilon, True)
    n_vertices = len(approx)
    if n_vertices == 3:
        shape = "Triangle"
    elif n_vertices == 4:
        x, y, w, h = cv2.boundingRect(approx)
        aspect_ratio = float(w) / h
        shape = "Square" if 0.95 <= aspect_ratio <= 1.05 else "Rectangle"
    elif n_vertices > 4:
        # Heuristic: many vertices suggest a curved outline
        shape = "Circle" if n_vertices > 10 else "Polygon"
    else:
        # Fewer than 3 vertices (a point or a line): nothing to label
        continue
    cv2.drawContours(img, [approx], -1, (0, 255, 0), 2)
    x, y = (int(v) for v in approx[0][0])
    cv2.putText(img, shape, (x, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)
Contour Features
import cv2
import numpy as np
for contour in contours:
    # Centroid from image moments (m00 is the area; skip degenerate contours)
    M = cv2.moments(contour)
    if M['m00'] != 0:
        cx = int(M['m10'] / M['m00'])
        cy = int(M['m01'] / M['m00'])
        cv2.circle(img, (cx, cy), 5, (255, 0, 0), -1)
    # Minimum enclosing circle
    (x, y), radius = cv2.minEnclosingCircle(contour)
    center = (int(x), int(y))
    radius = int(radius)
    cv2.circle(img, center, radius, (0, 255, 0), 2)
    # fitEllipse needs at least 5 points
    if len(contour) >= 5:
        ellipse = cv2.fitEllipse(contour)
        cv2.ellipse(img, ellipse, (255, 0, 255), 2)
    # Convex hull and solidity (contour area divided by hull area)
    hull = cv2.convexHull(contour)
    cv2.drawContours(img, [hull], -1, (0, 255, 255), 2)
    hull_area = cv2.contourArea(hull)
    contour_area = cv2.contourArea(contour)
    solidity = contour_area / hull_area if hull_area > 0 else 0
Feature Detection and Matching
ORB (Oriented FAST and Rotated BRIEF)
import cv2
img1 = cv2.imread('image1.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('image2.jpg', cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
img1_kp = cv2.drawKeypoints(img1, kp1, None, color=(0, 255, 0))
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda x: x.distance)
img_matches = cv2.drawMatches(
img1, kp1, img2, kp2, matches[:50],
None, flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS
)
cv2.imshow('Matches', img_matches)
cv2.waitKey(0)
SIFT (Scale-Invariant Feature Transform)
import cv2
img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
img_kp = cv2.drawKeypoints(
img, keypoints, None,
flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS
)
print(f"Number of keypoints: {len(keypoints)}")
Feature Matching with FLANN
import cv2
import numpy as np
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1, des2, k=2)
good_matches = []
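# Lowe's ratio test; knnMatch can return fewer than 2 neighbours, so check the length
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
        good_matches.append(pair[0])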
print(f"Good matches: {len(good_matches)}")
img_matches = cv2.drawMatches(
img1, kp1, img2, kp2, good_matches, None,
flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS
)
Object Detection
Haar Cascade (Face Detection)
import cv2
face_cascade = cv2.CascadeClassifier(
cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
)
eye_cascade = cv2.CascadeClassifier(
cv2.data.haarcascades + 'haarcascade_eye.xml'
)
img = cv2.imread('people.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(
gray,
scaleFactor=1.1,
minNeighbors=5,
minSize=(30, 30)
)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)
    # Search for eyes only inside the detected face region
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = img[y:y+h, x:x+w]
    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex, ey, ew, eh) in eyes:
        # roi_color is a view into img, so this draws on the original image
        cv2.rectangle(roi_color, (ex, ey), (ex+ew, ey+eh), (0, 255, 0), 2)
cv2.imshow('Faces', img)
cv2.waitKey(0)
Template Matching
import cv2
import numpy as np
img = cv2.imread('image.jpg')
template = cv2.imread('template.jpg')
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
template_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
h, w = template_gray.shape
result = cv2.matchTemplate(img_gray, template_gray, cv2.TM_CCOEFF_NORMED)
threshold = 0.8
locations = np.where(result >= threshold)
for pt in zip(*locations[::-1]):
    cv2.rectangle(img, pt, (pt[0] + w, pt[1] + h), (0, 255, 0), 2)
cv2.imshow('Matches', img)
cv2.waitKey(0)
Practical Workflows
1. Document Scanner (Perspective Transform)
import cv2
import numpy as np
def order_points(pts):
    """Order points: top-left, top-right, bottom-right, bottom-left."""
    rect = np.zeros((4, 2), dtype="float32")
    # x + y is smallest at the top-left corner and largest at the bottom-right
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    # y - x is smallest at the top-right corner and largest at the bottom-left
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect

def four_point_transform(image, pts):
    """Apply perspective transform to get bird's eye view."""
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    # Output width: the longer of the bottom and top edges
    widthA = np.linalg.norm(br - bl)
    widthB = np.linalg.norm(tr - tl)
    maxWidth = max(int(widthA), int(widthB))
    # Output height: the longer of the right and left edges
    heightA = np.linalg.norm(tr - br)
    heightB = np.linalg.norm(tl - bl)
    maxHeight = max(int(heightA), int(heightB))
    # Destination corners, in the same order as rect
    dst = np.array([
        [0, 0],
        [maxWidth - 1, 0],
        [maxWidth - 1, maxHeight - 1],
        [0, maxHeight - 1]
    ], dtype="float32")
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
    return warped
img = cv2.imread('document.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(contours, key=cv2.contourArea, reverse=True)
for contour in contours:
    peri = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.02 * peri, True)
    # The largest contour with four corners is assumed to be the document
    if len(approx) == 4:
        pts = approx.reshape(4, 2)
        scanned = four_point_transform(img, pts)
        cv2.imshow('Scanned', scanned)
        cv2.waitKey(0)
        break
2. Motion Detection
import cv2
def detect_motion(video_path):
    """Detect motion in video using frame differencing."""
    cap = cv2.VideoCapture(video_path)
    ret1, frame1 = cap.read()
    ret2, frame2 = cap.read()
    if not (ret1 and ret2):
        cap.release()
        raise RuntimeError("Could not read two frames from the video")
    while cap.isOpened():
        # Pixels that changed between consecutive frames
        diff = cv2.absdiff(frame1, frame2)
        gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
        blur = cv2.GaussianBlur(gray, (5, 5), 0)
        _, thresh = cv2.threshold(blur, 20, 255, cv2.THRESH_BINARY)
        dilated = cv2.dilate(thresh, None, iterations=3)
        contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for contour in contours:
            # Ignore small regions, which are usually noise
            if cv2.contourArea(contour) < 500:
                continue
            x, y, w, h = cv2.boundingRect(contour)
            cv2.rectangle(frame1, (x, y), (x+w, y+h), (0, 255, 0), 2)
            cv2.putText(frame1, "Motion", (x, y-10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        cv2.imshow('Motion Detection', frame1)
        frame1 = frame2
        ret, frame2 = cap.read()
        if not ret or cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()
3. Color-Based Object Tracking
import cv2
import numpy as np
def track_colored_object(video_path, lower_color, upper_color):
    """Track object by color in HSV space."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, lower_color, upper_color)
        mask = cv2.erode(mask, None, iterations=2)
        mask = cv2.dilate(mask, None, iterations=2)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if contours:
            largest = max(contours, key=cv2.contourArea)
            ((x, y), radius) = cv2.minEnclosingCircle(largest)
            if radius > 10:
                cv2.circle(frame, (int(x), int(y)), int(radius), (0, 255, 0), 2)
                cv2.circle(frame, (int(x), int(y)), 5, (0, 0, 255), -1)
        cv2.imshow('Tracking', frame)
        cv2.imshow('Mask', mask)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()
4. QR Code Detection
import cv2
def detect_qr_code(image_path):
    """Detect and decode QR codes."""
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    detector = cv2.QRCodeDetector()
    data, bbox, straight_qrcode = detector.detectAndDecode(img)
    if bbox is not None:
        # bbox holds the four corner points; its exact shape differs between
        # OpenCV versions, so flatten it to (N, 2) before drawing
        pts = bbox.reshape(-1, 2).astype(int)
        n_points = len(pts)
        for i in range(n_points):
            point1 = tuple(int(v) for v in pts[i])
            point2 = tuple(int(v) for v in pts[(i + 1) % n_points])
            cv2.line(img, point1, point2, (0, 255, 0), 3)
        if data:
            print(f"QR Code data: {data}")
            cv2.putText(img, data, (50, 50),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow('QR Code', img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
5. Image Stitching (Panorama)
import cv2
def create_panorama(images):
    """Stitch multiple images into panorama."""
    stitcher = cv2.Stitcher_create()
    status, pano = stitcher.stitch(images)
    if status == cv2.Stitcher_OK:
        print("Panorama created successfully")
        return pano
    else:
        print(f"Error: {status}")
        return None
img1 = cv2.imread('image1.jpg')
img2 = cv2.imread('image2.jpg')
img3 = cv2.imread('image3.jpg')
panorama = create_panorama([img1, img2, img3])
if panorama is not None:
    cv2.imshow('Panorama', panorama)
    cv2.waitKey(0)
Performance Optimization
Use GPU Acceleration
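import cv2
# Requires an OpenCV build compiled with CUDA; the standard pip wheels are CPU-only
n_devices = cv2.cuda.getCudaEnabledDeviceCount()
print(f"CUDA devices: {n_devices}")
if n_devices > 0:
    gpu_img = cv2.cuda_GpuMat()
    gpu_img.upload(img)  # copy from host memory to the GPU
    gpu_gray = cv2.cuda.cvtColor(gpu_img, cv2.COLOR_BGR2GRAY)
    result = gpu_gray.download()  # copy the result back to host memory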
Vectorize Operations
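# ❌ SLOW: Python loop over every pixel
for i in range(height):
    for j in range(width):
        img[i, j] = img[i, j] * 0.5
# ✅ FAST: vectorized NumPy
img = (img * 0.5).astype(np.uint8)
# ✅ FAST: OpenCV built-in (an alternative to the NumPy line, not an extra step)
img = cv2.convertScaleAbs(img, alpha=0.5, beta=0)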
Multi-threading for Video
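import cv2
from threading import Thread
from queue import Queue, Empty, Full
class VideoCapture:
    """Threaded video capture for better performance."""
    def __init__(self, src):
        self.cap = cv2.VideoCapture(src)
        self.q = Queue(maxsize=128)
        self.stopped = False
    def start(self):
        Thread(target=self._reader, daemon=True).start()
        return self
    def _reader(self):
        # Background thread: the only code that touches self.cap after start()
        while not self.stopped:
            ret, frame = self.cap.read()
            if not ret:
                break
            # Retry with a timeout so a full queue cannot block shutdown
            while not self.stopped:
                try:
                    self.q.put(frame, timeout=0.1)
                    break
                except Full:
                    pass
        self.stopped = True
        self.cap.release()
    def read(self, timeout=1.0):
        """Return the next frame, or None once the stream has ended."""
        while True:
            try:
                return self.q.get(timeout=timeout)
            except Empty:
                if self.stopped:
                    return None
    def stop(self):
        self.stopped = True
cap = VideoCapture(0).start()
while True:
    frame = cap.read()
    if frame is None:
        break
    cv2.imshow('Frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.stop()
cv2.destroyAllWindows()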
Common Pitfalls and Solutions
The "BGR vs RGB" Color Confusion
OpenCV uses BGR, most other libraries use RGB.
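img = cv2.imread('image.jpg')
# ❌ WRONG: matplotlib interprets the array as RGB, so red and blue are swapped
plt.imshow(img)
# ✅ CORRECT: convert to RGB for matplotlib
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img_rgb)
# ✅ CORRECT: cv2.imshow() expects BGR, so no conversion is needed
cv2.imshow('Correct Colors', img)
cv2.waitKey(0)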
The "Window Won't Close" Problem
Windows stay open without proper key handling.
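# ❌ WRONG: without waitKey() the window may never be drawn or may close immediately
cv2.imshow('Image', img)
# ✅ CORRECT: waitKey() processes GUI events; destroyAllWindows() closes the window
cv2.imshow('Image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()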
The "Video Capture Not Released" Problem
Camera stays locked if not released properly.
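# ❌ WRONG: the capture is never released
cap = cv2.VideoCapture(0)
# ✅ CORRECT: release in a finally block so it runs even on errors
cap = cv2.VideoCapture(0)
try:
    while True:
        ret, frame = cap.read()
        if not ret:
            break
finally:
    cap.release()
    cv2.destroyAllWindows()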
The "Image Modification" Confusion
Some operations modify in-place, others return new images.
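# Drawing functions modify the image in place
cv2.rectangle(img, (10, 10), (100, 100), (0, 255, 0), 2)
# Filters return a new image and leave the input unchanged
blurred = cv2.GaussianBlur(img, (5, 5), 0)
# Copy first if the original must be preserved
img_copy = img.copy()
cv2.rectangle(img_copy, (10, 10), (100, 100), (0, 255, 0), 2)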
The "Contour Hierarchy" Misunderstanding
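In OpenCV 4.x, cv2.findContours() always returns (contours, hierarchy); OpenCV 3.x returned three values. The retrieval mode determines which contours are returned and how the hierarchy is populated.
# RETR_EXTERNAL: only the outermost contours; the hierarchy carries no nesting
contours, hierarchy = cv2.findContours(
thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
)
# RETR_TREE: all contours, with the full parent/child hierarchy
contours, hierarchy = cv2.findContours(
thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE
)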
OpenCV covers a wide range of computer vision tasks. Its large library of optimized algorithms, combined with Python's ease of use, makes it a practical choice for work ranging from simple image processing to real-time vision systems. The fundamentals above provide a foundation for more advanced computer vision work.