| name | transaction-classification-debugger |
| description | Debug and test Budget Buddy's fuzzy matching transaction classification system, which uses a 0.85 similarity threshold. Use when debugging classification issues, understanding the fuzzy matching algorithm, testing merchant matching, finding similar transactions, or explaining how the smart batch update feature works. |
| allowed-tools | ["Bash(python*)","Read","Grep","LSP"] |
Transaction Classification Debugger
Overview
This skill helps you understand and debug Budget Buddy's transaction classification system, which uses fuzzy matching to find similar transactions at an 85% similarity threshold. It's critical for the "smart batch update" feature that suggests applying classifications to similar unclassified transactions.
Prerequisites
- Database exists with transactions: budget_buddy.db
- Backend code accessible
- Understanding of Python's difflib.SequenceMatcher
Quick Start
Step 1: Understand the Fuzzy Matching Algorithm
The core algorithm is in /backend/services/database_service.py - method get_similar_unclassified_transactions:
from difflib import SequenceMatcher
from sqlalchemy import or_  # assumed: session.query()/or_ are SQLAlchemy's

def get_similar_unclassified_transactions(
    transaction_id: int,
    similarity_threshold: float = 0.85
):
    reference_tx = get_transaction_by_id(transaction_id)
    # Pre-filter: unclassified rows other than the reference transaction.
    # As excerpted here, or_() holds only the exact merchant_name condition.
    candidates = session.query(Transaction).filter(
        Transaction.bb_category_manual == False,
        Transaction.id != transaction_id,
        or_(
            Transaction.merchant_name == reference_tx.merchant_name,
        )
    ).all()
    similar_transactions = []
    for candidate in candidates:
        # Case-insensitive fuzzy comparison of the two descriptions.
        similarity = SequenceMatcher(
            None,
            reference_tx.description.lower(),
            candidate.description.lower()
        ).ratio()
        if similarity >= similarity_threshold:
            similar_transactions.append({
                'transaction': candidate,
                'similarity_score': similarity,
                # Note: compares against a literal 0.85, not similarity_threshold.
                'match_reason': 'description' if similarity >= 0.85 else 'merchant'
            })
    return similar_transactions
Step 2: Test Fuzzy Matching
Test with sample descriptions:
from difflib import SequenceMatcher
desc1 = "CHECK #80 - MONTHLY RENT"
desc2 = "CHECK #79 - MONTHLY RENT"
similarity = SequenceMatcher(None, desc1.lower(), desc2.lower()).ratio()
print(f"Similarity: {similarity:.2%}")
Test in database:
import sqlite3
from difflib import SequenceMatcher

REFERENCE_ID = 123  # id of the transaction to compare against

conn = sqlite3.connect('budget_buddy.db')
cursor = conn.cursor()
# Parameterized queries keep the id in one place.
cursor.execute(
    "SELECT id, description, merchant_name FROM transactions WHERE id = ?",
    (REFERENCE_ID,),
)
reference = cursor.fetchone()
if reference is None:
    raise SystemExit(f"No transaction with id {REFERENCE_ID}")
print(f"Reference: {reference}")
cursor.execute("""
    SELECT id, description, merchant_name, bb_category_manual
    FROM transactions
    WHERE id != ?
    AND bb_category_manual = 0
    LIMIT 100
""", (REFERENCE_ID,))
for tx in cursor.fetchall():
    similarity = SequenceMatcher(None, reference[1].lower(), tx[1].lower()).ratio()
    if similarity >= 0.85:
        print(f"Match: ID={tx[0]}, Similarity={similarity:.2%}, Desc={tx[1]}")
conn.close()
Step 3: Debug Classification Issues
Check bb_category_manual flag:
sqlite3 budget_buddy.db "
SELECT
id,
description,
merchant_name,
bb_category,
bb_category_manual
FROM transactions
WHERE merchant_name = 'TARGET'
LIMIT 10;
"
Find unclassified transactions:
sqlite3 budget_buddy.db "
SELECT COUNT(*) as unclassified_count
FROM transactions
WHERE bb_category_manual = 0;
"
Check for similar descriptions:
import sqlite3
from difflib import SequenceMatcher
conn = sqlite3.connect('budget_buddy.db')
cursor = conn.cursor()
cursor.execute("SELECT id, description FROM transactions LIMIT 1000")
transactions = cursor.fetchall()
target_desc = "WHOLE FOODS MARKET #12345"
for tx_id, desc in transactions:
    similarity = SequenceMatcher(None, target_desc.lower(), desc.lower()).ratio()
    # Near matches only: an exact duplicate scores 1.0 and is skipped.
    if 0.85 <= similarity < 1.0:
        print(f"ID {tx_id}: {similarity:.2%} - {desc}")
conn.close()
Key Validation Points
Matching Criteria
- Merchant Name Match (Exact): merchant_name must be identical
  - Case-sensitive comparison
  - Example: "TARGET" ≠ "Target"
- Description Match (Fuzzy, 85%):
  - Uses difflib.SequenceMatcher
  - Threshold: 0.85 (85% similarity)
  - Case-insensitive (converted to lowercase)
  - Example: "CHECK #80 - MONTHLY RENT" ≈ "CHECK #79 - MONTHLY RENT" (about 92% similar); the bare strings "CHECK #80" and "CHECK #79" score only about 78%
- Manual Classification Filter: bb_category_manual = False (REQUIRED)
  - Never suggest already-manually-classified transactions
  - Prevents overwriting user decisions
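The three criteria above can be sketched as a single predicate over plain dictionaries. This is a minimal illustration, not the backend's implementation: the function names and the dict shape are hypothetical, and it assumes a candidate qualifies when either the merchant matches exactly or the description clears the threshold.

```python
from difflib import SequenceMatcher


def description_similarity(a: str, b: str) -> float:
    """Case-insensitive SequenceMatcher ratio between two descriptions."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_suggestion_candidate(reference: dict, candidate: dict,
                            threshold: float = 0.85) -> bool:
    """Hypothetical helper: apply the merchant, description, and manual-flag criteria."""
    # Manual classification filter: never suggest a manually classified row.
    if candidate["bb_category_manual"]:
        return False
    # Exact, case-sensitive merchant match; an empty merchant never matches.
    same_merchant = bool(reference["merchant_name"]) and (
        reference["merchant_name"] == candidate["merchant_name"]
    )
    # Fuzzy description match at the given threshold.
    similar_description = description_similarity(
        reference["description"], candidate["description"]
    ) >= threshold
    return same_merchant or similar_description


rent_80 = {"description": "CHECK #80 - MONTHLY RENT", "merchant_name": None,
           "bb_category_manual": True}
rent_79 = {"description": "CHECK #79 - MONTHLY RENT", "merchant_name": None,
           "bb_category_manual": False}
target_upper = {"description": "TARGET STORE #456", "merchant_name": "TARGET",
                "bb_category_manual": False}
target_mixed = {"description": "STARBUCKS COFFEE #123", "merchant_name": "Target",
                "bb_category_manual": False}

print(is_suggestion_candidate(rent_80, rent_79))            # fuzzy description match
print(is_suggestion_candidate(target_upper, target_mixed))  # "TARGET" != "Target"
```

Keeping the manual-flag check first means no later rule can accidentally re-suggest a transaction the user has already decided on.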
Similarity Threshold Analysis
| Threshold | Strictness | Use Case |
|---|---|---|
| 0.95-1.0 | Very strict | Nearly identical (e.g., "CHECK #1234 - MONTHLY RENT" vs "CHECK #1235 - MONTHLY RENT", about 96%) |
| 0.85-0.95 | Balanced (DEFAULT) | Similar patterns (e.g., same merchant with different check numbers) |
| 0.75-0.85 | Loose | Broader matches (may include false positives) |
| < 0.75 | Very loose | Too many false positives |
Why 0.85?
- Captures variations like check numbers, dates, locations
- Avoids false positives from unrelated merchants
- Proven effective over 70+ commits
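To see where concrete descriptions land relative to these bands, the following sketch scores a few pairs and labels each with the band names from the table. The helper names are hypothetical and the pairs are illustrative inputs, not data from the database.

```python
from difflib import SequenceMatcher


def score(a: str, b: str) -> float:
    """Case-insensitive similarity ratio, as used throughout this document."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def strictness_band(value: float) -> str:
    """Map a similarity score to the band names used in the threshold table."""
    if value >= 0.95:
        return "Very strict"
    if value >= 0.85:
        return "Balanced"
    if value >= 0.75:
        return "Loose"
    return "Very loose"


pairs = [
    ("CHECK #1234 - MONTHLY RENT", "CHECK #1235 - MONTHLY RENT"),
    ("CHECK #80 - MONTHLY RENT", "CHECK #79 - MONTHLY RENT"),
    ("WHOLE FOODS MARKET #12345", "WHOLE FOODS MARKET #67890"),
    ("STARBUCKS COFFEE #123", "TARGET STORE #456"),
]
bands = []
for a, b in pairs:
    s = score(a, b)
    bands.append(strictness_band(s))
    print(f"{s:.2%}  {bands[-1]:<11}  {a!r} vs {b!r}")
```

Short descriptions are sensitive to small edits: each differing character removes a larger share of the score than it would in a long description.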
Common Issues & Solutions
Issue: No similar transactions found
Possible Causes:
- All similar transactions already manually classified (bb_category_manual = True)
- merchant_name is null/empty AND description similarity < 0.85
- Reference transaction is the only one of its kind
Debug:
sqlite3 budget_buddy.db "
SELECT COUNT(*)
FROM transactions
WHERE merchant_name = 'YOUR_MERCHANT'
AND bb_category_manual = 0;
"
sqlite3 budget_buddy.db "
SELECT description
FROM transactions
WHERE description LIKE '%PATTERN%'
LIMIT 20;
"
Issue: Too many false positive matches
Cause: Threshold too low or descriptions too generic
Solution:
similar = get_similar_unclassified_transactions(
transaction_id=123,
similarity_threshold=0.90
)
Example False Positives:
- "PAYMENT THANK YOU" vs "PAYMENT RECEIVED" (different meaning; this pair scores about 48%, so it only matches when the threshold is set far below the default)
- Generic descriptions matching unrelated transactions
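Before tightening the threshold, it can help to score the suspect pairs directly. The sketch below does that for the payment pair above and for a second, hypothetical pair (transfers in opposite directions) chosen to show a case that clears 0.85 despite meaning something different.

```python
from difflib import SequenceMatcher


def score(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two descriptions."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


suspects = [
    ("PAYMENT THANK YOU", "PAYMENT RECEIVED"),
    # Hypothetical pair: same wording except the transfer direction.
    ("ONLINE TRANSFER TO CHECKING", "ONLINE TRANSFER FROM CHECKING"),
]
results = {}
for a, b in suspects:
    results[(a, b)] = score(a, b)
    flagged = "matches" if results[(a, b)] >= 0.85 else "does not match"
    print(f"{results[(a, b)]:.2%} {flagged} at 0.85: {a!r} vs {b!r}")
```

Pairs like the second one differ in a single short word, so raising the threshold to 0.90 would not exclude them; they have to be deselected by the user in the batch modal.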
Issue: Missing obvious matches
Cause: Threshold too high or merchant_name mismatch
Solution:
similar = get_similar_unclassified_transactions(
transaction_id=123,
similarity_threshold=0.80
)
Example Missed Matches:
- "WHOLE FOODS #123" vs "WHOLE FOODS MARKET #456" (scores about 67%, so it is missed even at 0.80)
- Merchant name variations: "TARGET" vs "TARGET CORP"
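A quick way to judge whether lowering the threshold would recover a missed pair is to find the highest threshold at which it still matches. This is a sketch with a hypothetical helper name; it scores the merchant-name pair as plain text purely to show how far apart the strings are, since merchant matching itself is exact.

```python
from difflib import SequenceMatcher


def highest_matching_threshold(a, b, thresholds=(0.85, 0.80, 0.75, 0.70)):
    """Return the highest listed threshold the pair still meets, or None."""
    value = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    for threshold in sorted(thresholds, reverse=True):
        if value >= threshold:
            return threshold
    return None


whole_foods = highest_matching_threshold("WHOLE FOODS #123", "WHOLE FOODS MARKET #456")
target = highest_matching_threshold("TARGET", "TARGET CORP")
print(f"Whole Foods pair first matches at: {whole_foods}")
print(f"Target pair first matches at: {target}")
```

When the answer is None or well below 0.80, lowering the threshold is likely to add more false positives than it recovers, and fixing the merchant_name values may be the better route.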
Issue: Manually classified transactions appearing in suggestions
Cause: bb_category_manual not properly set
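Solution: inspect the affected rows first, then set the flag. Note that the UPDATE below flags every transaction that has a category, including automatically assigned ones, so run it only if all existing categories should be treated as manual.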
sqlite3 budget_buddy.db "
SELECT id, description, bb_category, bb_category_manual
FROM transactions
WHERE id IN (123, 456, 789);
"
sqlite3 budget_buddy.db "
UPDATE transactions
SET bb_category_manual = 1
WHERE id IN (SELECT id FROM transactions WHERE bb_category IS NOT NULL);
"
Smart Batch Update Workflow
User Journey
- User manually classifies transaction (inline or modal)
  - Updates bb_category and sets bb_category_manual = True
- Backend checks for similar transactions
  - Calls get_similar_unclassified_transactions()
  - Finds matches with merchant_name OR fuzzy description
  - Filters to only unclassified (bb_category_manual = False)
- Frontend shows modal with checkboxes
  - Lists similar transactions
  - Shows similarity score for each
  - User selects which to update
- Batch update endpoint applies classification
  - Updates selected transactions
  - Sets bb_category_manual = True for all
  - Maintains audit trail
Integration Points
- backend/services/database_service.py
  - Method: get_similar_unclassified_transactions()
  - Line: varies (search for the method name)
- Frontend: ClassificationManagement.js
  - Inline dropdown editing
  - Triggers similarity check on change
- Frontend: EnhancedTransactionModal.js
  - Modal form editing
  - OnSaveSuccess callback triggers similarity check
- Frontend: Batch Edit Modal
  - Checkbox selection
  - Batch update API call
Testing the Fuzzy Matcher
Test Case 1: Check Numbers
from difflib import SequenceMatcher
desc1 = "CHECK #1234 - MONTHLY RENT"
desc2 = "CHECK #1235 - MONTHLY RENT"
similarity = SequenceMatcher(None, desc1.lower(), desc2.lower()).ratio()
print(f"Similarity: {similarity:.2%}")
assert similarity >= 0.85
Test Case 2: Merchant Variations
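from difflib import SequenceMatcher
desc1 = "WHOLE FOODS MARKET #12345"
desc2 = "WHOLE FOODS MARKET #67890"
similarity = SequenceMatcher(None, desc1.lower(), desc2.lower()).ratio()
print(f"Similarity: {similarity:.2%}")
# Store numbers with no digits in common score 80.00%, just under the default,
# so this pair depends on the exact merchant_name match or a lower threshold.
assert 0.80 <= similarity < 0.85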
Test Case 3: Unrelated Transactions
desc1 = "STARBUCKS COFFEE #123"
desc2 = "TARGET STORE #456"
similarity = SequenceMatcher(None, desc1.lower(), desc2.lower()).ratio()
print(f"Similarity: {similarity:.2%}")
assert similarity < 0.85
Test Case 4: Date Variations
desc1 = "PAYMENT DUE 01/15/2026"
desc2 = "PAYMENT DUE 02/15/2026"
similarity = SequenceMatcher(None, desc1.lower(), desc2.lower()).ratio()
print(f"Similarity: {similarity:.2%}")
assert similarity >= 0.85
Advanced Debugging
Visualize Similarity Scores
import sqlite3
from difflib import SequenceMatcher
conn = sqlite3.connect('budget_buddy.db')
cursor = conn.cursor()
ref_id = 123
cursor.execute("SELECT description FROM transactions WHERE id = ?", (ref_id,))
ref_desc = cursor.fetchone()[0]
cursor.execute("SELECT id, description FROM transactions WHERE id != ?", (ref_id,))
transactions = cursor.fetchall()
similarities = []
for tx_id, desc in transactions:
    score = SequenceMatcher(None, ref_desc.lower(), desc.lower()).ratio()
    similarities.append((tx_id, score, desc))
similarities.sort(key=lambda x: x[1], reverse=True)
print(f"\nTop 10 matches for: {ref_desc}\n")
for tx_id, score, desc in similarities[:10]:
    print(f"{score:.2%} - ID {tx_id}: {desc}")
conn.close()
Test Threshold Variations
thresholds = [0.70, 0.75, 0.80, 0.85, 0.90, 0.95]
for threshold in thresholds:
    similar = get_similar_unclassified_transactions(
        transaction_id=123,
        similarity_threshold=threshold
    )
    print(f"Threshold {threshold:.2f}: {len(similar)} matches")
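The same sweep can be run without the backend by scoring a list of descriptions directly, which is handy when the service layer is not importable. This is a sketch: the helper name and the sample descriptions are hypothetical.

```python
from difflib import SequenceMatcher


def count_matches_by_threshold(reference, candidates, thresholds):
    """Return {threshold: number of candidates scoring at or above it}."""
    scores = [
        SequenceMatcher(None, reference.lower(), c.lower()).ratio()
        for c in candidates
    ]
    return {t: sum(s >= t for s in scores) for t in thresholds}


reference = "WHOLE FOODS MARKET #12345"
candidates = [
    "WHOLE FOODS MARKET #67890",  # different store number
    "WHOLE FOODS MARKET #12346",  # one digit off
    "TARGET STORE #456",          # unrelated merchant
]
counts = count_matches_by_threshold(
    reference, candidates, [0.70, 0.75, 0.80, 0.85, 0.90, 0.95]
)
for threshold, n in counts.items():
    print(f"Threshold {threshold:.2f}: {n} matches")
```

A count that stays flat across several thresholds suggests the matches are robust; a sharp drop between two adjacent values marks where borderline pairs sit.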
Technical Details
difflib.SequenceMatcher
from difflib import SequenceMatcher
matcher = SequenceMatcher(None, "string1", "string2")
ratio = matcher.ratio()
blocks = matcher.get_matching_blocks()
opcodes = matcher.get_opcodes()
Ratio Calculation:
ratio = 2 * M / T
Where:
M = number of matching characters
T = total number of characters in both strings
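As a worked check of the formula, M can be read off the matching blocks and plugged into 2 * M / T. The strings are the lowercase rent-check descriptions used earlier; the variable names are only for illustration.

```python
from difflib import SequenceMatcher

a = "check #80 - monthly rent"
b = "check #79 - monthly rent"
matcher = SequenceMatcher(None, a, b)

# get_matching_blocks() ends with a zero-length sentinel block; skip it.
blocks = [blk for blk in matcher.get_matching_blocks() if blk.size > 0]
for blk in blocks:
    print(repr(a[blk.a:blk.a + blk.size]), "size", blk.size)

M = sum(blk.size for blk in blocks)  # matching characters
T = len(a) + len(b)                  # total characters in both strings
manual_ratio = 2 * M / T
print(f"M={M}, T={T}, ratio={manual_ratio:.4f}")
```

Here the blocks are "check #" (7 characters) and " - monthly rent" (15 characters), so M = 22, T = 48, and the ratio is 44 / 48, about 0.9167.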
Database Schema
Transactions Table:
id - Primary key
description - Original transaction description
merchant_name - Extracted merchant (from Plaid or manual)
bb_category - Assigned budget category
bb_category_manual - Boolean (0=auto, 1=manual)
amount - Transaction amount
date - Transaction date
Key Insight: Only transactions with bb_category_manual = 0 (False) are suggested for batch updates.
Integration with Other Skills
- Code Explanation - Can explain fuzzy matching algorithm visually
- Development Diagnostics - Validates database has transactions to classify
- Testing & Validation Suite - Can include fuzzy matching tests
Last Updated
January 1, 2026