name	manual-sync
description	Fetch new LeetCode comp posts and manually parse/normalize them (replacing the LLM pipeline). Invoke with /manual-sync.
disable-model-invocation	true

LeetComp Manual Sync

You replace the LLM steps of leetcomp-sync. The fetch script downloads the raw posts; YOU then parse, normalize, and rebuild the final data.

Step 1: Fetch Raw Posts

cd {{workspace}} && uv run python -c "
import asyncio
from leetcomp.utils import last_fetched_info
from leetcomp.fetch import fetch_posts_in_bulk
from leetcomp import POSTS_FILE
till_id, till_ts = last_fetched_info(POSTS_FILE)
print(f'OLD_FIRST_ID={till_id}')
asyncio.run(fetch_posts_in_bulk(till_id=till_id, till_timestamp=till_ts))
print('FETCH_DONE')
"

Note the printed OLD_FIRST_ID. Every line in data/posts.jsonl that precedes the line with that id is a new post. If no new posts were fetched (that id's line is still the first line), stop and report that.
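Here is a minimal sketch of that selection. It assumes that data/posts.jsonl is ordered newest first and that every line is a JSON object with an "id" field; the fetch code does not confirm either point. `new_posts` is a hypothetical helper, not part of the leetcomp package.

```python
import json
from typing import Iterable


def new_posts(lines: Iterable[str], old_first_id) -> list:
    """Collect posts that appear before the line whose id is old_first_id.

    Assumes (unverified) a newest-first file where each line is a JSON
    object with an "id" field.
    """
    fresh = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        post = json.loads(line)
        # Compare as strings so an int id in the file matches the printed id.
        if str(post["id"]) == str(old_first_id):
            break  # this post and everything after it was already fetched
        fresh.append(post)
    return fresh
```

Open the file with `open("data/posts.jsonl", encoding="utf-8")` and pass the handle together with OLD_FIRST_ID. An empty list means nothing new was fetched.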

Step 2: Parse New Posts

Read the new posts from data/posts.jsonl. For each post:

Skip if downvotes > upvotes → {"id": ..., "created_at": ..., "skip": true}
Skip if not a compensation post → {"id": ..., "created_at": ..., "skip": true}
Parse only India-based offers; treat mentions of Lakhs, LPA, INR, or ₹ as the India signal
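The two mechanical parts of these rules can be sketched in Python. The raw-post field names `upvotes`, `downvotes`, and `created_at` are assumptions about posts.jsonl, and `skip_record` and `has_india_signal` are hypothetical helpers. You still have to read each post to decide whether it is a compensation post.

```python
import re
from typing import Optional

# Lakh/lac/LPA/INR as whole words, or the rupee sign anywhere.
INDIA_SIGNAL = re.compile(r"\b(?:lakhs?|lacs?|lpa|inr)\b|₹", re.IGNORECASE)


def skip_record(post: dict) -> Optional[dict]:
    """Return a skip record if the post has more downvotes than upvotes.

    "upvotes"/"downvotes"/"created_at" are assumed raw-post field names.
    """
    if post.get("downvotes", 0) > post.get("upvotes", 0):
        return {"id": post["id"], "created_at": post["created_at"], "skip": True}
    return None


def has_india_signal(text: str) -> bool:
    """True if the text mentions Lakhs/LPA/INR/₹ (a hint, not proof)."""
    return bool(INDIA_SIGNAL.search(text))
```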

Parsed record JSON line:

{"id": <int>, "created_at": "<str>", "skip": false, "compensation-post": true,
 "offer-type": "full-time", "company": "<raw>", "company-normalized": "<lowercase>",
 "role": "<raw>", "role-normalized": "<lowercase short form>", "yoe": <float>,
 "base": <float, LPA>, "total": <float LPA explicit>, "total-calculated": <float LPA computed>,
 "location": "<city>", "currency": "INR"}

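This sketch serializes one record in the field order above and drops absent values, so a missing field never appears as null, "" or "N/A". `to_jsonl_line` and `FIELD_ORDER` are illustrative names, not existing project code.

```python
import json

# Field order copied from the record schema above (hypothetical constant).
FIELD_ORDER = [
    "id", "created_at", "skip", "compensation-post", "offer-type",
    "company", "company-normalized", "role", "role-normalized", "yoe",
    "base", "total", "total-calculated", "location", "currency",
]


def _present(value) -> bool:
    """A field counts as present unless it is None, empty, or "N/A"."""
    return value is not None and value != "" and value != "N/A"


def to_jsonl_line(record: dict) -> str:
    """Serialize a parsed record, omitting missing fields entirely."""
    kept = {k: record[k] for k in FIELD_ORDER if k in record and _present(record[k])}
    return json.dumps(kept, ensure_ascii=False)
```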
Key rules:

LPA values (lakhs per annum): "30 lacs"→30, "₹10,00,000"→10, "1.5 Cr"→150
total = user-stated total; total-calculated = annual compensation computed from recurring annual components only (base + annual bonus/target bonus + annualized stocks/equity), and only if no explicit total is stated
Exclude one-time or non-recurring items from total-calculated: joining/sign-on bonus, relocation bonus, retention bonus, PF, gratuity, retirals, benefits, reimbursements, clawback notes, etc.
RSU/stock grants: annualize over 4 years (industry standard) even if the vesting period isn't explicitly stated, as long as the grant value is given in currency (INR/$). If RSUs are in units only, look up the current stock price and USD/INR rate to convert to INR, then annualize over 4 years.
Always compute total-calculated when no explicit total is stated and the post gives at least one recurring component beyond base. Do not omit it just because one component relies on a standard default (e.g. the 4-year vesting period).
Mark duplicate/reposted compensation posts as skip (same author, same comp details, different post ID)
If the user explicitly states a first-year or total compensation number, record it as total exactly as stated instead of recomputing it
Omit missing fields entirely (never N/A)
One post can yield multiple parsed records
company-normalized: lowercase normalized ("Samsung R&D" → "samsung")
role-normalized: lowercase normalized key matching existing data/role_map.json conventions when possible; prefer specific normalized titles over overly generic ones when the role is explicitly stated (e.g. application engineer l3 is better than l3)
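The unit conversion and the recurring-total formula can be sketched as follows. `to_lpa` handles only the lakh, crore, and plain-rupee patterns listed above. `total_calculated` expects every input already in LPA, so convert USD grants or unit-only RSUs first. Both helpers are hypothetical, and rounding to two decimals is an arbitrary choice.

```python
import re
from typing import Optional

_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")
# Unit words must not be glued to a preceding letter ("total" is not "l").
_CRORE = re.compile(r"(?<![a-z])(?:crores?|cr)\b")
_LAKH = re.compile(r"(?<![a-z])(?:lakhs?|lacs?|lpa|l)\b")


def to_lpa(text: str) -> float:
    """Convert one INR amount string to lakhs (1 crore = 100 lakh)."""
    lowered = text.lower()
    match = _NUMBER.search(lowered)
    if match is None:
        raise ValueError(f"no amount found in {text!r}")
    value = float(match.group().replace(",", ""))
    if _CRORE.search(lowered):
        return value * 100  # crores -> lakhs
    if _LAKH.search(lowered):
        return value  # already lakhs / LPA
    return value / 100_000  # plain rupees: 1 lakh = 1,00,000


def total_calculated(base: float, annual_bonus: float = 0.0,
                     stock_grant: float = 0.0, vesting_years: int = 4,
                     explicit_total: Optional[float] = None) -> Optional[float]:
    """Recurring annual comp in LPA: base + annual bonus + grant / vesting_years.

    Joining, relocation, and retention bonuses, PF, gratuity, and other
    one-time items are deliberately not parameters. Returns None when an
    explicit total exists or nothing beyond base is known.
    """
    if explicit_total is not None:
        return None  # record "total" instead
    if not annual_bonus and not stock_grant:
        return None  # base alone is not a total
    return round(base + annual_bonus + stock_grant / vesting_years, 2)
```

For example, a 30 LPA base with a 3 LPA target bonus and a 40 LPA RSU grant gives 30 + 3 + 40 / 4 = 43 LPA. The function has no parameter for a joining bonus, so one cannot leak into the total.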

Critical accuracy rules:

Never infer unstated fields. Only include fields explicitly stated in the post or directly derivable from explicit compensation text
If role is not stated, omit role and role-normalized
If location for the offer is not stated, omit location even if the author mentions current city, preferred city, NCR, office preference, or a company office elsewhere
Do not invent a generic role like sde/sse just because the YOE/company suggests it
For multi-offer posts, keep each offer isolated; do not leak role/location/YOE/comp components from one offer into another unless clearly shared
Preserve raw role text from the post when present, and normalize separately in role-normalized

Mandatory verification before writing:

Re-read every newly parsed record against the raw post text
Check that every non-derived field (company, role, location, yoe, explicit total) appears in the post text
Check that every derived field is traceable and formula-correct
Check role-normalized against existing data/role_map.json patterns before creating a new style of normalization
Check that total-calculated excludes one-time components, and that it is omitted in favor of total whenever the post states an explicit total
Only after this verification, prepend all new parsed lines to data/parsed_posts.jsonl (newest first)
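Some of these checks are mechanical and can be scripted before the manual re-read. The sketch below uses hypothetical helpers and assumes the raw company, role, and location strings are copied verbatim from the post. It flags records that carry both total fields or contain raw text not found in the post. It then prepends records through a temporary file, so an interrupted write cannot truncate data/parsed_posts.jsonl. The caller must pass records already ordered newest first.

```python
import json
import os
import tempfile
from pathlib import Path


def check_record(record: dict, post_text: str) -> list:
    """Return mechanical problems; this supplements, not replaces, a manual re-read."""
    problems = []
    if "total" in record and "total-calculated" in record:
        problems.append("has both total and total-calculated")
    lowered = post_text.lower()
    for field in ("company", "role", "location"):
        value = record.get(field)
        if value is not None and value.lower() not in lowered:
            problems.append(f"{field} {value!r} not found verbatim in post")
    return problems


def prepend_jsonl(path: str, records: list) -> None:
    """Write records (already newest first) above the existing lines."""
    target = Path(path)
    old = target.read_text(encoding="utf-8") if target.exists() else ""
    new = "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
    # Temp file in the same directory so the final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(new + old)
    os.replace(tmp, target)  # replaces the original file in one rename
```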

Step 3: Update Entity Maps

For each of company_map.json, role_map.json, location_map.json:

Load existing map (keys are lowercase)
Collect new entity values from your parsed records (company-normalized, role-normalized, location)
For any value whose lowercase form is not already a key, add "lowercase_variant": "Canonical Name"
Save sorted by key
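Here is a minimal sketch of this update for one map file. `update_map` is hypothetical. The `indent=2` setting and the trailing newline are guesses, so match the existing files' formatting before using it. Existing keys are never overwritten, which keeps earlier canonical choices stable.

```python
import json
from pathlib import Path


def update_map(path: str, new_entries: dict) -> list:
    """Add unseen lowercase keys to an entity map and save it sorted by key.

    new_entries maps a variant to its canonical name. Returns the added keys.
    """
    target = Path(path)
    mapping = json.loads(target.read_text(encoding="utf-8")) if target.exists() else {}
    added = []
    for variant, canonical in new_entries.items():
        key = variant.lower()  # map keys are lowercase
        if key not in mapping:
            mapping[key] = canonical
            added.append(key)
    # sort_keys=True gives the "save sorted by key" behaviour.
    target.write_text(
        json.dumps(mapping, indent=2, ensure_ascii=False, sort_keys=True) + "\n",
        encoding="utf-8",
    )
    return added
```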

Canonical naming:

Company: proper casing (Google, Amazon, JPMorgan Chase, PhonePe, IBM)
Role: UPPERCASE abbrevs (SDE, SSE, MLE), Title Case full names (Data Scientist). Preserve levels (SDE 2)
Location: official name (Bengaluru, Gurugram, Hyderabad)
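Role canonicalization mostly follows a mechanical rule, sketched below with a hypothetical `canonical_role` helper. The abbreviation set is seeded only from the examples above and should be extended from existing role_map.json values. Roman-numeral levels such as "ii" would come out as "Ii", so review the output. Company names such as PhonePe or JPMorgan Chase follow no casing rule and still need to be written by hand.

```python
# Hypothetical abbreviation set, seeded from the examples above.
ROLE_ABBREVIATIONS = {"sde", "sse", "mle"}


def canonical_role(normalized: str) -> str:
    """Uppercase known abbreviations, Title Case other words, keep levels."""
    words = []
    for word in normalized.lower().split():
        if word in ROLE_ABBREVIATIONS:
            words.append(word.upper())
        else:
            words.append(word.capitalize())  # "l3" -> "L3", "2" stays "2"
    return " ".join(words)
```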

Step 4: Rebuild final_data.json

cd {{workspace}} && uv run python -c "from leetcomp.transform import create_final_data; create_final_data()"

Step 5: Report

Print summary: new posts fetched, parsed offers vs skipped, new mappings added per entity type, total final records.

cd {{workspace}} && python3 -c "import json; print(f'Final records: {len(json.load(open(\"data/final_data.json\")))}')"
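Most of the summary can be built from values you already have in this run. The hypothetical `summarize` helper below assumes three inputs: the raw-post count from Step 1, this run's new parsed records (including skip records), and a per-entity list of keys added in Step 3. The final record count comes from the command above. Parsed offers count records, so one post can contribute several, while skips count posts.

```python
from collections import Counter


def summarize(posts_fetched: int, new_records: list, added: dict) -> str:
    """Format the run summary; `added` maps entity type -> list of new keys."""
    counts = Counter("skipped" if r.get("skip") else "parsed" for r in new_records)
    lines = [
        f"New posts fetched: {posts_fetched}",
        f"Parsed offers: {counts['parsed']}, skipped: {counts['skipped']}",
    ]
    for entity, keys in sorted(added.items()):
        lines.append(f"New {entity} mappings: {len(keys)}")
    return "\n".join(lines)
```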
