| Key | Value |
| --- | --- |
| name | ideer-daily-paper-chatbot |
| description | Use iDeer as a daily paper-reading workflow for chatbot-first users such as Codex, Gemini, or ChatGPT. Keep the original iDeer paper-digest setup, source selection, history validation, and email/report/ideas workflow, but replace the in-repo LLM API summarization and scoring with the current chatbot session. Suited to daily paper curation, reporting, idea generation, and automation without configuring a separate OpenAI/SiliconFlow/Ollama API key. |
| allowed-tools | read(*), write(.env), write(.web_config.json), write(.client_config.json), write(profiles/**), write(state/**), write(history/**), write(chatbot_test_outputs/**), grep(*), glob(*), bash(*), web_fetch(*), web_search(*) |
iDeer Daily Paper Chatbot
Use this skill when the user wants the iDeer daily-paper workflow but does not want the repo to call its own LLM API. The chatbot should do the reading, scoring, grouping, report writing, and idea generation directly in the current conversation.
Constants
- PROJECT_DIR: the current iDeer repository root. When installed by scripts/install_internshannon_skill.py, this becomes the absolute clone path.
- SKILL_DIR: skills/ideer-daily-paper-chatbot inside the iDeer repository.
- Default sources: arxiv semanticscholar huggingface rss
- Default RSS feed: https://imjuya.github.io/juya-ai-daily/rss.xml
- First-run schedule preference: Asia/Shanghai, daily at 13:00; saved but not enabled.
- First validation mode: dry run only. Save local artifacts, do not send email, and do not enable recurring schedules.
Core rule
Keep as much of the original iDeer workflow as possible:
- reuse the repo layout
- reuse the source fetchers when they work
- reuse .env, profiles/description.txt, and profiles/researcher_profile.md
- reuse history/ as the artifact destination when saving outputs
But do not rely on main.py for any step that requires MODEL_NAME, BASE_URL, API_KEY, or Ollama. Instead, fetch raw items and let the chatbot handle the intelligence layer: summaries, scoring, reports, and ideas.
Never use the Tinder/swipe product path for this skill. Do not call /api/swipe, read client/src/swipeView.tsx for workflow state, or use saved swipe queues as recommendation input.
What stays the same
- source defaults and source-selection heuristics
- profile-driven filtering using profiles/description.txt
- optional stronger report/ideas guidance from profiles/researcher_profile.md
- artifact validation in history/
- optional SMTP sending when the user explicitly wants live email and SMTP config exists
- Codex automation support for recurring runs
What changes
Replace these original in-repo LLM tasks with chatbot work in-session:
- per-item Chinese summary
- per-item relevance scoring
- per-source daily summary
- cross-source narrative report
- research idea generation
Do not call python main.py or bash scripts/run_daily.sh unless the user explicitly wants to test the original API-based pipeline. For chatbot-first runs, fetch raw data with the repo's fetchers or with web browsing and continue in the conversation.
Files to inspect first
Always check:
- .env
- profiles/description.txt
Check when needed:
- profiles/researcher_profile.md
- profiles/x_accounts.txt
If .env does not exist, or if .env lacks SMTP_RECEIVER, or if profiles/description.txt is missing/empty, enter First-run setup before any digest run.
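You can run the first-run trigger above as a small script before any digest run. This is a minimal sketch, not a repo helper. The names parse_env_file and needs_first_run_setup are hypothetical, and the parser assumes plain KEY=VALUE lines without shell-style quoting or variable expansion.

```python
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blanks and # comments (assumed .env style)."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def needs_first_run_setup(project_dir: Path) -> list[str]:
    """Return the reasons first-run setup is required; an empty list means it is not."""
    reasons: list[str] = []
    env_path = project_dir / ".env"
    if not env_path.is_file():
        reasons.append(".env is missing")
    elif not parse_env_file(env_path).get("SMTP_RECEIVER"):
        reasons.append(".env lacks SMTP_RECEIVER")
    description = project_dir / "profiles" / "description.txt"
    if not description.is_file() or not description.read_text(encoding="utf-8").strip():
        reasons.append("profiles/description.txt is missing or empty")
    return reasons
```

Run it from PROJECT_DIR with print(needs_first_run_setup(Path("."))). An empty list means the digest can proceed.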
Modes
Map the user request to one of these modes:
- First-run setup: ask the user for required setup fields, write .env/profiles/UI config with the helper script, then run a small dry run
- Chatbot dry run: fetch sources, summarize in-chat, save markdown/html/json artifacts, do not send email
- Chatbot full digest: fetch sources, summarize in-chat, save artifacts, send email only if SMTP config is complete and the user asked for live send
- Setup/fix: adjust .env, profiles, categories, or fetchers so source collection works
- Recurring automation: create or update a Codex automation that performs a chatbot-first digest
First-run setup
Use this mode when the user is installing iDeer for the first time or when the config files are missing. If the client supports option boxes or structured follow-up questions, use them; otherwise ask concise numbered questions.
Required questions
Ask for:
- receiver email address
- research direction / interest description
- information sources
- preferred delivery time
Use these defaults when the user accepts defaults or gives an incomplete answer:
- sources: arxiv semanticscholar huggingface rss
- schedule: daily, 13:00, Asia/Shanghai
- arXiv categories: cs.AI cs.CL cs.LG
- Hugging Face content type: papers
- Semantic Scholar field: Computer Science
- report: enabled
- ideas: disabled
- email sending: disabled
- recurring schedule: disabled
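One way to apply these defaults is to merge the user's answers over a fixed dictionary and treat blank answers as "use the default". The sketch below is illustrative only. The keys "sources", "schedule", and "generate_ideas" mirror the helper's JSON example under Write setup files. The other key names are assumptions and may not match what setup_chatbot_config.py accepts.

```python
from copy import deepcopy

# Defaults from the list above. Keys other than "sources", "schedule", and
# "generate_ideas" are assumed names, not confirmed helper fields.
SETUP_DEFAULTS = {
    "sources": ["arxiv", "semanticscholar", "huggingface", "rss"],
    "schedule": {"frequency": "daily", "time": "13:00", "timezone": "Asia/Shanghai"},
    "arxiv_categories": ["cs.AI", "cs.CL", "cs.LG"],
    "huggingface_content": ["papers"],
    "semanticscholar_field": "Computer Science",
    "generate_report": True,
    "generate_ideas": False,
    "send_email": False,
    "schedule_enabled": False,
}


def apply_setup_defaults(answers: dict) -> dict:
    """Fill missing or blank answers from SETUP_DEFAULTS without mutating the defaults."""
    merged = deepcopy(SETUP_DEFAULTS)
    for key, value in answers.items():
        if value is None or value == "" or value == [] or value == {}:
            continue  # a blank answer means "use the default"
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Partial schedule answers only override the fields the user gave.
            merged[key] = {**merged[key], **{k: v for k, v in value.items() if v}}
        else:
            merged[key] = value
    return merged
```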
Optional questions
Ask, but allow the user to skip:
- Google Scholar or personal homepage URL
- SMTP server, sender, and app password
- whether to include GitHub
- whether to generate research ideas
Only include twitter if the user explicitly chooses it and an X_RAPIDAPI_KEY is available. Do not ask for repo LLM API keys during chatbot-first setup.
Write setup files
After collecting answers, pass JSON to the helper:
```bash
cat <<'JSON' | .venv/bin/python skills/ideer-daily-paper-chatbot/scripts/setup_chatbot_config.py
{
  "receiver": "user@example.com",
  "description": "User research interests here",
  "scholar_urls": [],
  "sources": ["arxiv", "semanticscholar", "huggingface", "rss"],
  "schedule": {
    "frequency": "daily",
    "time": "13:00",
    "timezone": "Asia/Shanghai"
  },
  "generate_ideas": false
}
JSON
```
If .venv does not exist yet, use python3 skills/ideer-daily-paper-chatbot/scripts/setup_chatbot_config.py for this setup step. The helper writes .env, profiles/description.txt, optional profiles/researcher_profile.md, state/ideer_chatbot_setup.json, .web_config.json, and .client_config.json.
The helper must not invent SMTP passwords or API keys. If SMTP is incomplete, report that email is not configured and that the first run will only save local artifacts.
After setup, run a small chatbot-first dry run, such as arxiv and huggingface with low limits, then report the files created and that scheduling remains disabled.
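You can also drive the interpreter fallback and the stdin hand-off from Python, for example from an automation script. This sketch assumes only what the commands above show: the helper reads JSON on stdin and runs from PROJECT_DIR. The names choose_python and run_setup_helper are hypothetical.

```python
import json
import shutil
import subprocess
from pathlib import Path

HELPER = Path("skills/ideer-daily-paper-chatbot/scripts/setup_chatbot_config.py")


def choose_python(project_dir: Path) -> str:
    """Prefer the repo virtualenv; fall back to python3 before .venv exists."""
    venv_python = project_dir / ".venv" / "bin" / "python"
    if venv_python.exists():
        return str(venv_python)
    return shutil.which("python3") or "python3"


def run_setup_helper(payload: dict, project_dir: Path) -> subprocess.CompletedProcess:
    """Pipe the setup answers to the helper as JSON on stdin, running from PROJECT_DIR."""
    return subprocess.run(
        [choose_python(project_dir), str(HELPER)],
        input=json.dumps(payload, ensure_ascii=False),  # keep Chinese text readable
        text=True,
        capture_output=True,
        cwd=project_dir,
        check=True,  # raise CalledProcessError instead of silently continuing
    )
```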
Source defaults
- Default paper/news sources: arxiv semanticscholar huggingface rss
- RSS defaults to Juya AI Daily: https://imjuya.github.io/juya-ai-daily/rss.xml
- Add github only when the user wants code/repo signals
- Add twitter only when the user explicitly wants social signals and credentials exist
- For Hugging Face, default to papers only
- For CS users, start arXiv from cs.AI cs.CL cs.LG; expand to cs.CV cs.RO for embodied, spatial, or robotics interests
- Prefer explicit Semantic Scholar queries when the profile is broad
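These heuristics can be written as a small selection function. This is a sketch under stated assumptions. The keyword list that detects embodied, spatial, or robotics interests is a hypothetical stand-in for the chatbot's own judgment. The returned dictionary shape is illustrative, not a repo format.

```python
DEFAULT_SOURCES = ["arxiv", "semanticscholar", "huggingface", "rss"]
BASE_ARXIV = ["cs.AI", "cs.CL", "cs.LG"]
EXTRA_ARXIV = ["cs.CV", "cs.RO"]
# Hypothetical keyword list; "robot" also matches "robotics".
EMBODIED_KEYWORDS = ("embodied", "spatial", "robot")


def select_sources(description: str, want_github: bool = False,
                   want_twitter: bool = False, has_x_key: bool = False) -> dict:
    """Apply the source-default rules to one user profile description."""
    sources = list(DEFAULT_SOURCES)
    if want_github:
        sources.append("github")
    if want_twitter and has_x_key:  # needs both an explicit request and credentials
        sources.append("twitter")
    text = description.lower()
    categories = list(BASE_ARXIV)
    if any(word in text for word in EMBODIED_KEYWORDS):
        categories += EXTRA_ARXIV
    return {"sources": sources, "arxiv_categories": categories,
            "huggingface_content": ["papers"]}
```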
Chatbot-first pipeline
Step 1: Classify and configure
If first-run setup is needed, complete it before this step.
Read the profile and decide:
- which sources to fetch
- whether report and idea generation are requested
- whether email is requested
- whether the request is one-off or recurring
Use skills/ideer-daily-paper-chatbot/references/presets.md for presets.
Step 2: Fetch raw items
Prefer the repo fetchers first when the repo is available:
- fetchers/arxiv_fetcher.py
- fetchers/huggingface_fetcher.py
- fetchers/semanticscholar_fetcher.py
- fetchers/rss_fetcher.py
- fetchers/github_fetcher.py
- fetchers/twitter_fetcher.py
If the repo is not available or a fetcher is broken, use browsing and cite the public source pages.
Fetch raw candidates only. Do not call the repo's LLM scoring path.
Run commands from PROJECT_DIR. Prefer .venv/bin/python; if the virtualenv is missing, use Python 3.10+ to create it before fetching:
```bash
python3 -m venv .venv
.venv/bin/python -m pip install -r requirements.txt
```
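To keep one broken fetcher from aborting the whole run, fetch each source on its own and record failures for the browsing fallback. This is a minimal sketch. The two repo calls reuse the function names and arguments from the examples under Safe command patterns. The helpers load_fetcher and fetch_all and the table REPO_JOBS are hypothetical.

```python
import importlib
from typing import Any, Callable


def load_fetcher(module: str, func: str) -> Callable[..., Any]:
    """Import module.func; raises ImportError or AttributeError if it is unavailable."""
    return getattr(importlib.import_module(module), func)


def fetch_all(jobs: dict[str, Callable[[], Any]]) -> tuple[dict[str, Any], dict[str, str]]:
    """Run each raw-fetch job; collect results and record failures instead of aborting."""
    results: dict[str, Any] = {}
    failures: dict[str, str] = {}
    for source, job in jobs.items():
        try:
            results[source] = job()
        except Exception as exc:  # a failed source should go to the browsing fallback
            failures[source] = f"{type(exc).__name__}: {exc}"
    return results, failures


# Raw-fetch jobs only; nothing here touches the repo's LLM scoring path.
REPO_JOBS: dict[str, Callable[[], Any]] = {
    "huggingface": lambda: load_fetcher(
        "fetchers.huggingface_fetcher", "get_daily_papers")(10),
    "arxiv": lambda: load_fetcher(
        "fetchers.arxiv_fetcher", "fetch_papers_for_categories")(
        ["cs.AI", "cs.LG"], max_entries=25, sleep_range=(0, 0)),
}
```

From PROJECT_DIR, results, failures = fetch_all(REPO_JOBS) returns the raw items for each source, plus an error string for every source that needs the browsing fallback.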
Step 3: Deduplicate and curate
The chatbot should:
- remove duplicates across sources when the same paper appears in HF and arXiv
- score relevance qualitatively or numerically in the conversation
- organize results by the user's stated interest directions
- write concise Chinese summaries and recommendation reasons
When the user gave explicit directions such as Agent / Spatial Intelligence / World Model, preserve those headings in the final digest.
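Deduplication and grouping are judgment calls the chatbot makes in-session, but a mechanical first pass helps. This sketch makes two assumptions. First, each fetched item is a dict with a "source" key plus some of "arxiv_id", "id", "url", "link", and "title". Second, the chatbot has already tagged each item with a "direction". These field names are assumptions, not the fetchers' documented output.

```python
import re

# Matches new-style arXiv IDs such as 2501.01234 or 2501.01234v2.
ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def dedup_key(item: dict) -> str:
    """Prefer a version-less arXiv ID from the item's fields; fall back to a normalized title."""
    for field_name in ("arxiv_id", "id", "url", "link"):
        match = ARXIV_ID.search(str(item.get(field_name) or ""))
        if match:
            return "arxiv:" + match.group(1)
    title = re.sub(r"[^a-z0-9]+", " ", str(item.get("title", "")).lower()).strip()
    return "title:" + title


def deduplicate(items: list[dict]) -> list[dict]:
    """Keep the first copy of each paper and record every source it appeared in."""
    seen: dict[str, dict] = {}
    for item in items:
        key = dedup_key(item)
        source = item.get("source", "unknown")
        if key in seen:
            seen[key]["sources"].append(source)
        else:
            seen[key] = {**item, "sources": [source]}
    return list(seen.values())  # dicts preserve insertion order


def group_by_direction(items: list[dict], directions: list[str]) -> dict[str, list[dict]]:
    """Bucket items under the user's headings in the given order; untagged items go to 'Other'."""
    groups: dict[str, list[dict]] = {d: [] for d in directions}
    groups.setdefault("Other", [])
    for item in items:
        groups.get(item.get("direction"), groups["Other"]).append(item)
    return groups
```

Keying on the version-less arXiv ID merges v1/v2 listings and Hugging Face paper pages that point to the same preprint. Items without any ID fall back to a normalized title, so an ID-bearing copy and a title-only copy of the same paper are not merged automatically.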
Step 4: Save artifacts in iDeer-compatible places
Prefer these output shapes:
- history/<source>/<date>/<date>.md for source-level markdown digests
- history/reports/<date>/report.md for the cross-source report
- history/ideas/<date>/ideas.json for structured idea output
- optional: history/<source>/<date>/<source>_email.html if you render an HTML email body
It is acceptable for chatbot-first runs to write fewer files than the original pipeline, as long as you report exactly what was written.
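You can compute the path layout above instead of typing it by hand. This is a minimal sketch that assumes ISO dates (YYYY-MM-DD, the same format the renderer's --date flag takes). The names artifact_paths and write_artifact are hypothetical, and the sketch leaves out the optional HTML email body.

```python
from datetime import date
from pathlib import Path


def artifact_paths(base: Path, day: date, sources: list[str]) -> dict[str, Path]:
    """Map each iDeer-compatible artifact to its path under base (normally history/)."""
    d = day.isoformat()  # YYYY-MM-DD
    paths = {f"{s}_digest": base / s / d / f"{d}.md" for s in sources}
    paths["report"] = base / "reports" / d / "report.md"
    paths["ideas"] = base / "ideas" / d / "ideas.json"
    return paths


def write_artifact(path: Path, text: str) -> Path:
    """Create parent directories as needed and write the artifact as UTF-8."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```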
If the user wants HTML artifacts without touching the main repo scripts, use the bundled renderer:
```bash
.venv/bin/python skills/ideer-daily-paper-chatbot/scripts/render_chatbot_artifacts.py \
  --date YYYY-MM-DD \
  --base-dir <artifact-dir>
```
This script should render report.html and digest_email.html from chatbot-written markdown/json outputs inside the chosen artifact directory.
Step 5: Email behavior
If SMTP is incomplete, do not claim that email was sent. Save the digest locally and tell the user what is missing.
If SMTP is complete and the user explicitly asked for sending, either:
- reuse the repo's email templates/utilities if convenient, or
- render a simple HTML body and send it through SMTP
Never send email on the first validation run unless the user clearly asked for a live send.
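You can make the send/skip/block decision explicit before any SMTP call. This skill names only SMTP_RECEIVER. SMTP_SERVER, SMTP_PORT, SMTP_SENDER, and SMTP_PASSWORD are assumed key names. The implicit-TLS connection (smtplib.SMTP_SSL) is also an assumption about the server. Check both against the repo's .env before use.

```python
import smtplib
from email.message import EmailMessage

# Only SMTP_RECEIVER is confirmed by this skill; the other names are assumptions.
REQUIRED_SMTP_KEYS = ("SMTP_SERVER", "SMTP_PORT", "SMTP_SENDER", "SMTP_PASSWORD", "SMTP_RECEIVER")


def missing_smtp_keys(env: dict[str, str]) -> list[str]:
    return [k for k in REQUIRED_SMTP_KEYS if not env.get(k)]


def email_decision(env: dict[str, str], user_requested_send: bool) -> str:
    """Return 'send', 'skipped', or 'blocked: ...' so the run report never overclaims."""
    if not user_requested_send:
        return "skipped"
    missing = missing_smtp_keys(env)
    if missing:
        return "blocked: missing " + ", ".join(missing)
    return "send"


def send_digest(env: dict[str, str], subject: str, html_body: str) -> dict:
    """Send the HTML digest over implicit TLS; returns refused recipients, raises on failure."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = env["SMTP_SENDER"]
    msg["To"] = env["SMTP_RECEIVER"]
    msg.set_content("This digest is best viewed as HTML.")
    msg.add_alternative(html_body, subtype="html")
    with smtplib.SMTP_SSL(env["SMTP_SERVER"], int(env["SMTP_PORT"])) as server:
        server.login(env["SMTP_SENDER"], env["SMTP_PASSWORD"])
        return server.send_message(msg)
```

Report email as sent only when send_digest returns an empty dict without raising.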
Step 6: Recurring automation
For chatbot-first automation, prefer the native agent/workflow scheduler when available. Use the repo root as the working directory and write the prompt so the chatbot fetches raw source items, performs summarization itself, saves artifacts, and only sends email if SMTP exists.
First-run setup saves the user's schedule preference but does not enable it. Create or enable a recurring task only after the user confirms the dry-run artifacts look correct.
See skills/ideer-daily-paper-chatbot/references/automation.md.
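A recurring prompt along these lines keeps the scheduled run on the chatbot-first path. This is a hypothetical template built from the rules in this skill, not the contents of references/automation.md. The name build_automation_prompt is illustrative.

```python
def build_automation_prompt(project_dir: str, sources: list[str], send_email: bool) -> str:
    """Assemble a recurring-run prompt from this skill's chatbot-first rules."""
    lines = [
        f"Working directory: {project_dir}",
        f"1. Fetch raw items with the repo fetchers for: {', '.join(sources)}. "
        "Do not run main.py or scripts/run_daily.sh.",
        "2. Deduplicate, score relevance, and group items under the directions in "
        "profiles/description.txt; write concise Chinese summaries yourself.",
        "3. Save artifacts under history/ and list only files you verified exist.",
    ]
    if send_email:
        lines.append("4. Send email only if SMTP config in .env is complete; "
                     "otherwise report email as blocked and name the missing fields.")
    else:
        lines.append("4. Do not send email; report it as skipped.")
    return "\n".join(lines)
```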
Safe command patterns
Use small fetch/test commands instead of the full original pipeline.
Examples:
```bash
.venv/bin/python - <<'PY'
from fetchers.huggingface_fetcher import get_daily_papers
print(len(get_daily_papers(10)))
PY

.venv/bin/python - <<'PY'
from fetchers.arxiv_fetcher import fetch_papers_for_categories
print(fetch_papers_for_categories(['cs.AI','cs.LG'], max_entries=25, sleep_range=(0,0)).keys())
PY
```
Use bash scripts/run_daily.sh only to debug the legacy API-based path.
Validation checklist
After each run, report:
- the date that actually ran
- whether first-run setup was needed and which config files were written
- which sources were fetched
- whether summarization was done by the chatbot or by the repo pipeline
- which files were created
- whether email was sent, skipped, or blocked
- whether recurring scheduling is enabled or still only saved as a preference
- the first concrete blocker if anything failed
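You can render the checklist from a small record so the report lists only files verified on disk. RunReport and its field names are hypothetical. The point of the sketch is that it checks each claimed file with Path.is_file() before reporting it as created.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class RunReport:
    run_date: str
    setup_needed: bool
    sources: List[str]
    summarized_by: str          # "chatbot" or "repo pipeline"
    email_status: str           # "sent", "skipped", or "blocked: <reason>"
    schedule_enabled: bool
    config_files: List[str] = field(default_factory=list)
    claimed_files: List[str] = field(default_factory=list)
    blocker: Optional[str] = None

    def render(self, root: Path) -> str:
        """Format the checklist; only files that exist under root count as created."""
        created = [f for f in self.claimed_files if (root / f).is_file()]
        missing = [f for f in self.claimed_files if f not in created]
        setup = (f"needed; wrote {', '.join(self.config_files) or 'nothing'}"
                 if self.setup_needed else "not needed")
        lines = [
            f"Date: {self.run_date}",
            f"First-run setup: {setup}",
            f"Sources fetched: {', '.join(self.sources) or 'none'}",
            f"Summarized by: {self.summarized_by}",
            f"Files created: {', '.join(created) or 'none'}",
        ]
        if missing:
            lines.append(f"Files expected but not found: {', '.join(missing)}")
        lines += [
            f"Email: {self.email_status}",
            "Recurring schedule: " + ("enabled" if self.schedule_enabled else "saved as preference only"),
            f"First blocker: {self.blocker or 'none'}",
        ]
        return "\n".join(lines)
```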
Safety rules
- Never print API keys, SMTP passwords, or tokens
- Never claim files exist before checking them
- Never claim email was sent before checking SMTP success
- Do not overwrite user-authored profile files unless the user asked
- Prefer writing additive chatbot-first artifacts over changing core repo code unless a fetcher is actually broken
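Two of these rules translate directly into code: masking secrets before echoing config, and refusing to overwrite an existing profile file. This is a minimal sketch, and the secret-name pattern is a heuristic assumption. Opening the file with mode "x" makes the write fail when the file already exists, so an existing profile is never silently replaced.

```python
import re
from pathlib import Path

# Heuristic: any variable whose name mentions a key, password, token, or secret is sensitive.
SECRET_NAME = re.compile(r"KEY|PASSWORD|TOKEN|SECRET", re.IGNORECASE)


def redact_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env that is safe to print: non-empty secret values become '***'."""
    return {k: ("***" if v and SECRET_NAME.search(k) else v) for k, v in env.items()}


def write_if_absent(path: Path, text: str) -> bool:
    """Create path with text; return False and leave the file untouched if it already exists."""
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        with path.open("x", encoding="utf-8") as handle:
            handle.write(text)
    except FileExistsError:
        return False
    return True
```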
Good default
For users who want paper digestion without API keys, start with:
- raw fetch from arxiv and huggingface
- chatbot-written markdown digest
- optional chatbot-written cross-source report
- no live email on the first pass