| name | github-repo-access |
| category | devops |
| description | Access and extract data from GitHub repositories — browse tree, read files, download content — including fallback strategies when standard methods (git clone, raw.githubusercontent.com CDN) are unavailable due to network restrictions or timeouts. |
| triggers | ["User asks to read/extract data from a GitHub repo","git clone fails or times out","raw.githubusercontent.com content is unreachable","Need to inspect repo contents without cloning"] |
# GitHub Repo Access
## Standard Approach (works most of the time)
### Clone with depth limit

```bash
git clone --depth 1 <repo-url> <target-dir>
```

- `--depth 1` skips history; fastest clone
- Use `--single-branch` if only one branch is needed (combined example below)
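Both flags compose; a minimal sketch, assuming the branch of interest is `main`:

```bash
# Shallow clone of a single branch: no history, no other branches fetched
git clone --depth 1 --single-branch --branch main <repo-url> <target-dir>
```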
### Read a single file (raw CDN)

```bash
curl -s "https://raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>"
```
## Fallback: GitHub REST API (when clone/CDN fail)
### When to use

- `git clone` times out (network slowness, firewall restrictions)
- raw.githubusercontent.com is unreachable but api.github.com works
- You only need to browse/inspect files, not clone the full history
### 1. List the repo tree (recursive)

```bash
curl -s "https://api.github.com/repos/<owner>/<repo>/git/trees/<branch>?recursive=1" \
  | jq -r '.tree[] | "\(.type) \(.path)"'
```
Or with Python:

```python
import json, urllib.request

# owner, repo, branch are placeholders; set them first
url = f"https://api.github.com/repos/{owner}/{repo}/git/trees/{branch}?recursive=1"
data = json.loads(urllib.request.urlopen(url).read())
for item in data["tree"]:
    print(f"{item['type']:4s} {item['path']}")
```
### 2. List top-level directory

```bash
curl -s "https://api.github.com/repos/<owner>/<repo>/contents/"
```
### 3. Read a specific file (base64-encoded)

```bash
curl -s "https://api.github.com/repos/<owner>/<repo>/contents/<path>" \
  | jq -r '.content' | base64 -d
```
With Python:

```python
import json, base64, urllib.request

# owner, repo, path are placeholders; set them first
url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
data = json.loads(urllib.request.urlopen(url).read())
content = base64.b64decode(data["content"]).decode("utf-8")
```
### 4. Bulk download multiple files (Python)

Run the loop inside the agent's native execute_code tool rather than piping `curl` output into an interpreter (`curl | python3` triggers security scanners); this avoids shell pipelines entirely:

```python
import json, base64, urllib.request

files = ["README.md", "config.yaml", "path/to/file.md"]
for f in files:
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{f}"
    with urllib.request.urlopen(url, timeout=15) as resp:
        data = json.loads(resp.read())
    content = base64.b64decode(data["content"]).decode("utf-8")
    print(f"FILE: {f} ({data['size']} bytes)")
    print(content[:3000])
```
## API Notes

- Rate limit: 60 requests/hour unauthenticated; 5,000 requests/hour authenticated
- Authentication: send an `Authorization: Bearer <token>` header for the higher limit (example below)
- Large repos: the recursive tree response may be truncated (`"truncated": true` in the response); paginate by fetching subtrees via their SHA
- Binary files: the Contents API returns base64; for very large files use the blob API instead
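A sketch of both authenticated calls, assuming a token in the `GITHUB_TOKEN` environment variable:

```bash
# Check remaining quota under the authenticated (5,000 req/hr) limit
curl -s -H "Authorization: Bearer $GITHUB_TOKEN" \
  "https://api.github.com/rate_limit" | jq '.rate'

# Fetch a single subtree by SHA when the recursive listing was truncated
curl -s -H "Authorization: Bearer $GITHUB_TOKEN" \
  "https://api.github.com/repos/<owner>/<repo>/git/trees/<tree-sha>"
```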
## Pitfalls

- raw.githubusercontent.com may be reachable when api.github.com is not, and vice versa; try both
- Base64 content from the API includes a `\n` every 60 characters; `base64.b64decode()` in Python discards them automatically, and `base64 -d` tolerates the newlines that `jq -r` emits
- The recursive tree (`?recursive=1`) is limited to ~100,000 entries; beyond that, paginate by subtree SHA
- Cloning over `https://` with the token in the URL (`https://TOKEN@github.com/owner/repo.git`) can leak the token into shell history; see the sketch below
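One mitigation, a sketch assuming the token is already exported as `GITHUB_TOKEN` rather than typed on the command line: the shell history then records the variable reference, not the secret.

```bash
# History logs the literal string "${GITHUB_TOKEN}", not the token's value
git clone --depth 1 "https://${GITHUB_TOKEN}@github.com/<owner>/<repo>.git"
# Caveat: the expanded token is still written to .git/config; scrub it if the clone persists
git -C <repo> remote set-url origin "https://github.com/<owner>/<repo>.git"
```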