Why Weekly Releases Matter
Maintaining a core library like huggingface_hub is a balancing act. It's the Python client powering transformers, datasets, diffusers, and dozens of other libraries. Every week without a release means fixes and features sit on main, unavailable to the ecosystem. For a long time, the team shipped every 4 to 6 weeks. Now they ship every week, and they did it without vendor lock-in or closed models.
The key insight: the release process splits into two kinds of work. Mechanical steps (bumping versions, tagging, pushing to PyPI) are perfect for automation. Creative steps (writing release notes, drafting announcements) benefit from AI drafting with human oversight. The result is a pipeline that costs about $0.25 per release and saves hours of manual effort.
This isn't a theoretical exercise. The full workflow is public and designed so any maintainer can fork it. Let's walk through how it works.
Reference: Original Hugging Face blog post

The Pipeline: From Manual to Automated
The old process was partly automated, mostly manual. CI already handled publishing to PyPI on tag push and opening downstream test branches. But everything else—creating the release branch, bumping __version__, writing release notes, drafting Slack announcements—was a half-day of human work spread over several days.
The New Workflow
The entire pipeline lives in a single GitHub Actions file triggered manually with one input: release_type (minor-prerelease, minor-release, or patch-release). Jobs run in sequence:
- Prepare: Compute next version, create/reuse release branch, bump version, commit, tag, push.
- Publish to PyPI: Build and upload huggingface_hub and the hf CLI as separate packages.
- Release notes: Diff commit range, pull PR metadata from GitHub API, have an open-weights model draft a structured changelog.
- Downstream test branches: For RCs, open branches in transformers, datasets, etc. with the RC pinned.
- Slack announcement: Read the notes and produce an internal announcement.
- Archive notes: Upload raw AI draft and human-edited version to a Hugging Face Bucket.
- Post-release bump: Open a PR on main bumping the version to the next dev0.
- Comment on shipped PRs: Leave a "this shipped in vX.Y.Z" comment on every PR in the release.
- Sync CLI docs: Open a PR with regenerated CLI skill docs.
- Report to Slack: Every step posts status; final job updates root message.
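The Prepare job's "compute next version" and "bump version" steps are mechanical enough to sketch in a few lines. This is an illustration under stated assumptions, not the workflow's actual script: it assumes a PEP 440-style scheme in which main carries a .devN suffix and release candidates use rcN, and that __version__ is assigned exactly once, on its own line, in a file such as the package's __init__.py. The names next_version and bump_version are hypothetical.

```python
import re
from pathlib import Path

# Assumed scheme: MAJOR.MINOR.PATCH plus an optional ".devN" (on main) or "rcN" (release candidate).
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.dev\d+|rc(\d+))?$")
# Assumed layout: exactly one line of the form  __version__ = "1.5.0.dev0"
VERSION_LINE = re.compile(r'^__version__ = "([^"]+)"$', re.MULTILINE)


def next_version(current: str, release_type: str) -> str:
    """Map the current version and the release_type input to the version to publish."""
    m = VERSION_RE.match(current)
    if m is None:
        raise ValueError(f"unrecognized version: {current!r}")
    major, minor, patch = (int(part) for part in m.group(1, 2, 3))
    rc = m.group(4)
    if release_type == "minor-prerelease":
        # First RC when coming from a dev version, otherwise the next RC number.
        return f"{major}.{minor}.0rc{0 if rc is None else int(rc) + 1}"
    if release_type == "minor-release":
        # Drop any dev/rc suffix to finalize the minor release.
        return f"{major}.{minor}.0"
    if release_type == "patch-release":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release_type: {release_type!r}")


def bump_version(init_file: Path, new_version: str) -> str:
    """Rewrite the single __version__ line in place and return the previous version."""
    text = init_file.read_text()
    found = VERSION_LINE.findall(text)
    if len(found) != 1:
        # Zero or several matches: stop rather than guess which one to change.
        raise RuntimeError(f"expected one __version__ line, found {len(found)}")
    init_file.write_text(VERSION_LINE.sub(lambda _: f'__version__ = "{new_version}"', text))
    return found[0]
```

Under these assumptions, successive minor-prerelease runs take 1.5.0.dev0 to 1.5.0rc0 and then 1.5.0rc1, minor-release yields 1.5.0, and a later patch-release yields 1.5.1. Refusing to proceed when the file holds zero or several __version__ lines is the kind of deterministic guard that keeps the mechanical half of the pipeline boring.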
The Human-in-the-Loop Core
Here's the critical design: the model drafts, deterministic code verifies, and a human decides. Before the model runs, a Python script extracts all PR numbers from squash-merge commits in the release range:
```python
import re

# Deterministic: extract PR numbers from squash-merge commits in the range.
# commits_since_last_tag and save_manifest are defined elsewhere in the release scripts (not shown).
PR_NUMBER_PATTERN = re.compile(r"\(#(\d+)\)")
pr_numbers = [
    int(m.group(1))
    for commit in commits_since_last_tag
    if (m := PR_NUMBER_PATTERN.search(commit.title))
]
save_manifest(pr_numbers)  # source of truth
```
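The fragment above leans on helpers the post does not show. One self-contained way to fill them in, assuming the commit titles are already available as plain strings and the manifest is a JSON file; extract_pr_numbers, the path parameter, and the JSON format are hypothetical choices. Unlike the snippet above, this sketch anchors the pattern at the end of the title, which matters for revert commits.

```python
import json
import re
from pathlib import Path

# Anchored at the end of the title, so a revert such as
# 'Revert "Add X (#2990)" (#3005)' yields 3005, the PR that was actually merged.
PR_NUMBER_PATTERN = re.compile(r"\(#(\d+)\)$")


def extract_pr_numbers(commit_titles: list[str]) -> list[int]:
    """Return PR numbers from squash-merge titles; titles without a trailing "(#N)" are skipped."""
    return [
        int(m.group(1))
        for title in commit_titles
        if (m := PR_NUMBER_PATTERN.search(title.strip()))
    ]


def save_manifest(pr_numbers: list[int], path: Path) -> None:
    """Write the deduplicated, sorted PR list that later checks treat as ground truth."""
    path.write_text(json.dumps(sorted(set(pr_numbers))))


def load_manifest(path: Path) -> list[int]:
    """Read the manifest back as a list of PR numbers."""
    return json.loads(path.read_text())
```

Writing the manifest before any model call means the list of expected PRs never depends on model output.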
The model drafts notes from these PRs. Then the workflow checks the output against the manifest:
```python
expected = set(load_manifest())    # what should be there
found = extract_pr_refs(notes_md)  # what the model wrote
missing = expected - found         # silently dropped
extra = found - expected           # belongs to a different release
```
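The post does not show extract_pr_refs either. A minimal sketch, assuming the generated notes cite PRs as "#1234" or as GitHub pull-request URLs; a real implementation would follow whatever citation format the skill prompt asks for. compare_with_manifest is a hypothetical wrapper around the check above.

```python
import re

# Matches "#1234" and ".../pull/1234"; "## Heading" does not match because digits must follow "#".
PR_REF_PATTERN = re.compile(r"(?:#|/pull/)(\d+)\b")


def extract_pr_refs(notes_md: str) -> set[int]:
    """Collect every PR number the generated notes mention (a set, so set difference works)."""
    return {int(n) for n in PR_REF_PATTERN.findall(notes_md)}


def compare_with_manifest(expected: set[int], notes_md: str) -> tuple[set[int], set[int]]:
    """Return (missing, extra) for the given notes."""
    found = extract_pr_refs(notes_md)
    return expected - found, found - expected
```

A bare "#N" can also be an issue number, so under this pattern a line like "fixes #2990" would show up as extra. Asking the model to cite full PR URLs and matching only "/pull/N" would be a stricter variant.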
If any PR is missing or extra, the agent is re-prompted to fix exactly those PRs. The check runs again after each revision, up to MAX_ITERATIONS times, until the notes match the manifest exactly.
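The verify-and-re-prompt loop itself fits in a few lines. In this sketch, draft is a hypothetical stand-in for the model call (any function from prompt text to Markdown), the prompt wording is invented, and MAX_ITERATIONS = 3 is an assumed value; the post does not give the real one.

```python
import re
from collections.abc import Callable

MAX_ITERATIONS = 3  # assumed limit; the post does not state the real value
PR_REF_PATTERN = re.compile(r"(?:#|/pull/)(\d+)\b")


def refine_notes(
    expected: set[int],
    draft: Callable[[str], str],
    max_iterations: int = MAX_ITERATIONS,
) -> str:
    """Draft notes, verify them against the manifest, and re-prompt on any mismatch."""
    prompt = f"Write release notes covering exactly these PRs: {sorted(expected)}"
    for _ in range(max_iterations):
        notes = draft(prompt)
        found = {int(n) for n in PR_REF_PATTERN.findall(notes)}
        missing, extra = expected - found, found - expected
        if not missing and not extra:
            return notes  # verified: every manifest PR present, nothing foreign
        # Re-prompt about exactly the mismatched PRs, keeping the previous draft as context.
        prompt = (
            "Revise these release notes.\n"
            f"Add the missing PRs: {sorted(missing)}\n"
            f"Remove PRs that are not in this release: {sorted(extra)}\n\n{notes}"
        )
    raise RuntimeError(f"notes still mismatch the manifest after {max_iterations} attempts")
```

Raising instead of returning a best-effort draft is a deliberate choice in this sketch: a mismatch that survives every retry stops the job so a human looks at it, rather than shipping notes that silently drop or misattribute a PR.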
Grounding the Model
To prevent the model from inventing code examples, the workflow also pulls documentation diffs from each PR:
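```python
def fetch_doc_diffs(pr):
    # pr is assumed to be a PyGithub PullRequest; get_files() yields its changed files.
    # Keep only Markdown files under docs/ that carry a patch.
    return [
        {"filename": f.filename, "status": f.status, "patch": f.patch}
        for f in pr.get_files()
        if f.filename.startswith("docs/") and f.filename.endswith(".md") and f.patch
    ]
```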
These diffs go into the model's context, so when it writes "here's the new CLI command," it's quoting the actual example from the PR.
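The post does not specify how these diffs are laid out in the prompt. One plausible sketch, assuming each entry has the filename/status/patch keys that fetch_doc_diffs returns and an arbitrary per-file size cap to keep the context bounded; format_doc_context and MAX_PATCH_CHARS are hypothetical.

```python
MAX_PATCH_CHARS = 4000  # hypothetical per-file budget to keep the prompt bounded


def format_doc_context(pr_number: int, doc_diffs: list[dict]) -> str:
    """Render one PR's documentation diffs as a plain-text prompt section."""
    if not doc_diffs:
        return f"PR #{pr_number}: no documentation changes."
    lines = [f"PR #{pr_number} documentation changes:"]
    for diff in doc_diffs:
        patch = diff["patch"]
        if len(patch) > MAX_PATCH_CHARS:
            patch = patch[:MAX_PATCH_CHARS] + "\n[... truncated ...]"
        lines.append(f"--- {diff['filename']} ({diff['status']})")
        lines.append(patch)  # raw unified diff, "+" prefixes included
    return "\n".join(lines)
```

Passing the raw patch, "+" prefixes included, lets the model see exactly which example lines were added and quote them rather than paraphrase.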
Security: Open and Secure
The pipeline uses PyPI Trusted Publishing with OIDC tokens—no long-lived secrets. The agent runtime (OpenCode) is pinned and SHA256-verified before execution:
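```bash
curl -fsSL https://opencode.ai/install | bash -s -- --version "${OPENCODE_VERSION}"
# sha256sum -c reads "<hash>  <path>" (two spaces) from stdin and fails on mismatch.
echo "${OPENCODE_SHA256}  $(command -v opencode)" | sha256sum -c -
```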
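For setups that install tools from a Python script rather than shell, the same pin-and-verify idea can be written with hashlib. This is an illustrative equivalent, not part of the published workflow; sha256_of and verify_binary are hypothetical names.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_binary(path: Path, expected_sha256: str) -> None:
    """Raise unless the file matches the pinned checksum, like `sha256sum -c`."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        raise RuntimeError(f"checksum mismatch for {path}: got {actual}")
```

A caller could resolve the installed binary with shutil.which("opencode") and pass the pinned hash from the workflow's variables; both are assumptions about how such a script would be wired.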
Cost and Impact
A full release costs about $0.25 in Inference Providers usage with an open-weights model. The cadence went from 4–6 weeks to weekly. Secondary effects: release notes improved (consistent grouping, fewer omissions), breakages surface earlier via downstream CI, and the automatic "shipped in" comments shortened contributor feedback loops.
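The "shipped in" comments are among the simplest jobs in the pipeline. A sketch, assuming a PyGithub-style repository object (Repository.get_pull returning a pull request with create_issue_comment); the comment wording and function names are illustrative. Passing the repository in lets the logic run against a fake object with no network access.

```python
def shipped_comment(version: str) -> str:
    """Comment body posted on each PR included in the release (illustrative wording)."""
    return f"This PR shipped in v{version}."


def comment_on_shipped_prs(repo, pr_numbers: list[int], version: str) -> int:
    """Post the comment on every listed PR and return how many were posted.

    `repo` is assumed to behave like a PyGithub Repository: repo.get_pull(n)
    returns a pull request exposing create_issue_comment(body).
    """
    body = shipped_comment(version)
    for number in pr_numbers:
        repo.get_pull(number).create_issue_comment(body)
    return len(pr_numbers)
```

Feeding this function the deterministic PR manifest, rather than PR numbers parsed from the model's notes, would keep the comments tied to what was actually merged.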

Limitations and Caveats
This pipeline is designed for Python libraries with a stable release cadence. It assumes:
- A squash-merge workflow (to extract PR numbers from commit messages).
- A single __version__ string in the repository.
- Downstream libraries that can be tested with release candidates.
For projects without these characteristics, adaptation is needed. The trust-but-verify loop is the most transferable part, but the model's drafting quality depends on well-structured PR titles and documentation diffs. If your PRs lack descriptive titles or documentation updates, the generated notes may be sparse.
Also, the pipeline doesn't auto-triage downstream failures yet. That's a planned improvement: checking the failing logs and reporting them in the Slack message.

Making It Yours
The workflow is reusable almost as-is. To adapt:
- Fork the workflow file and scripts.
- Point it at your package.
- Rewrite the skill Markdown for your project's voice.
- Set two repo variables: the model ID and your OpenCode version.
- Set up Trusted Publishing on PyPI.
- Delete the downstream-testing job if you don't have downstreams.
The trust-but-verify loop is the part worth reusing unchanged. It's what makes a generated artifact safe to ship.
Next Steps
- Auto-triaging downstream failures: Today the workflow opens test branches and a human reads the CI. An obvious next step is to check the failing logs and report them in the Slack message.
- Extending the pattern: Most of this is generic. Expect to reuse large parts across other Python libraries.
Takeaway
The parts of a release that used to need a half-day of focused human work (writing notes, drafting announcements, coordinating downstream checks) are the parts a model is good at drafting. Everything else is mechanical and fits in a YAML file. The trick was never just "let the AI do it." It's to let the model draft, let deterministic code verify, and let a human decide. Built entirely from open tools and open weights, the cost rounds to zero and anyone can run it.
Related resources: