Why Weekly Releases Matter

Maintaining a core library like huggingface_hub is a balancing act. It's the Python client powering transformers, datasets, diffusers, and dozens of other libraries. Every week without a release means fixes and features sit on main, unavailable to the ecosystem. For a long time, the team shipped every 4 to 6 weeks. Now they ship every week, and they did it without vendor lock-in or closed models.

The key insight: the release process splits into two kinds of work. Mechanical steps (bumping versions, tagging, pushing to PyPI) are perfect for automation. Creative steps (writing release notes, drafting announcements) benefit from AI drafting with human oversight. The result is a pipeline that costs about $0.25 per release and saves hours of manual effort.

This isn't a theoretical exercise. The full workflow is public and designed so any maintainer can fork it. Let's walk through how it works.

Reference: Original Hugging Face blog post

The Pipeline: From Manual to Automated

The old process was partly automated, mostly manual. CI already handled publishing to PyPI on tag push and opening downstream test branches. But everything else—creating the release branch, bumping __version__, writing release notes, drafting Slack announcements—was a half-day of human work spread over several days.

The New Workflow

The entire pipeline lives in a single GitHub Actions workflow file, triggered manually with one input: release_type (minor-prerelease, minor-release, or patch-release). The jobs run in sequence:

  1. Prepare: Compute next version, create/reuse release branch, bump version, commit, tag, push.
  2. Publish to PyPI: Build and upload huggingface_hub and the hf CLI as separate packages.
  3. Release notes: Diff commit range, pull PR metadata from GitHub API, have an open-weights model draft a structured changelog.
  4. Downstream test branches: For RCs, open branches in transformers, datasets, etc. with the RC pinned.
  5. Slack announcement: Read the notes and produce an internal announcement.
  6. Archive notes: Upload raw AI draft and human-edited version to a Hugging Face Bucket.
  7. Post-release bump: Open a PR on main bumping to next dev0.
  8. Comment on shipped PRs: Leave a "this shipped in vX.Y.Z" comment on every PR in the release.
  9. Sync CLI docs: Open a PR with regenerated CLI skill docs.
  10. Report to Slack: Every step posts status; final job updates root message.
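
The version arithmetic in the Prepare step can be sketched in a few lines. This is a minimal illustration, not the workflow's actual code: the function name next_version, the rc0 starting index, and the exact rule applied for each release_type are assumptions made for the example.

```python
import re

# Assumed version shape: X.Y.Z with an optional rcN suffix (e.g. 1.5.0rc1).
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?$")


def next_version(latest: str, release_type: str) -> str:
    """Return the next version given the latest tag and the workflow input."""
    match = VERSION_RE.match(latest)
    if match is None:
        raise ValueError(f"unrecognized version: {latest!r}")
    major, minor, patch = (int(part) for part in match.group(1, 2, 3))
    rc = int(match.group(4)) if match.group(4) is not None else None

    if release_type == "minor-prerelease":
        if rc is not None:
            # Already in an RC cycle: cut the next candidate.
            return f"{major}.{minor}.{patch}rc{rc + 1}"
        return f"{major}.{minor + 1}.0rc0"
    if release_type == "minor-release":
        if rc is not None:
            # Promote the current RC line to a final release.
            return f"{major}.{minor}.{patch}"
        return f"{major}.{minor + 1}.0"
    if release_type == "patch-release":
        if rc is not None:
            raise ValueError("cannot patch a release candidate")
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release_type: {release_type!r}")
```

Under these assumed rules, a week might go 1.4.0 to 1.5.0rc0 (minor-prerelease), then 1.5.0 (minor-release), then 1.5.1 (patch-release) if a fix is needed.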

The Human-in-the-Loop Core

Here's the critical design: the model drafts, deterministic code verifies, and a human decides. Before the model runs, a Python script extracts all PR numbers from squash-merge commits in the release range:

import re

# Deterministic: extract PR numbers from squash-merge commits in the range.
PR_NUMBER_PATTERN = re.compile(r"\(#(\d+)\)$")
pr_numbers = [
    int(m.group(1))
    for commit in commits_since_last_tag
    if (m := PR_NUMBER_PATTERN.search(commit.title))
]
save_manifest(pr_numbers)  # source of truth

The model drafts notes from these PRs. Then the workflow checks the output against the manifest:

expected = set(load_manifest())  # what should be there
found = extract_pr_refs(notes_md)  # what the model wrote
missing = expected - found  # silently dropped
extra = found - expected  # belongs to a different release

If anything is missing or extra, the agent is re-prompted to fix exactly those PRs. The loop repeats until the notes match the manifest exactly, up to MAX_ITERATIONS times.
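
Put together, the check-and-re-prompt loop might look like the sketch below. It is a self-contained approximation rather than the workflow's code: the value of MAX_ITERATIONS, the "#123" reference pattern inside extract_pr_refs, and the redraft callback (standing in for the agent call) are all assumptions.

```python
from __future__ import annotations

import re
from typing import Callable, Iterable

MAX_ITERATIONS = 3  # assumed value; the source names the constant only

# Assumption: the notes reference PRs as "#123".
PR_REF_PATTERN = re.compile(r"#(\d+)")


def extract_pr_refs(notes_md: str) -> set[int]:
    """Collect every PR number mentioned in the drafted notes."""
    return {int(number) for number in PR_REF_PATTERN.findall(notes_md)}


def verify_notes(
    draft: str,
    expected_prs: Iterable[int],
    redraft: Callable[[str, set[int], set[int]], str],
) -> str:
    """Re-prompt until the notes cite exactly the PRs in the manifest."""
    expected = set(expected_prs)
    notes = draft
    for attempt in range(MAX_ITERATIONS + 1):
        found = extract_pr_refs(notes)
        missing = expected - found  # silently dropped
        extra = found - expected    # belongs to a different release
        if not missing and not extra:
            return notes
        if attempt < MAX_ITERATIONS:
            # Hypothetical hook: ask the model to fix only these PRs.
            notes = redraft(notes, missing, extra)
    raise RuntimeError(
        f"notes still differ from manifest: "
        f"missing={sorted(missing)}, extra={sorted(extra)}"
    )
```

Raising when the budget runs out, instead of returning the last draft, keeps a mismatched changelog from reaching the human reviewer unnoticed.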

Grounding the Model

To prevent the model from inventing code examples, the workflow also pulls documentation diffs from each PR:

def fetch_doc_diffs(pr):
    return [
        {"filename": f.filename, "status": f.status, "patch": f.patch}
        for f in pr.get_files()
        if f.filename.startswith("docs/") and f.filename.endswith(".md") and f.patch
    ]

These diffs go into the model's context, so when it writes "here's the new CLI command," it's quoting the actual example from the PR.
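
One plausible way to turn those diffs into prompt context is sketched here. The function name format_doc_diffs, the section layout, and the max_chars truncation limit are illustrative assumptions; the source does not show how the workflow formats its prompt.

```python
from __future__ import annotations


def format_doc_diffs(pr_number: int, doc_diffs: list[dict], max_chars: int = 4000) -> str:
    """Render the output of fetch_doc_diffs as a text block for the model."""
    sections = []
    for diff in doc_diffs:
        patch = diff["patch"]
        if len(patch) > max_chars:
            # Keep very large diffs from crowding out the rest of the context.
            patch = patch[:max_chars] + "\n[diff truncated]"
        sections.append(f"### {diff['filename']} ({diff['status']})\n{patch}")
    if not sections:
        return f"PR #{pr_number}: no documentation changes."
    return f"PR #{pr_number} documentation diffs:\n\n" + "\n\n".join(sections)
```

Stating explicitly that a PR has no documentation changes gives the model a reason to omit a code example instead of inventing one.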

Security: Open and Secure

The pipeline uses PyPI Trusted Publishing with OIDC tokens—no long-lived secrets. The agent runtime (OpenCode) is pinned and SHA256-verified before execution:

curl -fsSL https://opencode.ai/install | bash -s -- --version "${OPENCODE_VERSION}"
echo "${OPENCODE_SHA256}  $(which opencode)" | sha256sum -c -
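
For steps that already run in Python, the same checksum check can be written with the standard library. This is a sketch of the idea only; verify_sha256 is a hypothetical helper, not part of the workflow.

```python
import hashlib
import hmac


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Return True if the file at path hashes to the pinned SHA256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large binaries are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return hmac.compare_digest(digest.hexdigest(), expected_hex.strip().lower())
```

A caller would abort the job when this returns False, mirroring the non-zero exit of sha256sum -c.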

Cost and Impact

A full release costs about $0.25 on Inference Providers with open weights. The cadence went from 4–6 weeks to weekly. Secondary effects: release notes improved (consistent grouping, fewer omissions), breakages surface earlier via downstream CI, and the automatic "shipped in" comments shortened contributor feedback loops.

Limitations and Caveats

This pipeline is designed for Python libraries with a stable release cadence. It assumes:

  • A squash-merge workflow (to extract PR numbers from commit messages).
  • A single __version__ string in the repository.
  • Downstream libraries that can be tested with release candidates.

For projects without these characteristics, adaptation is needed. The trust-but-verify loop is the most transferable part, but the model's drafting quality depends on well-structured PR titles and documentation diffs. If your PRs lack descriptive titles or documentation updates, the generated notes may be sparse.
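
The single __version__ assumption is what keeps the bump step mechanical. A minimal sketch of such a bump follows, assuming the version is assigned once as a quoted string at the start of a line; bump_version is a hypothetical helper, and it fails loudly when that assumption does not hold.

```python
import re

# Assumption: one line of the form __version__ = "X.Y.Z" in the source file.
VERSION_LINE = re.compile(r"^__version__\s*=\s*[\"']([^\"']+)[\"']", re.MULTILINE)


def bump_version(source: str, new_version: str) -> str:
    """Rewrite the __version__ assignment in source to new_version."""
    updated, count = VERSION_LINE.subn(
        lambda _match: f'__version__ = "{new_version}"', source
    )
    if count != 1:
        raise ValueError(f"expected exactly one __version__ assignment, found {count}")
    return updated
```

A project that keeps its version in several places would need one such rewrite per location, or a single source of truth that the other files read from.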

Also, the pipeline doesn't auto-triage downstream failures yet. That's a planned improvement: checking failing logs and reporting them in the Slack message.

Making It Yours

The workflow is reusable almost as-is. To adapt:

  1. Fork the workflow file and scripts.
  2. Point it at your package.
  3. Rewrite the skill Markdown for your project's voice.
  4. Set two repo variables: the model ID and your OpenCode version.
  5. Set up Trusted Publishing on PyPI.
  6. Delete the downstream-testing job if you don't have downstreams.

The trust-but-verify loop is the part worth reusing unchanged. It's what makes a generated artifact safe to ship.

Next Steps

  • Auto-triaging downstream failures: Today the workflow opens test branches and a human reads the CI results. An obvious next step is to have the workflow inspect the failing logs and report them in the Slack message.
  • Extending the pattern: Most of this is generic. Expect to reuse large parts across other Python libraries.

Takeaway

The parts of a release that used to need a half-day of focused human work (writing notes, drafting announcements, coordinating downstream checks) are the parts a model is good at drafting. Everything else is mechanical and fits in a YAML file. The trick was never just "let the AI do it." It's to let the model draft, let deterministic code verify, and let a human decide. Built entirely from open tools and open weights, the cost rounds to zero and anyone can run it.

This content was drafted using AI tools based on reliable sources, and has been reviewed by our editorial team before publication. It is not intended to replace professional advice.