@Wauplin
Wauplin / parse_units.py
Last active October 14, 2025 10:54
Parse bytes and durations units (string => int)
import re
import pytest
RE_NUMBER_WITH_UNIT = re.compile(r"(\d+)([a-z]+)", re.IGNORECASE)
BYTE_UNITS = {
"k": 1_000,
"m": 1_000_000,
@Wauplin
Wauplin / hf.py
Created October 10, 2025 09:28
How to avoid concurrency issues when uploading to the Hugging Face Hub from many workers
# This is a hacky script that worked at the time it was published.
# It uses internals of the huggingface_hub library, so expect breaking changes without prior notice.
# The biggest challenge when uploading from many workers is to avoid concurrency issues during the /commit call.
# The solution is to (pre-)upload files from the workers and put them in a queue.
#
# Then a last worker is dedicated to making commits on the Hub, using preuploaded files.
# The queue that coordinates the workers can be implemented with local files, a database, a Python queue, etc., as long as it is robust to concurrency.
# You should also add a retry mechanism in the "commit worker" in case of failure while committing.
#
# Uploading TBs of data from many workers is challenging and puts a high load on our infra.
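The pattern described in the comments above (many workers pre-upload, one dedicated worker commits, with retries) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the gist's actual code: `preupload` and `commit` are hypothetical callables standing in for the real Hub calls, and an in-process `queue.Queue` stands in for whatever concurrency-safe queue you choose.

```python
import queue
import time
from typing import Any, Callable, List

SENTINEL = None  # each upload worker pushes this once to signal "no more files"


def upload_worker(paths: List[str], preupload: Callable[[str], Any], ready: queue.Queue) -> None:
    """Pre-upload each file, then hand the result to the commit worker via the queue."""
    for path in paths:
        ready.put(preupload(path))  # `preupload` is a hypothetical stand-in for the real upload
    ready.put(SENTINEL)


def commit_with_retry(commit: Callable[[List[Any]], None], batch: List[Any],
                      max_retries: int = 3, backoff_s: float = 0.0) -> None:
    """Call `commit(batch)`, retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(1, max_retries + 1):
        try:
            commit(batch)
            return
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(backoff_s * 2 ** (attempt - 1))


def commit_worker(ready: queue.Queue, commit: Callable[[List[Any]], None],
                  n_uploaders: int, batch_size: int = 2) -> None:
    """Single consumer: the only place where commits happen, so commits never race."""
    batch: List[Any] = []
    remaining = n_uploaders
    while remaining > 0:
        item = ready.get()
        if item is SENTINEL:
            remaining -= 1
            continue
        batch.append(item)
        if len(batch) >= batch_size:
            commit_with_retry(commit, batch)
            batch = []
    if batch:  # flush whatever is left once every uploader has finished
        commit_with_retry(commit, batch)
```

Because only `commit_worker` ever calls `commit`, the upload workers can run in parallel without contending on the commit step. Batching several pre-uploaded files per commit also reduces the number of commit calls.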
@Wauplin
Wauplin / hf_mcp.py
Created May 6, 2025 16:28
Python-based HF MCP server
"""
WARNING: This is an experimental implementation. Expect rough edges while using it.
-------------------------------------------------
Defines a FastMCP server that exposes the Hugging Face Hub API as a set of tools.
In practice, all public methods from `HfApi` are exposed as tools, except for the ones dealing with files:
- `create_commit`
- `hf_hub_download`
- `preupload_lfs_files`
@Wauplin
Wauplin / syncify_test.py
Created June 28, 2023 08:42
Test syncify-like decorator to define both a Sync and Async client
import asyncio
import functools
from typing import Callable, Coroutine, Any, TypeVar, ParamSpec
T_Retval = TypeVar("T_Retval")
T_ParamSpec = ParamSpec("T_ParamSpec")
def syncify(async_function: Callable[T_ParamSpec, Coroutine[Any, Any, T_Retval]]) -> Callable[T_ParamSpec, T_Retval]:
# Taken from https://github.com/tiangolo/asyncer and simplified