@JGalego
Last active October 14, 2025 23:39
Deploy LLaVA-OneVision on Amazon SageMaker using the Hugging Face Inference Toolkit 🤗🌿
{
"cells": [
{
"cell_type": "markdown",
"id": "09606c0f-34d9-4c8a-9d53-3b969e81795d",
"metadata": {},
"source": [
"# Deploying LLaVA-OneVision on Amazon SageMaker\n",
"\n",
"This guide provides instructions for deploying the [LLaVA-OneVision](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/) model on [Amazon SageMaker](https://aws.amazon.com/sagemaker/) using the [Hugging Face Inference Toolkit](https://github.com/aws/sagemaker-huggingface-inference-toolkit).\n",
"\n",
"<img src=\"https://llava-vl.github.io/blog/2024-08-05-llava-onevision/demos/fig1.png\" width=\"75%\"/>"
]
},
{
"cell_type": "markdown",
"id": "b6057254-51fa-47ed-9200-ccdc2ffaefb2",
"metadata": {},
"source": [
"## Prerequisites ✅"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d1a8c914-106c-40d3-a326-ed79391f9440",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T23:29:10.194107Z",
"iopub.status.busy": "2025-10-14T23:29:10.193948Z",
"iopub.status.idle": "2025-10-14T23:29:12.215509Z",
"shell.execute_reply": "2025-10-14T23:29:12.214942Z",
"shell.execute_reply.started": "2025-10-14T23:29:10.194089Z"
}
},
"outputs": [],
"source": [
"# Make sure Amazon SageMaker Python SDK is installed / updated\n",
"!pip install -qU --use-deprecated=legacy-resolver sagemaker"
]
},
{
"cell_type": "markdown",
"id": "362c6f76-1257-417d-96bd-8e6ef09066e3",
"metadata": {},
"source": [
"## Initial Setup ➡️"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "97b5b215-8b32-4059-a77c-31284a355cc8",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T23:29:12.218730Z",
"iopub.status.busy": "2025-10-14T23:29:12.218323Z",
"iopub.status.idle": "2025-10-14T23:29:12.221930Z",
"shell.execute_reply": "2025-10-14T23:29:12.221345Z",
"shell.execute_reply.started": "2025-10-14T23:29:12.218708Z"
}
},
"outputs": [],
"source": [
"import logging\n",
"import warnings\n",
"\n",
"# Suppress all warnings\n",
"warnings.filterwarnings(\"ignore\")\n",
"\n",
"# The SageMaker SDK logs config messages at INFO level; show only warnings and above\n",
"logging.getLogger(\"sagemaker.config\").setLevel(logging.WARNING)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "ddf8465f-dafc-46a2-b7e1-4d799fea0dfe",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T23:29:12.222615Z",
"iopub.status.busy": "2025-10-14T23:29:12.222445Z",
"iopub.status.idle": "2025-10-14T23:29:14.189340Z",
"shell.execute_reply": "2025-10-14T23:29:14.188787Z",
"shell.execute_reply.started": "2025-10-14T23:29:12.222598Z"
}
},
"outputs": [],
"source": [
"import sagemaker\n",
"\n",
"# Initialize SageMaker session\n",
"sess = sagemaker.Session()\n",
"role = sagemaker.get_execution_role()"
]
},
{
"cell_type": "markdown",
"id": "74c7e7f4-0809-4698-bc89-971af8493bdd",
"metadata": {},
"source": [
"## Model Setup 🤗"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "53adaa07-2274-479f-a6aa-59d9fe8bef3e",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T23:29:14.191642Z",
"iopub.status.busy": "2025-10-14T23:29:14.191472Z",
"iopub.status.idle": "2025-10-14T23:29:14.341955Z",
"shell.execute_reply": "2025-10-14T23:29:14.341423Z",
"shell.execute_reply.started": "2025-10-14T23:29:14.191625Z"
}
},
"outputs": [],
"source": [
"from sagemaker.huggingface import HuggingFaceModel\n",
"\n",
"hub = {\n",
" 'HF_MODEL_ID': \"jgalego/llava-onevision-qwen2-0.5b-ov-hf\", # copy of the original model repo plus a code folder\n",
" 'HF_TASK': \"image-text-to-text\"\n",
"}\n",
"\n",
"huggingface_model = HuggingFaceModel(\n",
" transformers_version=\"4.49\",\n",
" pytorch_version=\"2.6\",\n",
" py_version=\"py312\",\n",
" env=hub,\n",
" role=role,\n",
" entry_point='inference.py',\n",
" source_dir='./code'\n",
")"
]
},
{
"cell_type": "markdown",
"id": "3324f323-f6ac-494b-ab7e-dde7c6c43cef",
"metadata": {},
"source": [
"## Model Deployment 🚀"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "00635cce-2258-4294-9b23-435ec4ff7f1b",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T23:29:14.344348Z",
"iopub.status.busy": "2025-10-14T23:29:14.344040Z",
"iopub.status.idle": "2025-10-14T23:35:47.049548Z",
"shell.execute_reply": "2025-10-14T23:35:47.048902Z",
"shell.execute_reply.started": "2025-10-14T23:29:14.344329Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"------------!"
]
}
],
"source": [
"predictor = huggingface_model.deploy(\n",
" initial_instance_count=1,\n",
" instance_type='ml.g4dn.xlarge',\n",
" endpoint_name='llava-onevision-endpoint',\n",
" model_data_download_timeout=5*60,\n",
" container_startup_health_check_timeout=5*60\n",
")"
]
},
{
"cell_type": "markdown",
"id": "1276c497-6c68-4113-975d-02a09bec616f",
"metadata": {},
"source": [
"## Test Endpoint 🧪\n",
"\n",
"Download a sample image"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b27f04f7-5330-43df-a5f9-64ee43f1d77b",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T23:35:47.052362Z",
"iopub.status.busy": "2025-10-14T23:35:47.052158Z",
"iopub.status.idle": "2025-10-14T23:35:47.289291Z",
"shell.execute_reply": "2025-10-14T23:35:47.288409Z",
"shell.execute_reply.started": "2025-10-14T23:35:47.052344Z"
}
},
"outputs": [],
"source": [
"!wget -q -O example.jpg https://www.surfertoday.com/images/stories/dog-surfing-guide.jpg"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "56e3b73d-7795-4897-80ea-b85eee29c2c5",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T23:35:47.292917Z",
"iopub.status.busy": "2025-10-14T23:35:47.292676Z",
"iopub.status.idle": "2025-10-14T23:35:47.297307Z",
"shell.execute_reply": "2025-10-14T23:35:47.296733Z",
"shell.execute_reply.started": "2025-10-14T23:35:47.292896Z"
}
},
"outputs": [],
"source": [
"import base64\n",
"\n",
"# Read the image and base64-encode it so it can travel inside a JSON request\n",
"with open('example.jpg', 'rb') as f:\n",
" image_b64 = base64.b64encode(f.read()).decode('utf-8')\n",
"\n",
"# Prepare request data\n",
"payload = {\n",
" \"inputs\": \"Describe this image in detail.\",\n",
" \"images\": [image_b64],\n",
" \"parameters\": {\n",
" \"max_new_tokens\": 256,\n",
" \"temperature\": 0.7,\n",
" \"top_p\": 0.9,\n",
" \"do_sample\": True,\n",
" \"repetition_penalty\": 1.2,\n",
" \"no_repeat_ngram_size\": 3\n",
" }\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "5b30c0fa-bae1-4e73-8301-8726bff5a8a0",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T23:35:47.298118Z",
"iopub.status.busy": "2025-10-14T23:35:47.297927Z",
"iopub.status.idle": "2025-10-14T23:35:54.913910Z",
"shell.execute_reply": "2025-10-14T23:35:54.913303Z",
"shell.execute_reply.started": "2025-10-14T23:35:47.298101Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In the heart of a clear blue sky, a small white and brown dog is having an adventure on a vibrant yellow surfboard that floats gently in the water. The dog, adorned with a pair of striking red sunglasses, gazes directly into the camera, capturing our attention with its infectious smile. A black leash clings to one side of the board, ready for any unforeseen adventures ahead. The scene is serene yet filled with joy, encapsulating a moment of pure fun and excitement for our canine friend.\n"
]
}
],
"source": [
"response = predictor.predict(payload)\n",
"print(response['generated_text'])"
]
},
{
"cell_type": "markdown",
"id": "03220441-a72d-4268-832d-61563f228542",
"metadata": {},
"source": [
"## Cleanup"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "0541f59f-cedd-47fe-bd5b-749a72fea5c8",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T23:35:54.916660Z",
"iopub.status.busy": "2025-10-14T23:35:54.916456Z",
"iopub.status.idle": "2025-10-14T23:35:55.483667Z",
"shell.execute_reply": "2025-10-14T23:35:55.482992Z",
"shell.execute_reply.started": "2025-10-14T23:35:54.916641Z"
}
},
"outputs": [],
"source": [
"predictor.delete_model()\n",
"predictor.delete_endpoint()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
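
The test payload in the notebook above is built inline: the image is base64-encoded and placed in an `images` list next to the prompt and the generation parameters. The sketch below wraps that step in a small reusable helper and checks that the payload survives a JSON round trip. It assumes the request is sent as JSON. The helper names `build_payload` and `decode_images` are hypothetical and not part of the notebook or of any SDK. How the custom `inference.py` actually reads the `images` field is not shown in the notebook, so the decoding side here is only an assumption about what a handler might do first.

```python
import base64
import json
import tempfile
from pathlib import Path


def build_payload(image_path, prompt, **generation_params):
    """Build a request body shaped like the notebook's test payload (hypothetical helper)."""
    # Base64 turns the raw image bytes into text that can be embedded in JSON
    image_b64 = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return {
        "inputs": prompt,
        "images": [image_b64],
        "parameters": dict(generation_params),
    }


def decode_images(payload):
    """Recover the raw image bytes from a payload (assumed server-side first step)."""
    return [base64.b64decode(item) for item in payload.get("images", [])]


if __name__ == "__main__":
    # Use a throwaway file so the sketch runs without example.jpg
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.bin"
        sample.write_bytes(b"\xff\xd8\xff\xe0 placeholder bytes, not a real JPEG")

        payload = build_payload(
            sample,
            "Describe this image in detail.",
            max_new_tokens=256,
            do_sample=True,
        )

        # Serialize and parse again to mimic the trip over the wire
        restored = json.loads(json.dumps(payload))
        assert decode_images(restored) == [sample.read_bytes()]
        print("payload keys:", sorted(restored))
```

With a deployed endpoint, the result of `build_payload('example.jpg', 'Describe this image in detail.', max_new_tokens=256)` could be passed to `predictor.predict(...)` in place of the hand-written dictionary.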