@Nick-Harvey
Created December 9, 2024 03:10
Using the Lambda Inference API
The Lambda Inference API lets you use the Llama 3.1 405B Instruct large language model (LLM), as well as fine-tuned variants such as Nous Research's Hermes 3 and Liquid AI's LFM 40.3B MoE (Mixture of Experts), without needing to set up your own vLLM API server on an on-demand instance or 1-Click Cluster (1CC).
Tip
Try Lambda Chat!
Also try Companion, powered by the Lambda Inference API.
Contact us to learn more about our:
Public inference endpoints
Private inference endpoints
Since the Lambda Inference API is compatible with the OpenAI API, you can use it as a drop-in replacement for applications currently using the OpenAI API. See, for example, our guide on integrating the Lambda Inference API into VS Code.
The Lambda Inference API implements endpoints for:
Creating chat completions (/chat/completions)
Creating completions (/completions)
Listing models (/models)
Currently, the following models are available:
deepseek-coder-v2-lite-instruct
dracarys2-72b-instruct
hermes3-405b
hermes3-405b-fp8-128k
hermes3-70b
hermes3-8b
lfm-40b
llama3.1-405b-instruct-fp8
llama3.1-70b-instruct-fp8
llama3.1-8b-instruct
llama3.2-3b-instruct
llama3.1-nemotron-70b-instruct
Note
If a request using the hermes3-405b model is made with a context length greater than 18K tokens, the request falls back to the hermes3-405b-fp8-128k model.
To use the Lambda Inference API, first generate a Cloud API key from the dashboard. You can also use a Cloud API key that you've already generated.
In the examples below:
Replace <MODEL> with one of the models listed above.
Replace <API-KEY> with your actual Cloud API key.
Creating chat completions
The /chat/completions endpoint takes a list of messages that make up a conversation, then outputs a response.
To create a chat completion with curl, run:
curl -sS https://api.lambdalabs.com/v1/chat/completions \
-H "Authorization: Bearer <API-KEY>" \
-H "Content-Type: application/json" \
-d '{
"model": "<MODEL>",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant named Hermes, made by Nous Research."
},
{
"role": "user",
"content": "Who won the world series in 2020?"
},
{
"role": "assistant",
"content": "The Los Angeles Dodgers won the World Series in 2020."
},
{
"role": "user",
"content": "Where was it played?"
}
]
}' | jq .
You should see output similar to:
{
"id": "chatcmpl-cbb10ffe2bf24c81a37d86204a3ec835",
"object": "chat.completion",
"created": 1733448149,
"model": "hermes3-8b",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The 2020 World Series was played at Globe Life Field in Arlington, Texas, due to the COVID-19 pandemic restrictions. All games were played at this neutral site to minimize travel and potential exposure to the virus."
},
"finish_reason": "stop",
"content_filter_results": {
"hate": {
"filtered": false
},
"self_harm": {
"filtered": false
},
"sexual": {
"filtered": false
},
"violence": {
"filtered": false
},
"jailbreak": {
"filtered": false,
"detected": false
},
"profanity": {
"filtered": false,
"detected": false
}
}
}
],
"usage": {
"prompt_tokens": 65,
"completion_tokens": 45,
"total_tokens": 110,
"prompt_tokens_details": null,
"completion_tokens_details": null
},
"system_fingerprint": ""
}
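The same request can be made from Python. A minimal sketch using only the standard library (the model name and API key are placeholders, as above):

```python
import json
import urllib.request

API_URL = "https://api.lambdalabs.com/v1/chat/completions"

def build_chat_request(api_key, model, messages):
    """Build a POST request for the /chat/completions endpoint."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

messages = [
    {"role": "system",
     "content": "You are a helpful assistant named Hermes, made by Nous Research."},
    {"role": "user", "content": "Who won the world series in 2020?"},
]
request = build_chat_request("<API-KEY>", "<MODEL>", messages)

# Sending the request returns a chat.completion object like the one above:
# with urllib.request.urlopen(request) as response:
#     completion = json.load(response)
#     print(completion["choices"][0]["message"]["content"])
```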
Creating completions
The /completions endpoint takes a single text string (a prompt) as input, then outputs a response. In comparison, the /chat/completions endpoint takes a list of messages as input.
To use the /completions endpoint:
Using curl, run:
curl -sS https://api.lambdalabs.com/v1/completions \
-H "Authorization: Bearer <API-KEY>" \
-H "Content-Type: application/json" \
-d '{
"model": "<MODEL>",
"prompt": "Computers are",
"temperature": 0
}' | jq .
You should see output similar to:
{
"id": "chatcmpl-8e46443e199a446ea8a49ed124cad61b",
"object": "text_completion",
"created": 1733448483,
"model": "hermes3-8b",
"choices": [
{
"text": "1. Electronic devices that process data and perform a wide range of tasks\n2. Calculating machines used for complex mathematical operations\n3. Devices that can store and retrieve information\n4. Tools that enhance communication through email, instant messaging, and video conferencing\n5. Platforms for creating and sharing multimedia content, such as videos, photos, and music\n6. Essential tools for businesses and organizations in managing operations, financial transactions, and customer relations\n7. Systems used in scientific research and data analysis\n8. Devices that can be programmed to perform specific tasks and solve problems\n9. Networked tools that enable collaboration and resource sharing among users\n10. Powerful machines capable of performing complex computations, simulations, and artificial intelligence tasks.",
"index": 0,
"finish_reason": "stop",
"logprobs": {
"tokens": null,
"token_logprobs": null,
"top_logprobs": null,
"text_offset": null
}
}
],
"usage": {
"prompt_tokens": 23,
"completion_tokens": 149,
"total_tokens": 172,
"prompt_tokens_details": null,
"completion_tokens_details": null
}
}
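A matching Python sketch for the /completions endpoint, again using only the standard library (placeholders as above):

```python
import json
import urllib.request

def build_completion_request(api_key, model, prompt, temperature=0):
    """Build a POST request for the /completions endpoint."""
    payload = {"model": model, "prompt": prompt, "temperature": temperature}
    return urllib.request.Request(
        "https://api.lambdalabs.com/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

request = build_completion_request("<API-KEY>", "<MODEL>", "Computers are")

# Sending the request returns a text_completion object like the one above:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["text"])
```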
Listing models
The /models endpoint lists the models available for use through the Lambda Inference API.
To use the /models endpoint:
Using curl, run:
curl -sS https://api.lambdalabs.com/v1/models -H "Authorization: Bearer <API-KEY>" | jq .
You should see output similar to:
{
"object": "list",
"data": [
{
"id": "hermes3-405b",
"object": "model",
"created": 1724347380,
"owned_by": "lambda"
},
{
"id": "hermes3-70b",
"object": "model",
"created": 1724347380,
"owned_by": "lambda"
},
{
"id": "hermes3-8b",
"object": "model",
"created": 1724347380,
"owned_by": "lambda"
},
{
"id": "lfm-40b",
"object": "model",
"created": 1724347380,
"owned_by": "lambda"
},
{
"id": "llama3.1-405b-instruct-fp8",
"object": "model",
"created": 1724347380,
"owned_by": "lambda"
},
[…]
{
"id": "qwen25-coder-32b-instruct",
"object": "model",
"created": 1724347380,
"owned_by": "lambda"
}
]
}
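And a Python sketch for the /models endpoint — a plain GET request, so no body is needed (the API key is a placeholder):

```python
import json
import urllib.request

def build_models_request(api_key):
    """Build a GET request for the /models endpoint."""
    return urllib.request.Request(
        "https://api.lambdalabs.com/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

request = build_models_request("<API-KEY>")

# Sending the request returns the model list shown above:
# with urllib.request.urlopen(request) as response:
#     models = json.load(response)["data"]
#     print([m["id"] for m in models])
```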