@Nick-Harvey
Created December 9, 2024 03:10
Using the Lambda Inference API
The Lambda Inference API lets you use the Llama 3.1 405B Instruct large language model (LLM), as well as fine-tuned variants such as Nous Research's Hermes 3 and Liquid AI's LFM 40.3B MoE (Mixture of Experts), without needing to set up your own vLLM API server on an on-demand instance or 1-Click Cluster (1CC).
Tip
Try Lambda Chat!
Also try Companion, powered by the Lambda Inference API.
Contact us to learn more about our:
Public inference endpoints
Private inference endpoints
Since the Lambda Inference API is compatible with the OpenAI API, you can use it as a drop-in replacement for applications currently using the OpenAI API. See, for example, our guide on integrating the Lambda Inference API into VS Code.
The Lambda Inference API implements endpoints for:
Creating chat completions (/chat/completions)
Creating completions (/completions)
Listing models (/models)
Currently, the following models are available:
deepseek-coder-v2-lite-instruct
dracarys2-72b-instruct
hermes3-405b
hermes3-405b-fp8-128k
hermes3-70b
hermes3-8b
lfm-40b
llama3.1-405b-instruct-fp8
llama3.1-70b-instruct-fp8
llama3.1-8b-instruct
llama3.2-3b-instruct
llama3.1-nemotron-70b-instruct
Note
If a request using the hermes3-405b model is made with a context length greater than 18K tokens, the request falls back to the hermes3-405b-fp8-128k model.
To use the Lambda Inference API, first generate a Cloud API key from the dashboard. You can also use a Cloud API key that you've already generated.
In the examples below:
Replace <MODEL> with one of the models listed above.
Replace <API-KEY> with your actual Cloud API key.
Creating chat completions
The /chat/completions endpoint takes a list of messages that make up a conversation, then outputs a response.
To create a chat completion with curl, run:
curl -sS https://api.lambdalabs.com/v1/chat/completions \
-H "Authorization: Bearer <API-KEY>" \
-H "Content-Type: application/json" \
-d '{
"model": "<MODEL>",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant named Hermes, made by Nous Research."
},
{
"role": "user",
"content": "Who won the world series in 2020?"
},
{
"role": "assistant",
"content": "The Los Angeles Dodgers won the World Series in 2020."
},
{
"role": "user",
"content": "Where was it played?"
}
]
}' | jq .
You should see output similar to:
{
"id": "chatcmpl-cbb10ffe2bf24c81a37d86204a3ec835",
"object": "chat.completion",
"created": 1733448149,
"model": "hermes3-8b",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The 2020 World Series was played at Globe Life Field in Arlington, Texas, due to the COVID-19 pandemic restrictions. All games were played at this neutral site to minimize travel and potential exposure to the virus."
},
"finish_reason": "stop",
"content_filter_results": {
"hate": {
"filtered": false
},
"self_harm": {
"filtered": false
},
"sexual": {
"filtered": false
},
"violence": {
"filtered": false
},
"jailbreak": {
"filtered": false,
"detected": false
},
"profanity": {
"filtered": false,
"detected": false
}
}
}
],
"usage": {
"prompt_tokens": 65,
"completion_tokens": 45,
"total_tokens": 110,
"prompt_tokens_details": null,
"completion_tokens_details": null
},
"system_fingerprint": ""
}
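The same request can be made from Python. A minimal sketch using only the standard library (the model name and API key are placeholders, as above):

```python
import json
import urllib.request

API_URL = "https://api.lambdalabs.com/v1/chat/completions"

def build_chat_request(api_key, model, messages):
    """Build a POST request for the /chat/completions endpoint."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

messages = [
    {"role": "system",
     "content": "You are a helpful assistant named Hermes, made by Nous Research."},
    {"role": "user", "content": "Who won the world series in 2020?"},
]
request = build_chat_request("<API-KEY>", "<MODEL>", messages)

# Sending the request returns a chat.completion object like the one above:
# with urllib.request.urlopen(request) as response:
#     completion = json.load(response)
#     print(completion["choices"][0]["message"]["content"])
```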
Creating completions
The /completions endpoint takes a single text string (a prompt) as input, then outputs a response. In comparison, the /chat/completions endpoint takes a list of messages as input.
To use the /completions endpoint:
Using curl, run:
curl -sS https://api.lambdalabs.com/v1/completions \
-H "Authorization: Bearer <API-KEY>" \
-H "Content-Type: application/json" \
-d '{
"model": "<MODEL>",
"prompt": "Computers are",
"temperature": 0
}' | jq .
You should see output similar to:
{
"id": "chatcmpl-8e46443e199a446ea8a49ed124cad61b",
"object": "text_completion",
"created": 1733448483,
"model": "hermes3-8b",
"choices": [
{
"text": "1. Electronic devices that process data and perform a wide range of tasks\n2. Calculating machines used for complex mathematical operations\n3. Devices that can store and retrieve information\n4. Tools that enhance communication through email, instant messaging, and video conferencing\n5. Platforms for creating and sharing multimedia content, such as videos, photos, and music\n6. Essential tools for businesses and organizations in managing operations, financial transactions, and customer relations\n7. Systems used in scientific research and data analysis\n8. Devices that can be programmed to perform specific tasks and solve problems\n9. Networked tools that enable collaboration and resource sharing among users\n10. Powerful machines capable of performing complex computations, simulations, and artificial intelligence tasks.",
"index": 0,
"finish_reason": "stop",
"logprobs": {
"tokens": null,
"token_logprobs": null,
"top_logprobs": null,
"text_offset": null
}
}
],
"usage": {
"prompt_tokens": 23,
"completion_tokens": 149,
"total_tokens": 172,
"prompt_tokens_details": null,
"completion_tokens_details": null
}
}
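A matching Python sketch for the /completions endpoint, again using only the standard library (placeholders as above):

```python
import json
import urllib.request

def build_completion_request(api_key, model, prompt, temperature=0):
    """Build a POST request for the /completions endpoint."""
    payload = {"model": model, "prompt": prompt, "temperature": temperature}
    return urllib.request.Request(
        "https://api.lambdalabs.com/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

request = build_completion_request("<API-KEY>", "<MODEL>", "Computers are")

# Sending the request returns a text_completion object like the one above:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["text"])
```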
Listing models
The /models endpoint lists the models available for use through the Lambda Inference API.
To use the /models endpoint:
Using curl, run:
curl -sS https://api.lambdalabs.com/v1/models -H "Authorization: Bearer <API-KEY>" | jq .
You should see output similar to:
{
"object": "list",
"data": [
{
"id": "hermes3-405b",
"object": "model",
"created": 1724347380,
"owned_by": "lambda"
},
{
"id": "hermes3-70b",
"object": "model",
"created": 1724347380,
"owned_by": "lambda"
},
{
"id": "hermes3-8b",
"object": "model",
"created": 1724347380,
"owned_by": "lambda"
},
{
"id": "lfm-40b",
"object": "model",
"created": 1724347380,
"owned_by": "lambda"
},
{
"id": "llama3.1-405b-instruct-fp8",
"object": "model",
"created": 1724347380,
"owned_by": "lambda"
},
[…]
{
"id": "qwen25-coder-32b-instruct",
"object": "model",
"created": 1724347380,
"owned_by": "lambda"
}
]
}
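And a Python sketch for the /models endpoint — a plain GET request, so no body is needed (the API key is a placeholder):

```python
import json
import urllib.request

def build_models_request(api_key):
    """Build a GET request for the /models endpoint."""
    return urllib.request.Request(
        "https://api.lambdalabs.com/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

request = build_models_request("<API-KEY>")

# Sending the request returns the model list shown above:
# with urllib.request.urlopen(request) as response:
#     models = json.load(response)["data"]
#     print([m["id"] for m in models])
```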