Here is some documentation for the OpenAI-compatible API endpoints:
POST /v1/completions

Generates text completions for the provided prompt.

Parameters:

- `prompt` (required): The prompt to generate completions for, as a string or list of strings.
- `model`: Unused parameter. To change the model, use the `/v1/internal/model/load` endpoint.
- `stream`: If `true`, will stream back partial responses as text is generated.
- `max_tokens`: The maximum number of tokens to generate.
- `temperature`: Sampling temperature, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling.
- `echo`: If `true`, the prompt will be included in the completion.
- `stop`: Up to 4 sequences where generation will stop if any are matched.

See `GenerationOptions` in `typing.py` for other generation parameters.

Returns:

- `id`: ID of the completion.
- `choices`: List containing the generated completions.
- `usage`: Number of prompt tokens, completion tokens, and total tokens used.
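As a concrete illustration, a minimal non-streaming request to this endpoint could look like the sketch below. The base URL `http://127.0.0.1:5000` is an assumption and depends on how the server was launched:

```python
import requests

# Assumed base URL; adjust host/port to match your server.
URL = "http://127.0.0.1:5000/v1/completions"

payload = {
    "prompt": "Once upon a time,",
    "max_tokens": 64,
    "temperature": 0.7,
    "stop": ["\n\n"],
}

response = requests.post(URL, json=payload)
response.raise_for_status()
result = response.json()

# The generated text is in choices[0]["text"], per the Returns section above.
print(result["choices"][0]["text"])
print(result["usage"])
```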
POST /v1/chat/completions

Generates chat message completions based on a provided chat history.

Parameters:

- `messages` (required): Chat history as a list of messages with `role` (user, assistant) and `content`.
- `model`: Unused parameter. To change the model, use the `/v1/internal/model/load` endpoint.
- `stream`: If `true`, will stream back partial responses as text is generated.
- `mode`: `instruct`, `chat`, or `chat-instruct`. Controls whether the assistant is in character.
- `instruction_template`: Name of the instruction template file to use.
- `character`: Name of the character file to use for the assistant.

See `ChatCompletionRequest` in `typing.py` for other parameters.

Returns:

Same as `/v1/completions`.
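The chat endpoint takes the same shape, with a `messages` list in place of a `prompt`. A hedged sketch, again assuming the server listens at `http://127.0.0.1:5000`:

```python
import requests

URL = "http://127.0.0.1:5000/v1/chat/completions"

payload = {
    "messages": [
        {"role": "user", "content": "Summarize nucleus sampling in one line."}
    ],
    "mode": "instruct",  # instruct, chat, or chat-instruct
    "max_tokens": 128,
}

response = requests.post(URL, json=payload)
response.raise_for_status()

# Chat completions conventionally return a message dict rather than raw text.
print(response.json()["choices"][0]["message"]["content"])
```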
GET /v1/models

Lists the currently available models.

GET /v1/models/{model}

Gets information about the specified model.
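Both are plain GET requests. A small sketch, assuming the server address and the OpenAI-style `{"data": [...]}` response shape:

```python
import requests

BASE = "http://127.0.0.1:5000"  # assumed server address

# List available models, then query the first one by id.
models = requests.get(f"{BASE}/v1/models").json()
print([m["id"] for m in models["data"]])  # assumes OpenAI-style list shape

info = requests.get(f"{BASE}/v1/models/{models['data'][0]['id']}").json()
print(info)
```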
Gets usage statistics for billing purposes.

Parameters:

- `start_date`: Start date for usage stats, in YYYY-MM-DD format.
- `end_date`: End date for usage stats, in YYYY-MM-DD format.

Returns:

- `total_usage`: Total token usage during the specified period.
POST /v1/audio/transcriptions

Transcribes an audio file using Whisper.

Parameters:

- `file` (required): The audio file to transcribe.
- `language`: Language spoken in the audio.
- `model`: Whisper model to use, `tiny` or `base`.

Returns:

- `text`: Transcription text.
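Because this endpoint takes a file, the request is multipart form data rather than JSON. A sketch, assuming the base URL and a local `speech.wav`:

```python
import requests

URL = "http://127.0.0.1:5000/v1/audio/transcriptions"  # assumed base URL

# The audio file is uploaded as multipart form data.
with open("speech.wav", "rb") as audio:
    response = requests.post(
        URL,
        files={"file": audio},
        data={"language": "en", "model": "tiny"},
    )

response.raise_for_status()
print(response.json()["text"])
```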
POST /v1/images/generations

Generates images using Stable Diffusion.

Parameters:

- `prompt` (required): The text prompt to generate images for.
- `size`: Size of images to generate, like `512x512`.
- `n`: Number of images to generate.

Returns:

- `data`: List of generated images.
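A minimal request sketch, with the base URL assumed as before:

```python
import requests

URL = "http://127.0.0.1:5000/v1/images/generations"  # assumed base URL

payload = {"prompt": "a lighthouse at dusk", "size": "512x512", "n": 1}
response = requests.post(URL, json=payload)
response.raise_for_status()

# `data` holds the generated images; whether entries are URLs or base64
# strings depends on the server's configuration.
print(len(response.json()["data"]))
```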
POST /v1/embeddings

Gets sentence embeddings for the provided input text.

Parameters:

- `input` (required): Input text to get embeddings for, as a string or list of strings.
- `encoding_format`: `float` or `base64`.

Returns:

- `object`: `list`
- `data`: List of embeddings, one for each input.
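For example, a batched embeddings request might look like this; the per-item `index` and `embedding` fields follow the usual OpenAI response shape and are assumptions here:

```python
import requests

URL = "http://127.0.0.1:5000/v1/embeddings"  # assumed base URL

payload = {
    "input": ["first sentence", "second sentence"],
    "encoding_format": "float",
}
response = requests.post(URL, json=payload)
response.raise_for_status()

# One embedding per input string, following the OpenAI response shape.
for item in response.json()["data"]:
    print(item["index"], len(item["embedding"]))
```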
POST /v1/moderations

Checks input text for harmful content.

Parameters:

- `input` (required): Input text to moderate.

Returns:

- `results`: List of moderation results, one for each input text.
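A short sketch of a moderation call, under the same base-URL assumption:

```python
import requests

URL = "http://127.0.0.1:5000/v1/moderations"  # assumed base URL

response = requests.post(URL, json={"input": "Some text to check."})
response.raise_for_status()

# One moderation result per input text, as described above.
print(response.json()["results"])
```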
POST /v1/internal/encode

Encodes text into tokens.

Parameters:

- `text` (required): Text to encode.

POST /v1/internal/decode

Decodes tokens into text.

Parameters:

- `tokens` (required): Tokens to decode.

POST /v1/internal/token-count

Gets the number of tokens for text.

Parameters:

- `text` (required): Text to get the token count for.
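These three compose naturally into a round trip. A sketch, where the response field names (`tokens`, etc.) are assumptions, not confirmed by the docs above:

```python
import requests

BASE = "http://127.0.0.1:5000/v1/internal"  # assumed base URL

# Round trip: text -> tokens -> text, plus a token count.
encoded = requests.post(f"{BASE}/encode", json={"text": "Hello there"}).json()
tokens = encoded["tokens"]  # assumed response field name

decoded = requests.post(f"{BASE}/decode", json={"tokens": tokens}).json()
count = requests.post(f"{BASE}/token-count", json={"text": "Hello there"}).json()

print(tokens)
print(decoded)
print(count)
```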
GET /v1/internal/model/info

Gets information about the currently loaded model.
POST /v1/internal/model/load

Loads a new model. Can be used to switch models on the fly.

Parameters:

- `model_name` (required): Name of the model to load.
- `args`: Dict of arguments to pass to model loading.
- `settings`: Dict of settings to update after loading.
POST /v1/internal/stop-generation

Stops any ongoing text generation.
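A sketch combining the two internal endpoints above; `my-model` is a placeholder, and the base URL is assumed as before:

```python
import requests

BASE = "http://127.0.0.1:5000/v1/internal"  # assumed base URL

# Switch models on the fly; "my-model" is a hypothetical name - use one
# reported by GET /v1/models instead.
requests.post(
    f"{BASE}/model/load",
    json={"model_name": "my-model", "args": {}, "settings": {}},
).raise_for_status()

# Interrupt whatever generation is currently in progress.
requests.post(f"{BASE}/stop-generation").raise_for_status()
```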
So in summary, the main endpoints are `/v1/completions`, `/v1/chat/completions`, and `/v1/models` for generating text, generating chat completions, and listing models. The `/v1/internal` endpoints can be used to switch models, encode/decode tokens, and more.
Tried again, this time feeding it the entire repo, and it came up with these:
Here are the API endpoints I identified in the code, along with their associated parameters and Python file/function references:
POST /v1/completions

Parameters:

- `prompt` (str|List[str]) - Prompt text for text completion. See `completions_common()` in `extensions/openai/completions.py`.
- `stream` (bool) - Whether to stream back partial responses. See `completions_common()` in `extensions/openai/completions.py`.

POST /v1/chat/completions

Parameters:

- `messages` (List[dict]) - Chat history. See `chat_completions_common()` in `extensions/openai/completions.py`.
- `stream` (bool) - Whether to stream back partial responses. See `chat_completions_common()` in `extensions/openai/completions.py`.

POST /v1/images/generations

Parameters: see `generations()` in `extensions/openai/images.py`.

POST /v1/embeddings

Parameters: see `embeddings()` in `extensions/openai/embeddings.py`.

POST /v1/moderations

Parameters: see `moderations()` in `extensions/openai/moderations.py`.

GET /v1/models

Lists models. See `list_models()` in `extensions/openai/models.py`.

GET /v1/models/{model}

Gets model info. See `model_info_dict()` in `extensions/openai/models.py`.

POST /v1/internal/model/load

Loads a model. See `_load_model()` in `extensions/openai/models.py`.

POST /v1/internal/encode

Encodes text. See `token_encode()` in `extensions/openai/tokens.py`.

POST /v1/internal/decode

Decodes tokens. See `token_decode()` in `extensions/openai/tokens.py`.

POST /v1/internal/token-count

Gets token count for text. See `token_count()` in `extensions/openai/tokens.py`.

GET /v1/internal/model/info

Gets current model info. See `get_current_model_info()` in `extensions/openai/models.py`.

Let me know if you need any clarification or have additional questions!