Create an Azure OpenAI inference endpoint

PUT /_inference/{task_type}/{azureopenai_inference_id}

Create an inference endpoint to perform an inference task with the azureopenai service.

The lists of chat completion and embeddings models that you can choose from in your Azure OpenAI deployment can be found in the Azure models documentation.

Required authorization

  • Cluster privileges: manage_inference

Parameters

Path Parameters

  • task_type (required) — The type of the inference task that the model will perform. NOTE: The chat_completion task type only supports streaming, and only through the _stream API.

    type InferenceTypesAzureOpenAITaskType = "completion" | "chat_completion" | "text_embedding"

  • azureopenai_inference_id (required) — The unique identifier of the inference endpoint.

    type TypesId = string

Query Parameters

  • timeout (optional) — The amount of time to wait for the inference endpoint to be created.

    type TypesDuration = string | "-1" | "0"

Request Body

application/json (required)

{
  chunking_settings?: InferenceTypesInferenceChunkingSettings;  // Chunking configuration object
  service: InferenceTypesAzureOpenAIServiceType;
  service_settings: InferenceTypesAzureOpenAIServiceSettings;
  task_settings?: InferenceTypesAzureOpenAITaskSettings;
}

interface InferenceTypesInferenceChunkingSettings {
  max_chunk_size?: number;
  overlap?: number;
  sentence_overlap?: number;
  separator_group?: string;
  separators?: string[];
  strategy?: string;
}

type InferenceTypesAzureOpenAIServiceType = "azureopenai"

interface InferenceTypesAzureOpenAIServiceSettings {
  api_key?: string;
  api_version: string;
  deployment_id: string;
  entra_id?: string;
  rate_limit?: InferenceTypesRateLimitSetting;
  resource_name: string;
}

interface InferenceTypesAzureOpenAITaskSettings {
  user?: string;
}
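As a concrete illustration, the request body above can be assembled and serialized in TypeScript. The deployment name, resource name, API version, and key below are hypothetical placeholders, not values from this page:

```typescript
// Sketch of the request body for PUT /_inference/{task_type}/{inference_id}.
// All concrete values (deployment_id, resource_name, api_version, api_key)
// are illustrative assumptions — substitute your own Azure OpenAI settings.

interface AzureOpenAIServiceSettings {
  api_key?: string;       // either api_key or entra_id must identify you to Azure
  api_version: string;
  deployment_id: string;
  entra_id?: string;
  resource_name: string;
}

interface AzureOpenAIRequestBody {
  service: "azureopenai";
  service_settings: AzureOpenAIServiceSettings;
  task_settings?: { user?: string };
}

const body: AzureOpenAIRequestBody = {
  service: "azureopenai",
  service_settings: {
    api_key: "<api-key>",                       // placeholder credential
    api_version: "2024-02-01",                  // assumed API version
    deployment_id: "my-embeddings-deployment",  // hypothetical deployment
    resource_name: "my-azure-resource",         // hypothetical resource
  },
};

// The serialized form is what goes on the wire as application/json.
const payload = JSON.stringify(body);
console.log(payload);
```

Note that `service` is a literal type: any value other than `"azureopenai"` is rejected at compile time, mirroring the `InferenceTypesAzureOpenAIServiceType` union above.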

Responses

200 application/json

type InferenceTypesInferenceEndpointInfoAzureOpenAI =
  InferenceTypesInferenceEndpoint & {
    inference_id: string;
    task_type: InferenceTypesTaskTypeAzureOpenAI;
  }

interface InferenceTypesInferenceEndpoint {
  chunking_settings?: InferenceTypesInferenceChunkingSettings;  // Chunking configuration object
  service: string;
  service_settings: InferenceTypesServiceSettings;
  task_settings?: InferenceTypesTaskSettings;
}

interface InferenceTypesInferenceChunkingSettings {
  max_chunk_size?: number;
  overlap?: number;
  sentence_overlap?: number;
  separator_group?: string;
  separators?: string[];
  strategy?: string;
}

interface InferenceTypesServiceSettings {}

interface InferenceTypesTaskSettings {}

type InferenceTypesTaskTypeAzureOpenAI = "text_embedding" | "completion" | "chat_completion"
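Putting the endpoint path and body together, a minimal client sketch might look like the following. The transport is injected as a fetch-like function so the sketch stays self-contained; the cluster URL and inference ID in the usage note are assumptions, not values from this page:

```typescript
// Sketch: build the PUT /_inference/{task_type}/{inference_id} request and
// check the response status. The transport is parameterized so any fetch-like
// implementation (or a test stub) can be supplied.

type AzureOpenAITaskType = "completion" | "chat_completion" | "text_embedding";

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ status: number; json(): Promise<unknown> }>;

async function createInferenceEndpoint(
  fetchImpl: FetchLike,
  clusterUrl: string,          // e.g. a hypothetical "http://localhost:9200"
  taskType: AzureOpenAITaskType,
  inferenceId: string,
  body: object,
): Promise<unknown> {
  // Assemble the path exactly as documented above.
  const url = `${clusterUrl}/_inference/${taskType}/${inferenceId}`;
  const res = await fetchImpl(url, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (res.status !== 200) {
    throw new Error(`unexpected status ${res.status}`);
  }
  // On success the body matches InferenceTypesInferenceEndpointInfoAzureOpenAI.
  return res.json();
}
```

In real use you would pass the global `fetch` (plus an Authorization header appropriate to your cluster, omitted here) and a body like the one shown in the Request Body section.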