Create an Azure OpenAI inference endpoint

PUT /_inference/{task_type}/{azureopenai_inference_id}

Create an inference endpoint to perform an inference task with the azureopenai service.

The lists of chat completion and embeddings models that you can choose from in your Azure OpenAI deployment can be found in the Azure models documentation.

Required authorization

  • Cluster privileges: manage_inference

Parameters

Path Parameters

  • task_type (required) — The type of the inference task that the model will perform. NOTE: The chat_completion task type only supports streaming, and only through the _stream API.

    type InferenceTypesAzureOpenAITaskType = "completion" | "chat_completion" | "text_embedding"

  • azureopenai_inference_id (required) — The unique identifier of the inference endpoint.

    type TypesId = string

Query Parameters

  • timeout (optional) — The amount of time to wait for the inference endpoint to be created.

    type TypesDuration = string | "-1" | "0"

Request Body

application/json (required)

{
  chunking_settings?: InferenceTypesInferenceChunkingSettings;  // Chunking configuration object
  service: InferenceTypesAzureOpenAIServiceType;
  service_settings: InferenceTypesAzureOpenAIServiceSettings;
  task_settings?: InferenceTypesAzureOpenAITaskSettings;
}

interface InferenceTypesInferenceChunkingSettings {
  max_chunk_size?: number;
  overlap?: number;
  sentence_overlap?: number;
  separator_group?: string;
  separators?: string[];
  strategy?: string;
}

type InferenceTypesAzureOpenAIServiceType = "azureopenai"

interface InferenceTypesAzureOpenAIServiceSettings {
  api_key?: string;
  api_version: string;
  deployment_id: string;
  entra_id?: string;
  rate_limit?: InferenceTypesRateLimitSetting;
  resource_name: string;
}

interface InferenceTypesAzureOpenAITaskSettings {
  user?: string;
}
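As a concrete illustration, the request body above can be assembled and serialized in TypeScript. The deployment name, resource name, API version, and key below are hypothetical placeholders, not values from this page:

```typescript
// Sketch of the request body for PUT /_inference/{task_type}/{inference_id}.
// All concrete values (deployment_id, resource_name, api_version, api_key)
// are illustrative assumptions — substitute your own Azure OpenAI settings.

interface AzureOpenAIServiceSettings {
  api_key?: string;       // either api_key or entra_id must identify you to Azure
  api_version: string;
  deployment_id: string;
  entra_id?: string;
  resource_name: string;
}

interface AzureOpenAIRequestBody {
  service: "azureopenai";
  service_settings: AzureOpenAIServiceSettings;
  task_settings?: { user?: string };
}

const body: AzureOpenAIRequestBody = {
  service: "azureopenai",
  service_settings: {
    api_key: "<api-key>",                       // placeholder credential
    api_version: "2024-02-01",                  // assumed API version
    deployment_id: "my-embeddings-deployment",  // hypothetical deployment
    resource_name: "my-azure-resource",         // hypothetical resource
  },
};

// The serialized form is what goes on the wire as application/json.
const payload = JSON.stringify(body);
console.log(payload);
```

Note that `service` is a literal type: any value other than `"azureopenai"` is rejected at compile time, mirroring the `InferenceTypesAzureOpenAIServiceType` union above.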

Responses

200 application/json

type InferenceTypesInferenceEndpointInfoAzureOpenAI =
  InferenceTypesInferenceEndpoint & {
    inference_id: string;
    task_type: InferenceTypesTaskTypeAzureOpenAI;
  }

interface InferenceTypesInferenceEndpoint {
  chunking_settings?: InferenceTypesInferenceChunkingSettings;  // Chunking configuration object
  service: string;
  service_settings: InferenceTypesServiceSettings;
  task_settings?: InferenceTypesTaskSettings;
}

interface InferenceTypesInferenceChunkingSettings {
  max_chunk_size?: number;
  overlap?: number;
  sentence_overlap?: number;
  separator_group?: string;
  separators?: string[];
  strategy?: string;
}

interface InferenceTypesServiceSettings {}

interface InferenceTypesTaskSettings {}

type InferenceTypesTaskTypeAzureOpenAI = "text_embedding" | "completion" | "chat_completion"
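Putting the endpoint path and body together, a minimal client sketch might look like the following. The transport is injected as a fetch-like function so the sketch stays self-contained; the cluster URL and inference ID in the usage note are assumptions, not values from this page:

```typescript
// Sketch: build the PUT /_inference/{task_type}/{inference_id} request and
// check the response status. The transport is parameterized so any fetch-like
// implementation (or a test stub) can be supplied.

type AzureOpenAITaskType = "completion" | "chat_completion" | "text_embedding";

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ status: number; json(): Promise<unknown> }>;

async function createInferenceEndpoint(
  fetchImpl: FetchLike,
  clusterUrl: string,          // e.g. a hypothetical "http://localhost:9200"
  taskType: AzureOpenAITaskType,
  inferenceId: string,
  body: object,
): Promise<unknown> {
  // Assemble the path exactly as documented above.
  const url = `${clusterUrl}/_inference/${taskType}/${inferenceId}`;
  const res = await fetchImpl(url, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (res.status !== 200) {
    throw new Error(`unexpected status ${res.status}`);
  }
  // On success the body matches InferenceTypesInferenceEndpointInfoAzureOpenAI.
  return res.json();
}
```

In real use you would pass the global `fetch` (plus an Authorization header appropriate to your cluster, omitted here) and a body like the one shown in the Request Body section.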