Create an ELSER inference endpoint

Deprecated
PUT /_inference/{task_type}/{elser_inference_id}

Create an inference endpoint to perform an inference task with the elser service. You can also deploy ELSER by using the Elasticsearch inference integration.

Note: Your Elasticsearch deployment contains a preconfigured ELSER inference endpoint. You only need to create an endpoint using the API if you want to customize the settings.

The API request will automatically download and deploy the ELSER model if it isn't already downloaded.

Note: You might see a 502 Bad Gateway error in the response when using the Kibana Console. This error usually just reflects a timeout while the model downloads in the background. You can check the download progress in the Machine Learning UI. If you are using the Python client, you can set the timeout parameter to a higher value.

After creating the endpoint, wait for the model deployment to complete before using it. To verify the deployment status, use the get trained model statistics API. Look for "state": "fully_allocated" in the response and ensure that the "allocation_count" matches the "target_allocation_count". Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
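The readiness check described above can be sketched as a pure function over the parsed response of the get trained model statistics API (`GET _ml/trained_models/<model_id>/_stats`). The field path `trained_model_stats[].deployment_stats.allocation_status` is an assumption based on that API's response format, so verify it against your cluster's actual output. `isDeploymentReady` is a hypothetical helper name and the sample payload is invented.

```typescript
// Assumed subset of the get trained model statistics response.
interface AllocationStatus {
  state: string;
  allocation_count: number;
  target_allocation_count: number;
}

interface TrainedModelStats {
  model_id: string;
  deployment_stats?: {
    allocation_status?: AllocationStatus;
  };
}

interface TrainedModelStatsResponse {
  count: number;
  trained_model_stats: TrainedModelStats[];
}

// Ready only when every returned deployment reports "fully_allocated"
// and its allocation_count equals its target_allocation_count.
function isDeploymentReady(resp: TrainedModelStatsResponse): boolean {
  const deployments = resp.trained_model_stats
    .map((s) => s.deployment_stats?.allocation_status)
    .filter((a): a is AllocationStatus => a !== undefined);
  if (deployments.length === 0) {
    return false; // no deployment reported yet
  }
  return deployments.every(
    (a) =>
      a.state === "fully_allocated" &&
      a.allocation_count === a.target_allocation_count,
  );
}

// Invented sample; ".elser_model_2" is used only as an example model ID.
const readySample: TrainedModelStatsResponse = {
  count: 1,
  trained_model_stats: [
    {
      model_id: ".elser_model_2",
      deployment_stats: {
        allocation_status: {
          state: "fully_allocated",
          allocation_count: 1,
          target_allocation_count: 1,
        },
      },
    },
  ],
};
console.log(isDeploymentReady(readySample)); // true
```

In practice you would call the stats API periodically and only start sending inference requests once this check passes.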

Required authorization

  • Cluster privileges: manage_inference
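One way to grant the manage_inference cluster privilege listed above is a custom role created with the Elasticsearch create or update roles API (`PUT /_security/role/<name>`). The sketch below only builds that request; the role name `inference_admin` and the helper `buildInferenceRoleRequest` are hypothetical, and the role grants nothing beyond manage_inference.

```typescript
interface RoleRequest {
  method: "PUT";
  path: string;
  body: { cluster: string[] };
}

// Hypothetical helper: a role whose only cluster privilege is manage_inference.
function buildInferenceRoleRequest(roleName: string): RoleRequest {
  return {
    method: "PUT",
    path: `/_security/role/${encodeURIComponent(roleName)}`,
    body: { cluster: ["manage_inference"] },
  };
}

const roleReq = buildInferenceRoleRequest("inference_admin");
console.log(roleReq.method, roleReq.path, JSON.stringify(roleReq.body));
```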

Parameters

Path parameters

task_type (required)
The type of the inference task that the model will perform.
type InferenceTypesElserTaskType = "sparse_embedding"

elser_inference_id (required)
The unique identifier of the inference endpoint.
type TypesId = string
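A minimal sketch of how the two path parameters combine into the request path. Typing `task_type` as the single literal `"sparse_embedding"` lets the TypeScript compiler reject any other task type. The `encodeURIComponent` call is a defensive assumption rather than a documented requirement, and `elserInferencePath` is a hypothetical helper.

```typescript
// Mirrors InferenceTypesElserTaskType: the only task type accepted here.
type ElserTaskType = "sparse_embedding";

// Builds /_inference/{task_type}/{elser_inference_id}.
function elserInferencePath(
  taskType: ElserTaskType,
  elserInferenceId: string,
): string {
  if (elserInferenceId.length === 0) {
    throw new Error("elser_inference_id is required");
  }
  return `/_inference/${taskType}/${encodeURIComponent(elserInferenceId)}`;
}

console.log(elserInferencePath("sparse_embedding", "my-elser-endpoint"));
// prints "/_inference/sparse_embedding/my-elser-endpoint"
// elserInferencePath("text_embedding", "x") would not compile.
```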

Query parameters

timeout
Specifies the amount of time to wait for the inference endpoint to be created.
type TypesDuration = string | "-1" | "0"
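The timeout query parameter takes an Elasticsearch duration string. The sketch below appends it to the request path after a basic format check; the list of time units in the pattern (`d`, `h`, `m`, `s`, `ms`, `micros`, `nanos`) is an assumed subset of what Elasticsearch accepts, and `withTimeout` is a hypothetical helper.

```typescript
// Assumed duration syntax: "-1", "0", or an integer followed by a time unit.
const DURATION_PATTERN = /^(-1|0|\d+(d|h|m|s|ms|micros|nanos))$/;

// Hypothetical helper: appends ?timeout=<duration> to an inference path.
function withTimeout(path: string, timeout?: string): string {
  if (timeout === undefined) {
    return path; // omit the parameter and let the server use its default
  }
  if (!DURATION_PATTERN.test(timeout)) {
    throw new Error(`not a duration string: ${timeout}`);
  }
  const query = new URLSearchParams({ timeout });
  return `${path}?${query.toString()}`;
}

console.log(withTimeout("/_inference/sparse_embedding/my-elser-endpoint", "5m"));
// prints "/_inference/sparse_embedding/my-elser-endpoint?timeout=5m"
```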

Request body

application/json (required)

{
  chunking_settings?: InferenceTypesInferenceChunkingSettings;  // Chunking configuration object
  service: InferenceTypesElserServiceType;
  service_settings: InferenceTypesElserServiceSettings;
}

interface InferenceTypesInferenceChunkingSettings {
  max_chunk_size?: number;
  overlap?: number;
  sentence_overlap?: number;
  separator_group?: string;
  separators?: string[];
  strategy?: string;
}

type InferenceTypesElserServiceType = "elser"

interface InferenceTypesElserServiceSettings {
  adaptive_allocations?: InferenceTypesAdaptiveAllocations;
  num_allocations: number;
  num_threads: number;
}
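The request body combines the three top-level fields shown above. The sketch below assembles one with an explicit chunking_settings object and sends it with the global `fetch` (Node 18+ or a browser). The `"sentence"` strategy paired with `max_chunk_size` and `sentence_overlap` is an assumed, commonly used combination, and the numbers are illustrative rather than recommended values. The base URL, the `ApiKey` authorization header, and the names `elserBody` and `putElserEndpoint` are placeholders.

```typescript
// Local copies of the request-body types listed above.
interface ChunkingSettings {
  max_chunk_size?: number;
  overlap?: number;
  sentence_overlap?: number;
  separator_group?: string;
  separators?: string[];
  strategy?: string;
}

interface ElserServiceSettings {
  num_allocations: number;
  num_threads: number;
}

interface CreateElserBody {
  chunking_settings?: ChunkingSettings;
  service: "elser";
  service_settings: ElserServiceSettings;
}

// Illustrative body: one allocation, one thread, sentence-based chunking.
const elserBody: CreateElserBody = {
  service: "elser",
  service_settings: { num_allocations: 1, num_threads: 1 },
  chunking_settings: {
    strategy: "sentence",
    max_chunk_size: 250,
    sentence_overlap: 1,
  },
};

// Hypothetical sender: PUT the body to /_inference/sparse_embedding/{id}.
// The caller should check response.ok before reading the JSON body.
function putElserEndpoint(
  baseUrl: string,
  apiKey: string,
  inferenceId: string,
  body: CreateElserBody,
) {
  return fetch(
    `${baseUrl}/_inference/sparse_embedding/${encodeURIComponent(inferenceId)}`,
    {
      method: "PUT",
      headers: {
        "Content-Type": "application/json",
        Authorization: `ApiKey ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  );
}

console.log(JSON.stringify(elserBody));
```

Because the model may still be downloading when this request returns, the deployment-status check described earlier on this page is still needed before the endpoint is used.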

Responses

200 application/json

type InferenceTypesInferenceEndpointInfoELSER = InferenceTypesInferenceEndpoint & {
  inference_id: string;
  task_type: InferenceTypesTaskTypeELSER;
}

interface InferenceTypesInferenceEndpoint {
  chunking_settings?: InferenceTypesInferenceChunkingSettings;  // Chunking configuration object
  service: string;
  service_settings: InferenceTypesServiceSettings;
  task_settings?: InferenceTypesTaskSettings;
}

interface InferenceTypesInferenceChunkingSettings {
  max_chunk_size?: number;
  overlap?: number;
  sentence_overlap?: number;
  separator_group?: string;
  separators?: string[];
  strategy?: string;
}

interface InferenceTypesServiceSettings {}

interface InferenceTypesTaskSettings {}

type InferenceTypesTaskTypeELSER = "sparse_embedding"
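On a 200 response, the body carries the endpoint metadata typed above. A small runtime type guard can confirm the fields this page guarantees (`inference_id`, a `task_type` of `"sparse_embedding"`, `service`, and `service_settings`) before the endpoint ID is reused. The sketch assumes the body has already been parsed with `JSON.parse`. `isElserEndpointInfo` is a hypothetical name, and the sample payload is invented for illustration.

```typescript
interface ElserEndpointInfo {
  inference_id: string;
  task_type: "sparse_embedding";
  service: string;
  service_settings: Record<string, unknown>;
  task_settings?: Record<string, unknown>;
  chunking_settings?: Record<string, unknown>;
}

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Checks only the required fields documented above; extra fields are ignored.
function isElserEndpointInfo(v: unknown): v is ElserEndpointInfo {
  return (
    isPlainObject(v) &&
    typeof v.inference_id === "string" &&
    v.task_type === "sparse_embedding" &&
    typeof v.service === "string" &&
    isPlainObject(v.service_settings)
  );
}

// Invented sample payload, for illustration only.
const sampleResponse: unknown = JSON.parse(
  '{"inference_id":"my-elser-endpoint","task_type":"sparse_embedding",' +
    '"service":"elser","service_settings":{"num_allocations":1,"num_threads":1}}',
);
console.log(isElserEndpointInfo(sampleResponse)); // true
```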