Adds custom inference service API docs #4852

szabosteve · 2025-07-09T12:19:30Z

Overview

Related issue: https://github.com/elastic/developer-docs-team/issues/307

This PR adds documentation about the custom inference service.

@jonathan-buttner Could you please provide an example request that I can add to the docs?

szabosteve · 2025-07-09T12:22:21Z

specification/inference/_types/CommonTypes.ts

+  /** 
+   * Specifies the JSON parser that is used to parse the response from the custom service.
+   * Different task types require different json_parser parameters.
+   * For example:


@jonathan-buttner Do you think we should specify a JsonParser class for each task type, or is this list sufficient?

jonathan-buttner · 2025-07-10T20:08:08Z

Hmm, I think it might be better if we give an example of the response structure for each task type and explain how to create the parser from that.

We should also say that the format is a more limited version of JSONPath: https://en.wikipedia.org/wiki/JSONPath

Here are some examples:

Text Embeddings

For a response that looks like:

{
    "object": "list",
    "data": [
        {
            "object": "embedding",
            "index": 0,
            "embedding": [
                0.014539449,
                -0.015288644
            ]
        }
    ],
    "model": "text-embedding-ada-002-v2",
    "usage": {
        "prompt_tokens": 8,
        "total_tokens": 8
    }
}

We'd need this definition:

"response": { "json_parser": { "text_embeddings": "$.data[*].embedding[*]" } }
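As an illustration of the reduced JSONPath format, here is a minimal TypeScript sketch that evaluates only $, .field, and [*] against the text embedding response above. parsePath and evaluate are hypothetical helpers, not the Elasticsearch implementation, and the rule that each [*] maps over an array (so nested wildcards yield one inner array per data entry) is an assumption of the sketch.

```typescript
// Minimal evaluator for a small JSONPath subset: "$", ".field" and "[*]".
// Hypothetical helper for illustration; not the Elasticsearch implementation.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

type Step = { kind: "field"; name: string } | { kind: "wildcard" };

// Splits a path such as "$.data[*].embedding[*]" into steps.
function parsePath(path: string): Step[] {
  if (path.charAt(0) !== "$") {
    throw new Error("path must start with $: " + path);
  }
  const steps: Step[] = [];
  let i = 1;
  while (i < path.length) {
    if (path.slice(i, i + 3) === "[*]") {
      steps.push({ kind: "wildcard" });
      i += 3;
    } else if (path.charAt(i) === ".") {
      // A field name runs until the next "." or "[".
      let j = i + 1;
      while (j < path.length && path.charAt(j) !== "." && path.charAt(j) !== "[") {
        j++;
      }
      if (j === i + 1) {
        throw new Error("empty field name at offset " + i + " in " + path);
      }
      steps.push({ kind: "field", name: path.slice(i + 1, j) });
      i = j;
    } else {
      throw new Error("unsupported syntax at offset " + i + " in " + path);
    }
  }
  return steps;
}

// Applies the steps. Assumption: each [*] maps over an array, so nested
// wildcards produce nested arrays.
function evaluate(value: Json, steps: Step[]): Json {
  const [step, ...rest] = steps;
  if (step === undefined) {
    return value; // no steps left
  }
  if (step.kind === "wildcard") {
    if (!Array.isArray(value)) {
      throw new Error("[*] applied to a non-array value");
    }
    return value.map((item) => evaluate(item, rest));
  }
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    throw new Error("field " + step.name + " applied to a non-object value");
  }
  const next = value[step.name];
  if (next === undefined) {
    throw new Error("field " + step.name + " not found");
  }
  return evaluate(next, rest);
}

// The example response from above.
const response: Json = {
  object: "list",
  data: [{ object: "embedding", index: 0, embedding: [0.014539449, -0.015288644] }],
  model: "text-embedding-ada-002-v2",
  usage: { prompt_tokens: 8, total_tokens: 8 },
};

const embeddings = evaluate(response, parsePath("$.data[*].embedding[*]"));
console.log(JSON.stringify(embeddings)); // [[0.014539449,-0.015288644]]
```

Anything outside the subset, such as an index like [0], is rejected by parsePath in this sketch.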

Rerank

For a response that looks like:

{
    "results": [
        {
            "index": 3,
            "relevance_score": 0.999071,
            "document": "abc"
        },
        {
            "index": 4,
            "relevance_score": 0.7867867,
            "document": "123"
        },
        {
            "index": 0,
            "relevance_score": 0.32713068,
            "document": "super"
        }
    ]
}

We'd need this definition:

"response": {
    "json_parser": {
        "reranked_index": "$.results[*].index",
        "relevance_score": "$.results[*].relevance_score",
        "document_text": "$.results[*].document"
    }
}

reranked_index and document_text are optional.
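Because the three rerank paths each yield a parallel list, the parsed results can be thought of as those lists paired by position. The sketch below assumes the lists were already extracted (values copied from the response above); zipRerank and RankedDoc are hypothetical names, and the sketch simply leaves index or text unset when the optional reranked_index or document_text paths are not configured, rather than guessing what the service does in that case.

```typescript
// Hypothetical shape of one parsed rerank result.
interface RankedDoc {
  relevanceScore: number;
  index?: number; // only set when reranked_index is configured
  text?: string; // only set when document_text is configured
}

// Pairs the parallel lists produced by the json_parser paths by position.
function zipRerank(scores: number[], indices?: number[], texts?: string[]): RankedDoc[] {
  if (indices !== undefined && indices.length !== scores.length) {
    throw new Error("reranked_index and relevance_score lists differ in length");
  }
  if (texts !== undefined && texts.length !== scores.length) {
    throw new Error("document_text and relevance_score lists differ in length");
  }
  return scores.map((relevanceScore, i) => {
    const doc: RankedDoc = { relevanceScore };
    if (indices !== undefined) {
      doc.index = indices[i];
    }
    if (texts !== undefined) {
      doc.text = texts[i];
    }
    return doc;
  });
}

// Values copied from the example response above.
const ranked = zipRerank([0.999071, 0.7867867, 0.32713068], [3, 4, 0], ["abc", "123", "super"]);
console.log(JSON.stringify(ranked[0])); // {"relevanceScore":0.999071,"index":3,"text":"abc"}
```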

Completion

For a response that looks like:

{
    "id": "chatcmpl-B9MBs8CjcvOU2jLn4n570S5qMJKcT",
    "object": "chat.completion",
    "created": 1741569952,
    "model": "gpt-4.1-2025-04-14",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! How can I assist you today?",
                "refusal": null,
                "annotations": []
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ]
}

We'd need this definition:

"response": { "json_parser": { "completion_result":"$.choices[*].message.content" } }

Sparse embedding

For a response that looks like:

{
    "request_id": "75C50B5B-E79E-4930-****-F48DBB392231",
    "latency": 22,
    "usage": {
        "token_count": 11
    },
    "result": {
        "sparse_embeddings": [
            {
                "index": 0,
                "embedding": [
                    {
                        "token_id": 6,
                        "weight": 0.101
                    },
                    {
                        "token_id": 163040,
                        "weight": 0.28417
                    }
                ]
            }
        ]
    }
}

We'd need this definition:

"response": {
    "json_parser": {
        "token_path": "$.result.sparse_embeddings[*].embedding[*].token_id",
        "weight_path": "$.result.sparse_embeddings[*].embedding[*].weight"
    }
}

If the value that token_path resolves to (token_id in this example) is not a string (here it is an integer), it is converted to a string using Java's .toString() method. Not sure how we want to articulate that, though 🤔
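A sketch of how the token_path and weight_path results might be combined, including the string conversion of non-string token ids. pairSparse and WeightedToken are hypothetical names; the nested input lists assume each [*] wildcard produces one inner list per sparse embedding, and TypeScript's String() only approximates Java's .toString() (the two can disagree for floating-point values, for example Java prints the double 6.0 as "6.0").

```typescript
// Hypothetical shape of one token/weight pair after parsing.
interface WeightedToken {
  token: string;
  weight: number;
}

// tokens and weights hold one inner list per sparse embedding; assumption: that
// is what the nested [*] wildcards in token_path and weight_path yield.
function pairSparse(tokens: Array<Array<string | number>>, weights: number[][]): WeightedToken[][] {
  if (tokens.length !== weights.length) {
    throw new Error("token_path and weight_path yielded different numbers of embeddings");
  }
  return tokens.map((row, i) => {
    const weightRow = weights[i];
    if (row.length !== weightRow.length) {
      throw new Error("embedding " + i + ": token and weight counts differ");
    }
    // String() stands in for Java's .toString(): the integer id 6 becomes "6".
    return row.map((token, j) => ({ token: String(token), weight: weightRow[j] }));
  });
}

// Values copied from the example response above.
const sparse = pairSparse([[6, 163040]], [[0.101, 0.28417]]);
console.log(JSON.stringify(sparse));
// [[{"token":"6","weight":0.101},{"token":"163040","weight":0.28417}]]
```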


github-actions · 2025-07-09T14:38:23Z

Following you can find the validation changes against the target branch for the APIs.

No changes detected.

You can validate these APIs yourself by using the make validate target.


jonathan-buttner · 2025-07-09T20:35:38Z

Here are some examples:

OpenAI Text Embedding

PUT _inference/text_embedding/test
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<api key>"
        },
        "url": "https://api.openai.com/v1/embeddings",
        "headers": {
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
        },
        "request": "{\"input\": ${input}, \"model\": \"text-embedding-3-small\"}",
        "response": {
            "json_parser": {
                "text_embeddings": "$.data[*].embedding[*]"
            }
        }
    }
}

Cohere APIv2 Rerank

PUT _inference/rerank/test-rerank
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<api key>"
        },
        "url": "https://api.cohere.com/v2/rerank",
        "headers": {
            "Authorization": "bearer ${api_key}",
            "Content-Type": "application/json"
        },
        "request": "{\"documents\": ${input}, \"query\": ${query}, \"model\": \"rerank-v3.5\"}",
        "response": {
            "json_parser": {
                "reranked_index":"$.results[*].index",
                "relevance_score":"$.results[*].relevance_score"
            }
        }
    }
}

Cohere APIv2 Text Embedding

PUT _inference/text_embedding/test-text-embedding
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<api key>"
        },
        "url": "https://api.cohere.com/v2/embed",
        "headers": {
            "Authorization": "bearer ${api_key}",
            "Content-Type": "application/json"
        },
        "request": "{\"texts\": ${input}, \"model\": \"embed-v4.0\", \"input_type\": ${input_type}}",
        "response": {
            "json_parser": {
                "text_embeddings":"$.embeddings.float[*]"
            }
        },
        "input_type": {
            "translation": {
                "ingest": "search_document",
                "search": "search_query"
            },
            "default": "search_document"
        }
    }
}

Jina AI Rerank

PUT _inference/rerank/jina
{
  "service": "custom",
  "service_settings": {
    "secret_parameters": {
      "api_key": "<api key>"
    },    
    "url": "https://api.jina.ai/v1/rerank",
    "headers": {
      "Content-Type": "application/json",
      "Authorization": "Bearer ${api_key}"
    },
    "request": "{\"model\": \"jina-reranker-v2-base-multilingual\",\"query\": ${query},\"documents\":${input}}",
    "response": {
      "json_parser": {
        "relevance_score": "$.results[*].relevance_score",
        "reranked_index": "$.results[*].index"
      }
    }
  }
}

Hugging Face Text Embedding for model Qwen/Qwen3-Embedding-8B (others will be very similar)

PUT _inference/text_embedding/test-text-embedding
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<api key>"
        },
        "url": "<dedicated inference endpoint on HF>/v1/embeddings",
        "headers": {
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json"
        },
        "request": "{\"input\": ${input}}",
        "response": {
            "json_parser": {
                "text_embeddings":"$.data[*].embedding[*]"
            }
        }
    }
}

TODO

VoyageAI
Hugging Face Rerank
Google VertexAI
Azure

jonathan-buttner

Great work! We'll want to add a blurb about how the custom service performs template replacement.

Template replacement lets templates (portions of a string that start with ${ and end with }) be replaced with the value defined for that key.

We look up template keys in secret_parameters and task_settings.

We replace templates in the fields request, headers, url, and query_parameters.

If we fail to find the definition (key) for a template, we emit an error.

For example, given an endpoint definition like this:

PUT _inference/text_embedding/test-text-embedding
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<some api key>"
        },
        "url": "...endpoints.huggingface.cloud/v1/embeddings",
        "headers": {
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json"
        },
        "request": "{\"input\": ${input}}",
        "response": {
            "json_parser": {
                "text_embeddings":"$.data[*].embedding[*]"
            }
        }
    }
}

We'll look up the value for ${api_key} in secret_parameters and task_settings. We should also note explicitly that templates should not be surrounded by quotes (we add the quotes internally).

There are a few "special" templates:

${input}: the array of input strings taken from the input field of subsequent inference requests
${input_type}: the input type translation values (I explain this below)
${query}: the query field, used specifically for rerank
${top_n}: the top_n field available when performing rerank requests
${return_documents}: the return_documents field available when performing rerank requests
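The replacement rules above can be sketched as follows. replaceTemplates, the lookup order (values for the special templates first, then secret_parameters, then task_settings), and the two encoders are assumptions for illustration: the sketch JSON-encodes values substituted into the request body, which is one reading of "we add the quotes internally", and inserts raw strings into headers. The actual service may encode url and query_parameters differently.

```typescript
// Hypothetical sketch of ${key} template replacement; not the Elasticsearch code.
type TemplateValue = string | number | boolean | string[];
type Params = Record<string, TemplateValue>;

// Replaces every ${key} in text with the first definition found in sources
// (searched in order) and throws if a key has no definition.
function replaceTemplates(text: string, sources: Params[], encode: (value: TemplateValue) => string): string {
  return text.replace(/\$\{([^}]+)\}/g, (_match: string, key: string) => {
    for (const source of sources) {
      if (Object.prototype.hasOwnProperty.call(source, key)) {
        const value = source[key];
        if (value !== undefined) {
          return encode(value);
        }
      }
    }
    throw new Error("no definition found for template ${" + key + "}");
  });
}

const secretParameters: Params = { api_key: "<api key>" };
const taskSettings: Params = {};
// Hypothetical values for the special templates, as if taken from an inference request.
const special: Params = { input: ["first text", "second text"] };

// Request body: JSON-encoding supplies the quotes, so the template itself stays unquoted.
const body = replaceTemplates('{"input": ${input}}', [special, secretParameters, taskSettings], (v) => JSON.stringify(v));
// Header: insert the raw string.
const auth = replaceTemplates("Bearer ${api_key}", [secretParameters, taskSettings], (v) => String(v));

console.log(body); // {"input": ["first text","second text"]}
console.log(auth); // Bearer <api key>
```

Writing "${input}" with surrounding quotes in the request template would, under this sketch, produce doubled quotes, which is why the templates are left unquoted.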


jonathan-buttner · 2025-07-10T20:08:08Z


jonathan-buttner

Looks good, left a few more suggestions

specification/inference/_types/CommonTypes.ts

specification/inference/put_custom/PutCustomRequest.ts

specification/inference/_types/CommonTypes.ts

github-actions · 2025-07-21T15:21:00Z

The backport to 8.19 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-8.19 8.19
# Navigate to the new working tree
cd .worktrees/backport-8.19
# Create a new branch
git switch --create backport-4852-to-8.19
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 af6a0b3be628d5fe85622b01ac9176205e576a8f
# Push it to GitHub
git push --set-upstream origin backport-4852-to-8.19
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-8.19

Then, create a pull request where the base branch is 8.19 and the compare/head branch is backport-4852-to-8.19.

* Adds custom inference service docs.
* Adds response documentation.
* Adds request params docs.
* Fixes code style.
* Fixes data type.
* Adds json_spec.
* Fixes typo.
* Adds doc_id to the table.csv file.
* Makes it prettier.
* Adds examples.
* Format fix.
* Addresses feedback.
* Adds more parameters and explanations.
* Completes json_parser.
* Addresses feedback.
* Format fix.
* Addresses more feedback.

(cherry picked from commit af6a0b3)

Co-authored-by: István Zoltán Szabó

szabosteve added 3 commits July 9, 2025 10:58

Adds custom inference service docs.

65cb119

Adds response documentation.

ac77396

Adds request params docs.

389ce57

szabosteve requested a review from jonathan-buttner July 9, 2025 12:19

szabosteve added the specification, documentation, ml, backport 8.19, and backport 9.1 labels Jul 9, 2025

szabosteve commented Jul 9, 2025

specification/inference/_types/CommonTypes.ts

szabosteve commented Jul 9, 2025

specification/inference/put_custom/PutCustomRequest.ts (outdated)

szabosteve commented Jul 9, 2025

specification/inference/put_custom/PutCustomRequest.ts

szabosteve added 7 commits July 9, 2025 14:34

Merge branch 'main' into szabosteve/infer-put-custom

a9a560e

Fixes code style.

3c4eb3f

Merge branch 'szabosteve/infer-put-custom' of github.com:elastic/elasticsearch-specification into szabosteve/infer-put-custom

3d90b42

Fixes data type.

bc66328

Adds json_spec.

1b3fe33

Fixes typo.

fab50c4

Adds doc_id to the table.csv file.

e185149

szabosteve added 3 commits July 9, 2025 17:47

Merge branch 'main' into szabosteve/infer-put-custom

859713d

Makes it prettier.

8051083

Merge branch 'szabosteve/infer-put-custom' of github.com:elastic/elasticsearch-specification into szabosteve/infer-put-custom

61c6a98

szabosteve added 3 commits July 10, 2025 13:38

Adds examples.

e0963ae

Merge branch 'main' into szabosteve/infer-put-custom

bd8ec2b

Format fix.

d568234

jonathan-buttner reviewed Jul 10, 2025


Addresses feedback.

b4c99e6

szabosteve added 5 commits July 21, 2025 12:28

Adds more parameters and explanations.

9dfe0ef

Completes json_parser.

3961f12

Addresses feedback.

a0baab6

Format fix.

d63c4d7

Merge branch 'main' into szabosteve/infer-put-custom

294a143

szabosteve marked this pull request as ready for review July 21, 2025 12:53

szabosteve requested a review from a team as a code owner July 21, 2025 12:53

jonathan-buttner reviewed Jul 21, 2025

specification/inference/_types/CommonTypes.ts

specification/inference/_types/CommonTypes.ts (outdated)

specification/inference/put_custom/PutCustomRequest.ts (outdated)

jonathan-buttner reviewed Jul 21, 2025

specification/inference/_types/CommonTypes.ts (outdated)

szabosteve added 2 commits July 21, 2025 17:15

Addresses more feedback.

52c19aa

Merge branch 'main' into szabosteve/infer-put-custom

f45f46f

szabosteve requested a review from jonathan-buttner July 21, 2025 15:15

jonathan-buttner approved these changes Jul 21, 2025


szabosteve merged commit af6a0b3 into main Jul 21, 2025
8 checks passed

szabosteve deleted the szabosteve/infer-put-custom branch July 21, 2025 15:19

github-actions bot mentioned this pull request Jul 21, 2025

[Backport 9.1] Adds custom inference service API docs #4994

Merged

szabosteve mentioned this pull request Jul 21, 2025

[8.19]Adds custom inference service API docs (#4852) #4997

Merged

pquentin mentioned this pull request Jul 22, 2025

Add inference.put_custom rest-api-spec elastic/elasticsearch#131660

Merged
