Commit 68ebcbe

Update inference API specification to include new Llama Service (#5020)
* Update inference API specification to include new Llama Service
* Fix typos
* Fixed Typo
* Update json outputs
* Update specification
* Update llama specification
1 parent 9492411 commit 68ebcbe

15 files changed: +1067 -54 lines changed

output/openapi/elasticsearch-openapi.json

Lines changed: 187 additions & 3 deletions
Generated file; diff not rendered by default.

output/openapi/elasticsearch-serverless-openapi.json

Lines changed: 187 additions & 3 deletions
Generated file; diff not rendered by default.

output/schema/schema.json

Lines changed: 413 additions & 48 deletions
Generated file; diff not rendered by default.

output/typescript/types.ts

Lines changed: 34 additions & 0 deletions
Generated file; diff not rendered by default.

specification/_doc_ids/table.csv

Lines changed: 2 additions & 0 deletions
@@ -374,6 +374,7 @@ inference-api-put-googleaistudio,https://www.elastic.co/docs/api/doc/elasticsear
 inference-api-put-googlevertexai,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-googlevertexai,https://www.elastic.co/guide/en/elasticsearch/reference/8.18/infer-service-google-vertex-ai.html,
 inference-api-put-huggingface,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-hugging-face,https://www.elastic.co/guide/en/elasticsearch/reference/8.18/infer-service-hugging-face.html,
 inference-api-put-jinaai,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-jinaai,,
+inference-api-put-llama,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-llama,,
 inference-api-put-mistral,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-mistral,https://www.elastic.co/guide/en/elasticsearch/reference/8.18/infer-service-mistral.html,
 inference-api-put-openai,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-openai,https://www.elastic.co/guide/en/elasticsearch/reference/8.18/infer-service-openai.html,
 inference-api-put-voyageai,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-voyageai,,
@@ -403,6 +404,7 @@ knn-inner-hits,https://www.elastic.co/docs/solutions/search/vector/knn#nested-kn
 license-management,https://www.elastic.co/docs/deploy-manage/license/manage-your-license-in-self-managed-cluster,,
 list-analytics-collection,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search-application-get-behavioral-analytics,https://www.elastic.co/guide/en/elasticsearch/reference/8.18/list-analytics-collection.html,
 list-synonyms-sets,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-synonyms-get-synonyms-sets,https://www.elastic.co/guide/en/elasticsearch/reference/8.18/list-synonyms-sets.html,
+llama-api-models,https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html/,,
 logstash-api-delete-pipeline,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-logstash-delete-pipeline,https://www.elastic.co/guide/en/elasticsearch/reference/8.18/logstash-api-delete-pipeline.html,
 logstash-api-get-pipeline,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-logstash-get-pipeline,https://www.elastic.co/guide/en/elasticsearch/reference/8.18/logstash-api-get-pipeline.html,
 logstash-api-put-pipeline,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-logstash-put-pipeline,https://www.elastic.co/guide/en/elasticsearch/reference/8.18/logstash-api-put-pipeline.html,
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+{
+  "inference.put_llama": {
+    "documentation": {
+      "url": "https://www.elastic.co/guide/en/elasticsearch/reference/current/infer-service-llama.html",
+      "description": "Configure a Llama inference endpoint"
+    },
+    "stability": "stable",
+    "visibility": "public",
+    "headers": {
+      "accept": ["application/json"],
+      "content_type": ["application/json"]
+    },
+    "url": {
+      "paths": [
+        {
+          "path": "/_inference/{task_type}/{llama_inference_id}",
+          "methods": ["PUT"],
+          "parts": {
+            "task_type": {
+              "type": "string",
+              "description": "The task type"
+            },
+            "llama_inference_id": {
+              "type": "string",
+              "description": "The inference ID"
+            }
+          }
+        }
+      ]
+    },
+    "body": {
+      "description": "The inference endpoint's task and service settings"
+    }
+  }
+}
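
Reviewer note: to make the path template in the spec above concrete, here is a minimal TypeScript sketch that fills in `/_inference/{task_type}/{llama_inference_id}` and pairs it with a JSON body. Only the path template, the `PUT` method, and the `application/json` headers come from the spec. The helper name `buildPutLlamaRequest`, the body layout (`service` plus `service_settings`), the example host and port, and the inference ID are assumptions made for illustration.

```typescript
// Task types listed for the Llama service in this commit.
type LlamaTaskType = 'text_embedding' | 'completion' | 'chat_completion'

interface PutLlamaRequest {
  method: 'PUT'
  path: string
  headers: Record<string, string>
  body: string
}

// Hypothetical helper (not part of the specification): fills the path
// template "/_inference/{task_type}/{llama_inference_id}".
function buildPutLlamaRequest(
  taskType: LlamaTaskType,
  inferenceId: string,
  serviceSettings: { url: string; model_id: string }
): PutLlamaRequest {
  return {
    method: 'PUT',
    path: `/_inference/${encodeURIComponent(taskType)}/${encodeURIComponent(inferenceId)}`,
    headers: { accept: 'application/json', 'content-type': 'application/json' },
    // Assumed body layout: a `service` name plus `service_settings`.
    body: JSON.stringify({ service: 'llama', service_settings: serviceSettings })
  }
}

// Example values are placeholders; the host and port are assumptions.
const request = buildPutLlamaRequest('text_embedding', 'my-llama-embeddings', {
  url: 'http://localhost:8321/v1/inference/embeddings',
  model_id: 'all-MiniLM-L6-v2'
})
```

Percent-encoding the path parts keeps an inference ID with unusual characters from altering the route; whether the server accepts such IDs is not stated in this commit.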

specification/inference/_types/CommonTypes.ts

Lines changed: 48 additions & 0 deletions
@@ -1556,6 +1556,54 @@ export enum JinaAITextEmbeddingTask {
   search
 }
 
+export class LlamaServiceSettings {
+  /**
+   * The URL endpoint of the Llama stack endpoint.
+   * URL must contain:
+   * * For `text_embedding` task - `/v1/inference/embeddings`.
+   * * For `completion` and `chat_completion` tasks - `/v1/openai/v1/chat/completions`.
+   */
+  url: string
+  /**
+   * The name of the model to use for the inference task.
+   * Refer to the Llama downloading models documentation for different ways of getting a list of available models and downloading them.
+   * Service has been tested and confirmed to be working with the following models:
+   * * For `text_embedding` task - `all-MiniLM-L6-v2`.
+   * * For `completion` and `chat_completion` tasks - `llama3.2:3b`.
+   * @ext_doc_id llama-api-models
+   */
+  model_id: string
+  /**
+   * For a `text_embedding` task, the maximum number of tokens per input before chunking occurs.
+   */
+  max_input_tokens?: integer
+  /**
+   * For a `text_embedding` task, the similarity measure. One of cosine, dot_product, l2_norm.
+   */
+  similarity?: LlamaSimilarityType
+  /**
+   * This setting helps to minimize the number of rate limit errors returned from the Llama API.
+   * By default, the `llama` service sets the number of requests allowed per minute to 3000.
+   */
+  rate_limit?: RateLimitSetting
+}
+
+export enum LlamaTaskType {
+  text_embedding,
+  completion,
+  chat_completion
+}
+
+export enum LlamaServiceType {
+  llama
+}
+
+export enum LlamaSimilarityType {
+  cosine,
+  dot_product,
+  l2_norm
+}
+
 export class MistralServiceSettings {
   /**
    * A valid API key of your Mistral account.
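
Reviewer note: the `url` doc comment in `LlamaServiceSettings` ties each task type to a path the URL must contain. A client could check that rule before sending a request, roughly as sketched below. The function name `urlMatchesTask` and the example host are hypothetical, and nothing in this commit says Elasticsearch performs the check this way.

```typescript
type LlamaTaskType = 'text_embedding' | 'completion' | 'chat_completion'

// Path fragments the doc comment says the URL must contain, per task type.
const REQUIRED_URL_FRAGMENT: Record<LlamaTaskType, string> = {
  text_embedding: '/v1/inference/embeddings',
  completion: '/v1/openai/v1/chat/completions',
  chat_completion: '/v1/openai/v1/chat/completions'
}

// Hypothetical client-side check; not part of the specification.
function urlMatchesTask(url: string, taskType: LlamaTaskType): boolean {
  return url.indexOf(REQUIRED_URL_FRAGMENT[taskType]) !== -1
}

// The host and port below are placeholders.
const embeddingsUrl = 'http://localhost:8321/v1/inference/embeddings'
const okForEmbedding = urlMatchesTask(embeddingsUrl, 'text_embedding')
const okForCompletion = urlMatchesTask(embeddingsUrl, 'completion')
```

A substring test mirrors the "must contain" wording of the comment; a stricter check on the parsed URL path would be a design choice beyond what the comment states.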

specification/inference/_types/Services.ts

Lines changed: 13 additions & 0 deletions
@@ -37,6 +37,7 @@ import {
   TaskTypeGoogleVertexAI,
   TaskTypeHuggingFace,
   TaskTypeJinaAi,
+  TaskTypeLlama,
   TaskTypeMistral,
   TaskTypeOpenAI,
   TaskTypeVoyageAI,
@@ -254,6 +255,17 @@ export class InferenceEndpointInfoJinaAi extends InferenceEndpoint {
   task_type: TaskTypeJinaAi
 }
 
+export class InferenceEndpointInfoLlama extends InferenceEndpoint {
+  /**
+   * The inference Id
+   */
+  inference_id: string
+  /**
+   * The task type
+   */
+  task_type: TaskTypeLlama
+}
+
 export class InferenceEndpointInfoMistral extends InferenceEndpoint {
   /**
    * The inference Id
@@ -379,6 +391,7 @@ export class RateLimitSetting {
    * * `googlevertexai` service: `30000`
    * * `hugging_face` service: `3000`
    * * `jinaai` service: `2000`
+   * * `llama` service: `3000`
    * * `mistral` service: `240`
    * * `openai` service and task type `text_embedding`: `3000`
    * * `openai` service and task type `completion`: `500`
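
Reviewer note: the defaults listed in the `RateLimitSetting` comment can be read as a lookup with a per-endpoint override. The sketch below covers only the services whose default does not depend on task type in this hunk. The `requests_per_minute` field name and the helper `effectiveRequestsPerMinute` are assumptions, since the body of `RateLimitSetting` is not shown in this diff.

```typescript
// Default requests-per-minute values quoted in the doc comment above
// (only entries visible in this hunk that do not vary by task type).
const DEFAULT_REQUESTS_PER_MINUTE: Record<string, number> = {
  googlevertexai: 30000,
  hugging_face: 3000,
  jinaai: 2000,
  llama: 3000,
  mistral: 240
}

// Assumption: the setting carries an optional numeric `requests_per_minute`.
interface RateLimitSettingSketch {
  requests_per_minute?: number
}

// Hypothetical helper: an explicit setting wins, otherwise fall back to
// the documented default for the service (undefined if none is listed).
function effectiveRequestsPerMinute(
  service: string,
  rateLimit?: RateLimitSettingSketch
): number | undefined {
  if (rateLimit !== undefined && rateLimit.requests_per_minute !== undefined) {
    return rateLimit.requests_per_minute
  }
  return DEFAULT_REQUESTS_PER_MINUTE[service]
}

const llamaDefault = effectiveRequestsPerMinute('llama')
const llamaOverride = effectiveRequestsPerMinute('llama', { requests_per_minute: 100 })
```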

specification/inference/_types/TaskType.ts

Lines changed: 6 additions & 0 deletions
@@ -118,6 +118,12 @@ export enum TaskTypeHuggingFace {
   text_embedding
 }
 
+export enum TaskTypeLlama {
+  text_embedding,
+  chat_completion,
+  completion
+}
+
 export enum TaskTypeMistral {
   text_embedding,
   chat_completion,

specification/inference/put/PutRequest.ts

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ import { TaskType } from '@inference/_types/TaskType'
  * * Google AI Studio (`completion`, `text_embedding`)
  * * Google Vertex AI (`rerank`, `text_embedding`)
  * * Hugging Face (`chat_completion`, `completion`, `rerank`, `text_embedding`)
+ * * Llama (`chat_completion`, `completion`, `text_embedding`)
  * * Mistral (`chat_completion`, `completion`, `text_embedding`)
  * * OpenAI (`chat_completion`, `completion`, `text_embedding`)
  * * VoyageAI (`text_embedding`, `rerank`)
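
Reviewer note: the provider list in this doc comment amounts to a provider-to-task-type table. A sketch of that table as data, limited to the providers visible in this hunk, might look like the following. The keys are the display names from the comment rather than service identifiers, and `supportsTask` is a hypothetical helper.

```typescript
type TaskTypeName = 'chat_completion' | 'completion' | 'rerank' | 'text_embedding'

// Task types per provider, copied from the doc comment lines in this hunk.
const SUPPORTED_TASKS: Record<string, TaskTypeName[]> = {
  'Google AI Studio': ['completion', 'text_embedding'],
  'Google Vertex AI': ['rerank', 'text_embedding'],
  'Hugging Face': ['chat_completion', 'completion', 'rerank', 'text_embedding'],
  Llama: ['chat_completion', 'completion', 'text_embedding'],
  Mistral: ['chat_completion', 'completion', 'text_embedding'],
  OpenAI: ['chat_completion', 'completion', 'text_embedding'],
  VoyageAI: ['text_embedding', 'rerank']
}

// Hypothetical helper: true when the provider lists the given task type.
function supportsTask(provider: string, task: TaskTypeName): boolean {
  const tasks = SUPPORTED_TASKS[provider]
  return tasks !== undefined && tasks.indexOf(task) !== -1
}

const llamaChat = supportsTask('Llama', 'chat_completion')
const llamaRerank = supportsTask('Llama', 'rerank')
```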
