[9.1] Adds new 'none' and 'recursive' chunking strategies to Inference APIs #4841

Merged
merged 5 commits into from
Jul 22, 2025

kosabogi
Contributor

@kosabogi kosabogi commented Jul 8, 2025

This PR adds the new 'none' and 'recursive' chunking strategies to Inference APIs.
Related issue: https://github.com/elastic/developer-docs-team/issues/308

Contributor

github-actions bot commented Jul 8, 2025

Below are the validation changes against the target branch for the APIs.

No changes detected.

You can validate these APIs yourself by using the make validate target.

@kosabogi kosabogi requested a review from leemthompo July 8, 2025 11:17
@kosabogi kosabogi changed the title Adds new 'none' and 'recursive' chunking strategies to Inference APIs [9.1] Adds new 'none' and 'recursive' chunking strategies to Inference APIs Jul 8, 2025
Contributor

@leemthompo leemthompo left a comment

LGTM

Not sure if we could use an @ext_doc_id here? We do now have the ability to customize the link text so it doesn't just read External documentation.

@kosabogi kosabogi marked this pull request as draft July 11, 2025 08:43
@kosabogi
Contributor Author

LGTM

Not sure if we could use an @ext_doc_id here? We do now have the ability to customize the link text so it doesn't just read External documentation.

Yes, I just learned that we can add external doc links for parameters too. It's on my list to update this PR along with other chunking reference updates. Thanks so much for your review!

@kosabogi kosabogi marked this pull request as ready for review July 14, 2025 07:39
@@ -85,6 +85,7 @@ ccr-put-follow,https://www.elastic.co/docs/api/doc/elasticsearch/operation/opera
ccr-resume-auto-follow-pattern,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ccr-resume-auto-follow-pattern,https://www.elastic.co/guide/en/elasticsearch/reference/8.18/ccr-resume-auto-follow-pattern.html,
ccs-network-delays,https://www.elastic.co/docs/solutions/search/cross-cluster-search#ccs-network-delays,,
ccs-privileges,https://www.elastic.co/docs/deploy-manage/remote-clusters/remote-clusters-cert#remote-clusters-privileges-ccs,,
chunking-strategies,https://www.elastic.co/docs/explore-analyze/elastic-inference/inference-api#chunking-strategies,
Contributor

I think we need to add a Description field here to avoid the default link text?

Chunking strategies probably works

Sorry if you've already planned to do this :)

Contributor Author

@kosabogi kosabogi Jul 16, 2025

I tried this a few times, but it didn’t work, unfortunately.

I added a description to the table.csv file like this:

chunking-strategies,https://www.elastic.co/docs/explore-analyze/elastic-inference/inference-api#chunking-strategies,https://www.elastic.co/guide/en/kibana/8.18/inference-endpoints.html,Chunking strategies documentation
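
For reference, this is how that row reads field by field. It is only a sketch: the column names (`doc_id`, `url`, `previous_url`, `description`) are assumptions inferred from this thread, not names taken from the spec tooling, and `parse_row` is a hypothetical helper.

```python
import csv
import io

# Assumed column meanings, inferred from this thread (not from the tooling).
FIELDS = ("doc_id", "url", "previous_url", "description")


def parse_row(line: str) -> dict:
    """Split one table.csv row and pad missing trailing columns with ''."""
    values = next(csv.reader(io.StringIO(line)))
    values += [""] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


# The row as first added in the diff: no previous URL, no description.
old_row = parse_row(
    "chunking-strategies,"
    "https://www.elastic.co/docs/explore-analyze/elastic-inference/inference-api#chunking-strategies,"
)

# The row with a previous URL and a description added.
new_row = parse_row(
    "chunking-strategies,"
    "https://www.elastic.co/docs/explore-analyze/elastic-inference/inference-api#chunking-strategies,"
    "https://www.elastic.co/guide/en/kibana/8.18/inference-endpoints.html,"
    "Chunking strategies documentation"
)

for name in FIELDS:
    print(f"{name}: {old_row[name]!r} -> {new_row[name]!r}")
```

If the tooling reads the columns this way, the fourth field is the one expected to become the link text.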

However, I didn’t see any results in the elasticsearch-openapi.json file, neither the previous URL nor the link description:

"strategy": {
  "externalDocs": {
    "url": "https://www.elastic.co/docs/explore-analyze/elastic-inference/inference-api#chunking-strategies"
  },
  "description": "The chunking strategy: `sentence`, `word`, `none` or `recursive`.\n\n * If `strategy` is set to `recursive`, you must also specify:\n\n- `max_chunk_size`\n- either `separators` or `separator_group`\n\nLearn more about different chunking strategies in the External documentation.",
  "default": "sentence",
  "type": "string"
} 
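
One way to confirm what the generated file contains is to inspect the property directly. The sketch below assumes custom link text would surface as `externalDocs.description`, the optional field the OpenAPI specification defines alongside `url`. `external_docs_status` is a hypothetical helper, and the fragment is abbreviated from the output above.

```python
import json

# Abbreviated copy of the generated property shown above.
FRAGMENT = """
{
  "strategy": {
    "externalDocs": {
      "url": "https://www.elastic.co/docs/explore-analyze/elastic-inference/inference-api#chunking-strategies"
    },
    "default": "sentence",
    "type": "string"
  }
}
"""


def external_docs_status(prop: dict) -> str:
    """Report whether a property's externalDocs carries custom link text."""
    docs = prop.get("externalDocs")
    if not docs:
        return "no externalDocs"
    if docs.get("description"):
        return "custom link text: " + docs["description"]
    return "url only, default link text"


strategy = json.loads(FRAGMENT)["strategy"]
status = external_docs_status(strategy)
print(status)
```

Running the same check against the real `elasticsearch-openapi.json` after regenerating it would show whether the description from `table.csv` reached the parameter at all.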

To understand what’s going on, I tested the exact same case that @lcawl referenced here:
#4772 (comment)

It worked perfectly for me too. The OpenAPI file was updated exactly as Lisa described in her comment.

My only guess is that the example Lisa tested wasn't for a parameter-level externalDocs link, but rather for a general API description, which might be treated differently by the tooling?
Could that explain the difference? Any ideas? Or am I missing something here?

Contributor

My only guess is that the example Lisa tested wasn't for a parameter-level externalDocs link, but rather for a general API description, which might be treated differently by the tooling?
Could that explain the difference? Any ideas? Or am I missing something here?

That might be it, I think you can only have one ext-id per API too IIRC, don't know if there's a previous one defined for this API.

There are as many exceptions as there are rules, I fear 😠

Contributor Author

That's my understanding too: you can have only one ext-id. But here the ext-id seems to work; what doesn't work is the unique link text 🤔
@lcawl, do you happen to have any idea why this might not be working?

@kosabogi kosabogi merged commit ba5cf21 into main Jul 22, 2025
8 checks passed
@kosabogi kosabogi deleted the chunking-changes branch July 22, 2025 06:28
github-actions bot pushed a commit that referenced this pull request Jul 22, 2025
…e APIs (#4841)

* Adds new chunking strategies

* Adds external link

* Adds a sentence to point to the link

* Resolves merge conflict

* Update specification/inference/_types/Services.ts

Co-authored-by: Liam Thompson <[email protected]>

---------

Co-authored-by: Liam Thompson <[email protected]>
(cherry picked from commit ba5cf21)