Tool calls #2062
Jofthomas wants to merge 21 commits into `main` from `tool_calls`.
Commits (21, all by Jofthomas):

- 0c18809 Add files via upload
- 9603eda Add files via upload
- 24d127c Update Function_Calling.md
- 929749c Add files via upload
- 7f0cc40 Delete assets/Function_call/Thumbnail.png
- 8a5ccec Edit thumbnail path
- 38e4af5 Update and rename Function_Calling.md to tool_calling.md
- 77688b4 Delete assets/Function_call directory
- 225593b Update tool_calling.md
- 1066532 Add files via upload
- 99fc14c Update _blog.yml
- a9f2340 Apply suggestions from omar's review
- 617fb45 Merge branch 'main' into tool_calls
- 830cf96 update aside tag that is not working
- 5333bf9 remove unused imports
- 9a2b194 Delete assets/tool_calling/thumbnail.png
- 6bab604 Add files via upload
- 3d74a79 Delete assets/tool_calling/thumbnail.png
- 8579e8a modify the thumbnail to "tool calling"
- 538e160 Linda's recommandations
- c79532d add image
---
title: "Tool calling with Hugging Face"
thumbnail: /blog/assets/tool_calling/thumbnail.png
authors:
- user: jofthomas
- user: drbh
- user: kkondratenko
  guest: true
---
# Tool Calling in Hugging Face is here!
## Introduction
A few weeks ago, we introduced the brand-new [Messages API](https://huggingface.co/blog/tgi-messages-api), which provides OpenAI compatibility for Text Generation Inference (TGI) and Inference Endpoints.
At the time, we wrote that “*the Messages API does not currently support function calling*”. This limitation has now been lifted!
Starting with version **1.4.5**, TGI offers an API compatible with the OpenAI Chat Completion API, with the addition of the `tools` and `tool_choice` keys. This change has been propagated to **`huggingface_hub`** version **0.23.0**, meaning any Hugging Face endpoint can now call tools when running a recent enough version.
This new feature is available in Inference Endpoints (dedicated and serverless), and we’ll show how you can start building your open-source agents right away.
To get you started quickly, we’ve included detailed code examples of how to:

- Create an Inference Endpoint
- Call tools with the `InferenceClient`
- Use OpenAI’s SDK
- Integrate with LangChain and LlamaIndex
## Create an Inference Endpoint using `huggingface_hub`
[Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index) offers a secure, production-ready solution to easily deploy any Transformers model from the Hub on dedicated infrastructure managed by Hugging Face.
To showcase this newfound power of TGI, we will deploy an 8B instruction-tuned model:
[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
We can deploy the model in just [a few clicks from the UI](https://ui.endpoints.huggingface.co/new?vendor=aws&repository=NousResearch%2FNous-Hermes-2-Mixtral-8x7B-DPO&tgi_max_total_tokens=32000&tgi=true&tgi_max_input_length=1024&task=text-generation&instance_size=2xlarge&tgi_max_batch_prefill_tokens=2048&tgi_max_batch_total_tokens=1024000&no_suggested_compute=true&accelerator=gpu&region=us-east-1), or take advantage of the `huggingface_hub` Python library to programmatically create and manage Inference Endpoints. We demonstrate the use of the Hub library below.
First, we need to specify the endpoint name and model repository, along with the task of text-generation. A protected Inference Endpoint means a valid HF token is required to access the deployed API. We also need to configure the hardware requirements like vendor, region, accelerator, instance type, and size. You can check out the list of available resource options [here](https://api.endpoints.huggingface.cloud/#get-/v2/provider) and view recommended configurations for select models in our catalog [here](https://ui.endpoints.huggingface.co/catalog).
```python
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "llama-3-8b-function-calling",
    repository="meta-llama/Meta-Llama-3-8B-Instruct",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_type="nvidia-a10g",
    instance_size="x1",
    custom_image={
        "health_route": "/health",
        "env": {
            "MAX_INPUT_LENGTH": "3500",
            "MAX_BATCH_PREFILL_TOKENS": "3500",
            "MAX_TOTAL_TOKENS": "4096",
            "MAX_BATCH_TOTAL_TOKENS": "4096",
            "HUGGING_FACE_HUB_TOKEN": "<HF_TOKEN>",
            "MODEL_ID": "/repository",
        },
        "url": "ghcr.io/huggingface/text-generation-inference:latest",  # use this build or newer
    },
)

endpoint.wait()
print(endpoint.status)
```
Since the model is gated, it is important to replace `<HF_TOKEN>` with your own Hugging Face token, once you have accepted the terms and conditions of Llama-3-8B-Instruct on the [model page](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
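If you’re running this locally, a simple way to authenticate is to log in once with the Hub client library before creating the endpoint (a minimal sketch; `login` is part of `huggingface_hub`):

```python
from huggingface_hub import login

# Stores your token locally so subsequent Hub calls,
# including create_inference_endpoint, are authenticated.
login(token="<HF_TOKEN>")  # replace with your own Hugging Face token
```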
It will take a few minutes for our deployment to spin up. We can use the `.wait()` utility to block the running thread until the endpoint reaches its final "running" state. Once running, we can confirm its status and take it for a spin via the UI Playground:
[ENDPOINT IMAGE] | ||
Great, we now have a working deployment!
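As a quick smoke test, you can also query the deployed model from Python; `endpoint.client` returns an `InferenceClient` already pointed at the endpoint URL (a minimal sketch):

```python
# endpoint.client is an InferenceClient bound to the endpoint's URL.
output = endpoint.client.text_generation(
    "What is tool calling?",
    max_new_tokens=64,
)
print(output)
```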
<aside>
💡 By default, your endpoint will scale to zero after 15 minutes without any requests, to optimize cost during periods of inactivity. Check out [the Hub Python Library documentation](https://huggingface.co/docs/huggingface_hub/guides/inference_endpoints) to see all the functionality available for managing your endpoint lifecycle.
</aside>
## Using Inference Endpoints via OpenAI client libraries
The added support for messages in TGI makes Inference Endpoints directly compatible with the OpenAI Chat Completion API. This means that any existing scripts that use OpenAI models via the OpenAI client libraries can be directly swapped out to use any open LLM running on a TGI endpoint!
With this seamless transition, you can immediately take advantage of the numerous benefits offered by open models:

- Complete control and transparency over models and data
- No more worrying about rate limits
- The ability to fully customize systems according to your specific needs
Let’s see how.
### With the InferenceClient from Hugging Face
Tools can be called directly through the serverless API, or with any Inference Endpoint by passing the endpoint URL.
```python
from huggingface_hub import InferenceClient

# Ask for the weather using tools.
# Point the client at a dedicated endpoint...
# client = InferenceClient("<ENDPOINT_URL>")
# ...or at the serverless API:
client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")
messages = [
    {"role": "system", "content": "Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous."},
    {"role": "user", "content": "What's the weather like in Paris, France?"},
]
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the user's location.",
                    },
                },
                "required": ["location", "format"],
            },
        },
    },
]
response = client.chat_completion(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    messages=messages,
    tools=tools,
    tool_choice="auto",
    max_tokens=500,
)
response.choices[0].message.tool_calls[0].function
```
```python
ChatCompletionOutputFunctionDefinition(arguments={'format': 'celsius', 'location': 'Paris, France'}, name='get_current_weather', description=None)
```
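From here, actually running the tool is up to you: look up the function the model picked and call it with the returned arguments. Below is a minimal sketch, where `get_current_weather` is a hypothetical local implementation:

```python
import json

def get_current_weather(location: str, format: str) -> str:
    # Hypothetical implementation; in practice you would query a weather API.
    return f"22 degrees {format} in {location}"

tool_call = response.choices[0].message.tool_calls[0]
args = tool_call.function.arguments
# Depending on the version, arguments may come back as a dict or a JSON string.
if isinstance(args, str):
    args = json.loads(args)

available_tools = {"get_current_weather": get_current_weather}
print(available_tools[tool_call.function.name](**args))
# e.g. "22 degrees celsius in Paris, France"
```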
### With the OpenAI Python client
The example below shows how to make this transition using the [OpenAI Python Library](https://github.com/openai/openai-python). Simply replace `<ENDPOINT_URL>` with your endpoint URL (be sure to include the `/v1/` suffix) and populate the `<HF_API_TOKEN>` field with a valid Hugging Face user token.
We can then use the client as usual, passing a list of messages to stream responses from our Inference Endpoint.
```python
from openai import OpenAI

# initialize the client but point it to TGI
client = OpenAI(
    base_url="<ENDPOINT_URL>" + "/v1/",  # replace with your endpoint url
    api_key="<HF_API_TOKEN>",  # replace with your token
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the user's location.",
                    },
                },
                "required": ["location", "format"],
            },
        },
    }
]
chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {
            "role": "user",
            "content": "What's the weather like in Celsius in San Francisco, CA?",
        },
    ],
    tools=tools,
    tool_choice="auto",  # let the model decide whether (and which tool) to call
    max_tokens=500,
)

called = chat_completion.choices[0]
print(called)
```
```python
Choice(finish_reason='eos_token', index=0, logprobs=None, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id=0, function=Function(arguments={'format': 'celsius', 'location': 'San Francisco, CA'}, name='get_current_weather', description=None), type='function')]))
```
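To close the loop, the usual OpenAI-style pattern is to append the assistant’s tool call plus a `tool` message carrying your function’s result, then ask the model for a final answer. The sketch below assumes this flow; the tool result is hypothetical, and whether a given model’s chat template accepts `tool` role messages depends on the model and TGI version:

```python
import json

tool_call = chat_completion.choices[0].message.tool_calls[0]

# Hypothetical tool result; normally produced by your own weather function.
tool_result = json.dumps({"location": "San Francisco, CA", "temperature": 22, "format": "celsius"})

followup = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "user", "content": "What's the weather like in Celsius in San Francisco, CA?"},
        # Echo the model's tool call, then supply the tool's output.
        {"role": "assistant", "tool_calls": [tool_call.model_dump()]},
        {"role": "tool", "tool_call_id": str(tool_call.id), "content": tool_result},
    ],
    max_tokens=500,
)
print(followup.choices[0].message.content)
```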
Behind the scenes, TGI’s Messages API automatically converts the list of messages into the model’s required instruction format using its [chat template](https://huggingface.co/docs/transformers/chat_templating). You can learn more about chat templates in the [documentation](https://huggingface.co/docs/transformers/main/en/chat_templating) or in this [Space](https://huggingface.co/spaces/Jofthomas/Chat_template_viewer)!
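For instance, here is a sketch of how you could inspect the raw prompt string a chat template produces, using `transformers`:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What's the weather like in Paris, France?"}],
    tokenize=False,
    add_generation_prompt=True,
)
# The instruction-formatted string that TGI ultimately feeds to the model.
print(prompt)
```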
<aside>
💡 Be mindful that with `tool_choice` set to `auto`, the model will currently always call one of the provided functions.
</aside>
## How to use with LangChain
Now, let’s see how to use functions in the newly created `langchain_huggingface` package.
```python
from langchain_core.pydantic_v1 import BaseModel, Field

from langchain_huggingface.llms import HuggingFaceEndpoint
from langchain_huggingface.chat_models.huggingface import ChatHuggingFace

llm = HuggingFaceEndpoint(
    endpoint_url="https://aac2dhzj35gskpof.us-east-1.aws.endpoints.huggingface.cloud",
    task="text-generation",
    max_new_tokens=1024,
    do_sample=False,
    repetition_penalty=1.03,
)
llm_engine_hf = ChatHuggingFace(llm=llm)

class calculator(BaseModel):
    """Multiply two integers together."""

    a: int = Field(..., description="First integer")
    b: int = Field(..., description="Second integer")

# Bind the tool to the chat model; "auto" lets the model decide when to call it.
llm_with_multiply = llm_engine_hf.bind_tools([calculator], tool_choice="auto")
tool_chain = llm_with_multiply
tool_chain.invoke("what's 3 * 12")
```
```python
AIMessage(content='', additional_kwargs={'tool_calls': [ChatCompletionOutputToolCall(function=ChatCompletionOutputFunctionDefinition(arguments={'a': 3, 'b': 12}, name='calculator', description=None), id=0, type='function')]}, response_metadata={'token_usage': ChatCompletionOutputUsage(completion_tokens=23, prompt_tokens=154, total_tokens=177), 'model': '', 'finish_reason': 'eos_token'}, id='run-cb823ae4-665e-4c88-b1c6-e69ae5cbbc74-0', tool_calls=[{'name': 'calculator', 'args': {'a': 3, 'b': 12}, 'id': 0}])
```
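If you’d rather work with plain dictionaries than `AIMessage` objects, you can pipe the bound model into LangChain’s `JsonOutputToolsParser` (a minimal sketch):

```python
from langchain_core.output_parsers.openai_tools import JsonOutputToolsParser

# The parser extracts the tool calls as a list of {"type", "args"} dicts.
parsed_chain = llm_with_multiply | JsonOutputToolsParser()
parsed_chain.invoke("what's 3 * 12")
# [{'type': 'calculator', 'args': {'a': 3, 'b': 12}}]
```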
## How to use with LlamaIndex
Similarly, you can also use tools with TGI endpoints in [LlamaIndex](https://www.llamaindex.ai/), though not yet with the serverless API.
```python
from typing import Literal

from llama_index.core.bridge.pydantic import BaseModel, Field
from llama_index.core.tools import FunctionTool
from llama_index.core.base.llms.types import (
    ChatMessage,
    MessageRole,
)
from llama_index.llms.huggingface import (
    TextGenerationInference,
)

URL = "your_tgi_endpoint"
model = TextGenerationInference(
    model_url=URL, token=False
)  # set token to False in case of public endpoint

def get_current_weather(location: str, format: str):
    """Get the current weather

    Args:
        location (str): The city and state, e.g. San Francisco, CA
        format (str): The temperature unit to use ('celsius' or 'fahrenheit'). Infer this from the user's location.
    """
    ...

class WeatherArgs(BaseModel):
    location: str = Field(
        description="The city and region, e.g. Paris, Ile-de-France"
    )
    format: Literal["fahrenheit", "celsius"] = Field(
        description="The temperature unit to use ('fahrenheit' or 'celsius'). Infer this from the location.",
    )

weather_tool = FunctionTool.from_defaults(
    fn=get_current_weather,
    name="get_current_weather",
    description="Get the current weather",
    fn_schema=WeatherArgs,
)

def get_current_weather_n_days(location: str, format: str, num_days: int):
    """Get the weather forecast for the next N days

    Args:
        location (str): The city and state, e.g. San Francisco, CA
        format (str): The temperature unit to use ('celsius' or 'fahrenheit'). Infer this from the user's location.
        num_days (int): The number of days for the weather forecast.
    """
    ...

class ForecastArgs(BaseModel):
    location: str = Field(
        description="The city and region, e.g. Paris, Ile-de-France"
    )
    format: Literal["fahrenheit", "celsius"] = Field(
        description="The temperature unit to use ('fahrenheit' or 'celsius'). Infer this from the location.",
    )
    num_days: int = Field(
        description="The duration for the weather forecast in days.",
    )

forecast_tool = FunctionTool.from_defaults(
    fn=get_current_weather_n_days,
    name="get_current_weather_n_days",
    description="Get the current weather for n days",
    fn_schema=ForecastArgs,
)

usr_msg = ChatMessage(
    role=MessageRole.USER,
    content="What's the weather like in Paris over next week?",
)

response = model.chat_with_tools(
    user_msg=usr_msg,
    tools=[
        weather_tool,
        forecast_tool,
    ],
    tool_choice="get_current_weather_n_days",
)

print(response.message.additional_kwargs)
```
```python
{'tool_calls': [{'id': 0, 'type': 'function', 'function': {'description': None, 'name': 'get_current_weather_n_days', 'arguments': {'format': 'celsius', 'location': 'Paris, Ile-de-France', 'num_days': 7}}}]}
```
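From there, dispatching to the matching `FunctionTool` is a dictionary lookup. Here is a minimal sketch (note the stub functions above would need real bodies for the call to return anything useful):

```python
# Map tool names to FunctionTool instances and invoke the one the model chose.
tool_call = response.message.additional_kwargs["tool_calls"][0]["function"]
tools_by_name = {tool.metadata.name: tool for tool in (weather_tool, forecast_tool)}
output = tools_by_name[tool_call["name"]](**tool_call["arguments"])
```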
## Clean up
To clean up our work, we can either pause or delete the model endpoint. This step can alternatively be completed via the UI.
```python
# pause our running endpoint
endpoint.pause()

# optionally, delete it entirely
endpoint.delete()
```
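If you come back later, you don’t need to recreate the endpoint; a paused endpoint can be fetched by name and resumed (a sketch using `get_inference_endpoint` from `huggingface_hub`):

```python
from huggingface_hub import get_inference_endpoint

# Re-attach to the paused endpoint by name and bring it back up.
endpoint = get_inference_endpoint("llama-3-8b-function-calling")
endpoint.resume()
endpoint.wait()
print(endpoint.status)
```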
## Conclusion
Now that you can call tools with Hugging Face models across different frameworks, we strongly encourage you to deploy (and possibly fine-tune) your own models on an Inference Endpoint and experiment with this new feature. We are convinced that the ability of small LLMs to call tools will be very beneficial to the community. We can’t wait to see what use cases you will power with open LLMs and tools!