Description
When using a Pydantic model that contains a list of nested Pydantic models as output_schema in ai.generate(), Genkit fails with a Google Generative AI error stating that the items field is missing, even though the generated JSON schema clearly contains the items field.
Environment
- Genkit version: 0.4.0
- Python version: 3.13.5
- Operating System: macOS
- Model: googleai/gemini-2.5-pro
Steps to Reproduce
import pytest
from pydantic import BaseModel, Field

from genkit.ai import Genkit
from genkit.plugins.google_genai import GoogleAI


class TestPerson(BaseModel):
    """Represents a person mentioned in transcripts."""

    name: str = Field(description="Name of the person")
    title: str = Field(description="Title or role of the person at the account")


class TestPeopleAtAccount(BaseModel):
    """Container for all people associated with an account."""

    people: list[TestPerson] = Field(
        description="List of people and their titles at the account"
    )


@pytest.mark.asyncio
async def test_simple_gemini_call_pydantic_class_with_list():
    ai = Genkit(
        plugins=[GoogleAI()],
        model="googleai/gemini-2.5-pro",
    )

    # This fails with "missing field" error
    result = await ai.generate(
        prompt="Extract the people described by the following text: "
        + "The people at the account are John Doe and Jane Smith.",
        output_schema=TestPeopleAtAccount,
    )
Expected Behavior
The Pydantic model should be properly converted to a schema that the Gemini API accepts, and the generation should complete successfully with structured output matching the schema.
Actual Behavior
The request fails with the following error:
google.genai.errors.ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': '* GenerateContentRequest.generation_config.response_schema.properties[people].items: missing field.\n', 'status': 'INVALID_ARGUMENT'}}
Investigation Details
When inspecting the generated JSON schema using TestPeopleAtAccount.model_json_schema(), the schema contains $ref references:
{
  "$defs": {
    "TestPerson": {
      "description": "Represents a person mentioned in transcripts.",
      "properties": {
        "name": {
          "description": "Name of the person",
          "title": "Name",
          "type": "string"
        },
        "title": {
          "description": "Title or role of the person at the account",
          "title": "Title",
          "type": "string"
        }
      },
      "required": ["name", "title"],
      "title": "TestPerson",
      "type": "object"
    }
  },
  "properties": {
    "people": {
      "description": "List of people and their titles at the account",
      "items": {
        "$ref": "#/$defs/TestPerson"
      },
      "title": "People",
      "type": "array"
    }
  },
  "required": ["people"],
  "title": "TestPeopleAtAccount",
  "type": "object"
}
The items field is clearly present, but it contains a $ref reference. It appears that the Gemini API doesn't properly handle JSON Schema $ref references, and Genkit isn't flattening these references before sending the schema to the API.
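For comparison, here is a hand-written sketch of what the same schema could look like once the $ref is inlined. It is derived by hand from the schema above, not captured from Genkit or accepted-by-Gemini output, and the variable name inlined_schema is only illustrative.

```python
import json

# Hand-inlined version of the schema above: the "$ref" under "items" is
# replaced by the TestPerson definition and the "$defs" section is dropped.
# Illustrative only; this is not output produced by Genkit.
inlined_schema = {
    "properties": {
        "people": {
            "description": "List of people and their titles at the account",
            "items": {
                "description": "Represents a person mentioned in transcripts.",
                "properties": {
                    "name": {
                        "description": "Name of the person",
                        "title": "Name",
                        "type": "string",
                    },
                    "title": {
                        "description": "Title or role of the person at the account",
                        "title": "Title",
                        "type": "string",
                    },
                },
                "required": ["name", "title"],
                "title": "TestPerson",
                "type": "object",
            },
            "title": "People",
            "type": "array",
        }
    },
    "required": ["people"],
    "title": "TestPeopleAtAccount",
    "type": "object",
}

print(json.dumps(inlined_schema, indent=2))
```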
Additional Context
- This issue specifically affects Pydantic models with nested complex types in lists
- Simple Pydantic models without nested lists work correctly
- The issue is reproducible with both gemini-2.5-pro and gemini-2.0-flash
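To check quickly whether a given model is likely to hit this problem, one option is to scan its JSON schema for any $ref key. The helper below is a minimal sketch; contains_ref is a hypothetical name, and the two sample dictionaries are trimmed-down stand-ins for real schemas.

```python
from typing import Any


def contains_ref(node: Any) -> bool:
    """Return True if a "$ref" key appears anywhere in a JSON-schema-like structure."""
    if isinstance(node, dict):
        return "$ref" in node or any(contains_ref(v) for v in node.values())
    if isinstance(node, list):
        return any(contains_ref(item) for item in node)
    return False


# Trimmed-down stand-ins for real schemas (assumed shapes, for illustration).
flat_schema = {"type": "object", "properties": {"name": {"type": "string"}}}
nested_schema = {
    "type": "object",
    "properties": {
        "people": {"type": "array", "items": {"$ref": "#/$defs/TestPerson"}}
    },
}

print(contains_ref(flat_schema))
print(contains_ref(nested_schema))
```

With Pydantic installed, the same check could be applied to a real model via contains_ref(TestPeopleAtAccount.model_json_schema()).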
Workaround
The following helper, which inlines all $ref references, works for now:
from pydantic import BaseModel


def convert_pydantic_model_to_json_schema(model: type[BaseModel]) -> dict:
    """Return the model's JSON schema with all local $ref references inlined."""

    def flatten_schema(schema: dict) -> dict:
        """Recursively flatten a JSON schema by resolving all $ref references."""
        if "$defs" not in schema:
            return schema

        defs = schema.pop("$defs")

        def resolve_refs(obj):
            if isinstance(obj, dict):
                ref_path = obj.get("$ref")
                if isinstance(ref_path, str) and ref_path.startswith("#/$defs/"):
                    ref_name = ref_path.split("/")[-1]
                    if ref_name in defs:
                        # Recursively resolve any refs in the definition itself.
                        # Caveat: a self-referencing model would recurse forever.
                        return resolve_refs(defs[ref_name].copy())
                # No $ref, or one that cannot be resolved locally: keep the
                # node and recursively process all of its values.
                return {k: resolve_refs(v) for k, v in obj.items()}
            elif isinstance(obj, list):
                return [resolve_refs(item) for item in obj]
            else:
                return obj

        return resolve_refs(schema)

    schema = model.model_json_schema()
    return flatten_schema(schema)