[PY] Genkit Python fails with "missing field" error when using Pydantic models with nested list types as output_schema #3360

@fbrann

Description

When using a Pydantic model that contains a list of nested Pydantic models as output_schema in ai.generate(), Genkit fails with a Google Generative AI error stating that the items field is missing, even though the generated JSON schema clearly contains the items field.

Environment

  • Genkit version: 0.4.0
  • Python version: 3.13.5
  • Operating System: macOS
  • Model: googleai/gemini-2.5-pro

Steps to Reproduce

import pytest
from pydantic import BaseModel, Field
from genkit.ai import Genkit
from genkit.plugins.google_genai import GoogleAI

class TestPerson(BaseModel):
   """Represents a person mentioned in transcripts."""
   name: str = Field(description="Name of the person")
   title: str = Field(description="Title or role of the person at the account")

class TestPeopleAtAccount(BaseModel):
   """Container for all people associated with an account."""
   people: list[TestPerson] = Field(
       description="List of people and their titles at the account"
   )

@pytest.mark.asyncio
async def test_simple_gemini_call_pydantic_class_with_list():
   ai = Genkit(
       plugins=[GoogleAI()],
       model="googleai/gemini-2.5-pro",
   )

   # This fails with "missing field" error
   result = await ai.generate(
       prompt="Extract the people described by the following text: "
       + "The people at the account are John Doe and Jane Smith.",
       output_schema=TestPeopleAtAccount,
   )

Expected Behavior

The Pydantic model should be properly converted to a schema that the Gemini API accepts, and the generation should complete successfully with structured output matching the schema.

Actual Behavior

The request fails with the following error:
google.genai.errors.ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': '* GenerateContentRequest.generation_config.response_schema.properties[people].items: missing field.\n', 'status': 'INVALID_ARGUMENT'}}

Investigation Details

When inspecting the generated JSON schema using TestPeopleAtAccount.model_json_schema(), the schema contains $ref references:

{
  "$defs": {
    "TestPerson": {
      "description": "Represents a person mentioned in transcripts.",
      "properties": {
        "name": {
          "description": "Name of the person",
          "title": "Name",
          "type": "string"
        },
        "title": {
          "description": "Title or role of the person at the account",
          "title": "Title",
          "type": "string"
        }
      },
      "required": ["name", "title"],
      "title": "TestPerson",
      "type": "object"
    }
  },
  "properties": {
    "people": {
      "description": "List of people and their titles at the account",
      "items": {
        "$ref": "#/$defs/TestPerson"
      },
      "title": "People",
      "type": "array"
    }
  },
  "required": ["people"],
  "title": "TestPeopleAtAccount",
  "type": "object"
}

The items field is clearly present, but it contains a $ref reference. It appears that the Gemini API doesn't properly handle JSON Schema $ref references, and Genkit isn't flattening these references before sending the schema to the API.
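To make the observation above concrete, here is a minimal sketch that walks a schema dict and reports every place a $ref occurs. The helper name `find_refs` is hypothetical (it is not part of Genkit or Pydantic), and the schema literal is a trimmed copy of the one shown above with descriptions and titles omitted.

```python
def find_refs(node, path="#"):
    """Yield (path, target) for every "$ref" found in a JSON-schema-like structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield path, value
            else:
                yield from find_refs(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_refs(item, f"{path}/{index}")


# Trimmed copy of the schema from this issue (descriptions and titles omitted).
schema = {
    "$defs": {
        "TestPerson": {
            "properties": {
                "name": {"type": "string"},
                "title": {"type": "string"},
            },
            "required": ["name", "title"],
            "type": "object",
        }
    },
    "properties": {
        "people": {"items": {"$ref": "#/$defs/TestPerson"}, "type": "array"}
    },
    "required": ["people"],
    "type": "object",
}

refs = list(find_refs(schema))
print(refs)  # [('#/properties/people/items', '#/$defs/TestPerson')]
```

The only reference sits exactly at properties[people].items, the field the API error reports as missing, which is consistent with the reference not being resolved somewhere before the API validates the schema.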

Additional Context

  • This issue specifically affects Pydantic models with nested complex types in lists
  • Simple Pydantic models without nested lists work correctly
  • The issue is reproducible with both gemini-2.5-pro and gemini-2.0-flash

Workaround

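The following helper works for now. It inlines every $ref that points into $defs, so the resulting schema no longer contains any references: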

from pydantic import BaseModel


def convert_pydantic_model_to_json_schema(model: type[BaseModel]) -> dict:
    """Return the model's JSON schema with all local $ref references inlined."""

    def flatten_schema(schema: dict) -> dict:
        """Recursively flatten a JSON schema by resolving all $ref references."""
        if "$defs" not in schema:
            return schema

        defs = schema.pop("$defs")

        def resolve_refs(obj):
            if isinstance(obj, dict):
                ref_path = obj.get("$ref")
                if isinstance(ref_path, str) and ref_path.startswith("#/$defs/"):
                    ref_name = ref_path.split("/")[-1]
                    if ref_name in defs:
                        # Recursively resolve any refs in the definition itself.
                        # Caveat: a self-referential model would recurse forever here.
                        return resolve_refs(defs[ref_name].copy())
                # No resolvable $ref: recursively process all values
                # (previously an unresolvable $ref fell through and returned None).
                return {k: resolve_refs(v) for k, v in obj.items()}
            elif isinstance(obj, list):
                return [resolve_refs(item) for item in obj]
            else:
                return obj

        return resolve_refs(schema)

    # model_json_schema() is a classmethod, so pass the model class, not an instance.
    schema = model.model_json_schema()
    return flatten_schema(schema)
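Two limitations of the workaround are worth noting: it recurses without a guard, so a self-referential model would never terminate, and it discards any keys that sit next to a $ref. The sketch below is one possible variant that addresses both, written against plain dicts so it runs without Pydantic. The name `inline_refs` and the choice to raise `ValueError` on recursion are assumptions made for illustration, not Genkit behavior.

```python
import json


def inline_refs(schema: dict) -> dict:
    """Return a copy of `schema` with local "#/$defs/..." references inlined."""
    defs = schema.get("$defs", {})

    def resolve(node, active):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/$defs/"):
                name = ref[len("#/$defs/"):]
                if name in active:
                    # Inlining a recursive definition would never terminate.
                    raise ValueError(f"recursive reference to {name!r} cannot be inlined")
                if name in defs:
                    # Keys next to "$ref" (e.g. a description) override the definition.
                    siblings = {k: v for k, v in node.items() if k != "$ref"}
                    return resolve({**defs[name], **siblings}, active | {name})
            return {k: resolve(v, active) for k, v in node.items()}
        if isinstance(node, list):
            return [resolve(item, active) for item in node]
        return node

    # Drop the top-level "$defs" table; everything it held is inlined instead.
    top_level = {k: v for k, v in schema.items() if k != "$defs"}
    return resolve(top_level, frozenset())


# Trimmed copy of the schema from this issue (descriptions and titles omitted).
schema = {
    "$defs": {
        "TestPerson": {
            "properties": {
                "name": {"type": "string"},
                "title": {"type": "string"},
            },
            "required": ["name", "title"],
            "type": "object",
        }
    },
    "properties": {
        "people": {"items": {"$ref": "#/$defs/TestPerson"}, "type": "array"}
    },
    "required": ["people"],
    "type": "object",
}

flat = inline_refs(schema)
print(json.dumps(flat, indent=2))
```

After flattening, properties.people.items holds the full TestPerson object schema and neither $defs nor $ref remains. The input dict is left unmodified, unlike the workaround above, which pops $defs from the schema it is given.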
