dragon.ai.inference.llm_proxy.LLMProxy

Bases: ABC

Transport-agnostic proxy interface for LLM chat inference.

Each proxy is a lightweight client-side handle pointing at a shared inference pipeline backend. One pipeline, many proxies.

Implementations must override chat().

__init__()

Methods

`__init__`()
`chat`(messages[, tools, json_schema, ...])	Send a chat request and return the assistant's response text.
`chat_stream`(messages[, tools, json_schema, ...])	Send a streaming chat request and yield response chunks.
`shutdown`()	Release any resources held by this proxy.

abstractmethod async chat(messages: List [Dict [str , Any ]], tools: List [Dict [str , Any ]] | None = None, json_schema: dict | None = None, continue_final_message: bool = False) → str [source] 

Send a chat request and return the assistant’s response text.

Parameters:

messages (list [dict ]) – Conversation messages in OpenAI chat format.
tools (list [dict ] | None) – Optional tool definitions.
json_schema (dict | None) – JSON schema dict for structured output. When provided, guided decoding is enabled.
continue_final_message (bool ) – Continue last assistant message.

Returns:

Response text.

Return type:

str

abstractmethod async chat_stream(messages: List [Dict [str , Any ]], tools: List [Dict [str , Any ]] | None = None, json_schema: dict | None = None, continue_final_message: bool = False) → AsyncIterator [StreamChunk][source] 

Send a streaming chat request and yield response chunks.

Unlike chat(), this method yields StreamChunk objects as tokens are generated, enabling Server-Sent Events (SSE) responses.

Parameters:

messages (list [dict ]) – Conversation messages in OpenAI chat format.
tools (list [dict ] | None) – Optional tool definitions.
json_schema (dict | None) – JSON schema dict for structured output.
continue_final_message (bool ) – Continue last assistant message.

Yields:

StreamChunk objects with incremental response text.

Return type:

AsyncIterator[StreamChunk]

async shutdown() → None [source] 

Release any resources held by this proxy.

Called once during agent teardown. The default implementation is a no-op; subclasses that hold pooled connections, queues, or other heavyweight resources should override this.