dragon.ai.inference.llm_proxy.LLMProxy

class LLMProxy[source]

Bases: ABC

Transport-agnostic proxy interface for LLM chat inference.

Each proxy is a lightweight client-side handle pointing at a shared inference pipeline backend. One pipeline, many proxies.

Implementations must override chat().

__init__()

Methods

__init__()

chat(messages[, tools, json_schema, ...])

Send a chat request and return the assistant's response text.

shutdown()

Release any resources held by this proxy.

abstractmethod async chat(messages: List [Dict [str , Any ]], tools: List [Dict [str , Any ]] | None = None, json_schema: dict | None = None, continue_final_message: bool = False) str [source]

Send a chat request and return the assistant’s response text.

Parameters:
  • messages (list [dict ]) – Conversation messages in OpenAI chat format.

  • tools (list [dict ] | None) – Optional tool definitions.

  • json_schema (dict | None) – JSON schema dict for structured output. When provided, guided decoding is enabled.

  • continue_final_message (bool ) – Continue last assistant message.

Returns:

Response text.

Return type:

str

async shutdown() None [source]

Release any resources held by this proxy.

Called once during agent teardown. The default implementation is a no-op; subclasses that hold pooled connections, queues, or other heavyweight resources should override this.