dragon.ai.inference.config.ModelConfig

class ModelConfig

Bases: object

LLM model configuration.

__init__(model_name: str, hf_token: str, tp_size: int, dtype: str = 'bfloat16', max_tokens: int = 100, max_model_len: int = 8192, padding_side: str = 'left', truncation_side: str = 'left', top_k: int = 50, top_p: float = 0.95, system_prompt: List[str] = <factory>, vllm_log_level: str = 'error') → None
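
The `<factory>` default on `system_prompt` suggests that the class is a dataclass whose list field is built by a default factory. The sketch below is a stand-in for illustration, not the Dragon source: `ModelConfigSketch` is a hypothetical name, the field names and defaults are copied from the signature above, and the empty-list factory and the placeholder argument values are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelConfigSketch:
    """Hypothetical stand-in mirroring the documented ModelConfig signature."""

    # Required arguments (no defaults in the documented signature).
    model_name: str
    hf_token: str
    tp_size: int
    # Optional arguments with the documented defaults.
    dtype: str = "bfloat16"
    max_tokens: int = 100
    max_model_len: int = 8192
    padding_side: str = "left"
    truncation_side: str = "left"
    top_k: int = 50
    top_p: float = 0.95
    # Assumption: the real factory is not documented; an empty list is used here.
    system_prompt: List[str] = field(default_factory=list)
    vllm_log_level: str = "error"


# Only the three required arguments must be supplied (placeholder values).
cfg = ModelConfigSketch(
    model_name="example-org/example-model",
    hf_token="hf_placeholder",
    tp_size=2,
)
other = ModelConfigSketch(
    model_name="example-org/example-model",
    hf_token="hf_placeholder",
    tp_size=2,
)

# A default factory gives each instance its own list, so this append
# does not leak into `other.system_prompt`.
cfg.system_prompt.append("You are a helpful assistant.")
```

Under these assumptions, a factory default avoids the shared-mutable-default problem that a plain `= []` would cause, which is likely why the signature shows `<factory>` rather than a literal list.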

Methods

__init__(model_name, hf_token, tp_size, ...)

validate(gpus_per_node)

Validate model configuration.

Attributes

dtype

max_model_len

max_tokens

padding_side

top_k

top_p

truncation_side

vllm_log_level

model_name

hf_token

tp_size

system_prompt

model_name: str
hf_token: str
tp_size: int
dtype: str = 'bfloat16'
max_tokens: int = 100
max_model_len: int = 8192
padding_side: str = 'left'
truncation_side: str = 'left'
top_k: int = 50
top_p: float = 0.95
system_prompt: List[str]
vllm_log_level: str = 'error'
validate(gpus_per_node: int) → None

Validate model configuration.

Parameters:

gpus_per_node (int) – Number of GPUs available per node.

Raises:

ValueError – If any configuration parameter is invalid.
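
The page does not list which checks `validate` performs. The sketch below only illustrates the documented contract (take `gpus_per_node`, return `None`, raise `ValueError` on a bad setting). `TinyConfig`, `check`, and all three rules are hypothetical assumptions chosen for illustration; they are not taken from the Dragon implementation.

```python
from dataclasses import dataclass


@dataclass
class TinyConfig:
    """Hypothetical, reduced stand-in; not the Dragon implementation."""

    tp_size: int
    top_p: float = 0.95
    padding_side: str = "left"

    def validate(self, gpus_per_node: int) -> None:
        # Assumed rule: the tensor-parallel size must fit on a single node.
        if not 1 <= self.tp_size <= gpus_per_node:
            raise ValueError(
                f"tp_size={self.tp_size} must be in [1, {gpus_per_node}]"
            )
        # Assumed rule: top_p is a cumulative probability in (0, 1].
        if not 0.0 < self.top_p <= 1.0:
            raise ValueError(f"top_p={self.top_p} must be in (0, 1]")
        # Assumed rule: padding happens on the left or on the right.
        if self.padding_side not in ("left", "right"):
            raise ValueError(
                f"padding_side={self.padding_side!r} must be 'left' or 'right'"
            )


def check(config: TinyConfig, gpus_per_node: int) -> str:
    """Return 'ok', or the error message if validation fails."""
    try:
        config.validate(gpus_per_node)
        return "ok"
    except ValueError as err:
        return str(err)
```

Calling `validate` before any model is loaded lets a misconfiguration surface as a `ValueError` early, rather than as a failure partway through startup.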
