dragon.ai.inference.guardrails.GuardrailsProcessor

class GuardrailsProcessor[source] 

Bases: object

Handles prompt safety checking using PromptGuard model.

This class is responsible ONLY for guardrails/safety checking, completely separated from batching and LLM inference logic.

__init__(config: GuardrailsConfig, hf_token: str )[source] 

Initialize the guardrails processor.

Parameters:

config (GuardrailsConfig) – Guardrails configuration.
hf_token (str ) – HuggingFace token for model access.

Methods

`__init__`(config, hf_token)	Initialize the guardrails processor.
`check_prompts`(prompts)	Check a list of prompts for jailbreak attempts.
`filter_batch`(prompts, formatted_prompts, ...)	Filter a batch of prompts, separating safe from malicious ones.
`get_malicious_response`()	Get the standard response for malicious prompts.

__init__(config: GuardrailsConfig, hf_token: str )[source] 

Initialize the guardrails processor.

Parameters:

config (GuardrailsConfig) – Guardrails configuration.
hf_token (str ) – HuggingFace token for model access.

check_prompts(prompts: List [str ]) → Tuple [List [bool ], List [float ], float ][source] 

Check a list of prompts for jailbreak attempts.

Parameters:: prompts (list [str ]) – List of user prompts to check.
Returns:: Tuple (is_safe, jailbreak_scores, processing_time) where is_safe is a list of booleans (True if safe, False if malicious), jailbreak_scores are the scores per prompt, and processing_time is the total processing time in seconds.
Return type:: tuple [list [bool ], list [float ], float ]

filter_batch(prompts: List [str ], formatted_prompts: List [str ], response_queues: List , latency_metrics: List [Tuple [float , float , float ]]) → Tuple [List [str ], List [str ], List , List [Tuple [float , float , float ]], List [int ], float ][source] 

Filter a batch of prompts, separating safe from malicious ones.

Parameters:

prompts (list [str ]) – List of user prompts.
formatted_prompts (list [str ]) – List of formatted prompts.
response_queues (list ) – List of response queues.
latency_metrics (list [tuple [float , float , float ]]) – List of latency metric tuples.

Returns:

Tuple (safe_prompts, safe_formatted, safe_queues, safe_metrics, malicious_indices, processing_time) where the safe_* lists contain only safe entries, malicious_indices are the indices of malicious prompts and processing_time is the guardrails processing time in seconds.

Return type:

tuple

get_malicious_response() → str [source] 

Get the standard response for malicious prompts.

Returns:: Standard response string for malicious prompts.
Return type:: str