dragon.ai.inference.guardrails.GuardrailsProcessor
- class GuardrailsProcessor[source]
Bases:
objectHandles prompt safety checking using PromptGuard model.
This class is responsible ONLY for guardrails/safety checking, completely separated from batching and LLM inference logic.
- __init__(config: GuardrailsConfig, hf_token: str )[source]
Initialize the guardrails processor.
- Parameters:
config (GuardrailsConfig) – Guardrails configuration.
hf_token (str ) – HuggingFace token for model access.
Methods
__init__(config, hf_token)Initialize the guardrails processor.
check_prompts(prompts)Check a list of prompts for jailbreak attempts.
filter_batch(prompts, formatted_prompts, ...)Filter a batch of prompts, separating safe from malicious ones.
Get the standard response for malicious prompts.
- __init__(config: GuardrailsConfig, hf_token: str )[source]
Initialize the guardrails processor.
- Parameters:
config (GuardrailsConfig) – Guardrails configuration.
hf_token (str ) – HuggingFace token for model access.
- check_prompts(prompts: List [str ]) Tuple [List [bool ], List [float ], float ][source]
Check a list of prompts for jailbreak attempts.
- Parameters:
- Returns:
Tuple
(is_safe, jailbreak_scores, processing_time)whereis_safeis a list of booleans (Trueif safe,Falseif malicious),jailbreak_scoresare the scores per prompt, andprocessing_timeis the total processing time in seconds.- Return type:
- filter_batch(prompts: List [str ], formatted_prompts: List [str ], response_queues: List , latency_metrics: List [Tuple [float , float , float ]]) Tuple [List [str ], List [str ], List , List [Tuple [float , float , float ]], List [int ], float ][source]
Filter a batch of prompts, separating safe from malicious ones.
- Parameters:
- Returns:
Tuple
(safe_prompts, safe_formatted, safe_queues, safe_metrics, malicious_indices, processing_time)where thesafe_*lists contain only safe entries,malicious_indicesare the indices of malicious prompts andprocessing_timeis the guardrails processing time in seconds.- Return type: