Overview

Safety Engine provides content filtering and policy enforcement for AI agents. It controls what goes into agents (user input) and what comes out (agent responses) by applying policies that detect and handle sensitive content.

Key Features

  • Input/Output Filtering: Validate user input and agent responses
  • Tool Safety Policies: Validate tools at registration and before execution
  • Pre-built Policies: Ready-to-use policies for PII, adult content, hate speech, etc.
  • Custom Policies: Create your own rules and actions
  • Multiple Actions: Block, anonymize, replace, or raise exceptions (see the sketch after this list)
  • Multi-language Support: Automatically adapts to the user’s language
  • LLM-Powered Detection: Use LLMs for context-aware content detection
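
The pre-built policies typically ship in variants matching these actions. Below is a minimal sketch of swapping the anonymizing PII policy for a blocking one; the class name PIIBlockPolicy is an assumption and should be checked against the pii_policies module.

from upsonic import Agent, Task
from upsonic.safety_engine.policies.pii_policies import PIIBlockPolicy  # assumed name, verify in pii_policies

# Same setup as the anonymization example below, but with a blocking action:
# instead of masking detected PII, the policy stops the response outright.
agent = Agent(
    "openai/gpt-4o",
    agent_policy=PIIBlockPolicy
)

task = Task(description="My phone number is 555-1234. Repeat it back to me.")
print(agent.do(task))  # expect a blocked or refused response rather than the raw number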

Example

from upsonic import Agent, Task
from upsonic.safety_engine.policies.pii_policies import PIIAnonymizePolicy

# Create agent with PII anonymization
agent = Agent(
    "openai/gpt-4o",
    agent_policy=PIIAnonymizePolicy
)

# User input with PII
task = Task(
    description="My email is john@example.com and my phone is 555-1234. What are my email and phone?"
)

# Execute - PII will be anonymized in output
result = agent.do(task)
print(result)  # PII like email and phone will be anonymized
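
The example above filters the agent's output. User input can be screened the same way; the sketch below assumes the Agent constructor also accepts a user_policy parameter for the input side, so verify the parameter name against the Agent API.

from upsonic import Agent, Task
from upsonic.safety_engine.policies.pii_policies import PIIAnonymizePolicy

# Apply the same policy to user input (user_policy) and agent output (agent_policy).
# The user_policy parameter name is an assumption; check the Agent signature.
agent = Agent(
    "openai/gpt-4o",
    user_policy=PIIAnonymizePolicy,   # screen what goes into the agent
    agent_policy=PIIAnonymizePolicy   # screen what comes out
)

task = Task(description="My phone is 555-1234. What did I just tell you?")
print(agent.do(task))  # the phone number should appear anonymized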

Tool Safety Policies

Tool safety policies provide two validation points:
  • tool_policy_pre: Validates tools at registration, before the task runs
  • tool_policy_post: Validates each tool call right before it executes, when the LLM invokes the tool

from upsonic import Agent, Task
from upsonic.tools import tool
from upsonic.safety_engine.policies.tool_safety_policies import HarmfulToolBlockPolicy

# Apply tool safety policies
agent = Agent(
    "openai/gpt-4o",
    name="Test Agent",
    tool_policy_pre=HarmfulToolBlockPolicy,   # Validate at registration
    tool_policy_post=HarmfulToolBlockPolicy,   # Validate before execution
    debug=True  # Enable debug to see policy logs
)

# Create a potentially harmful tool to test tool_policy_pre (registration validation)
@tool
def delete_file(filepath: str) -> str:
    """Delete a file from the system."""
    import os
    if os.path.exists(filepath):
        os.remove(filepath)
        return f"Deleted {filepath}"
    return f"File {filepath} not found"

# Create a task object that will test both policies:
# - tool_policy_pre: Validates the delete_file tool when it's registered
# - tool_policy_post: Validates the tool call before execution
task = Task(
    description="Use the delete_file tool to delete /tmp/test_file.txt",
    tools=[delete_file]
)

# Execute the task - tool_policy_pre runs when delete_file is registered; tool_policy_post runs only if registration passes and the LLM actually calls the tool
result = agent.do(task)
print(f"Task result: {result}")