Agents¶

Agents are a core component of the Oblix orchestration system that provide awareness of system state and influence model orchestration decisions. This page explains how agents work and how they are used to optimize AI model execution.

What Are Agents?¶

In Oblix, agents are specialized components that:

Monitor specific aspects of the system (resources, connectivity, etc.)
Evaluate the current state against defined thresholds
Recommend execution targets based on their findings
Provide detailed metrics for transparency and debugging

Agents run continuously in the background, ensuring the Oblix system always has up-to-date information about the execution environment to make intelligent orchestration decisions.

Built-in Agent Types¶

Oblix currently provides two built-in agent types that work together to make intelligent orchestration decisions:

ResourceMonitor¶

The ResourceMonitor agent tracks system resource utilization, including:

CPU usage - Overall processor utilization percentage
Memory usage - RAM utilization percentage
System load - Average system load over time
GPU availability - Presence and utilization of GPU resources
GPU metrics - Real-time GPU utilization and memory usage on macOS devices

Based on these metrics, the ResourceMonitor recommends one of several execution targets:

LOCAL_CPU - Execute on local CPU (when resources are available)
LOCAL_GPU - Execute on local GPU (when available and suitable)
CLOUD - Execute on cloud models (when local resources are constrained)

from oblix.agents.resource_monitor import ResourceMonitor

# Create with default settings
resource_monitor = ResourceMonitor()

# Add to client
client.hook_agent(resource_monitor)

Resource States¶

The ResourceMonitor reports one of three states:

AVAILABLE - Sufficient resources available for local execution
CONSTRAINED - Resources are limited but usable
CRITICAL - Resources are extremely limited, recommend cloud execution

ConnectivityAgent¶

The ConnectivityAgent monitors network connectivity, including:

Connection type - Type of network connection (wifi, ethernet, etc.)
Latency - Response time to key endpoints
Packet loss - Percentage of lost packets
Bandwidth - Available network bandwidth

Based on these metrics, the ConnectivityAgent recommends one of several orchestration targets:

LOCAL - Use local models due to connectivity issues
CLOUD - Use cloud models due to good connectivity
HYBRID - Consider balanced execution between local and cloud

from oblix.agents.connectivity import ConnectivityAgent

# Create with default settings
connectivity_agent = ConnectivityAgent()

# Add to client
client.hook_agent(connectivity_agent)

Connectivity States¶

The ConnectivityAgent reports one of three states:

OPTIMAL - Good connectivity suitable for cloud model use
DEGRADED - Limited connectivity that may affect performance
DISCONNECTED - No connectivity, must use local models

Agent Lifecycle¶

Agents follow a defined lifecycle:

Initialization - Setting up monitoring capabilities
Periodic Checks - Collecting metrics at regular intervals
On-demand Checks - Performing checks when explicitly requested
Shutdown - Gracefully releasing resources

Agent Orchestration Process¶

When you execute a prompt, all registered agents perform checks to inform the orchestration decision:

response = await client.execute("Explain quantum computing")

# Orchestration decisions are included in the response
agent_checks = response["agent_checks"]

The agent_checks dictionary contains detailed information about:

Current system state assessments
Recommended execution targets
Specific metrics collected
Reasoning for orchestration decisions

Customizing Agent Behavior¶

You can customize agent behavior by providing configuration options:

Resource Thresholds¶

# Create with custom thresholds
resource_monitor = ResourceMonitor(
    custom_thresholds={
        "cpu_threshold": 70.0,     # CPU usage percentage (default: 80.0)
        "memory_threshold": 75.0,  # RAM usage percentage (default: 85.0)
        "load_threshold": 3.0,     # System load average (default: 4.0)
        "gpu_threshold": 75.0,     # GPU utilization percentage threshold (default: 85.0)
        "critical_gpu": 90.0       # Critical GPU threshold (default: 95.0)
    }
)

Connectivity Thresholds¶

# Create with custom connectivity thresholds
connectivity_agent = ConnectivityAgent(
    latency_threshold=150.0,       # Maximum acceptable latency in ms (default: 200.0)
    packet_loss_threshold=5.0,     # Maximum acceptable packet loss percentage (default: 10.0)
    bandwidth_threshold=10.0,      # Minimum acceptable bandwidth in Mbps (default: 5.0)
    check_interval=60              # Seconds between connectivity checks (default: 30)
)

Intelligent Orchestration System¶

Oblix's agent architecture powers its intelligent orchestration system with several key benefits:

Adaptive Execution - Dynamically selects the optimal model based on real-time conditions
Resilience - Automatically falls back to local models when connectivity fails
Resource Optimization - Balances workload between local and cloud resources
Performance Tuning - Configurable thresholds to match specific hardware capabilities
Transparency - Clear visibility into orchestration decisions

Orchestration Decision Flow¶

The Oblix orchestration system follows this decision process:

Resource Assessment - ResourceMonitor evaluates system capabilities
Connectivity Verification - ConnectivityAgent checks network conditions
Decision Prioritization - Weighs connectivity status higher than resource availability
Model Selection - Chooses appropriate model based on agent recommendations
Execution - Runs the prompt on the selected model

This orchestration flow ensures that your AI workloads run on the most appropriate model given the current system conditions.

Extensible Agent System¶

Oblix's agent system is designed to be extensible and scalable. The architecture allows for additional agents to be integrated in the future, enhancing the orchestration capabilities while maintaining compatibility with existing code. As new monitoring needs emerge, the agent ecosystem can grow accordingly.

GPU Monitoring on macOS¶

Oblix provides enhanced GPU monitoring capabilities on macOS devices:

Real-time GPU Utilization - Live measurement of GPU processing load (0-100%)
Memory Usage Tracking - For Apple Silicon GPUs with unified memory
No Sudo Required - Works without elevated privileges (permissions handled during installation)
Automatic Decision Making - Routes to appropriate execution target based on GPU state

Accessing GPU Metrics¶

You can access GPU metrics programmatically:

# Get complete resource metrics
resource_metrics = await client.get_resource_metrics()

# Access GPU metrics
if resource_metrics.get('gpu') and resource_metrics['gpu'].get('available'):
    gpu_info = resource_metrics['gpu']

    # Check GPU utilization
    if gpu_info.get('utilization') is not None:
        print(f"GPU utilization: {gpu_info['utilization'] * 100:.2f}%")

    # Memory usage (Apple Silicon)
    if gpu_info.get('memory_utilization') is not None:
        print(f"GPU memory: {gpu_info['memory_utilization'] * 100:.2f}%")

    # Additional information
    print(f"GPU name: {gpu_info.get('name', 'Unknown')}")
    print(f"GPU type: {gpu_info.get('type', 'Unknown')}")

Best Practices¶

When working with Oblix's agent-based orchestration:

Enable Both Agents - Use both ResourceMonitor and ConnectivityAgent together for optimal orchestration
Start with Defaults - Use the default agent thresholds initially
Monitor Performance - Review agent recommendations over time
Adjust Gradually - Make small adjustments to thresholds based on observations
Configure for Your Environment - Tune thresholds based on your specific hardware capabilities
Tune GPU Thresholds - Adjust GPU thresholds based on your specific GPU model and workload requirements

By leveraging Oblix's agent-based orchestration system effectively, you can build AI applications that intelligently adapt to changing conditions while optimizing for performance, cost, and reliability.