Agents
Agents are a core component of the Oblix orchestration system: they track system state and use it to influence model orchestration decisions. This page explains how agents work and how Oblix uses them to optimize AI model execution.
What Are Agents?
In Oblix, agents are specialized components that:
- Monitor specific aspects of the system (resources, connectivity, etc.)
- Evaluate the current state against defined thresholds
- Recommend execution targets based on their findings
- Provide detailed metrics for transparency and debugging
Agents run continuously in the background, ensuring the Oblix system always has up-to-date information about the execution environment to make intelligent orchestration decisions.
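The four responsibilities above can be pictured as a small interface. The sketch below is hypothetical: `AgentCheckResult`, `BaseAgent`, `StaticAgent`, `collect_metrics`, and `evaluate` are illustrative names, not Oblix's actual classes or methods.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class AgentCheckResult:
    """Hypothetical result shape: a state, a recommended target, and raw metrics."""
    state: str    # e.g. "AVAILABLE" or "OPTIMAL"
    target: str   # e.g. "LOCAL_CPU" or "CLOUD"
    reason: str   # human-readable explanation, useful for debugging
    metrics: Dict[str, Any] = field(default_factory=dict)


class BaseAgent(ABC):
    """Hypothetical interface mirroring the responsibilities listed above."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def collect_metrics(self) -> Dict[str, Any]:
        """Monitor: gather raw measurements."""

    @abstractmethod
    def evaluate(self, metrics: Dict[str, Any]) -> AgentCheckResult:
        """Evaluate metrics against thresholds and recommend a target."""

    def check(self) -> AgentCheckResult:
        # One full pass: monitor, then evaluate and recommend; metrics ride along.
        return self.evaluate(self.collect_metrics())


class StaticAgent(BaseAgent):
    """Toy agent that always reports the same metrics, handy for tests."""

    def collect_metrics(self) -> Dict[str, Any]:
        return {"cpu_percent": 42.0}

    def evaluate(self, metrics: Dict[str, Any]) -> AgentCheckResult:
        ok = metrics["cpu_percent"] < 80.0
        return AgentCheckResult(
            state="AVAILABLE" if ok else "CONSTRAINED",
            target="LOCAL_CPU" if ok else "CLOUD",
            reason=f"cpu_percent={metrics['cpu_percent']}",
            metrics=metrics,
        )


result = StaticAgent("static").check()
print(result.state, result.target)
```

Separating metric collection from evaluation keeps the threshold logic testable without touching real hardware or network state.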
Built-in Agent Types
Oblix currently provides two built-in agent types that work together to make intelligent orchestration decisions:
ResourceMonitor
The ResourceMonitor agent tracks system resource utilization, including:
- CPU usage - Overall processor utilization percentage
- Memory usage - RAM utilization percentage
- System load - Average system load over time
- GPU availability - Presence and utilization of GPU resources
- GPU metrics - Real-time GPU utilization and memory usage on macOS devices
Based on these metrics, the ResourceMonitor recommends one of several execution targets:
- LOCAL_CPU - Execute on local CPU (when resources are available)
- LOCAL_GPU - Execute on local GPU (when available and suitable)
- CLOUD - Execute on cloud models (when local resources are constrained)
from oblix.agents.resource_monitor import ResourceMonitor
# Create with default settings
resource_monitor = ResourceMonitor()
# Register the agent with an existing Oblix client instance
client.hook_agent(resource_monitor)
Resource States
The ResourceMonitor reports one of three states:
- AVAILABLE - Sufficient resources available for local execution
- CONSTRAINED - Resources are limited but usable
- CRITICAL - Resources are extremely limited, recommend cloud execution
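One way to picture how the ResourceMonitor could turn raw metrics into a state and target is a pure classification function. This is a sketch under assumptions: the function name `classify_resources` and the rule that counts exceeded thresholds are hypothetical; only the threshold names and default values come from the Resource Thresholds example on this page.

```python
from typing import Dict, Optional, Tuple

# Defaults taken from the "Resource Thresholds" example on this page.
DEFAULT_THRESHOLDS: Dict[str, float] = {
    "cpu_threshold": 80.0,     # CPU usage percentage
    "memory_threshold": 85.0,  # RAM usage percentage
    "load_threshold": 4.0,     # system load average
}


def classify_resources(
    cpu_percent: float,
    memory_percent: float,
    load_avg: float,
    gpu_available: bool = False,
    thresholds: Optional[Dict[str, float]] = None,
) -> Tuple[str, str]:
    """Return (state, recommended_target).

    Hypothetical rule: count how many thresholds are exceeded.
    0 -> AVAILABLE, 1 -> CONSTRAINED, 2 or more -> CRITICAL.
    """
    t = thresholds or DEFAULT_THRESHOLDS
    exceeded = sum([
        cpu_percent > t["cpu_threshold"],
        memory_percent > t["memory_threshold"],
        load_avg > t["load_threshold"],
    ])
    if exceeded == 0:
        return "AVAILABLE", "LOCAL_GPU" if gpu_available else "LOCAL_CPU"
    if exceeded == 1:
        # Limited but usable: stay on the local CPU.
        return "CONSTRAINED", "LOCAL_CPU"
    # Extremely limited: recommend cloud execution.
    return "CRITICAL", "CLOUD"


print(classify_resources(35.0, 50.0, 1.2, gpu_available=True))  # ('AVAILABLE', 'LOCAL_GPU')
print(classify_resources(92.0, 90.0, 1.0))                      # ('CRITICAL', 'CLOUD')
```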
ConnectivityAgent
The ConnectivityAgent monitors network connectivity, including:
- Connection type - Type of network connection (wifi, ethernet, etc.)
- Latency - Response time to key endpoints
- Packet loss - Percentage of lost packets
- Bandwidth - Available network bandwidth
Based on these metrics, the ConnectivityAgent recommends one of several orchestration targets:
- LOCAL - Use local models due to connectivity issues
- CLOUD - Use cloud models due to good connectivity
- HYBRID - Consider balanced execution between local and cloud
from oblix.agents.connectivity import ConnectivityAgent
# Create with default settings
connectivity_agent = ConnectivityAgent()
# Register the agent with an existing Oblix client instance
client.hook_agent(connectivity_agent)
Connectivity States
The ConnectivityAgent reports one of three states:
- OPTIMAL - Good connectivity suitable for cloud model use
- DEGRADED - Limited connectivity that may affect performance
- DISCONNECTED - No connectivity, must use local models
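A similar sketch can illustrate how connectivity metrics might map onto the three states and the LOCAL, CLOUD, and HYBRID targets. The function name `classify_connectivity`, the treatment of missing latency or total packet loss as a lost connection, and the DEGRADED-to-HYBRID mapping are assumptions; only the default threshold values come from the Connectivity Thresholds example on this page.

```python
from typing import Optional, Tuple

# Defaults taken from the "Connectivity Thresholds" example on this page.
LATENCY_THRESHOLD_MS = 200.0      # maximum acceptable latency
PACKET_LOSS_THRESHOLD_PCT = 10.0  # maximum acceptable packet loss
BANDWIDTH_THRESHOLD_MBPS = 5.0    # minimum acceptable bandwidth


def classify_connectivity(
    latency_ms: Optional[float],
    packet_loss_pct: float,
    bandwidth_mbps: float,
) -> Tuple[str, str]:
    """Return (state, recommended_target); the mapping is an assumption."""
    # No measurable latency or total packet loss is treated as "no connection".
    if latency_ms is None or packet_loss_pct >= 100.0:
        return "DISCONNECTED", "LOCAL"
    healthy = (
        latency_ms <= LATENCY_THRESHOLD_MS
        and packet_loss_pct <= PACKET_LOSS_THRESHOLD_PCT
        and bandwidth_mbps >= BANDWIDTH_THRESHOLD_MBPS
    )
    if healthy:
        return "OPTIMAL", "CLOUD"
    # Connected but at least one threshold is violated.
    return "DEGRADED", "HYBRID"


print(classify_connectivity(45.0, 0.0, 50.0))
print(classify_connectivity(350.0, 2.0, 20.0))
print(classify_connectivity(None, 100.0, 0.0))
```

Note the direction of each comparison: latency and packet loss are upper limits, while bandwidth is a lower limit.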
Agent Lifecycle
Agents follow a defined lifecycle:
- Initialization - Setting up monitoring capabilities
- Periodic Checks - Collecting metrics at regular intervals
- On-demand Checks - Performing checks when explicitly requested
- Shutdown - Gracefully releasing resources
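The four lifecycle stages can be sketched with `asyncio`: a background task for periodic checks, a method callable on demand, and a cancellation-based shutdown. `LifecycleAgent` and its methods are hypothetical and do not describe how Oblix schedules its agents internally.

```python
import asyncio
from typing import Any, Dict, List, Optional


class LifecycleAgent:
    """Hypothetical agent showing init, periodic checks, on-demand checks, shutdown."""

    def __init__(self, check_interval: float = 30.0) -> None:
        # Initialization: set up state (a real agent would prepare its monitors here).
        self.check_interval = check_interval
        self.history: List[Dict[str, Any]] = []
        self._task: Optional[asyncio.Task] = None

    async def check(self) -> Dict[str, Any]:
        # On-demand check: callable at any time, e.g. right before a prompt runs.
        result = {"check_number": len(self.history) + 1}
        self.history.append(result)
        return result

    async def _run_periodic(self) -> None:
        # Periodic checks: collect metrics every check_interval seconds.
        while True:
            await self.check()
            await asyncio.sleep(self.check_interval)

    def start(self) -> None:
        self._task = asyncio.create_task(self._run_periodic())

    async def shutdown(self) -> None:
        # Shutdown: cancel the background loop and wait for it to finish.
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None


async def main() -> int:
    agent = LifecycleAgent(check_interval=0.01)
    agent.start()
    await asyncio.sleep(0.05)  # let a few periodic checks run
    await agent.check()        # plus one explicit on-demand check
    await agent.shutdown()
    return len(agent.history)


count = asyncio.run(main())
print(f"checks performed: {count}")
```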
Agent Orchestration Process
When you execute a prompt, all registered agents perform checks to inform the orchestration decision:
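# Inside an async function, with an existing client that has agents hooked
response = await client.execute("Explain quantum computing")
# Agent checks that informed the orchestration decision are included in the response
agent_checks = response["agent_checks"]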
The agent_checks dictionary contains detailed information about:
- Current system state assessments
- Recommended execution targets
- Specific metrics collected
- Reasoning for orchestration decisions
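Because every registered agent checks in before execution, running those checks concurrently would keep the added latency close to that of the slowest agent. The sketch below assumes each agent exposes an async check function; `run_agent_checks`, the fake checks, and the dictionary shape are illustrative only and are not the real structure of `response["agent_checks"]`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Hypothetical: each registered agent is represented by an async check function.
CheckFn = Callable[[], Awaitable[Dict[str, Any]]]


async def run_agent_checks(agents: Dict[str, CheckFn]) -> Dict[str, Dict[str, Any]]:
    """Run every agent's check concurrently and key the results by agent name."""
    names = list(agents)
    results = await asyncio.gather(*(agents[name]() for name in names))
    return dict(zip(names, results))


async def fake_resource_check() -> Dict[str, Any]:
    return {"state": "AVAILABLE", "target": "LOCAL_CPU", "metrics": {"cpu_percent": 35.0}}


async def fake_connectivity_check() -> Dict[str, Any]:
    return {"state": "OPTIMAL", "target": "CLOUD", "metrics": {"latency_ms": 48.0}}


agent_checks = asyncio.run(run_agent_checks({
    "resource_monitor": fake_resource_check,
    "connectivity": fake_connectivity_check,
}))
for name, check in agent_checks.items():
    print(f"{name}: {check['state']} -> {check['target']}")
```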
Customizing Agent Behavior
You can customize agent behavior by providing configuration options:
Resource Thresholds
from oblix.agents.resource_monitor import ResourceMonitor
# Create with custom thresholds
resource_monitor = ResourceMonitor(
    custom_thresholds={
        "cpu_threshold": 70.0,     # CPU usage percentage (default: 80.0)
        "memory_threshold": 75.0,  # RAM usage percentage (default: 85.0)
        "load_threshold": 3.0,     # System load average (default: 4.0)
        "gpu_threshold": 75.0,     # GPU utilization percentage threshold (default: 85.0)
        "critical_gpu": 90.0,      # Critical GPU threshold (default: 95.0)
    }
)
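If `custom_thresholds` is treated as an overlay on the defaults, overriding a single key leaves the others unchanged. The sketch below assumes that behavior; `merge_thresholds`, the unknown-key check, and the ordering check between `gpu_threshold` and `critical_gpu` are hypothetical, not documented Oblix behavior. The default values are the ones listed in the comments of the example above.

```python
from typing import Dict, Optional

# Defaults as listed in the comments of the example above.
RESOURCE_DEFAULTS: Dict[str, float] = {
    "cpu_threshold": 80.0,
    "memory_threshold": 85.0,
    "load_threshold": 4.0,
    "gpu_threshold": 85.0,
    "critical_gpu": 95.0,
}


def merge_thresholds(custom: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Overlay custom values on the defaults (an assumed policy, for illustration)."""
    custom = custom or {}
    unknown = set(custom) - set(RESOURCE_DEFAULTS)
    if unknown:
        # Catch typos such as "cpu_treshold" instead of silently ignoring them.
        raise KeyError(f"unknown threshold(s): {sorted(unknown)}")
    merged = {**RESOURCE_DEFAULTS, **custom}
    # Sanity check: the critical GPU level should sit above the normal GPU threshold.
    if merged["critical_gpu"] <= merged["gpu_threshold"]:
        raise ValueError("critical_gpu must be greater than gpu_threshold")
    return merged


print(merge_thresholds({"cpu_threshold": 70.0}))
```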
Connectivity Thresholds
from oblix.agents.connectivity import ConnectivityAgent
# Create with custom connectivity thresholds
connectivity_agent = ConnectivityAgent(
    latency_threshold=150.0,    # Maximum acceptable latency in ms (default: 200.0)
    packet_loss_threshold=5.0,  # Maximum acceptable packet loss percentage (default: 10.0)
    bandwidth_threshold=10.0,   # Minimum acceptable bandwidth in Mbps (default: 5.0)
    check_interval=60,          # Seconds between connectivity checks (default: 30)
)
Intelligent Orchestration System
Oblix's agent architecture powers its intelligent orchestration system with several key benefits:
- Adaptive Execution - Dynamically selects the optimal model based on real-time conditions
- Resilience - Automatically falls back to local models when connectivity fails
- Resource Optimization - Balances workload between local and cloud resources
- Performance Tuning - Configurable thresholds to match specific hardware capabilities
- Transparency - Clear visibility into orchestration decisions
Orchestration Decision Flow
The Oblix orchestration system follows this decision process:
- Resource Assessment - ResourceMonitor evaluates system capabilities
- Connectivity Verification - ConnectivityAgent checks network conditions
- Decision Prioritization - Weighs connectivity status higher than resource availability
- Model Selection - Chooses appropriate model based on agent recommendations
- Execution - Runs the prompt on the selected model
This orchestration flow ensures that your AI workloads run on the most appropriate model given the current system conditions.
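The prioritization step can be sketched as a function that lets the ConnectivityAgent's recommendation override the ResourceMonitor's. The rules below are assumptions chosen to match the flow described above (fall back to local models when connectivity fails, otherwise weigh connectivity first), and `choose_target` is a hypothetical name.

```python
def choose_target(resource_target: str, connectivity_target: str) -> str:
    """Combine two recommendations, giving connectivity priority (assumed rules)."""
    if connectivity_target == "LOCAL":
        # Connectivity failed or is poor: fall back to local execution even if
        # the ResourceMonitor would prefer the cloud.
        return "LOCAL_CPU" if resource_target == "CLOUD" else resource_target
    if connectivity_target == "CLOUD":
        # Good connectivity takes precedence over local resource availability.
        return "CLOUD"
    # HYBRID: connectivity is inconclusive, so defer to the ResourceMonitor.
    return resource_target


for resource, connectivity in [
    ("CLOUD", "LOCAL"),
    ("LOCAL_GPU", "CLOUD"),
    ("LOCAL_GPU", "HYBRID"),
]:
    print(f"{resource} + {connectivity} -> {choose_target(resource, connectivity)}")
```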
Extensible Agent System
Oblix's agent system is designed to be extensible and scalable. The architecture allows additional agents to be integrated in the future, extending orchestration capabilities while maintaining compatibility with existing code, so the agent ecosystem can grow as new monitoring needs emerge.
GPU Monitoring on macOS
Oblix provides enhanced GPU monitoring capabilities on macOS devices:
- Real-time GPU Utilization - Live measurement of GPU processing load (0-100%)
- Memory Usage Tracking - For Apple Silicon GPUs with unified memory
- No Sudo Required - Works without elevated privileges (permissions handled during installation)
- Automatic Decision Making - Routes to appropriate execution target based on GPU state
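A GPU-aware routing rule might compare utilization against `gpu_threshold` and `critical_gpu` (defaults 85.0 and 95.0 on this page). The sketch assumes utilization is reported as a 0-1 fraction, as the metrics example on this page implies by multiplying it by 100; `gpu_recommendation` and its exact rules are hypothetical.

```python
from typing import Any, Dict, Optional


def gpu_recommendation(
    gpu: Optional[Dict[str, Any]],
    gpu_threshold: float = 85.0,
    critical_gpu: float = 95.0,
) -> str:
    """Map a GPU metrics dict to a target; the rules are illustrative assumptions."""
    if not gpu or not gpu.get("available"):
        return "LOCAL_CPU"  # no usable GPU reported
    utilization = gpu.get("utilization")
    if utilization is None:
        return "LOCAL_CPU"  # GPU present but unmeasured: stay conservative
    percent = utilization * 100  # assumed 0-1 fraction, scaled to a percentage
    if percent >= critical_gpu:
        return "CLOUD"      # GPU saturated: offload entirely
    if percent >= gpu_threshold:
        return "LOCAL_CPU"  # GPU busy: assumed fallback to the local CPU
    return "LOCAL_GPU"


print(gpu_recommendation({"available": True, "utilization": 0.30}))
```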
Accessing GPU Metrics
You can access GPU metrics programmatically:
# Inside an async function, with an existing Oblix client
# Get complete resource metrics
resource_metrics = await client.get_resource_metrics()
# Access GPU metrics
if resource_metrics.get('gpu') and resource_metrics['gpu'].get('available'):
    gpu_info = resource_metrics['gpu']
    # Check GPU utilization (scaled from a 0-1 fraction to a percentage)
    if gpu_info.get('utilization') is not None:
        print(f"GPU utilization: {gpu_info['utilization'] * 100:.2f}%")
    # Memory usage (Apple Silicon)
    if gpu_info.get('memory_utilization') is not None:
        print(f"GPU memory: {gpu_info['memory_utilization'] * 100:.2f}%")
    # Additional information
    print(f"GPU name: {gpu_info.get('name', 'Unknown')}")
    print(f"GPU type: {gpu_info.get('type', 'Unknown')}")
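To reuse these checks, for example in logging, they can be wrapped in a pure function over the metrics dictionary. `describe_gpu` is a hypothetical helper, and `sample` is a hand-written dictionary that uses the keys from the example above; it is not real Oblix output.

```python
from typing import Any, Dict, List


def describe_gpu(resource_metrics: Dict[str, Any]) -> List[str]:
    """Build readable lines from a metrics dict shaped like the example above."""
    gpu = resource_metrics.get("gpu") or {}
    if not gpu.get("available"):
        return ["GPU: not available"]
    lines = [
        f"GPU name: {gpu.get('name', 'Unknown')}",
        f"GPU type: {gpu.get('type', 'Unknown')}",
    ]
    # Optional fields are reported only when present, scaled to percentages.
    for key, label in (("utilization", "GPU utilization"), ("memory_utilization", "GPU memory")):
        value = gpu.get(key)
        if value is not None:
            lines.append(f"{label}: {value * 100:.2f}%")
    return lines


# Hand-written sample input, not real Oblix output.
sample = {"gpu": {"available": True, "name": "Example GPU", "type": "example", "utilization": 0.25}}
print("\n".join(describe_gpu(sample)))
```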
Best Practices
When working with Oblix's agent-based orchestration:
- Enable Both Agents - Use both ResourceMonitor and ConnectivityAgent together for optimal orchestration
- Start with Defaults - Use the default agent thresholds initially
- Monitor Performance - Review agent recommendations over time
- Adjust Gradually - Make small adjustments to thresholds based on observations
- Configure for Your Environment - Tune thresholds based on your specific hardware capabilities
- Tune GPU Thresholds - Adjust GPU thresholds based on your specific GPU model and workload requirements
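To review agent recommendations over time, a simple tally of recommended targets can show how often each one wins. `target_shares` and the `log` list are hypothetical; in practice you would collect targets from your own records of agent checks.

```python
from collections import Counter
from typing import Dict, Iterable


def target_shares(targets: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of checks that recommended each target, most common first."""
    counts = Counter(targets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {target: count / total for target, count in counts.most_common()}


# Hypothetical log of recommended targets gathered over a session.
log = ["LOCAL_GPU", "LOCAL_GPU", "CLOUD", "LOCAL_GPU"]
print(target_shares(log))  # {'LOCAL_GPU': 0.75, 'CLOUD': 0.25}
```

If, for instance, CLOUD dominates on hardware you expect to handle local execution, that may indicate your thresholds are stricter than needed, which is where small, gradual adjustments help.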
By leveraging Oblix's agent-based orchestration system effectively, you can build AI applications that intelligently adapt to changing conditions while optimizing for performance, cost, and reliability.