Ollama¶
Ollama is an open-source tool that lets you run open-source large language models locally on your own hardware. This page covers how to use Ollama models with Oblix for intelligent model orchestration.
Installation and Setup¶
1. Install Oblix¶
Oblix is available exclusively for macOS at this time:
- Visit oblix.ai to download the latest version
- Follow the installation instructions on the website
- Once installed, you can import the SDK in your Python projects
Note: Currently, only macOS is supported. Windows and Linux versions are planned for future releases.
2. Install Ollama¶
Before using Ollama with Oblix, you need to install Ollama on your Mac:
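Download the macOS app from the Ollama website and follow its setup prompts, or, if you already use Homebrew, install the command-line version with the formula below:
brew install ollama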
3. Start Ollama Server¶
Once installed, make sure the Ollama server is running:
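If you installed the desktop app, the server typically starts automatically in the background; otherwise, start it from a terminal:
ollama serve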
This starts the Ollama server on the default port 11434.
4. Pull Models¶
Pull the models you want to use:
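For example, to pull the Llama 2 model used throughout this page:
ollama pull llama2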
Hooking Ollama Models in Oblix¶
from oblix import OblixClient, ModelType
# Initialize client
client = OblixClient(oblix_api_key="your_oblix_api_key")
# Hook Ollama model with default endpoint (http://localhost:11434)
await client.hook_model(
    model_type=ModelType.OLLAMA,
    model_name="llama2"
)

# Or specify a custom endpoint
await client.hook_model(
    model_type=ModelType.OLLAMA,
    model_name="llama2",
    endpoint="http://custom-server:11434"
)
Supported Ollama Models¶
Ollama supports a wide range of open-source models, including:
Model Family | Examples | Description |
---|---|---|
Llama | llama2, llama3 | Meta's Llama models |
Mistral | mistral, mixtral | Mistral AI's models |
Phi | phi-2 | Microsoft's Phi models |
Gemma | gemma:2b, gemma:7b | Google's Gemma models |
Vicuna | vicuna | Fine-tuned Llama models |
Orca | orca-mini | Microsoft's Orca models |
Falcon | falcon | TII's Falcon models |
For the most up-to-date list of available models, see the Ollama model library.
Using Ollama with Oblix¶
Ollama works best as part of a hybrid setup with Oblix, where you combine local and cloud models for optimal results. While it's technically possible to use just Ollama models, Oblix is designed to excel in orchestration scenarios where both local and cloud models are available.
Hybrid Local-Cloud Setup with Ollama¶
from oblix import OblixClient, ModelType
from oblix.agents.resource_monitor import ResourceMonitor
from oblix.agents.connectivity import ConnectivityAgent
# Initialize client
client = OblixClient(oblix_api_key="your_oblix_api_key")
# Hook local Ollama model
await client.hook_model(
    model_type=ModelType.OLLAMA,
    model_name="llama2"
)

# Hook cloud model
await client.hook_model(
    model_type=ModelType.OPENAI,
    model_name="gpt-3.5-turbo",
    api_key="your_openai_api_key"
)
# Add monitoring agents for intelligent orchestration
client.hook_agent(ResourceMonitor())
client.hook_agent(ConnectivityAgent())
# Execute prompt - Oblix will automatically choose between
# local and cloud models based on connectivity and system resources
response = await client.execute("Explain quantum computing in simple terms")
print(response["response"])
Advanced Configuration¶
You can customize Ollama model settings when executing prompts:
response = await client.execute(
    "Write a story about a robot who gains consciousness",
    model_id="ollama:llama2",
    temperature=0.8,
    max_tokens=2000,
    use_gpu=True  # Enable GPU acceleration if available
)
Supported Parameters¶
Ollama models in Oblix support the following parameters (a short usage example follows the table):
Parameter | Description | Default |
---|---|---|
temperature | Controls randomness (0-2) | 0.7 |
max_tokens | Maximum response length | model-dependent |
use_gpu | Enable GPU acceleration | False |
stop | List of stop sequences | None |
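As a sketch, continuing with the client from the examples above, the stop parameter is passed alongside the others (the prompt and the stop strings themselves are illustrative):
response = await client.execute(
    "List three facts about the Llama model family.",
    model_id="ollama:llama2",
    temperature=0.7,
    max_tokens=500,
    stop=["\n\n"]  # stop generating at the first blank line (illustrative choice)
)
print(response["response"])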
Performance Considerations¶
When using Ollama models with Oblix, consider the following:
System Requirements for macOS¶
Models have different resource requirements on Mac:
Model Size | RAM Required | Disk Space | Apple Silicon Benefits |
---|---|---|---|
7B | 8GB+ | ~4GB | Good performance on M1/M2 |
13B | 16GB+ | ~8GB | Better with M1 Pro/Max/Ultra or M2 |
70B | 32GB+ | ~40GB | Best with M1/M2 Max/Ultra |
Apple Silicon Optimization¶
Ollama is optimized for Apple Silicon, and Oblix leverages this effectively:
- Metal API: Hardware acceleration on Apple Silicon chips
- Memory management: Optimized for macOS memory architecture
- Resource monitoring: Oblix's ResourceMonitor is specially tuned for macOS
Offline Capability¶
One of the key advantages of Ollama models is offline capability:
- Models run completely locally without internet connectivity (see the local-only sketch after this list)
- Perfect for privacy-sensitive applications
- Ideal for environments with unreliable connections
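Here is a minimal local-only sketch, assuming only an Ollama model is hooked (per the note above, Oblix is designed to shine when a cloud model is also available); the prompt is illustrative:
from oblix import OblixClient, ModelType

# Local-only setup: a single Ollama model, no cloud fallback
client = OblixClient(oblix_api_key="your_oblix_api_key")
await client.hook_model(
    model_type=ModelType.OLLAMA,
    model_name="llama2"
)
# Runs entirely on-device, so it keeps working without internet connectivity
response = await client.execute("Summarize the benefits of local inference")
print(response["response"])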
Oblix Orchestration Benefits¶
The combination of Ollama and Oblix provides unique advantages:
- Intelligent switching between local Ollama models and cloud models
- Automatic detection of connectivity status and resource availability
- Optimization for macOS-specific hardware features
- Seamless fallback to local models when offline
Troubleshooting¶
Common issues and their solutions:
Model Not Found¶
If you encounter "model not found" errors, the model usually hasn't been pulled into your local Ollama library yet, or the model_name passed to hook_model doesn't match the Ollama tag exactly.
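Check which models are available locally and pull the missing one (llama2 here matches the examples above):
# Show models already pulled locally
ollama list
# Pull the missing model
ollama pull llama2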
Port Issues¶
If Ollama is running on a non-standard port:
await client.hook_model(
    model_type=ModelType.OLLAMA,
    model_name="llama2",
    endpoint="http://localhost:YOUR_PORT"
)
Resource Constraints on Mac¶
If the model is running slowly or crashing on your Mac:
- Try a smaller model (e.g., switch from llama2:13b to llama2:7b, as shown in the sketch after this list)
- Ensure adequate RAM is available
- Close other memory-intensive applications
- For M1/M2 Macs, check Activity Monitor for memory pressure
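For example, re-hooking a smaller variant is a one-line change to the examples above (llama2:7b is Ollama's tag for the 7B build):
# Swap in the 7B model in place of the 13B one
await client.hook_model(
    model_type=ModelType.OLLAMA,
    model_name="llama2:7b"
)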
macOS-Specific Optimization¶
To get the best performance with Ollama on macOS:
- Ensure the latest version of macOS is installed
- Keep Ollama updated for latest Metal API optimizations
- For Apple Silicon Macs, use models optimized for Metal (e.g., llama2:7b-q4_0, as shown below)
- Oblix's ResourceMonitor will automatically detect Metal compatibility
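As a sketch, hooking the quantized build mentioned above works like any other tag; pull it first with ollama pull llama2:7b-q4_0:
# 4-bit quantized Llama 2 7B: lighter on memory and well suited to Metal
await client.hook_model(
    model_type=ModelType.OLLAMA,
    model_name="llama2:7b-q4_0"
)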
API Reference¶
See the OblixClient API reference for detailed information on all available methods.