Ollama

Ollama is an open-source tool for running large language models locally on your own hardware. This page covers how to use Ollama models with Oblix for intelligent model orchestration.

Installation and Setup

1. Install Oblix

Oblix is available exclusively for macOS at this time:

  1. Visit oblix.ai to download the latest version
  2. Follow the installation instructions on the website
  3. Once installed, you can import the SDK in your Python projects

Note: Currently, only macOS is supported. Windows and Linux versions are planned for future releases.
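To confirm the SDK is available in your Python environment, a quick import check is enough (a minimal sketch; it only assumes the package is importable as oblix, as in the examples below):

# Sanity check: the Oblix SDK should be importable after installation
from oblix import OblixClient, ModelType

print("Oblix SDK imported successfully")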

2. Install Ollama

Before using Ollama with Oblix, you need to install Ollama on your Mac. Download the macOS app from ollama.com/download, or install it with Homebrew:

brew install ollama

3. Start Ollama Server

Once installed, make sure the Ollama server is running:

ollama serve

This starts the Ollama server on the default port 11434.
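If you want to confirm the server is reachable before hooking models, a simple HTTP check against the default port is enough (a minimal sketch using only the Python standard library):

import urllib.request

# The Ollama server listens on http://localhost:11434 by default
try:
    with urllib.request.urlopen("http://localhost:11434", timeout=5) as resp:
        print("Ollama server is reachable, HTTP status:", resp.status)
except OSError as exc:
    print("Ollama server is not reachable:", exc)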

4. Pull Models

Pull the models you want to use:

# Examples of models you can pull
ollama pull llama2
ollama pull mistral
ollama pull gemma
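To check which models are available locally from Python, you can query the Ollama HTTP API (a sketch that assumes the /api/tags endpoint, which lists pulled models in current Ollama releases):

import json
import urllib.request

# Ask the local Ollama server which models have been pulled
with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=5) as resp:
    data = json.load(resp)

for model in data.get("models", []):
    print(model.get("name"))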

Hooking Ollama Models in Oblix

Once the Ollama server is running and a model has been pulled, hook it into Oblix:

from oblix import OblixClient, ModelType

# Initialize client
client = OblixClient(oblix_api_key="your_oblix_api_key")

# Hook Ollama model with default endpoint (http://localhost:11434)
await client.hook_model(
    model_type=ModelType.OLLAMA,
    model_name="llama2"
)

# Or specify a custom endpoint
await client.hook_model(
    model_type=ModelType.OLLAMA,
    model_name="llama2",
    endpoint="http://custom-server:11434"
)
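These snippets use top-level await for brevity. In a standalone script, wrap the same calls in an async function and run it with asyncio (a sketch built only from the calls shown above):

import asyncio

from oblix import OblixClient, ModelType

async def main():
    client = OblixClient(oblix_api_key="your_oblix_api_key")

    # Hook a local Ollama model (default endpoint http://localhost:11434)
    await client.hook_model(
        model_type=ModelType.OLLAMA,
        model_name="llama2"
    )

    # Execute a prompt against the hooked model
    response = await client.execute("Summarize what Ollama does in one sentence")
    print(response["response"])

asyncio.run(main())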

Supported Ollama Models

Ollama supports a wide range of open-source models, including:

Model Family | Examples           | Description
Llama        | llama2, llama3     | Meta's Llama models
Mistral      | mistral, mixtral   | Mistral AI's models
Phi          | phi-2              | Microsoft's Phi models
Gemma        | gemma:2b, gemma:7b | Google's Gemma models
Vicuna       | vicuna             | Fine-tuned Llama models
Orca         | orca-mini          | Microsoft's Orca models
Falcon       | falcon             | TII's Falcon models

For the most up-to-date list of available models, see the Ollama model library at ollama.com/library.

Using Ollama with Oblix

Ollama works best as part of a hybrid setup with Oblix, where you combine local and cloud models for optimal results. While it's technically possible to use just Ollama models, Oblix is designed to excel in orchestration scenarios where both local and cloud models are available.

Hybrid Local-Cloud Setup with Ollama

from oblix import OblixClient, ModelType
from oblix.agents.resource_monitor import ResourceMonitor
from oblix.agents.connectivity import ConnectivityAgent

# Initialize client
client = OblixClient(oblix_api_key="your_oblix_api_key")

# Hook local Ollama model
await client.hook_model(
    model_type=ModelType.OLLAMA,
    model_name="llama2"
)

# Hook cloud model
await client.hook_model(
    model_type=ModelType.OPENAI,
    model_name="gpt-3.5-turbo",
    api_key="your_openai_api_key"
)

# Add monitoring agents for intelligent orchestration
client.hook_agent(ResourceMonitor())
client.hook_agent(ConnectivityAgent())

# Execute prompt - Oblix will automatically choose between 
# local and cloud models based on connectivity and system resources
response = await client.execute("Explain quantum computing in simple terms")
print(response["response"])

Advanced Configuration

You can customize Ollama model settings when executing prompts:

response = await client.execute(
    "Write a story about a robot who gains consciousness",
    model_id="ollama:llama2",
    temperature=0.8,
    max_tokens=2000,
    use_gpu=True  # Enable GPU acceleration if available
)

Supported Parameters

Ollama models in Oblix support the following parameters:

Parameter   | Description               | Default
temperature | Controls randomness (0-2) | 0.7
max_tokens  | Maximum response length   | model-dependent
use_gpu     | Enable GPU acceleration   | False
stop        | List of stop sequences    | None
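For example, a lower temperature and explicit stop sequences can be combined in a single call (a sketch using only the parameters listed above):

# Lower temperature plus stop sequences for a more controlled response
response = await client.execute(
    "List three facts about the Moon",
    model_id="ollama:llama2",
    temperature=0.2,
    max_tokens=500,
    stop=["\n\n"]
)
print(response["response"])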

Performance Considerations

When using Ollama models with Oblix, keep the following in mind.

System Requirements for macOS

Models have different resource requirements on Mac:

Model Size | RAM Required | Disk Space | Apple Silicon Benefits
7B         | 8GB+         | ~4GB       | Good performance on M1/M2
13B        | 16GB+        | ~8GB       | Better with M1 Pro/Max/Ultra or M2
70B        | 32GB+        | ~40GB      | Best with M1/M2 Max/Ultra

Apple Silicon Optimization

Ollama is optimized for Apple Silicon, and Oblix leverages this effectively:

  • Metal API: Hardware acceleration on Apple Silicon chips
  • Memory management: Optimized for macOS memory architecture
  • Resource monitoring: Oblix's ResourceMonitor is specially tuned for macOS

Offline Capability

One of the key advantages of Ollama models is offline capability:

  • Models run completely locally without internet connectivity
  • Perfect for privacy-sensitive applications
  • Ideal for environments with unreliable connections
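For a fully offline workflow, you can hook only a local Ollama model and skip cloud models entirely (a sketch using the same calls as above; as noted earlier, Oblix is designed for hybrid orchestration, so this is a deliberately minimal setup):

# Offline setup: only a local Ollama model is hooked, no cloud model
client = OblixClient(oblix_api_key="your_oblix_api_key")

await client.hook_model(
    model_type=ModelType.OLLAMA,
    model_name="llama2"
)

response = await client.execute("Draft a short summary of our privacy policy")
print(response["response"])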

Oblix Orchestration Benefits

The combination of Ollama and Oblix provides unique advantages:

  • Intelligent switching between local Ollama models and cloud models
  • Automatic detection of connectivity status and resource availability
  • Optimization for macOS-specific hardware features
  • Seamless fallback to local models when offline

Troubleshooting

Common issues and their solutions:

Model Not Found

If you encounter "model not found" errors:

# Pull the model manually first
ollama pull llama2

Port Issues

If Ollama is running on a non-standard port:

await client.hook_model(
    model_type=ModelType.OLLAMA,
    model_name="llama2",
    endpoint="http://localhost:YOUR_PORT"
)

Resource Constraints on Mac

If the model is running slowly or crashing on your Mac:

  • Try a smaller model (e.g., switch from llama2:13b to llama2:7b)
  • Ensure adequate RAM is available
  • Close other memory-intensive applications
  • For M1/M2 Macs, check Activity Monitor for memory pressure

macOS-Specific Optimization

To get the best performance with Ollama on macOS:

  • Ensure the latest version of macOS is installed
  • Keep Ollama updated for latest Metal API optimizations
  • For Apple Silicon Macs, use models optimized for Metal (e.g., llama2:7b-q4_0)
  • Oblix's ResourceMonitor will automatically detect Metal compatibility

API Reference

See the OblixClient API reference for detailed information on all available methods.