# Ollama (Local Models)
Run LLMs locally with complete privacy - no API keys, no internet, no costs.
## Overview

```text
┌─────────────────────────────────────────────────────────┐
│                   Ollama Local Setup                    │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Your Machine                                           │
│  ┌───────────────────────────────────────────────────┐  │
│  │                                                   │  │
│  │  ┌──────────────┐         ┌──────────────┐        │  │
│  │  │   ADK-Rust   │  ────▶  │    Ollama    │        │  │
│  │  │    Agent     │         │    Server    │        │  │
│  │  └──────────────┘         └───────┬──────┘        │  │
│  │                                   │               │  │
│  │                           ┌───────▼──────┐        │  │
│  │                           │  Local LLM   │        │  │
│  │                           │  (llama3.2)  │        │  │
│  │                           └──────────────┘        │  │
│  │                                                   │  │
│  └───────────────────────────────────────────────────┘  │
│                                                         │
│  🔒 100% Private - Data never leaves your machine       │
│                                                         │
└─────────────────────────────────────────────────────────┘
```
## Why Ollama?

| Benefit | Description |
|---|---|
| 💰 Free | No API costs, ever |
| 🔒 Private | Data stays on your machine |
| 📴 Offline | Works without internet |
| 🎛️ Control | Choose any model, customize settings |
| ⚡ Fast | No network latency |
## Step 1: Install Ollama

### macOS

```bash
brew install ollama
```

### Linux

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

### Windows

Download the installer from [ollama.com](https://ollama.com).

## Step 2: Start the Server

```bash
ollama serve
```
You should see:

```text
Couldn't find '/Users/you/.ollama/id_ed25519'. Generating new private key.
Your new public key is: ssh-ed25519 AAAA...
time=2024-01-05T12:00:00.000Z level=INFO source=server.go msg="Listening on 127.0.0.1:11434"
```
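To confirm from code that the server is reachable before wiring up an agent, you can query Ollama's `GET /api/tags` endpoint, which lists the models you have pulled. A minimal sketch using the `reqwest` crate (not part of adk-rust; this assumes you add `reqwest`, `tokio`, and `anyhow` to your own Cargo.toml):

```rust
// Minimal connectivity check against a local Ollama server.
// Assumes reqwest, tokio, and anyhow are added as dependencies.
#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // GET /api/tags returns the locally available models as JSON.
    let body = reqwest::get("http://localhost:11434/api/tags")
        .await?
        .text()
        .await?;

    println!("Ollama is up. Local models: {body}");
    Ok(())
}
```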
## Step 3: Pull a Model

In a new terminal:

```bash
# Recommended starter model (3B parameters, fast)
ollama pull llama3.2

# Other popular models
ollama pull qwen2.5:7b   # Excellent tool calling
ollama pull mistral      # Good for code
ollama pull codellama    # Code generation
ollama pull gemma2       # Google's efficient model
```
## Step 4: Add to Your Project

```toml
[dependencies]
adk-model = { version = "0.2", features = ["ollama"] }
```

The snippet in Step 5 also uses `adk-agent`, `tokio`, and `anyhow`, so make sure those are in your `[dependencies]` as well; the Cargo.toml under the complete example below shows an alternative setup using the `adk-rust` facade crate.
## Step 5: Use in Code

```rust
use adk_model::ollama::{OllamaModel, OllamaConfig};
use adk_agent::LlmAgentBuilder;
use std::sync::Arc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // No API key needed!
    let model = OllamaModel::new(OllamaConfig::new("llama3.2"))?;

    let agent = LlmAgentBuilder::new("local_assistant")
        .instruction("You are a helpful assistant running locally.")
        .model(Arc::new(model))
        .build()?;

    // Use the agent...
    Ok(())
}
```
## Complete Working Example

```rust
use adk_rust::prelude::*;
use adk_rust::Launcher;
use std::sync::Arc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    dotenvy::dotenv().ok();

    // No API key needed!
    let model = OllamaModel::new(OllamaConfig::new("llama3.2"))?;

    let agent = LlmAgentBuilder::new("ollama_assistant")
        .description("Ollama-powered local assistant")
        .instruction("You are a helpful assistant running locally via Ollama. Be concise.")
        .model(Arc::new(model))
        .build()?;

    // Run interactive session
    Launcher::new(Arc::new(agent)).run().await?;
    Ok(())
}
```
### Cargo.toml

```toml
[dependencies]
adk-rust = { version = "0.2", features = ["cli", "ollama"] }
tokio = { version = "1", features = ["full"] }
dotenvy = "0.15"
anyhow = "1.0"
```
## Configuration Options

```rust
use adk_model::ollama::{OllamaModel, OllamaConfig};

let config = OllamaConfig::new("llama3.2")
    .with_base_url("http://localhost:11434") // Custom server URL
    .with_temperature(0.7)                   // Creativity (0.0-1.0)
    .with_max_tokens(2048);                  // Max response length

let model = OllamaModel::new(config)?;
```
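The same options work when Ollama runs on another machine, such as a GPU box on your local network. A sketch of that setup, using a hypothetical host name `gpu-box.local` (only the configuration calls shown above come from the actual API):

```rust
use adk_model::ollama::{OllamaModel, OllamaConfig};
use adk_agent::LlmAgentBuilder;
use std::sync::Arc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Point the client at a remote Ollama server instead of localhost.
    // "gpu-box.local" is a placeholder; Ollama's default port is 11434.
    let config = OllamaConfig::new("qwen2.5:7b")
        .with_base_url("http://gpu-box.local:11434")
        .with_temperature(0.2); // lower temperature for more predictable answers

    let model = OllamaModel::new(config)?;

    let agent = LlmAgentBuilder::new("remote_ollama_assistant")
        .instruction("You are a helpful assistant.")
        .model(Arc::new(model))
        .build()?;

    // Use the agent as in the earlier examples...
    let _ = agent;
    Ok(())
}
```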
## Recommended Models

| Model | Size | RAM Needed | Best For |
|---|---|---|---|
| `llama3.2` | 3B | 4GB | Fast, general purpose |
| `llama3.1:8b` | 8B | 8GB | Better quality |
| `qwen2.5:7b` | 7B | 8GB | Best tool calling |
| `mistral` | 7B | 8GB | Code and reasoning |
| `codellama` | 7B | 8GB | Code generation |
| `gemma2` | 9B | 10GB | Balanced performance |
| `llama3.1:70b` | 70B | 48GB | Highest quality |
### Choosing a Model

- Limited RAM (8GB)? → `llama3.2` (3B)
- Need tool calling? → `qwen2.5:7b`
- Writing code? → `codellama` or `mistral`
- Best quality? → `llama3.1:70b` (needs 48GB+ RAM)
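Because the model is just a string passed to `OllamaConfig::new`, one convenient pattern is to read it from an environment variable so you can switch between the models above without recompiling. A minimal sketch (the `OLLAMA_MODEL` variable name is a convention chosen here for illustration, not something adk-rust reads itself):

```rust
use adk_model::ollama::{OllamaModel, OllamaConfig};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Fall back to the small default model if OLLAMA_MODEL is not set.
    let model_name = std::env::var("OLLAMA_MODEL").unwrap_or_else(|_| "llama3.2".to_string());

    let model = OllamaModel::new(OllamaConfig::new(model_name.as_str()))?;
    println!("Using local model: {model_name}");

    // Hand `model` to an LlmAgentBuilder as in the earlier examples...
    let _ = model;
    Ok(())
}
```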
## Tool Calling with Ollama

Ollama supports function calling with compatible models:

```rust
use adk_model::ollama::{OllamaModel, OllamaConfig};
use adk_agent::LlmAgentBuilder;
use adk_tool::FunctionTool;
use std::sync::Arc;

// qwen2.5 has excellent tool calling support
let model = OllamaModel::new(OllamaConfig::new("qwen2.5:7b"))?;

let weather_tool = Arc::new(FunctionTool::new(
    "get_weather",
    "Get weather for a location",
    |_ctx, args| async move {
        let location = args.get("location").and_then(|v| v.as_str()).unwrap_or("unknown");
        Ok(serde_json::json!({
            "location": location,
            "temperature": "72°F",
            "condition": "Sunny"
        }))
    },
));

let agent = LlmAgentBuilder::new("weather_assistant")
    .instruction("Help users check the weather.")
    .model(Arc::new(model))
    .tool(weather_tool)
    .build()?;
```

Note: Tool calling uses non-streaming mode for reliability with local models.
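To chat with a tool-calling agent interactively, it plugs into the same `Launcher` used in the complete example above. A sketch, under the assumption that the `adk-rust` prelude re-exports `FunctionTool` along with the model and agent types (if it does not, import it from `adk_tool` as in the snippet above):

```rust
use adk_rust::prelude::*;
use adk_rust::Launcher;
use std::sync::Arc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // qwen2.5 has strong tool calling support among local models.
    let model = OllamaModel::new(OllamaConfig::new("qwen2.5:7b"))?;

    // Hypothetical echo tool, just to show the wiring; swap in your real tools.
    let echo_tool = Arc::new(FunctionTool::new(
        "echo",
        "Echo the provided text back to the user",
        |_ctx, args| async move {
            let text = args.get("text").and_then(|v| v.as_str()).unwrap_or("");
            Ok(serde_json::json!({ "echo": text }))
        },
    ));

    let agent = LlmAgentBuilder::new("local_tool_assistant")
        .instruction("Use your tools when they help answer the question.")
        .model(Arc::new(model))
        .tool(echo_tool)
        .build()?;

    // Same interactive loop as in the complete example.
    Launcher::new(Arc::new(agent)).run().await?;
    Ok(())
}
```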
## Example Output

```text
👤 User: Hello! What can you do?

🤖 Ollama (llama3.2): Hello! I'm a local AI assistant running on your
machine. I can help with:
- Answering questions
- Writing and editing text
- Explaining concepts
- Basic coding help

All completely private - nothing leaves your computer!
```
## Troubleshooting

### "Connection refused"

```bash
# Make sure Ollama is running
ollama serve
```

### "Model not found"

```bash
# Pull the model first
ollama pull llama3.2
```

### Slow responses

- Use a smaller model (`llama3.2` instead of `llama3.1:70b`)
- Close other applications to free RAM
- Consider GPU acceleration if available

### Check available models

```bash
ollama list
```
## Running Examples

```bash
# From the official_docs_examples folder
cd official_docs_examples/models/providers_test
cargo run --bin ollama_example
```
## Related

- Model Providers - Cloud-based LLM providers
- Local Models (mistral.rs) - Native Rust inference

Previous: ← Model Providers | Next: Local Models (mistral.rs) →