Filters¶
Olla provides a powerful filtering system that allows you to control which models and profiles are available in your deployment. Filters use glob patterns with wildcard support, making it easy to include or exclude resources based on naming patterns.
Core Concepts¶
Filter Configuration¶
Filters are configured using `include` and `exclude` lists:
```yaml
filter:
  include:        # Only items matching these patterns are allowed
    - "llama*"
    - "mistral*"
  exclude:        # Items matching these patterns are rejected
    - "*test*"
    - "*debug*"
```
Pattern Matching¶
Olla supports glob-style patterns with the `*` wildcard:
| Pattern | Matches | Examples |
|---|---|---|
| `*` | Everything | All models/profiles |
| `llama*` | Starts with "llama" | `llama3-8b`, `llama2-70b` |
| `*-7b` | Ends with "-7b" | `mistral-7b`, `qwen-7b` |
| `*embed*` | Contains "embed" | `nomic-embed-text`, `text-embedding-ada` |
| `deepseek-*` | Starts with "deepseek-" | `deepseek-coder`, `deepseek-r1` |
Precedence Rules¶
- Exclude takes precedence over include - If an item matches both include and exclude patterns, it will be excluded
- No filter means allow all - If no filter is specified, all items are allowed
- Empty include means exclude all - An explicitly empty include list (with no exclude patterns) blocks everything
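These rules can be seen together in a small example (model names are illustrative):

```yaml
filter:
  include:
    - "llama*"    # llama3-8b and llama3-test both match
  exclude:
    - "*test*"    # llama3-test also matches here, so exclude wins
# Result: llama3-8b is allowed; llama3-test is excluded;
# mistral-7b matches no include pattern, so it is also excluded
```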
Where Filters Can Be Applied¶
Filters can be applied at multiple levels in your Olla configuration:
1. Profile Filtering¶
Control which inference profiles are loaded at startup.
2. Endpoint Model Filtering¶
Filter models at the endpoint level during discovery:
```yaml
discovery:
  static:
    endpoints:
      - name: ollama-prod
        url: http://localhost:11434
        model_filter:
          exclude:
            - "*embed*"   # No embedding models
            - "*test*"    # No test models
            - "nomic-*"   # No nomic models
```
3. Global Model Filtering (Planned)¶
Filter models globally across all endpoints. This feature is planned for a future release.
Common Use Cases¶
Production Deployment¶
Exclude test and experimental models:
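A sketch of such a filter (the endpoint name and URL are placeholders for your own deployment):

```yaml
discovery:
  static:
    endpoints:
      - name: ollama-prod          # placeholder name
        url: http://localhost:11434
        model_filter:
          exclude:
            - "*test*"             # No test models
            - "*experimental*"     # No experimental builds
            - "*debug*"            # No debug variants
```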
Specialized Services¶
Embedding Service¶
Only allow embedding models:
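A sketch using common embedding-model naming conventions (verify against your own model names):

```yaml
model_filter:
  include:
    - "*embed*"   # e.g. nomic-embed-text
    - "bge-*"     # BGE embedding models
    - "e5-*"      # E5 embedding models
```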
Chat Service¶
Only allow conversational models:
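A sketch (the model families listed are illustrative):

```yaml
model_filter:
  include:
    - "llama*"
    - "mistral*"
    - "qwen*"
  exclude:
    - "*embed*"   # keep embedding models off the chat endpoint
```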
Code Generation Service¶
Only allow code-focused models:
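A sketch (patterns are illustrative; match them to the model names your endpoints actually expose):

```yaml
model_filter:
  include:
    - "*code*"        # e.g. codellama, deepseek-coder
    - "starcoder*"    # illustrative
```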
Model Size Restrictions¶
Small Models Only (≤13B)¶
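Filters match names, not parameter counts, so size restrictions rely on the common `-Nb` suffix naming convention. A sketch (confirm your models follow this convention):

```yaml
model_filter:
  include:
    - "*-1b"
    - "*-3b"
    - "*-7b"
    - "*-8b"
    - "*-13b"
```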
Large Models Only (≥34B)¶
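Assuming the same `-Nb` suffix naming convention, a sketch:

```yaml
model_filter:
  include:
    - "*-34b"
    - "*-70b"
    - "*-72b"
```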
Provider-Specific Filtering¶
Only OpenAI-Compatible Models¶
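Name patterns can only approximate provider affiliation. A hedged sketch assuming OpenAI-style model names (adjust to the names your endpoints actually expose):

```yaml
model_filter:
  include:
    - "gpt-*"             # illustrative OpenAI-style names
    - "text-embedding-*"  # illustrative
```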
Local Models Only¶
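One approach is to exclude names associated with hosted providers, leaving only locally served models (provider names are illustrative):

```yaml
model_filter:
  exclude:
    - "gpt-*"      # illustrative hosted-provider names
    - "claude-*"   # illustrative
```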
Performance Considerations¶
- Filter evaluation is cached - Pattern matching results are cached for performance
- Discovery-time filtering - Models are filtered during discovery, not at request time
- Minimal overhead - The filter system adds negligible latency to model discovery
Debugging Filters¶
To see which models are being filtered:
1. Enable debug logging.
2. Check the logs during model discovery.
3. Use the status endpoint to verify which models are active.
Best Practices¶
- Be specific with patterns - Avoid overly broad patterns that might accidentally exclude needed models
- Test filters in development - Verify your filters work as expected before deploying to production
- Document your filters - Add comments explaining why certain patterns are included/excluded
- Use exclude for security - Explicitly exclude sensitive or inappropriate models
- Consider maintenance - Design patterns that won't break when new models are added
Examples¶
Complete Configuration Example¶
```yaml
# Exclude embedding models from a general-purpose endpoint
discovery:
  static:
    endpoints:
      - name: chat-endpoint
        url: http://localhost:11434
        type: ollama
        model_filter:
          include:
            - "llama*"          # Llama family
            - "mistral*"        # Mistral family
            - "qwen*"           # Qwen family
          exclude:
            - "*embed*"         # No embeddings
            - "*-uncensored"    # No uncensored variants
      - name: embedding-endpoint
        url: http://localhost:11435
        type: ollama
        model_filter:
          include:
            - "*embed*"   # Only embeddings
            - "bge-*"     # BGE models
            - "e5-*"      # E5 models

# Only load production-ready profiles
proxy:
  profile_filter:
    exclude:
      - "*test*"
      - "*debug*"
```
Migration from Unfiltered Setup¶
If you're adding filters to an existing deployment:
1. Audit current models: List all models currently in use
2. Design inclusive patterns: Start with broad includes
3. Add specific excludes: Gradually add exclusions
4. Test thoroughly: Verify critical models aren't filtered
5. Deploy gradually: Roll out to staging before production
Related Documentation¶
- Configuration Reference - Complete configuration options
- Configuration Examples - Practical configuration examples
- Profile System - Understanding inference profiles