Security Best Practices¶
This guide covers security considerations and best practices for deploying Olla in production environments.
Default Security Configuration
Key Settings:server: rate_limits: global_requests_per_minute: 1000 per_ip_requests_per_minute: 100 request_limits: max_body_size: 52428800 # 50MB max_header_size: 524288 # 512KB
- Rate limiting enabled by default
- Request size limits prevent abuse
- No authentication built-in (use reverse proxy)
Environment Variables:
OLLA_SERVER_RATE_LIMITS_*
Security Principles¶
Olla follows these security principles:
- Defence in Depth: Multiple layers of security controls
- Least Privilege: Minimal permissions required
- Fail Secure: Safe defaults when errors occur
- Zero Trust: Verify all requests
Network Security¶
Bind Address Configuration¶
Control network exposure carefully:
# Development - local only
server:
host: "localhost" # Only local connections
port: 40114
# Production - controlled exposure
server:
host: "0.0.0.0" # Accept from network
port: 40114
Recommendations:
- Use
localhost
unless network access is required - Deploy behind a reverse proxy for internet exposure
- Use firewall rules to restrict access
TLS/HTTPS Configuration¶
Olla doesn't handle TLS directly. Use a reverse proxy:
nginx example:
server {
listen 443 ssl http2;
server_name api.example.com;
ssl_certificate /path/to/cert.pem;
ssl_certificate_key /path/to/key.pem;
ssl_protocols TLSv1.2 TLSv1.3;
location / {
proxy_pass http://localhost:40114;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}
Caddy example:
Firewall Rules¶
Restrict access at the network level:
# Allow only from trusted networks
iptables -A INPUT -p tcp --dport 40114 -s 10.0.0.0/8 -j ACCEPT
iptables -A INPUT -p tcp --dport 40114 -j DROP
# Or use UFW
ufw allow from 10.0.0.0/8 to any port 40114
ufw deny 40114
Rate Limiting¶
Protect against abuse and DoS attacks:
Global Rate Limits¶
Prevent system overload:
server:
rate_limits:
global_requests_per_minute: 5000 # Total system capacity
health_requests_per_minute: 1000 # Monitoring endpoints
Per-Client Rate Limits¶
Prevent single client abuse:
server:
rate_limits:
per_ip_requests_per_minute: 60 # Strict for public APIs
burst_size: 10 # Small burst allowance
Rate Limit Strategies¶
Deployment Type | Global RPM | Per-IP RPM | Burst |
---|---|---|---|
Internal API | 10000 | 1000 | 100 |
Public API | 5000 | 60 | 10 |
Development | 1000 | 100 | 50 |
High Security | 1000 | 20 | 5 |
Request Validation¶
Size Limits¶
Prevent resource exhaustion attacks:
server:
request_limits:
max_body_size: 10485760 # 10MB - adjust based on needs
max_header_size: 131072 # 128KB - usually sufficient
Guidelines:
- Set limits based on legitimate use cases
- Smaller limits for public APIs
- Monitor rejected requests
Input Validation¶
Olla validates:
- Request size limits
- Header size limits
- URL format and structure
- HTTP method restrictions
Access Control¶
Internal Endpoints¶
Restrict access to internal endpoints:
# These endpoints should not be public:
# /internal/health
# /internal/status
# /internal/process
# /version
nginx protection:
Backend Endpoint Security¶
Secure your LLM endpoints:
discovery:
static:
endpoints:
# Use internal networks
- url: "http://10.0.1.10:11434" # Internal IP
name: "internal-ollama"
type: "ollama"
# Avoid public endpoints when possible
# If required, ensure they have authentication
Deployment Security¶
Container Security¶
When using Docker:
# Run as non-root user
FROM golang:1.21-alpine AS builder
# ... build steps ...
FROM alpine:latest
RUN adduser -D -g '' appuser
USER appuser
COPY --from=builder /app/olla /usr/local/bin/
# docker-compose.yml
services:
olla:
image: ghcr.io/thushan/olla:latest
user: "1000:1000" # Non-root UID/GID
read_only: true # Read-only filesystem
security_opt:
- no-new-privileges:true
cap_drop:
- ALL
cap_add:
- NET_BIND_SERVICE # Only if needed for low ports
Kubernetes Security¶
Security policies for Kubernetes:
apiVersion: v1
kind: Pod
metadata:
name: olla
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
containers:
- name: olla
image: ghcr.io/thushan/olla:latest
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
resources:
limits:
memory: "512Mi"
cpu: "1000m"
requests:
memory: "256Mi"
cpu: "100m"
Process Isolation¶
Run Olla with minimal privileges:
# systemd service
[Service]
User=olla
Group=olla
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
NoNewPrivileges=yes
ReadWritePaths=/var/log/olla
Logging and Auditing¶
Security Logging¶
Configure appropriate logging:
server:
request_logging: true # Log all requests
logging:
level: "info"
format: "json" # Structured logs for analysis
output: "stdout"
Log Sensitive Data¶
Never log:
- API keys or tokens
- Request/response bodies with sensitive data
- User credentials
- Internal IP addresses in public logs
Audit Trail¶
Monitor these security events:
- Rate limit violations
- Request size limit violations
- Circuit breaker trips
- Failed health checks
- Configuration changes
Secrets Management¶
Configuration Files¶
Protect configuration files:
Environment Variables¶
Use environment variables for sensitive data:
Security Monitoring¶
Key Metrics¶
Monitor for security issues:
- Rate Limit Hits: Potential abuse
- Error Rates: Potential attacks
- Request Patterns: Unusual activity
- Circuit Breaker: Endpoint failures
- Response Times: DoS indicators
Alerting Thresholds¶
Set alerts for:
- Rate limit violations > 10/minute
- Error rate > 5%
- Circuit breaker trips > 3/hour
- Response time > 10s
- Memory usage > 80%
Common Security Issues¶
1. Exposed Internal Endpoints¶
Risk: Information disclosure
Mitigation:
2. No Rate Limiting¶
Risk: DoS attacks
Mitigation:
3. Large Request Acceptance¶
Risk: Resource exhaustion
Mitigation:
4. Public Bind Address¶
Risk: Unintended exposure
Mitigation:
Security Checklist¶
Production deployment checklist:
- Configure rate limiting
- Set request size limits
- Use reverse proxy with TLS
- Restrict bind address appropriately
- Configure firewall rules
- Run as non-root user
- Protect configuration files
- Enable request logging
- Monitor security metrics
- Regular security updates
- Implement log rotation
- Set up alerting
- Document security procedures
Incident Response¶
Rate Limit Violations¶
When detecting abuse:
- Check logs for source IPs
- Verify legitimate vs malicious traffic
- Adjust rate limits if needed
- Block malicious IPs at firewall
- Document the incident
Circuit Breaker Trips¶
When endpoints fail:
- Check endpoint health directly
- Review error logs
- Verify network connectivity
- Check for attacks on backends
- Implement additional monitoring
Compliance Considerations¶
Data Protection¶
- Olla doesn't store request/response data
- Logs may contain metadata
- Configure log retention appropriately
- Implement log encryption if required
Network Segmentation¶
- Deploy in appropriate network zones
- Use private networks for backend communication
- Implement network policies
- Regular security assessments
Next Steps¶
- Performance Best Practices - Optimise safely
- Monitoring Guide - Security monitoring
- Configuration Reference - Security settings