🚀 Local LLM Benchmarking Results

Intel i7-1185G7 with Iris Xe Graphics • 32GB RAM • Dell Latitude 5420

📈 Charts (titles and subtitles; images not reproduced here)

- GPU vs CPU Performance: tokens per second by model size (higher is better)
- Framework Performance: OpenVINO vs Ollama across different models
- Overall Model Performance Ranking: all tested configurations sorted by tokens/second
- GPU Speedup vs CPU: performance multiplier (1.0x = equal performance)
- Performance by Parameter Count: how model size affects inference speed
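The speedup multiplier in the charts is just the ratio of measured throughputs. A minimal sketch of how these metrics are derived, using illustrative numbers rather than the actual benchmark data:

```python
def tokens_per_second(token_count: int, elapsed_s: float) -> float:
    """Throughput: generated tokens divided by wall-clock generation time."""
    return token_count / elapsed_s

def gpu_speedup(gpu_tps: float, cpu_tps: float) -> float:
    """Multiplier relative to CPU; 1.0x means equal performance."""
    return gpu_tps / cpu_tps

# Illustrative values only, not the measured results above.
cpu_tps = tokens_per_second(256, 64.0)   # 4.0 tok/s
gpu_tps = tokens_per_second(256, 49.2)
print(f"speedup: {gpu_speedup(gpu_tps, cpu_tps):.2f}x")
```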

📊 Key Insights

⚡ GPU Advantage: Minimal

For 7B models, the GPU provides only a 1.3x speedup over the CPU. The setup complexity often outweighs the marginal performance gain.

🏆 CPU Competitiveness

CPU tied or beat GPU for models ≤4B parameters. Modern Intel CPUs are highly capable for smaller models.

🔧 Framework Matters

OpenVINO CPU was 1.6x faster than Ollama for Mistral 7B, but Ollama outperformed OpenVINO by 32% for Llama 3.1 8B.

🎯 Best Performer

Qwen3-VL 8B via Ollama achieved 5.14 tok/s—multimodal capabilities with zero GPU complexity.
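The tok/s figures above come from timing token generation. With Ollama, the final `/api/generate` response reports `eval_count` (tokens generated) and `eval_duration` (generation time in nanoseconds), from which throughput can be derived. A sketch, assuming those documented field names:

```python
def ollama_tps(response: dict) -> float:
    """Tokens per second from an Ollama /api/generate final response.

    eval_count is the number of generated tokens; eval_duration is the
    generation time in nanoseconds, so we scale by 1e9 to get seconds.
    """
    return response["eval_count"] / response["eval_duration"] * 1e9

# Synthetic response for illustration, not a real benchmark run.
resp = {"eval_count": 514, "eval_duration": 100_000_000_000}  # 100 s
print(f"{ollama_tps(resp):.2f} tok/s")  # 5.14
```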

💾 Memory Ceiling

32GB of RAM maxed out at 8B models in this setup. Larger models, which might benefit more from the GPU, would require more memory.
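A rough way to see why RAM caps model size: weight memory is approximately parameter count times bits per weight, plus overhead for the KV cache and activations. A back-of-the-envelope sketch (the 1.2x overhead factor is an assumption, not a measurement):

```python
def approx_model_ram_gb(params_billions: float, bits_per_weight: int,
                        overhead: float = 1.2) -> float:
    """Rough RAM estimate: weight bytes at the given quantization,
    inflated by an assumed overhead factor for KV cache, activations, etc."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# e.g. an 8B model at 4-bit quantization:
print(f"{approx_model_ram_gb(8, 4):.1f} GB")  # ~4.8 GB
```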

💡 Recommendation

For most use cases: use Ollama on CPU. Skip the GPU setup unless you are running sustained high-throughput workloads.