Benchmarking for Local LLMs

There’s probably a database or registry of these benchmarks out there somewhere, but I ran my own. I recently returned an M4 Mac Mini with 24GB RAM and the base CPU, and ordered a Mac Studio with an M4 Max (16-core CPU, 40-core GPU, 16-core Neural Engine) and 128GB RAM.

I also tested on a MacBook Pro with an M1 Pro and 32GB RAM that I have access to, and gave each machine the following prompt:

> write a simple Flask API with OpenAPI endpoints
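
For context, the answers being timed look roughly like this: a minimal Flask app exposing a JSON endpoint plus a hand-written OpenAPI document. The endpoint names and spec contents here are my own illustration, not actual model output.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Illustrative data; a model's answer would invent something similar.
ITEMS = [{"id": 1, "name": "widget"}]

@app.get("/items")
def list_items():
    """Return all items as JSON."""
    return jsonify(ITEMS)

@app.get("/openapi.json")
def openapi_spec():
    """Serve a minimal OpenAPI 3.0 description of the API."""
    return jsonify({
        "openapi": "3.0.0",
        "info": {"title": "Items API", "version": "1.0.0"},
        "paths": {
            "/items": {
                "get": {
                    "summary": "List items",
                    "responses": {"200": {"description": "A JSON array of items"}},
                }
            }
        },
    })

if __name__ == "__main__":
    app.run(debug=True)
```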

| LLM model | M1 Pro, 32GB RAM eval rate (tokens/s) | M4 Max, 128GB RAM eval rate (tokens/s) | Intel NUC12i5, 64GB RAM, IPEX-LLM Ollama eval rate (tokens/s) |
| --- | --- | --- | --- |
| qwq:32b | 4.97 | 18.6, 16.8, 17.3 | |
| qwen2.5:7b | 26.1 | 71.2, 73.2, 72.1 | 5.09, 4.95, 5.27 |
| am-thinking-v1 | 5.73 | 15.6, 15.4 | |
| qwen2.5:72b | | 8.75, 8.84, 8.71 | |
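
These are the eval rates Ollama prints at the end of a run (`ollama run <model> --verbose`). If you'd rather script the measurement, here is a minimal sketch against Ollama's local REST API, which reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds) in its response. The model list and the default port 11434 are assumptions about your local setup.

```python
import json
import urllib.request

# Assumes a local Ollama server on the default port with the models pulled.
OLLAMA_URL = "http://localhost:11434/api/generate"
PROMPT = "write a simple Flask API with OpenAPI endpoints"
MODELS = ["qwen2.5:7b", "qwq:32b"]  # adjust for your own setup

def eval_rate(model: str) -> float:
    """Run the prompt once and return generation speed in tokens/s."""
    payload = json.dumps(
        {"model": model, "prompt": PROMPT, "stream": False}
    ).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        stats = json.load(resp)
    # eval_count tokens generated over eval_duration nanoseconds.
    return stats["eval_count"] / (stats["eval_duration"] / 1e9)

if __name__ == "__main__":
    for model in MODELS:
        # Three runs per model, matching the repeated numbers in the table.
        rates = [eval_rate(model) for _ in range(3)]
        print(model, ", ".join(f"{r:.2f}" for r in rates), "tokens/s")
```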

I vlogged about these benchmarks, and the setup of my new machine, here.