The Right Tool Series
Where The Thinking Network is abstract and cinematic, The Right Tool is analytical and precise. This is the workbench. The focus here is physical and tangible: hardware, tools, and decisions.
You do not know what an instrument can do until you force it to carry weight.
This series documents the evaluation of local LLMs against real autonomous network workloads. It is a field study built to answer one question: which model belongs where in a system that actually has to work?
The routing choices are not based on leaderboards. They are based on observed behavior, physical hardware constraints, and the absolute requirement for sovereign intelligence.
The Field Work
- Part 1: Cold Start. Testing six local AI models against five actual network lab tasks. This baseline run exposed a false negative in breach detection and a hardware reality that bypassed the GPU entirely.
Evaluating Local LLMs on Network Tasks: The Right Tool (Part 1)
Six local LLMs tested against five real network lab tasks. A baseline cold start run exposing the gap between assumed hardware capability and actual execution.

- Part 2: The Fair Fight. A flawed test lies to you about the tool. Correcting token budgets and extending timeouts gave every model its full room to operate.
Coming Soon.
- Part 3: The GPU Question. Turning on the NVIDIA T600 to measure actual VRAM utilization and thermal limits. This established the hard boundary that defines voice loop viability.
Coming Soon.
- Part 4: The Writing Test. Code tests show you if a model can build, but prose tests show you if a model can reason. Evaluating the models on unstructured incident narratives and session logs inverted the board.
Coming Soon.
- Part 5: The Routing Decision. The final routing table. Assigning Qwen 2.5 Coder to background tasks, Gemma 3 4B to voice loops, and evicting the premier cloud API entirely to maintain complete system sovereignty.
Coming Soon.
- Part 6: The Benchmark, Portable. Extracting the methodology into a reusable toolkit. Real tasks, predefined scoring, and honest error reporting built to run on your own hardware.
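
The routing decision summarized in Part 5 can be pictured as a small lookup table. The sketch below is illustrative only, not the series' actual implementation. The two model assignments come from the Part 5 summary; the task-class keys (`background`, `voice_loop`), the `Route` type, and the `pick_model` helper are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str    # model named in the Part 5 summary
    runs_on: str  # "local" for every entry: the table has no cloud option


# Hypothetical task-class keys; only the model assignments come from the text.
ROUTES = {
    "background": Route(model="Qwen 2.5 Coder", runs_on="local"),
    "voice_loop": Route(model="Gemma 3 4B", runs_on="local"),
}


def pick_model(task_class: str) -> Route:
    """Return the local route for a task class, or fail loudly."""
    try:
        return ROUTES[task_class]
    except KeyError:
        # No cloud fallback: an unrouted task is an error, not an API call.
        raise ValueError(f"no local route for task class {task_class!r}") from None
```

Raising on an unknown task class, rather than falling back to a remote API, is one way to encode the sovereignty requirement directly in the table: a task with no local route fails visibly.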