The Thinking Network (Installment 9): Reactive BGP Automation & SR Linux
Installment 9 of The Thinking Network. We build a reactive actuator that reads Prometheus telemetry and autonomously manipulates BGP Local Preference on SR Linux to mitigate SLA breaches.
Architecture Overview: Phase 6 (The Reactive Actuator)
Objective: Implement closed-loop, reactive network automation based on telemetry thresholds.
Core Technologies: BGP Local Preference manipulation, Python automation, Nokia SR Linux CLI (sr_cli), and Prometheus metrics.
The Goal: Deploy a programmatic "thermostat" that detects network latency breaches (>5ms) and autonomously shifts traffic paths without human intervention.
The sensing layer is running. Prometheus is receiving data every ten seconds. L3 latency is sitting at 1.8ms. BGP sessions are all established.
Now we introduce a problem.
docker exec clab-nbl-diamond-v1-client-l3-1 tc qdisc add dev eth1 root netem delay 8ms
One command on the client container adds 8 milliseconds of artificial delay to every packet leaving its fabric-facing interface. From the network's perspective, the path has degraded. L3 RTT climbs from under 2ms to over 9ms. The SLA threshold is 5ms.
What happens next happens without human intervention. What we are building in this phase is a reactive autonomous network. It is the necessary foundation before we can introduce actual predictive intelligence.
What the Actuator Is
Phase 6 is a monitoring loop. Every ten seconds it queries Prometheus for the current nokia_latency_l3_ms value. It compares that value to the SLA threshold. It maintains a single state variable: within SLA or rerouted.
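As a rough illustration of the sensing step, the sketch below asks Prometheus for the current value through its instant-query endpoint (/api/v1/query) and extracts the first sample from the JSON response. Only the metric name nokia_latency_l3_ms comes from this installment. The base URL, the function names, and the timeout are assumptions made for the example, not the names used in the actual actuator script.

```python
import json
import urllib.parse
import urllib.request

PROM_URL = "http://localhost:9090"  # assumption: local Prometheus on its default port
METRIC = "nokia_latency_l3_ms"      # metric name from the text


def parse_instant_value(payload):
    """Return the first sample of an instant-query response as a float, or None."""
    if payload.get("status") != "success":
        return None
    results = payload.get("data", {}).get("result", [])
    if not results:
        return None
    # Each vector sample is [unix_timestamp, "value-as-string"].
    _, raw = results[0]["value"]
    return float(raw)


def query_latency(base_url=PROM_URL, metric=METRIC, timeout=5):
    """Fetch the current value of `metric` from Prometheus (hypothetical helper)."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": metric})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_instant_value(json.load(resp))
```

Keeping the parsing separate from the HTTP call means the parsing can be exercised against a canned response without a running Prometheus.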
When latency crosses 5ms, it changes one BGP attribute. When latency recovers below 5ms, it changes it back.
That is the complete description. There is no machine learning. There is no model. There is threshold detection and a binary response. It is, as this series has said before, a thermostat.
A thermostat that operates faster than any human can respond, that never misses a breach, and that restores service without waiting for someone to log in and fix it.
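The decision logic described above fits in a few lines. This is a minimal sketch, not the actuator's actual source: the function name decide and the state constants are hypothetical, while the 5.0ms threshold and the Local Preference values 50 and 100 come from this installment. The comparisons (strictly greater than for a breach, less than or equal for recovery) follow the log lines shown later in the post.

```python
SLA_MS = 5.0  # SLA threshold from the text

OK, REROUTED = "OK", "REROUTED"


def decide(state, latency_ms, sla_ms=SLA_MS):
    """Return (new_state, local_pref_to_set or None) for one sensing cycle."""
    if state == OK and latency_ms > sla_ms:
        return REROUTED, 50   # breach: lower Local Preference
    if state == REROUTED and latency_ms <= sla_ms:
        return OK, 100        # recovery: restore the default
    return state, None        # no transition, nothing to push
```

A loop built on this would call decide once per ten-second cycle and push a BGP change only when the second element is not None, so a sustained breach does not trigger a commit on every cycle.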
BGP Local Preference
The attribute the actuator changes is called Local Preference.
Local Preference is an iBGP attribute: it is carried only within a single autonomous system and is not advertised to external peers. It is a number attached to a route that tells BGP how much to prefer it relative to alternatives. The default is 100. Higher values win, and the comparison happens early in best-path selection, ahead of AS-path length.
In this diamond topology, srl1 has two paths to srl4:
- Path A through srl2 - the primary path
- Path B through srl3 - the alternate path
Under normal conditions both paths have equal Local Preference. BGP picks Path A based on other tie-breaking criteria. Traffic flows srl1 to srl2 to srl4.
When the actuator detects a breach it sets Local Preference on srl1's iBGP group from 100 to 50. BGP reconverges across all four nodes. Every router now prefers Path B. Traffic flows srl1 to srl3 to srl4.
The actuator did not reprogram forwarding tables. It did not touch IS-IS. It changed a number in the control plane and let BGP do what BGP does.
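To make "higher values win" concrete, here is a toy model of the selection step only. It is a deliberately simplified sketch: the Path class and the tie_break field are hypothetical stand-ins for the real BGP decision process, and the example assumes the lowered value ends up attached to the routes learned via Path A. It does not model how the group-level setting on srl1 propagates through the mesh.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Path:
    name: str
    next_hop: str
    local_pref: int = 100  # BGP default
    tie_break: int = 0     # stand-in for all later tie-breakers; lower wins


def best_path(paths):
    """Pick the highest local_pref; fall back to the lowest tie_break."""
    return min(paths, key=lambda p: (-p.local_pref, p.tie_break))


path_a = Path("A", "srl2", tie_break=0)
path_b = Path("B", "srl3", tie_break=1)

# Equal Local Preference: the tie-breaker decides, and Path A wins.
normal = best_path([path_a, path_b])

# Path A lowered to 50: Local Preference decides before any tie-breaker.
rerouted = best_path([replace(path_a, local_pref=50), path_b])
```

With equal values the model returns Path A; once Path A carries 50, it returns Path B regardless of the tie-breaker.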
The SR Linux v26 Discovery
The first version of the actuator used this pattern to push the BGP change:
cmd = "enter candidate; set / network-instance default protocols bgp group ibgp-mesh local-preference 50; commit stay"
subprocess.run(["docker", "exec", container_name("srl1"), "sr_cli", "-c", cmd], ...)
SR Linux v26 does not accept semicolon-chained commands via the -c flag. The parser sees enter candidate as the full command and then hits set as an unknown token in that context. The error:
sr_cli error: Parsing error: Unknown token 'set'.
Options are ['#', '>', '>>', 'candidate', 'show', 'state', '|']
The fix required finding how SR Linux v26 actually accepts multi-line input. The answer is stdin. Pass the commands as a newline-separated string via subprocess.run's input parameter, with docker exec -i to allow stdin attachment:
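import subprocess

# pref and container_name() are defined elsewhere in the actuator script.
# One command per line; sr_cli reads them from stdin in order.
cmd = (
    "enter candidate\n"
    f"set / network-instance default protocols bgp group ibgp-mesh local-preference {pref}\n"
    "commit stay\n"
)
# -i keeps stdin attached so the piped commands reach sr_cli.
result = subprocess.run(
    ["docker", "exec", "-i", container_name("srl1"), "sr_cli"],
    input=cmd,
    capture_output=True, text=True, timeout=15
)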
The pattern was verified directly from the shell first:
printf 'enter candidate\nset / network-instance default protocols bgp group ibgp-mesh local-preference 50\ncommit stay\n' \
| docker exec -i clab-nbl-diamond-v1-srl1 sr_cli
Response:
All changes have been committed. Starting new transaction.
That is the pattern. Newlines instead of semicolons. Stdin instead of -c. The script was updated and the actuator was restarted.
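Because the first version failed with a parser error rather than a crash, it may be worth confirming that a commit actually landed before the actuator updates its state. The check below is a sketch under assumptions: the helper name commit_succeeded is hypothetical, and it assumes the confirmation text and the "Parsing error" text shown above stay stable across SR Linux releases. It inspects the output text only, since the exit-code behaviour of sr_cli for piped input is not covered here.

```python
COMMIT_OK = "All changes have been committed"  # confirmation text shown above
PARSE_ERROR = "Parsing error"                  # error text from the failed -c attempt


def commit_succeeded(stdout, stderr=""):
    """True when the output confirms the commit and reports no parser error."""
    output = stdout + stderr
    return COMMIT_OK in output and PARSE_ERROR not in output
```

With the subprocess.run call from this section, the check would be commit_succeeded(result.stdout, result.stderr).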
What the Terminal Showed
With the congestion injected and the fixed actuator running:
2026-05-17 11:41:07 [INFO] Sense | L3: 9.53ms | SLA: 5.0ms | State: OK
2026-05-17 11:41:17 [WARNING] SLA BREACH: 9.53ms > 5.0ms
2026-05-17 11:41:17 [WARNING] [ACTION] SLA violated -- rerouting via Path B (srl1->srl3)
2026-05-17 11:41:17 [WARNING] Setting BGP local-preference: 100 -> 50 on srl1
2026-05-17 11:41:17 [WARNING] [ACTION] Reroute applied -- traffic now on Path B
2026-05-17 11:41:27 [INFO] Sense | L3: 9.59ms | SLA: 5.0ms | State: REROUTED
Breach detected. BGP change committed. State updated to REROUTED.
Then the congestion was removed:
docker exec clab-nbl-diamond-v1-client-l3-1 tc qdisc del dev eth1 root
2026-05-17 11:42:47 [INFO] Sense | L3: 1.86ms | SLA: 5.0ms | State: REROUTED
2026-05-17 11:42:47 [INFO] SLA RECOVERY: 1.86ms <= 5.0ms
2026-05-17 11:42:47 [INFO] [ACTION] Latency recovered -- restoring normal routing (Path A)
2026-05-17 11:42:47 [INFO] Setting BGP local-preference: 50 -> 100 on srl1
2026-05-17 11:42:47 [INFO] [ACTION] Normal routing restored -- traffic back on Path A
2026-05-17 11:42:50 [INFO] Sense | L2: 0.07ms | L3: 2.36ms | SLA: 5.0ms
Latency recovered. BGP restored to 100. State back to OK. Traffic on Path A.
The complete autonomous cycle - breach, reroute, recovery, restore - ran without a person touching anything.
What This Is and What It Is Not
The actuator is a thermostat. It is worth being precise about this.
A thermostat is not unintelligent. A thermostat that works reliably, that fires within ten seconds of a threshold crossing, that restores service without human intervention, has real operational value. In a carrier environment, reducing mean time to remediation from fifteen minutes to ten seconds changes the customer experience in measurable ways.
But a thermostat does not anticipate. It waits for the threshold to be crossed and then reacts. Between the breach and the actuator's next sensing cycle, up to ten seconds at this polling interval, the network is in violation. That gap is the cost of reactive systems.
The gap is also what makes the next phase worth building.
Phase 7 reads the trajectory of the latency data rather than the instantaneous value. It asks: given where this number has been for the last sixty seconds, where is it going in the next thirty? If the answer is above the threshold, it acts before the threshold is crossed.
The reactive actuator is the foundation that prediction stands on. It provides the action mechanism. Phase 7 provides the earlier trigger.
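The next installment names linear regression as the forecasting method. As a preview of the idea only, the sketch below fits a least-squares line through recent (time, latency) samples and evaluates it a fixed horizon past the last sample. The function name forecast and the sample values are invented for illustration; they are not measurements from this lab and not the Phase 7 implementation.

```python
def forecast(samples, horizon_s):
    """Fit a least-squares line through (t, latency) pairs and
    evaluate it horizon_s seconds after the last sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    slope = sum((t - mean_t) * (y - mean_y) for t, y in samples) / var_t
    intercept = mean_y - slope * mean_t
    last_t = samples[-1][0]
    return intercept + slope * (last_t + horizon_s)


# Hypothetical history: six samples, ten seconds apart, rising 0.5ms per sample.
history = [(0, 2.0), (10, 2.5), (20, 3.0), (30, 3.5), (40, 4.0), (50, 4.5)]
predicted = forecast(history, horizon_s=30)
```

For this made-up history every sample is still under the 5ms threshold, but the fitted line reaches about 6.0ms thirty seconds out, which is the kind of signal an earlier trigger would act on.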
Build Record
May 21, 2026 - Dell Precision 3571 - Garuda Linux rolling - SR Linux v26.3.2
SR Linux v26 fix - stdin pipe instead of semicolons:
Before:
cmd = f"enter candidate; set / ... local-preference {pref}; commit stay"
subprocess.run(["docker", "exec", container_name(SOURCE_NODE), "sr_cli", "-c", cmd], ...)
After:
cmd = f"enter candidate\nset / ... local-preference {pref}\ncommit stay\n"
subprocess.run(["docker", "exec", "-i", container_name(SOURCE_NODE), "sr_cli"], input=cmd, ...)
Full actuator cycle captured:
11:41:17 [WARNING] SLA BREACH: 9.53ms > 5.0ms
11:41:17 [WARNING] Reroute applied -- traffic now on Path B (Local-Pref 100 -> 50)
11:42:47 [INFO] SLA RECOVERY: 1.86ms <= 5.0ms
11:42:47 [INFO] Normal routing restored -- traffic back on Path A (Local-Pref 50 -> 100)
Phase 6 gate criterion met: BGP local-pref changed to 50 within one sensing cycle of breach AND restored to 100 on recovery.
Next installment: The Predictive Brain. Linear regression reads sixty seconds of history and forecasts thirty seconds ahead. The network acts before the threshold is crossed.