In Part 1, I argued that SSH is a slow transport for network automation at scale and that HTTPS is fundamentally faster. Round-trip analysis and back-of-napkin math are useful, but they’re not proof. This post is the proof.

I built clibench, a dual-protocol network device emulator and benchmark client that measures the difference at realistic latencies sourced from Verizon’s published backbone measurements.

Design Constraints

For the comparison to mean anything, the test has to be fair. That means:

  • Same device, same commands, same output. Both transports hit the same device.Device struct. The only variable is how the command arrives and how the response leaves.
  • Same latency for both. Delay is injected at the TCP connection level, not the application level. Both SSH and HTTPS experience identical network conditions.
  • Realistic latency values. No made-up numbers. Every profile is sourced from published measurements.
  • Multiple modes per transport. SSH gets tested with fresh connections and with connection reuse (ControlMaster-style). HTTPS gets tested with fresh connections, keep-alive, multi-command batching, and config push. Each mode represents a real-world usage pattern.

Architecture

clibench is written in Go. The project has nine packages:

internal/
  bench/        Benchmark runner: SSH, HTTPS, proxy, and PTY modes
  device/       Shared command engine: prefix matching, transcript loading
  sshserver/    crypto/ssh listener, CiSSHGo patterns
  httpserver/   net/http + TLS, ASA-style /admin/exec/ and /admin/config
  proxy/        HTTPS→SSH edge proxy (fresh + pooled modes)
  netem/        tc netem latency injection (Linux, requires sudo)
  latency/      Userspace delay injection (fallback, no root)
  stats/        Benchmark statistics: percentile, summarize, parallel runner
  tlsutil/      Shared self-signed TLS config generator

Benchmark architecture: client on the left, latency injection in the middle, SSH and HTTPS servers both feeding into a shared device.Device on the right

The benchmark client embeds its own server. No separate process needed. Latency is injected at the kernel level using Linux tc netem, applied per-port on the loopback interface so both SSH and HTTPS experience identical network conditions.

The Shared Command Engine

Both servers use the same device.Device:

type Device struct {
    Hostname      string
    Username      string
    Password      string
    commands      map[string]string // "show version" -> transcript text
    transcriptDir string
}

func (d *Device) Exec(input string) string {
    input = strings.TrimSpace(input)
    if input == "" {
        return ""
    }
    // exact match first
    if resp, ok := d.commands[input]; ok {
        return resp
    }
    // prefix match ("sh ver" -> "show version"): each word of the
    // input must be a prefix of the corresponding word in a known command
    inWords := strings.Fields(input)
    var matches []string
    for full := range d.commands {
        fullWords := strings.Fields(full)
        if len(fullWords) != len(inWords) {
            continue
        }
        ok := true
        for i, w := range inWords {
            if !strings.HasPrefix(fullWords[i], w) {
                ok = false
                break
            }
        }
        if ok {
            matches = append(matches, full)
        }
    }
    switch len(matches) {
    case 1:
        return d.commands[matches[0]]
    case 0:
        return "% Unknown command\n"
    default:
        return "% Ambiguous command\n"
    }
}

Command transcripts are plain text files loaded from a directory. The filename convention maps to the command: show_version.txt becomes show version. Templates support {{.Hostname}} substitution. This follows the same pattern as CiSSHGo, which I wrote about recently.

The SSH Server

The SSH side uses Go’s crypto/ssh package with an ed25519 host key generated at startup. It supports both exec mode (ssh host "show version") and interactive shell sessions with prompt rendering and command matching. The benchmark client tests both, since real-world tools are split: libraries like Go’s x/crypto/ssh use exec mode, while Netmiko, Ansible, and Scrapli use PTY/shell.

// Exec mode: split newline-delimited payloads for batch support
case "exec":
    if len(req.Payload) > 4 {
        execCmd = string(req.Payload[4:])
    }
    req.Reply(true, nil)
    if execCmd != "" {
        var out strings.Builder
        for _, line := range strings.Split(execCmd, "\n") {
            cmd := strings.TrimSpace(line)
            if cmd != "" {
                out.WriteString(s.dev.Exec(cmd))
            }
        }
        io.WriteString(ch, out.String())
    }
    ch.SendRequest("exit-status", false, []byte{0, 0, 0, 0})

The HTTPS Server

The HTTPS side generates a self-signed P-256 ECDSA certificate at startup (negotiating TLS 1.3 with TLS_AES_128_GCM_SHA256) and exposes the same endpoints as the Cisco ASA HTTP interface:

  • GET /admin/exec/show+version. Single command, URL-encoded
  • GET /admin/exec/cmd1/cmd2/cmd3. Multiple commands, slash-separated
  • POST /admin/config. Bulk commands, newline-delimited body

Authentication is HTTP Basic over TLS, matching the ASA’s behavior.

func (s *Server) handleExec(w http.ResponseWriter, r *http.Request) {
    path := strings.TrimPrefix(r.URL.Path, "/admin/exec/")
    parts := strings.Split(path, "/")
    var out strings.Builder
    for _, p := range parts {
        cmd := strings.ReplaceAll(p, "+", " ")
        cmd = strings.TrimSpace(cmd)
        if cmd == "" {
            continue
        }
        out.WriteString(s.dev.Exec(cmd))
    }
    w.Header().Set("Content-Type", "text/plain")
    io.WriteString(w, out.String())
}

The Benchmark Client

The client calls both transports with the same commands and measures wall-clock time. The key difference is visible in the code: SSH requires connection setup, auth, channel open, and per-command exec requests. HTTPS is a single HTTP call:

// SSH: connect + auth + exec per command
client, _ := ssh.Dial("tcp", addr, sshConfig)
defer client.Close()
for _, cmd := range commands {
    session, _ := client.NewSession()
    output, _ := session.CombinedOutput(cmd)
    session.Close()
    _ = output // per-command output handling elided
}

// HTTPS: one request, all commands
url := "https://" + addr + "/admin/exec/" + strings.Join(commands, "/")
resp, _ := httpClient.Get(url)
body, _ := io.ReadAll(resp.Body)
resp.Body.Close()

Latency Injection

Latency is injected using Linux tc netem on the loopback interface, configured entirely via the vishvananda/netlink library, the same library Docker and Kubernetes use. The tool sets up a prio qdisc with per-port u32 filters so that traffic to the SSH and HTTPS server ports gets the configured one-way delay, while other loopback traffic is unaffected. This requires root or CAP_NET_ADMIN, the same requirement as most raw-socket networking tools.

Netlink qdisc setup code:
// Qdisc setup via netlink - no shell-out to tc
prio := netlink.NewPrio(netlink.QdiscAttrs{
    LinkIndex: loopbackIndex,
    Handle:    netlink.MakeHandle(1, 0),
    Parent:    netlink.HANDLE_ROOT,
})
prio.Bands = 4
prio.PriorityMap = [16]uint8{} // unmatched traffic → band 0 (no delay)
netlink.QdiscAdd(prio)

netem := netlink.NewNetem(
    netlink.QdiscAttrs{Parent: netlink.MakeHandle(1, 2)},
    netlink.NetemQdiscAttrs{Latency: uint32(delay.Microseconds())},
)
netlink.QdiscAdd(netem)

// Per-port u32 filter: match dport, classify to delayed band
netlink.FilterAdd(&netlink.U32{
    FilterAttrs: netlink.FilterAttrs{Parent: netlink.MakeHandle(1, 0), Protocol: 0x0800},
    ClassId:     netlink.MakeHandle(1, 2),
    Sel: &netlink.TcU32Sel{
        Flags: netlink.TC_U32_TERMINAL,
        Keys:  []netlink.TcU32Key{{Val: uint32(port), Mask: 0xffff, Off: 20}},
    },
})

Because netem operates inside the kernel’s network stack, it captures real TCP behavior: Nagle’s algorithm, delayed ACKs, TCP window scaling, and proper per-packet delay. Every packet in both directions, client-to-server and server-to-client, experiences the configured delay. This is more accurate than userspace delay injection, which can’t distinguish between logically separate protocol exchanges that happen to be coalesced into a single write.

A -userspace flag is available as a fallback for environments where root isn’t available, but the published numbers all use tc netem.

Latency Profiles

Each profile corresponds to a real network path, sourced from Verizon Enterprise’s monthly IP latency statistics (March 2026). The simulated RTT values are rounded for readability; the Verizon measured column shows the exact source data:

Profile            Simulated RTT   Real-world path     Verizon measured RTT
local              0ms             Co-located          Baseline
campus             2ms             Same data center    AWS/Prisma: 1-2ms
regional           30ms            US backbone         29.9ms
continental        70ms            NYC ↔ London        70.2ms
intercontinental   150ms           US ↔ Hong Kong      145.5ms
transpacific       175ms           NA ↔ Taiwan         175.2ms

Benchmark Modes

The client tests these scenarios across both transports (plus a multi-command GET mode when running more than one command per iteration):

SSH Exec Modes

SSH exec mode opens a channel, sends a command, and reads the output. This is what Go’s x/crypto/ssh, Paramiko’s exec_command(), and OpenSSH’s ssh host "cmd" use. Each command gets its own channel.

Mode             What it measures
ssh/fresh-conn   Full SSH lifecycle per iteration: TCP + handshake + auth + channel + exec
ssh/reuse-conn   One SSH connection shared across all iterations (ControlMaster-style)
ssh/batch-exec   Multi-line command string over a single exec session

SSH PTY/Shell Modes

SSH PTY mode opens an interactive shell with a pseudo-terminal, sends commands as keystrokes, and detects the prompt after each command. This is what Netmiko, Ansible network_cli, Scrapli, and most real-world network automation tools use. Many network devices don’t support exec mode properly, and automation tools need prompt detection, pagination control, and mode transitions. (Part 1 called this the “screen-scraping tax”, the cost of parsing an unstructured byte stream.)

The PTY benchmark includes session preparation (sending terminal length 0 and terminal width 511 before the first command) and per-command echo verification (reading until the echoed command appears, then reading until the prompt), matching the protocol-level behavior common to all major tools.

Mode            What it measures
ssh/pty-fresh   Full SSH lifecycle + PTY + shell + session prep + commands with echo verification
ssh/pty-reuse   Shared connection, new PTY/shell per iteration with session prep

HTTPS Modes

Mode               What it measures
https/fresh-conn   New TCP + TLS handshake per iteration (DisableKeepAlives: true)
https/keep-alive   Single TCP + TLS connection reused across all iterations (default HTTP behavior)
https/batch-post   All commands in one POST body (/admin/config)
https/multi-cmd    All commands in one GET request (ASA /admin/exec/cmd1/cmd2 syntax)

Each mode runs N iterations, each executing 5 show version commands. The client reports min, max, average, p50, p95, and standard deviation.
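The summary math in the stats package is standard. A sketch of a nearest-rank percentile (the actual clibench implementation may interpolate differently):

```go
import "sort"

// percentile returns the p-th percentile (0 < p <= 100) of samples
// using the nearest-rank method on a sorted copy.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	rank := int(p/100*float64(len(s))+0.5) - 1
	if rank < 0 {
		rank = 0
	}
	if rank >= len(s) {
		rank = len(s) - 1
	}
	return s[rank]
}
```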

Running It Yourself

git clone https://github.com/lykinsbd/clibench.git
cd clibench
go build -o bin/bench ./cmd/bench/

# Baseline - no added latency (no root needed)
./bin/bench -latency local -iterations 20 -commands 5

# US backbone - 30ms RTT (requires root for tc netem)
sudo ./bin/bench -latency regional -iterations 20 -commands 5

# US to Hong Kong - 150ms RTT
sudo ./bin/bench -latency intercontinental -iterations 20 -commands 5

# Fallback: userspace delay injection (no root, less accurate)
./bin/bench -latency regional -iterations 20 -commands 5 -userspace

clibench embeds its own server. No separate process needed. Non-local profiles require root (or CAP_NET_ADMIN) for tc netem on the loopback interface. Output is JSON.

Results

5 commands per iteration, all times in milliseconds (average of 20 iterations). Latency injected via tc netem on the loopback interface.

At zero latency, SSH exec mode is the fastest option. There’s no round-trip penalty, and SSH’s binary framing has less per-message overhead than HTTP headers + TLS:

Bar chart at 0ms RTT showing SSH exec-fresh at 3.9ms, PTY-fresh at 4.3ms, and HTTPS fresh at 12.0ms

The moment real network latency enters the picture, the result flips. At 30ms RTT, a US backbone path per Verizon’s March 2026 measurements, HTTPS batch is 16.8x faster than SSH PTY fresh and 15.9x faster than SSH exec fresh-conn:

Bar chart at 30ms RTT showing SSH exec-fresh at 494ms, PTY-fresh at 522ms, and HTTPS batch at 31ms

At intercontinental distances (US ↔ Hong Kong, 150ms RTT), SSH PTY fresh takes 2.6 seconds for 5 commands. SSH exec fresh takes 2.4 seconds. HTTPS batch does it in 151ms:

Bar chart at 150ms RTT showing SSH PTY-fresh at 2,565ms, exec-fresh at 2,412ms, and HTTPS batch at 151ms

Exec Mode vs PTY/Shell Mode

The PTY overhead comes from session preparation (terminal length 0, terminal width 511) and per-command echo verification. At higher latencies, this adds up:

Profile            RTT     SSH exec fresh   SSH PTY fresh   PTY overhead
local              0ms     3.9ms            4.3ms           +0.4ms
campus             2ms     40ms             42ms            +2ms
regional           30ms    494ms            522ms           +28ms
continental        70ms    1,144ms          1,213ms         +69ms
intercontinental   150ms   2,412ms          2,565ms         +153ms

The PTY overhead scales linearly with RTT because the session prep commands add roughly one extra round trip of overhead before the first real command runs. At 150ms RTT, that’s ~150ms of pure protocol overhead. And this is the best case. Real devices add processing time, ANSI escape codes, and prompt detection regex that the emulator doesn’t capture.

Speedup vs SSH PTY fresh (what most tools actually use)

Profile            RTT     SSH exec fresh   SSH reuse   HTTPS keep-alive   HTTPS batch
local              0ms     1.1x             3.9x        7.2x               19.1x
campus             2ms     1.1x             1.8x        3.3x               16.3x
regional           30ms    1.1x             1.7x        3.3x               16.8x
continental        70ms    1.1x             1.7x        3.4x               16.3x
intercontinental   150ms   1.1x             1.7x        3.4x               17.0x

What the Numbers Say

All results are from 20 iterations per profile. Variance was low: at regional (30ms), SSH exec fresh-conn p50 was 492ms, with p95 at 508ms.

At zero latency, SSH exec wins. When there’s no network delay, TLS handshake overhead dominates. SSH exec fresh-conn takes 3.9ms; HTTPS fresh-conn takes 12.0ms. But PTY mode is already slower at 4.3ms due to session prep overhead. The reuse modes tell a different story: SSH exec reuse (1.1ms) and HTTPS keep-alive (0.6ms) are both sub-millisecond. Once the handshake is amortized, both protocols are fast.

Most automation tools don’t use exec mode. As covered above, they use PTY/shell mode for prompt detection, pagination control, and mode transitions. The PTY numbers are what your automation actually experiences.

SSH reuse helps, but not enough. Sharing one SSH connection (the ControlMaster pattern) eliminates the handshake cost, but each command still requires its own round trips. The improvement is consistent at ~1.7x. Real, but modest.

HTTPS keep-alive is ~3.4x faster at any real latency. Every HTTP client library does connection pooling by default. You don’t have to configure anything special. Just reuse the http.Client. At 30ms RTT, that’s 158ms vs 522ms (PTY fresh).

HTTPS batch is ~17x faster. Batching all commands into a single HTTP request eliminates per-command round trips entirely. The entire exchange costs one round trip regardless of command count. At 150ms RTT, that’s 151ms vs 2,565ms (PTY fresh). Unlike keep-alive (which still pays one round trip per command), batch mode pays a fixed cost regardless of command count.

The advantage grows with command count, for per-command modes. At 30ms RTT with 50 commands, SSH exec fresh-conn takes 3,253ms. SSH PTY fresh takes 1,912ms (PTY avoids per-command channel overhead but pays per-command echo verification). HTTPS keep-alive takes 1,548ms. But HTTPS batch takes just 33ms. A ~99x improvement over exec fresh and ~58x over PTY fresh. SSH batch-exec shows the same flat scaling (~250ms regardless of command count), confirming this is a property of batching, not the transport.

Time vs command count at 30ms RTT: batch modes stay flat while per-command modes scale linearly

What This Means at Scale

If you’re managing 100 devices serially (worst case, no concurrency) using PTY mode, which is what Netmiko and Ansible actually do:

Profile            RTT     SSH PTY fresh (total)   HTTPS batch (total)   Time saved
regional           30ms    52s                     3.1s                  49s
continental        70ms    121s (2.0 min)          7.4s                  114s
intercontinental   150ms   257s (4.3 min)          15s                   242s (4.0 min)

Concurrency shrinks the wall time, but the per-device cost stays the same. At 150ms RTT with 10 concurrent workers against 1,000 devices, SSH PTY takes ~4.3 minutes of wall time. HTTPS batch takes ~15 seconds.

Limitations

This benchmark measures transport overhead, not device processing time. Real network devices add their own latency to command execution: parsing the command, generating output, writing to the terminal. That cost is the same regardless of transport, so it doesn’t affect the relative comparison.

The HTTPS server uses a self-signed certificate with no session resumption. TLS 1.3 0-RTT resumption would make the HTTPS numbers even better on repeated connections, but I didn’t implement it because most device management scenarios don’t maintain long-lived TLS sessions.

A -userspace flag is available as a fallback for environments where root isn’t available, but it under-counts SSH round trips due to write coalescing in Go’s crypto/ssh. The published numbers all use tc netem.

What’s Next

In Part 3, I’ll look at what happens when you can’t change the device: the proxy pattern. Move SSH to the edge, talk HTTPS over the WAN, and capture most of the improvement without touching a single device config.

The benchmark code already supports proxy mode. Try it yourself and see what your numbers look like.