No configurable option for HTTP/1 idle timeout on keep-alive connections #14147

Open
@gamerslouis

Description

What is the issue?

HTTP/1.1 keep-alive connections are being prematurely closed by the Linkerd proxy after 30 seconds of idleness, and there is currently no way to configure this timeout.

We are using [VictoriaMetrics](https://victoriametrics.com/) to collect metrics from services running inside a Kubernetes cluster. Many of our scrape intervals are configured to 60 seconds. We’ve observed that the vmagent scraper is encountering frequent connection closures from the Linkerd sidecar:

INFO ThreadId(01) linkerd_app_core::serve: Connection closed error=read header from client timeout client.addr=10.244.0.13:37704 server.addr=xxx.xxx.xxx.xxx:xxx

This indicates that persistent HTTP/1 connections used by vmagent to scrape metrics are being closed before the next request is sent. Capturing traffic with tcpdump shows that vmagent uses HTTP keep-alive and only sends requests every 60 seconds, which causes the HTTP/1 connection to remain idle for longer than Linkerd's 30-second default.

The proxy's debug logs show the underlying error when the connection is closed:

[  1861.475759s] DEBUG ThreadId(01) linkerd_proxy_http::server: The client is shutting down the connection res=Err(hyper::Error(HeaderTimeout))

We traced this to the underlying [Hyper](https://github.com/hyperium/hyper/blob/c88df7886c74a1ade69c0b4c68eaf570c8111622/src/server/conn/http1.rs#L79) implementation used by Linkerd. Hyper's Builder sets a default h1_header_read_timeout of 30 seconds. This timeout is triggered when no new headers are received during that period on an idle HTTP/1 connection.

h1_header_read_timeout: Dur::Default(Some(Duration::from_secs(30))),

Currently, the Linkerd proxy does not expose this setting, and there’s no way to override it via annotations or configuration.

How can it be reproduced?

Establish an HTTP/1.1 keep-alive connection (e.g., using Python’s requests.Session) and leave it idle without sending further requests for over 30 seconds. The Linkerd proxy will close the connection with a HeaderTimeout error.
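The reproduction steps above can be sketched as a small script. This is a minimal sketch, not a definitive test: it uses Python's standard-library socket module instead of requests.Session so that it has no dependencies, and the function name keepalive_survives and its settle_seconds parameter are hypothetical names introduced only for illustration. It sends one HTTP/1.1 request, leaves the connection idle, and then checks whether the peer has closed it.

```python
import socket
import time


def keepalive_survives(host, port, idle_seconds, path="/", settle_seconds=0.2):
    """Return True if a keep-alive connection is still open after idling.

    Hypothetical helper: sends one GET, idles, then checks for a peer close.
    """
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: keep-alive\r\n"
        "\r\n"
    ).encode("ascii")

    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        # Wait for the first bytes of the response; an empty read means
        # the peer closed the connection right away.
        if not sock.recv(65536):
            return False

        # Leave the connection idle, as a scraper with a long interval would.
        time.sleep(idle_seconds)

        # Drain whatever is left of the first response. EOF (b"") means the
        # peer closed the connection; a timeout with no EOF means it is
        # still open and simply has nothing more to send.
        sock.settimeout(settle_seconds)
        try:
            while True:
                if not sock.recv(65536):
                    return False
        except socket.timeout:
            return True
        except ConnectionError:
            # The peer reset the connection while we were idle.
            return False
```

Calling keepalive_survives(host, port, idle_seconds=35) against a meshed service, with host and port replaced by your own values, would be expected to return False if the connection is closed after 30 seconds of idleness as described above, while a short idle period such as 5 seconds should return True.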

Logs, error output, etc

[450882.624436s] INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=read header from client timeout client.addr=10.246.13.59:50568 server.addr=10.246.12.16:9793
[450881.544872s] INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=read header from client timeout client.addr=10.246.13.59:47154 server.addr=10.246.15.207:9180
[450878.959567s] INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=read header from client timeout client.addr=10.246.13.59:44612 server.addr=10.246.14.56:9793
[450878.672225s] INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=read header from client timeout client.addr=10.246.13.59:44486 server.addr=10.246.9.45:9180
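To gauge how many scrape targets are affected, log lines like the ones above can be tallied per destination. The following is a rough sketch that assumes the log format shown in this issue; the function name count_header_timeouts and the regular expression are illustrative and not part of Linkerd.

```python
import re
from collections import Counter

# Matches the "read header from client timeout" lines shown in this issue.
LINE_RE = re.compile(
    r"Connection closed error=read header from client timeout "
    r"client\.addr=(?P<client>\S+) server\.addr=(?P<server>\S+)"
)


def count_header_timeouts(lines):
    """Count header-timeout closures per server address (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group("server")] += 1
    return counts
```

Feeding it the proxy's log lines (for example, lines read from a saved log file) returns a Counter keyed by server.addr, which makes it easy to see which targets lose their connections most often.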

Output of linkerd check -o short

linkerd-version
---------------
‼ cli is up-to-date
    is running version 25.5.5 but the latest edge version is 25.6.2
    see https://linkerd.io/2/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 25.5.5 but the latest edge version is 25.6.2
    see https://linkerd.io/2/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
	* linkerd-destination-97c46bc4d-mst8p (edge-25.5.5)
	* linkerd-destination-97c46bc4d-r4pdm (edge-25.5.5)
	* linkerd-destination-97c46bc4d-vsfhf (edge-25.5.5)
	* linkerd-identity-7b5c4b4f75-gvgw5 (edge-25.5.5)
	* linkerd-identity-7b5c4b4f75-l6hln (edge-25.5.5)
	* linkerd-identity-7b5c4b4f75-wsj9m (edge-25.5.5)
	* linkerd-proxy-injector-c58469d8f-257lh (edge-25.5.5)
	* linkerd-proxy-injector-c58469d8f-8f4v2 (edge-25.5.5)
	* linkerd-proxy-injector-c58469d8f-tcqv4 (edge-25.5.5)
    see https://linkerd.io/2/checks/#l5d-cp-proxy-version for hints

Status check results are √

Environment

  • Kubernetes Version: 1.32
  • Cluster Environment: RKE2
  • Host OS: Ubuntu
  • Linkerd Version: edge-25.5.5

Possible solution

Expose the h1_header_read_timeout setting in the Linkerd proxy configuration, possibly via annotations or config fields, so that users can adjust the HTTP/1 idle timeout behavior to better suit long-polling or low-frequency scraping use cases.

Additional context

No response

Would you like to work on fixing this bug?

maybe
