Description
Describe the bug
I am trying to use the OpenTelemetry Lambda layers in a container image. While this functionality is not currently documented, it seems reasonable to expect it to work, since for ZIP file distributions the Lambda layers are simply unzipped into the /opt directory.
The OpenTelemetry Lambda Collector extension crashes when the Lambda container is run locally with Docker and invoked with an event. The Lambda aborts execution, and the Collector exits with the following error:
{"level":"fatal","ts":1749391498.7609231,"msg":"Cannot start Telemetry API Listener","error":"failed to find available port: listen tcp: lookup sandbox.localdomain on 127.0.0.11:53: no such host"}
Steps to reproduce
I have created a repository to reproduce the issue at https://github.com/gotgenes/lambda-opentelemetry-docker.
The repository includes the following:
- A Node.js Lambda function implemented in TypeScript.
- A Dockerfile to build the Lambda container image from the Node.js v22 Lambda base image with the OpenTelemetry Lambda layers added (a rough sketch of this approach is included after this list).
- A Docker Compose file to run the Lambda container locally with Docker, along with an otel-tui sidecar container to view the telemetry.
- A CDK app to create the ECR repository and deploy the Lambda function.
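For context, the core of the Dockerfile approach looks roughly like the sketch below. This is a minimal illustration rather than the exact Dockerfile from the repository: the layers/collector and layers/nodejs directories, the dist/ build output, and the index.handler entry point are assumed names, and the sketch presumes the layer ZIPs have already been downloaded (e.g. via aws lambda get-layer-version-by-arn) and extracted locally.

# Minimal sketch; not the exact Dockerfile from the repository.
# Assumes the collector extension and Node.js layer ZIPs were downloaded
# and extracted into layers/collector/ and layers/nodejs/ beforehand.
FROM public.ecr.aws/lambda/nodejs:22

# Replicate what the Lambda service does for ZIP deployments: place the
# layer contents under /opt.
COPY layers/collector/ /opt/
COPY layers/nodejs/ /opt/

# Enable the auto-instrumentation wrapper script shipped in the Node.js layer.
ENV AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-handler

# Compiled TypeScript handler (assumed to be built into dist/).
COPY dist/ ${LAMBDA_TASK_ROOT}/
CMD ["index.handler"]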
The steps to reproduce the issue are as follows:
- Clone the repository.
- Set the environment variables:
  export AWS_PROFILE=$YOUR_PROFILE
  export COMPOSE_BAKE=true
- Build and start the Lambda container with Docker Compose:
  docker compose up --build
- In a separate terminal session, invoke the Lambda function:
  curl -XPOST -i -d '{}' http://localhost:9000/2015-03-31/functions/function/invocations
- Observe the logs in the terminal where Docker Compose is running.
Please see the repository's README for detailed instructions.
What did you expect to see?
I expected the OpenTelemetry Lambda Collector extension to start successfully, the Lambda function to execute without errors, the curl command to receive a successful response, and trace data to appear in the otel-tui sidecar container.
What did you see instead?
I get a 502 response from the curl command, and the following logs appear in the terminal where Docker Compose is running:
2025-06-08 10:01:57.408 | 08 Jun 2025 14:01:57,408 [INFO] (rapid) exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)
2025-06-08 10:04:58.719 | 08 Jun 2025 14:04:58,719 [INFO] (rapid) INIT START(type: on-demand, phase: init)
2025-06-08 10:04:58.719 | START RequestId: d2de0040-511a-4348-bf8f-b590e057ee0c Version: $LATEST
2025-06-08 10:04:58.752 | {"level":"info","ts":1749391498.7526667,"msg":"Launching OpenTelemetry Lambda extension","version":"v0.126.0"}
2025-06-08 10:04:58.754 | 08 Jun 2025 14:04:58,754 [INFO] (rapid) External agent collector (3dcc14c5-3f8b-4004-b7c5-556b5deea231) registered, subscribed to [INVOKE SHUTDOWN]
2025-06-08 10:04:58.761 | {"level":"fatal","ts":1749391498.7609231,"msg":"Cannot start Telemetry API Listener","error":"failed to find available port: listen tcp: lookup sandbox.localdomain on 127.0.0.11:53: no such host"}
2025-06-08 10:04:58.762 | 08 Jun 2025 14:04:58,762 [WARNING] (rapid) First fatal error stored in appctx: Extension.Crash
2025-06-08 10:04:58.762 | 08 Jun 2025 14:04:58,762 [WARNING] (rapid) Process extension-collector-1 exited: exit status 1
2025-06-08 10:04:58.762 | 08 Jun 2025 14:04:58,762 [INFO] (rapid) INIT RTDONE(status: error)
2025-06-08 10:04:58.762 | 08 Jun 2025 14:04:58,762 [INFO] (rapid) INIT REPORT(durationMs: 42.785000)
2025-06-08 10:04:58.762 | 08 Jun 2025 14:04:58,762 [ERROR] (rapid) Init failed error=exit status 1 InvokeID=
2025-06-08 10:04:58.762 | 08 Jun 2025 14:04:58,762 [WARNING] (rapid) Shutdown initiated: spindown
2025-06-08 10:04:58.762 | 08 Jun 2025 14:04:58,762 [INFO] (rapid) Waiting for runtime domain processes termination
2025-06-08 10:04:58.762 | 08 Jun 2025 14:04:58,762 [INFO] (rapid) INIT START(type: on-demand, phase: invoke)
2025-06-08 10:04:58.762 | 08 Jun 2025 14:04:58,762 [INFO] (rapid) INIT REPORT(durationMs: 0.051000)
2025-06-08 10:04:58.762 | 08 Jun 2025 14:04:58,762 [INFO] (rapid) INVOKE START(requestId: bdacc1f2-3764-4f4b-b6af-7fa918202024)
2025-06-08 10:04:58.762 | 08 Jun 2025 14:04:58,762 [ERROR] (rapid) Invoke failed error=ErrAgentNameCollision InvokeID=bdacc1f2-3764-4f4b-b6af-7fa918202024
2025-06-08 10:04:58.762 | 08 Jun 2025 14:04:58,762 [ERROR] (rapid) Invoke DONE failed: Sandbox.Failure
2025-06-08 10:04:58.762 | 08 Jun 2025 14:04:58,762 [WARNING] (rapid) Reset initiated: ReleaseFail
2025-06-08 10:04:58.762 | 08 Jun 2025 14:04:58,762 [WARNING] (rapid) The runtime was not started.
2025-06-08 10:04:58.762 | 08 Jun 2025 14:04:58,762 [WARNING] (rapid) Agent collector (3dcc14c5-3f8b-4004-b7c5-556b5deea231) failed to launch, therefore skipping shutting it down.
2025-06-08 10:04:58.762 | 08 Jun 2025 14:04:58,762 [INFO] (rapid) Waiting for runtime domain processes termination
2025-06-08 10:07:26.750 | 08 Jun 2025 14:07:26,750 [INFO] (rapid) Received signal signal=terminated
2025-06-08 10:07:26.750 | 08 Jun 2025 14:07:26,750 [INFO] (rapid) Shutting down...
2025-06-08 10:07:26.750 | 08 Jun 2025 14:07:26,750 [WARNING] (rapid) Reset initiated: SandboxTerminated
2025-06-08 10:07:26.750 | 08 Jun 2025 14:07:26,750 [INFO] (rapid) Waiting for runtime domain processes termination
What version of collector/language SDK did you use?
- Collector extension layer version: v0.15.0
- Node.js layer version: v0.14.0
What language layer did you use?
JavaScript/Node.js (implemented in TypeScript)
Additional context
While I appreciate that this project provides extension layers for the collector and language SDKs that can be used with ZIP file distributions, AWS appears to be pushing users towards container images for Lambda functions. I therefore think it would be beneficial to support the OpenTelemetry Lambda layers in container images as well. That could mean providing base images with the OpenTelemetry Lambda layers already included, or at least documenting how to use the layers in a custom Dockerfile. The purpose of my repository is to demonstrate how to do that, but it currently does not work.
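To make the base-image suggestion concrete, consuming such an image might look something like the sketch below. The image name is purely hypothetical (nothing like it is published today), and whether AWS_LAMBDA_EXEC_WRAPPER would still need to be set explicitly is an open question.

# Hypothetical: illustrates the suggestion only; no such image is published today.
FROM public.ecr.aws/open-telemetry/lambda-nodejs:22

# Possibly still needed to activate the bundled instrumentation wrapper.
ENV AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-handler

COPY dist/ ${LAMBDA_TASK_ROOT}/
CMD ["index.handler"]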