Fix XRay Lambda Propagator Implementation Across OTel SDKs #1782

Open
@garysassano

Summary

The current implementation of the AWS X-Ray Lambda propagator (AWSXRayLambdaPropagator) in OpenTelemetry needs to be fixed to properly handle the Sampled=0 flag in the X-Ray trace header set via the _X_AMZN_TRACE_ID environment variable. This affects all OTel SDKs that implement the XRay Lambda propagator.

Background

When AWS Lambda functions run, AWS automatically sets the _X_AMZN_TRACE_ID environment variable. When X-Ray tracing is not enabled for the Lambda, AWS still sets this variable but with Sampled=0, indicating the request should not be sampled.

Source: AWS Lambda docs
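
For illustration, here is a minimal sketch of how the value of _X_AMZN_TRACE_ID can be inspected. It assumes the documented `Root=...;Parent=...;Sampled=...` header layout; the helper names `parse_xray_trace_header` and `is_explicitly_unsampled` are hypothetical and the IDs in the example are made up.

```python
import os
from typing import Dict, Optional


def parse_xray_trace_header(header: str) -> Dict[str, str]:
    """Split an X-Ray trace header into its key=value fields."""
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:  # ignore fragments that have no '='
            fields[key] = value
    return fields


def is_explicitly_unsampled(header: Optional[str]) -> bool:
    """True only when the header is present and carries Sampled=0."""
    if not header:
        return False
    return parse_xray_trace_header(header).get("Sampled") == "0"


# Illustrative value only; a real one is set by the Lambda runtime.
example = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=0"
print(parse_xray_trace_header(example)["Sampled"])  # 0
print(is_explicitly_unsampled(os.environ.get("_X_AMZN_TRACE_ID")))
```

Comparing the parsed `Sampled` field is slightly stricter than a substring search, but either approach detects the case described above.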

The current OpenTelemetry implementation, as described in the specification, incorrectly extracts context from this environment variable even when Sampled=0 is present. This causes any other propagators (like W3C TraceContext) to be skipped, effectively disabling distributed tracing when X-Ray is not enabled.

Source: OTel Lambda instrumentation docs

Issue

The main issue is in the pseudocode implementation in the OpenTelemetry specification:

extract(context, carrier) {
    xrayContext = xrayPropagator.extract(context, carrier)

    // To avoid potential issues when extracting with an active span context (such as with a span link),
    // the `xray-lambda` propagator SHOULD check if the provided context already has an active span context.
    // If found, the propagator SHOULD just return the extract result of the `xray` propagator.
    if (Span.fromContext(context).getSpanContext().isValid())
      return xrayContext

    // If xray-lambda environment variable not set, return the xray extract result.
    traceHeader = getEnvironment("_X_AMZN_TRACE_ID")
    if (isEmptyOrNull(traceHeader))
      return xrayContext

    // Apply the xray propagator using the span context contained in the xray-lambda environment variable.
    return xrayPropagator.extract(xrayContext, ["X-Amzn-Trace-Id": traceHeader])
}

This implementation always extracts from the environment variable if it exists, even when Sampled=0 is present. This forces all spans to be non-sampled even when other propagators like W3C TraceContext would otherwise create a root span.
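
To illustrate why an extracted Sampled=0 context matters: under parent-based sampling (the behavior of OpenTelemetry's ParentBased sampler), a span with a valid parent inherits the parent's sampled flag, and the root sampler is consulted only when no parent exists. The sketch below is a simplified stand-in model, not SDK code; `ParentContext` and `should_sample` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParentContext:
    """Simplified stand-in for an extracted remote span context."""
    trace_id: str
    span_id: str
    sampled: bool


def should_sample(parent: Optional[ParentContext], root_decision: bool = True) -> bool:
    """Parent-based decision: follow the parent if there is one, else the root sampler."""
    if parent is not None:
        return parent.sampled
    return root_decision


# Parent extracted from _X_AMZN_TRACE_ID with Sampled=0: child spans are dropped.
from_env = ParentContext("5759e988bd862e3fe1be46a994272793", "53995c3f42cd8ad8", sampled=False)
print(should_sample(from_env))  # False

# No parent extracted: the root sampler decides, so a root span can be sampled.
print(should_sample(None))  # True
```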

Proposed Fix

The implementation should be updated to check if the trace header in the environment variable contains Sampled=0 and skip extraction in that case. Here is the proposed updated pseudocode:

extract(context, carrier) {
    // First try to extract from carrier
    xrayContext = xrayPropagator.extract(context, carrier)

    // Check if we got a valid context from the carrier
    if (hasValidSpan(xrayContext))
      return xrayContext

    // Check the environment variable
    traceHeader = getEnvironment("_X_AMZN_TRACE_ID")

    // If no env var or Sampled=0, do not extract further
    if (isEmptyOrNull(traceHeader) || traceHeader.contains("Sampled=0"))
      return xrayContext

    // Fallback: extract from the environment variable
    envCarrier = {"X-Amzn-Trace-Id": traceHeader}
    return xrayPropagator.extract(xrayContext, envCarrier)
}

// Helper function to check if a context has an active span
function hasValidSpan(context) {
    span = Span.fromContext(context)
    spanContext = span.getSpanContext()
    return spanContext.isValid()
}

This fix has been implemented in the Node.js, Python, and Rust versions of the library, with consistent behavior across the three implementations.
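
As a rough illustration, the proposed pseudocode can be sketched in plain Python. This is not the code from any of the SDKs: the context is modeled as a dict, `xray_extract` is a simplified stand-in for the real xray propagator, and the explicit `environ` parameter is a hypothetical addition so the behavior can be exercised without touching the real environment.

```python
from dataclasses import dataclass
from typing import Mapping

TRACE_HEADER_KEY = "X-Amzn-Trace-Id"
ENV_VAR = "_X_AMZN_TRACE_ID"


@dataclass(frozen=True)
class SpanContext:
    """Simplified stand-in for a remote span context."""
    trace_id: str
    span_id: str
    sampled: bool


def _fields(header: str) -> dict:
    """Parse 'Root=...;Parent=...;Sampled=...' into a dict."""
    pairs = (part.strip().partition("=") for part in header.split(";"))
    return {key: value for key, sep, value in pairs if sep}


def xray_extract(context: dict, carrier: Mapping[str, str]) -> dict:
    """Stand-in for the xray propagator: returns a new context on success."""
    fields = _fields(carrier.get(TRACE_HEADER_KEY, ""))
    root, parent = fields.get("Root"), fields.get("Parent")
    if not root or not parent:
        return context  # nothing usable in the carrier
    span_context = SpanContext(root, parent, fields.get("Sampled") == "1")
    return {**context, "span_context": span_context}


def has_valid_span(context: dict) -> bool:
    return context.get("span_context") is not None


def xray_lambda_extract(context: dict, carrier: Mapping[str, str],
                        environ: Mapping[str, str]) -> dict:
    # First try to extract from the carrier.
    xray_context = xray_extract(context, carrier)
    if has_valid_span(xray_context):
        return xray_context
    # No env var, or Sampled=0: do not extract further.
    trace_header = environ.get(ENV_VAR)
    if not trace_header or "Sampled=0" in trace_header:
        return xray_context
    # Fallback: extract from the environment variable.
    return xray_extract(xray_context, {TRACE_HEADER_KEY: trace_header})


base = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;"
env_off = {ENV_VAR: base + "Sampled=0"}  # X-Ray tracing disabled
env_on = {ENV_VAR: base + "Sampled=1"}   # X-Ray tracing enabled

print(has_valid_span(xray_lambda_extract({}, {}, env_off)))         # False
print(xray_lambda_extract({}, {}, env_on)["span_context"].sampled)  # True
```

With Sampled=0 the context comes back without a span context, which leaves room for another propagator or a root span. In real code `environ` would simply be `os.environ`.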

Benefits of This Fix

  1. Properly respects the Sampled=0 flag, allowing other propagators to create root spans when X-Ray is not enabled
  2. Ensures consistent behavior across all language implementations
  3. Maintains backward compatibility for normal X-Ray tracing scenarios
  4. Allows proper integration with W3C TraceContext and other propagation mechanisms

Implementations

Working implementations have been created for Node.js, Python, and Rust.

Action Items

  1. Update the OpenTelemetry specification with the corrected pseudocode
  2. Implement the fix in all OTel SDKs
  3. Release updates with this fix as a priority for AWS Lambda users

Testing

The fix can be verified by:

  1. Creating a Lambda function with X-Ray tracing disabled
  2. Configuring OpenTelemetry with W3C TraceContext and XRay Lambda propagators
  3. Verifying that traces are still created and sampled, with context propagated by the W3C TraceContext propagator

Labels: bug (Something isn't working)
