Propagate Step Function Trace Context through Managed Services #667

Merged: 19 commits into main from ryan.strat/sfn-managed-services, Jul 31, 2025

Contributor

@ryanstrat ryanstrat commented Jul 15, 2025

What does this PR do?

This PR adds support for extracting step function trace context in the following cases:

  1. SFN -> EventBridge -> Lambda
  2. SFN -> EventBridge -> SQS -> Lambda
  3. SFN -> SQS -> Lambda
  4. SFN -> SNS -> Lambda
  5. SFN -> SNS -> SQS -> Lambda
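
For illustration only, the five paths above can be handled by unwrapping each envelope until the Step Functions context is found. This is a minimal sketch, not the code in this PR: `findStepFunctionContext`, `parseJson`, and `isObject` are hypothetical names, and the assumption that the context travels under a `_datadog` key is made for the example. The envelope fields (`detail`, `Records[].body`, `Records[].Sns.Message`, `Type: "Notification"`) follow the standard AWS event shapes.

```typescript
// Hypothetical sketch: these names are illustrative, not the PR's implementation.
type Json = Record<string, unknown>;

function isObject(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Parse a JSON string, returning undefined for non-strings or invalid JSON.
function parseJson(text: unknown): unknown {
  if (typeof text !== "string") return undefined;
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}

// Assumption: the Step Functions context is carried under a `_datadog` key.
function findStepFunctionContext(event: unknown, depth = 0): Json | undefined {
  // Bound the recursion; the deepest path listed above nests three levels.
  if (!isObject(event) || depth > 3) return undefined;

  const context = event._datadog;
  if (isObject(context)) return context;

  // EventBridge envelope: the payload sits in `detail`.
  const detail = event.detail;
  if (isObject(detail)) return findStepFunctionContext(detail, depth + 1);

  // SNS notification forwarded to SQS: the payload is a JSON string in `Message`.
  if (event.Type === "Notification") {
    return findStepFunctionContext(parseJson(event.Message), depth + 1);
  }

  // Lambda batch event: inspect the first record only.
  const records = event.Records;
  if (Array.isArray(records) && isObject(records[0])) {
    const first = records[0];
    const sns = first.Sns;
    if (isObject(sns)) {
      // Direct SNS -> Lambda delivery.
      return findStepFunctionContext(parseJson(sns.Message), depth + 1);
    }
    // SQS delivery: the payload is a JSON string in `body`.
    return findStepFunctionContext(parseJson(first.body), depth + 1);
  }
  return undefined;
}
```

Looking only at the first record is a simplification for the sketch; a batch can contain messages from different executions.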

Motivation

This PR brings the JS library to feature parity with the Python tracing library: DataDog/datadog-lambda-python#573

Testing Guidelines

I published a private layer to the ddserverless account and verified that all cases produced the intended traces. The resulting traces are below.

  1. SFN -> EventBridge -> Lambda
     [screenshot: SFN-EVB-Lambda]
  2. SFN -> EventBridge -> SQS -> Lambda
     [screenshot: SFN-EVB-SQS-Lambda]
  3. SFN -> SQS -> Lambda
     [screenshot: SFN-SQS-Lambda]
  4. SFN -> SNS -> Lambda
     [screenshot: SFN-SNS-Lambda]
  5. SFN -> SNS -> SQS -> Lambda
     [screenshot: SFN-SNS-SQS-Lambda]

Additional Notes

A minor update to the publish script follows a pattern from other layers, allowing a suffix to be added to the layer name for test releases.

Documentation PR: DataDog/documentation#30715

Types of Changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests
  • This PR collects user input/sensitive content into Datadog
  • This PR passes the integration tests (ask a Datadog member to run the tests)

@ryanstrat ryanstrat force-pushed the ryan.strat/sfn-managed-services branch from c5130af to a813a29 Compare July 24, 2025 15:02
@ryanstrat ryanstrat marked this pull request as ready for review July 24, 2025 15:59
@ryanstrat ryanstrat requested review from a team as code owners July 24, 2025 15:59
@@ -87,6 +87,12 @@ if [[ ! ${STAGES[@]} =~ $STAGE ]]; then
fi

layer="${LAYERS[$index]}"
if [ -z "$LAYER_NAME_SUFFIX" ]; then
Contributor

@joeyzhao2018 joeyzhao2018 Jul 30, 2025

Where is the variable LAYER_NAME_SUFFIX set? Is this mainly for testing in development?

Contributor Author

It's not; I added it as a local config option for publishing private versions. This is similar to an option available in the Python layer.

Contributor

Hmm, I'm not sure about this change. How can we be sure it won't affect the GitLab pipelines?

Contributor Author

Removing this edit based on feedback from @duncanista

Contributor

@joeyzhao2018 joeyzhao2018 left a comment

:shipit: Good job refactoring the code


it("extracts trace context from Step Function EventBridge event", () => {
  // Reset StepFunctionContextService instance
  StepFunctionContextService["_instance"] = undefined as any;
Contributor

Shouldn't this be done before every step?

Contributor Author

This reset is only required for tests that use the Step Function Context Service. It could be added in a beforeEach for all tests, but I don't think it would add value.
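
For illustration, a minimal sketch of why a singleton needs this kind of reset between tests. `ContextService` and `resetContextService` are hypothetical stand-ins, not the library's `StepFunctionContextService`; the sketch only assumes a class that caches its first instance in a private static `_instance` field, as the test snippet above suggests.

```typescript
// Hypothetical singleton that caches the first event it is constructed with.
class ContextService {
  private static _instance: ContextService | undefined;

  private constructor(public readonly event: unknown) {}

  // Returns the cached instance, creating it from `event` on first use.
  static instance(event?: unknown): ContextService {
    if (this._instance === undefined) {
      this._instance = new ContextService(event);
    }
    return this._instance;
  }
}

// Bracket access bypasses the `private` modifier, mirroring the test snippet.
function resetContextService(): void {
  ContextService["_instance"] = undefined;
}

// Without a reset, the second call still sees the first event.
const first = ContextService.instance({ id: 1 });
const stale = ContextService.instance({ id: 2 });
console.log(first === stale); // true: state leaked from the earlier call

// After a reset, a fresh instance is built from the new event.
resetContextService();
const fresh = ContextService.instance({ id: 3 });
console.log(fresh === first); // false
```

In a Jest suite, calling such a reset from a `beforeEach` would apply it to every test; calling it inline keeps it limited to the tests that touch the service, which is the trade-off discussed above.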

Co-authored-by: jordan gonzález <[email protected]>
ryanstrat and others added 2 commits July 31, 2025 16:35
@ryanstrat
Contributor Author

/merge

dd-devflow-routing-codex bot commented Jul 31, 2025

2025-07-31 20:51:56 UTC ℹ️ Start processing command /merge


2025-07-31 20:52:10 UTC ℹ️ MergeQueue: waiting for PR to be ready

This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.


2025-07-31 20:59:19 UTC ℹ️ MergeQueue: merge request added to the queue

The expected merge time in main is approximately 0s (p90).


2025-07-31 21:09:13 UTC ℹ️ MergeQueue: This merge request was merged

@dd-mergequeue dd-mergequeue bot merged commit 1619bce into main Jul 31, 2025
27 checks passed
@dd-mergequeue dd-mergequeue bot deleted the ryan.strat/sfn-managed-services branch July 31, 2025 21:09