-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Emit outcome: failure
in obsconsumer
#13234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Emit outcome: failure
in obsconsumer
#13234
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #13234 +/- ##
=======================================
Coverage 91.44% 91.45%
=======================================
Files 533 534 +1
Lines 29564 29596 +32
=======================================
+ Hits 27034 27066 +32
Misses 1998 1998
Partials 532 532 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
#### Description This PR updates the [Pipeline Component Telemetry RFC](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md) with the following changes: - Reflect implementation choices that have been made since the RFC was written: 1. using instrumentation scope attributes instead of datapoint attributes to identify component instances (see discussion in #12217 and open-telemetry/opentelemetry-go#6404) 2. automatically injecting these attributes, without changes to component code 3. changing the instrumentation scope name used for pipeline metrics - Slightly change the semantics of `outcome = refused`: The current planned behavior (from #11956) is that, in the case of a pipeline A → B where component B returns an error, the "consumed" metric for B and the "produced" metric for A should both have `outcome = failure`. I fear that this may lead users to think that a failure occurred in A, and would like to restrict `outcome = failure` to only be associated with the component that "failed", ie. component B. The "produced" metric associated with A would instead have `outcome = refused`. This incidentally makes implementation slightly easier, since an instrumentation layer will not need different error wrapping behavior between the "producer" layer and the "consumer" layer. See draft PR #13234 for an example implementation. As this is a non-trivial change to an RFC, it may need to follow the RFC process. Co-authored-by: Alex Boten <[email protected]>
Is this ready for review now that the RFC amendment has been merged? |
Not quite, I'm thinking of adding some additional tests. Not sure when I'll have the time to get to it though 😅 |
Looks like the contrib failures are due to #13364. This should be ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @evan-bradley could you review the consumererror bits?
I think I've addressed your comments, could you take another look @evan-bradley? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks for your patience with all my questions. 🙂
We discussed whether
I don't remember very well, but I think @dmitryax raised the point that perhaps we should instead default to the interpretation which is least likely to "hide" internal failures. Do you have objections to the current logic in the PR? |
@open-telemetry/collector-approvers Based on the above comment I think we can merge this by EOW (after the merge conflicts are resolved) unless there are objections. If we explicitly consider this edge case to be unspecified behavior, I think we can go ahead with the choice we made |
Description
The last remaining part of #12676 is to implement the
outcome: failure
part of the Pipeline Component Telemetry RFC (see here). This is done by introducing adownstream
error wrapper struct, to distinguish between errors coming from the next component from errors bubbled from further downstream.Important note
This PR implements things slightly differently from what the text of the RFC describes.
If a pipeline contains components
A → B
and an error occurs in B, this PR records:otelcol.component.outcome = failure
in theotelcol.*.consumed.*
metric for Botelcol.component.outcome = refused
in theotelcol.*.produced.*
metric for Awhereas the RFC would set both
outcome
s tofailure
.This is programmatically simpler — no need to have different behavior between the
obsconsumer
around the output of A and the one around the input of B — but more importantly, I think it is clearer for users as well:outcome = failure
only occurs on metrics associated with the component where the failure actually occurred.This subtlety wasn't discussed in-depth in #11956 which introduced
outcome = refused
, so I took the liberty to make this change. If necessary, I can file another RFC amendment to match, or, if there are objections, implement the RFC as-is instead.Link to tracking issue
Fixes #12676
Testing
I've updated the existing tests in
obsconsumer
to expect adownstream
-wrapped error to exit theobsconsumer
layer. I may add more tests later.Documentation
None.