Description
Component(s)
processor/tailsampling
Is your feature request related to a problem? Please describe.
There is currently no way to tell, by looking at a sampled trace, which policy sampled it. If we could, we'd be able to analyze traces by policy, direct traces to different pipelines by policy, and better understand the sampling behavior of this processor.
Describe the solution you'd like
I propose that we add the name of the first top-level policy that returns a positive sampling decision to each span in a trace as a span attribute. For example, given the following policies:
tail_sampling:
  policies:
    - name: probabilistic-policy
      type: probabilistic
      probabilistic:
        sampling_percentage: 10
    - name: http-error-policy
      type: and
      and:
        and_sub_policy:
          - name: error-code-policy
            type: ottl_condition
            ottl_condition:
              span:
                - 'IsMatch(attributes["http.response.status_code"], "^[45][0-9][0-9]$")'
          - name: probabilistic-policy
            type: probabilistic
            probabilistic:
              sampling_percentage: 100
And given a trace that includes a span with a 404 status code, say the top-level probabilistic-policy returned a NotSampled decision, but the http-error-policy returned a Sampled decision. A span in this trace would look like:
{
  "name": "client_request",
  "kind": "SpanKind.CLIENT",
  "attributes": {
    "http.response.status_code": 404,
    "sampling.policy": "http-error-policy" // new attribute added
  },
  ...
}
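To make the proposed behavior concrete, here is a rough sketch in Go of the selection and tagging logic. It is not the processor's actual code: the Decision, Policy, and Span types and the firstSamplingPolicy, tagTrace, and hasHTTPError helpers are hypothetical simplifications, and the attribute key sampling.policy is just the name used in the example above.

```go
package main

import "fmt"

// Decision is a simplified stand-in for the processor's sampling decision.
type Decision int

const (
	NotSampled Decision = iota
	Sampled
)

// Span is a minimal, hypothetical span model that only carries attributes.
type Span struct {
	Name       string
	Attributes map[string]any
}

// Policy is a hypothetical top-level policy: a name plus an evaluation function.
type Policy struct {
	Name     string
	Evaluate func(trace []Span) Decision
}

// policyAttrKey is the attribute name used in the example span (an assumption).
const policyAttrKey = "sampling.policy"

// firstSamplingPolicy returns the name of the first top-level policy, in
// configuration order, that returns Sampled. ok is false if none do.
func firstSamplingPolicy(policies []Policy, trace []Span) (name string, ok bool) {
	for _, p := range policies {
		if p.Evaluate(trace) == Sampled {
			return p.Name, true
		}
	}
	return "", false
}

// tagTrace writes the policy name onto every span in the trace.
func tagTrace(trace []Span, policyName string) {
	for i := range trace {
		if trace[i].Attributes == nil {
			trace[i].Attributes = map[string]any{}
		}
		trace[i].Attributes[policyAttrKey] = policyName
	}
}

// hasHTTPError stands in for http-error-policy: it samples the trace if any
// span carries a 4xx or 5xx status code.
func hasHTTPError(trace []Span) Decision {
	for _, s := range trace {
		code, ok := s.Attributes["http.response.status_code"].(int)
		if ok && code >= 400 && code <= 599 {
			return Sampled
		}
	}
	return NotSampled
}

func main() {
	trace := []Span{
		{Name: "client_request", Attributes: map[string]any{"http.response.status_code": 404}},
		{Name: "db_query"},
	}
	policies := []Policy{
		// Stands in for the 10% probabilistic policy declining this trace.
		{Name: "probabilistic-policy", Evaluate: func([]Span) Decision { return NotSampled }},
		{Name: "http-error-policy", Evaluate: hasHTTPError},
	}
	if name, ok := firstSamplingPolicy(policies, trace); ok {
		tagTrace(trace, name)
	}
	for _, s := range trace {
		fmt.Println(s.Name, s.Attributes[policyAttrKey])
	}
}
```

Only the top-level policy name is recorded here, so sub-policies such as error-code-policy never appear in the attribute, and every span in the trace gets the same value regardless of which span triggered the decision.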
Describe alternatives you've considered
- Branch a pipeline into multiple using the forward connector, set up tail-sampling on each branch with different policies, then use the transform processor to add an attribute for the branch a trace was sampled in: Not only does this use more resources as multiple processors have to be set up, it can also result in duplicate samples, and there isn't an easy way to dedup spans/traces in the collector.
- Add the policy to just the root span: This would mean less work for the processor to do, but wouldn't be great if the root span was somehow dropped or lost.
- Add all matching policies as a list value: List values are harder to aggregate over than simple string values. List values can also get pretty long for complex policy setups.
Additional context
I'd be willing to create a PR for this if maintainers agree that this is a good idea.