Description
Component(s)
processor/tailsampling
Is your feature request related to a problem? Please describe.
For our tail-sampling use case, we use the composite policy to set a rate limit and bucket spans by different criteria. One disadvantage of this approach is that a percentage of the rate is "reserved" for each bucket (for example, high-latency spans). If there are no high-latency spans for a period of time, we miss out on that sub-policy's percentage of the limit.
In practice, that means we're seeing post-tail-sampling throughput much lower than the budget we've set.
Describe the solution you'd like
We would like the ability to set a flag in the composite policy so that, if a given sub-policy doesn't use its full budget, the unused budget is added to an always-sample policy.
This could look like: fill_remaining_budget: true.
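As a rough sketch of what we have in mind, the flag could sit next to max_total_spans_per_second inside the composite block. Note that fill_remaining_budget is only the name proposed in this issue and does not exist in the processor today. The sub-policies and percentages below are trimmed for illustration and are not our real configuration.

```yaml
# Hypothetical sketch: fill_remaining_budget is the option proposed in this
# issue; it is NOT an existing tail sampling processor setting.
processors:
  tail_sampling/catchall:
    policies:
      - name: composite-policy-catchall
        type: composite
        composite:
          max_total_spans_per_second: 4000
          # Proposed: budget left unused by other sub-policies would be
          # handed to the always_sample sub-policy instead of being lost.
          fill_remaining_budget: true
          policy_order: [ latency-policy, always-sample-remaining-policy ]
          composite_sub_policy:
            - name: latency-policy
              type: latency
              latency:
                threshold_ms: 400
            - name: always-sample-remaining-policy
              type: always_sample
          rate_allocation:
            - policy: latency-policy
              percent: 20   # illustrative split, trimmed from our real config
            - policy: always-sample-remaining-policy
              percent: 80
```

Whether the leftover budget should go to a single always_sample sub-policy or be spread across all sub-policies is an open design question. We only need the former.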
Describe alternatives you've considered
One alternative is artificially increasing our max_total_spans_per_second value above our actual SPS budget to get the throughput we would like. The disadvantage is that, when all sub-policies are filled to capacity, we will be well over budget.
Additional context
Current tail sampling configuration:
tail_sampling/catchall:
  decision_wait: 120s
  num_traces: 1000000
  policies:
    - name: composite-policy-catchall
      type: composite
      composite:
        max_total_spans_per_second: 4000
        policy_order: [ latency-policy, http-error-policy, exception-policy, probabilistic-policy, always-sample-remaining-policy ]
        composite_sub_policy:
          - name: latency-policy
            type: latency
            latency:
              threshold_ms: 400
          - name: http-error-policy
            type: numeric_attribute
            numeric_attribute:
              key: http.status_code
              min_value: 400
              max_value: 600
          - name: exception-policy
            type: string_attribute
            string_attribute:
              key: exception.message
              values: [ .* ]
              enabled_regex_matching: true
          - name: probabilistic-policy
            type: probabilistic
            probabilistic:
              sampling_percentage: 40
          - name: always-sample-remaining-policy
            type: always_sample
        rate_allocation:
          - policy: latency-policy
            percent: 20
          - policy: http-error-policy
            percent: 10
          - policy: exception-policy
            percent: 10
          - policy: probabilistic-policy
            percent: 20
          - policy: always-sample-remaining-policy
            percent: 40
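
To make the reservation concrete, here is the rate_allocation from the configuration above, annotated with the spans-per-second budget each percentage works out to. This assumes each sub-policy's cap is simply percent × max_total_spans_per_second / 100, which is our understanding of the composite policy rather than something we have verified against the implementation.

```yaml
# Assumed per-sub-policy caps with max_total_spans_per_second: 4000
rate_allocation:
  - policy: latency-policy
    percent: 20   # 0.20 * 4000 = 800 spans/s
  - policy: http-error-policy
    percent: 10   # 0.10 * 4000 = 400 spans/s
  - policy: exception-policy
    percent: 10   # 0.10 * 4000 = 400 spans/s
  - policy: probabilistic-policy
    percent: 20   # 0.20 * 4000 = 800 spans/s
  - policy: always-sample-remaining-policy
    percent: 40   # 0.40 * 4000 = 1600 spans/s
```

Under that assumption, a period with no high-latency spans and no HTTP error spans leaves 800 + 400 = 1200 spans/s unused, so throughput tops out at about 2800 spans/s even though the budget is 4000.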