Skip to content

Commit 154d313

Browse files
authored
[tailsamplingprocessor] Fix the decision timer metric to measure > 50ms (#37722)
#### Description Previously the decision timer metric was in microseconds. Because the max histogram bucket was 50000, it capped the maximum measurable latency at 50ms. This commit changes the metric to be in milliseconds, which allows for measuring latencies of up to 50 seconds. Since the default decision tick is 1s, it's necessary to observe when the latency is approaching 1s. I'm not sure how big of a breaking change it is to update the unit, but given the previous value was not super useful, I thought it was better to change it. Here's an example of the current state: ![Screenshot 2025-02-05 at 3 23 21 PM](https://github.com/user-attachments/assets/0c39fad4-961b-44c0-b018-9633b9d2ff72) Notice how the average latency is running much higher than the p99, indicating that we are missing important data on how slow the decision tick actually is.
1 parent 843499f commit 154d313

File tree

7 files changed

+38
-11
lines changed

7 files changed

+38
-11
lines changed
Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
# Use this changelog template to create an entry for release notes.
2+
3+
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
4+
change_type: breaking
5+
6+
# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
7+
component: tailsamplingprocessor
8+
9+
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
10+
note: Fix the decision timer metric to capture longer latencies beyond 50ms.
11+
12+
# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
13+
issues: [37722]
14+
15+
# (Optional) One or more lines of additional information to render under the primary note.
16+
# These lines will be padded with 2 spaces and then inserted directly into the document.
17+
# Use pipe (|) for multiline entries.
18+
subtext: This changes the unit of the decision timer metric from microseconds to milliseconds.
19+
20+
# If your change doesn't affect end users or the exported elements of any package,
21+
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
22+
# Optional: The change log or logs in which this entry should be included.
23+
# e.g. '[user]' or '[user, api]'
24+
# Include 'user' if the change is relevant to end users.
25+
# Include 'api' if there is a change to a library API.
26+
# Default: '[user]'
27+
change_logs: []

processor/tailsamplingprocessor/documentation.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -56,11 +56,11 @@ Latency (in microseconds) of a given sampling policy
5656

5757
### otelcol_processor_tail_sampling_sampling_decision_timer_latency
5858

59-
Latency (in microseconds) of each run of the sampling decision timer
59+
Latency (in milliseconds) of each run of the sampling decision timer
6060

6161
| Unit | Metric Type | Value Type |
6262
| ---- | ----------- | ---------- |
63-
| µs | Histogram | Int |
63+
| ms | Histogram | Int |
6464

6565
### otelcol_processor_tail_sampling_sampling_late_span_age
6666

processor/tailsamplingprocessor/internal/metadata/generated_telemetry.go

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

processor/tailsamplingprocessor/internal/metadatatest/generated_telemetrytest.go

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

processor/tailsamplingprocessor/metadata.yaml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -22,8 +22,8 @@ telemetry:
2222
bucket_boundaries: [1, 2, 5, 10, 25, 50, 75, 100, 150, 200, 300, 400, 500, 750, 1000, 2000, 3000, 4000, 5000, 10000, 20000, 30000, 50000]
2323

2424
processor_tail_sampling_sampling_decision_timer_latency:
25-
description: Latency (in microseconds) of each run of the sampling decision timer
26-
unit: µs
25+
description: Latency (in milliseconds) of each run of the sampling decision timer
26+
unit: ms
2727
enabled: true
2828
histogram:
2929
value_type: int

processor/tailsamplingprocessor/processor.go

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -351,7 +351,7 @@ func (tsp *tailSamplingSpanProcessor) samplingPolicyOnTick() {
351351

352352
decision := tsp.makeDecision(id, trace, &metrics)
353353

354-
tsp.telemetry.ProcessorTailSamplingSamplingDecisionTimerLatency.Record(tsp.ctx, int64(time.Since(startTime)/time.Microsecond))
354+
tsp.telemetry.ProcessorTailSamplingSamplingDecisionTimerLatency.Record(tsp.ctx, int64(time.Since(startTime)/time.Millisecond))
355355
tsp.telemetry.ProcessorTailSamplingGlobalCountTracesSampled.Add(tsp.ctx, 1, decisionToAttribute[decision])
356356

357357
// Sampled or not, remove the batches

processor/tailsamplingprocessor/processor_telemetry_test.go

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -137,8 +137,8 @@ func TestMetricsAfterOneEvaluation(t *testing.T) {
137137
opts: []metricdatatest.Option{metricdatatest.IgnoreTimestamp(), metricdatatest.IgnoreValue()},
138138
m: metricdata.Metrics{
139139
Name: "otelcol_processor_tail_sampling_sampling_decision_timer_latency",
140-
Description: "Latency (in microseconds) of each run of the sampling decision timer",
141-
Unit: "µs",
140+
Description: "Latency (in milliseconds) of each run of the sampling decision timer",
141+
Unit: "ms",
142142
Data: metricdata.Histogram[int64]{
143143
Temporality: metricdata.CumulativeTemporality,
144144
DataPoints: []metricdata.HistogramDataPoint[int64]{{}},

0 commit comments

Comments
 (0)