Closed
Description
Component(s)
processor/probabilisticsampler
Is your feature request related to a problem? Please describe.
OTEP 235 specifies how to encode randomness and threshold information for consistent probability sampling. Add support for this specification.
Describe the solution you'd like
There are two modes of sampling implied by the new specification.
- Proportional sampling based on 56 bits of randomness
- Equalizing sampling based on 56 bits of randomness
This compares with the existing hash-based solution, which uses 14 bits of the seeded hash value.
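The two modes can be sketched as threshold arithmetic over a 56-bit value. This is a minimal illustration, not the code from #31894: it assumes the randomness is taken from the least-significant 56 bits of the trace ID, that an item is kept when its randomness is at or above a rejection threshold, that "proportional" multiplies the incoming probability by a ratio, and that "equalizing" samples down to a target probability without ever raising it. All function names here are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// maxThreshold is 2^56, the number of distinct 56-bit randomness values.
const maxThreshold = uint64(1) << 56

// randomnessFromTraceID returns the least-significant 56 bits (last 7 bytes)
// of a 16-byte trace ID. Assumption: this is the randomness source.
func randomnessFromTraceID(traceID [16]byte) uint64 {
	return binary.BigEndian.Uint64(traceID[8:16]) & (maxThreshold - 1)
}

// thresholdFor converts a sampling probability into a rejection threshold:
// items whose randomness falls below the threshold are dropped.
func thresholdFor(p float64) uint64 {
	if p >= 1 {
		return 0
	}
	if p <= 0 {
		return maxThreshold
	}
	return uint64(math.Round((1 - p) * float64(maxThreshold)))
}

// shouldSample keeps an item when its randomness is at or above the threshold.
func shouldSample(randomness, threshold uint64) bool {
	return randomness >= threshold
}

// proportionalThreshold reduces the incoming probability by a fixed ratio,
// regardless of how much sampling was already applied upstream.
func proportionalThreshold(incoming uint64, ratio float64) uint64 {
	pIn := float64(maxThreshold-incoming) / float64(maxThreshold)
	return thresholdFor(pIn * ratio)
}

// equalizingThreshold samples down to a target probability. Items already
// sampled at a lower probability keep their existing threshold.
func equalizingThreshold(incoming uint64, target float64) uint64 {
	t := thresholdFor(target)
	if incoming > t {
		return incoming
	}
	return t
}

func main() {
	var traceID [16]byte
	traceID[9] = 0xc0 // hypothetical trace ID with high randomness
	r := randomnessFromTraceID(traceID)

	incoming := thresholdFor(0.5) // upstream sampled at 50%
	fmt.Printf("proportional 50%%: keep=%v\n", shouldSample(r, proportionalThreshold(incoming, 0.5)))
	fmt.Printf("equalizing 25%%:   keep=%v\n", shouldSample(r, equalizingThreshold(incoming, 0.25)))
}
```

Because both modes compare the same randomness value against a threshold, samplers at different stages make consistent decisions for the same trace.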
Describe alternatives you've considered
In #31894 I've demonstrated the end-to-end change I would like to see. I propose to factor it into 3 parts.
- Changes in pkg/sampling. In #29738 (New component: pkg/sampling), a package of code meant for this work was contributed, stemming from #24811 (Probabilistic sampler processor based on draft t-value/r-value encoding). I have found a few minor changes that are required or nice-to-have in this package, and will separate them.
- Changes in probabilisticsampler before adding new modes. This is a major refactoring project, but it does not change functionality. Introduces the `FailClosed` feature, which gives the user control over error handling. Note the refactored code shares almost all of its logic between the two code paths, unlike the existing code.
- Adds two new probability sampler modes. This will complete the project.
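To make the error-handling control concrete, here is a hedged sketch of how a flag like FailClosed might gate the decision. It assumes that "closed" means the item is rejected on error and that an "open" setting lets it pass. The `decide` helper and the `errMissingRandomness` value are hypothetical and not taken from the refactored code.

```go
package main

import (
	"errors"
	"fmt"
)

// errMissingRandomness is a hypothetical error for items that carry no
// usable randomness (for example, an empty trace ID).
var errMissingRandomness = errors.New("missing randomness")

// decide returns the final keep/drop outcome for one item.
// Assumption: failClosed=true rejects items with sampling errors,
// while failClosed=false lets them through unsampled.
func decide(sampled bool, err error, failClosed bool) bool {
	if err != nil {
		return !failClosed
	}
	return sampled
}

func main() {
	fmt.Println("fail closed:", decide(false, errMissingRandomness, true))
	fmt.Println("fail open:  ", decide(false, errMissingRandomness, false))
}
```

Keeping the error branch in one place is what lets both code paths share the same handling.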
Additional context
While testing #31894, I noticed a couple of major and minor inconsistencies and documented them in the README.
- Some code paths would apply the hash function to an empty input, such as an empty trace ID or an empty attribute value. These would get a fixed, hard-coded value which sampled in 89% of cases. In the new code, `FailClosed` determines the outcome when randomness is missing, among other potential error cases.
- The logs "sampling priority" mechanism is very different from the traces mechanism. I would like to reconcile this, but won't do so under this issue.
- Since the default configuration includes `hash_seed: 0`, I propose changing the default mode in this code to "proportional" instead of "hash_seed" when no explicit hash seed is set and trace IDs are used. This allows the new OTel specification to be used in common cases.
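The proposed default could be resolved along these lines. This is only a sketch under stated assumptions: the `Config` struct, the `Mode` type, and `effectiveMode` are hypothetical names, and a zero `HashSeed` is treated as "not explicitly set", since a zero value cannot be told apart from an unset one in a plain struct.

```go
package main

import "fmt"

// Mode names follow the strings used in this issue; the type is hypothetical.
type Mode string

const (
	ModeHashSeed     Mode = "hash_seed"
	ModeProportional Mode = "proportional"
	ModeEqualizing   Mode = "equalizing"
)

// Config is a hypothetical, reduced view of the sampler configuration.
type Config struct {
	Mode     Mode   // empty means the user did not choose a mode
	HashSeed uint32 // 0 is treated here as "no explicit seed"
}

// effectiveMode picks the mode to run with. An explicit mode always wins.
// Otherwise, proportional is used when no seed is set and the randomness
// source is the trace ID; every other case keeps the hash-based behavior.
func effectiveMode(cfg Config, usesTraceID bool) Mode {
	if cfg.Mode != "" {
		return cfg.Mode
	}
	if cfg.HashSeed == 0 && usesTraceID {
		return ModeProportional
	}
	return ModeHashSeed
}

func main() {
	fmt.Println(effectiveMode(Config{}, true))
	fmt.Println(effectiveMode(Config{HashSeed: 22}, true))
	fmt.Println(effectiveMode(Config{}, false))
}
```

Users who set a non-zero seed, or who sample on an attribute instead of the trace ID, would keep the existing hash-based behavior under this rule.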