Update probabilistic sampler processor with OTEP 235 support #31918

Closed
@jmacd

Component(s)

processor/probabilisticsampler

Is your feature request related to a problem? Please describe.

OTEP 235 specifies how to encode randomness and threshold information for consistent probability sampling. Add support for this specification.
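For orientation, here is a minimal sketch of the decision this encoding enables. It assumes the 56-bit randomness R is taken from the least-significant 56 bits of the trace ID, that the rejection threshold T is 2^56 minus the number of accepted values, that an item is kept when R >= T, and that T is written into the `ot` tracestate entry as `th` with trailing zeros removed. All function names are hypothetical and are not the `pkg/sampling` API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
	"strings"
)

// numRandomValues is 2^56, the number of distinct 56-bit randomness values.
const numRandomValues = uint64(1) << 56

// thresholdFromProbability returns the rejection threshold T for sampling
// probability p, computed as 2^56 - round(p * 2^56). (Hypothetical helper.)
func thresholdFromProbability(p float64) (uint64, error) {
	// The negated form also rejects NaN.
	if !(p >= 0x1p-56 && p <= 1) {
		return 0, errors.New("probability must be in [2^-56, 1]")
	}
	accepted := uint64(math.Round(p * float64(numRandomValues)))
	return numRandomValues - accepted, nil
}

// randomnessFromTraceID returns the least-significant 56 bits of a trace ID.
func randomnessFromTraceID(id [16]byte) uint64 {
	return binary.BigEndian.Uint64(id[8:]) & (numRandomValues - 1)
}

// encodeThreshold renders T as up to 14 hex digits with trailing zeros
// removed; a zero threshold (100% sampling) is rendered as "0".
func encodeThreshold(t uint64) string {
	s := strings.TrimRight(fmt.Sprintf("%014x", t), "0")
	if s == "" {
		return "0"
	}
	return s
}

// shouldSample keeps an item when its randomness is at or above the threshold.
func shouldSample(r, t uint64) bool { return r >= t }

func main() {
	t, err := thresholdFromProbability(0.25)
	if err != nil {
		panic(err)
	}
	// Example trace ID whose low 56 bits are 0xd0000000000000.
	r := randomnessFromTraceID([16]byte{9: 0xd0})
	fmt.Printf("ot=th:%s sampled=%v\n", encodeThreshold(t), shouldSample(r, t))
}
```

With a 25% probability the threshold is 0xc0000000000000, which encodes as `th:c`; the example randomness 0xd0000000000000 is above it, so the item is kept.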

Describe the solution you'd like

There are two modes of sampling implied by the new specification.

  • Proportional sampling based on 56 bits of randomness
  • Equalizing sampling based on 56 bits of randomness

Both compare with the existing hash-based solution, which uses 14 bits of the seeded hash value.
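The issue text names the two modes without defining them. The sketch below assumes that proportional mode multiplies an item's incoming sampling probability by the configured ratio, and that equalizing mode raises each item's threshold to at least the configured one, so items that already arrive at a lower probability pass through unchanged. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// numRandomValues is 2^56, the number of distinct 56-bit randomness values.
const numRandomValues = uint64(1) << 56

// probabilityToThreshold converts a probability in (0, 1] to a rejection threshold.
func probabilityToThreshold(p float64) uint64 {
	return numRandomValues - uint64(math.Round(p*float64(numRandomValues)))
}

// thresholdToProbability converts a rejection threshold back to a probability.
func thresholdToProbability(t uint64) float64 {
	return float64(numRandomValues-t) / float64(numRandomValues)
}

// proportionalThreshold (assumed semantics) scales the incoming probability
// by ratio, so output volume is ratio times input volume.
func proportionalThreshold(incoming uint64, ratio float64) uint64 {
	return probabilityToThreshold(thresholdToProbability(incoming) * ratio)
}

// equalizingThreshold (assumed semantics) never lowers an incoming threshold;
// it only raises thresholds that are below the configured target.
func equalizingThreshold(incoming uint64, target float64) uint64 {
	configured := probabilityToThreshold(target)
	if incoming > configured {
		return incoming
	}
	return configured
}

func main() {
	incoming := probabilityToThreshold(0.5) // item already sampled at 50% upstream

	// Proportional at 50%: 0.5 * 0.5 = 0.25 overall probability.
	fmt.Println(thresholdToProbability(proportionalThreshold(incoming, 0.5)))

	// Equalizing at 50%: the item is already at 50%, so nothing changes.
	fmt.Println(thresholdToProbability(equalizingThreshold(incoming, 0.5)))
}
```

In either mode the final decision is the same comparison of the item's 56-bit randomness against the resulting threshold; only the way the threshold is derived differs.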

Describe alternatives you've considered

In #31894 I've demonstrated the end-to-end change I would like to see. I propose to factor it into 3 parts.

  1. Changes in pkg/sampling. In #29738 ("New component: pkg/sampling"), a package of code meant for this work was contributed, stemming from #24811 ("Probabilistic sampler processor based on draft t-value/r-value encoding"). I have found a few minor changes to this package that are required or nice to have, and will separate them out.
  2. Changes in probabilisticsampler before adding new modes. This is a major refactoring, but it does not change existing functionality. It introduces the FailClosed feature, which gives the user control over error handling. Note that the refactored code shares almost all of its logic between the two code paths, unlike the existing code.
  3. Add the two new probabilistic sampler modes. This will complete the project.

Additional context

While testing #31894 I noticed a couple of inconsistencies, some major and some minor, and documented them in the README.

  1. Some code paths would apply the hash function to an empty input, such as an empty trace ID or an empty attribute value. These would get a fixed, hard-coded value which sampled in 89% of cases. In the new code, FailClosed determines the outcome when randomness is missing, among other potential error cases.
  2. The logs "sampling priority" mechanism is very different from the traces mechanism. I would like to reconcile this, but won't do so under this issue.
  3. Since the default configuration includes hash_seed: 0, I propose changing the default mode in this code to "proportional" instead of "hash_seed" when no explicit hash seed is set and trace IDs are used. This allows the new OTel specification to be used in common cases.
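Items 1 and 3 can be sketched together. The code below is an illustration, not the processor's actual code: the `config` struct, its field names, and the helpers are hypothetical. It assumes that an all-zero trace ID counts as missing randomness, that `hash_seed: 0` is indistinguishable from an unset seed, and that FailClosed rejects items when set and passes them through when unset.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

type samplerMode string

const (
	modeHashSeed     samplerMode = "hash_seed"
	modeProportional samplerMode = "proportional"
)

// config is a hypothetical stand-in for the processor configuration.
type config struct {
	Mode       samplerMode // empty means no mode was set explicitly
	HashSeed   uint32      // 0 is treated as "no explicit seed"
	FailClosed bool        // true: reject on error; false: pass through
}

// effectiveMode applies the proposed default: without an explicit mode or
// hash seed, use "proportional" when trace IDs supply the randomness.
func effectiveMode(cfg config, usesTraceID bool) samplerMode {
	if cfg.Mode != "" {
		return cfg.Mode
	}
	if cfg.HashSeed == 0 && usesTraceID {
		return modeProportional
	}
	return modeHashSeed
}

var errMissingRandomness = errors.New("missing randomness")

// randomness returns the low 56 bits of the trace ID, or an error when the
// trace ID is empty (all zeros), instead of hashing an empty input.
func randomness(traceID [16]byte) (uint64, error) {
	if traceID == ([16]byte{}) {
		return 0, errMissingRandomness
	}
	return binary.BigEndian.Uint64(traceID[8:]) & (1<<56 - 1), nil
}

// keep reports whether an item is sampled; FailClosed decides error cases.
func keep(cfg config, traceID [16]byte, threshold uint64) bool {
	r, err := randomness(traceID)
	if err != nil {
		return !cfg.FailClosed
	}
	return r >= threshold
}

func main() {
	fmt.Println(effectiveMode(config{}, true))                 // proportional
	fmt.Println(keep(config{FailClosed: true}, [16]byte{}, 0)) // false
}
```

The point of the design is that a missing input produces an explicit error whose outcome the user controls, rather than a fixed hash value with an arbitrary sampling outcome.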
