Description
Describe the bug
This issue was originally reported in the CNCF Slack by Srinivas Bevara and confirmed by @codeboten, who asked me to open this issue.
After upgrading from v0.104.0 to v0.106.1, we see an increase in request latency reported by both the collector's internal telemetry and our ingress proxies.
Memory usage is unaffected, but pprof suggests that garbage collection has increased due to allocations related to internal telemetry tracing.
This is unexpected since tracing telemetry is disabled in our deployments.
Steps to reproduce
Run a load test comparing v0.104.0 and v0.106.1
What did you expect to see?
- p99 request latency and throughput should be comparable between both versions, with p99 latency around 74ms
- Internal telemetry must not generate recording spans when it is disabled
What did you see instead?
- An increase in p99 latency to > 1s
- Profiling reveals increased object allocations related to internal telemetry span data
- Profiling reveals an increase in time spent in GC
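
As a rough cross-check of the profiling observations above, allocation and GC activity can also be read from the Go runtime directly. The sketch below uses only `runtime.ReadMemStats` from the standard library; `measureGC`, `gcDelta`, and the stand-in workload are hypothetical names for illustration and are not collector code. The counters are process-wide, so activity from other goroutines is included in the result.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// gcDelta summarises allocator and GC activity over a measured interval.
// Hypothetical type for illustration only.
type gcDelta struct {
	Mallocs    uint64        // heap objects allocated during the interval
	NumGC      uint32        // GC cycles completed during the interval
	PauseTotal time.Duration // cumulative stop-the-world pause time
}

// measureGC runs work and reports the process-wide allocation and GC
// activity that occurred while it ran.
func measureGC(work func()) gcDelta {
	var before, after runtime.MemStats
	runtime.GC() // start from a freshly collected heap
	runtime.ReadMemStats(&before)
	work()
	runtime.ReadMemStats(&after)
	return gcDelta{
		Mallocs:    after.Mallocs - before.Mallocs,
		NumGC:      after.NumGC - before.NumGC,
		PauseTotal: time.Duration(after.PauseTotalNs - before.PauseTotalNs),
	}
}

// sink keeps allocations reachable so the compiler cannot place them on the stack.
var sink []byte

func main() {
	// Stand-in workload: many short-lived heap objects, loosely imitating
	// per-request span data. This is an assumption, not the real code path.
	d := measureGC(func() {
		for i := 0; i < 1_000_000; i++ {
			sink = make([]byte, 64)
		}
	})
	fmt.Printf("mallocs=%d gc_cycles=%d gc_pause=%s\n", d.Mallocs, d.NumGC, d.PauseTotal)
}
```

Running the same measurement around an identical workload on both versions would show whether the extra allocations are large enough to explain the additional GC time.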
What version did you use?
v0.106.1
What config did you use?
Our configuration has many transformations and connector pipelines. The configuration you use for benchmarking is likely the bare minimum needed to reproduce this.
Environment
Ubuntu 20.04.1
Additional context
Screenshots of increased p99 latency and CPU usage, with steady memory