Describe the bug
I deployed the Collector on Kubernetes to receive trace data and forward it to Kafka. I found that the Collector's CPU utilization is high while its memory utilization stays low.
Steps to reproduce
I deployed the Collector with 1 CPU core and 2 GiB of memory (1C2G) and sent trace data to it.
What did you expect to see?
I expected the Collector's CPU utilization to be lower, since its memory utilization is very low.
Are there other ways to optimize this Collector so that it uses less CPU?
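For example, I have been wondering whether config tweaks along these lines would reduce CPU. This is only an untested sketch based on settings already in my config, and the numbers are guesses; I would appreciate advice on whether any of it makes sense:

    # Untested sketch, not a validated change.
    processors:
      batch:
        send_batch_size: 2000        # currently 500; fewer, larger batches per export?
        send_batch_max_size: 2000    # currently 500
    exporters:
      kafka:
        # Currently true; does partitioning by trace ID add noticeable per-batch
        # work, and is it safe to disable in my setup?
        partition_traces_by_id: false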
What did you see instead?
After I sent data to this Collector, its monitoring showed high CPU utilization with low memory utilization.
What version did you use?
v0.95.0
What config did you use?
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-gd-config
  namespace: default
data:
  config.yaml: |-
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      memory_limiter:
        check_interval: 1s
        limit_mib: 2000
        spike_limit_mib: 400
      batch:
        send_batch_size: 500
        send_batch_max_size: 500
      resource:
        attributes:
          - key: from-collector
            value: gd-fat-k8s
            action: insert
    exporters:
      logging:
        verbosity: normal
      kafka:
        brokers:
          - xx.xx.xx.xx:9092
          - xx.xx.xx.xx:9092
          - xx.xx.xx.xx:9092
        topic: otlp_trace_fat
        partition_traces_by_id: true
        protocol_version: 1.0.0
        sending_queue:
          enabled: true
          num_consumers: 10
          queue_size: 10000
    extensions:
      pprof:
        endpoint: ":1777"
    service:
      extensions: [pprof]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, resource, batch]
          exporters: [logging, kafka]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector-gd
  namespace: default
  labels:
    app: opentelemetry
    component: otel-collector-gd
spec:
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-collector-gd
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-collector-gd
    spec:
      containers:
        - name: otel-collector-gd
          image: otel/opentelemetry-collector-contrib:0.95.0
          resources:
            limits:
              cpu: 1000m
              memory: 2048Mi
          volumeMounts:
            - mountPath: /var/log
              name: varlog
              readOnly: true
            - mountPath: /var/lib/docker/containers
              name: varlibdockercontainers
              readOnly: true
            - mountPath: /etc/otelcol-contrib/config.yaml
              name: data
              subPath: config.yaml
              readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: data
          configMap:
            name: otel-collector-gd-config
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector-gd
  namespace: default
  labels:
    app: opentelemetry
    component: otel-collector-gd
spec:
  ports:
    - name: otlp-grpc
      port: 4317
      protocol: TCP
      targetPort: 4317
    - name: otlp-http
      port: 4318
      protocol: TCP
      targetPort: 4318
    - name: pprof
      port: 1777
      protocol: TCP
      targetPort: 1777
  selector:
    component: otel-collector-gd
Environment
Additional context
I used pprof to analyze the CPU usage.
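The profile below was collected from the pprof extension configured above (port 1777), with a command along these lines:

    # Assumes the pprof port of the Service is reachable locally, e.g. via port-forward.
    kubectl port-forward -n default svc/otel-collector-gd 1777:1777
    go tool pprof -seconds 300 http://localhost:1777/debug/pprof/profile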
File: otelcol-contrib
Type: cpu
Time: Dec 26, 2024 at 1:50pm (CST)
Duration: 300s, Total samples = 230.42s (76.81%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 82.53s, 35.82% of 230.42s total
Dropped 1059 nodes (cum <= 1.15s)
Showing top 10 nodes out of 261
      flat  flat%   sum%        cum   cum%
    14.37s  6.24%  6.24%     14.37s  6.24%  runtime/internal/syscall.Syscall6
    10.79s  4.68% 10.92%     13.18s  5.72%  compress/flate.(*decompressor).huffSym
    10.43s  4.53% 15.45%     20.19s  8.76%  runtime.scanobject
    10.29s  4.47% 19.91%     45.47s 19.73%  runtime.mallocgc
     8.19s  3.55% 23.47%      8.19s  3.55%  runtime.memclrNoHeapPointers
     7.17s  3.11% 26.58%      7.17s  3.11%  runtime.memmove
     7.03s  3.05% 29.63%      7.95s  3.45%  runtime.lock2
     5.52s  2.40% 32.02%     24.51s 10.64%  compress/flate.(*decompressor).huffmanBlock
     4.57s  1.98% 34.01%      4.75s  2.06%  runtime.unlock2
     4.17s  1.81% 35.82%      4.17s  1.81%  runtime.nextFreeFast (inline)
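Most of the top-10 flat time is in the Go runtime (allocation, GC scanning, locking) and in compress/flate, which I assume is decompression of gzip-compressed OTLP requests. If it helps, I can pull deeper views from the same profile, e.g.:

    (pprof) top10 -cum
    (pprof) peek compress/flate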