Unmarshal error for OTLP-formatted metrics from cross-account observability enabled CloudWatch metric streams #39462

Closed
@skydontdribble

Component(s)

receiver/awsfirehose

What happened?

Description

We are seeing the following errors when sending metrics from AWS Firehose to an OpenTelemetry Collector with the awsfirehose receiver:

error    [email protected]/receiver.go:225    Unable to consume records    {"otelcol.component.id": "awsfirehose", "otelcol.component.kind": "Receiver", "otelcol.signal": "metrics", "error": "unable to unmarshal input: unexpected EOF"}

We have a Firehose stream that receives OTLP_v1-encoded metrics via Direct PUT from a CloudWatch metric stream with cross-account observability enabled (that is, in a monitoring account whose metrics originate from other source accounts within the same region). The Firehose stream forwards the metrics to an OpenTelemetry Collector running the awsfirehose receiver.
Oddly, this issue only occurs when the CloudWatch metric stream has cross-account observability enabled. This suggests that the request payload is too large and gets truncated somewhere between the CloudWatch metric stream, the Firehose stream, and the OpenTelemetry Collector, especially considering the very similar issue #38736. Perhaps metric events that originate from cross-account-enabled CloudWatch metric streams are more verbose than those from streams without cross-account observability. Nonetheless, the receiver should be able to handle requests that contain standard-size metric events (i.e., metrics without any transformations) from CloudWatch metric streams with cross-account observability enabled.
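
One way to test the truncation hypothesis would be to capture a raw request body sent by Firehose and check its framing offline. The following is a minimal sketch, not the receiver's actual logic. It assumes the Firehose HTTP endpoint request is a JSON object with a `records` array of base64-encoded `data` fields, and that each OTLP record is a sequence of messages that are each prefixed with an unsigned varint length, as the AWS documentation for the OpenTelemetry metric stream formats describes. The function names (`read_varint`, `check_record`, `check_firehose_body`) are hypothetical and exist only for illustration.

```python
import base64
import json


def read_varint(buf: bytes, pos: int):
    """Decode one protobuf-style unsigned varint starting at pos.

    Returns (value, next_pos), or None if the buffer ends mid-varint.
    """
    value = 0
    shift = 0
    while pos < len(buf):
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7
    return None


def check_record(data: bytes):
    """Walk the length-prefixed messages in one decoded record.

    Returns (complete_messages, missing_bytes). missing_bytes is 0 when
    the record ends exactly on a message boundary.
    """
    pos = 0
    complete = 0
    while pos < len(data):
        header = read_varint(data, pos)
        if header is None:
            # Cut off inside the length prefix: at least one byte is missing.
            return complete, 1
        size, body_start = header
        body_end = body_start + size
        if body_end > len(data):
            # The prefix promises more bytes than the record contains.
            return complete, body_end - len(data)
        complete += 1
        pos = body_end
    return complete, 0


def check_firehose_body(body: str):
    """Check every record in a captured (already decompressed) request body."""
    request = json.loads(body)
    return [check_record(base64.b64decode(r["data"])) for r in request["records"]]
```

A non-zero `missing_bytes` for any record would indicate that the record ends partway through a message, which is consistent with an "unexpected EOF" while unmarshalling. If the body was captured with gzip content encoding, it would need to be decompressed before being passed to `check_firehose_body`.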

Steps to Reproduce

(Apologies in advance for the screenshots. The AWS services were configured in the UI because we are still exploring this design for feasibility before committing to any infrastructure-as-code.)

CloudWatch Metric Stream configuration
Image
(Note the Monitoring Account label in the top-right corner, which allows the account to enable cross-account observability and adds the Include source account metrics toggle option. This can be easily configured following this AWS document.)

AWS Firehose configuration
Image
The source for the stream is Direct PUT (the CloudWatch metric stream configuration above shows this Firehose stream as its destination), and the destination is an HTTP endpoint exposed by an OpenTelemetry Collector. Most importantly, note that we are using the maximum Firehose buffer size (64 MiB) and gzip compression, though compression makes no difference: the errors occur whether it is on or off.

Expected Result

The awsfirehose receiver should be able to handle large incoming requests without error.

Collector version

v0.122.1

Environment information

Environment

Helm chart: opentelemetry-collector v0.120.0
Kubernetes: v1.30.10

OpenTelemetry Collector configuration

opentelemetry-collector:
  config:
    extensions:
      health_check: {}
      oauth2client:
        client_id: ${env:id}
        client_secret: ${env:secret}
        token_url: ...
        scopes: ...
    receivers:
      awsfirehose:
        endpoint: ${env:MY_POD_IP}:4433
        encoding: otlp_v1
        tls:
          cert_file: /etc/tls/tls.crt
          key_file: /etc/tls/tls.key
    exporters:
      debug: {}
    service:
      extensions:
        - oauth2client
        - health_check
      pipelines:
        metrics:
          receivers:
            - awsfirehose
          exporters:
            - debug
  ports:
    aws-firehose:
      enabled: true
      containerPort: 4433
      servicePort: 4433
      hostPort: 4433
      protocol: TCP
...

Log output

2025-04-16T18:52:02.469Z    error    [email protected]/receiver.go:225    Unable to consume records    {"otelcol.component.id": "awsfirehose", "otelcol.component.kind": "Receiver", "otelcol.signal": "metrics", "error": "unable to unmarshal input: unexpected EOF"}
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver.(*firehoseReceiver).ServeHTTP
     github.com/open-telemetry/opentelemetry-collector-contrib/receiver/[email protected]/receiver.go:225
2025-04-16T18:52:06.482Z    error    [email protected]/receiver.go:225    Unable to consume records    {"otelcol.component.id": "awsfirehose", "otelcol.component.kind": "Receiver", "otelcol.signal": "metrics", "error": "unable to unmarshal input: unexpected EOF"}
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver.(*firehoseReceiver).ServeHTTP
     github.com/open-telemetry/opentelemetry-collector-contrib/receiver/[email protected]/receiver.go:225
