Description
Component(s)
receiver/hostmetrics
What happened?
Description
The gopsutil function cpu.Counts, which is called by the cpu scraper, does not set a deadline/timeout on its context. This forces WMIQueryWithContext to set one itself, using its hardcoded timeout of 3 seconds.
On large, busy, and/or under-resourced hosts, the WMI call can take longer than 3 seconds. The call then fails with a context deadline exceeded error and the CPU counts are not collected.
Steps to Reproduce
Find a Windows host where the WMI calls take longer than 3 seconds and run the hostmetrics receiver with the cpu scraper.
Expected Result
All metrics are collected, including the physical and logical CPU counts.
Actual Result
The CPU counts are missing and we see this error in the logs:
4670103 Mar 29 00:13 Error splunk-otel-collector 3 1.7116855975244713e+09 error scraperhelper/scrapercontroller.go:200 Error scraping metrics {"kind": "receiver", "name": "hostmetrics", "data_type": "metrics", "error": "context deadline exceeded", "scraper": "cpu"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:200
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:176
Collector version
v0.95.0
Environment information
Environment
host_cpu_cores:"2"
host_cpu_model:"Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz"
host_mem_total:"8080924"
host_os_name:"Microsoft Windows Server 2016 Datacenter"
OpenTelemetry Collector configuration
No response
Log output
4670103 Mar 29 00:13 Error splunk-otel-collector 3 1.7116855975244713e+09 error scraperhelper/scrapercontroller.go:200 Error scraping metrics {"kind": "receiver", "name": "hostmetrics", "data_type": "metrics", "error": "context deadline exceeded", "scraper": "cpu"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:200
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:176
Additional context
The suggestion is to use CountsWithContext instead of Counts, and to introduce a wmi_timeout option for the cpu scraper.
cc: @atoulme who helped with the RCA