-
Notifications
You must be signed in to change notification settings - Fork 2.9k
[exporter/clickhouse] Fix data race that caused panic in integration tests #38798
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[exporter/clickhouse] Fix data race that caused panic in integration tests #38798
Conversation
…rName into config instead. Fix data race that write into global driverName from muptiple tests in parallel
I have a branch where I'm refactoring some of the driver code, this won't be a problem anymore in that version but this is a good fix to get the integration tests running again. Thanks! |
Hello, could somebody else review this mr pls? There are 40 loc and it blocks another fix(because it adds an integration test). @dmitryax @Frapschen @hanjm |
I am little confuse, please give more examplianation about :
|
@@ -91,7 +90,7 @@ func getContainer(t *testing.T, req testcontainers.ContainerRequest) testcontain | |||
|
|||
func verifyExportLog(t *testing.T, logExporter *logsExporter) { | |||
mustPushLogsData(t, logExporter, simpleLogs(1)) | |||
db := sqlx.NewDb(logExporter.client, driverName) | |||
db := sqlx.NewDb(logExporter.client, clickhouseDriverName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the different use of a const string clickhouse
between a var string clickhouse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a variable because some tests set this variable with mock names to record their actions. However, this behavior wasn't safe for concurrent use, especially for integration tests. If they were executed after other tests, they used the mock instead of 'clickhouse'. Now, this is a constant, and nobody changes it. Tests that need to use the mock pass it explicitly with the driverName, so this issue no longer occurs. You can read the description of the PR for more details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logical issue here is that integration tests might use mock drivers from other tests instead of the real Clickhouse driver because those tests set the mock names in driverName. The main fix is to avoid setting mock names in driverName and instead pass these names explicitly in the configuration
Hope that the problem is clear now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like this code, right?
opentelemetry-collector-contrib/exporter/clickhouseexporter/exporter_logs_test.go
Lines 261 to 266 in 4e84b33
func initClickhouseTestServer(t *testing.T, recorder recorder) { | |
driverName = t.Name() | |
sql.Register(t.Name(), &testClickhouseDriver{ | |
recorder: recorder, | |
}) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes . driverName = t.Name()
is the problem
In production, nobody sets the names of their mocks in driverName, so it always defaults to 'Clickhouse'. This works fine, and it would also work if the other tests (those that set this variable) were disabled. |
…tests (open-telemetry#38798) <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description clickhouse exporter tests use their own sql drivers to validate the queries sent during the tests. Each driver has a unique name (which is the same as the test name). This name must be passed to the sql.Open function during database initialization. Unfortunately, instead of explicitly passing the driver name into the buildDB function, it was passed implicitly through a global variable. This approach works in production but leads to data races when tests are executed in parallel because each test writes its own name to driverName. As a result, integration tests expected driverName to be set to the standard "clickhouse", but instead, it retained the value set by the previous test. This caused the validation callbacks to be triggered incorrectly, leading to test failures (panics). There were two possible solutions to fix this issue: 1. Hardcode the "clickhouse" constant in integration tests. 2. Pass driverName explicitly into sql.Open. Although the first option might work, I believe it is not a careful way to write tests and code. A global variable that changes during tests introduces bugs that are difficult to debug and reproduce. Therefore, I prefer the second option. <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Fixes Close open-telemetry#32530 --------- Co-authored-by: Antoine Toulme <[email protected]>
…tests (open-telemetry#38798) <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description clickhouse exporter tests use their own sql drivers to validate the queries sent during the tests. Each driver has a unique name (which is the same as the test name). This name must be passed to the sql.Open function during database initialization. Unfortunately, instead of explicitly passing the driver name into the buildDB function, it was passed implicitly through a global variable. This approach works in production but leads to data races when tests are executed in parallel because each test writes its own name to driverName. As a result, integration tests expected driverName to be set to the standard "clickhouse", but instead, it retained the value set by the previous test. This caused the validation callbacks to be triggered incorrectly, leading to test failures (panics). There were two possible solutions to fix this issue: 1. Hardcode the "clickhouse" constant in integration tests. 2. Pass driverName explicitly into sql.Open. Although the first option might work, I believe it is not a careful way to write tests and code. A global variable that changes during tests introduces bugs that are difficult to debug and reproduce. Therefore, I prefer the second option. <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Fixes Close open-telemetry#32530 --------- Co-authored-by: Antoine Toulme <[email protected]>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description In the current version of the exporter, the createDatabase function uses cfg.buildDSN, which appends the database from the config into the dsn that is passed into the clickhouse driver. As a result, the exporter tries to connect to the database before it is actually created, causing a ClickHouse exception in the start function This pr couldn't be merged until [the integration test fix](#38798) <!--Describe what testing was performed and which tests were added.--> #### Testing I've added an integration test that checks whether the database was successfully created. It fails now but works with the fix. <!--Please delete paragraphs that you did not use before submitting.--> --------- Co-authored-by: Antoine Toulme <[email protected]>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description In the current version of the exporter, the createDatabase function uses cfg.buildDSN, which appends the database from the config into the dsn that is passed into the clickhouse driver. As a result, the exporter tries to connect to the database before it is actually created, causing a ClickHouse exception in the start function This pr couldn't be merged until [the integration test fix](open-telemetry#38798) <!--Describe what testing was performed and which tests were added.--> #### Testing I've added an integration test that checks whether the database was successfully created. It fails now but works with the fix. <!--Please delete paragraphs that you did not use before submitting.--> --------- Co-authored-by: Antoine Toulme <[email protected]>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description In the current version of the exporter, the createDatabase function uses cfg.buildDSN, which appends the database from the config into the dsn that is passed into the clickhouse driver. As a result, the exporter tries to connect to the database before it is actually created, causing a ClickHouse exception in the start function This pr couldn't be merged until [the integration test fix](open-telemetry#38798) <!--Describe what testing was performed and which tests were added.--> #### Testing I've added an integration test that checks whether the database was successfully created. It fails now but works with the fix. <!--Please delete paragraphs that you did not use before submitting.--> --------- Co-authored-by: Antoine Toulme <[email protected]>
Description
clickhouse exporter tests use their own sql drivers to validate the queries sent during the tests.
Each driver has a unique name (which is the same as the test name). This name must be passed to the sql.Open function during database initialization.
Unfortunately, instead of explicitly passing the driver name into the buildDB function, it was passed implicitly through a global variable. This approach works in production but leads to data races when tests are executed in parallel because each test writes its own name to driverName. As a result, integration tests expected driverName to be set to the standard "clickhouse", but instead, it retained the value set by the previous test. This caused the validation callbacks to be triggered incorrectly, leading to test failures (panics).
There were two possible solutions to fix this issue:
Although the first option might work, I believe it is not a careful way to write tests and code. A global variable that changes during tests introduces bugs that are difficult to debug and reproduce. Therefore, I prefer the second option.
Link to tracking issue
Fixes
Close #32530