Skip to content

[exporter/clickhouse] Fix data race that caused panic in integration tests #38798

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

lim123123123
Copy link
Contributor

@lim123123123 lim123123123 commented Mar 19, 2025

Description

clickhouse exporter tests use their own sql drivers to validate the queries sent during the tests.
Each driver has a unique name (which is the same as the test name). This name must be passed to the sql.Open function during database initialization.

Unfortunately, instead of explicitly passing the driver name into the buildDB function, it was passed implicitly through a global variable. This approach works in production but leads to data races when tests are executed in parallel because each test writes its own name to driverName. As a result, integration tests expected driverName to be set to the standard "clickhouse", but instead, it retained the value set by the previous test. This caused the validation callbacks to be triggered incorrectly, leading to test failures (panics).

There were two possible solutions to fix this issue:

  1. Hardcode the "clickhouse" constant in integration tests.
  2. Pass driverName explicitly into sql.Open.
    Although the first option might work, I believe it is not a careful way to write tests and code. A global variable that changes during tests introduces bugs that are difficult to debug and reproduce. Therefore, I prefer the second option.

Link to tracking issue

Fixes
Close #32530

…rName into config instead. Fix data race that write into global driverName from muptiple tests in parallel
@atoulme atoulme added Skip Changelog PRs that do not require a CHANGELOG.md entry waiting-for-code-owners labels Mar 19, 2025
@SpencerTorres
Copy link
Member

I have a branch where I'm refactoring some of the driver code, this won't be a problem anymore in that version but this is a good fix to get the integration tests running again. Thanks!

@lim123123123
Copy link
Contributor Author

Hello, could somebody else review this mr pls? There are 40 loc and it blocks another fix(because it adds an integration test). @dmitryax @Frapschen @hanjm

@Frapschen
Copy link
Contributor

I am little confuse, please give more examplianation about :

This approach works in production but leads to data races when tests are executed in parallel because each test writes its own name to driverName.

@@ -91,7 +90,7 @@ func getContainer(t *testing.T, req testcontainers.ContainerRequest) testcontain

func verifyExportLog(t *testing.T, logExporter *logsExporter) {
mustPushLogsData(t, logExporter, simpleLogs(1))
db := sqlx.NewDb(logExporter.client, driverName)
db := sqlx.NewDb(logExporter.client, clickhouseDriverName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the different use of a const string clickhouse between a var string clickhouse

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a variable because some tests set this variable with mock names to record their actions. However, this behavior wasn't safe for concurrent use, especially for integration tests. If they were executed after other tests, they used the mock instead of 'clickhouse'. Now, this is a constant, and nobody changes it. Tests that need to use the mock pass it explicitly with the driverName, so this issue no longer occurs. You can read the description of the PR for more details

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logical issue here is that integration tests might use mock drivers from other tests instead of the real Clickhouse driver because those tests set the mock names in driverName. The main fix is to avoid setting mock names in driverName and instead pass these names explicitly in the configuration
Hope that the problem is clear now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like this code, right?

func initClickhouseTestServer(t *testing.T, recorder recorder) {
driverName = t.Name()
sql.Register(t.Name(), &testClickhouseDriver{
recorder: recorder,
})
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes . driverName = t.Name() is the problem

@lim123123123
Copy link
Contributor Author

I am little confuse, please give more examplianation about :

This approach works in production but leads to data races when tests are executed in parallel because each test writes its own name to driverName.

In production, nobody sets the names of their mocks in driverName, so it always defaults to 'Clickhouse'. This works fine, and it would also work if the other tests (those that set this variable) were disabled.

@lim123123123
Copy link
Contributor Author

lim123123123 commented Apr 3, 2025

@atoulme Do I understand right that we need a review from @dmitryax to merge this?

@atoulme atoulme added ready to merge Code review completed; ready to merge by maintainers and removed waiting-for-code-owners labels Apr 15, 2025
@songy23 songy23 merged commit 1c509bf into open-telemetry:main Apr 15, 2025
182 of 183 checks passed
@github-actions github-actions bot added this to the next release milestone Apr 15, 2025
akshays-19 pushed a commit to akshays-19/opentelemetry-collector-contrib that referenced this pull request Apr 23, 2025
…tests (open-telemetry#38798)

<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
clickhouse exporter tests use their own sql drivers to validate the
queries sent during the tests.
Each driver has a unique name (which is the same as the test name). This
name must be passed to the sql.Open function during database
initialization.

Unfortunately, instead of explicitly passing the driver name into the
buildDB function, it was passed implicitly through a global variable.
This approach works in production but leads to data races when tests are
executed in parallel because each test writes its own name to
driverName. As a result, integration tests expected driverName to be set
to the standard "clickhouse", but instead, it retained the value set by
the previous test. This caused the validation callbacks to be triggered
incorrectly, leading to test failures (panics).

There were two possible solutions to fix this issue:

1. Hardcode the "clickhouse" constant in integration tests.
2. Pass driverName explicitly into sql.Open.
Although the first option might work, I believe it is not a careful way
to write tests and code. A global variable that changes during tests
introduces bugs that are difficult to debug and reproduce. Therefore, I
prefer the second option.


<!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. -->
#### Link to tracking issue
Fixes
Close
open-telemetry#32530

---------

Co-authored-by: Antoine Toulme <[email protected]>
Fiery-Fenix pushed a commit to Fiery-Fenix/opentelemetry-collector-contrib that referenced this pull request Apr 24, 2025
…tests (open-telemetry#38798)

<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
clickhouse exporter tests use their own sql drivers to validate the
queries sent during the tests.
Each driver has a unique name (which is the same as the test name). This
name must be passed to the sql.Open function during database
initialization.

Unfortunately, instead of explicitly passing the driver name into the
buildDB function, it was passed implicitly through a global variable.
This approach works in production but leads to data races when tests are
executed in parallel because each test writes its own name to
driverName. As a result, integration tests expected driverName to be set
to the standard "clickhouse", but instead, it retained the value set by
the previous test. This caused the validation callbacks to be triggered
incorrectly, leading to test failures (panics).

There were two possible solutions to fix this issue:

1. Hardcode the "clickhouse" constant in integration tests.
2. Pass driverName explicitly into sql.Open.
Although the first option might work, I believe it is not a careful way
to write tests and code. A global variable that changes during tests
introduces bugs that are difficult to debug and reproduce. Therefore, I
prefer the second option.


<!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. -->
#### Link to tracking issue
Fixes
Close
open-telemetry#32530

---------

Co-authored-by: Antoine Toulme <[email protected]>
atoulme added a commit that referenced this pull request May 9, 2025
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
In the current version of the exporter, the createDatabase function uses
cfg.buildDSN, which appends the database from the config into the dsn
that is passed into the clickhouse driver.
As a result, the exporter tries to connect to the database before it is
actually created, causing a ClickHouse exception in the start function

This pr couldn't be merged until [the integration test
fix](#38798)

<!--Describe what testing was performed and which tests were added.-->
#### Testing
I've added an integration test that checks whether the database was
successfully created. It fails now but works with the fix.

<!--Please delete paragraphs that you did not use before submitting.-->

---------

Co-authored-by: Antoine Toulme <[email protected]>
dragonlord93 pushed a commit to dragonlord93/opentelemetry-collector-contrib that referenced this pull request May 23, 2025
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
In the current version of the exporter, the createDatabase function uses
cfg.buildDSN, which appends the database from the config into the dsn
that is passed into the clickhouse driver.
As a result, the exporter tries to connect to the database before it is
actually created, causing a ClickHouse exception in the start function

This pr couldn't be merged until [the integration test
fix](open-telemetry#38798)

<!--Describe what testing was performed and which tests were added.-->
#### Testing
I've added an integration test that checks whether the database was
successfully created. It fails now but works with the fix.

<!--Please delete paragraphs that you did not use before submitting.-->

---------

Co-authored-by: Antoine Toulme <[email protected]>
dd-jasminesun pushed a commit to DataDog/opentelemetry-collector-contrib that referenced this pull request Jun 23, 2025
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
In the current version of the exporter, the createDatabase function uses
cfg.buildDSN, which appends the database from the config into the dsn
that is passed into the clickhouse driver.
As a result, the exporter tries to connect to the database before it is
actually created, causing a ClickHouse exception in the start function

This pr couldn't be merged until [the integration test
fix](open-telemetry#38798)

<!--Describe what testing was performed and which tests were added.-->
#### Testing
I've added an integration test that checks whether the database was
successfully created. It fails now but works with the fix.

<!--Please delete paragraphs that you did not use before submitting.-->

---------

Co-authored-by: Antoine Toulme <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exporter/clickhouse ready to merge Code review completed; ready to merge by maintainers Skip Changelog PRs that do not require a CHANGELOG.md entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[exporter/clickhouse] Integration test hits a panic
5 participants