Skip to content

[extension/filestorage] - Add an option to recover from panic if underlying file is corrupted. #40986

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 16, 2025

Conversation

VihasMakwana
Copy link
Contributor

@VihasMakwana VihasMakwana commented Jun 30, 2025

Description

Add an option to recreate the database from scratch. This is useful if the underlying database is corrupted and there's no way to recover from it.

Ideally, we would want to recover from a panic. However, I came across the panic statements in bbolt's codebase and they are raised inside of a goroutine. Unfortunately, we can't recover from that panic in our collector process. So, our best route here is to provide an option to the user to recreate a new database.

Link to tracking issue

Fixes #36840, #35899

Testing

Documentation

@VihasMakwana VihasMakwana marked this pull request as ready for review July 2, 2025 13:21
@VihasMakwana VihasMakwana requested a review from a team as a code owner July 2, 2025 13:21
@VihasMakwana VihasMakwana requested a review from ArthurSens July 2, 2025 13:22
Copy link
Contributor

@swiatekm swiatekm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me technically, but I'm not sure it's the right thing to do. My impression was that we wanted to recover from a panic when opening the DB, and then create the backup. Is that not feasible.

@VihasMakwana
Copy link
Contributor Author

VihasMakwana commented Jul 14, 2025

The change looks good to me technically, but I'm not sure it's the right thing to do. My impression was that we wanted to recover from a panic when opening the DB, and then create the backup. Is that not feasible.

Those were my intentions as well. However, I came across the panic statements in bbolt's codebase and they are raised inside of a goroutine. Unfortunately, we can't recover from that panic in our collector process.

Essentially, it is like following piece of code:

func main() {
	defer func() {
		err := recover() // WON'T HELP BECAUE OF PANIC INSIDE OF A GOROUTINE
		fmt.Println(err)
	}()
	go func() {
		panic("OOPS!")
	}()
}

@VihasMakwana VihasMakwana requested a review from swiatekm July 15, 2025 07:05
@VihasMakwana
Copy link
Contributor Author

@open-telemetry/collector-contrib-approvers could someone take a look here? I have an approval from the codeowner.

@andrzej-stencel andrzej-stencel merged commit c105bc2 into open-telemetry:main Jul 16, 2025
179 checks passed
@github-actions github-actions bot added this to the next release milestone Jul 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

recover from filestorage panic on corrupted DB
5 participants