Async background persistence #3905


Open · wants to merge 3 commits into base: main

Conversation

joostjager
Contributor

@joostjager joostjager commented Jul 2, 2025

Stripped down version of #3778. It allows background persistence to be async, but channel monitor persistence remains sync. This means that for the time being, users wanting async background persistence would be required to implement both the sync and the async KVStore trait. This model is available through process_events_full_async.

process_events_async still takes a synchronous kv store to remain backwards compatible.

Usage in ldk-node: lightningdevkit/ldk-node@main...joostjager:ldk-node:upgrade-to-async-kvstore

@ldk-reviews-bot

ldk-reviews-bot commented Jul 2, 2025

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@joostjager joostjager force-pushed the async-persister branch 7 times, most recently from 3fb7d6b to 1847e8d Compare July 3, 2025 09:57
@joostjager joostjager self-assigned this Jul 3, 2025
@joostjager joostjager force-pushed the async-persister branch 2 times, most recently from 1f59bbe to 723a5a6 Compare July 3, 2025 11:52
@joostjager joostjager mentioned this pull request May 12, 2025
24 tasks
@TheBlueMatt TheBlueMatt linked an issue Jul 7, 2025 that may be closed by this pull request
@joostjager joostjager force-pushed the async-persister branch 10 times, most recently from bc9c29a to 90ab1ba Compare July 9, 2025 09:52
@joostjager joostjager marked this pull request as ready for review July 9, 2025 09:52
@joostjager joostjager requested a review from tnull July 9, 2025 09:52
Comment on lines 631 to 683
fn persist_state<'a>(
&self, sweeper_state: &SweeperState,
) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'a + Send>> {
let encoded = &sweeper_state.encode();

self.kv_store.write(
OUTPUT_SWEEPER_PERSISTENCE_PRIMARY_NAMESPACE,
OUTPUT_SWEEPER_PERSISTENCE_SECONDARY_NAMESPACE,
OUTPUT_SWEEPER_PERSISTENCE_KEY,
encoded,
)
The encoded variable is captured by reference in the returned future, but it's a local variable that will be dropped when the function returns. This creates a potential use-after-free issue. Consider moving ownership of encoded into the future instead:

fn persist_state<'a>(
    &self, sweeper_state: &SweeperState,
) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'a + Send>> {
    let encoded = sweeper_state.encode();

    self.kv_store.write(
        OUTPUT_SWEEPER_PERSISTENCE_PRIMARY_NAMESPACE,
        OUTPUT_SWEEPER_PERSISTENCE_SECONDARY_NAMESPACE,
        OUTPUT_SWEEPER_PERSISTENCE_KEY,
        &encoded,
    )
}

This ensures the data remains valid for the lifetime of the future.

Spotted by Diamond

Contributor Author

Is this real?

Contributor

I don't think so as the compiler would likely optimize that away, given that encoded will be an owned value (Vec returned by encode()). Still, the change that it suggests looks cleaner.

In general it will be super confusing that we encode at the time of creating the future, but would only actually persist once we dropped the lock. Starting from now we'll need to be super cautious about the side-effects of interleaving persist calls.
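The ownership point can be shown with a minimal, self-contained sketch (simplified types, not LDK's actual `KVStore` signature): moving the owned buffer into the async block means the bytes live as long as the future does, independent of the caller's stack frame. The tiny executor below exists only so the example runs without an async runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Returns a boxed future that owns its data: `encoded` is moved into the
// async block, so it stays alive until the future completes.
fn persist(encoded: Vec<u8>) -> Pin<Box<dyn Future<Output = usize> + Send>> {
    Box::pin(async move { encoded.len() })
}

// Tiny single-future executor, sufficient for futures that complete
// without needing real wakeups.
fn block_on<F: Future + ?Sized>(mut fut: Pin<Box<F>>) -> F::Output {
    fn clone(p: *const ()) -> RawWaker {
        RawWaker::new(p, &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    // The Vec is moved into the future at creation time; nothing borrows
    // from this stack frame, so the future can safely outlive it.
    let fut = persist(vec![1, 2, 3]);
    let len = block_on(fut);
    assert_eq!(len, 3);
}
```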

Contributor Author

The idea is that an async kv store encodes the data and stores the write action in a queue at the moment the future is created. Things should still happen in the original order.

Can you show a specific scenario where we have to be super cautious even if we have that queue?
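A hypothetical sketch of that queue idea (illustrative names, not LDK's API): the write is captured in FIFO order the moment the call is made, so later completion order of the futures cannot reorder the stored state.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Mutex;

// Writes are enqueued synchronously at future-creation time and applied
// strictly in creation order by a background flush.
#[derive(Default)]
struct QueuedKVStore {
    queue: Mutex<VecDeque<(String, Vec<u8>)>>,
    backing: Mutex<HashMap<String, Vec<u8>>>,
}

impl QueuedKVStore {
    // Synchronous part of an async write: enqueue the already-encoded value.
    // A real implementation would also hand back a future that resolves
    // once the queued write has been flushed.
    fn enqueue_write(&self, key: &str, value: Vec<u8>) {
        self.queue.lock().unwrap().push_back((key.to_string(), value));
    }

    // Background task: apply queued writes in FIFO (creation) order.
    fn flush(&self) {
        let mut queue = self.queue.lock().unwrap();
        let mut backing = self.backing.lock().unwrap();
        while let Some((key, value)) = queue.pop_front() {
            backing.insert(key, value);
        }
    }

    fn read(&self, key: &str) -> Option<Vec<u8>> {
        self.backing.lock().unwrap().get(key).cloned()
    }
}

fn main() {
    let store = QueuedKVStore::default();
    store.enqueue_write("sweeper_state", vec![1]);
    store.enqueue_write("sweeper_state", vec![2]); // newer state, queued later
    store.flush();
    // The last-queued write wins, matching the original call order.
    assert_eq!(store.read("sweeper_state"), Some(vec![2]));
}
```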

Contributor Author

@joostjager joostjager Jul 14, 2025

Moved &

Contributor

The idea is that an async kv store encodes the data and stores the write action in a queue at the moment the future is created. Things should still happen in the original order.

If that is the idea that we start assuming in this PR, we should probably also start documenting these assumptions in this PR on KVStore already.

Contributor Author

Added this requirement to the async KVStore trait doc

@joostjager joostjager changed the title Async Persister trait Async Persister trait and async OutputSweeper persistence Jul 10, 2025
Contributor

@tnull tnull left a comment

Did a first pass. Changes look mostly good, although I'm not the biggest fan of the process_events_full_async variant as well as the SyncOutputSweeperSyncKVStoreSync in the last commit. I hope we can avoid them?

For the former, it would be useful to wait for #3688 (given it's close), and then the user could use the builder pattern to either configure an async or sync KVStore.

@@ -769,7 +789,7 @@ use futures_util::{dummy_waker, OptionalSelector, Selector, SelectorOutput};
#[cfg_attr(feature = "std", doc = " handle.await.unwrap()")]
/// # }
///```
pub async fn process_events_async<
pub async fn process_events_full_async<
Contributor

I dislike solving this by just doubling the API surface here. IMO, we should finally move forward with #3688 first, and then utilize the builder pattern to allow the user to plug in a sync or async KVStore variant.

Contributor Author

I don't think we should do #3688 in the form that it currently has. My opinion is that it is too much code for what it gets us.

Contributor

Okay, we may need to conclude on that debate soon then.

Note that a) we had already decided to go this way on #3612, b) we have a contributor that already did the work, and c) this PR is a pretty good example of why we'd want the builder pattern over just adding more constructors or optional fields on them.

But probably best to discuss elsewhere.

Contributor Author

This is also temporary, because we want to move over to full async eventually.

Contributor Author

I'd say that to solve #3612 we can also use the dummy implementations and don't necessarily need a builder. But indeed best to move that discussion to #3688.

Contributor

I'd say that to solve #3612 we can also use the dummy implementations and don't necessarily need a builder. But indeed best to move that discussion to #3688.

I still think forcing the user to supply all the dummy implementations doesn't significantly improve the API situation for the user. But yeah, let's move the discussion elsewhere.

@@ -121,6 +121,58 @@ pub trait KVStoreSync {
) -> Result<Vec<String>, io::Error>;
}

/// A wrapper around a [`KVStoreSync`] that implements the [`KVStore`] trait.
pub struct KVStoreSyncWrapper<K: Deref>(pub K)
Contributor

As discussed yesterday, we'll also need access to this type in LDK Node as it will be part of the OutputSweeper type signature. Not sure if it would then be preferable to have the user implement this wrapper. If we leave it here, it might make sense to split out all these wrappers into a separate util.rs or helpers.rs file, so they don't clutter up the codebase.
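The general shape of such a wrapper can be sketched under simplified stand-in traits (not LDK's exact signatures): the sync write runs eagerly at call time, and an already-resolved future carries the result back through the async trait.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::io;
use std::ops::Deref;
use std::pin::Pin;
use std::sync::Mutex;

// Simplified stand-ins for the sync and async store traits.
trait KVStoreSync {
    fn write(&self, key: &str, value: &[u8]) -> Result<(), io::Error>;
}

trait KVStore {
    fn write(
        &self, key: &str, value: &[u8],
    ) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + Send + 'static>>;
}

// A wrapper around a KVStoreSync that implements the async KVStore trait.
struct KVStoreSyncWrapper<K: Deref>(pub K)
where
    K::Target: KVStoreSync;

impl<K: Deref> KVStore for KVStoreSyncWrapper<K>
where
    K::Target: KVStoreSync,
{
    fn write(
        &self, key: &str, value: &[u8],
    ) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + Send + 'static>> {
        let res = self.0.write(key, value); // perform the sync write now
        Box::pin(async move { res }) // hand back an already-ready future
    }
}

// Minimal in-memory sync store for demonstration.
#[derive(Default)]
struct MemStore(Mutex<HashMap<String, Vec<u8>>>);

impl KVStoreSync for MemStore {
    fn write(&self, key: &str, value: &[u8]) -> Result<(), io::Error> {
        self.0.lock().unwrap().insert(key.to_string(), value.to_vec());
        Ok(())
    }
}

fn main() {
    let store = MemStore::default();
    let wrapper = KVStoreSyncWrapper(&store);
    let _fut = KVStore::write(&wrapper, "k", &[7]);
    // The sync write already happened at future-creation time.
    assert_eq!(store.0.lock().unwrap().get("k"), Some(&vec![7]));
}
```

Because the returned future is ready immediately, callers must not rely on it resolving lazily, which is the pitfall discussed further down this thread.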

log_error!(self.logger, "Error persisting OutputSweeper: {:?}", e);
})?;
state_lock.dirty = false;
state_lock.dirty = false;
Contributor

No, I think we should really unset the dirty flag once we're sure we persisted, not before. I.e., rather than unsetting and re-setting it, just set it to false in the success case.

Contributor Author

If we do it after persisting, another thread may also persist because it thinks the data is still dirty? What I've implemented is the outcome of a chat with @TheBlueMatt about it. I agree it isn't pretty, so open to other suggestions.

Contributor

If we do it after persisting, another thread may also persist because it thinks the data is still dirty?

Hmm, but this way we have the same issue around interleaving threads, just the other way around? I.e., we could have another thread come in acquiring the lock, making changes, persisting, at which point the failed persistence task would be blocked on re-acquiring the mutex, then would set dirty = true, even though the state was just re-persisted in the meantime, also resulting in another unnecessary persist?

Contributor Author

That is true, but I think the way the PR is currently written optimizes for the happy flow where the persist doesn't fail. So that in the happy flow, there are no unnecessary persists.
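The scheme under discussion can be sketched with stand-in types (not the actual sweeper code): clear `dirty` under the lock before the async persist, and re-set it only if the persist later fails, so the happy path never triggers a redundant persist from another thread.

```rust
use std::sync::Mutex;

struct SweeperState {
    dirty: bool,
    data: u64, // stands in for the encoded sweeper state
}

// Returns true if a persist was attempted and succeeded. `persist_succeeds`
// simulates the outcome of awaiting the write future.
fn persist_if_dirty(state: &Mutex<SweeperState>, persist_succeeds: bool) -> bool {
    let _encoded = {
        let mut guard = state.lock().unwrap();
        if !guard.dirty {
            return false; // nothing to do
        }
        guard.dirty = false; // optimistic: assume the persist will succeed
        guard.data // "encode" while still holding the lock
    };
    // ...the lock is dropped here; the real code awaits the write future...
    if !persist_succeeds {
        state.lock().unwrap().dirty = true; // failure: mark dirty again
        return false;
    }
    true
}

fn main() {
    let state = Mutex::new(SweeperState { dirty: true, data: 42 });
    assert!(!persist_if_dirty(&state, false)); // failed persist
    assert!(state.lock().unwrap().dirty); // still dirty after failure
    assert!(persist_if_dirty(&state, true)); // retry succeeds
    assert!(!state.lock().unwrap().dirty); // clean in the happy path
}
```

As the thread notes, the trade-off is that a failed persist can re-set `dirty` after another thread has already re-persisted, costing one extra write in the failure path only.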

@@ -922,6 +930,173 @@ where
}
}

/// A wrapper around [`OutputSweeper`] to be used with a sync kv store.
pub struct OutputSweeperSyncKVStore<
Contributor

Do we really need yet another wrapper type for this? Do we then also need an OutputSweeperSyncKVStoreSync to complete the matrix?

I think this is really messy. If this is only added to avoid exposing KVStoreSyncWrapper, I'd much rather go this way or implement the wrapper in LDK Node.

Contributor Author

OutputSweeperSyncKVStoreSync is what OutputSweeperSync is, so that is already present.

Don't like this either, but we wanted to guarantee that users can't misuse the wrappers, and this is the result of that. If we have consensus on another approach, I am fine with that too.

Collaborator

I'm also not clear on why this needs a separate wrapper. We can have a second constructor on OutputSweeper that does the KVStore sync wrapping before returning a fully-async OutputSweeper. The only difference between this and that is that track_spendable_outputs would go from async to sync, but in this case it's a user who already has support for calling most things async, so not sure why we really care.

Contributor Author

I first had the second constructor, but that required exporting the wrapper type too which we wanted to keep private.

Contributor Author

Discussed offline. As long as we don't poll the future and expect it to be ready immediately, there is no risk in exposing the wrapper. Added a secondary constructor and a secondary read method, and removed OutputSweeperSyncKVStoreSync.

}

/// Returns a reference to the underlying [`OutputSweeper`].
pub fn sweeper_async(
Contributor

Can't you implement deref to solve this?
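The Deref suggestion, sketched with simplified stand-in types: if the sync wrapper derefs to the inner sweeper, read-only accessors need no `sweeper_async`-style forwarding method, since method calls resolve through deref coercion.

```rust
use std::ops::Deref;

// Stand-in for the async sweeper with one read-only accessor.
struct OutputSweeper {
    best_height: u32,
}

impl OutputSweeper {
    fn current_height(&self) -> u32 {
        self.best_height
    }
}

// Stand-in for the sync wrapper around it.
struct OutputSweeperSync {
    inner: OutputSweeper,
}

impl Deref for OutputSweeperSync {
    type Target = OutputSweeper;
    fn deref(&self) -> &OutputSweeper {
        &self.inner
    }
}

fn main() {
    let sweeper = OutputSweeperSync { inner: OutputSweeper { best_height: 800_000 } };
    // Resolves through Deref to the inner OutputSweeper, no accessor needed.
    assert_eq!(sweeper.current_height(), 800_000);
}
```

One caveat with this design is that Deref exposes every inherent method of the inner type, including async ones, which may be why an explicit accessor was chosen instead.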


@ldk-reviews-bot

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

@joostjager
Contributor Author

@tnull changes made (diff) and replied to open threads.

@joostjager joostjager requested a review from tnull July 14, 2025 09:03
Contributor

@tnull tnull left a comment

As mentioned above, changes basically look good to me, although I'd prefer to avoid process_events_full_async by using the builder introduced in #3688.

But, generally this should be good for a second reviewer, so pinging @TheBlueMatt.

@tnull tnull requested a review from TheBlueMatt July 15, 2025 09:16
/// Trait that handles persisting a [`ChannelManager`], [`NetworkGraph`], and [`WriteableScore`] to disk.
///
/// [`ChannelManager`]: crate::ln::channelmanager::ChannelManager
pub trait Persister<'a, CM: Deref, L: Deref, S: Deref>
Collaborator

Given Persister is only used in lightning-background-processor and we've been migrating to just using KVStore everywhere (e.g. it's now required to use Sweeper), maybe we just kill off Persister entirely? We had Persister before we had KVStore as a way to persist the objects that the BP wanted to persist. To simplify the interface, we added the KVStore as a pseudo-wrapper around Persister. But since Sweeper now requires a KVStore explicitly, users can no longer only implement Persister, making it basically useless.

The only reason to keep it would be to avoid building the encoded Vec<u8> of the network graph and scorer for users who are avoiding persisting those objects, but I'm not entirely sure avoiding the ~50MiB memory spike during write is worth it.
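What removing `Persister` amounts to can be sketched as the background processor writing each object under fixed keys on the KVStore directly. Key names and types here are illustrative, not LDK's actual persistence constants.

```rust
use std::collections::HashMap;

// Minimal in-memory stand-in for the KVStore.
#[derive(Default)]
struct KVStore {
    map: HashMap<String, Vec<u8>>,
}

impl KVStore {
    fn write(&mut self, key: &str, value: Vec<u8>) {
        self.map.insert(key.to_string(), value);
    }
}

// Previously routed through Persister's three methods; now plain writes of
// the pre-encoded objects under well-known keys.
fn persist_all(store: &mut KVStore, manager: &[u8], graph: &[u8], scorer: &[u8]) {
    store.write("channel_manager", manager.to_vec());
    store.write("network_graph", graph.to_vec());
    store.write("scorer", scorer.to_vec());
}

fn main() {
    let mut store = KVStore::default();
    persist_all(&mut store, b"cm", b"graph", b"scorer");
    assert_eq!(store.map.len(), 3);
    assert!(store.map.contains_key("network_graph"));
}
```

The memory-spike concern above stems from this shape: each object must be fully encoded into an owned buffer before the write, even for users who previously skipped persisting the graph or scorer.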

Contributor Author

Added first commit that removes Persister. Rebased the rest. Lots of simplification.

@joostjager joostjager force-pushed the async-persister branch 5 times, most recently from 5988195 to cc5703e Compare July 18, 2025 06:06
@joostjager joostjager changed the title Async Persister trait and async OutputSweeper persistence Async background persistence Jul 18, 2025
In preparation for the addition of an async KVStore, we here remove the
Persister pseudo-wrapper. The wrapper is thin, would need to be
duplicated for async, and KVStore isn't fully abstracted anyway anymore
because the sweeper takes it directly.
@joostjager joostjager force-pushed the async-persister branch 3 times, most recently from 98cdc61 to fda8a58 Compare July 18, 2025 09:20
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Almost LGTM, just one real comment and a doc nit.

) -> Result<Vec<u8>, io::Error>;
/// Persists the given data under the given `key`.
) -> Pin<Box<dyn Future<Output = Result<Vec<u8>, io::Error>> + 'static + Send>>;
/// Persists the given data under the given `key`. Note that the order of multiple write calls needs to be retained
Collaborator

nit: It may be useful to clarify what this means with an example or a suggested approach to working around it.

}

output_info.status.broadcast(cur_hash, cur_height, spending_tx.clone());
self.broadcaster.broadcast_transactions(&[&spending_tx]);
Collaborator

Hmm, it used to be the case that we'd first persist, wait for that to finish, then broadcast. I don't think it's critical, but it does seem like we should retain that behavior.
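The ordering being asked for, sketched with stand-in types: await the state persist first and broadcast only on success, mirroring the previous synchronous behavior. The `log` parameter just records the order of side effects for illustration.

```rust
// `persist_ok` simulates the outcome of awaiting the persist future.
fn sweep_step(log: &mut Vec<&'static str>, persist_ok: bool) -> bool {
    log.push("persist"); // the real code would await the write future here
    if !persist_ok {
        return false; // persist failed: skip the broadcast
    }
    log.push("broadcast"); // only broadcast once the state is durably stored
    true
}

fn main() {
    let mut log = Vec::new();
    assert!(sweep_step(&mut log, true));
    assert_eq!(log, vec!["persist", "broadcast"]);

    let mut log = Vec::new();
    assert!(!sweep_step(&mut log, false));
    assert_eq!(log, vec!["persist"]); // no broadcast without a persist
}
```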

Successfully merging this pull request may close these issues.

Async KV Store Persister
4 participants