Async background persistence #3905
Conversation
👋 Thanks for assigning @TheBlueMatt as a reviewer!
lightning/src/util/sweep.rs (outdated)
```rust
fn persist_state<'a>(
	&self, sweeper_state: &SweeperState,
) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'a + Send>> {
	let encoded = &sweeper_state.encode();

	self.kv_store.write(
		OUTPUT_SWEEPER_PERSISTENCE_PRIMARY_NAMESPACE,
		OUTPUT_SWEEPER_PERSISTENCE_SECONDARY_NAMESPACE,
		OUTPUT_SWEEPER_PERSISTENCE_KEY,
		encoded,
	)
```
The `encoded` variable is captured by reference in the returned future, but it's a local variable that will be dropped when the function returns. This creates a potential use-after-free issue. Consider moving ownership of `encoded` into the future instead:
```rust
fn persist_state<'a>(
	&self, sweeper_state: &SweeperState,
) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'a + Send>> {
	let encoded = sweeper_state.encode();
	self.kv_store.write(
		OUTPUT_SWEEPER_PERSISTENCE_PRIMARY_NAMESPACE,
		OUTPUT_SWEEPER_PERSISTENCE_SECONDARY_NAMESPACE,
		OUTPUT_SWEEPER_PERSISTENCE_KEY,
		&encoded,
	)
}
```
This ensures the data remains valid for the lifetime of the future.
Spotted by Diamond
Is this real?
I don't think so, as the compiler would likely optimize that away, given that `encoded` will be an owned value (a `Vec` returned by `encode()`). Still, the change that it suggests looks cleaner.
In general it will be super confusing that we `encode` at the time of creating the future, but only actually persist once we've dropped the lock. From now on we'll need to be super cautious about the side effects of interleaving persist calls.
The idea is that an async KV store encodes the data and stores the write action in a queue at the moment the future is created. Things should still happen in the original order.
Can you show a specific scenario where we have to be super cautious even if we have that queue?
Moved the `&`.
> The idea is that an async KV store encodes the data and stores the write action in a queue at the moment the future is created. Things should still happen in the original order.

If that is the idea that we start assuming in this PR, we should probably also start documenting these assumptions on `KVStore` in this PR already.
Added this requirement to the async `KVStore` trait doc.
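For illustration, a minimal sketch of the kind of store the documented requirement describes: the write is assigned its queue slot synchronously when the future is created, and awaiting any future flushes the queue in order, so interleaved callers still persist in call order. The trait and type names below are simplified stand-ins, not the actual LDK `KVStore` API.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::io;
use std::pin::Pin;
use std::sync::{Arc, Mutex};

/// Simplified stand-in for the async store interface discussed above (not the real LDK trait).
pub trait AsyncStore {
	fn write(
		&self, key: String, value: Vec<u8>,
	) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'static + Send>>;
}

/// Sketch: writes are ordered at future-creation time, so the on-disk order always matches
/// the order in which `write` was called, even if the futures are awaited out of order.
pub struct QueuedStore {
	// (sequence number, key, encoded value) pending entries, oldest first.
	queue: Arc<Mutex<VecDeque<(u64, String, Vec<u8>)>>>,
	next_seq: Arc<Mutex<u64>>,
}

impl QueuedStore {
	pub fn new() -> Self {
		Self { queue: Arc::new(Mutex::new(VecDeque::new())), next_seq: Arc::new(Mutex::new(0)) }
	}

	// Placeholder for the actual durable write (filesystem, database, ...).
	fn write_to_backing_store(_key: &str, _value: &[u8]) -> Result<(), io::Error> {
		Ok(())
	}
}

impl AsyncStore for QueuedStore {
	fn write(
		&self, key: String, value: Vec<u8>,
	) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'static + Send>> {
		// The ordering decision happens *here*, at future creation time, while the caller
		// may still be holding its own state lock.
		let my_seq = {
			let mut seq = self.next_seq.lock().unwrap();
			let s = *seq;
			*seq += 1;
			self.queue.lock().unwrap().push_back((s, key, value));
			s
		};
		let queue = Arc::clone(&self.queue);
		Box::pin(async move {
			// Awaiting flushes every queued entry up to and including ours, in order. If an
			// earlier write was already flushed by a later future, this loop simply does nothing.
			let mut queue = queue.lock().unwrap();
			while queue.front().map_or(false, |(s, _, _)| *s <= my_seq) {
				let (_, k, v) = queue.pop_front().unwrap();
				Self::write_to_backing_store(&k, &v)?;
			}
			Ok::<(), io::Error>(())
		})
	}
}
```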
Did a first pass. Changes look mostly good, although I'm not the biggest fan of the `process_events_full_async` variant as well as the `OutputSweeperSyncKVStore` in the last commit. I hope we can avoid them?
For the former, it would be useful to wait for #3688 (given it's close), and then the user could use the builder pattern to configure either an async or sync `KVStore`.
```diff
@@ -769,7 +789,7 @@ use futures_util::{dummy_waker, OptionalSelector, Selector, SelectorOutput};
 #[cfg_attr(feature = "std", doc = " handle.await.unwrap()")]
 /// # }
 ///```
-pub async fn process_events_async<
+pub async fn process_events_full_async<
```
I dislike solving this by just doubling the API surface here. IMO, we should finally move forward with #3688 first, and then utilize the builder pattern to allow the user to plug in a sync or async `KVStore` variant.
I don't think we should do #3688 in its current form. My opinion is that it is too much code for what it gets us.
Okay, we may need to conclude that debate soon then.
Note that a) we had already decided to go this way on #3612, b) we have a contributor that already did the work, and c) this PR is a pretty good example of why we'd want the builder pattern over just adding more constructors or optional fields on them.
But probably best to discuss elsewhere.
This is also temporary, because we want to move over to full async eventually.
I'd say that to solve #3612 we can also use the dummy implementations and don't necessarily need a builder. But indeed, best to move that discussion to #3688.
I still think forcing the user to supply all the dummy implementations doesn't significantly improve the API situation for the user. But yeah, let's move the discussion elsewhere.
```diff
@@ -121,6 +121,58 @@ pub trait KVStoreSync {
 	) -> Result<Vec<String>, io::Error>;
 }

+/// A wrapper around a [`KVStoreSync`] that implements the [`KVStore`] trait.
+pub struct KVStoreSyncWrapper<K: Deref>(pub K)
```
As discussed yesterday, we'll also need access to this type in LDK Node as it will be part of the `OutputSweeper` type signature. Not sure if it then would be preferable to have the user implement this wrapper. If we leave it here, it might make sense to split out all these wrappers into a separate `util.rs` or `helpers.rs` file, so they don't clutter up the codebase.
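For reference, the shape of such a sync-to-async bridge can be sketched roughly as follows. The trait definitions are simplified stand-ins for `KVStoreSync`/`KVStore`, not the actual LDK signatures; the point is only that the synchronous write happens eagerly at future-creation time, so call order and persistence order trivially match.

```rust
use std::future::Future;
use std::io;
use std::pin::Pin;

// Simplified stand-ins for the sync and async store traits (not the real LDK signatures).
pub trait SyncStore {
	fn write(&self, key: &str, value: &[u8]) -> Result<(), io::Error>;
}
pub trait AsyncStore {
	fn write(
		&self, key: &str, value: &[u8],
	) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'static + Send>>;
}

/// Sketch of the wrapper idea: perform the synchronous write when the future is created,
/// then hand back a future that merely reports the result.
pub struct SyncWrapper<S: SyncStore>(pub S);

impl<S: SyncStore> AsyncStore for SyncWrapper<S> {
	fn write(
		&self, key: &str, value: &[u8],
	) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'static + Send>> {
		// The blocking work is done here; the returned future is already resolved.
		let res = self.0.write(key, value);
		Box::pin(async move { res })
	}
}
```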
```rust
		log_error!(self.logger, "Error persisting OutputSweeper: {:?}", e);
	})?;
	state_lock.dirty = false;
```
No, I think we should really unset the dirty flag once we're sure we persisted, not before. I.e., rather than unsetting and re-setting it, just set it to false in the success case.
If we do it after persisting, another thread may also persist because it thinks the data is still dirty? What I've implemented is the outcome of a chat with @TheBlueMatt about it. I agree it isn't pretty, so open to other suggestions.
> If we do it after persisting, another thread may also persist because it thinks the data is still dirty?

Hmm, but this way we have the same issue around interleaving threads, just the other way around? I.e., we could have another thread come in acquiring the lock, making changes, persisting, at which point the failed persistence task would be blocked on re-acquiring the mutex, then would set `dirty = true`, even though the state was just re-persisted in the meantime, also resulting in another unnecessary persist?
That is true, but I think the way the PR is currently written optimizes for the happy flow where the persist doesn't fail. So that in the happy flow, there are no unnecessary persists.
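A rough sketch of the flow being described, assuming a `dirty` flag guarded by the same mutex as the state: the flag is cleared while the lock is still held (so concurrent callers skip redundant persists in the happy path) and only re-set if the persist later fails. Names and structure are illustrative, not the actual `sweep.rs` code.

```rust
use std::future::Future;
use std::io;
use std::pin::Pin;
use std::sync::Mutex;

struct SweeperState {
	dirty: bool,
	// ... tracked outputs would live here ...
}

struct Sweeper {
	state: Mutex<SweeperState>,
}

impl Sweeper {
	// Stand-in for the real persist call that returns a future capturing the encoded state.
	fn persist_state(
		&self, _state: &SweeperState,
	) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'static + Send>> {
		Box::pin(async move { Ok::<(), io::Error>(()) })
	}

	async fn persist_if_dirty(&self) -> Result<(), io::Error> {
		// Create the future and clear the flag while still holding the lock. Concurrent
		// callers that grab the lock afterwards see `dirty == false` and do nothing, which
		// is the happy-path optimization discussed above.
		let fut = {
			let mut state = self.state.lock().unwrap();
			if !state.dirty {
				return Ok(());
			}
			state.dirty = false;
			self.persist_state(&state)
		};

		// Await outside the lock; on failure, mark the state dirty again so a later call
		// retries (at the cost of a possibly redundant persist, as noted in the review).
		if let Err(e) = fut.await {
			self.state.lock().unwrap().dirty = true;
			return Err(e);
		}
		Ok(())
	}
}
```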
lightning/src/util/sweep.rs (outdated)
```diff
@@ -922,6 +930,173 @@ where
 	}
 }

+/// A wrapper around [`OutputSweeper`] to be used with a sync kv store.
+pub struct OutputSweeperSyncKVStore<
```
Do we really need yet another wrapper type for this? Do we then also need an `OutputSweeperSyncKVStoreSync` to complete the matrix?
I think this is really messy. If this is only added to avoid exposing `KVStoreSyncWrapper`, I'd much rather go this way or implement the wrapper in LDK Node.
`OutputSweeperSyncKVStoreSync` is what `OutputSweeperSync` is, so that is already present.
I don't like this either, but we wanted to guarantee that users can't misuse the wrappers, and this is the result of that. If we have consensus on another approach, I am fine with that too.
I'm also not clear on why this needs a separate wrapper. We can have a second constructor on `OutputSweeper` that does the `KVStore` sync wrapping before returning a fully-async `OutputSweeper`. The only difference between this and that is that `track_spendable_outputs` would go from async to sync, but in this case it's a user who already has support for calling most things async, so I'm not sure why we really care.
I first had the second constructor, but that required exporting the wrapper type too, which we wanted to keep private.
Discussed offline. As long as we don't poll the future and expect it to be ready immediately, there is no risk in exposing the wrapper. Added a secondary constructor and a secondary read method, and removed `OutputSweeperSyncKVStoreSync`.
lightning/src/util/sweep.rs (outdated)
```rust
	}

	/// Returns a reference to the underlying [`OutputSweeper`].
	pub fn sweeper_async(
```
Can't you implement `Deref` to solve this?
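The `Deref` suggestion would look roughly like this (illustrative types, not the actual sweeper generics): the wrapper derefs to the inner async sweeper, so callers can reach its methods without a dedicated `sweeper_async()` accessor.

```rust
use std::ops::Deref;

// Stand-ins for the async sweeper and its sync-facing wrapper.
pub struct OutputSweeper { /* fields elided */ }
pub struct OutputSweeperSync {
	sweeper: OutputSweeper,
}

// With this impl, `OutputSweeper` methods can be called directly on the wrapper
// (or via `&*wrapper`) thanks to auto-deref.
impl Deref for OutputSweeperSync {
	type Target = OutputSweeper;
	fn deref(&self) -> &Self::Target {
		&self.sweeper
	}
}
```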
As mentioned above, the changes basically look good to me, although I'd prefer to avoid `process_events_full_async` by using the builder introduced in #3688.
But generally this should be good for a second reviewer, so pinging @TheBlueMatt.
lightning/src/util/persist.rs (outdated)
```rust
/// Trait that handles persisting a [`ChannelManager`], [`NetworkGraph`], and [`WriteableScore`] to disk.
///
/// [`ChannelManager`]: crate::ln::channelmanager::ChannelManager
pub trait Persister<'a, CM: Deref, L: Deref, S: Deref>
```
Given `Persister` is only used in `lightning-background-processor` and we've been migrating to just using `KVStore` everywhere (e.g. it's now required to use `Sweeper`), maybe we just kill off `Persister` entirely? We had `Persister` before we had `KVStore` as a way to persist the objects that the BP wanted to persist. To simplify the interface, we added the `KVStore` as a pseudo-wrapper around `Persister`. But since `Sweeper` now requires a `KVStore` explicitly, users can no longer only implement `Persister`, making it basically useless.
The only reason to keep it would be to avoid building the encoded `Vec<u8>` of the network graph and scorer for users who are avoiding persisting those objects, but I'm not entirely sure avoiding the ~50MiB memory spike during write is worth it.
Added a first commit that removes `Persister`. Rebased the rest. Lots of simplification.
In preparation for the addition of an async KVStore, we remove the Persister pseudo-wrapper here. The wrapper is thin, would need to be duplicated for async, and KVStore isn't fully abstracted anymore anyway because the sweeper takes it directly.
Almost LGTM, just one real comment and a doc nit.
```diff
-	) -> Result<Vec<u8>, io::Error>;
-	/// Persists the given data under the given `key`.
+	) -> Pin<Box<dyn Future<Output = Result<Vec<u8>, io::Error>> + 'static + Send>>;
+	/// Persists the given data under the given `key`. Note that the order of multiple writes calls needs to be retained
```
nit: It may be useful to clarify what this means with an example or a suggested approach to working around it.
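One possible clarification along those lines (purely illustrative wording, not text from the PR) could spell out both the requirement and a workaround in the doc comment itself:

```rust
/// Persists the given data under the given `key`.
///
/// Note that the order of multiple write calls to the same `key` needs to be retained: if
/// `write` is called twice for one key, the data passed to the first call must never end up
/// overwriting the data passed to the second, even if the first future completes later.
/// One way to satisfy this is to capture the data and assign it a slot in an internal,
/// ordered queue at the time `write` is called, and have the returned future flush that
/// queue in order when awaited.
fn write(/* ... */) {}
```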
```rust
	}

	output_info.status.broadcast(cur_hash, cur_height, spending_tx.clone());
	self.broadcaster.broadcast_transactions(&[&spending_tx]);
```
Hmm, it used to be the case that we'd first persist, wait for that to finish, then broadcast. I don't think it's critical, but it does seem like we should retain that behavior.
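In the async flow, the behavior being asked for would look roughly like this (simplified stand-ins for the sweeper internals, with `Vec<u8>` standing in for a transaction):

```rust
use std::io;

// Stand-ins for the broadcaster and the persist call (not the actual LDK types).
struct Broadcaster;
impl Broadcaster {
	fn broadcast_transactions(&self, _txs: &[&Vec<u8>]) {}
}

async fn persist_state(_encoded: &[u8]) -> Result<(), io::Error> {
	Ok(())
}

/// Persist first and only broadcast once persistence has completed successfully, mirroring
/// the pre-existing sync behavior the reviewer wants to retain.
async fn persist_then_broadcast(
	broadcaster: &Broadcaster, encoded_state: &[u8], spending_tx: &Vec<u8>,
) -> Result<(), io::Error> {
	// Wait for the persist future to finish before the transaction hits the network...
	persist_state(encoded_state).await?;
	// ...so a crash between the two steps never leaves us with a broadcast transaction whose
	// tracking state was not persisted.
	broadcaster.broadcast_transactions(&[spending_tx]);
	Ok(())
}
```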
Stripped-down version of #3778. It allows background persistence to be async, but channel monitor persistence remains sync. This means that, for the time being, users wanting async background persistence would be required to implement both the sync and the async `KVStore` traits. This model is available through `process_events_full_async`. `process_events_async` still takes a synchronous KV store to remain backwards compatible.

Usage in ldk-node: lightningdevkit/ldk-node@main...joostjager:ldk-node:upgrade-to-async-kvstore