enable merging parameters for diloco #212
LGTM
@@ -719,6 +721,7 @@ def test_streaming_diloco_commit_failure(
"diloco_args": {
    "fragment_sync_delay": fragment_sync_delay,
    "sync_every": 4,
    "fragment_update_alpha": alpha,
Thoughts on adding some mocked-out tests where we actually check the numerical values against some references to avoid regressions?
With a fixed seed and deterministic mode in torch, we can add regression tests by comparing against a known value.
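A minimal sketch of what such a regression test might look like. The `merge` and `seeded_merge` helpers are hypothetical stand-ins, not torchft functions, and the interpolation direction (local toward global, weighted by alpha) is an assumption made only for illustration:

```python
import torch


def merge(local: torch.Tensor, global_: torch.Tensor, alpha: float) -> torch.Tensor:
    # Hypothetical stand-in for the merge step (assumed direction):
    # local + alpha * (global_ - local).
    return torch.lerp(local, global_, alpha)


def seeded_merge(seed: int, alpha: float) -> torch.Tensor:
    # Fix the seed so the random inputs are identical on every run.
    torch.manual_seed(seed)
    local = torch.randn(4)
    global_ = torch.randn(4)
    return merge(local, global_, alpha)


# Ask torch to raise an error on ops that lack a deterministic implementation.
torch.use_deterministic_algorithms(True)

# Known-value check: halfway between [1, 2, 3] and [3, 6, 9] is [2, 4, 6].
reference = torch.tensor([2.0, 4.0, 6.0])
merged = merge(torch.tensor([1.0, 2.0, 3.0]), torch.tensor([3.0, 6.0, 9.0]), 0.5)
torch.testing.assert_close(merged, reference)

# Reproducibility check: the same seed must give bit-identical results.
assert torch.equal(seeded_merge(0, 0.25), seeded_merge(0, 0.25))
```

A real test would presumably compare the model fragment's parameters after a sync against stored reference values rather than hand-built tensors.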
torchft/local_sgd.py
Outdated
Merges the local and global parameters.
"""
for name, p in self._model_fragment.named_parameters():
    torch.lerp(
This isn't in-place, right? Having some numerics tests would be nice to catch issues like this.
Yep, let me add this and a regression test in a separate diff?
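For context on the in-place question, a small sketch of the difference between `torch.lerp` and `Tensor.lerp_`. The tensors `p` and `g` are illustrative placeholders, not values from the PR, and this is not necessarily how the fix was implemented:

```python
import torch

# Illustrative stand-ins for a local parameter and its global counterpart.
p = torch.nn.Parameter(torch.tensor([1.0, 2.0]))
g = torch.tensor([3.0, 6.0])

with torch.no_grad():
    # torch.lerp returns a new tensor; p itself is left untouched.
    out = torch.lerp(p, g, 0.5)
    unchanged = p.detach().clone()

    # Tensor.lerp_ writes the interpolated values back into p.
    p.lerp_(g, 0.5)
    updated = p.detach().clone()

print(unchanged.tolist(), out.tolist(), updated.tolist())
```

If the return value of `torch.lerp` is discarded, the parameters are never updated, which is the kind of bug a numerics test would surface.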
Summary:
- merge local and global parameters of the model after synchronization
- add the "alpha" parameter to integration tests

Test Plan:
```
pytest -vs ./torchft/local_sgd_integ_test.py
```