-
-
Notifications
You must be signed in to change notification settings - Fork 348
Add refresh_attributes() and implement cache_attrs for Group and Array #3215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add refresh_attributes() and implement cache_attrs for Group and Array #3215
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3215 +/- ##
==========================================
+ Coverage 94.73% 94.75% +0.01%
==========================================
Files 78 78
Lines 8646 8670 +24
==========================================
+ Hits 8191 8215 +24
Misses 455 455
🚀 New features to boost your workflow:
|
cache_attrs : bool, optional | ||
If True (default), user attributes will be cached for attribute read | ||
operations. If False, user attributes are reloaded from the store prior | ||
to all attribute read operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this docstring says true is the default, but the parameter itself has a default of None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other places defining a cache_attrs
argument did the same, so I decided to follow suit. (:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets break with the past and ensure that the docstring matches the annotation here
anyone have opinions on whether cache_attrs should have a default of |
Can someone write up a detailed proposal of the model here? The interaction between in-memory metadata objects and their serialized counterparts is a big topic. If we're formally committing to something then we should do so carefully, and document it extensively. |
From my perspective of "just solving the issue", the crux of this PR is the As for a model... that would be a good idea, I agree. If I understand the current main branch code correctly (from a cursory reading, mind it), the current model for caching data is that the store may cache data, but Array/Group never cache data themselves, only metadata. Metadata is cached when opening an array/group, and kept cached until you discard the respective object. This PR adds the possibility of manually (or automatically) reloading just the attributes of an array or group. Ideally, there would be support for reloading the whole metadata in case there might have been changes made by others—but for that, we would need to either make arrays/groups unfrozen, or guide users to re-open the whole array/group (and replace all references to the old version) in case there have been changes made by another process. As for a better model:
Thoughts? |
Very briefly, my understand of the current model is that there isn't any metadata caching at the Python Array / Group object level ( Under that model, you can rely on your IIUC, this attribute (or more generally, metadata) caching / refreshing is equivalent to users creating a new |
" Now, if -- Since
|
On
and
I can't tell if it's just semantics or not, but I feel like there is a difference between "this object doesn't attempt to cache" (my words) and "this metadata stored for the lifetime of the Group/Array instance is a caching mechanism at the Group / Array level" (your words), despite them being functionally the same. If I read a JSON file into a dictionary: d = json.loads(Path('file.json').read_text()) Would we say that So in general, I'd push people toward recreating a Very good point about us being loose with whether or not our Overall, I'd push us towards simple, stateless objects in zarr-python. I recognize that trying to layer stateless behavior on top of a fundamentally stateful thing (objects in a IMO zarr-python can keep things simple and provide the stateless building blocks that other things can be built on top of. But that's just my loosely held opinion. I don't want to hold this up anymore if other maintainers disagree with me. |
To reply to your question:
You could easily make a point that " I guess there is some difference in semantics there; but to me, "cache" is a very loose concept, which applies even in cases when you do something as simple as: d = load_d()
print(d['a'] + d['b']) instead of: print(load_d()['a'] + load_d()['b']) I can't comment on whether a simple stateless/immutable implementation or a fancy stateful implementation is better for Zarr Python—so I'll trust you on that 😅 In this case, if you would rather double down on making things stateless, I think the best option might be to remove the metadata field from the Array/Group, and instead fetch it every time it's accessed from storage. It would make the code somewhat slower, even with caching at the Store level, since we would have to re-parse the metadata JSON every time an array is resized or even just read from (*in case someone replaced the whole array from "under our feet" and now the filters are different). ..Alternatively, we could make things simple, if inconsistent. Leave things as they are, with a (cached) copy of the metadata in the Array/Group instance, and carefully document the behavior of that. In that case, the original issue this PR was made for could be closed as "wontfix", and users advised that opening a group loads the as-of-then current metadata—while listing subgroups or reading from an array reads the state at the time of listing/reading instead. |
Should resolve #3178, if one passes
cache_attrs=False
when creating the various groups.TODO:
docs/user-guide/*.rst
changes/