improvements to parse_dtype #3264

Open
wants to merge 10 commits into
base: main
from

Conversation

d-v-b
Contributor

@d-v-b d-v-b commented Jul 17, 2025

  • Add a new function parse_dtype. parse_data_type is kept around but it just wraps parse_dtype. The reason for this change is naming consistency -- the ZDType methods already use the "dtype" abbreviation extensively, so it's potentially confusing that parse_data_type does not.
  • Handle strings and sequences as potential JSON-like inputs. Adds tests to ensure that the JSON form of a dtype is a valid argument to parse_dtype (with the exception of "|O", which is ambiguous).
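
For illustration, the two changes described above can be sketched with toy stand-ins. `ToyInt32` and its `to_json` method are hypothetical and are not the Zarr Python classes. The sketch assumes that the v2 JSON form is a NumPy-style string and the v3 form is a name, and it only shows the alias pattern and the round-trip property the new tests check.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a ZDType; not the real Zarr Python class.
@dataclass(frozen=True)
class ToyInt32:
    endianness: str = "little"

    def to_json(self, zarr_format: int) -> str:
        # Assumption: v2 uses a NumPy-style string, v3 uses a plain name.
        return "<i4" if zarr_format == 2 else "int32"


def parse_dtype(dtype_spec, *, zarr_format: int) -> ToyInt32:
    """Toy parser: accepts an instance or either JSON-like string form."""
    if isinstance(dtype_spec, ToyInt32):
        return dtype_spec  # already parsed, return unchanged
    if isinstance(dtype_spec, str) and dtype_spec in ("int32", "<i4"):
        return ToyInt32()
    raise ValueError(f"cannot interpret {dtype_spec!r} as a dtype")


def parse_data_type(dtype_spec, *, zarr_format: int) -> ToyInt32:
    """Backwards-compatible name; simply delegates to parse_dtype."""
    return parse_dtype(dtype_spec, zarr_format=zarr_format)


# Round-trip check: the JSON form of a dtype must parse back to the same dtype.
for fmt in (2, 3):
    dtype = ToyInt32()
    assert parse_dtype(dtype.to_json(fmt), zarr_format=fmt) == dtype
    assert parse_data_type(dtype.to_json(fmt), zarr_format=fmt) == dtype
```

Because the old name only delegates, any input accepted by one function is accepted by the other, which is the property an equivalence test can assert.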

closes #3263

…more JSON-like inputs, and test for round-trips
@github-actions github-actions bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) Jul 17, 2025
@github-actions github-actions bot removed the "needs release notes" label Jul 17, 2025
@d-v-b
Contributor Author

d-v-b commented Jul 17, 2025

cc @TomNicholas

@d-v-b d-v-b requested a review from a team July 17, 2025 14:23
@d-v-b d-v-b changed the title from "improvments to parse_dtype" to "improvements to parse_dtype" Jul 17, 2025
codecov bot commented Jul 17, 2025

Codecov Report

Attention: Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 59.60%. Comparing base (abbdbf2) to head (d234ae2).

Files with missing lines          Patch %   Lines
src/zarr/core/dtype/__init__.py   75.00%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3264      +/-   ##
==========================================
+ Coverage   59.56%   59.60%   +0.03%     
==========================================
  Files          78       78              
  Lines        8684     8690       +6     
==========================================
+ Hits         5173     5180       +7     
+ Misses       3511     3510       -1     
Files with missing lines          Coverage Δ
src/zarr/core/array.py            69.02% <100.00%> (ø)
src/zarr/dtype.py                 0.00% <ø> (ø)
src/zarr/core/dtype/__init__.py   30.00% <75.00%> (+5.92%) ⬆️

... and 1 file with indirect coverage changes

@d-v-b
Contributor Author

d-v-b commented Jul 17, 2025

d684ada adds a test to ensure that parse_dtype is the same as parse_data_type

Contributor

@dstansby dstansby left a comment

Nice - I like the name change. Having two identical functions in our API seems a bit confusing from a user POV: https://zarr--3264.org.readthedocs.build/en/3264/api/zarr/dtype/index.html#functions. Could you remove parse_data_type from __all__ so it's removed from the docs, but will still be imported and work for backwards compatibility?
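
As a rough sketch of what this suggestion relies on: a name left out of `__all__` is still importable explicitly, but it is skipped by `from module import *` and by documentation tools that honor `__all__`. The module name `toy_dtype` and the function bodies below are hypothetical, and whether a given docs generator respects `__all__` is an assumption to verify for the tool in use.

```python
import sys
import types

# Source of a throwaway in-memory module; "toy_dtype" is a hypothetical name.
source = '''
__all__ = ["parse_dtype"]  # parse_data_type is deliberately left out

def parse_dtype(spec):
    return f"parsed:{spec}"

def parse_data_type(spec):
    # Kept only for backwards compatibility.
    return parse_dtype(spec)
'''

module = types.ModuleType("toy_dtype")
exec(source, module.__dict__)
sys.modules["toy_dtype"] = module  # make the module importable by name

# An explicit import ignores __all__, so existing user code keeps working.
from toy_dtype import parse_data_type

print(parse_data_type("int32"))

# A star import only exposes the names listed in __all__.
star_namespace = {}
exec("from toy_dtype import *", star_namespace)
print("parse_dtype" in star_namespace, "parse_data_type" in star_namespace)
```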

Comment on lines +193 to +216
    Interpret the input as a ZDType.

    This function wraps ``parse_dtype``. The only difference is the function name. This function may
    be deprecated in a future version of Zarr Python in favor of ``parse_dtype``.

    Parameters
    ----------
    dtype_spec : ZDTypeLike
        The input to be interpreted as a ZDType. This could be a ZDType, which will be returned
        directly, or a JSON representation of a ZDType, or a native dtype, or a python object that
        can be converted into a native dtype.
    zarr_format : ZarrFormat
        The Zarr format version.

    Returns
    -------
    ZDType[TBaseDType, TBaseScalar]
        The ZDType corresponding to the input.

    Examples
    --------
    >>> parse_dtype("int32", zarr_format=2)
    Int32(endianness="little")
    """
Contributor

I'd say bin the docstring here to avoid duplication, and just point to parse_dtype.

Contributor Author

I think duplication is fine here. Happy to revisit this if maintaining two docstrings is a maintenance burden, but my preference is for each function to describe what it does, even if another function does the same thing.

NullTerminatedBytes(length=10)
>>> parse_data_type({"name": "numpy.datetime64", "configuration": {"unit": "s", "scale_factor": 10}}, zarr_format=3)
DateTime64(endianness='little', scale_factor=10, unit='s')
>>> parse_dtype("int32", zarr_format=2)
Contributor

Huh, does this not need an import? not an issue, just a question

Contributor Author

I think it does need an import

Contributor Author

fixed in c42edf6
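
For context on the import question, here is a minimal sketch using the standard-library `doctest` module. Whether a docstring example needs an import depends on the namespace the runner starts with: with an empty namespace the call fails with a `NameError`, and it passes once the name is available. The `parse_dtype` below is a hypothetical stand-in, and how this project's docs tooling seeds the namespace is not shown in this thread.

```python
import doctest


def parse_dtype(spec):
    # Hypothetical stand-in for the real function.
    return spec.upper()


EXAMPLE = """
>>> parse_dtype("int32")
'INT32'
"""


def run_example(globs):
    """Run EXAMPLE as a doctest with the given starting namespace."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(EXAMPLE, globs, "example", "<example>", 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts are needed here.
    return runner.run(test, out=lambda text: None)


without_name = run_example({})  # parse_dtype is undefined, so NameError
with_name = run_example({"parse_dtype": parse_dtype})

print(without_name.failed, with_name.failed)
```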

Development

Successfully merging this pull request may close these issues.

incomplete round-tripping of v3 data type json
2 participants