numpy arrays with dtype `"O"` are ambiguous, in the sense that they could contain values that zarr should store as:
- variable-length strings
- variable-length arrays
- arbitrary python objects
- etc
Unlike the object dtype, every other numpy dtype has a simple mapping to a zarr metadata representation. For these dtypes (e.g., `int8`, `int16`, etc.), a user can provide a numpy array and we can automatically pick the right zarr data type representation from that array. But for the object dtype, this is not possible. Extra information is needed to resolve a zarr data type for object dtype arrays.
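To make the ambiguity concrete, here is a minimal illustration (plain numpy, no zarr required):

```python
import numpy as np

# Three arrays that numpy reports identically as dtype "O", even though
# zarr should store their contents very differently:
strings = np.array(["a", "bb", "ccc"], dtype=object)           # variable-length strings
ragged = np.array([np.arange(2), np.arange(3)], dtype=object)  # variable-length arrays
misc = np.array([{"k": 1}, (2, 3)], dtype=object)              # arbitrary python objects

# The dtype alone cannot distinguish them:
assert strings.dtype == ragged.dtype == misc.dtype == np.dtype("O")
```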
In zarr-python 2, we used an optional `object_codec` keyword argument to array creation routines. If a user provided `dtype=np.dtype('O')` or equivalent without an `object_codec`, then zarr-python 2 would raise an error.
I don't want to use this exact pattern today, because `object_codec` is not well-defined, and this extra parameter, used only for numpy object dtypes, would greatly complicate dtype inference for all the other dtypes. Here is my alternative proposal: we refuse to do any dtype inference for numpy object dtypes. Instead, the user must provide an explicit zarr dtype that is compatible with the numpy object dtype.
For example, `create_array(...., dtype=np.dtype('O'))` would raise an informative exception, guiding the user to do this instead: `create_array(..., dtype=zarr.dtypes.VariableLengthString())`, or `create_array(..., dtype='numpy.variable_length_string')`.
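The refusal could be sketched roughly like this (the `resolve_dtype` function and `ObjectDtypeError` exception are hypothetical names for illustration, not the actual zarr-python API):

```python
import numpy as np

class ObjectDtypeError(TypeError):
    """Hypothetical error raised for ambiguous object-dtype input."""

def resolve_dtype(dtype):
    """Hypothetical sketch: map a user-supplied dtype to a zarr dtype name.

    Strings are assumed to already be explicit zarr dtype identifiers;
    numpy dtypes are inferred, except for the ambiguous object dtype.
    """
    if isinstance(dtype, str):
        return dtype
    np_dtype = np.dtype(dtype)
    if np_dtype == np.dtype("O"):
        raise ObjectDtypeError(
            "numpy object dtypes are ambiguous: pass an explicit zarr dtype "
            "(e.g. 'numpy.variable_length_string') instead of dtype('O')."
        )
    return np_dtype.name

# Ordinary dtypes are inferred automatically...
assert resolve_dtype(np.int16) == "int16"
# ...but object dtypes require an explicit choice from the user.
```

The point of the sketch is that the error path carries the guidance: the exception message tells the user exactly which explicit zarr dtypes resolve the ambiguity.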
Thoughts on this pattern?