Ideas for improving and leveraging NDPointIndex

About `NDPointIndex` (KDTree) added in #10478, a few ideas for further improvements:

- Alignment with tolerance
  - This is a good use case for `NDPointIndex`.
  - Until Xarray's API allows providing a tolerance value when aligning objects, we could allow providing it as an option at index creation.
  - Picking a tolerance value among the ones set for each index to align would depend on the join method, e.g., pick the smallest value for `"exact"` and `"inner"` and pick the largest value for `"outer"`.
  - For tolerance > 0, we could rely on SciPy KDTree's `count_neighbors`. For tolerance == 0, we could just compare the data points as implemented in this PR.
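
To make the alignment idea above concrete, here is a minimal sketch in plain NumPy/SciPy. The helper names `pick_tolerance` and `points_match` are hypothetical and not part of Xarray. The pair-count check is only a sufficient test under the assumption that points within each set are more than `2 * tolerance` apart; it also ignores point order, which a real implementation would probably need to handle.

```python
import numpy as np
from scipy.spatial import KDTree


def pick_tolerance(tolerances, join):
    """Hypothetical helper: choose one tolerance among those set on each index."""
    if join in ("exact", "inner"):
        return min(tolerances)
    if join == "outer":
        return max(tolerances)
    raise ValueError(f"unsupported join method: {join!r}")


def points_match(points_a, points_b, tolerance):
    """Check whether two (n_points, n_coordinates) arrays match within `tolerance`.

    Assumes points inside each set are separated by more than 2 * tolerance,
    so that a pair count equal to n_points implies a one-to-one match.
    """
    if points_a.shape != points_b.shape:
        return False
    if tolerance == 0:
        # Exact comparison of the data points, no tree needed.
        return bool(np.array_equal(points_a, points_b))
    tree_a, tree_b = KDTree(points_a), KDTree(points_b)
    # Number of (a, b) pairs whose distance is at most `tolerance`.
    n_pairs = tree_a.count_neighbors(tree_b, tolerance)
    return bool(n_pairs == len(points_a))


a = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
b = a + 0.01
tol = pick_tolerance([0.05, 0.5], join="inner")
aligned = points_match(a, b, tol)
```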

- Join and re-index
  - I guess it is possible to support it via querying all matching point pairs within the given tolerance.
  - Raise an error if tolerance == 0?
  - We could rely on SciPy KDTree's `query(..., distance_upper_bound=tolerance)`. `query_ball_tree()` could also be a good candidate, but it returns nested Python lists that might be slow to process.
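
A rough sketch of the `query(..., distance_upper_bound=tolerance)` approach, assuming plain `(n_points, n_coordinates)` arrays. The function name `reindex_positions` is hypothetical, and using `-1` for "no match" is just a convention borrowed from pandas-style indexers. Raising for a non-positive tolerance is one possible answer to the open question above, not a settled choice.

```python
import numpy as np
from scipy.spatial import KDTree


def reindex_positions(index_points, target_points, tolerance):
    """Hypothetical helper: position of the nearest indexed point, or -1 if none."""
    if tolerance <= 0:
        # One possible choice for the open question above.
        raise ValueError("tolerance must be > 0 for reindexing")
    tree = KDTree(index_points)
    dist, pos = tree.query(target_points, k=1, distance_upper_bound=tolerance)
    # SciPy marks missing neighbors with an infinite distance
    # (and an index equal to the number of points in the tree).
    return np.where(np.isfinite(dist), pos, -1)


index_points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
target_points = np.array([[0.05, 0.0], [1.0, 0.02], [5.0, 5.0]])
positions = reindex_positions(index_points, target_points, tolerance=0.1)
```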

- Positional indexing (isel)
  - Currently (this PR) the index is dropped after `isel`, but we could create a new one from scratch instead.
  - Implementation should be pretty straightforward: we can transform the input dimension indexers into a flat indexer for the (n_points, n_coordinates) point data array via `numpy.ravel_multi_index`.
  - Perhaps make this opt-in via an option at index creation?
  - Alternatively we could keep the original kd-tree and keep track of the slices or indices. The big advantage is that `isel` would be almost free (no costly index rebuild), but we would have more internal state to keep track of and to deal with.
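
The rebuild-from-scratch option could look roughly like this. It is a sketch that assumes the point data array was built by raveling the coordinate variables in C order and that the indexers are orthogonal (slices or 1-D integer arrays, one per dimension); `flat_indexer` is a hypothetical name.

```python
import numpy as np
from scipy.spatial import KDTree


def flat_indexer(shape, indexers):
    """Turn one indexer per dimension into flat positions in the point array."""
    per_dim = []
    for size, idx in zip(shape, indexers):
        if isinstance(idx, slice):
            idx = np.arange(*idx.indices(size))
        per_dim.append(np.asarray(idx))
    # Outer product of the per-dimension indexers, then flatten in C order.
    grids = np.meshgrid(*per_dim, indexing="ij")
    return np.ravel_multi_index(tuple(grids), shape).ravel()


# Coordinates with dimensions (y, x) and shape (3, 4).
shape = (3, 4)
yy, xx = np.meshgrid(np.arange(3.0), np.arange(4.0), indexing="ij")
points = np.column_stack([xx.ravel(), yy.ravel()])

# Equivalent of isel(y=slice(0, 2), x=[1, 3]).
flat = flat_indexer(shape, (slice(0, 2), np.array([1, 3])))
new_tree = KDTree(points[flat])  # rebuilt from the selected points only
```

The rebuild cost is what the "keep the original kd-tree" alternative would avoid, at the price of tracking `flat` (or an equivalent) as extra internal state.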

- Concat
  - I guess NumPy provides all the utilities needed to implement it in a clever way, so that the concatenated (n_points, n_coordinates) point data array stays consistent with the location of the concat dimension `dim` in the original coordinate variables.
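
One way this might work, sketched under the assumption that each point data array was obtained by raveling coordinate variables of a known shape in C order: restore the original dimensions, concatenate along the axis of `dim`, then flatten again. `concat_points` and `grid_points` are hypothetical helpers.

```python
import numpy as np


def concat_points(points_list, shapes, axis):
    """Concatenate (n_points, n_coordinates) arrays along a coordinate dimension."""
    if axis < 0:
        raise ValueError("axis must be the non-negative position of the concat dim")
    n_coords = points_list[0].shape[1]
    # Restore the original dimensions, keeping the coordinate axis last.
    blocks = [p.reshape(*shape, n_coords) for p, shape in zip(points_list, shapes)]
    merged = np.concatenate(blocks, axis=axis)
    return merged.reshape(-1, n_coords), merged.shape[:-1]


def grid_points(x, y):
    """Build a point array from 1-D x and y values on a (y, x) grid."""
    yy, xx = np.meshgrid(y, x, indexing="ij")
    return np.column_stack([xx.ravel(), yy.ravel()]), yy.shape


a, shape_a = grid_points([0.0, 1.0], [0.0, 1.0])
b, shape_b = grid_points([2.0, 3.0, 4.0], [0.0, 1.0])

# Concatenate along x, which is axis 1 of the (y, x) coordinate variables.
merged, merged_shape = concat_points([a, b], [shape_a, shape_b], axis=1)
expected, expected_shape = grid_points([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 1.0])
```

When `dim` is the first dimension, this reduces to stacking the point arrays vertically; the reshape only matters for inner dimensions.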

- Stack
  - Would it be a useful alternative to `pandas.MultiIndex` for floating-point coordinates, even though the original distribution of the data points is regular?
  - Implementation is trivial
  - Cannot support `unstack` (it only works with `pandas.MultiIndex`)

- Interpolation
  - This is a good use case for n-dimensional nearest-neighbor, linear, IDW, etc. interpolation.
  - We'd need to figure out first how to plug custom indexes with Xarray's `interp` API.
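
As an illustration of what the index could provide to interpolation, here is a minimal inverse distance weighting (IDW) sketch built directly on SciPy's `KDTree`. It does not touch Xarray's `interp` API; the function name and the `k` and `power` parameters are assumptions made for the example, and it assumes `k <= len(points)`.

```python
import numpy as np
from scipy.spatial import KDTree


def idw_interpolate(points, values, targets, k=3, power=2.0):
    """Inverse distance weighting over the k nearest neighbors of each target."""
    tree = KDTree(points)
    dist, pos = tree.query(targets, k=k)
    # query() drops the neighbor axis when k == 1, so restore it.
    dist = dist.reshape(-1, k)
    pos = pos.reshape(-1, k)
    with np.errstate(divide="ignore"):
        weights = 1.0 / dist**power
    # A target that coincides with a data point gets that point's value.
    exact = np.isinf(weights)
    hit = exact.any(axis=1)
    weights[hit] = exact[hit].astype(float)
    return (weights * values[pos]).sum(axis=1) / weights.sum(axis=1)


points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([0.0, 1.0, 1.0, 2.0])  # samples of f(x, y) = x + y
targets = np.array([[0.5, 0.5], [0.0, 0.0]])
result = idw_interpolate(points, values, targets, k=4)
```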

- GroupBy
  - It would be great to have specialized grouper objects like `KNearestNeighborsGrouper(points: xr.DataArray, k: int)` and `BallNeighborsGrouper(points: xr.DataArray, radius: float)` (@dcherian).
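
The grouper objects above do not exist yet. The sketch below only shows the kind of neighbor lookups they might wrap, using SciPy's `KDTree.query` and `KDTree.query_ball_point` on plain arrays; both function names are hypothetical, and a mean reduction is picked arbitrarily.

```python
import numpy as np
from scipy.spatial import KDTree


def knn_group_means(data_points, values, centers, k):
    """Mean of the k nearest data values around each center point."""
    tree = KDTree(data_points)
    _, pos = tree.query(centers, k=k)
    pos = pos.reshape(len(centers), k)  # keep a neighbor axis even when k == 1
    return values[pos].mean(axis=1)


def ball_group_means(data_points, values, centers, radius):
    """Mean of the data values within `radius` of each center (NaN if empty)."""
    tree = KDTree(data_points)
    members = tree.query_ball_point(centers, r=radius)
    return np.array([values[m].mean() if len(m) else np.nan for m in members])


data_points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
values = np.array([1.0, 3.0, 10.0, 20.0])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])

knn_means = knn_group_means(data_points, values, centers, k=2)
ball_means = ball_group_means(data_points, values, centers, radius=0.5)
```

Groups defined this way can overlap and can have different sizes in the ball case, which a grouper implementation would have to account for.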

- `neighbors` accessor?
  - It would be nice to have some Xarray-friendly API to, e.g., return the distances to the nearest neighbors, select k-nearest neighbors, etc.

_Originally posted by @benbovy in https://github.com/pydata/xarray/issues/10478#issuecomment-3031523478_
            
