
Ideas for improving and leveraging NDPointIndex #10513

@benbovy


A few ideas for further improving NDPointIndex (KDTree), which was added in #10478:

  • Alignment with tolerance

    • This is a good use case for tolerance-based alignment of point coordinates.
    • Until Xarray's API allows passing a tolerance value when aligning objects, we could accept it as an option at index creation.
    • Which tolerance value to use among the ones set for each index being aligned would depend on the join method, e.g., the smallest value for "exact" and "inner" joins and the largest value for "outer" joins.
    • For tolerance > 0, we could rely on SciPy's KDTree.count_neighbors. For tolerance == 0, we could just compare the data points as implemented in #10478.
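A minimal sketch of the two rules above, written against plain NumPy/SciPy arrays rather than Xarray's Index API. The helpers pick_tolerance and points_match are hypothetical. The count_neighbors check assumes the tolerance is less than half the minimum distance between points within each array, so each point can have at most one match and a pair count equal to n implies a one-to-one match.

```python
import numpy as np
from scipy.spatial import KDTree


def pick_tolerance(tolerances, join):
    """Hypothetical rule for combining the tolerances of the indexes to align."""
    if join in ("exact", "inner"):
        return min(tolerances)
    if join == "outer":
        return max(tolerances)
    raise ValueError(f"no tolerance rule sketched for join={join!r}")


def points_match(points_a, points_b, tolerance):
    """Check whether two (n_points, n_coordinates) arrays hold the same points.

    Assumes tolerance < half the minimum distance between points within each
    array, so that every point has at most one match in the other array.
    """
    if points_a.shape != points_b.shape:
        return False
    if tolerance == 0:
        # plain comparison, point order included
        return bool(np.array_equal(points_a, points_b))
    # count all (i, j) pairs with distance <= tolerance; unlike the branch
    # above, this ignores the order of the points
    n_pairs = KDTree(points_a).count_neighbors(KDTree(points_b), r=tolerance)
    return int(n_pairs) == points_a.shape[0]


a = np.array([[0.0, 0.0], [1.0, 1.0]])
tol = pick_tolerance([1e-3, 1e-5], "outer")  # largest value for "outer"
same = points_match(a, a + 1e-6, tol)
```

Note that the two branches differ on point order; whether the tolerance > 0 check should also be order-sensitive is an open design question.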
  • Join and re-index

    • I guess it is possible to support it by querying all matching point pairs within the given tolerance.
    • Raise an error if tolerance == 0?
    • We could rely on SciPy's KDTree.query(..., distance_upper_bound=tolerance). KDTree.query_ball_tree() could also be a good candidate, but it returns nested Python lists that might be slow to process.
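A rough sketch of how KDTree.query with distance_upper_bound could produce a re-indexing indexer. SciPy marks a missing neighbor with an infinite distance and an index equal to the number of tree points; the sketch turns those into -1, following the convention of pandas' Index.get_indexer. The helper name reindex_indexer is hypothetical, and raising for tolerance == 0 is just one of the options listed above.

```python
import numpy as np
from scipy.spatial import KDTree


def reindex_indexer(source_points, target_points, tolerance):
    """Position of the nearest source point for each target point, or -1.

    Both inputs are (n_points, n_coordinates) arrays.
    """
    if tolerance == 0:
        raise ValueError("tolerance must be > 0")
    tree = KDTree(source_points)
    # k=1 returns 1-d arrays of shape (n_target_points,)
    dist, idx = tree.query(target_points, k=1, distance_upper_bound=tolerance)
    # no neighbor within tolerance: dist == inf (and idx == tree.n)
    return np.where(np.isinf(dist), -1, idx)


source = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
target = np.array([[1.01, 0.0], [5.0, 5.0], [0.0, 0.02]])
indexer = reindex_indexer(source, target, tolerance=0.1)
```

Several target points may map to the same source point, so an inner or outer join would need extra logic on top of this indexer.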
  • Positional indexing (isel)

    • Currently (as of #10478), the index is dropped after isel, but we could create a new one (from scratch) instead.
    • Implementation should be pretty straightforward: we can transform the input dimension indexers into a flat indexer for the (n_points, n_coordinates) point data array via numpy.ravel_multi_index.
    • Perhaps make this opt-in via an option at index creation?
    • Alternatively, we could keep the original kd-tree and keep track of the slices or indices. The big advantage is that isel would be almost free (no costly index rebuild), but we would have more internal state to keep track of and to deal with.
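A minimal sketch of the from-scratch option, assuming the point data array was built by flattening the n-d coordinate variables in C order, and that each dimension indexer is either a slice or a 1-d integer array (scalar indexers, which drop a dimension, are left out). flat_point_indexer is a hypothetical helper.

```python
import numpy as np


def flat_point_indexer(shape, indexers):
    """Turn per-dimension positional indexers into flat row positions.

    ``shape`` is the shape of the n-d coordinate variables and ``indexers``
    holds one slice or 1-d integer array per dimension.
    """
    positions = [
        np.arange(size)[ix] if isinstance(ix, slice) else np.asarray(ix)
        for size, ix in zip(shape, indexers)
    ]
    # outer product of the per-dimension positions -> flat (C order) rows
    return np.ravel_multi_index(np.ix_(*positions), shape).ravel()


# (y, x) coordinates on a 3x4 grid -> 12 points
y, x = np.meshgrid(np.arange(3.0), np.arange(4.0), indexing="ij")
points = np.column_stack([y.ravel(), x.ravel()])

flat = flat_point_indexer((3, 4), [slice(1, 3), [0, 2]])
new_points = points[flat]  # would be used to build the new kd-tree
```

With the alternative approach from the last bullet, the index would store something like flat instead of rebuilding a tree from new_points.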
  • Concat

    • I guess NumPy provides all the utilities needed to implement it in a clever way, so that the concatenated (n_points, n_coordinates) point data array stays consistent with the location of the concat dimension dim in the original coordinate variables.
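One straightforward way (not necessarily the cleverest) is to restore each point array to its n-d layout, concatenate along the concat axis and flatten again. This sketch assumes C-order flattening and a non-negative axis; concat_points is a hypothetical helper.

```python
import numpy as np


def concat_points(point_arrays, shapes, axis):
    """Concatenate (n_points, n_coordinates) arrays along one grid dimension.

    ``shapes`` gives the n-d shape of the coordinate variables behind each
    array; ``axis`` is the position of the concat dimension in those shapes.
    """
    n_coords = point_arrays[0].shape[1]
    # restore the n-d layout, with the coordinates as a trailing axis
    blocks = [p.reshape(*shape, n_coords) for p, shape in zip(point_arrays, shapes)]
    merged = np.concatenate(blocks, axis=axis)
    # flatten back to (n_points, n_coordinates) in C order
    return merged.reshape(-1, n_coords), merged.shape[:-1]


a = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # (y, x) on a 2x2 grid
b = a + [0, 2]  # same grid, shifted by 2 along x
points, shape = concat_points([a, b], [(2, 2), (2, 2)], axis=1)
```

Concatenating the point arrays directly along their first axis would only give the right order when dim is the first dimension of the coordinate variables.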
  • Stack

    • Would it be a useful alternative to pandas.MultiIndex for floating-point coordinates, even though the original distribution of the data points is regular?
    • Implementation is trivial
    • Cannot support unstack (it only works with pandas.MultiIndex)
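For stacking, building the point data array comes down to broadcasting the 1-d coordinates to a grid and flattening it. A sketch with made-up lat/lon values; stacked_points is a hypothetical helper, and the point order is assumed to match what ds.stack(z=("lat", "lon")) produces (last dimension varying fastest).

```python
import numpy as np
from scipy.spatial import KDTree


def stacked_points(*coords):
    """(n_points, n_coordinates) array from 1-d coordinates, in C order."""
    grids = np.meshgrid(*coords, indexing="ij")
    return np.column_stack([g.ravel() for g in grids])


lat = np.array([10.0, 10.5, 11.0])
lon = np.array([0.0, 0.25])
points = stacked_points(lat, lon)  # shape (6, 2)

# nearest-neighbor selection on the stacked dimension
tree = KDTree(points)
_, pos = tree.query([10.4, 0.3])
```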
  • Interpolation

    • This is a good use case for n-dimensional nearest-neighbor, linear, IDW (inverse distance weighting), etc. interpolation.
    • We'd need to figure out first how to plug custom indexes with Xarray's interp API.
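Independently of how it gets plugged into the interp API, IDW interpolation maps naturally onto a k-nearest-neighbors query. A minimal sketch on plain arrays; idw_interp, k and power are hypothetical names, and the sketch assumes 2 <= k <= n_points so that SciPy returns (n_targets, k) arrays with no missing neighbors.

```python
import numpy as np
from scipy.spatial import KDTree


def idw_interp(points, values, targets, k=4, power=2.0):
    """Inverse distance weighting from the k nearest data points."""
    if not 2 <= k <= len(points):
        raise ValueError("this sketch needs 2 <= k <= n_points")
    dist, idx = KDTree(points).query(targets, k=k)  # sorted by distance
    neighbor_values = values[idx]
    out = np.empty(len(targets))
    # a target that coincides with a data point takes its value directly
    hit = dist[:, 0] == 0.0
    out[hit] = neighbor_values[hit, 0]
    w = 1.0 / dist[~hit] ** power
    out[~hit] = (w * neighbor_values[~hit]).sum(axis=1) / w.sum(axis=1)
    return out


points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([0.0, 1.0, 2.0, 3.0])
result = idw_interp(points, values, np.array([[0.5, 0.5], [0.0, 0.0]]))
```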
  • GroupBy

    • It would be great to have specialized grouper objects like KNearestNeighborsGrouper(points: xr.DataArray, k: int) and BallNeighborsGrouper(points: xr.DataArray, radius: float) (@dcherian).
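Whatever their final API, the core of both groupers would likely be a kd-tree query. The sketch below only computes group members on plain arrays (it does not implement Xarray's grouper interface); knn_groups and ball_groups are hypothetical names.

```python
import numpy as np
from scipy.spatial import KDTree


def knn_groups(data_points, centers, k):
    """Positions of the k data points nearest to each center, shape (n_centers, k)."""
    _, idx = KDTree(data_points).query(centers, k=k)
    return np.asarray(idx).reshape(len(centers), -1)


def ball_groups(data_points, centers, radius):
    """Positions of the data points within ``radius`` of each center."""
    members = KDTree(data_points).query_ball_point(centers, r=radius)
    return [sorted(m) for m in members]


data = np.array([[0.0], [1.0], [2.0], [10.0]])
centers = np.array([[0.4], [9.0]])
knn = knn_groups(data, centers, k=2)
ball = ball_groups(data, centers, radius=1.5)
```

Unlike a regular groupby, these neighborhoods may overlap (one data point can belong to several groups), which such groupers would have to handle.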
  • neighbors accessor?

    • It would be nice to have some Xarray-friendly API to, e.g., return the distances to the nearest neighbors, select k-nearest neighbors, etc.

Originally posted by @benbovy in #10478 (comment)
