About `NDPointIndex` (KDTree) added in #10478, a few ideas for further improvements:
- Alignment with tolerance
  - This is a good case
  - Until Xarray's API allows providing a tolerance value when aligning objects, we could allow providing it as an option at index creation.
  - Picking a tolerance value among the ones set for each index to align would depend on the join method, e.g., pick the smallest value for `"exact"` and `"inner"` and pick the largest value for `"outer"`.
  - For tolerance > 0, we could rely on SciPy KDTree's `count_neighbors`. For tolerance == 0 we could just compare the data points as implemented in this PR.
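
A minimal sketch of the two ideas above, assuming SciPy is available. `pick_tolerance` and `points_match` are hypothetical helper names, not part of Xarray. The pair count is only a necessary condition for every point having a partner, so it should be read as a cheap first check rather than a full one-to-one match.

```python
import numpy as np
from scipy.spatial import KDTree


def pick_tolerance(join, tolerances):
    """Pick one tolerance among those set on the indexes to align (hypothetical)."""
    if join in ("exact", "inner"):
        return min(tolerances)
    if join == "outer":
        return max(tolerances)
    # Other join methods are not covered by the proposal above.
    raise ValueError(f"no tolerance rule sketched for join={join!r}")


def points_match(points_a, points_b, tolerance):
    """Cheap check that two (n_points, n_coordinates) arrays agree within tolerance."""
    if points_a.shape != points_b.shape:
        return False
    if tolerance == 0:
        # Exact comparison of the data points.
        return bool(np.array_equal(points_a, points_b))
    # Number of (a, b) pairs whose distance is <= tolerance.
    n_pairs = KDTree(points_a).count_neighbors(KDTree(points_b), r=tolerance)
    return bool(n_pairs >= len(points_a))


a = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
b = a + 1e-3  # same points, slightly shifted
tol = pick_tolerance("inner", [1e-2, 1e-1])
aligned = points_match(a, b, tol)
```
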
- Join and re-index
  - I guess it is possible to support it via querying all matching point pairs within the given tolerance.
  - Raise an error if tolerance == 0?
  - We could rely on SciPy KDTree's `query(..., distance_upper_bound=tolerance)`. `query_ball_tree()` could also be a good candidate, but it returns nested Python lists that might be slow to process.
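
As a rough illustration of the `query(..., distance_upper_bound=tolerance)` route, the sketch below maps each target point to the position of its nearest source point, or to -1 when nothing lies within the tolerance. `reindex_positions` is a hypothetical helper, and the -1 convention is an assumption borrowed from how missing labels are commonly flagged when re-indexing.

```python
import numpy as np
from scipy.spatial import KDTree


def reindex_positions(source_points, target_points, tolerance):
    """Position of the nearest source point for each target point, or -1."""
    tree = KDTree(source_points)
    dist, idx = tree.query(target_points, k=1, distance_upper_bound=tolerance)
    # SciPy reports a missing neighbor as an infinite distance (and idx == tree.n).
    return np.where(np.isfinite(dist), idx, -1)


source = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
target = np.array([[1.001, 0.0], [5.0, 5.0], [0.0, 0.0005]])
positions = reindex_positions(source, target, tolerance=0.01)
```
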
- Positional indexing (isel)
  - Currently (this PR) the index is dropped after `isel`, but we could create a new one (from scratch) instead.
  - Implementation should be pretty straightforward: we can transform the input dimension indexers into a flat indexer for the (n_points, n_coordinates) point data array via `numpy.ravel_multi_index`.
  - Perhaps make this opt-in via an option at index creation?
  - Alternatively we could keep the original kd-tree and keep track of the slices or indices. The big advantage is that `isel` would be almost free (no costly index rebuild), but we would have more internal state to keep track of and to deal with.
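
The flat-indexer idea could look roughly like this, assuming the point data array was built by flattening the coordinate variables in C order and that indexers are applied orthogonally (one integer array or slice per dimension). `flat_point_indexer` is a hypothetical name.

```python
import numpy as np


def flat_point_indexer(shape, indexers):
    """Turn per-dimension integer/slice indexers (in dimension order) into
    row positions of the flattened (n_points, n_coordinates) array."""
    per_dim = [
        np.arange(n)[ix] if isinstance(ix, slice) else np.asarray(ix)
        for n, ix in zip(shape, indexers)
    ]
    grids = np.meshgrid(*per_dim, indexing="ij")  # orthogonal (outer) indexing
    return np.ravel_multi_index(tuple(grids), shape).ravel()


# 2-D coordinates on dimensions (y, x) with shape (3, 4).
yy, xx = np.meshgrid(np.arange(3.0), np.arange(4.0) * 10, indexing="ij")
points = np.column_stack([xx.ravel(), yy.ravel()])  # (n_points, n_coordinates)

# Equivalent of isel(y=[0, 2], x=slice(1, None, 2)).
flat = flat_point_indexer(xx.shape, [[0, 2], slice(1, None, 2)])
new_points = points[flat]  # data from which a new kd-tree could be built
```
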
- Concat
  - I guess NumPy provides all the utilities needed to implement it in a clever way so that the concatenated (n_points, n_coordinates) point data array stays consistent with the location of the concat dimension `dim` in the original coordinate variables.
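
One possible way to do this with plain NumPy, shown only as a sketch: restore the dimensions of the coordinate variables, concatenate along the axis of `dim`, then flatten again. It assumes each point array was flattened in C order; `concat_points` and `grid_points` are hypothetical helpers.

```python
import numpy as np


def concat_points(point_arrays, shapes, axis):
    """Concatenate flattened (n_points, n_coordinates) arrays along dimension
    number `axis` (non-negative) of the original coordinate variables."""
    # Restore the coordinate-variable dimensions, keeping coordinates last.
    unflattened = [
        p.reshape(*shape, p.shape[-1]) for p, shape in zip(point_arrays, shapes)
    ]
    merged = np.concatenate(unflattened, axis=axis)
    return merged.reshape(-1, merged.shape[-1])


def grid_points(x, y):
    """Flatten 2-D coordinate variables into an (n_points, 2) array."""
    return np.column_stack([x.ravel(), y.ravel()])


y1, x1 = np.meshgrid(np.arange(2.0), np.arange(2.0), indexing="ij")
y2, x2 = np.meshgrid(np.arange(2.0), np.arange(2.0, 5.0), indexing="ij")

# Concatenate along the second dimension (x) of the coordinate variables.
result = concat_points(
    [grid_points(x1, y1), grid_points(x2, y2)], [x1.shape, x2.shape], axis=1
)
expected = grid_points(
    np.concatenate([x1, x2], axis=1), np.concatenate([y1, y2], axis=1)
)
```
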
- Stack
  - Would it be a useful alternative to `pandas.MultiIndex` for floating-point coordinates, even though the original distribution of the data points is regular?
  - Implementation is trivial.
  - Cannot support `unstack` (it only works with `pandas.MultiIndex`).
- Interpolation
  - This is a good case for n-dimensional nearest neighbor, linear, IDW, etc. interpolation.
  - We'd need to figure out first how to plug custom indexes into Xarray's `interp` API.
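
To give an idea of what the kd-tree would contribute here, below is a small inverse distance weighting (IDW) sketch built directly on SciPy's `KDTree.query`. It is independent of Xarray's `interp` API; `idw_interpolate` and its `k` and `power` parameters are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial import KDTree


def idw_interpolate(points, values, targets, k=2, power=2.0):
    """Inverse distance weighting over the k nearest data points.

    Requires k <= number of data points.
    """
    tree = KDTree(points)
    dist, idx = tree.query(targets, k=k)
    # query() drops the neighbor axis when k == 1, so restore it.
    dist = dist.reshape(len(targets), -1)
    idx = idx.reshape(len(targets), -1)

    exact = dist == 0.0
    with np.errstate(divide="ignore"):
        weights = 1.0 / dist**power
    # Where a target coincides with a data point, use that point's value only.
    hit_rows = exact.any(axis=1)
    weights[hit_rows] = exact[hit_rows].astype(float)
    return (weights * values[idx]).sum(axis=1) / weights.sum(axis=1)


points = np.array([[0.0, 0.0], [2.0, 0.0]])
values = np.array([0.0, 2.0])
targets = np.array([[1.0, 0.0], [0.0, 0.0]])
interpolated = idw_interpolate(points, values, targets, k=2)
```
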
- GroupBy
  - It would be great to have specialized grouper objects like `KNearestNeighborsGrouper(points: xr.DataArray, k: int)` and `BallNeighborsGrouper(points: xr.DataArray, radius: float)` (@dcherian).
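
The grouper objects above do not exist yet. The sketch below only shows the SciPy queries that such groupers might use to compute group membership, with plain NumPy arrays standing in for the `points` argument.

```python
import numpy as np
from scipy.spatial import KDTree

data_points = np.array(
    [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0], [5.0, 5.0]]
)
centers = np.array([[0.0, 0.0], [1.0, 0.0]])  # stand-in for `points`
tree = KDTree(data_points)

# k-nearest-neighbors groups: one row of member positions per center.
_, knn_members = tree.query(centers, k=2)

# Ball groups: one list of member positions per center (sizes may differ).
ball_members = tree.query_ball_point(centers, r=0.2)
```

Groups built this way can overlap or leave some data points unassigned (the last point here belongs to no group), which a grouper implementation would have to account for.
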
- `neighbors` accessor?
  - It would be nice to have some Xarray-friendly API to, e.g., return the distances to the nearest neighbors, select k-nearest neighbors, etc.

Originally posted by @benbovy in #10478 (comment)