-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
Closed
Labels
AlgosNon-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diffNon-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diffEnhancementExtensionArrayExtending pandas with custom dtypes or arrays.Extending pandas with custom dtypes or arrays.NA - MaskedArraysRelated to pd.NA and nullable extension arraysRelated to pd.NA and nullable extension arrays
Description
Now we start to have mask-based dtypes/arrays (integer, boolean), we should also look into making our algos work with such masked arrays. An example for which we could explore this is factorize
/ unique
.
Currently, BooleanArray and IntegerArray need to convert their masked array into a single numpy array using a certain "NA sentinel" that is specified so the algo can recognize this sentinel. This happens through the ExtensionArray._values_for_factorize
, which returns a (numpy array, NA sentinel) tuple.
In practice this means that the boolean array is converted to integer (with NA as -1), and IntegerArray is converted to float array with NA as NaN, so the algos can handle this.
We should look into:
- Can we adapt or make a specific version of the unique/factorize hashtable class that takes a mask instead of a NA sentinel
- We could then have a variant of
ExtensionArray._values_for_factorize
that then returns (array, mask) instead of (array, NA).
Metadata
Metadata
Assignees
Labels
AlgosNon-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diffNon-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diffEnhancementExtensionArrayExtending pandas with custom dtypes or arrays.Extending pandas with custom dtypes or arrays.NA - MaskedArraysRelated to pd.NA and nullable extension arraysRelated to pd.NA and nullable extension arrays