Closed
Labels
- API Design
- Compat: pandas objects compatibility with NumPy or Python functions
- ExtensionArray: Extending pandas with custom dtypes or arrays.
Description
Discussed briefly on the call today, but we should go through things formally.
What should the return type of `Series[extension_array].values` and `Index[extension_array].values` be? I believe the two options are:
- Return the ExtensionArray backing it (e.g. like what Categorical does)
- Return an ndarray with some information loss / performance cost
  - e.g. like `Series[datetimeTZ].values` -> datetime64ns at UTC
  - e.g. `Series[period].values` -> ndarray[Period objects]
Current State
Not sure how much weight we should put on the current behavior, but for reference:
| type | Series.values | Index.values |
|---|---|---|
| datetime | datetime64ns | datetime64ns |
| datetime-tz | datetime64ns (UTC & naive) | datetime64ns (UTC & naive) |
| categorical | Categorical | Categorical |
| period | NA | ndarray[Period objects] |
| interval | NA | ndarray[Interval objects] |
```python
In [5]: pd.Series(pd.date_range('2017', periods=1)).values
Out[5]: array(['2017-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

In [6]: pd.Series(pd.date_range('2017', periods=1, tz='US/Eastern')).values
Out[6]: array(['2017-01-01T05:00:00.000000000'], dtype='datetime64[ns]')

In [7]: pd.Series(pd.Categorical([1])).values
Out[7]:
[1]
Categories (1, int64): [1]

In [8]: pd.Series(pd.SparseArray([1])).values
Out[8]:
[1]
Fill: 0
IntIndex
Indices: array([0], dtype=int32)

In [9]: pd.date_range('2017', periods=1).values
Out[9]: array(['2017-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

In [10]: pd.date_range('2017', periods=1, tz='US/Central').values
Out[10]: array(['2017-01-01T06:00:00.000000000'], dtype='datetime64[ns]')

In [11]: pd.period_range('2017', periods=1, freq='D').values
Out[11]: array([Period('2017-01-01', 'D')], dtype=object)

In [12]: pd.interval_range(start=0, periods=1).values
Out[12]: array([Interval(0, 1, closed='right')], dtype=object)

In [13]: pd.CategoricalIndex([1]).values
Out[13]:
[1]
Categories (1, int64): [1]
```
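For anyone who wants to re-check the table against their own install, here is a rough sketch that reports what `.values` returns for each type. The helper name `values_kind` and the `report` dict are made up for this example, and `tz="UTC"` is used only to avoid depending on a time zone database. The results depend on the pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd


def values_kind(obj):
    """Describe what ``obj.values`` returns (hypothetical helper)."""
    vals = obj.values
    if isinstance(vals, np.ndarray):
        # Plain NumPy array: report its dtype.
        return f"ndarray[{vals.dtype}]"
    # Anything else is some pandas array class (e.g. Categorical).
    return type(vals).__name__


indexes = {
    "datetime": pd.date_range("2017", periods=1),
    "datetime-tz": pd.date_range("2017", periods=1, tz="UTC"),
    "categorical": pd.CategoricalIndex([1]),
    "period": pd.period_range("2017", periods=1, freq="D"),
    "interval": pd.interval_range(start=0, periods=1),
}

# Map each type name to (Series.values kind, Index.values kind).
report = {
    name: (values_kind(pd.Series(idx)), values_kind(idx))
    for name, idx in indexes.items()
}

for name, (series_kind, index_kind) in report.items():
    print(f"{name:12} | {series_kind:24} | {index_kind}")
```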
If we decide to have the return values be ExtensionArrays, we'll need to discuss
to what extent they're part of the public API.
Regardless of the choice for `.values`, we'll probably want to support the other
use case (maybe just by documenting "call `np.asarray` on it"). Internally, we
have `._values` ("best" array, ndarray or EA) and `._ndarray_values` (always an
ndarray).
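A small sketch of the "call `np.asarray` on it" route, assuming the public `np.asarray` path is what gets documented. The wrapper name `as_ndarray` is hypothetical; the point is only that the caller gets an ndarray whether or not `.values` hands back an ExtensionArray.

```python
import numpy as np
import pandas as pd


def as_ndarray(obj):
    """Return an ndarray regardless of what ``obj.values`` is (hypothetical helper)."""
    return np.asarray(obj)


cat_ser = pd.Series(pd.Categorical([1]))
per_idx = pd.period_range("2017", periods=1, freq="D")

cat_arr = as_ndarray(cat_ser)  # category values materialized as an ndarray
per_arr = as_ndarray(per_idx)  # object ndarray holding Period scalars

print(type(cat_ser.values).__name__, "->", type(cat_arr).__name__, cat_arr.dtype)
print(type(per_idx.values).__name__, "->", type(per_arr).__name__, per_arr.dtype)
```

This keeps the conversion cost explicit at the call site, which matters for the period and interval cases where the ndarray form is an object array.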
cc @jreback @jorisvandenbossche @jschendel @jbrockmendel @shoyer @chris-b1