Closed
Labels
Bug
Closing Candidate: May be closeable, needs more eyeballs
Compat: pandas objects compatibility with NumPy or Python functions
Numeric Operations: Arithmetic, Comparison, and Logical operations
Regression: Functionality that used to work in a prior pandas version
Description
Using pandas 1.0.5 and the latest dask (2020.12.0):
In [1]: import pandas as pd
In [2]: import dask.dataframe as dd
In [3]: df = pd.DataFrame({"x": ["a", "b", "c"] * 100}, dtype="category")
...: ddf = dd.from_pandas(df, npartitions=3)
In [4]: df.x
Out[4]:
0 a
1 b
2 c
3 a
4 b
..
295 b
296 c
297 a
298 b
299 c
Name: x, Length: 300, dtype: category
Categories (3, object): [a, b, c]
In [5]: ddf.x
Out[5]:
Dask Series Structure:
npartitions=3
0 category[known]
100 ...
200 ...
299 ...
Name: x, dtype: category
Dask Name: getitem, 6 tasks
In [6]: df.x == ddf.x
Out[6]:
0 True
1 True
2 True
3 True
4 True
...
295 True
296 True
297 True
298 True
299 True
Name: x, Length: 300, dtype: bool
In [9]: (df.x == ddf.x).all()
Out[9]: True
But with pandas master (using the same dask version), the same comparison returns False for every row:
In [3]: df.x == ddf.x
Out[3]:
0 False
1 False
2 False
3 False
4 False
...
295 False
296 False
297 False
298 False
299 False
Name: x, Length: 300, dtype: bool
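For anyone who wants to check this outside an interactive session, the steps above can be sketched as a small script. This is only a sketch: the helper names `make_categorical_frame` and `all_equal_to_dask` are made up for illustration and are not part of pandas or dask, and the dask import is kept inside the function so the frame can be built without dask installed.

```python
import pandas as pd


def make_categorical_frame(repeats: int = 100) -> pd.DataFrame:
    """Build the single-column categorical frame used in the report."""
    return pd.DataFrame({"x": ["a", "b", "c"] * repeats}, dtype="category")


def all_equal_to_dask(df: pd.DataFrame, npartitions: int = 3) -> bool:
    """Compare df.x with its dask counterpart, as in the report.

    Hypothetical helper for illustration only; requires dask to be installed.
    """
    import dask.dataframe as dd  # imported lazily on purpose

    ddf = dd.from_pandas(df, npartitions=npartitions)
    # Same expression as in the session above: pandas Series == dask Series
    return bool((df.x == ddf.x).all())


# Build the frame and print the pandas version, which matters for this report
df = make_categorical_frame()
print("pandas", pd.__version__)
```

With dask installed, calling `all_equal_to_dask(df)` runs the comparison from the report. Going by the outputs above, it would return True on pandas 1.0.5 and False on master, since every row compares False there. Results on other pandas or dask versions are not covered by this report.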