DEPS: bump pyarrow minimum version from 10.0 to 12.0 #61723

Merged

jorisvandenbossche
Member

For our support window of 2 years, we can bump the minimum pyarrow version to 12.0.1 (see list of release dates here: https://arrow.apache.org/release/, we could also directly bump to 13 assuming the final 3.0 release will happen in 1-2 months).

@jorisvandenbossche jorisvandenbossche added this to the 3.0 milestone Jun 27, 2025
@jorisvandenbossche jorisvandenbossche added Dependencies Required and optional dependencies Arrow pyarrow functionality labels Jun 27, 2025
@@ -20,11 +19,10 @@
pa_version_under18p0 = _palv < Version("18.0.0")
pa_version_under19p0 = _palv < Version("19.0.0")
pa_version_under20p0 = _palv < Version("20.0.0")
-HAS_PYARROW = True
+HAS_PYARROW = _palv >= Version("12.0.1")
Member Author

I checked the current usages of HAS_PYARROW, and essentially everywhere we mean it to be a supported version of pyarrow (I didn't check the tests, but we run those only with supported versions anyway).

By changing the definition here, we can use HAS_PYARROW in other places to protect imports (the ones that currently use if not pa_version_under10p1), and then we don't have to update those every time we update the minimum version.
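
A minimal sketch of the idea described above, assuming a simplified setup: one flag that means "pyarrow is importable and at least the minimum supported version", which call sites check instead of a version-specific constant. `parse_version`, `compute_has_pyarrow`, and `guarded_reader` are hypothetical names for illustration, not pandas functions, and the tuple parser is only a standard-library stand-in for `Version`.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    # Hypothetical stand-in for packaging's Version; handles plain "X.Y.Z" only.
    return tuple(int(part) for part in text.split("."))


# The minimum is written once; bumping it later touches only this line.
MIN_PYARROW = parse_version("12.0.1")


def compute_has_pyarrow(installed: str | None) -> bool:
    """True only if pyarrow is present AND meets the minimum version."""
    if installed is None:  # models a failed `import pyarrow`
        return False
    return parse_version(installed) >= MIN_PYARROW


def guarded_reader(installed: str | None) -> str:
    # Hypothetical call site: it checks the flag, not a specific version
    # constant, so it needs no edit when the minimum version changes.
    if compute_has_pyarrow(installed):
        return "pyarrow"
    return "fallback"
```

With this shape, an installed but too-old pyarrow is treated the same as a missing one, which matches the "supported version" meaning described in the comment.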

Member

Good idea. This line could also specify not pa_version_under12p1 to avoid specifying Version("12.0.1") twice

Member Author

I left it as is for now because we would have to update that line anyway when we update the minimum version, and I find the >= Version("12.0.1") more readable compared to the "not under" logic

@@ -307,7 +307,7 @@ Dependency Minimum Version pip ex
`PyTables <https://github.com/PyTables/PyTables>`__ 3.8.0 hdf5 HDF5-based reading / writing
`zlib <https://github.com/madler/zlib>`__ hdf5 Compression for HDF5
`fastparquet <https://github.com/dask/fastparquet>`__ 2024.2.0 - Parquet reading / writing (pyarrow is default)
-`pyarrow <https://github.com/apache/arrow>`__ 10.0.1 parquet, feather Parquet, ORC, and feather reading / writing
+`pyarrow <https://github.com/apache/arrow>`__ 12.0.1 parquet, feather Parquet, ORC, and feather reading / writing
Member

So this is for the "Other data sources" section.

As it now also improves performance for a default dtype, should it also be added to the "Performance dependencies (recommended)" section, or will this be done in another PR, #61722?

Member Author

Good point, we indeed still have to update our installation guidelines to more prominently recommend installing pyarrow. Will open a separate issue about that (currently tracked in the list of todo items in #54792).

Comment on lines -63 to -64
with pytest.raises(TypeError, match="different 'freq'"):
pa.array(periods, type=ArrowPeriodType("T"))
Member Author

We had an xfail for not pa_version_under10p1, so essentially we were letting this xfail for all our supported pyarrow versions. The tests fail because of this last check, so I removed this failing step and then also removed the xfail (in any case, it is behaviour in pyarrow outside of our control, AFAIK).

Comment on lines +50 to +53
if name == "DatetimeTZBlock":
from pandas.core.internals.api import _DatetimeTZBlock as DatetimeTZBlock

return DatetimeTZBlock
Member Author

This deprecated import was removed in #58467, which was temporarily reverted in #58715 to keep compatibility with our oldest supported pyarrow versions (this is no longer needed starting from pyarrow 15).
But when reverting, we only reverted part of the original PR: adding back ExtensionBlock, and not DatetimeTZBlock, while it was actually DatetimeTZBlock which pyarrow needed (the use of ExtensionBlock was already removed in a much older version of pyarrow). Not sure how I missed that in the original revert (we were also skipping the parquet tests that needed this for the then oldest supported pyarrow version; that skip is now removed).

Comment on lines +126 to +127
elif klass is _DatetimeTZBlock and not isinstance(values.dtype, DatetimeTZDtype):
# pyarrow calls get here (pyarrow<15)
Member Author

This goes together with temporarily adding back DatetimeTZBlock, as mentioned above in https://github.com/pandas-dev/pandas/pull/61723/files#r2179667396. I also updated the comment to indicate this is only needed as long as we support pyarrow < 15.

@jorisvandenbossche
Member Author

@mroeschke this should be ready now (all tests are green)

Member

@mroeschke mroeschke left a comment

Could you also update the table under the Increased minimum versions for dependencies section in the v3.0.0.rst whatsnew?

@jorisvandenbossche
Member Author

> Could you also update the table under the Increased minimum versions for dependencies section in the v3.0.0.rst whatsnew?

Done!

@jorisvandenbossche
Member Author

Going to merge this so I can update #61722

@jorisvandenbossche jorisvandenbossche merged commit 22f12fc into pandas-dev:main Jul 3, 2025
44 checks passed
@jorisvandenbossche jorisvandenbossche deleted the bump-pyarrow-12 branch July 3, 2025 08:18
Labels
Arrow pyarrow functionality Dependencies Required and optional dependencies
3 participants