
open_mfdataset with remote files is broken because of #9687 (issue #9784)

@phofl


What happened?

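#9687 broke open_mfdataset with remote files. _normalize_path_list doesn't recognize open remote file objects (here, the s3fs file handles created with s3.open) as paths. Instead of passing them through, it treats each one as a nested list of paths and recurses into it, which ends in TypeError: 'int' object is not iterable (see the log output below).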
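The failure can be reproduced without S3 access. The sketch below is a hypothetical re-creation of the recursion pattern visible in the traceback (normalize str/os.PathLike entries, recurse into everything else); buggy_normalize_path_list is an illustrative name, not xarray's actual code, and io.BytesIO stands in for the s3fs file object. Iterating a binary file yields bytes lines, iterating bytes yields ints, and iterating an int raises the same TypeError as in the log.

```python
import io
import os


def buggy_normalize_path_list(lpaths):
    """Hypothetical re-creation of the recursion pattern from the traceback.

    Anything that is not a str or os.PathLike is assumed to be a nested
    list of paths and is recursed into.
    """
    return [
        os.path.abspath(p)
        if isinstance(p, (str, os.PathLike))
        else buggy_normalize_path_list(p)
        for p in lpaths
    ]


# io.BytesIO stands in for an open remote file handle such as an s3fs file.
fake_remote_file = io.BytesIO(b"\x89HDF\r\n\x1a\n")

try:
    buggy_normalize_path_list([fake_remote_file])
except TypeError as err:
    # file -> bytes line -> int -> "iterate over int" fails
    print(err)  # 'int' object is not iterable
```

Plain string paths and nested lists of strings still work with this pattern; only non-path, non-list objects trigger the runaway recursion.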

What did you expect to happen?

This should continue to work: _normalize_path_list should only recurse into p when p is a list, and otherwise stop and return p unchanged instead of recursing.
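A minimal sketch of the expected behaviour, assuming lists are the only nested container that needs to be recursed into. The function name normalize_path_list, the "://" check used to leave URLs untouched, and the s3://bucket/key.nc path are illustrative assumptions, not xarray's actual implementation.

```python
import io
import os


def normalize_path_list(lpaths):
    """Hypothetical sketch: recurse only into lists, pass other objects through."""
    normalized = []
    for p in lpaths:
        if isinstance(p, (str, os.PathLike)):
            # Simplified stand-in for path normalization (assumes str-returning
            # PathLike objects): keep URLs such as "s3://..." as-is and make
            # local paths absolute.
            s = os.fspath(p)
            normalized.append(s if "://" in s else os.path.abspath(os.path.expanduser(s)))
        elif isinstance(p, list):
            # Only an actual list is treated as a nested list of paths.
            normalized.append(normalize_path_list(p))
        else:
            # Open file objects (e.g. from s3fs) are returned unchanged.
            normalized.append(p)
    return normalized


fake_remote_file = io.BytesIO(b"\x89HDF\r\n\x1a\n")
result = normalize_path_list([fake_remote_file, "s3://bucket/key.nc"])
print(result[0] is fake_remote_file)  # True
print(result[1])  # s3://bucket/key.nc
```

Returning unrecognized objects unchanged leaves it to the backend engine to decide whether it can open them, which is what the report asks for.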

Minimal Complete Verifiable Example

from distributed import Client

import s3fs
import xarray as xr

s3 = s3fs.S3FileSystem()

file_list = ['s3://nex-gddp-cmip6/NEX-GDDP-CMIP6/ACCESS-CM2/historical/r1i1p1f1/hurs/hurs_day_ACCESS-CM2_historical_r1i1p1f1_gn_1950.nc']
# Open file objects (not URL strings) are passed to open_mfdataset; these trigger the bug.
files = [s3.open(f) for f in file_list]

if __name__ == "__main__":
    client = Client()
    # Load input NetCDF data files
    # TODO: Reduce explicit settings once https://github.com/pydata/xarray/issues/8778 is completed.
    ds = xr.open_mfdataset(
        files,
        engine="h5netcdf",
        combine="nested",
        concat_dim="time",
        data_vars="minimal",
        coords="minimal",
        compat="override",
        parallel=True,
    )

cc @headtr1ck @dcherian

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Traceback (most recent call last):
  File "/Users/patrick/Library/Application Support/JetBrains/PyCharm2024.3/scratches/scratch.py", line 19, in <module>
    ds = xr.open_mfdataset(
         ^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/api.py", line 1539, in open_mfdataset
    paths = _find_absolute_paths(paths, engine=engine, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 149, in _find_absolute_paths
    return _normalize_path_list(paths)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 140, in _normalize_path_list
    return [
           ^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 144, in <listcomp>
    else _normalize_path_list(p)
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 140, in _normalize_path_list
    return [
           ^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 144, in <listcomp>
    else _normalize_path_list(p)
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 140, in _normalize_path_list
    return [
           ^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 144, in <listcomp>
    else _normalize_path_list(p)
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 140, in _normalize_path_list
    return [
           ^
TypeError: 'int' object is not iterable

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.10 | packaged by conda-forge | (main, Oct 16 2024, 01:26:25) [Clang 17.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.4.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: None

xarray: 2024.10.1.dev51+g864b35a1
pandas: 2.2.3
numpy: 2.0.2
scipy: 1.14.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.12.1
zarr: 2.18.3
cftime: None
nc_time_axis: None
iris: None
bottleneck: 1.4.2
dask: 2024.11.2+23.g709bad03e
distributed: 2024.11.2
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.10.0
cupy: None
pint: None
sparse: 0.15.4
flox: None
numpy_groupies: None
setuptools: 75.3.0
pip: 24.3.1
conda: None
pytest: 8.3.3
mypy: None
IPython: 8.29.0
sphinx: None
None
