
URL passed to fs.listdir instead of a path #15379

@leoleoasd

Description


Bug description

https://github.com/Lightning-AI/lightning/blob/master/src/pytorch_lightning/trainer/connectors/checkpoint_connector.py#L582
This line calls fs.listdir(dir_path), where dir_path is a URL rather than a plain filesystem path.
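To make the URL/path distinction concrete: fsspec treats a value like hdfs:///somewhere/MNIST as a protocol prefix plus a path. The short sketch below uses fsspec.core.split_protocol, which is a plain string split and does not contact HDFS, to show the two parts. The protocol-free part is the kind of value the pyarrow error below asks for; the URL as a whole is what currently reaches fs.listdir().

```python
from fsspec.core import split_protocol

# The value reaching fs.listdir() still carries its "hdfs://" prefix.
dir_path = "hdfs:///somewhere/MNIST"

# split_protocol only splits the string; no filesystem is instantiated.
protocol, path = split_protocol(dir_path)
print(protocol)  # -> hdfs
print(path)      # -> /somewhere/MNIST
```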

pyarrow rejects it with:

pyarrow.lib.ArrowInvalid: FileSelector.base_dir must not be a URI, got: hdfs:///somewhere/MNIST
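A minimal sketch of one possible workaround, assuming the fix is to resolve the URL with fsspec.core.url_to_fs and pass only the protocol-free path to fs.listdir(). The function max_ckpt_version_in_folder below is a hypothetical standalone stand-in for the private __max_ckpt_version_in_folder helper seen in the traceback; it is not Lightning's actual patch. The "hpc_ckpt_" key comes from the traceback, while the hpc_ckpt_<N>.ckpt file-name pattern is an assumption. It has not been run against a real HDFS cluster; the test uses a local file:// URL instead.

```python
import os
import re
from typing import Optional

from fsspec.core import url_to_fs


def max_ckpt_version_in_folder(dir_path: str, name_key: str = "hpc_ckpt_") -> Optional[int]:
    """Return the highest N among files named <name_key>N.ckpt in dir_path.

    Hypothetical helper: the key point is that fs.listdir() receives the
    protocol-free path returned by url_to_fs, never the original URL.
    """
    # "hdfs:///a/b" -> (HDFS filesystem, "/a/b"); plain local paths also work.
    fs, path = url_to_fs(str(dir_path))
    if not fs.exists(path):
        return None

    # listdir() returns dicts whose "name" entry is the full path of each file.
    names = [os.path.basename(entry["name"]) for entry in fs.listdir(path)]

    # Assumed naming scheme: "hpc_ckpt_12.ckpt" -> version 12.
    pattern = re.compile(re.escape(name_key) + r"(\d+)\.ckpt")
    versions = [int(m.group(1)) for m in map(pattern.fullmatch, names) if m]
    return max(versions) if versions else None
```

Because url_to_fs also accepts plain local paths, the same call path handles both hdfs:// URLs and ordinary directories.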

How to reproduce the bug

Use an HDFS URL as the Trainer's default_root_dir (e.g. hdfs:///somewhere/MNIST, as in the error above) and call trainer.fit().

Error messages and logs

  File "main.py", line 63, in main
    trainer.fit(mnist_model, train_loader, valid_loader)
  File "/home/jobuser/build/yuxlu-test/environments/satellites/python/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/home/jobuser/build/yuxlu-test/environments/satellites/python/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/jobuser/build/yuxlu-test/environments/satellites/python/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/jobuser/build/yuxlu-test/environments/satellites/python/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1110, in _run
    self._restore_modules_and_callbacks(ckpt_path)
  File "/home/jobuser/build/yuxlu-test/environments/satellites/python/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1063, in _restore_modules_and_callbacks
    self._checkpoint_connector.resume_start(checkpoint_path)
  File "/home/jobuser/build/yuxlu-test/environments/satellites/python/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 78, in resume_start
    self.resume_checkpoint_path = self._hpc_resume_path or checkpoint_path
  File "/home/jobuser/build/yuxlu-test/environments/satellites/python/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 66, in _hpc_resume_path
    max_version = self.__max_ckpt_version_in_folder(dir_path_hpc, "hpc_ckpt_")
  File "/home/jobuser/build/yuxlu-test/environments/satellites/python/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 506, in __max_ckpt_version_in_folder
    files = [os.path.basename(f["name"]) for f in fs.listdir(dir_path)]
  File "/home/jobuser/build/yuxlu-test/environments/satellites/python/lib/python3.10/site-packages/fsspec/spec.py", line 1301, in listdir
    return self.ls(path, detail=detail, **kwargs)
  File "/home/jobuser/build/yuxlu-test/environments/satellites/python/lib/python3.10/site-packages/fsspec/implementations/arrow.py", line 66, in ls
    for entry in self.fs.get_file_info(FileSelector(path))
  File "pyarrow/_fs.pyx", line 433, in pyarrow._fs.FileSystem.get_file_info
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: FileSelector.base_dir must not be a URI, got: hdfs:///somewhere/MNIST

Environment

fsspec: 2022.10.0
pyarrow: 8.0.0
pytorch_lightning: 1.7.7

More info

No response

cc @awaelchli @ananthsub @ninginthecloud @rohitgr7 @otaj

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), checkpointing (Related to checkpointing)
Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests
