CSV inside ZIP which is not UTF-8 encoded causes UnicodeDecodeError #74

@craiga

pyexcel-io assumes that every file inside a CSVZ archive is UTF-8 encoded, so a CSV in any other encoding fails to load.

To demonstrate the issue, this zip file contains one CSV file which is UTF-32 encoded.

Passing it through pyexcel yields the following error:

  …
  File "…/views/upload_spreadsheets.py", line 67, in save_files
    yield (file.name, dict(self.save_book(file.get_book(), share_with_org)))
  File "…/site-packages/pyexcel_webio/__init__.py", line 203, in get_book
    return pe.get_book(**params)
  File "…/site-packages/pyexcel/core.py", line 47, in get_book
    book_stream = sources.get_book_stream(**keywords)
  File "…/site-packages/pyexcel/internal/core.py", line 39, in get_book_stream
    sheets = a_source.get_data()
  File "…/site-packages/pyexcel/plugins/sources/memory_input.py", line 40, in get_data
    sheets = self.__parser.parse_file_content(
  File "…/site-packages/pyexcel/plugins/parsers/excel.py", line 27, in parse_file_content
    return self._parse_any(
  File "…/site-packages/pyexcel/plugins/parsers/excel.py", line 40, in _parse_any
    sheets = get_data(anything, file_type=file_type, **keywords)
  File "…/site-packages/pyexcel_io/io.py", line 72, in get_data
    data, _ = _get_data(
  File "…/site-packages/pyexcel_io/io.py", line 91, in _get_data
    return load_data(**keywords)
  File "…/site-packages/pyexcel_io/io.py", line 216, in load_data
    result = reader.read_all()
  File "…/site-packages/pyexcel_io/book.py", line 157, in read_all
    result[sheet.name] = self.read_sheet(sheet)
  File "…/site-packages/pyexcel_io/readers/csvz.py", line 46, in read_sheet
    sheet = StringIO(content.decode("utf-8"))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
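
For anyone who wants to reproduce this without the attachment, here is a minimal sketch using only the standard library. It builds a comparable archive in memory, shows the same `decode("utf-8")` failure, and then decodes the content by sniffing the byte order mark first. The names `decode_with_bom`, `build_sample_archive` and `sheet.csv` are hypothetical and are not part of pyexcel-io; this is one possible workaround under the assumption that the file starts with a BOM, not a description of how the library should or does handle it.

```python
import codecs
import io
import zipfile

# Known byte order marks. The UTF-32 entries come first because the
# UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_with_bom(content: bytes, default: str = "utf-8") -> str:
    """Decode bytes using the encoding implied by a leading BOM, if any.

    Hypothetical helper: falls back to `default` when no BOM is present.
    """
    for bom, encoding in _BOMS:
        if content.startswith(bom):
            # These codecs consume the BOM, so it does not end up in the text.
            return content.decode(encoding)
    return content.decode(default)


def build_sample_archive() -> bytes:
    """Return a zip archive holding one UTF-32 (little-endian, with BOM) CSV."""
    payload = codecs.BOM_UTF32_LE + "a,b\r\n1,2\r\n".encode("utf-32-le")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("sheet.csv", payload)
    return buffer.getvalue()


with zipfile.ZipFile(io.BytesIO(build_sample_archive())) as archive:
    content = archive.read("sheet.csv")

# Same failure mode as the traceback: the first byte of the BOM is 0xff.
try:
    content.decode("utf-8")
except UnicodeDecodeError as error:
    print(error)

# Sniffing the BOM first recovers the text.
print(decode_with_bom(content).splitlines())
```

A file in a non-UTF-8 encoding that carries no BOM would still fall through to the default here, so an explicit encoding option would be needed to cover that case.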
