Cannot parse zip file containing 65535 files, or with a central directory offset of 0xffffffff, if not in Zip64 format #108

@AxbB36

Create ffff.zip containing 65535 files as follows:

$ seq 1 65535 | while read n; do touch -d '2019-05-01 00:00:00 UTC' $(printf %04x $n); done
$ TZ=UTC zip -X ffff.zip $(seq 1 65535 | while read n; do printf "%04x\n" $n; done)
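For anyone without Info-ZIP at hand, a comparable archive can probably be built directly in Node. The following is a minimal sketch, not the method used above: `buildEmptyEntryZip` is a hypothetical helper, the entries are empty and stored (so CRC-32 and sizes are all zero), and the result will not be byte-identical to what `zip -X` produces. It has not been checked against yauzl or UnZip here.

```javascript
// DOS-format date for 2019-05-01; the time field is left at 00:00:00.
const DOS_DATE = ((2019 - 1980) << 9) | (5 << 5) | 1;

// Hypothetical helper: build a non-Zip64 archive of `count` empty, stored entries
// named with 4-digit hex numbers, like the shell commands above.
function buildEmptyEntryZip(count) {
    if (count > 0xffff)
        throw new RangeError("count does not fit the 16-bit entry count field");
    const locals = [];
    const centrals = [];
    let offset = 0;
    for (let n = 1; n <= count; n++) {
        const name = Buffer.from(n.toString(16).padStart(4, "0"), "ascii");

        const local = Buffer.alloc(30);
        local.writeUInt32LE(0x04034b50, 0);   // local file header signature
        local.writeUInt16LE(10, 4);           // version needed to extract (1.0)
        local.writeUInt16LE(DOS_DATE, 12);    // last mod file date
        local.writeUInt16LE(name.length, 26); // file name length
        // Method (stored), CRC-32 and both sizes stay 0 for an empty file.

        const central = Buffer.alloc(46);
        central.writeUInt32LE(0x02014b50, 0);   // central file header signature
        central.writeUInt16LE(10, 6);           // version needed to extract (1.0)
        central.writeUInt16LE(DOS_DATE, 14);    // last mod file date
        central.writeUInt16LE(name.length, 28); // file name length
        central.writeUInt32LE(offset, 42);      // offset of the local header

        locals.push(local, name);
        centrals.push(central, name);
        offset += local.length + name.length;
    }
    const centralSize = centrals.reduce((sum, b) => sum + b.length, 0);

    const eocd = Buffer.alloc(22);
    eocd.writeUInt32LE(0x06054b50, 0);   // end of central directory signature
    eocd.writeUInt16LE(count, 8);        // entries on this disk
    eocd.writeUInt16LE(count, 10);       // total entries
    eocd.writeUInt32LE(centralSize, 12); // size of the central directory
    eocd.writeUInt32LE(offset, 16);      // central directory follows the local headers
    return Buffer.concat([...locals, ...centrals, eocd]);
}

const archive = buildEmptyEntryZip(0xffff);
const countField = archive.readUInt16LE(archive.length - 12);
console.log(`${archive.length} bytes, entry count field 0x${countField.toString(16)}`);
```

Writing the buffer out with `require("fs").writeFileSync("ffff.zip", archive)` should give a file whose end of central directory record has the same 0xffff entry count and no Zip64 records.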

UnZip 6.0 can parse it:

$ unzip -l ffff.zip | tail -n 3
        0  2019-05-01 00:00   ffff
---------                     -------
        0                     65535 files

But this yauzl program cannot:

let yauzl = require("yauzl");
yauzl.open(process.argv[2], {lazyEntries: true}, (err, zipfile) => {
    if (err)
        throw err;
    zipfile.on("entry", entry => {
        zipfile.openReadStream(entry, (err, r) => {
            if (err)
                throw err;
            let n = 0;
            r.on("data", chunk => n += chunk.length);
            r.on("end", () => {
                console.log(`${n}\t${entry.fileName}`);
                zipfile.readEntry();
            });
        });
    });
    zipfile.readEntry();
});

The error message is:

$ node ziplist.js ffff.zip
ziplist.js:4
        throw err;
        ^

Error: invalid zip64 end of central directory locator signature
    at node_modules/yauzl/index.js:154:27
    at node_modules/yauzl/index.js:631:5
    at node_modules/fd-slicer/index.js:32:7
    at FSReqWrap.wrapper [as oncomplete] (fs.js:658:17)

yauzl interprets an entryCount of 0xffff (or a centralDirectoryOffset of 0xffffffff) to mean that a Zip64 end of central directory locator must be present:

yauzl/index.js, lines 140 to 142 in 02a5ca6:

if (!(entryCount === 0xffff || centralDirectoryOffset === 0xffffffff)) {
  return callback(null, new ZipFile(reader, centralDirectoryOffset, totalSize, entryCount, comment, options.autoClose, options.lazyEntries, decodeStrings, options.validateEntrySizes, options.strictFileNames));
}

APPNOTE.TXT seems to say that the implication goes the other way: instead of 0xffff ⇒ Zip64, it is Zip64 ⇒ 0xffff; i.e., a value of 0xffff does not necessarily imply that Zip64 information must be present.

4.4.1.4 If one of the fields in the end of central directory record is too small to hold required data, the field SHOULD be set to -1 (0xFFFF or 0xFFFFFFFF) and the ZIP64 format record SHOULD be created.
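To make the ambiguity concrete, here is a minimal sketch that reads the relevant fields out of an end of central directory record held in a Buffer. The helper names (`findEocd`, `readEocdFields`) and the `mayBeZip64` flag are made up for illustration and are not part of yauzl; the field offsets follow the record layout in APPNOTE.TXT.

```javascript
const EOCD_SIGNATURE = 0x06054b50; // end of central directory record
const EOCD_SIZE = 22;              // fixed part, without the archive comment

// Hypothetical helper: scan backwards for an EOCD record whose comment
// length field accounts exactly for the remaining bytes of the file.
function findEocd(buf) {
    const lowest = Math.max(0, buf.length - EOCD_SIZE - 0xffff);
    for (let i = buf.length - EOCD_SIZE; i >= lowest; i--) {
        if (buf.readUInt32LE(i) === EOCD_SIGNATURE &&
            i + EOCD_SIZE + buf.readUInt16LE(i + 20) === buf.length)
            return i;
    }
    return -1;
}

// Hypothetical helper: extract the fields that can carry a Zip64 sentinel.
function readEocdFields(buf) {
    const at = findEocd(buf);
    if (at < 0)
        throw new Error("end of central directory record not found");
    const entryCount = buf.readUInt16LE(at + 10);
    const centralDirectorySize = buf.readUInt32LE(at + 12);
    const centralDirectoryOffset = buf.readUInt32LE(at + 16);
    // A maxed-out field MAY mean "look in the Zip64 record", but 65535
    // entries is also a legal count for an archive with no Zip64 data.
    const mayBeZip64 = entryCount === 0xffff ||
        centralDirectorySize === 0xffffffff ||
        centralDirectoryOffset === 0xffffffff;
    return {at, entryCount, centralDirectorySize, centralDirectoryOffset, mayBeZip64};
}

// A bare EOCD record claiming 65535 entries, with nothing in front of it.
const demo = Buffer.alloc(EOCD_SIZE);
demo.writeUInt32LE(EOCD_SIGNATURE, 0);
demo.writeUInt16LE(0xffff, 8);  // entries on this disk
demo.writeUInt16LE(0xffff, 10); // total entries
const fields = readEocdFields(demo);
console.log(fields);
```

In this sketch `mayBeZip64` is only a hint: the record alone cannot tell a genuine 65535-entry archive from one that defers to Zip64, so the locator still has to be looked for before concluding anything.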

How some other implementations handle it

UnZip searches for a zip64 end of central directory locator unconditionally (whether or not there is a 0xffff or 0xffffffff), and does not error if the locator is not found. process.c:find_ecrec:

    /* Next: Check for existence of Zip64 end-of-cent-dir locator
       ECLOC64. This structure must reside on the same volume as the
       classic ECREC, at exactly (ECLOC64_SIZE+4) bytes in front
       of the ECREC.
       The ECLOC64 structure directs to the longer ECREC64 structure
       A ECREC64 will ALWAYS exist for a proper Zip64 archive, as
       the "Version Needed To Extract" field is required to be set
       to 4.5 or higher whenever any Zip64 features are used anywhere
       in the archive, so just check for that to see if this is a
       Zip64 archive.
     */
    result = find_ecrec64(__G__ searchlen+76);
        /* 76 bytes for zip64ec & zip64 locator */
    if (result != PK_COOL) {
        if (error_in_archive < result)
            error_in_archive = result;
        return error_in_archive;
    }

process.c:find_ecrec64:

    if (memcmp((char *)byterecL, end_centloc64_sig, 4) ) {
      /* not found */
      return PK_COOL;
    }

Python zipfile also searches for a zip64 end of central directory locator unconditionally, and does not error if it does not find the expected signature:
https://github.com/python/cpython/blob/v3.7.0/Lib/zipfile.py#L258-L259
https://github.com/python/cpython/blob/v3.7.0/Lib/zipfile.py#L282-L284
https://github.com/python/cpython/blob/v3.7.0/Lib/zipfile.py#L197-L202

    data = fpin.read(sizeEndCentDir64Locator)
    if len(data) != sizeEndCentDir64Locator:
        return endrec
    sig, diskno, reloff, disks = struct.unpack(structEndArchive64Locator, data)
    if sig != stringEndArchive64Locator:
        return endrec

Go archive/zip searches for a zip64 end of central directory locator only if entryCount is 0xffff, or centralDirectoryOffset is 0xffffffff, or the central directory size is at its sentinel value (the code quoted below compares the size against 0xffff rather than 0xffffffff). It does not error if the locator is not found.
https://github.com/golang/go/blob/go1.12.4/src/archive/zip/reader.go#L502-L511

	// These values mean that the file can be a zip64 file
	if d.directoryRecords == 0xffff || d.directorySize == 0xffff || d.directoryOffset == 0xffffffff {
		p, err := findDirectory64End(r, directoryEndOffset)
		if err == nil && p >= 0 {
			err = readDirectory64End(r, p, d)
		}
		if err != nil {
			return nil, err
		}
	}
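The two strategies above could be sketched in JavaScript roughly as follows. This is an illustration under stated assumptions, not a patch for yauzl: `resolveDirectory` and its `onlyIfMaxed` option are hypothetical names, the archive is assumed to be fully loaded in a Buffer, and the caller is assumed to already know the offset of the classic end of central directory record. With `onlyIfMaxed` false it probes for the locator unconditionally, loosely like UnZip and Python; with it true it probes only when a field is maxed out, loosely like Go. Either way, a missing locator falls back to the classic record instead of raising an error.

```javascript
const ZIP64_LOCATOR_SIGNATURE = 0x07064b50; // zip64 end of central directory locator
const ZIP64_EOCD_SIGNATURE = 0x06064b50;    // zip64 end of central directory record
const ZIP64_LOCATOR_SIZE = 20;
const ZIP64_EOCD_SIZE = 56;                 // fixed part of the record

// Hypothetical helper: pick the entry count and central directory offset,
// preferring Zip64 values only when a locator is actually present.
function resolveDirectory(buf, eocdOffset, {onlyIfMaxed = false} = {}) {
    const entryCount = buf.readUInt16LE(eocdOffset + 10);
    const centralDirectoryOffset = buf.readUInt32LE(eocdOffset + 16);
    const plain = {entryCount, centralDirectoryOffset, zip64: false};

    const maxed = entryCount === 0xffff || centralDirectoryOffset === 0xffffffff;
    if (onlyIfMaxed && !maxed)
        return plain;

    // The locator, if any, sits immediately in front of the classic record.
    const locatorOffset = eocdOffset - ZIP64_LOCATOR_SIZE;
    if (locatorOffset < 0 ||
        buf.readUInt32LE(locatorOffset) !== ZIP64_LOCATOR_SIGNATURE)
        return plain; // no locator: trust the classic record as written

    const zip64Offset = Number(buf.readBigUInt64LE(locatorOffset + 8));
    if (zip64Offset + ZIP64_EOCD_SIZE > buf.length ||
        buf.readUInt32LE(zip64Offset) !== ZIP64_EOCD_SIGNATURE)
        throw new Error("invalid zip64 end of central directory record signature");

    return {
        entryCount: Number(buf.readBigUInt64LE(zip64Offset + 32)),
        centralDirectoryOffset: Number(buf.readBigUInt64LE(zip64Offset + 48)),
        zip64: true,
    };
}

// Case 1: classic record with 65535 entries and no Zip64 structures.
const plainEocd = Buffer.alloc(22);
plainEocd.writeUInt32LE(0x06054b50, 0);
plainEocd.writeUInt16LE(0xffff, 10);
const plainResult = resolveDirectory(plainEocd, 0);

// Case 2: zip64 record (56 bytes) + locator (20 bytes) + classic record (22 bytes).
const zip64Archive = Buffer.alloc(56 + 20 + 22);
zip64Archive.writeUInt32LE(ZIP64_EOCD_SIGNATURE, 0);
zip64Archive.writeBigUInt64LE(70000n, 32);   // total number of entries
zip64Archive.writeUInt32LE(ZIP64_LOCATOR_SIGNATURE, 56);
zip64Archive.writeBigUInt64LE(0n, 56 + 8);   // offset of the zip64 record
zip64Archive.writeUInt32LE(0x06054b50, 76);
zip64Archive.writeUInt16LE(0xffff, 76 + 10); // sentinel in the classic record
const zip64Result = resolveDirectory(zip64Archive, 76);

console.log(plainResult, zip64Result);
```

The design point is that the error is reserved for a locator that exists but points at garbage; the mere absence of a locator is treated as "not a Zip64 archive".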
