Zip vs Tar
Take note from this issue
Having dealt extensively with both the TAR and ZIP formats, I have concluded that both are terrible formats and consistent support for them is awful. However, in terms of better world-wide support, I vote for TAR.
Here is my assessment of the advantages and disadvantages of each:
- ZIP has its heritage in Windows and better supports Windowisms.
- TAR has its heritage in Unix and better supports Unixisms.
- ZIP was designed to be written in a random-access manner, but can be written in a streaming manner.
- ZIP must be read in a random-access manner, but some readers incorrectly assume you can read in a streaming manner.
- TAR is written in a streaming manner.
- TAR is read in a streaming manner.
- ZIP allows random-access reading between files.
- ZIP does not allow random-access reading within a file (if compression is used).
- TAR does not allow random-access reading between files.
- TAR does not allow random-access reading within a file.
- ZIP has one primary format which is well-specified, but attempts to be extension friendly with its "extra" fields, which has ironically led to a huge number of variants (too many to mention). Many variants conflict with each other, but nothing prevents you from placing multiple conflicting "extra" fields together. The specifications for these extensions are not always easy to find.
- TAR has 3 main competing formats (USTAR, PAX, and GNU). USTAR is entirely a subset of PAX, so there are really only two competing formats. The two most common tools, GNU tar and BSD tar, have strong support for both. The PAX format is standardized, and the GNU format is well-documented.
- ZIP has issues with character encoding, making exact representation of filenames difficult (especially when it comes to foreign languages). Support for the UTF-8 flag is fairly poor.
- TAR has better support for character encodings. The USTAR format is always ASCII, PAX format is always UTF-8, but unfortunately GNU format is specified as "local variant of 8-bit ASCII".
- ZIP supports symlinks via certain "extra" header extensions, but I strongly discourage relying on them, since they are not widely compatible.
- TAR supports symlinks.
- TAR and ZIP can both support file sizes up to 16EiB (roughly 18.4EB).
- ZIP limits path names to 64KiB.
- TAR supports unlimited path names (via GNU or PAX formats).
- ZIP has DEFLATE compression built-in, but wide support for other compression algorithms is poor.
- TAR has no compression of its own. However, it is very common to compress the whole archive with GZIP, BZIP2, XZ, or (upcoming) ZSTD. GZIP and ZSTD are well-specified. BZIP2 and XZ are "specified" only by their reference implementations.
- ZIP compresses on a per-file basis, while a TAR archive is usually compressed as a whole. Thus, compressed TAR archives tend to be smaller. Since these Go source-code archives usually contain many small files, compressed TAR can gain a decent size reduction over compressed ZIP.
- ZIP has poor support for Unix permissions (via the various competing Unix "extra" fields).
- TAR has good support for Unix permissions.
- ZIP has built-in CRC protection for the data.
- TAR has no CRC protection for the data.
- ZIP has poor support for accurate timestamps (the original format stored the local date and time at 2s resolution, without the timezone). Various "extra" fields store timestamps as seconds since the Unix epoch.
- TAR has good support for accurate timestamps.
- ZIP has no support for sparse files.
- TAR has some support for sparse files.
- The main advantage of ZIP is random access between files, but I'm not sure that feature is a deal breaker. It is possible to scan through a TAR archive once and build an index that provides random access between files and within a file.
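
To make the TAR index idea concrete, here is a rough sketch in Python using the standard `tarfile` module. It is only one way this might be done, under some assumptions: the archive is uncompressed and seekable, members are regular (non-sparse) files, and the function names `build_tar_index`, `read_range`, and `make_demo_tar` are made up for illustration. It also relies on the `offset_data` attribute that CPython's `TarInfo` objects carry, which is not prominently documented.

```python
import io
import tarfile


def build_tar_index(fileobj):
    """Scan an uncompressed TAR stream once and map name -> (data offset, size)."""
    index = {}
    # Mode "r:" means "read, and refuse any compression".
    with tarfile.open(fileobj=fileobj, mode="r:") as archive:
        for member in archive:
            if member.isfile():
                # offset_data is the byte position where the member's data starts.
                index[member.name] = (member.offset_data, member.size)
    return index


def read_range(fileobj, index, name, start=0, length=None):
    """Read part of one member directly, without walking the archive again."""
    offset, size = index[name]
    start = min(start, size)
    if length is None or start + length > size:
        length = size - start  # clamp the read to the end of the member
    fileobj.seek(offset + start)
    return fileobj.read(length)


def make_demo_tar():
    """Build a tiny in-memory archive to try the index on (demo data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as archive:
        for name, payload in (("a.txt", b"hello world"), ("dir/b.txt", b"0123456789")):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    tar_bytes = make_demo_tar()
    tar_index = build_tar_index(tar_bytes)
    print(read_range(tar_bytes, tar_index, "dir/b.txt", start=3, length=4))
```

Note that this only covers the uncompressed case. Once the whole archive is wrapped in GZIP or similar, the index would also need seek points into the compressed stream, which this sketch does not attempt.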
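
The per-file versus whole-archive compression point can be checked with a small experiment. The following is a sketch, not a benchmark: the file contents are synthetic stand-ins for a source tree with many small, similar files, and the helper names (`make_files`, `zip_size`, `tar_gz_size`) are hypothetical. Real archives will give different ratios.

```python
import io
import tarfile
import zipfile


def make_files(count=200):
    """Generate many small, similar text files (synthetic demo data)."""
    return {
        f"pkg/file{i:03d}.go": (
            f"package pkg\n\n// Func{i} returns its argument plus {i}.\n"
            f"func Func{i}(x int) int {{\n\treturn x + {i}\n}}\n"
        ).encode()
        for i in range(count)
    }


def zip_size(files):
    """Size of a ZIP archive where each file is DEFLATE-compressed on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return len(buf.getvalue())


def tar_gz_size(files):
    """Size of a TAR archive compressed as a single GZIP stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


if __name__ == "__main__":
    demo_files = make_files()
    print("zip:   ", zip_size(demo_files), "bytes")
    print("tar.gz:", tar_gz_size(demo_files), "bytes")
```

With inputs like these, the TAR+GZIP result should come out smaller, because the compressor sees redundancy across files and across the TAR headers, while ZIP restarts compression for every file and stores its per-file headers uncompressed.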