I very much love the performance of AppleArchive and how approachable it is, and believe it to be one of the most underrated frameworks in the SDK. In a scenario quite typical, I need to compress files and submit them to a back end, where the server handling the files is not an Apple platform. Obviously, individual files compressed with AA will not be compatible with other systems out of the box, but there are compatible compression algorithms.
ZLIB is recommended for cases where cross-platform compatibility is necessary. As I understand it, AA adds additional headers to files in order to support preservation of file attributes, ownership and other data. Following the steps outlined in the docs, I've written code to compress single files. I can easily compress and decompress using AA without issue.
To create a proof-of-concept, I've written some code in python using its zlib
module. In order to get to the compressed data, it's necessary to handle the AA header fields. The first 64 bytes of a compressed file appear as follows:
AA documentation states that ZLIB Level 5 compression is used, and comes in the form of raw DEFLATE data prefixed with two header bytes. In this case, these bytes are 78 5e
, which begin at the 28th byte and appear as x^
above. My hope was that seeking to the start of the compressed data, then passing what remains to a decompressor object initialized with the correct WBITS
would work.
It works fantastically for files 1MB or less in size. Files which are larger only decompress the first megabyte. The decompressor object is reaching EOF, and I've tried various ways of attempting to seek to and concatenate the other blocks, but to no avail.
Using the older Compression framework and the method specified here, with the same algorithm, yields different results. I can decompress files of any size using python's zlib
module. My assumption is that AppleArchive is doing something differently in order to support its multithreading capabilities, perhaps even with asymmetric encoding where the blocks are not ordered.
Is there a solution to this problem? If not, why would one ever use ZLIB
versus the much more efficient LZFSE
? I could use the older Compression API, but it is significantly slower compressing synchronously, and performance is critical with the application I am adding this feature to.