API to mark files as duplicate?
APFS supports copy-on-write and can internally store files as deltas of each other, which is great. A feature like this usually calls for two operations:

1. Cloning (reflinking), which is available at the file level via clonefile(2).
2. Marking two existing files as duplicates of each other so that they become clones, which is the subject of this question.

People are scratching their heads writing deduplication software that tries to emulate (2) using (1), but the implementation always comes out a bit sloppy. Sure, you replace the duplicate with a clone and copy the Unix attributes and everything else back onto the new "clone", and then you are reminded that there are also ACLs, extra forks, and all those extended attributes. This is inelegant and easy to get wrong. APFS needs to provide this as a primitive in the driver itself.

Linux is not exactly the place to look for good interface design, but what Btrfs provides (now generalized to other filesystems) is fairly good. Instead of operating on whole files, it gives a bit more control by addressing file data: a file is treated as one contiguous blob that you take offsets and lengths into. Here are the two operations from above again, with a bit more flavor:

- ioctl_ficlone(2): clone the data from one fd into another
- ioctl_ficlonerange(2): clone the data from one fd into another, but only the chunk requested by an offset-length pair
- ioctl_fideduperange(2): take two fds, two offsets, and a length, and tell the filesystem to let file1[off1...off1+len] and file2[off2...off2+len] share storage if they are identical

With these primitives, programmers would find deduplicating files much easier. Apple should seriously consider adding these interfaces to take advantage of what APFS is capable of.
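To make the dedupe primitive concrete, here is a minimal sketch of how a caller on Linux might issue a single FIDEDUPERANGE request from Python. It is an illustration under assumptions, not a reference implementation: the helper names (pack_dedupe_request, unpack_dedupe_result, dedupe_range) are hypothetical, the request number and struct layouts are transcribed by hand from linux/fs.h as they appear on common architectures such as x86-64 and arm64, and the ioctl only succeeds on a filesystem with reflink support (for example Btrfs or XFS). Nothing equivalent exists on macOS today, which is the point of this post.

```python
import fcntl
import struct

# Assumed value of FIDEDUPERANGE = _IOWR(0x94, 54, struct file_dedupe_range)
# on x86-64 / arm64. Other architectures may encode ioctl numbers differently.
FIDEDUPERANGE = 0xC0189436

# Per-destination status codes from linux/fs.h.
FILE_DEDUPE_RANGE_SAME = 0
FILE_DEDUPE_RANGE_DIFFERS = 1

# struct file_dedupe_range: src_offset, src_length, dest_count, reserved1, reserved2
HEADER_FMT = "=QQHHI"
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped, status, reserved
INFO_FMT = "=qQQiI"


def pack_dedupe_request(src_offset, length, dest_fd, dest_offset):
    """Build the ioctl argument for one source range and one destination."""
    header = struct.pack(HEADER_FMT, src_offset, length, 1, 0, 0)
    info = struct.pack(INFO_FMT, dest_fd, dest_offset, 0, 0, 0)
    # A bytearray is mutable, so the kernel's results can be read back from it.
    return bytearray(header + info)


def unpack_dedupe_result(buf):
    """Return (bytes_deduped, status) for the single destination entry."""
    _fd, _off, bytes_deduped, status, _rsvd = struct.unpack_from(
        INFO_FMT, buf, struct.calcsize(HEADER_FMT)
    )
    return bytes_deduped, status


def dedupe_range(src_fd, src_offset, length, dest_fd, dest_offset):
    """Ask the filesystem to share storage between two identical ranges.

    Raises OSError if the filesystem does not support the ioctl.
    A negative status is a per-destination errno reported by the kernel.
    """
    buf = pack_dedupe_request(src_offset, length, dest_fd, dest_offset)
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)  # buf is updated in place
    return unpack_dedupe_result(buf)
```

A caller would open the source for reading and the destination for writing, pass both descriptors with matching offsets and a length, and then check that the returned status is FILE_DEDUPE_RANGE_SAME. The useful property is that the kernel compares the two ranges itself and only shares storage when they are byte-identical, so the destination keeps its own metadata, ACLs, and extended attributes untouched.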
Jan ’20