How to query APFS file name size limit? (NAME_MAX is wrong)

In APFS, the file name size limit seems to be 255 UTF-8 characters. This is what Wikipedia says (https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits), and it matches my own tests. However, I can't find a concrete way to query that limit from a C POSIX program, or even official documentation stating that "APFS file names are limited to 255 UTF-8 characters", since the Wikipedia article doesn't actually cite its source.

There is a syslimits.h, which defines NAME_MAX as follows:

#define NAME_MAX                  255   /* max bytes in a file name */

This is extremely misleading (it's basically wrong), since APFS names are UTF-8, which presumably means up to 4 * 255 = 1020 bytes. Is there an official API or compile-time constant we can refer to? Manually computing NAME_MAX * 4 is far from ideal, especially in a cross-platform program that just uses NAME_MAX as a byte count on, say, Linux. The current naive declaration char file_name[NAME_MAX+1] will overflow when a user has non-ASCII file names. (Edit: Obviously you should program defensively and check your buffer sizes regardless of whether NAME_MAX is correct.)
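Following the defensive-programming point in the edit above, here is a minimal sketch of copying a name into a fixed-size buffer so that an oversized name is detected instead of overflowing. It uses only standard snprintf; the helper name copy_name_checked is hypothetical, and treating any truncation as an error is a design choice assumed here (a truncated name could also end mid-way through a UTF-8 sequence).

```c
#include <stdio.h>
#include <string.h>

/* Copy a file name into a caller-supplied buffer without overflowing it.
 * Returns 0 on success, -1 if the name plus its NUL terminator does not fit
 * (dst then holds a truncated copy that the caller should not use). */
int copy_name_checked(char *dst, size_t dst_size, const char *name)
{
    /* snprintf returns the length the full output would have had, so a
     * value >= dst_size means the copy was truncated. */
    int n = snprintf(dst, dst_size, "%s", name);
    if (n < 0 || (size_t)n >= dst_size) {
        return -1;
    }
    return 0;
}
```

With char file_name[NAME_MAX+1], a 255-character name made of 4-byte characters would now be reported as -1 rather than written past the end of the array.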

FWIW seems like HFS+ is 255 UTF-16 characters, as specified here: https://developer.apple.com/library/archive/technotes/tn/tn1150.html

I still can't find the official docs for APFS, though. Experimentation suggests that current macOS behaves the same way: 255 UTF-16 characters, but I don't know whether that's a limitation of the tool or of a system API. Characters like 'a' (1 byte in UTF-8) and '不' (3 bytes in UTF-8) count as 1 character each, while characters outside the Basic Multilingual Plane, like 🚧, count as 2 characters (so you can only have 127 of them plus one other character). I still can't find where that is stated in the documentation, since the APFS docs say that all file names are UTF-8. I guess I could just be safe and always allocate 255 * 4 bytes, assuming that a full 4-byte character could count as only 1 character. I would love to find official docs I could look up, though.
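If the limit really is 255 UTF-16 code units, as the experiment above suggests (an inference from testing, not documented behavior), a program can estimate a name's cost by counting how many UTF-16 units its UTF-8 bytes would need. A minimal sketch, assuming the input is valid UTF-8; utf16_units_for_utf8 is a hypothetical helper:

```c
#include <stddef.h>

/* Count the UTF-16 code units needed for a NUL-terminated UTF-8 string.
 * Assumes valid UTF-8; no validation is done. A lead byte >= 0xF0 starts a
 * 4-byte sequence (code point >= U+10000), which needs a surrogate pair
 * (2 units) in UTF-16. Every other code point needs 1 unit. */
size_t utf16_units_for_utf8(const char *s)
{
    size_t units = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) == 0x80) {
            continue; /* continuation byte (10xxxxxx): already counted */
        }
        units += (*p >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

For "a不🚧" this counts 1 + 1 + 2 = 4 units, while the UTF-8 byte length is 1 + 3 + 4 = 8, which is why a byte-based NAME_MAX and a UTF-16-based limit disagree.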

One of the references of the Wikipedia article is the Apple File System Reference. It's pretty technical, but as far as I can tell, the name of an APFS volume is stored in 256 bytes, while the name of a file is a null-terminated string, and elsewhere the length of that string is stored as a 10-bit unsigned number.

Thanks! I did see that document and the 10-bit length, but somehow it didn't register. I guess that's as good a hard limit as we'll get: a 10-bit length can't exceed 1023 bytes, which is enough for 255 4-byte characters (1020 bytes) plus the null terminator.

I guess it's still unclear to me why macOS counts wide 4-byte characters as two characters toward the file name length limit, but I wonder if that's for backwards compatibility, so that you can't create files whose names wouldn't fit on HFS+. I'll probably just be safe and reserve 1024 bytes.

By the way, experimentally, Finder won't let me set a file name with more than 255 (ascii) characters on an APFS volume. But maybe that's just Finder, not the file system.

I’ll probably just be safe and reserve 1024 bytes.

So this is where I get confused. What API are you calling that requires you to reserve space this way? Most of our APIs don't work like that. For example, the readdir man page specifically notes that readdir_r is deprecated because of this issue, and that you should use readdir instead.
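To illustrate the point about readdir, the sketch below reads a directory with plain readdir() and copies each d_name with strdup(), so the code never needs a maximum name length. list_dir_names is a hypothetical helper, and treating a readdir() error like end-of-directory is a simplification made for brevity.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() under strict C modes */
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Collect every entry name in `path` into a heap-allocated array of
 * heap-allocated strings. Each name is copied at whatever length readdir()
 * returns, so no fixed-size name buffer is involved.
 * Returns NULL on open or allocation failure; on success *out_count holds
 * the number of names, and the caller frees each name and then the array.
 * A readdir() error is treated like end-of-directory here. */
char **list_dir_names(const char *path, size_t *out_count)
{
    size_t count = 0, cap = 16;
    char **names = malloc(cap * sizeof *names);
    if (names == NULL) {
        return NULL;
    }
    DIR *dir = opendir(path);
    if (dir == NULL) {
        free(names);
        return NULL;
    }
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (count == cap) { /* grow the array geometrically */
            char **grown = realloc(names, 2 * cap * sizeof *names);
            if (grown == NULL) {
                goto fail;
            }
            names = grown;
            cap *= 2;
        }
        names[count] = strdup(ent->d_name);
        if (names[count] == NULL) {
            goto fail;
        }
        count++;
    }
    closedir(dir);
    *out_count = count;
    return names;

fail: /* release everything copied so far */
    while (count > 0) {
        free(names[--count]);
    }
    free(names);
    closedir(dir);
    return NULL;
}
```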

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

@eskimo I'm debugging some code in a cross-platform C app that is doing some wildcard substitution and manually allocating enough space to hold file names when concatenating file names together. Admittedly this is probably not the best way to do it and ideally we do dynamic allocation only after knowing how long of a name we have, and the code also needs some bounds checking to make sure it won't crash. But basically in this situation, we need to know the max file name size in order to do a static allocation.

From reading the glibc manual at least (Apple doesn't use glibc), it seems that these limits are essentially advisory and not guaranteed to be respected: https://www.gnu.org/software/libc/manual/html_node/Limits-for-Files.html.

Anyway, the code I was looking at actually does call readdir(); it's only the later concatenation that causes problems. If you run man dirent or look at /usr/include/dirent.h, the docs do say that a file name can take up to 1024 bytes (though there isn't a real constant you can use), the same as the maximum path size, so that's probably what I'll use for now, although the code really should be refactored so that it doesn't rely on such constants.
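Since there is no named constant, one option (an assumption, not something recommended in this thread) is to take the size from struct dirent itself instead of NAME_MAX. On current macOS this should report the 1024 bytes mentioned above, and on glibc it is 256; POSIX leaves d_name's size unspecified, so treat this as a platform detail rather than a file system guarantee. dirent_name_capacity is a hypothetical helper:

```c
#include <dirent.h>
#include <stddef.h>

/* Declared size of struct dirent's d_name array on this platform.
 * sizeof does not evaluate its operand, so the null pointer is never
 * dereferenced. This describes the struct, not any volume's name limit. */
size_t dirent_name_capacity(void)
{
    return sizeof(((struct dirent *)0)->d_name);
}
```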

Edit: Just to add. Another reason why we are interested in the max file name is that we want to be able to append a postfix to the file name. So imagine we have a file some_long_name.txt, and we want to save a backup version called some_long_name.txt.backup. It's useful to be able to make the name as long as possible without truncation, but if the constant is wrong we could end up making too long of a name which fails the system call.
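For the backup-name case, building the new name in a buffer sized from the actual inputs avoids any maximum-length constant; whether the result fits the target volume is then decided by the file system when the file is created. A minimal sketch with a hypothetical helper make_suffixed_name:

```c
#include <stdlib.h>
#include <string.h>

/* Build "<name><suffix>" in a freshly malloc'd buffer sized from the inputs.
 * Returns NULL if allocation fails; the caller frees the result. */
char *make_suffixed_name(const char *name, const char *suffix)
{
    size_t a = strlen(name), b = strlen(suffix);
    char *out = malloc(a + b + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, name, a);
    memcpy(out + a, suffix, b + 1); /* b + 1 also copies the NUL terminator */
    return out;
}
```

Calling make_suffixed_name("some_long_name.txt", ".backup") yields "some_long_name.txt.backup"; if the volume rejects that name as too long, the create call reports it.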

Admittedly this is probably not the best way to do it

Indeed. You want the system to provide you an easy-to-use ‘maximum name length’ constant but the system’s position is that it won’t provide such a value because best practice is to deal with arbitrary name lengths.

Another reason why we are interested in the max file name is that we want to be able to append a postfix to the file name … but if the constant is wrong we could end up making too long of a name which fails the system call.

Again, this isn’t best practice. As you’re aware, different volume formats have different name length limits, which means the system can’t express this concept as a constant. If you’re extending a name like this, you have to do it in the context of a specific volume.

You can get volume-specific values here — see _PC_NAME_MAX and _PC_PATH_MAX in the pathconf man page — but I think you’ll find that these are also unsatisfactory. There’s a fundamental issue in play, which is that the text encoding used by the BSD APIs, UTF-8, doesn’t line up with the text encoding used by the volume format. So these APIs could reasonably return a value that is the maximum length of a name that the file system can hold, but that doesn’t guarantee that you can create every possible name of that length.
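A minimal sketch of the per-volume query mentioned above, using pathconf() with _PC_NAME_MAX. name_max_for is a hypothetical wrapper, and mapping “no limit” to 0 is an arbitrary choice made here. As the post explains, on macOS the returned number does not guarantee that every UTF-8 name of that length can actually be created.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Ask the volume containing `dir` for its maximum name length.
 * Returns the limit, 0 if pathconf() reports no limit, or -1 on error.
 * pathconf() returns -1 without changing errno when there is no limit,
 * so errno is cleared first to tell the two cases apart. */
long name_max_for(const char *dir)
{
    errno = 0;
    long v = pathconf(dir, _PC_NAME_MAX);
    if (v == -1) {
        return errno == 0 ? 0 : -1;
    }
    return v;
}
```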

This is an extension of the name validation problem. Every volume format has its own constraints as to what represents a valid name and it’s hard to export those constraints in a way that’s useful to clients. Imagine an API that’s able to express the limits of:

  • APFS, with 255 Unicode characters, except for slash

  • HFS Plus, with 255 UTF-16 elements, except for colon [1]

  • HFS, with 31 bytes of some retro Mac text encoding which is set by the user per volume, again except for colon

  • MS-DOS/FAT, with an 8.3 name using a very limited set of characters

That’d be one hella complex API, so complex that it probably wouldn’t be useful to clients. Clients would simply fall back to the “let’s create the file and see if it fails” algorithm, which is what the Finder does, btw, and that’s how we got to where we are (-:
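The “create the file and see if it fails” approach can be sketched as an exclusive create that reports errno, leaving the reaction (for example, shortening the name and retrying) to the caller. try_create_exclusive is hypothetical; ENAMETOOLONG and EEXIST are standard errno values, but which other errors a volume returns for a name it dislikes varies by format.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Attempt an exclusive create of `path` and report the outcome.
 * Returns 0 if the file was created (and closed), otherwise the errno:
 * ENAMETOOLONG if this volume rejected the name's length, EEXIST if the
 * name is taken, or some other format-specific rejection. */
int try_create_exclusive(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1) {
        return errno;
    }
    close(fd);
    return 0;
}
```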

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] When we introduced Mac OS X, we flipped the colon and slash limitation twice:

  • On the volume, we disallowed colon.

  • At the BSD level, we flipped this to disallow slash.

  • At higher levels, we flipped it again to disallow colon.

Right. Thanks a lot for the additional pointers! I didn't know about pathconf. But you are right, this does sound like a pain. It's a good thing to have in my back pocket, but for this particular issue (fixing a problem in a popular open-source tool) I will aim for a more conservative approach (allocate at least enough memory and be conservative with the file name length when creating a derived file name).

But yeah, I think even that doesn't cover all the complexities of file name validation, since it's such a surprisingly complicated topic. HFS+ automatically normalizes file names, while APFS preserves the original UTF-8 code points without normalizing them (which I remember caused some teeth gnashing).

It does seem a little unsatisfactory to me because the "create a file and see if it fails" strategy is a reactive one and prevents you from doing certain things (e.g. properly picking the correct file name length, or pre-processing to safely escape file name characters) but given the complexity of the file name validation rules it may not be something that's easy to expose (not to mention that file name truncation is itself something to worry about if you don't want to cut off in the middle of a multi-byte-UTF-8 character).
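To avoid cutting a name in the middle of a multi-byte UTF-8 sequence when shortening it, one approach is to back up from the byte limit past any continuation bytes (10xxxxxx). A sketch assuming valid UTF-8 input; utf8_truncate is a hypothetical helper, and it does not keep combining marks together with their base character.

```c
#include <stddef.h>
#include <string.h>

/* Truncate a NUL-terminated UTF-8 string in place to at most max_bytes
 * bytes (excluding the NUL) without splitting a multi-byte sequence.
 * Assumes valid UTF-8. Only code-point boundaries are respected. */
void utf8_truncate(char *s, size_t max_bytes)
{
    size_t len = strlen(s);
    if (len <= max_bytes) {
        return;
    }
    size_t cut = max_bytes;
    /* If s[cut] is a continuation byte, cutting there would leave a partial
     * sequence before it, so move the cut back to the sequence's lead byte. */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80) {
        cut--;
    }
    s[cut] = '\0';
}
```

For "a不" (bytes 61 E4 B8 8D), a 3-byte limit yields "a" rather than "a" followed by a dangling partial sequence.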

which I remember caused some teeth gnashing

Oh, the teeth gnashing is ongoing )-: For example, this.

It does seem a little unsatisfactory to me because the "create a file and see if it fails" strategy is a reactive one

Right. I’m used to this from the networking world. I regularly see folks try to preflight network connections, and that never ends well.

not to mention that file name truncation is itself something to worry about if you don't want to cut off in the middle of a multi-byte-UTF-8 character

Oh, it’s worse than that. Splitting UTF-8 is bad, but so is splitting a character from its combining accents. In Swift, the element type of a string, Character, represents a “single extended grapheme cluster”. But I love the way the documentation then goes on to say that it “approximates a user-perceived character”.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
