Method to Sanitize a Single Path Component?

I'm trying to figure out if any methods in NSString, NSURL, or NSFileManager will sanitize a single path component so it can be appended to a fileURL which represents a directory.

The specific usage is a case where the user is saving multiple objects and is allowed to pick/create a directory, which will be populated with files named roughly corresponding to user-entered NSStrings. I'm already prepared to strip prohibited characters, normalize, and append a numbered suffix if some objects have the same sanitized name, but one big problem was forward slash ("/") translation. Is there a method which will interpret a string as a single path component and translate slash to colon so the user-presented filename will look the way the user expects? Some, like -[NSURL URLByAppendingPathComponent:], will accept a string representing multiple components separated by slashes, so no translation occurs.

There are other issues I'd like the method to handle, like stripping or refusing prohibited characters, Unicode normalization, etc., but they could be handled in other ways (e.g. -decomposedStringWithCanonicalMapping).

Accepted Reply

Is there a method which will interpret a string as a single path component and translate slash to colon so the user-presented filename will look the way the user expects?

No. In the original Foundation design this was quite tricky because it was designed around paths, and different file systems used different path separators. In modern code, however, you should use URLs to identify file system objects, and URLs always use "/" as the path separator. Thus it is, IMO, fine to do this job with string APIs.

Keep in mind that converting "/" to ":" is not the only convention here; it’s also conventional to convert ":" to "-". If this string really is coming from the user, it’s best to apply the second convention as they type, which is what the standard save panel does.
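A minimal Swift sketch of doing this job with plain string APIs, applied to a finished name rather than as the user types. The function name is hypothetical. It converts ":" to "-" before converting "/" to ":", since doing it the other way round would turn every translated slash into a hyphen. It also drops U+0000, but it doesn't try to handle other edge cases such as an empty name or the special components "." and "..".

```swift
import Foundation

/// Hypothetical helper: turns a user-entered name into a single on-disk path component.
func sanitizedPathComponent(_ userName: String) -> String {
    var result = String.UnicodeScalarView()
    for scalar in userName.unicodeScalars {
        switch scalar {
        case "\u{0}":
            // U+0000 can never appear in a file name, so drop it.
            continue
        case ":":
            // A user-visible colon has no on-disk equivalent; mirror the save panel's "-".
            result.append("-")
        case "/":
            // A user-visible slash is stored on disk as a colon (Finder shows it as "/").
            result.append(":")
        default:
            result.append(scalar)
        }
    }
    return String(result)
}

// Usage: the sanitized name is exactly one component, so appending it can't create
// subdirectories. "/tmp/Export" is a placeholder directory; nothing is written to disk.
let directory = URL(fileURLWithPath: "/tmp/Export", isDirectory: true)
let fileURL = directory.appendingPathComponent(sanitizedPathComponent("Q1/Q2: Report"))
print(fileURL.path)
```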

Finally, remember that you don’t need to do this in the reverse case. To display the name of a file system object, use the NSURLLocalizedNameKey property and it’ll take care of this and many other complexities.
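In Swift, NSURLLocalizedNameKey is spelled URLResourceKey.localizedNameKey and is read through URL.resourceValues(forKeys:). A minimal sketch; the function name and the fallback to lastPathComponent (for example, when the item doesn't exist) are assumptions, not part of the thread.

```swift
import Foundation

/// Hypothetical helper: the name to show the user for a file system object.
func displayName(for url: URL) -> String {
    // localizedName is the name as presented to users: it reverses the on-disk ":"
    // to user-visible "/" mapping, and may be localized or extension-hidden.
    if let name = try? url.resourceValues(forKeys: [.localizedNameKey]).localizedName {
        return name
    }
    // Assumed fallback: the raw last path component if the value isn't available.
    return url.lastPathComponent
}
```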

I'm already prepared to strip prohibited characters, normalize, …

What prohibited characters are you planning to strip? The only one you absolutely must strip is U+0000 NULL. The file system should handle pretty much anything else, but there are good reasons to strip other things (like anything that’s not printable). Doing that in a fully Unicode-savvy way could be kinda tricky.
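One way to strip non-printable characters without hand-rolled tables is to filter on each Unicode scalar's general category. This sketch (hypothetical function name) drops only category Cc, the control characters, which covers U+0000 as well as tabs and newlines. Deciding what else counts as "not printable" is where it gets tricky: dropping the format category (Cf), for example, would also remove the zero-width joiners that emoji sequences depend on.

```swift
/// Hypothetical helper: removes control characters (general category Cc), including U+0000.
func strippingControlCharacters(_ name: String) -> String {
    var result = String.UnicodeScalarView()
    for scalar in name.unicodeScalars where scalar.properties.generalCategory != .control {
        result.append(scalar)
    }
    return String(result)
}
```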

On the normalisation front, you shouldn’t need to normalise the name. The general rule here is to ignore normalisation on the way ‘down’ and then handle it properly on the way ‘up’. You have to handle weird normalisations coming up from the file system anyway, and in that case there’s no point normalising on the way down.
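If you do need to compare names on the way "up" (say, to pick the numbered suffix mentioned in the question), note that Swift's String equality already uses canonical equivalence, so a precomposed "é" matches "e" plus a combining acute accent without an explicit normalisation step. A minimal sketch; the function name and the " 2", " 3" suffix format are hypothetical, and it doesn't account for case, so it won't catch "report" and "Report" colliding on a case-insensitive volume.

```swift
/// Hypothetical helper: returns `proposed`, or `proposed` plus " 2", " 3", and so on,
/// if that name is taken. `existingNames` would typically be read back from the directory.
func uniqueName(for proposed: String, existingNames: [String]) -> String {
    var candidate = proposed
    var counter = 2
    // Array.contains uses String ==, which compares by canonical equivalence.
    while existingNames.contains(candidate) {
        candidate = "\(proposed) \(counter)"
        counter += 1
    }
    return candidate
}
```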

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

Replies


It's interesting you mention NSSavePanel because it looks like it could be used as a (very expensive) sanitizer. Setting -directoryURL and -nameFieldStringValue will compose URLs with the slash/colon translation, but it looks like colon/dash is only done on keystroke; colons set in the field programmatically are translated back to slashes on display. At the very least I could use it in unit tests for my sanitizer.

For compatibility with other filesystems I've been using the following snippet to strip prohibited characters:

NSUInteger b[] = {0x03ff700100000000, 0x07fffffe87fffffe};
return [NSCharacterSet characterSetWithBitmapRepresentation:[NSData dataWithBytesNoCopy:b length:sizeof(b) freeWhenDone:false]];

NSSavePanel … could be used as a (very expensive) sanitizer.

Well that’s… creative (-:

Apropos creative solutions, it’d be perfectly reasonable for you to file an enhancement request for an API that does this high-level task. Please post your bug number, just for the record.

For compatibility with other filesystems I've been using the following snippet to strip prohibited characters:

I tend to avoid NSCharacterSet because it has known issues with non-BMP characters. However, I presume you’re excluding all non-BMP characters anyway, so it won’t cause you any problems in practice.

Of more concern is the fact that you’re using a multibyte type (NSUInteger) to store a byte-oriented data structure (the contents of an NSData). I suspect this will fail if we ever switch back to big endian. It’d be better to do this:
uint8_t b[] = {0x00, 0x00, 0x00, 0x00, 0x01, 0x70, 0xff, 0x03, … };
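The byte-oriented version can be sketched in Swift as a [UInt8] array. These sixteen bytes are what the two 64-bit constants in the snippet above occupy in memory on a little-endian Mac, lowest address first, so they describe the same set: space, comma, hyphen, period, the digits, the ASCII letters, and underscore. The membership test follows the bit numbering of NSCharacterSet's bitmap representation (character n is bit n & 7 of byte n >> 3) but checks the bytes directly rather than building a character set. All names here are hypothetical.

```swift
/// Hypothetical allowlist bitmap covering U+0000 through U+007F, one bit per character.
let allowedBitmap: [UInt8] = [
    0x00, 0x00, 0x00, 0x00, 0x01, 0x70, 0xff, 0x03,  // U+0000-U+003F: space , - . 0-9
    0xfe, 0xff, 0xff, 0x87, 0xfe, 0xff, 0xff, 0x07,  // U+0040-U+007F: A-Z _ a-z
]

/// Character n is allowed if bit (n & 7) of byte (n >> 3) is set.
func isAllowed(_ scalar: Unicode.Scalar) -> Bool {
    let n = Int(scalar.value)
    guard n < allowedBitmap.count * 8 else { return false }  // everything past U+007F is excluded
    return allowedBitmap[n >> 3] & (UInt8(1) << (n & 7)) != 0
}

/// Drops every scalar that isn't in the allowlist.
func strippingDisallowed(_ name: String) -> String {
    var kept = String.UnicodeScalarView()
    for scalar in name.unicodeScalars where isAllowed(scalar) {
        kept.append(scalar)
    }
    return String(kept)
}
```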

Oh, another reason to avoid NSUInteger is that it’s pointer sized, so if your code ever finds itself on a 32-bit platform [1], it’ll fail to compile.
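If you'd rather keep the two 64-bit constants as written, another option (an alternative to the uint8_t array, not something suggested in the thread) is to use a fixed-width type and extract the bytes arithmetically, least significant byte first. Because it uses shifts rather than reinterpreting memory, it yields the same bytes regardless of the host's endianness or pointer size. The function name is hypothetical.

```swift
/// Hypothetical helper: serialises 64-bit words into bytes, least significant byte first.
func littleEndianBytes(_ words: [UInt64]) -> [UInt8] {
    var bytes: [UInt8] = []
    for word in words {
        for i in 0..<8 {
            // Shift the wanted byte into the low 8 bits, then keep only those bits.
            bytes.append(UInt8(truncatingIfNeeded: word >> (8 * i)))
        }
    }
    return bytes
}

let bitmap = littleEndianBytes([0x03ff700100000000, 0x07fffffe87fffffe])
```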


[1] 32-bit platforms are few and far between these days, at least in the Apple ecosystem, but they do still exist (old watches).