Issues with UTF-8 encoding and `defaults read` command

Given I run this command:

$ defaults write com.example.encoding room -string "Baño"

plutil shows that it is properly stored in UTF-8, and the character is correct:

$ plutil -convert xml1 ~/Library/Preferences/com.example.encoding.plist -o -
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>room</key>
	<string>Baño</string>
</dict>
</plist>

I can also get this value from PlistBuddy:

$ /usr/libexec/PlistBuddy -c "Print :room" ~/Library/Preferences/com.example.encoding.plist 
Baño

However, if I use defaults read, the value comes back garbled; the special character (ñ) is not returned correctly:

$ defaults read com.example.encoding room
Ba\361o

Is there some way to set the encoding that defaults read uses?

Accepted Reply

Is there some way to set the encoding that defaults read uses?

No. This isn’t really a text encoding issue (defaults does the right thing at that level) but rather an escaping issue. defaults read processes the incoming string as UTF-16 and then escapes each code unit as follows:

  1. For ASCII printables, there’s no escaping.

  2. If the value is below U+0100 it uses \ooo, where ooo is an octal sequence.

  3. Otherwise it prints \uhhhh, where hhhh is a hex sequence.

You can see the last point in action here:

% defaults write com.example.encoding room -string "Baño😀" 
% defaults read com.example.encoding room                  
Ba\361o\ud83d\ude00

The emoji here is U+1F600 GRINNING FACE, whose UTF-16 surrogate pair is d83d de00.
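
The three rules above can be sketched in Python to make the mapping concrete. This is only an illustration of the rules as described in this reply, not the actual implementation of defaults. The function name escape_like_defaults is made up for this example, and how the real tool treats backslashes or control characters is not covered here.

```python
def escape_like_defaults(text):
    """Apply the three escaping rules described above (illustrative only)."""
    # Work on UTF-16 code units, so characters outside the BMP
    # are seen as a surrogate pair, one code unit at a time.
    raw = text.encode("utf-16-be")
    out = []
    for i in range(0, len(raw), 2):
        unit = int.from_bytes(raw[i:i + 2], "big")
        if 0x20 <= unit <= 0x7E:
            out.append(chr(unit))         # rule 1: printable ASCII, no escaping
        elif unit < 0x100:
            out.append("\\%03o" % unit)   # rule 2: \ooo, three octal digits
        else:
            out.append("\\u%04x" % unit)  # rule 3: \uhhhh, four hex digits
    return "".join(out)


print(escape_like_defaults("Baño😀"))  # prints Ba\361o\ud83d\ude00
```

The ñ is U+00F1, which is 361 in octal, and the emoji contributes two \uhhhh escapes because it occupies two UTF-16 code units.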

Some more digging reveals that:

  • This encoding is what you get when you call data(using:allowLossyConversion:) with String.Encoding.nonLossyASCII.

  • There’s no way to override this in the defaults command.

If you’d like this to change, I encourage you to file an enhancement request against that command. Please post your bug number, just for the record.

There are a couple of ways you might work around this, but it kinda depends on your situation. Can you explain more about your intended workflow here?

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Replies

Thank you - that makes sense with what I am seeing.

The workflow is to be able to store/recover string values from within a system of shell scripts. Is there some shell command that would allow us to unescape the values?

Is there some shell command that would allow us to unescape the values?

Nothing that immediately springs to mind. It’s trivial to undo this encoding in code:

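import Foundation
let d = Data(#"Ba\361o\ud83d\ude00"#.utf8)  // raw string literal, so the backslashes stay literal
let sQ = String(bytes: d, encoding: .nonLossyASCII)  // optional: nil if the bytes can't be decoded
print(sQ ?? "decode failed")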

but I can’t think of a good way to do this from the command line.
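
For a shell script that can call Python 3, one possible approach is a small filter that reverses the escapes. The sketch below assumes the output only contains the \ooo and \uhhhh forms shown earlier in this thread. unescape_defaults is a hypothetical helper name, and other escapes (such as a doubled backslash) are not handled.

```python
import re

# Matches \ooo (three octal digits) or \uhhhh (four hex digits).
_ESCAPE = re.compile(r"\\(?:([0-7]{3})|u([0-9a-fA-F]{4}))")


def unescape_defaults(text):
    """Rebuild the original string from \\ooo and \\uhhhh escapes (sketch)."""
    raw = bytearray()  # accumulates big-endian UTF-16 code units
    pos = 0
    for match in _ESCAPE.finditer(text):
        # Copy the unescaped text between matches unchanged.
        raw += text[pos:match.start()].encode("utf-16-be")
        if match.group(1) is not None:
            unit = int(match.group(1), 8)
        else:
            unit = int(match.group(2), 16)
        raw += unit.to_bytes(2, "big")
        pos = match.end()
    raw += text[pos:].encode("utf-16-be")
    # Decoding as UTF-16 joins surrogate pairs back into single characters.
    return bytes(raw).decode("utf-16-be")


print(unescape_defaults(r"Ba\361o\ud83d\ude00"))  # prints Baño😀
```

Wrapped in a script that reads standard input, this could sit at the end of a pipeline after defaults read. An unpaired surrogate in the input raises UnicodeDecodeError rather than silently producing a damaged string.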

Another way to tackle this would be to read the value without the encoding. You can do that in code (by calling CFPreferencesCopyAppValue) but the tricky part is finding a developer tool that’s a) installed by default, and b) can easily call CF. One option is Python, taking advantage of PyObjC:

% python
…
>>> from CoreFoundation import *
>>> s = CFPreferencesCopyAppValue("room", "com.example.encoding")
>>> print(s)
Baño😀
>>> 

Of course you then have to ignore all the dire warnings about Python 2.x being deprecated (-:

ps Don’t forget to file that bug!

Another way to tackle this would be to read the value without the encoding.

Found it!

You can use the export command to extract the entire domain into a plist and then use PlistBuddy to extract the string:

% defaults export com.example.encoding tmp.plist  
% /usr/libexec/PlistBuddy -c "Print room" tmp.plist
Baño😀
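
The same export idea can be driven from Python's standard library, which avoids the temporary file. This is a sketch under a few assumptions: that defaults export accepts - to write the plist to standard output (as its man page describes), that python3 is available on the machine, and that the helper names read_default and value_from_plist, which are invented here, suit your scripts.

```python
import plistlib
import subprocess


def value_from_plist(data, key):
    """Return the value stored under key in a serialized property list."""
    # plistlib.loads accepts both XML and binary property lists.
    return plistlib.loads(data)[key]


def read_default(domain, key):
    """Export a defaults domain (macOS only) and look up one key in it."""
    # Assumption: "-" as the path makes defaults write the plist to stdout.
    exported = subprocess.run(
        ["defaults", "export", domain, "-"],
        check=True,
        capture_output=True,
    ).stdout
    return value_from_plist(exported, key)
```

With the domain from this thread, read_default("com.example.encoding", "room") would be expected to return the string without any escaping, because the value never passes through the defaults read formatter.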

It actually works without doing the export as well, if you just point PlistBuddy at the original plist in ~/Library. I listed PlistBuddy as an approach in my original post, and it will be sufficient for me as a workaround.

I hadn't logged a bug, since it seemed to be working as designed (using data(using:allowLossyConversion:)) and for my particular case, I need something that will work in older versions of the OS as well. I have now logged it as FB9160695. Thank you for the pointers and the guidance!

if you just point to the original plist in ~/Library

Don’t do that!

The backing store for user defaults is not something you want to rely on. On modern systems user defaults are actually managed by a system process (cfprefsd) and accessing preferences files behind its back will not end well. By using export you extract the domain’s preferences to a property list file that you know is a consistent snapshot of the domain and has the expected format.
