string character codes

How do I check what the character codes are that compose a given string?

Accepted Reply

"Character code" is not a simple concept. Some people treat each byte of the UTF-8 representation as a "character code", while others mean each UTF-16 code unit.


Here is how to get each Unicode code point from a String, but keep in mind that this may not be the value you expect as a "character code".

import UIKit
var str = "Hello, playground"
str.unicodeScalars.forEach {
    print(String(format: "U+%04X", $0.value))
}
/*↓
U+0048
U+0065
U+006C
U+006C
U+006F
U+002C
U+0020
U+0070
U+006C
U+0061
U+0079
U+0067
U+0072
U+006F
U+0075
U+006E
U+0064
*/


str = "I live in \u{1F1EF}\u{1F1F5}" // This site does not display emoji well; "\u{1F1EF}\u{1F1F5}" represents a single emoji character (the flag of Japan).
str.unicodeScalars.forEach {
    print(String(format: "U+%04X", $0.value))
}
/*↓
U+0049
U+0020
U+006C
U+0069
U+0076
U+0065
U+0020
U+0069
U+006E
U+0020
U+1F1EF
U+1F1F5
*/
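The output above shows two code points for what a reader sees as one flag. As a rough sketch of that difference (assuming a recent Swift toolchain where `Character.unicodeScalars` is available; the helper `scalarValues(perCharacterOf:)` is a hypothetical name of mine, not a standard library API), you can group the scalar values by Character:

```swift
// Hypothetical helper: lists the scalar values that make up each Character.
func scalarValues(perCharacterOf text: String) -> [(character: Character, values: [UInt32])] {
    return text.map { character in
        (character: character, values: character.unicodeScalars.map { $0.value })
    }
}

let flag = "\u{1F1EF}\u{1F1F5}"
print(flag.count)                 // 1 (one Character, i.e. one grapheme cluster)
print(flag.unicodeScalars.count)  // 2
print(flag.utf16.count)           // 4
print(flag.utf8.count)            // 8

for entry in scalarValues(perCharacterOf: "in \u{1F1EF}\u{1F1F5}") {
    let codes = entry.values.map { "U+" + String($0, radix: 16, uppercase: true) }
    print(codes.joined(separator: " "))
}
// U+69
// U+6E
// U+20
// U+1F1EF U+1F1F5
```

So a single Character may not have a single "code" at all, which is one more reason the question needs a precise definition.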


Or, if you want an Array of the Unicode scalar values (each a UInt32), you can write something like this:

let unicodeScalarArray = str.unicodeScalars.map {$0.value}
print(unicodeScalarArray) //->[73, 32, 108, 105, 118, 101, 32, 105, 110, 32, 127471, 127477]
print(String(unicodeScalarArray[0], radix: 16)) //->49 ("49" is the hexadecimal representation of 73 (='I'))
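If you ever need to go the other way, here is a minimal sketch of rebuilding a String from such an array. It assumes the values are the ones printed above; `Unicode.Scalar(_:)` taking a UInt32 is a failable initializer, so invalid values (such as surrogate code points) are simply skipped in this sketch.

```swift
// Scalar values taken from the printed array above.
let values: [UInt32] = [73, 32, 108, 105, 118, 101, 32, 105, 110, 32, 127471, 127477]

var rebuiltScalars = String.UnicodeScalarView()
for value in values {
    // Unicode.Scalar(UInt32) returns nil for values that are not valid scalars.
    if let scalar = Unicode.Scalar(value) {
        rebuiltScalars.append(scalar)
    }
}
let rebuilt = String(rebuiltScalars)

print(rebuilt == "I live in \u{1F1EF}\u{1F1F5}")  // true
print(Unicode.Scalar(UInt32(0xD800)) == nil)      // true (a surrogate is not a scalar)
```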


In any case, you first need to clarify what you mean by "character code".
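To make the ambiguity concrete, here is a small sketch comparing the three candidate meanings for the same String. The sample string and the helper `hex(_:)` are my own choices for illustration, not anything from the question.

```swift
// "é" (U+00E9) followed by an emoji outside the Basic Multilingual Plane (U+1F600).
let sample = "\u{E9}\u{1F600}"

let utf8Bytes = Array(sample.utf8)                    // [UInt8]
let utf16Units = Array(sample.utf16)                  // [UInt16]
let scalars = sample.unicodeScalars.map { $0.value }  // [UInt32]

// Hypothetical helper: formats integers as space-separated uppercase hex.
func hex<T: BinaryInteger>(_ values: [T]) -> String {
    return values.map { String($0, radix: 16, uppercase: true) }.joined(separator: " ")
}

print(hex(utf8Bytes))   // C3 A9 F0 9F 98 80
print(hex(utf16Units))  // E9 D83D DE00
print(hex(scalars))     // E9 1F600
```

The same two characters give six, three, or two "codes" depending on which view you pick.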

Replies

str.unicodeScalars.forEach shows up as a method in code completion. How would I be able to figure out what you showed me from Xcode's code completion feature?

By relying on Xcode's suggestions alone, it's hard to find the feature that fits your purpose.


Read as much documentation as you can.

For example: String

There you can find explanations of the Unicode Scalar View, UTF-16 View and UTF-8 View.

Read and try.

How would I be able to figure out what you showed me from Xcode's code completion feature?

You might find both of the following helpful:

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"