Storage Size of a String

I'm pushing and pulling a lot of data from a SQLite database using my own library and performing data analysis/modeling of the objects in memory.


I would like to estimate how much memory my internal data objects are currently using, on an object-by-object or group-of-objects basis, within the program itself. It looks like there is no simple "ruler" to give me the size of an instantiated object in bytes. I am building my own and I can easily estimate the Int/Double array storage requirements (using MemoryLayout<>.stride for these types combined into structures).
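The numeric part of that estimate can be sketched roughly as follows. This is a minimal sketch, not an established API: `estimatedElementBytes` and `Sample` are hypothetical names, and the figure covers element storage only, not the array's own heap header or any unused reserved capacity.

```swift
// Hypothetical record type, used only for illustration.
struct Sample {
    var id: Int
    var value: Double
}

// Estimate the bytes occupied by an array's elements: the element stride
// (size plus alignment padding) times the element count. This ignores the
// array's heap header and any spare capacity.
func estimatedElementBytes<T>(_ array: [T]) -> Int {
    return MemoryLayout<T>.stride * array.count
}

let doubles = [Double](repeating: 0.0, count: 1_000)
let samples = [Sample](repeating: Sample(id: 0, value: 0.0), count: 1_000)

print(estimatedElementBytes(doubles))  // 8000 (a Double has a stride of 8 bytes)
print(estimatedElementBytes(samples))  // platform-dependent struct stride * 1000
```

Using stride rather than size accounts for the padding between consecutive elements, which is what matters for contiguous array storage.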


I can't find a definition for the internal representation of the character data in a Swift String. The NSString documentation is very clear and states UTF-16 encoding, so I am multiplying the length by 2 to get the 8-bit bytes required for internal storage (it doesn't need to be perfect, just an estimate).


Does a Swift String use the same internal storage model as NSString? Or is it using UTF-32? Is there a more robust way to determine the storage requirements of an individual String? And for an array of Strings? It would be awesome if there was a function like storageSizeInBytes([String Array]) or storageSizeInBytes(MasterDataStorageObject).

Accepted Reply

Have you reviewed the discussion of swift strings here?


https://www.mikeash.com/pyblog/friday-qa-2015-11-06-why-is-swifts-string-api-so-hard.html


Also note that if your motivation, at least in part, stems from reliance on a well defined ABI, that it appears Swift's has been backburner'd w/3.0, so...


Swift is just now starting to mature w/3.0, so it may help to keep that in mind, at least for the next year, I think.


Good luck in any case.


Ken

Replies

Swift strings may use a variety of coding models. Assuming the database string is stored as UTF-8, then you can get a count from swiftString.utf8.count. Alternatively, you can get a UTF-16 count or a UTF-32 count, if you need those, depending on how you actually store the string data.
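As a rough illustration of those counts, the sketch below measures one string in each encoding. The sample text and variable names are made up for the example, and the numbers describe the size of each encoded form, not whatever String happens to hold internally.

```swift
// "naïve café", written with escapes so the scalars are unambiguous.
let text = "na\u{EF}ve caf\u{E9}"

// Number of code units in each Unicode encoding of the same string.
let utf8Units = text.utf8.count
let utf16Units = text.utf16.count
let scalarCount = text.unicodeScalars.count  // one UTF-32 code unit per scalar

// Bytes needed to hold the string in each encoding.
let utf8Bytes = utf8Units * MemoryLayout<UInt8>.size
let utf16Bytes = utf16Units * MemoryLayout<UInt16>.size
let utf32Bytes = scalarCount * MemoryLayout<UInt32>.size

print(utf8Bytes)   // 12
print(utf16Bytes)  // 20
print(utf32Bytes)  // 40
```

For pure ASCII text the three figures are in a 1:2:4 ratio; the two accented letters here take two bytes each in UTF-8, which is why the UTF-8 size is 12 rather than 10.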

I understand that Swift can give me back a String in a variety of formats; however, I want to know what the internal memory model is so I can calculate the RAM that is taken. I may have a database of ASCII data that is stored as UTF-8 (that is, one byte per character). If there is 100 MB worth of string data, then the internal NSString memory storage for that string data would be 200 MB because of its UTF-16 encoding (or double that again for UTF-32). The storage requirements using my own custom C character arrays or Foundation's NSString are very clear. The memory storage for Swift is not.


Again, I just want to know the internal memory encoding for the character data that makes up a String in Swift. If Swift varies the INTERNAL storage model, fine - but how do I tell what model it is using? And if it varies the internal model, can I force it to use the model I would like? The difference in internal memory requirements between UTF-8 and UTF-32 is significant for large data sets.


I can't seem to find a detailed Specification of the Swift String type that would allow me to understand this.

The storage requirements using … Foundation's NSString are very clear …

That’s not true. You’re drawing too many conclusions from NSString’s simple external API. Internally NSString is much more complex:

  • It will use 8-bit encodings in some common scenarios

  • Certain common strings are held in a global table, meaning that extra copies don’t take any memory

  • On 64-bit systems, small string values can be stored entirely in the string pointer, using a mechanism known as tagged pointers

Swift’s String implementation is at least as complex as this, because a Swift String can just hold a reference to an NSString.

Beyond that, String does not make any guarantees about how its data is stored. If you want to look behind the abstraction layer, you can take a look at the Swift standard library source code. However, I suspect you’re approaching this from the wrong angle. If you’re dealing with this much string data, it seems to me that any standard string implementation is going to add overhead that you don’t need, and that you could create something completely custom that would offer significant benefits. As a simple example, you could memory map the file and then create NSStrings using -initWithBytesNoCopy:length:encoding:freeWhenDone:; then you won’t be using any memory for string data.

btw I presume you’re working on iOS here. On macOS I would just ignore this issue, at least until you’ve got a prototype up and running and can actually profile things. I suspect that any modern Mac will handle a few hundred meg of string data without even breaking into a sweat (-:

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

Just spent 10 min typing a reply that I could not post because of "The message contains invalid characters" - of course, it would not highlight which characters were invalid 😟


I'll make the request very simple: Can I get the size of the actual internal memory used to store a Swift string in bytes? If Swift is storing "apple" as UTF-8, UTF-16 or UTF-32, or just as a pointer to some master string directory entry for the word apple - great. But are you telling me I am not allowed to know how much memory it is actually using? To me this seems like a very basic and reasonable question to ask. I just want a summary of how much RAM I am using.

The answer to your question is no, for the reasons Quinn already said.


If you are genuinely concerned about "filling up" memory with large strings, then you'd be better off to store them in a format you know (e.g. as UTF-8 data in a Data instance) and discard the original string. In a sense, "String" is the wrong type for the data you're representing: you need a concrete representation type rather than an abstract representation type.
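A minimal sketch of that idea follows, assuming a plain [UInt8] buffer in place of a Foundation Data instance so the example needs only the standard library. `UTF8Blob` is a hypothetical type invented for illustration, not something Swift provides.

```swift
// Hypothetical container that keeps text as raw UTF-8 bytes, so the payload
// size is known exactly.
struct UTF8Blob {
    private(set) var bytes: [UInt8]

    init(_ string: String) {
        // Copy out the UTF-8 code units; the original String can then be dropped.
        bytes = Array(string.utf8)
    }

    // Exact number of payload bytes held (excludes the array's own header).
    var byteCount: Int { return bytes.count }

    // Rebuild a String only when String functionality is actually needed.
    var string: String { return String(decoding: bytes, as: UTF8.self) }
}

let blobs = ["apple", "banana", "cherry"].map { UTF8Blob($0) }
let totalBytes = blobs.reduce(0) { $0 + $1.byteCount }

print(totalBytes)                    // 17
print(blobs[0].string.uppercased())  // APPLE
```

The trade-off is the one raised in this thread: the byte count is exact, but every String operation pays for a decode step.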


You've still got another problem, though. Even if you know the total storage requirements of all your strings, how are you going to decide when you have too much data? In macOS and iOS there isn't a straightforward measure of how much memory is "available" to your app.


In other words, you might be trying to calculate an inscrutable percentage of an inscrutable number. Your assumptions here are going to need to be so conservative that it's hard to see how the result will help you.

Thank you for the response.


I'm not going to retype my original response because I don't know which characters are invalid. I do everything you said in C right now. I want to use the String functionality that Swift provides in some of my analysis routines - I lose that by converting to data (or by spending so much time going back and forth).


I can make some guesses on the memory requirements for the Strings and go from there (just like I would for the memory requirements of my macOS program).


Stepping back from just this example, I do find the ambiguity of how a basic data type is stored puzzling, and it goes beyond my initial question. As a C programmer you are accustomed to managing memory yourself. I understand that I don’t need to worry about that, and that Strings are stored in a complex manner for either memory efficiency or speed efficiency. But what is wrong with trying to gain an understanding of this process?


So many forum posts have ended hitting a brick wall with this size question (and hence my post) - because there is literally a brick wall between you and what is going on behind the scenes with the storage of strings. You can yell over the wall and ask it to throw buckets of UTF-8, UTF-16, or UTF-32 over. But a window was not placed in the wall to see how it was being stored on the other side. Or yelling over the wall and asking the most basic question of any data type - how much storage is in use - is met with a “you don’t need to know”, “not in my job description”, "impossible to tell! (that is, without a little extra work on my part)" or “crazy question - use some other programming method if you want to know basic information” coming back. 😐

It's not so much "you don't need to know", but that there are no simple answers. A String has developed into something sophisticated, with multiple representations, and the questions really do get hard to answer.


For example, Swift Strings are value types that have copy-on-write behavior (at least in some representations). That means that if it took X bytes to store the string originally, it still only takes X bytes to store 2 copies of the string, until you mutate one of them. So, how much memory do the two copies use, X bytes or 2X bytes? There's really no answer to the question, because the truth depends on when you ask as much as what you ask. Similarly, because of autorelease pools, objects may consume memory when they are non-existent from your code's point of view. Similarly, when objects are swapped out of virtual memory, they occupy address space, but do they get regarded as occupying RAM too?
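The copy-on-write effect can be made visible with a small sketch. It uses an Array of Int rather than a String, because an array's element buffer address is easy to inspect through public API; that substitution is an assumption made for illustration, and `bufferAddress` is a hypothetical helper. The addresses are compared for identity only and are never dereferenced outside the closure.

```swift
// Returns the address of an array's element buffer, for identity comparison only.
func bufferAddress(_ array: [Int]) -> UnsafePointer<Int>? {
    return array.withUnsafeBufferPointer { $0.baseAddress }
}

let original = [Int](repeating: 7, count: 1_000)
var copy = original  // no element data is duplicated here

// Both values still refer to the same underlying buffer.
let sharedBeforeMutation = bufferAddress(original) == bufferAddress(copy)

copy[0] = 8  // the first mutation forces `copy` to get its own buffer

let sharedAfterMutation = bufferAddress(original) == bufferAddress(copy)

print(sharedBeforeMutation)
print(sharedAfterMutation)
```

With the current standard library this should report shared storage before the mutation and separate storage after it, so the memory attributable to `copy` changes at the moment of the write.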


That's why I said that if you need to make the storage size of your string a functional aspect of your code, then you probably need to control the representation (UTF-8, etc) as well.