Serializing generic structs

What is the best approach to archive/serialize generic structs?


I cannot use NSCoding (since structs are not NSObjects). I can translate my structs to some other representation (such as JSON strings), but I'm struggling to unarchive/deserialize the structs back from the JSON if the structs are generic.

More precisely, I'm wrestling with the type system to allow me to express this in an elegant, maintainable way.

Answered by QuinceyMorris in 18725022

Well, it's a bit voluminous, and getting the generic creation function to compile was a bit strange, but this is what I ended up with:


protocol SerializableType {
  static var serializableName: String { get }
  var serializableProperties: [String: Any] { get }
  init (serializedProperties: [String: Any])
}

extension Int: SerializableType {
  static var serializableName: String { return "Int" }
  var serializableProperties: [String: Any] {
  return ["value": self]
  }
  init (serializedProperties: [String: Any]) {
  self = serializedProperties ["value"] as! Int
  }
}

extension String: SerializableType {
  static var serializableName: String { return "String" }
  var serializableProperties: [String: Any] {
  return ["value": self]
  }
  init (serializedProperties: [String: Any]) {
  self = serializedProperties ["value"] as! String
  }
}

struct A: SerializableType {
  static var serializableName: String { return "A" }
  var serializableProperties: [String: Any] {
  return ["a1": a1, "a2": a2]
  }
  init (a1: Int, a2: String) {
  self.a1 = a1
  self.a2 = a2
  }
  init (serializedProperties: [String: Any]) {
  a1 = serializedProperties ["a1"] as! Int
  a2 = serializedProperties ["a2"] as! String
  }
  var a1: Int
  var a2: String
}

struct B<T: SerializableType>: SerializableType {
  static var serializableName: String { return "B<\(T.serializableName)>" }
  var serializableProperties: [String: Any] {
  return ["b1": b1, "b2": b2, "b3": b3 as Any]
  }
  init (b1: Int, b2: String, b3: T) {
  self.b1 = b1
  self.b2 = b2
  self.b3 = b3
  }
  init (serializedProperties: [String: Any]) {
  b1 = serializedProperties ["b1"] as! Int
  b2 = serializedProperties ["b2"] as! String
  b3 = serializedProperties ["b3"] as! T
  }
  var b1: Int
  var b2: String
  var b3: T
}

func serialize (instance: SerializableType) -> [String: Any] {
  return ["type": instance.dynamicType.serializableName, "values": instance.serializableProperties]
}

func deserialize (serialization: [String: Any]) -> SerializableType {

  let type = serialization ["type"] as! String
  let typeName: String = …
  let typeParameter: String? = …
  let values = serialization ["values"] as! [String: Any]

  if let typeParameter = typeParameter {
    // Generic case: dispatch on the type parameter first.
    switch typeParameter {
    case "A":
      return deserializedValueWithParameter (A.self, typeName: typeName, values: values)
    case "Int":
      return deserializedValueWithParameter (Int.self, typeName: typeName, values: values)
    case "String":
      return deserializedValueWithParameter (String.self, typeName: typeName, values: values)
    default:
      fatalError ()
    }
  }
  else {
    // Non-generic case: dispatch on the type name directly.
    switch typeName {
    case "A":
      return A (serializedProperties: values)
    case "Int":
      return Int (serializedProperties: values)
    case "String":
      return String (serializedProperties: values)
    default:
      // A switch over String values must be exhaustive.
      fatalError ()
    }
  }
}

func deserializedValueWithParameter<T: SerializableType> (parameterType: T.Type, typeName: String, values: [String: Any]) -> SerializableType {

  switch typeName {
  case "B":
       return B<T> (serializedProperties: values)
  default:
       fatalError ()
  }
}

let b = B (b1: 0, b2: "X", b3: 1)
let bb = serialize (b)
let bbb = deserialize (bb)


Note:

- The code for serializing and deserializing actually needs to be recursive, but for reasons of space I didn't write that code here.

- I've already written all of the non-generic implementation of this as real code in a project I'm working on. In the real code, known types like Int and String are special-cased, rather than handled like the above.

- I didn't write the code to analyze the "B<A>" string into its components, so the last 'bbb = …' line crashes, but the rest of it works in a playground.

- I didn't write any error handling, I just threw in "as!" everywhere.

- I didn't really serialize anything, just converted to a plist-style dictionary that can be serialized easily into JSON, or a plist, or an NSKeyedArchiver archive, according to preference.

- But the number of cases for generics is additive, not multiplicative, so this approach looks a lot better if the number of serializable types is bigger.

- Sorry, the indentation got messed up when I pasted.
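
The post leaves out the step that analyzes a string such as "B<A>" into its components. The following is only a minimal sketch of one way that step might look, written in current Swift syntax rather than the syntax used in the post. The helper name splitTypeString is hypothetical and not from the original answer, and it assumes at most one type parameter, written as Name<Parameter>.

```swift
/// Hypothetical helper (not from the original post): splits a serialized
/// type string such as "B<Int>" into a base name and an optional parameter.
/// Assumes at most one type parameter, written as Name<Parameter>.
func splitTypeString(_ type: String) -> (typeName: String, typeParameter: String?) {
    // No "<...>" suffix: treat the whole string as a non-generic name like "A".
    guard let open = type.firstIndex(of: "<"), type.last == ">" else {
        return (type, nil)
    }
    let name = String(type[..<open])
    // The parameter is everything between the first "<" and the final ">".
    let parameterStart = type.index(after: open)
    let parameterEnd = type.index(before: type.endIndex)
    let parameter = String(type[parameterStart..<parameterEnd])
    return (name, parameter)
}

let parsed = splitTypeString("B<Int>")
print(parsed.typeName, parsed.typeParameter ?? "none")
```

Because the parameter is returned as a string, a nested name such as "B<B<Int>>" yields the parameter "B<Int>", which could be fed back into the same function if the deserializer were made recursive.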

Sigh, because, as it says in the documentation for the NSCoding protocol:

The NSCoding protocol declares the two methods that a class must implement so that instances of that class can be encoded and decoded.


It says class there, not struct, for a couple of reasons that have to do with the basics of concepts such as static and dynamic typing, value and reference types etc. (which I have tried to explain here over and over again).


What you are doing is essentially just wrapping reference types within value types (structs and enums), which I don't really see the point in doing (yes, it's done in the implementation of Swift's Array etc., I know). But the OP wants the type checking and other good things about value types, all of which go away as soon as Any, AnyObject etc. enter the picture, which they do if the value types are just thin wrappers for dynamic stuff.


Ok, now I will leave the thread : )

The basic problem is that you cannot dynamically create a generic. Generics are compile time constructs. You cannot know at compile time what might come from an archived object.

You would need to call a factory function that returns a T without knowing what that T is.

consider:


func serializedGeneric<T>(dictRep:NSDictionary) -> Serializable<T> {


How does one call that without knowing what T is at compile time? I tried, and aside from the solution being messy, this is the point where it falls apart. You can only get a dynamic type from a factory, even if you *can* encode the generic. Here's as far as I got, perhaps someone has a clever solution:


// Playground - noun: a place where people can play
import Foundation
protocol SerializableType {
    var typename:String {get}

    func string() -> String
    static func fromString(stringRep:String) -> Self
}
extension Int : SerializableType {
    var typename:String { get { return "int" } }

    func string() ->  String {
        return String(self)
    }

    static func fromString(stringRep:String) -> Int {
        return stringRep.toInt()!
    }
}
protocol SerializableStruct {
    func asDictionary() -> NSDictionary
}
struct A<T: SerializableType> : SerializableStruct {
    var value:T
    var name = "bob"
    init(value:T) {
        self.value = value
    }

    init?(dict:NSDictionary)  {
        value = T.fromString(dict["value"] as! String)
        name = dict["name"] as! String
    }

    func asDictionary() -> NSDictionary {
        return ["typename":value.typename, "value": value.string(), "name":name]
    }
}
func serializableFactory(dict:NSDictionary) -> SerializableStruct? {

    switch dict["typename"] as! String {
    case "int":
        return A<Int>(dict: dict)
    default:
        return nil
    }
}
let a = A<Int>(value:1)
let dictRep = a.asDictionary()
let deser = serializableFactory(dictRep)!

I think this wouldn't work because with this approach, you need to provide a concrete type for the parameter of FooStruct when calling its decoding init? method. However, consider the scenario when you want to decode an array, and you do not know if it was an Array<String> or Array<Int> when encoded.

Therefore, you cannot specify whether you want to call Array<Int>(coder: ) or Array<String>(coder:) beforehand.


In your case, you'd need to call FooStruct<WhatHere?>(coder), but you don't actually know what to put as the type parameter to FooStruct, which means you have no means of calling the correct FooStruct type initializer.


Some extra information must be written somewhere in the archive about the type of the array, and then it must be somehow used in a systematic way to allow generics decoding. That's the puzzle.

I think you just hit the nail on the head with the first clear explanation of the underlying (hard) problem.


EDIT: I'm inclined to mark your solution as the correct reply because I have a suspicion that this might actually be the best possible solution, or at least pretty close to it. The bad part is of course the switch in the serializableFactory function, but you already know that. You'd need to add another case to that switch every time you add a new serializable struct. It's huge progress from the heavily asymmetric solutions previously suggested in this thread, however.


Another bad part is having separate protocols for SerializableType and SerializableStruct, although it might be possible to unify these somehow.


If no better answer appears in a week or so, I'll mark your reply as the correct answer. Thanks Monicker!

Thank you Monicker for clearly saying (yet again) what I have been trying to say for ever and ever here : )


Monicker wrote

"You would need to call a factory function that returns a T without knowing what that T is."


Which also applies to non generic structs, so the (hard (impossible)) problem is not any easier for non-generic structs (since they are of course compile time constructs too). Perhaps this can be easier to see by simply replacing "T" with "certain struct type" in the above, so that it reads:


"You would need to call a factory function that returns a certain struct type without knowing what that certain struct type is."


MikeA: Think carefully about what it says there. It is impossible. Right?

If you do understand this now: Please re-read our (sometimes a bit harsh) conversation in this thread with your new insights fresh in memory, and maybe then any hard feelings might go away, even though the hard problem won't go away.
: )

(returning to the top level because the text boxes are getting too narrow)


I think you're all wrong! You're confusing two different problems, one major and one minor. (Generic structs are the minor one.)


The major problem is that any serialization scheme, to be useful, must encode the type of each object (by which I mean an instance of a class, struct or enum) in the archive. Not only does Swift not have a way of giving something for a type that's readily archivable, it doesn't provide you with a way of getting back, at deserialization time, arbitrary types from their serialized representations. That means you're forced to translate serialized types into type literals via a great big switch statement (or equivalent). That also means that your archiving system can only represent a finite number of the infinite number of possible Swift types, and the ones it knows must be known explicitly.


Incidentally, there's another complication here that you haven't considered. In general, a class name in Swift isn't unique, it's only unique within a module. In general, therefore, there's a danger of ambiguity when encoding objects of types in different modules. You could make these unique by encoding the module name, but there's no guarantee that the unarchiving code has the various classes in modules of the original names. IOW, your archived type identifiers must be globally and permanently unique in the archiving system, and therefore would need to be assigned manually as a design step.
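
To make the module point concrete, here is a small sketch in current Swift. It contrasts the unqualified and module-qualified names the standard library can produce for a type with a manually assigned identifier. The protocol ArchiveIdentifiable, the struct Invoice, and the identifier string are all hypothetical names chosen for illustration.

```swift
// Hypothetical protocol: each archivable type carries a manually assigned,
// permanent identifier instead of relying on its (module-relative) name.
protocol ArchiveIdentifiable {
    static var archiveID: String { get }
}

struct Invoice: ArchiveIdentifiable {
    // Chosen by hand as a design step; it stays the same even if the type
    // is renamed or moved to another module.
    static let archiveID = "com.example.invoice.v1"
}

// String(describing:) gives the unqualified name, which is only unique
// within a module.
print(String(describing: Int.self))   // Int
// String(reflecting:) adds the module name, which changes if the type moves.
print(String(reflecting: Int.self))   // Swift.Int
// The manual identifier is independent of both.
print(Invoice.archiveID)
```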


None of this is any harder or easier if some of the types involved are generic. If the archivable type representation is a string, for example, so that struct X's type is represented as "X", then struct S<T> can be represented as "S.T". It's not really different from the non-generic case.


The only drawback here is the minor problem that this tends to multiply the number of cases known explicitly to the deserializer. If there are #S possibilities for S, and #T for T, then there are #S * #T cases, which can be a lot. More so if you have S<T,U>.


However, it seems straightforward to me to handle the parametrization hierarchically, so that in "S.T" the parameter type representation is extracted first, and used to drive a switch statement (on the various type possibilities for T) which calls a generic function that contains a switch statement on the possibilities for S. That reduces the total number of switch cases to #S + #T instead of #S * #T. A similar technique will deserialize "S.T.U" in #S + #T + #U total cases, instead of #S * #T * #U. That should reduce the coding work to a manageable level.
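
A minimal sketch of that hierarchical dispatch, in current Swift syntax, might look like the following. The names Archivable, Box, makeContainer and decode are hypothetical, and the type name and parameter are passed in already separated, so no string parsing is shown.

```swift
// Hypothetical protocol for this sketch: a type that can rebuild itself
// from a plist-style dictionary.
protocol Archivable {
    init?(values: [String: Any])
}

extension Int: Archivable {
    init?(values: [String: Any]) {
        guard let v = values["value"] as? Int else { return nil }
        self = v
    }
}

extension String: Archivable {
    init?(values: [String: Any]) {
        guard let v = values["value"] as? String else { return nil }
        self = v
    }
}

// One generic container; more containers would add cases to the inner switch.
struct Box<T: Archivable>: Archivable {
    var content: T
    init?(values: [String: Any]) {
        guard let inner = values["content"] as? [String: Any],
              let content = T(values: inner) else { return nil }
        self.content = content
    }
}

// Inner switch: one case per generic container (#S cases).
func makeContainer<T: Archivable>(_ parameter: T.Type, name: String,
                                  values: [String: Any]) -> Archivable? {
    switch name {
    case "Box": return Box<T>(values: values)
    default: return nil
    }
}

// Outer switch: one case per parameter type (#T cases).
func decode(name: String, parameter: String, values: [String: Any]) -> Archivable? {
    switch parameter {
    case "Int": return makeContainer(Int.self, name: name, values: values)
    case "String": return makeContainer(String.self, name: name, values: values)
    default: return nil
    }
}

let inner: [String: Any] = ["value": "hi"]
let decoded = decode(name: "Box", parameter: "String", values: ["content": inner])
```

Adding a new parameter type touches only the outer switch, and adding a new container touches only the inner one, which is where the #S + #T count comes from.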


Finally, I'll point out that once you've solved all this, it's relatively trivial to integrate it into the NSCoding mechanism, simply by putting the serializations of Swift objects (presumably strings, dictionaries or NSData objects) into the archive instead of the objects themselves. With NSCoding, you always end up calling encodeObject…/decodeObject… per archive key. It's a manual process, so intervening to do the Swift serialization shouldn't be a problem.

Everything you wrote is true. I think most people in this thread are actually aware of most of these things, but they haven't been summed up like this yet (particularly the fact that Swift currently doesn't provide good facilities for runtime metatype creation).


You also raise an interesting issue with modules, I didn't realize that before.


What we're all looking for, however, is the best (possible) implementation.

Monicker's seems to be pretty close, any ideas how to make it better?

Yeah, that factory switch is ikky.


You could make things better if you actually did know, at compile time, what to expect back out of the archive. I mean, I have done serialization where I know "This serial data is going to contain a bunch of X".


If it's possible for your design to do this, then you can just create a protocol that expects to have an init that takes a dict (or even an NSData) and populates its fields appropriately. This is basically re-creating NSCoding in Swift.


I don't think there is any way to get around the SerializableType due to the genericness. The struct wouldn't know what T is, so wouldn't know how to 'flatten' it, so T has to know how to flatten/inflate itself.
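
For the case where the expected type is known at compile time, a sketch of such a protocol in current Swift could look like this. The names DictionaryCodable, Point and decodeAll are hypothetical, and a plain [String: Any] dictionary stands in for NSDictionary.

```swift
// Hypothetical protocol: the type knows how to flatten itself into a
// dictionary and how to inflate itself from one.
protocol DictionaryCodable {
    init?(dict: [String: Any])
    func asDictionary() -> [String: Any]
}

struct Point: DictionaryCodable {
    var x: Int
    var y: Int

    init(x: Int, y: Int) {
        self.x = x
        self.y = y
    }

    // Fails (returns nil) instead of crashing when a field is missing
    // or has the wrong type.
    init?(dict: [String: Any]) {
        guard let x = dict["x"] as? Int, let y = dict["y"] as? Int else { return nil }
        self.x = x
        self.y = y
    }

    func asDictionary() -> [String: Any] {
        return ["x": x, "y": y]
    }
}

// The caller states the expected type, so no factory switch is needed.
// Entries that do not decode as T are skipped.
func decodeAll<T: DictionaryCodable>(_ type: T.Type, from dicts: [[String: Any]]) -> [T] {
    return dicts.compactMap { T(dict: $0) }
}

let archive: [[String: Any]] = [Point(x: 1, y: 2).asDictionary(), ["x": "bad"]]
let points = decodeAll(Point.self, from: archive)
```

This only works because the call site names Point.self. It does not help when the archive itself has to say which type comes next.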

Great! I also agree with everything that QuinceyMorris wrote.


So, if all of us could just hold on for a moment now and think really carefully about what we expect from our "best (possible) implementation".


Can we please make really sure that it will not boil down to essentially the following (don't be upset, just read and think this through a couple of times, as we might easily think that our expectations will not boil down to this, even though they might actually do just that):


#1. We write our code, typechecking and compilation happens here.

#2. We get our executable.

#3. We change something in e.g. a JSON archive that the executable has been set up to unarchive from.

#4. We run our executable.

#5. We get frustrated about the fact that what happened in #3 can't be known by the compiler at #1.


Thus, if we are _really_ sure that what we are expecting from the best possible solution is in fact not the same as expecting the compiler to know stuff that is only knowable at runtime, then:


The solution to our problem will always be one that essentially does one of these three things:


1. Wraps everything that we like to be statically typed inside some dynamically typed (thus type erasing) construct.


2. Doesn't complicate things without any good reason and just uses dynamically typed constructs for tasks such as this (like e.g. Objective-C and JavaScript do for everything, including tasks that would be better solved statically in e.g. Swift).


3. Makes sure that the (deserializing) code has static type constructs for everything that any possible archive may possibly represent.


Alternative 3 is of course only practical for very limited/simple/specialized/inflexible forms of archives. Stuff that would probably not qualify to be called serializing/archiving. A simple example would be eg: You (and most importantly the compiler!) can know that the archive always describes a UInt32, thus your deserializing code can simply contain something like this:

let unarchivedUInt32 = UInt32(valueFromArchive: archive)

Very simple, but also not very flexible.


IMHO, solutions that do number 1 might often just be strange and overcomplicated ways of doing number 2. But sometimes there might be a good reason to do number 1.


Perhaps it's easy to miss that no matter if you do number 1 or 2 you will be type erasing, ie you will not get the "good parts" of static typing, typechecking etc "back" from any archive, simply because, again, that would be like expecting to get compile time errors for mistakes in your JSON archive.
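
The type-erasure point can be shown with a tiny sketch in current Swift. The protocol Serializable and the function load are hypothetical stand-ins for whatever a deserializer returns. The value comes back as a protocol-typed value, and recovering the concrete type takes a runtime cast that may fail.

```swift
// Hypothetical marker protocol standing in for "anything the archive may hold".
protocol Serializable {}
extension Int: Serializable {}
extension String: Serializable {}

// Stand-in for a deserializer: the tag is only known at runtime, so the
// static return type cannot be more specific than the protocol.
func load(tag: String) -> Serializable? {
    switch tag {
    case "Int": return 42 as Int
    case "String": return "hi" as String
    default: return nil
    }
}

let loaded = load(tag: "Int")

// The compiler only knows "some Serializable". Getting an Int back out
// requires a conditional cast that is checked at runtime.
var recovered: Int? = nil
if let number = loaded as? Int {
    recovered = number + 1
}

// A wrong guess is not a compile-time error; it simply yields nil.
let wrongGuess = loaded as? String
```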

Jens please just go away.


Everybody here understands that Alternative 3 is impossible.

We're just trying to find the best way to do Alternative 1.

That is what this entire thread is meant to be about.

Alternative 2 is a non-answer: this question is explicitly about archiving structs. We all know how to archive classes.


You misunderstood the original question and you continue to pollute this thread with replies that you think contribute something to answering the question, but really, they don't. They only make this thread harder to follow for everybody who is interested in the original question: What is the best approach to archive and unarchive generic structs?


That would work if you could expect that, for example, the data contains Array<Array<Int>>, but if the data may also contain Array<Array<String>>, and you don't know whether to expect the former or the latter, an NSCoding-like approach won't work: you need that ikky factory switch, or something like it.


Maybe that factory switch can be replaced with a lookup in a dictionary of type [String: (Dict -> SerializableStruct?)]? Maybe that dictionary can be global, so that each type that wants to gain encoding and decoding capabilities can add itself to that dictionary?


I think that might actually be the best possible thing at the moment. Coding and decoding code for type X would all be located near X's other code, not scattered across the code base.


Something like this:


// Maps a serialized type name to a constructor. The constructor may fail,
// because A's init?(dict:) is failable.
var constructorDict: [String: (NSDictionary) -> SerializableStruct?] = [:]

func serializableFactory(dict: NSDictionary) -> SerializableStruct? {
     // The dictionary value is untyped, so cast it to String before the lookup.
     if let typename = dict["typename"] as? String {
          if let constructor = constructorDict[typename] {
               return constructor(dict)
          }
     }
     return nil
}


Then, near your A type definition, you'll simply add:

constructorDict["int"] = { A<Int>(dict: $0) }

What do you think?

And the answer to that is still: There is none in Swift.


Generic types in Swift are a pure compile-time feature. Deserializing is an inherently dynamic functionality, so it naturally doesn't fit.

I give up. I'll go away, and let others explain to you why you "can't even say":


"archive and unarchive generic structs",


because it's like saying you want to "archive and unarchive sections of your source code, at runtime (in a compiled language), and have the typechecker help you" etc.


If you really knew all the things I'm going on and on about, then you would not have written that over and over again, including other things you write in this thread.

Thanks ... : )
