How to capture all byte data of a class instance

Ahoy everyone,

I am currently in the process of writing a custom undo system, as I will be handling large geometric data, and want to make sure the undo data is very efficient.

The plan is to get all the byte data, then perform an XOR and store the result in an optimised way.

Currently I can retrieve the class object's byte data using the Data type, but this does not encapsulate all the data that makes up the object. You end up with pointers to data that is not being collected.

Question A: Is there an existing API call or data type that allows me to pass in a class object and returns all the byte data ?

Question B: If question A is no, could you recommend/suggest a way to achieve this?

Thank you for your time and help :)

  • The following is the Playground I was using for testing purposes, just in case I have made a huge mistake with my use of the Data type.

import Foundation
import simd

class Node {
  public var position : simd_float3 = [0, 0, 0]
  public var name : String = "Node"
  public var vertices : Array<simd_float3> = []
}

/// Default instance
var newClassNode = Node()
/// Modified instances
var moved_newClassNode = Node()

moved_newClassNode.name = "Updated_Node_Position"
moved_newClassNode.position = [1, 1, 1]
moved_newClassNode.vertices = [ [0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], 
                [5, 5, 5], [6, 6, 6], [7, 7, 7], [8, 8, 8], [9, 9, 9], ]

/// Size of the MemoryLayout class
let mem_size = MemoryLayout<Node>.size

var data_test = Data(bytes: &newClassNode, count: mem_size)
var data_test2 = Data(bytes: &moved_newClassNode, count: mem_size)

let d_ints = data_test.map { UInt8($0) }
let d2_ints = data_test2.map { UInt8($0) }

/// The XOR byte data.
var xor_bytes : Array<UInt8> = []

/// The byte data that has had the xor bytes applied.
var applied_xor : Array<UInt8> = []

// XOR comparison

/// The following is performing the byte XOR comparison for each byte found in the Data object.
for i in 0..<mem_size {
  xor_bytes.append(d_ints[i] ^ d2_ints[i])
}

// Applies the XOR bytes, and stores the result
for i in 0..<mem_size {
  applied_xor.append(d_ints[i] ^ xor_bytes[i])
}

/// Assigns the byte data to a Data variable
var xor_data = Data(applied_xor)

/// Result of the applied xor data cast back into the original object type.
let appliedNode = xor_data.withUnsafeBytes { $0.load(as: Node.self) }

// Updating a variable after the XOR should not appear in the final result, unless it's using pointers and not fully capturing all the byte data
moved_newClassNode.vertices[1] = [101,101,101]

/// == LOGGING ==

memory_debug(data_test)
memory_debug(data_test2)

print("\n-- XOR Numeric --")
for i in 0..<mem_size {
  print("\(d_ints[i]) ^ \(d2_ints[i]) = \(xor_bytes[i])")
}

print("\n-- BINARY representation --")
for i in 0..<mem_size {
  print("\( pad(string:String(d_ints[i], radix:2), toSize:8) ) ^ \( pad(string:String(d2_ints[i], radix:2), toSize:8 )) = \( pad(string:String(xor_bytes[i], radix:2), toSize:8))")
}

print("\n-- xor applied --")
for i in 0..<mem_size {
  print("\(d_ints[i]) ^ \(xor_bytes[i]) = \(applied_xor[i])")
}

print("\n -- Converted applied xor byte data to object --")
print(appliedNode.name)
print(appliedNode.position)
print(appliedNode.vertices)


/* ============
 *  Functions
 * ============
 */

func memory_debug(_ input_data:Data) {
  print("\n-- Data_Test Debug --")
  print(input_data)
  print(input_data as NSData)
  print("count : \(input_data.count)")
  let hex_string = input_data.map { String(format: "%02x", $0) }.joined() 
  print(hex_string) // hex value
   
  //print(String(Int(hex_string, radix:16)!, radix:2)) // hex to binary value
   
  print(input_data.map { String($0, radix:2) } )
}

func pad(string : String, toSize: Int) -> String {
  var padded = string
  for _ in 0..<(toSize - string.count) {
    padded = "0" + padded
  }
  return padded
}

Question A: Is there an existing API call or data type that allows me to pass in a class object and returns all the byte data ?

I don’t actually know, but consider a case where an object does reference other objects. Does “all the byte data” mean you want to capture the entire object graph? That’s not necessarily what you really want.

Question B: If question A is no, could you recommend/suggest a way to achieve this?

Sounds like you just need a serialization scheme to convert an object to a byte stream. Grabbing the underlying bytes via Data(bytes:count:) is one way of course, but you could also implement a custom scheme to explicitly serialize all the object properties you need.

Then I’m guessing that however you generate a byte stream representing an object, the fun part is that you are using XOR to diff successive entries in your undo stack and storing these diffs efficiently, probably taking advantage of the diffs containing a lot of 0x00 bytes. Is that the idea? If so, then any serialization scheme that produces the desired behavior should work.
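The XOR step described above can be sketched independently of how the bytes are produced. This is a minimal sketch, assuming the two states have already been serialized to Data; `xorDiff` is a hypothetical helper, not a Foundation API. Because it pads the shorter buffer with zeros, a real undo stack would also need to record the original length of each state.

```swift
import Foundation

/// XORs two byte buffers, treating missing bytes in the shorter one as zero.
/// Hypothetical helper for illustration only.
func xorDiff(_ a: Data, _ b: Data) -> Data {
    let count = max(a.count, b.count)
    var out = Data(count: count)            // zero-filled buffer
    for i in 0..<count {
        let x: UInt8 = i < a.count ? a[a.startIndex + i] : 0
        let y: UInt8 = i < b.count ? b[b.startIndex + i] : 0
        out[i] = x ^ y
    }
    return out
}

let before = Data([1, 2, 3, 4, 5, 6, 7, 8])
let after  = Data([1, 2, 3, 9, 5, 6, 7, 8])

// Unchanged bytes become 0x00, so the diff is mostly zeros.
let diff = xorDiff(before, after)

// XORing the newer state with the diff recovers the older state (the undo step).
let restored = xorDiff(after, diff)

print(diff.map { $0 })        // [0, 0, 0, 13, 0, 0, 0, 0]
print(restored == before)     // true
```

The long runs of zero bytes are what make the diff cheap to store, for example with run-length encoding or a general-purpose compressor.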

Thanks @Scott

Q.A: You are correct, I want to capture the entire object graph so I can diff it in its entirety. A fun byproduct is that I can also use this to record the size of the object in memory for user feedback, but that's just a nice bonus.

Q.B: I have given Data a try, but Data seems to not 'collect' all the pointer data, so if an array or complex data type exists then it does not get added. After some discussion it looks like I may end up going the way you mentioned using Codable and then xoring the byte data.

I'm looking into some existing Codable libraries that only do byte encoding and do not serialise any keys, which will work in my case as it is very transient data.
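For reference, a keyless byte encoding can also be written by hand. The following is a rough sketch and not any particular library's format: `NodeSnapshot`, `appendUInt32`, `appendVector`, and `encode` are hypothetical names, and the layout (little-endian 32-bit values, length-prefixed name and vertex list) is an assumption chosen for illustration. Fields are written in a fixed order, so no keys are needed; a decoder would read them back in the same order.

```swift
import Foundation

/// Hypothetical value-type mirror of the Node class from the question.
struct NodeSnapshot {
    var position: SIMD3<Float>
    var name: String
    var vertices: [SIMD3<Float>]
}

/// Appends a 32-bit value in little-endian byte order.
func appendUInt32(_ value: UInt32, to data: inout Data) {
    withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
}

/// Appends the three Float components of a vector as raw bit patterns.
func appendVector(_ v: SIMD3<Float>, to data: inout Data) {
    for i in 0..<3 { appendUInt32(v[i].bitPattern, to: &data) }
}

/// Writes the fields positionally: position, name length + UTF-8 bytes,
/// vertex count + vertices.
func encode(_ node: NodeSnapshot) -> Data {
    var data = Data()
    appendVector(node.position, to: &data)
    let nameBytes = Array(node.name.utf8)
    appendUInt32(UInt32(nameBytes.count), to: &data)
    data.append(contentsOf: nameBytes)
    appendUInt32(UInt32(node.vertices.count), to: &data)
    for v in node.vertices { appendVector(v, to: &data) }
    return data
}

let bytes = encode(NodeSnapshot(position: [1, 1, 1],
                                name: "Node",
                                vertices: [[0, 0, 0], [1, 1, 1]]))
print(bytes.count)   // 12 + 4 + 4 + 4 + 24 = 48
```

One thing to keep in mind with this layout is that a change in the name's length shifts every byte after it, which makes an XOR diff of the two states much less sparse. Placing variable-length fields last would limit that effect.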

Grabbing the contents of memory like this isn't a viable approach:

  • As you've already noticed, not all of the object's information is stored inside the contiguous block of raw memory allocated for the instance. Yes, anything represented by a pointer in the allocated instance can't be copied by copying the instance data, but it's worse than that, because there's no single (or predictable) mechanism for representing such stored-in-another-place data. It might not be a pointer, it might be an index to another data structure, and so on without limit.

  • Copying raw data isn't safe, because it isn't meaningful to copy it back again to re-create the instance.
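The first point can be checked directly. A small sketch, using a stand-in `Node` class with the same stored properties as in the question: for a class type, MemoryLayout reports the size of the reference rather than of the instance, so `Data(bytes:count:)` with that size copies only the reference itself. An Array value is likewise a fixed-size handle to separately allocated storage.

```swift
/// Stand-in for the Node class from the question (SIMD3<Float> is what simd_float3 aliases).
final class Node {
    var position: SIMD3<Float> = [0, 0, 0]
    var name = "Node"
    var vertices: [SIMD3<Float>] = []
}

// A variable of class type holds a reference, so its size is one pointer.
let referenceIsPointerSized =
    MemoryLayout<Node>.size == MemoryLayout<UnsafeRawPointer>.size

// The inline size of an array value does not depend on how many elements it holds;
// the elements live in a separate heap buffer.
let small: [SIMD3<Float>] = []
let large = [SIMD3<Float>](repeating: [1, 1, 1], count: 10_000)
let arraySizeIsFixed =
    MemoryLayout.size(ofValue: small) == MemoryLayout.size(ofValue: large)

print(referenceIsPointerSized)   // true
print(arraySizeIsFixed)          // true
```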

The only practical approach is to require that objects participating in your undo scheme implement pre-arranged behavior (i.e. in Swift terms, conform to a common protocol such as Codable) by which the objects can serialize themselves into data.
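As a minimal sketch of that approach, assuming the `Node` class from the question can be made Codable (the class below is a stand-in, not the original poster's final design): the encoder walks the whole value, including the array contents, so later mutations do not leak into the snapshot. JSONEncoder with `.sortedKeys` is used here only because it ships with Foundation and gives a stable key order; a more compact binary encoder would suit an XOR-based undo stack better.

```swift
import Foundation

/// Stand-in for the Node class from the question, with Codable conformance added.
final class Node: Codable {
    var position: SIMD3<Float> = [0, 0, 0]
    var name = "Node"
    var vertices: [SIMD3<Float>] = []
    init() {}
}

let node = Node()
node.vertices = [[0, 0, 0], [1, 1, 1]]

// Stable key order keeps successive snapshots comparable byte for byte.
let encoder = JSONEncoder()
encoder.outputFormatting = .sortedKeys
let snapshot = try encoder.encode(node)

// Mutating the live object afterwards does not affect the captured bytes.
node.vertices[1] = [101, 101, 101]

let restored = try JSONDecoder().decode(Node.self, from: snapshot)
print(restored.vertices[1] == SIMD3<Float>(1, 1, 1))   // true
```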

I am currently in the process of writing a custom undo system

This is you, right?

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Yip, that is me. I posted in both locations in case there was some more Apple-specific API, outside the main Swift API, that could achieve what I was asking.
