Core Data with CloudKit doesn't seem to document how changes get merged

Core Data with CloudKit doesn't seem to document how changes get merged. (I filed a bug, but maybe we can collectively work out some details here?)


As far as I can tell, Core Data with CloudKit uses a "last writer wins" approach for attributes, and a "last writer wins + merge" approach for relationships.


Here's an experiment:


1. Create a Core Data with CloudKit app

2. Create an entity with two text attributes: text1, text2

3. Create an instance of that entity, set some values in text1 and text2, and sync it across two devices

4. Disconnect the two devices from the network

5. On device 1, set text1 to "aaa" and leave text2 as is. Save the changes.

6. On device 2, set text2 to "bbb" and leave text1 as is. Save the changes.

7. Reconnect both devices to the network and allow them to sync.

8. I believe the final result will be that the entity has "bbb" in text2 and the original value in text1, because device 2 saved its changes after device 1.


The thing to note is that the two changes weren't merged even though they changed different attributes. (As far as I can tell the Core Data merge policy is irrelevant when it comes to CloudKit merges.)
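
To make the difference concrete, here's a tiny Python model of the two behaviours. It's a simulation, not Core Data code. It assumes each save carries a snapshot of the whole record plus the set of attributes it touched, and that a simple integer timestamp decides who wrote last. `Save`, `merge_record_level`, `merge_attribute_level`, and the starting values "original1"/"original2" are hypothetical names for illustration. The record-level merge reproduces the result I'm seeing; the attribute-level merge is what you might have expected.

```python
from dataclasses import dataclass


@dataclass
class Save:
    """One device's offline save (hypothetical model, not Apple's implementation)."""
    timestamp: int              # higher value means the save happened later
    snapshot: dict[str, str]    # the whole record as it looked after this save
    changed: set[str]           # attributes this save actually modified


def merge_record_level(saves: list[Save]) -> dict[str, str]:
    """Record-level last writer wins: the latest snapshot replaces everything."""
    latest = max(saves, key=lambda s: s.timestamp)
    return dict(latest.snapshot)


def merge_attribute_level(base: dict[str, str], saves: list[Save]) -> dict[str, str]:
    """Attribute-level last writer wins: each attribute keeps its latest write."""
    merged = dict(base)
    for save in sorted(saves, key=lambda s: s.timestamp):
        for key in save.changed:
            merged[key] = save.snapshot[key]
    return merged


base = {"text1": "original1", "text2": "original2"}
device1 = Save(1, {**base, "text1": "aaa"}, {"text1"})   # step 5
device2 = Save(2, {**base, "text2": "bbb"}, {"text2"})   # step 6, saved later

print(merge_record_level([device1, device2]))
# {'text1': 'original1', 'text2': 'bbb'}
print(merge_attribute_level(base, [device1, device2]))
# {'text1': 'aaa', 'text2': 'bbb'}
```

The only difference between the two is whether the merge looks at which attributes each save changed or just takes the newest snapshot wholesale.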


I'm not making any comment on whether this behaviour is good or bad, right or wrong; I'm just saying that it should be documented.


(I think the picture for relationships is a little more complex. I think to-one relationships are "last writer wins" while to-many relationships are "merge"—but there are some important experiments I haven't conducted yet.)

Replies

Please do post the results of your further experiments, if you can. The only place I've seen merge policy mentioned is in the WWDC talk "Using Core Data with CloudKit": "Conflict resolution is implemented automatically by NSPersistentCloudKitContainer using a last writer wins merge policy."


You're right, it should be mentioned in the documentation.

Two additional experiments.


1. "last writer wins" with to-one relationships


1. Create a Core Data with CloudKit app

2. Create two entities, A and B, with a to-one relationship from A to B

3. Create two instances of B, b1 and b2

4. Create an instance of A, a1, with no to-one relationship

5. Sync across two devices

6. Disconnect the two devices from the network

7. On device 1, set the to-one relationship to b1. Save the changes.

8. On device 2, set the to-one relationship to b2. Save the changes.

9. Reconnect both devices to the network and allow them to sync.

10. I believe the final result will be that a1's to-one relationship is b2, because device 2 saved its changes after device 1.
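
The same idea in miniature for the to-one case: if the relationship behaves like a single value stored on a1 (my assumption, based on the result above), concurrent edits are resolved by comparing save times, and the earlier target is simply dropped rather than merged. `merge_to_one` and the integer timestamps are hypothetical.

```python
def merge_to_one(edits: list[tuple[int, str]]) -> str:
    """Resolve concurrent to-one edits by last writer wins.

    edits: (timestamp, target) pairs, one per device; a hypothetical model.
    """
    _, target = max(edits, key=lambda edit: edit[0])
    return target


# Device 1 set a1 -> b1 at t=1; device 2 set a1 -> b2 at t=2.
print(merge_to_one([(1, "b1"), (2, "b2")]))  # b2
```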


2. "merge" with to-many relationships


1. Create a Core Data with CloudKit app

2. Create two entities, A and B, with a to-many relationship from A to B

3. Create two instances of B, b1 and b2

4. Create an instance of A, a1, with no to-many relationship

5. Sync across two devices

6. Disconnect the two devices from the network

7. On device 1, add b1 to a1's to-many relationship. Save the changes.

8. On device 2, add b2 to a1's to-many relationship. Save the changes.

9. Reconnect both devices to the network and allow them to sync.

10. I believe the final result will be that a1's to-many relationship contains both b1 and b2.
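
A matching sketch for the to-many case. Here I'm assuming that what gets synced is each individual membership change (add or remove one member) rather than a snapshot of the whole set, which would explain why both additions survive; under whole-set last writer wins the result would have been just b2. `merge_to_many` and the change tuples are hypothetical.

```python
def merge_to_many(base: set[str], changes: list[tuple[int, str, str]]) -> set[str]:
    """Replay every device's membership changes in timestamp order.

    changes: (timestamp, op, member) tuples, where op is "add" or "remove".
    """
    members = set(base)
    for _, op, member in sorted(changes):
        if op == "add":
            members.add(member)
        elif op == "remove":
            members.discard(member)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return members


# a1 starts empty; device 1 adds b1 (t=1), device 2 adds b2 (t=2).
merged = merge_to_many(set(), [(1, "add", "b1"), (2, "add", "b2")])
print(sorted(merged))  # ['b1', 'b2']
```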


===


Neither of the above is surprising; I'm just documenting what I'm finding.


HOWEVER...


I now have a case where device 1 and device 2 no longer have the same contents for a to-many relationship. Since the entire mechanism is a black box I don't know precisely how that happened; I just know that, given the model, it takes some effort to get them back in sync.


Using example 2 as a starting point, with both devices sync'd, I have this:


1. device 1's instance of a1 has nothing in its to-many relationship

2. device 2's instance of a1 has b2


Oops. Then:


1. On device 2, add b1 to a1 and save. (a1 now has b1 and b2)

2. On device 1, after sync, a1 now has b1, but it still doesn't have b2.


Oops. Then:


1. On device 2, remove b2 from a1 and save. (a1 now has b1)

2. On device 2, add b2 to a1 and save. (a1 now has b1 and b2)

3. On device 1, after sync, a1 finally has both b1 and b2.
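
I can't see inside the black box, but one hypothetical model fits this pattern: if each device only receives membership changes (deltas) and never re-checks the full set, a single change that never arrives leaves the devices out of sync until that same member is changed again. That would explain why adding b1 didn't help but removing and re-adding b2 did. This is a guess, not a description of what NSPersistentCloudKitContainer actually does; `apply_deltas` is a hypothetical helper.

```python
def apply_deltas(replica: set[str], deltas: list[tuple[str, str]]) -> set[str]:
    """Apply ("add" | "remove", member) deltas in order to a local copy.

    The replica never receives the full set, so a dropped delta is never
    repaired unless a later delta touches the same member.
    """
    members = set(replica)
    for op, member in deltas:
        if op == "add":
            members.add(member)
        elif op == "remove":
            members.discard(member)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return members


device2 = {"b2"}
device1: set[str] = set()   # hypothetically, the "add b2" delta never arrived

# Device 2 adds b1; device 1 receives only that delta and is still missing b2.
device2.add("b1")
device1 = apply_deltas(device1, [("add", "b1")])
print(sorted(device1))  # ['b1']

# Device 2 removes and re-adds b2; device 1 receives both deltas and catches up.
device2.discard("b2")
device2.add("b2")
device1 = apply_deltas(device1, [("remove", "b2"), ("add", "b2")])
print(sorted(device1) == sorted(device2))  # True
```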


One last experiment.


3. "last writer wins" with to-one relationships *and* attributes


1. Create a Core Data with CloudKit app

2. Create two entities, A and B, with a to-one relationship from A to B, and a text attribute in A

3. Create two instances of B, b1 and b2

4. Create an instance of A, a1, with a to-one relationship to b1 and a value of "x" in the text attribute

5. Sync across two devices

6. Disconnect the two devices from the network

7. On device 1, set the text attribute to "y". Save the changes.

8. On device 2, set the to-one relationship to b2. Save the changes.

9. Reconnect both devices to the network and allow them to sync.

10. I believe the final result will be that a1's text attribute is still "x" and to-one relationship is b2, because device 2 saved its changes after device 1. Device 1's change was "lost".


===


This qualifies as both surprising and not-surprising: to-one relationships act entirely like attributes. (Whether or not it's surprising depends, I think, on your initial mental model. If you're thinking in terms of "attributes" vs "relationships" you might find it surprising. If you're thinking in terms of "single-valued" vs. "multi-valued" you might not.)
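
One way to capture the "single-valued vs. multi-valued" view in code: assume single-valued properties (attributes and to-one relationships) are stored on the owning object's record, and that adding a member to a to-many relationship writes the member's record through its inverse. Two concurrent changes then lose one side exactly when they land on the same record. This is my mental model, not Apple's documented storage layout; `Change`, `record_touched`, and `one_change_lost` are hypothetical names.

```python
from typing import NamedTuple


class Change(NamedTuple):
    """One offline edit (hypothetical model)."""
    owner: str        # object being edited, e.g. "a1"
    prop: str         # property name on that object
    kind: str         # "attribute", "to-one", or "to-many"
    member: str = ""  # object added or removed (to-many changes only)


def record_touched(change: Change) -> str:
    """Return the record a change is assumed to be written to."""
    if change.kind in ("attribute", "to-one"):
        return change.owner    # single-valued: stored on the owner's record
    if change.kind == "to-many":
        return change.member   # multi-valued: stored on each member via the inverse
    raise ValueError(f"unknown kind: {change.kind!r}")


def one_change_lost(first: Change, second: Change) -> bool:
    """With record-level last writer wins, sharing a record means one edit is lost."""
    return record_touched(first) == record_touched(second)


experiments = {
    "text1 vs text2": (Change("a1", "text1", "attribute"), Change("a1", "text2", "attribute")),
    "to-one vs to-one": (Change("a1", "b", "to-one"), Change("a1", "b", "to-one")),
    "add b1 vs add b2": (Change("a1", "bs", "to-many", "b1"), Change("a1", "bs", "to-many", "b2")),
    "text vs to-one": (Change("a1", "text", "attribute"), Change("a1", "b", "to-one")),
}
for name, (first, second) in experiments.items():
    print(f"{name}: {'one change lost' if one_change_lost(first, second) else 'both kept'}")
```

Under these assumptions it predicts "one change lost" for every experiment in this thread except the to-many one, which matches what I observed.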


===


I'm going to stop here because I've learned what I needed to know for my own purposes, but there's at least one additional experiment that would be very valuable to try: how do changes to an inverse relationship interact with changes to an attribute?


1. Disconnect the two devices from the network

2. On device 1, set a1's text attribute. Save the changes.

3. On device 2, set the *inverse* of a1's to-one relationship. Save the changes.

4. Reconnect both devices to the network and allow them to sync.


Is either change lost? Or are both kept?


Thanks very much for the links!


It's theoretically possible for "last writer wins" to operate at the attribute level, entity level, or change/delta/event level. The example in the video only talked about concurrent changes to the same attribute, so on its own it wasn't enough to answer my question.


The choice Apple made, "entity level", I think matches what most people would probably expect by default, with one possibly surprising exception: to-one relationships seem to act like attributes rather than to-many relationships. (Whether or not that's surprising depends, I think, on your default mental model. If you're thinking in terms of "attributes" vs. "relationships", as I was, you might be surprised. If you're thinking in terms of, say, columns and rows in database tables, you might not be.)


I'm trying to figure out whether or not I can reason accurately about concurrent changes in such a model. My initial take is that, for the app I'm currently working on, I can't.


Domain Driven Design defines the term "aggregate" to mean something like "a set of entities and values that must be kept together in order to maintain system invariants." In Apple's implementation an aggregate would have to be implementable as a single row in a table. If that works for your use case, you're golden. If it doesn't, then you'd have to be willing and able to consider far more "run time states" than you might otherwise expect. That's where the "reason accurately" comes into play: you'd have to be able to anticipate all of those additional states that you didn't think a single user on a single device could introduce, given your UI—for example, a "mismatch" between two one-to-one entities, or a "mismatch" between a master and multiple detail entities. (Inverse relationships might make the reasoning more difficult still, but I haven't experimented with them yet, and don't currently plan to.)
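
Here's a small illustration of the master/detail kind of mismatch, using a hypothetical denormalized `item_count` attribute that the UI always keeps equal to the number of detail objects. Assuming record-level last writer wins applied independently to each record, two offline "add a detail" operations produce a state that one user on one device could never create. `merge_records`, `item_count`, and the record names are all made up for illustration.

```python
def merge_records(saves: list[tuple[int, dict[str, dict]]]) -> dict[str, dict]:
    """Apply record-level last writer wins to every record independently.

    saves: (timestamp, {record_id: record}) pairs, one per device.
    """
    latest: dict[str, tuple[int, dict]] = {}
    for timestamp, records in saves:
        for record_id, record in records.items():
            if record_id not in latest or timestamp > latest[record_id][0]:
                latest[record_id] = (timestamp, record)
    return {record_id: record for record_id, (_, record) in latest.items()}


# Both devices start with a master that has no details and item_count == 0.
# Offline, each device adds one detail and bumps item_count to 1.
device1 = (1, {"master": {"item_count": 1}, "d1": {"master": "master"}})
device2 = (2, {"master": {"item_count": 1}, "d2": {"master": "master"}})

merged = merge_records([device1, device2])
details = [rid for rid, rec in merged.items() if rec.get("master") == "master"]
print(merged["master"]["item_count"], len(details))  # 1 2  (invariant broken)
```

Each detail lives in its own record, so both survive, while the master record is resolved wholesale, so one increment is lost.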


Pondering all of this...


PS I make no claim that my experimental results are correct. :-) In my current setup the experiments are a bit painful to run. It would be great if someone else was able to confirm my results.


One last comment...


All of my experiments revolve around concurrent changes on disconnected devices. For connected devices most of this won't matter, at least not most of the time.


My goal is to understand how things might go sideways—and how sideways they might go. How likely are users to feel/think that they "lost" data when they didn't expect to? How likely am I, as the developer, to understand the actual invariants of the system rather than the invariants I expected?
