CKSyncEngine - Are the "change batches" we send to iCloud guaranteed to be identical to the batches iCloud sends back to other devices?

Hey,

I'm currently working on adding CloudKit support to the GRDB SQLite database in my app. CKSyncEngine, though still a bit tricky to wrap your head around, is amazing!

I have most of the basic setup implemented and some very easy cases already work, which is really exciting!

However, I do have some questions regarding data consistency across devices. I'm not sure, though, that these are even the right questions to ask. Maybe my entire approach is inherently flawed.

Say we add two records to the pending changes of the sync engine:

// I'm simplifying CKRecord.ID to be a String here
syncEngine.state.add(pendingRecordZoneChanges: [.saveRecord("1"), .saveRecord("2")])

Let's also say that both records are tightly connected. For example, they could be in a one-to-one relationship and must always be added to the database together because the app relies on the existence of either none or both.

After that call, at some later point, the system will call the sync engine's delegate nextRecordZoneChangeBatch(_:syncEngine:) for us to batch the changes together to send to iCloud.


First question: Can we guarantee that records "1" and "2" always land in the exact same batch, and are never separated?

Looking at the example code, there are two lines that worry me a bit:

// (Sample project: `SyncedDatabase.swift, lines 132-133`)
let scope = context.options.scope
let changes = syncEngine.state.pendingRecordZoneChanges.filter { scope.contains($0) }

The scope could lead to one of the two records being filtered out. However, my assumption is that the scope will always be .all when the system calls it from an automatically managed schedule, and only differs when you manually specify a different value through calling syncEngine.sendChanges(_:). Is that correct?
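To make the concern concrete, here is a minimal sketch of a filter that either keeps a group of related changes together or holds the whole group back. It uses simplified stand-in types rather than the real CloudKit ones: `PendingChange`, `groupID` and `changesToSend` are hypothetical names, and the grouping key is an assumption about how the app would mark records that belong together. It only covers the client-side filtering step and says nothing about what the server does afterwards.

```swift
// Hypothetical stand-in for CKSyncEngine.PendingRecordZoneChange,
// with CKRecord.ID simplified to a String as in the question.
struct PendingChange: Hashable {
    let recordID: String
    // Assumption: an app-defined key shared by records that must travel together.
    let groupID: String
}

/// Returns only the changes whose entire group passes the scope check,
/// so a group is either sent as a whole or held back as a whole.
func changesToSend(
    pending: [PendingChange],
    inScope: (PendingChange) -> Bool
) -> [PendingChange] {
    var completeGroups = Set<String>()
    for (groupID, members) in Dictionary(grouping: pending, by: { $0.groupID }) {
        if members.allSatisfy(inScope) {
            completeGroups.insert(groupID)
        }
    }
    // Filtering the original array preserves the pending order.
    return pending.filter { completeGroups.contains($0.groupID) }
}

let pending = [
    PendingChange(recordID: "1", groupID: "A"),
    PendingChange(recordID: "2", groupID: "A"),
    PendingChange(recordID: "3", groupID: "B"),
]

// A scope that excludes record "2" holds back all of group "A".
let sent = changesToSend(pending: pending) { $0.recordID != "2" }
```

With a scope that contains everything, all three changes pass through unchanged.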


Now back to the example. Say we successfully batched records "1" and "2" together and the changes have been sent to iCloud. Awesome!

What happens next? Other connected devices will, at some point, fetch those changes and apply them to their respective local databases.


Second question: Can we guarantee that iCloud sends the exact same batches from earlier to other devices and does not create new batches from all the changes?

I'm worried that iCloud might take all stored changes and "re-batch" them for whatever reason (efficiency, for example), which could cause records "1" and "2" to be separated into different batches. A second device could then receive "1" and, for at least some period of time, not receive "2", which would be an "illegal" state of the data.


I'd appreciate any help or clarification on this :)

Thanks a lot!

Replies

I'm afraid it's impossible to guarantee database integrity (I mean foreign keys and other relational invariants) with CloudKit and CKRecord.

I recommend reading this article: https://useyourloaf.com/blog/wwdc22-core-data-lab-notes/ It is about Core Data, but many points apply to GRDB just as well.

In particular, it contains this interesting sentence:

Cloud-enabled stores have a number of restrictions. Relationships are optional. Usually we recommend having another store that is not CloudKit enabled. Copy the data from the CloudKit store to the local store where you can perform validation. Serve clients from the local store where you know the graph is valid.

Well, that's a bummer 😅

Having a second local database and figuring out when and what to copy to the "real" database seems quite complicated 🤔. I'd have to track all incoming changes from CloudKit, map them to their respective tables and then attempt to copy the inserted data (in the correct order) whenever a batch has been handled. If it fails, I try again with the next batch. If it succeeds, I delete the data from the "CloudKit" local database.
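A minimal sketch of that staging idea, using in-memory dictionaries in place of the two databases. Everything here is hypothetical and simplified: `IncomingRecord`, `requiredPartnerID` and `StagingBuffer` are made-up names, only saves are modelled (no deletes, no outgoing changes), and a real GRDB version would presumably do the promotion inside a single transaction.

```swift
// Hypothetical, simplified model of a fetched record.
struct IncomingRecord {
    let id: String
    // Assumption: the ID of a record that must exist alongside this one, if any.
    let requiredPartnerID: String?
}

struct StagingBuffer {
    // Stands in for the "CloudKit" buffer database.
    private(set) var staged: [String: IncomingRecord] = [:]
    // Stands in for the "real" database the app reads from.
    private(set) var committed: [String: IncomingRecord] = [:]

    /// Stages a fetched batch, then promotes whatever can be promoted safely.
    mutating func handle(batch: [IncomingRecord]) {
        for record in batch {
            staged[record.id] = record
        }
        promoteReadyRecords()
    }

    private mutating func promoteReadyRecords() {
        // Start with everything staged and repeatedly drop records whose
        // partner is neither committed nor still a candidate. Repeating until
        // nothing changes also handles chains of dependencies.
        var candidates = staged
        var changed = true
        while changed {
            changed = false
            for record in Array(candidates.values) {
                if let partnerID = record.requiredPartnerID,
                   candidates[partnerID] == nil,
                   committed[partnerID] == nil {
                    candidates[record.id] = nil
                    changed = true
                }
            }
        }
        // Move the surviving records from the buffer to the real store.
        for (id, record) in candidates {
            committed[id] = record
            staged[id] = nil
        }
    }
}

var buffer = StagingBuffer()

// First batch: only "1" arrives, so it stays in the buffer.
buffer.handle(batch: [IncomingRecord(id: "1", requiredPartnerID: "2")])
let committedAfterFirstBatch = buffer.committed.count

// Second batch: "2" arrives, and both records are promoted together.
buffer.handle(batch: [IncomingRecord(id: "2", requiredPartnerID: "1")])
```

The point of the sketch is that the real store never sees "1" without "2", regardless of how the incoming batches are split.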

But that doesn't even cover outgoing changes, which I would have to handle as well.

I'm not sure I can do this without missing a gazillion edge cases 😅.

But thanks for your answer :) This at least helps me a bit with the decision whether or not I should continue working on iCloud sync. I'm gonna think about this a bit more. Maybe it's possible to do it. I feel like I'm really close, but the lack of data integrity with CloudKit might just be too much.

  • Update: I've come up with a possible solution using such a "buffer" database for incoming changes. Some first, simple tests are promising. I was able to add/modify/delete data on one device and it synced to another device, keeping all relationships and such intact, even though the order of fetched batches is not always "correct". I might still hit a roadblock with this, but maybe I'm getting somewhere.

  • Once I have it working properly, I will write down what I've come up with and what I've implemented and publish it somewhere. Maybe it will help someone :)
