NSPersistentCloudKitContainer: How to know when the initial sync with CloudKit has happened?

I am looking into using the new NSPersistentCloudKitContainer to sync some internal data for my app. Generally it works great, but I have one scenario that I cannot figure out.


Here is the basic scenario: there is a single record in CloudKit that is created and used to keep track of some internal state. Keeping it in the cloud allows for two cases: when the user opens the app on another device (say an iPad) and when the user re-installs the app on the same device.


If it's indeed the very first time the user is using the app on any device, I need to create the record with a bunch of default initial values. I am okay with spending a little time at app startup determining if it's a brand new user or an existing user installing on another device.


I thought maybe observing NSPersistentStoreRemoteChangeNotification would provide a useful point in time at which I can say "ah ha, now I know it's a new user, or an existing user". However, it gets called a lot and I cannot find anything in its userInfo that gives me a definitive answer to my question.


Any suggestions?

joey

Replies

This is a fallacy.


In an eventually consistent distributed system you can never "know" that you have existing data or devices in the cloud. Your application will simply "find out at some point" that this data exists and needs to be designed to handle that.

I have a similar situation where I need to identify and record the first device to run the app, and more. I need to seed the cloud store with a preset structure that users build upon with their own content. So I can't have race conditions up to the cloud with regard to identifying whether this seeding has begun, has finished, or hasn't occurred yet. If two concurrent device-local first runs both query the cloud within milliseconds of each other, they'll both discover that they seem to be the first-run device, because there is no historical entry for a first run in the cloud store. If they then decide to register at the same time, then obviously you'll end up with conflicts and data being overwritten, and both devices thinking they're the first-run device, so, in my situation, I'll end up seeding the cloud store twice. iCloud Key-Value store won't help you because you can easily have a similar race condition between querying and overwriting.
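
To make the race concrete, here is a minimal sketch in plain Swift with no CloudKit involved. `CloudMarker` is a hypothetical stand-in for whatever single "first run" marker you keep in the cloud, and the interleaving is forced by hand: both devices read before either writes.

```swift
// Hypothetical stand-in for a cloud store holding one "first run" marker.
// This is not a CloudKit or Key-Value store API, just an illustration.
final class CloudMarker {
    private(set) var owner: String? = nil
    private(set) var seedCount = 0

    func read() -> String? { owner }

    // Unconditional write: the last writer wins.
    func claim(by device: String) {
        owner = device
        seedCount += 1
    }
}

let cloud = CloudMarker()

// Both devices query before either one has written anything.
let seenByPhone = cloud.read()
let seenByPad = cloud.read()

// Each device saw "no marker", so each one decides it is the first-run device.
if seenByPhone == nil { cloud.claim(by: "iPhone") }
if seenByPad == nil { cloud.claim(by: "iPad") } // silently overwrites the iPhone's claim

print(cloud.owner ?? "none", cloud.seedCount)
```

Both devices end up running their seeding step, which is exactly the double seeding I want to avoid.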


I think you will have to look outside Core Data for CloudKit to solve this issue.


I thought Firebase might be useful because they have security rules that prevent updates etc. But then you'll definitely hit problems when the user uninstalls from all devices and Firebase still thinks those devices are running. It's even worse if the user sells all their devices and later buys new ones: they won't be able to re-populate the cloud store, because Firebase doesn't know about app uninstalls and still thinks that a data store for the user exists in iCloud, that a first run has already occurred, and that it doesn't need to occur again.


I think if you use CloudKit directly, you will be able to solve this. Whether you have a Flags record type or a FirstRun record type is beside the point, but this is what I'm going to do:


When the app starts up: don't load your Core Data for CloudKit store!

- In a record zone in the user's private database, use CloudKit to query the FirstRun record type, ordering the results by created date ascending.

- If a FirstRun record exists, and the device ID stored in the record doesn't match yours, then you're not the first-run device. I'll have a status field on the record, so I can tell whether the first-run procedure being performed by the other device (on the Core Data for CloudKit container) has begun, is busy performing a substep, or has completed successfully. Based on that, I'll know whether I need to prevent the user from making changes to their database until the status is completed and my background process has checked that the seed data has downloaded successfully into the waiting app instance's Core Data store.

- On the other hand, if no FirstRun records exist, write a record with your device ID. Wait for it to save to the cloud, ensuring the correct HTTP result type etc.

- Immediately re-perform the original query, fetching FirstRun records for this user, ordering by created date ascending.

- If your device ID happens to be in the first record returned, then congratulations, you are indeed the first-run device to register with the cloud store, and you can do whatever processing you need to do for a first run. Load your persistent container linked to your CloudKit Core Data store. This is where I'll set the first-run record status to Started via the CloudKit API.

- If your device ID isn't first in the list, the other device pipped you to the post and wrote a record just before you. Inspect the Status field to see the progress of the other device. I'm assuming subscriptions will work for keeping track of the update status of the FirstRun record.
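
The steps above can be sketched roughly as follows. This is only the decision logic in plain Swift, under some assumptions: `FirstRunStore` is a hypothetical abstraction over the record zone in the private database (in a real app it would wrap the CloudKit query and save, which are asynchronous and can fail), and `FirstRunRecord`, `SeedStatus`, `FirstRunOutcome`, and `resolveFirstRun` are names I made up for illustration. The creation date is passed in here so the example is deterministic; with CloudKit you would order by the record's creation date instead.

```swift
import Foundation

// Hypothetical status values for the FirstRun record.
enum SeedStatus { case registered, started, completed }

// Hypothetical shape of a FirstRun record.
struct FirstRunRecord {
    let deviceID: String
    let createdAt: Date
    var status: SeedStatus
}

// Hypothetical abstraction over the record zone in the private database.
protocol FirstRunStore {
    func fetchOldestFirst() -> [FirstRunRecord]
    func save(_ record: FirstRunRecord)
}

enum FirstRunOutcome: Equatable {
    case thisDeviceSeeds                       // we are the first-run device
    case otherDeviceSeeds(status: SeedStatus)  // someone else got there first
    case undetermined                          // our own write is not visible yet
}

func resolveFirstRun(deviceID: String, store: FirstRunStore, now: Date = Date()) -> FirstRunOutcome {
    // Query existing FirstRun records, oldest first.
    var records = store.fetchOldestFirst()
    if records.isEmpty {
        // Nothing there: register ourselves, then immediately re-query.
        store.save(FirstRunRecord(deviceID: deviceID, createdAt: now, status: .registered))
        records = store.fetchOldestFirst()
    }
    guard let oldest = records.first else { return .undetermined }
    // Whoever owns the oldest record is the first-run device.
    return oldest.deviceID == deviceID
        ? .thisDeviceSeeds
        : .otherDeviceSeeds(status: oldest.status)
}

// In-memory store, only for trying the logic out.
final class InMemoryStore: FirstRunStore {
    private var records: [FirstRunRecord] = []
    func fetchOldestFirst() -> [FirstRunRecord] {
        records.sorted { $0.createdAt < $1.createdAt }
    }
    func save(_ record: FirstRunRecord) { records.append(record) }
}

let store = InMemoryStore()
let t0 = Date(timeIntervalSince1970: 0)
let phone = resolveFirstRun(deviceID: "iPhone", store: store, now: t0)
let pad = resolveFirstRun(deviceID: "iPad", store: store, now: t0.addingTimeInterval(1))
print(phone, pad)
```

The point of the design is that a device never trusts its first "nothing there" query on its own. Every device writes its claim and then lets the ordering of the records decide, so two devices that register at the same moment still agree on a single winner once both records are visible.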


This works well if you want all traces of first runs to disappear when the user uninstalls your app from all devices, because the data in the CloudKit-accessed record zone will also disappear. So even if they sell all their devices and begin again with new devices, they'll appear as a brand-new user. If you have data linked to an external service, like Firebase, then you might have issues, but then you'll probably always have issues keeping them in sync on delete.

In an eventually consistent distributed system you can never "know" that you have existing data or devices in the cloud. Your application will simply "find out at some point" that this data exists and needs to be designed to handle that.

Your statement is from the perspective of a computer scientist, not from that of a user. And it's all about the user, all the time, in the real world. I've had many complaints from users that my app seems to have lost data. Yes, the user is being impatient and not waiting an infinite amount of time for the initial sync. But it sure would be nice if we could trigger an initial sync, or at least know that it's happening, so we can tell the user "be patient, data is on its way to your device".

I agree, we need to be able to check if there are changes in the cloud so we can reflect this to the user.