CloudKit with a huge Core Data store

Hello


I have a general question about best practices for my specific case.


I'm writing a biking application that communicates with multiple sensors (speed, cadence, heart rate) and uses the phone's GPS. I'm saving every data entry into Core Data.


For example, I have a workout object with a UUID. When a sensor reading arrives, I create a new entry for that reading and save the workoutID as a foreign key, so there are no relationships managed by Core Data.
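
Roughly like this, as a sketch (entity and attribute names are simplified for illustration, not my actual model):

```swift
import CoreData

// Sketch of the described schema: readings point back to their workout
// by UUID instead of through a Core Data relationship.
final class Workout: NSManagedObject {
    @NSManaged var id: UUID
    @NSManaged var startDate: Date
}

final class SensorReading: NSManagedObject {
    @NSManaged var workoutID: UUID   // "foreign key" to Workout.id
    @NSManaged var timestamp: Date
    @NSManaged var value: Double
    @NSManaged var kind: String      // e.g. "speed", "cadence", "hr"
}
```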


Now I want to add CloudKit support to my app. My main concern is the number of entries. For example, I have a workout (a real one 🚴 🙂) where I biked for 12 hours. For that workout I have 113k sensor-reading entries and around 90k GPS entries; in total it's more than 200k entries for a single workout. Core Data handles it fine (reading this whole batch of data takes around 4 seconds), but I'm worried about overwhelming CloudKit by creating that many records. In theory I could serialize all the entries into JSON with Codable and save them under a single attribute of the Workout object (memory usage would be high, I know), but that doesn't feel like a clean solution.
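
For reference, the serialize-to-one-attribute idea would look roughly like this (the ReadingSample type is illustrative, not from my actual app):

```swift
import Foundation

// Sketch of the "serialize everything into one attribute" idea.
struct ReadingSample: Codable {
    let timestamp: Date
    let value: Double
    let kind: String
}

func encodeSamples(_ samples: [ReadingSample]) throws -> Data {
    // All 200k+ samples end up in memory at once, which is the downside.
    try JSONEncoder().encode(samples)
}

// The resulting Data would be stored on a single attribute of the
// Workout entity instead of as 200k separate rows/records.
```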


So my question is: how should I deal with situations like this? Also, is there a detailed description of how to use CloudKit together with Core Data? For example, the local-cache part is a bit fuzzy to me: should I store the data in Core Data after decoding the systemFields?
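
(By the systemFields part I mean the common pattern of archiving just a record's system fields and keeping that blob next to the row in Core Data, roughly like this sketch; the helper names here are my own:)

```swift
import CloudKit

// Archive only the CKRecord's system fields so the record can be
// reconstructed later for updates, without storing the whole record.
func systemFieldsData(for record: CKRecord) -> Data {
    let archiver = NSKeyedArchiver(requiringSecureCoding: true)
    record.encodeSystemFields(with: archiver)
    return archiver.encodedData
}

// Rebuild a CKRecord from the archived system fields.
func record(fromSystemFields data: Data) -> CKRecord? {
    guard let unarchiver = try? NSKeyedUnarchiver(forReadingFrom: data) else { return nil }
    unarchiver.requiresSecureCoding = true
    return CKRecord(coder: unarchiver)
}
```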

Accepted Reply

And, if I misunderstood your question: of course you would not upload the data after each GPS event! Accumulate the data, perhaps do some analysis/averaging of it, and then upload it in large chunks. Perhaps upload each exercise session once, after the session is completed.
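
A minimal sketch of the "upload in large chunks" approach, assuming you build the CKRecords yourself once the session ends (the chunk size and error handling here are illustrative, not a definitive implementation):

```swift
import CloudKit

// Upload accumulated session records in chunks after the workout ends.
// CloudKit rejects oversized batches (CKError.limitExceeded), so keep each
// CKModifyRecordsOperation to a few hundred records at most.
func uploadSessionRecords(_ records: [CKRecord],
                          to database: CKDatabase = CKContainer.default().privateCloudDatabase,
                          chunkSize: Int = 300) {
    var remaining = records
    while !remaining.isEmpty {
        let chunk = Array(remaining.prefix(chunkSize))
        remaining.removeFirst(chunk.count)

        let op = CKModifyRecordsOperation(recordsToSave: chunk, recordIDsToDelete: nil)
        op.qualityOfService = .utility
        op.modifyRecordsCompletionBlock = { _, _, error in
            if let error = error {
                // Production code would inspect the error and retry with backoff.
                print("Chunk failed: \(error)")
            }
        }
        database.add(op)
    }
}
```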

Replies

I think the amount of data you are talking about is trivial, sorry. If I misunderstand, you can always use a CKAsset.

I've never seen an app with several million entries for a single user. For 1,000 users biking at the same time with just 2 sensors each, that would mean 2,000 requests/sec to the server. As far as I understand, an app is limited to around 40 requests/sec. Even if I send batch updates, the rate would be huge. That's why I'm looking for a solution.

You indicated that you had "more than 200k entries". At that count, size is irrelevant. And in any event, size is not the only criterion here. It's a database; you have to create the structure you want. If all you want is to store the information, then put it into an NSData object and upload it. If it is greater than a few megabytes, put it into a CKAsset. Users have reported exceeding these limits with no problems:


https://developer.apple.com/library/archive/documentation/DataManagement/Conceptual/CloudKitWebServicesReference/PropertyMetrics.html
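
A minimal sketch of the NSData-or-CKAsset idea above, assuming the samples are already serialized to Data (the "Workout" record type and the key names are made up for illustration):

```swift
import CloudKit

// Write the serialized samples to a temporary file and attach them to a
// single workout record as a CKAsset.
func workoutRecord(withSamples data: Data, workoutID: UUID) throws -> CKRecord {
    let fileURL = FileManager.default.temporaryDirectory
        .appendingPathComponent(workoutID.uuidString)
        .appendingPathExtension("json")
    try data.write(to: fileURL, options: .atomic)

    let record = CKRecord(recordType: "Workout")
    record["workoutID"] = workoutID.uuidString as CKRecordValue
    record["samples"] = CKAsset(fileURL: fileURL)
    return record
}
```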


Take a look at 'seam3' on GitHub. It does background, bi-directional sync between Core Data and CloudKit. I don't know how it would work with that volume of records, but it's worth a look, and the source code is all there.


A question regarding the app itself... is it possible to filter down / compress the telemetry (see the sketch after this reply)? Is the full-grain detail of the raw data necessary for analysis purposes? For the stated example there seem to be about 4.5 samples/second between GPS and the other sensors. Is that granularity necessary for the app?

I also like the idea of packaging up a single workout's data into a blob. If you use seam, then in your Core Data model use a Binary Data attribute with the "Allows External Storage" option set. It will then create a CKAsset when it syncs with CloudKit. I've done this with large image files and it works fine. All sync happens in the background, so it doesn't cause delays in the UI.
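
To illustrate the filtering idea, here is a rough sketch of window-averaging raw samples before storing or syncing them (the Sample type and the 5-second window are assumptions, not from the app):

```swift
import Foundation

// Average raw samples into fixed time windows, e.g. one value per 5 seconds,
// to shrink 200k+ raw entries down to a much smaller series.
struct Sample {
    let timestamp: TimeInterval
    let value: Double
}

func downsample(_ samples: [Sample], window: TimeInterval = 5) -> [Sample] {
    // Group samples by which window they fall into.
    let buckets = Dictionary(grouping: samples) { Int($0.timestamp / window) }
    // Emit one averaged sample per window, in chronological order.
    return buckets.keys.sorted().map { key in
        let bucket = buckets[key]!
        let mean = bucket.reduce(0) { $0 + $1.value } / Double(bucket.count)
        return Sample(timestamp: Double(key) * window, value: mean)
    }
}
```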


It currently uses the private database in the cloud, so that would segregate each user's data. You likely have some reference data that would be shared between users via the public database; I am looking into how seam would be used with the public database.


Last... I think it syncs the database in its entirety, which would mean a user would have all of their data in local Core Data; that might be a problem. I'm looking into how a subset of the data could reside locally, with everything in the cloud.