What would be the best way to handle CSV files in Swift

Hi,


I need some advice, I have a big CSV file (about 3MB, 50,000 entries) that I will be using in my app to retrieve some information but I'm not sure what would be the best way to do it. What I need to do is be able to search the CSV file using Swift.


1. What is the most common method to handle CSV files in Swift/iOS?


2. Is it possible to add the file directly to XCode and make searches with Swift?


3. Is 3MB considered too big of a file to be directly in the app?


Any advice will be greatly appreciated. Thanks.

Replies

1:

Seen this 2016 thread?

https://forums.developer.apple.com/message/200204#200204


2:

Not sure that's what you mean/want, so...no comment


3:

3 mb/50k records, in my opinion, is trivial, by a large margin, which might be better defined if you mentioned the platform(s) you intend to target

1. There must be several hundred CSV parsers, if not more. Since Swift is fairly new, there are probably only a few hundred.


However, I strongly urge caution with regard to CSV files. People often use them thinking they are simple and/or easy to parse. They are ubquitous, I'll give you that. But they can become extremely complex and difficult to parse. There are many different variants of CSV. There are an infinite numbers of ways that a CSV can become corrupt. It is unsual for CSV parsers to check for or warn about corrupt data. And in the real world, CSVs are virtually always corrupt.


2. Sure, if you are some kind of masochist. If this is something you are embedding within your app, convert the data to something sane, verify it, and include it in your app in some easy-to-load format.


3. 3 MB is tiny. So is 50k records. Store that in a plist file. If you want it to load fast without parsing, load it into a SQLite file. If you do this, pay close attention to #1 above.

If I understand you correctly, what you are suggesting is to convert the data into something more manageable, maybe JSON?


The only reason I wanted to stick with CSVs it's because I'm getting the data from a third party and they may update it without them telling me exactly what changed so, my plan was to replace the CSV file with the new one whenever an update is required, without me having to go line by line to see what changed. I hope it make sense.


I think I will create some sort of script that can go and do the conversion, this way when an update is required I just run the script.


Thanks a lot for the good advice.

JSON? Wrong way Corrigan.


If you are getting this from a third party, you have no other option but to go line by line and see what has changed. It's a CSV. You can expect it to be corrupt. You will need to handle that. It isn't all that difficult. Write a script to import the CSV. In your script, you can automate the data integrity checks. That will tell you if you are missing data. But you must do this yourself. You can't rely on some 3rd party tool. Such tools often just throw out invalid data. I assume that isn't what you want.


I can guarantee the data isn't originally in CSV. (Well, that's a lie. I can't guarantee that. I've seen some monumentally stupid things.) Let's say the data probably isn't originally in CSV. That means two things. You might be able to get the data in a more stable, more easily parsed format. Ask. It also means that the source data could have content that isn't CSV-friendly. Funky Unicode perhaps? How well would you handle Arabic in that CSV?


But if this is the best there this, then you'll have to made do. Write a parser and validator. Emit content that your app can easily read. You really don't want something like JSON. That just has to be re-parsed again. SQLite is super easy. When you load it in your app, your parse time is 0 ms.


I know what you are thinking. Stop right there. You don't want to use Core Data. Just stuff the data in a dead-simple SQLite table. Add some indexes. You can extract the data with very little effort.

The only reason I wanted to stick with CSVs it's because I'm getting the data from a third party …

In that case you really need to be careful about how this third party has structured their CSV data. While there is a standard for CSV (RFC 4180), it’s something more honoured in the breach than in the observance. If you’re lucky the folks generating the CSV are following the standard and then you can find a parser that also follows the standard and you’re golden. My experience, however, is that you’re unlikely to be that lucky.

You have a couple of options here:

  • You can talk to the folks who generated the CSV and ask them to be specific about the format they’re generating.

  • You can look at the CSV data to see how it represents all the various edge cases.

A better option might be to talk to the who generated the CSV and ask them to generate the data in a better-defined format.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

Thank you for the good advice, I will ask how they are generating the data. Thanks a lot.

Thank you for following along. Hmm, I thought JSON was a good choice. I was actually able to convert the data to JSON using https://www.csvjson.com/csv2json without any errors.


Now that you mention SQLite, maybe my other option would be to use Realm Swift which I'm already using in my app, I didn't think it was easy to convert CSV files to Realm that's why I wasn't considering it, but it looks like there is an imported, maybe this is my best option.


Would you mind elaborating a little bit more on why you think JSON wouldn't be a good choice?

Primarily because you still have to parse it every time you load the data. SQLite is essentially a native format. JSON has its uses and is quickly replacing CSV in terms of ubiquity, but not for any inherently valid reasons.


The fundamental problem with any kind of conversion is that you still have to validate the data. At a minimum, you would have to count how many rows are in your CSV file and then verify that the JSON file has the same number of rows. As I mentioned before, most conversion utilities aren't going to make any effort to correct invalid data. At best, they will just drop invalid rows until they find something good. If you aren't that lucky, they will attempt to import the invalid data.


Even if you use SQLite, you just can't dump it off to some script You MUST write a validating importer. You WILL be burned.