TabularData Framework: crash on opening CSV file

Hi, I'm getting crash reports from a user: the app crashes when they open a CSV file on their device. I use the standard DataFrame(contentsOfCSVFile: fileURL, options: options) initializer to create a DataFrame, and that's where it crashes, even though the call is inside a try-catch block:

import Foundation
import TabularData

public func loadInitialCSVData(withURL fileURL: URL) throws -> DataFrame {
    let options = CSVReadingOptions(hasHeaderRow: true, delimiter: ",")
    do {
        return try DataFrame(contentsOfCSVFile: fileURL, options: options)
    } catch {
        // Log the error here. In the crashing case, execution never gets here.
        throw error
    }
}

This is from the crash report:

Exception Type:  SIGTRAP
Exception Codes: TRAP_BRKPT at 0x21e02be38
Crashed Thread:  0

Thread 0 Crashed:
0   TabularData                          0x000000021e02be38 __swift_project_boxed_opaque_existential_1 + 9488
1   TabularData                          0x000000021e099d64 __swift_memcpy17_8 + 4612
2   TabularData                          0x000000021e099958 __swift_memcpy17_8 + 3576
3   TabularData                          0x000000021e09935c __swift_memcpy17_8 + 2044
4   Contacts Journal CRM                 0x000000010433f614 Contacts_Journal_CRM.CJCSVHeaderMapper.loadInitialCSVData(withURL: Foundation.URL) throws -> TabularData.DataFrame (CJCSVHeaderMapper.swift:26)
5   Contacts Journal CRM                 0x00000001043009d8 (extension in Contacts_Journal_CRM):__C.MacContactsViewController.handleSelectedCSVFileForURL(selectedURL: Foundation.URL) -> () (MacContactsViewControllerExtension.swift:28)
6   Contacts Journal CRM                 0x0000000104301e64 @objc (extension in Contacts_Journal_CRM):__C.MacContactsViewController.handleSelectedCSVFileForURL(selectedURL: Foundation.URL) -> () (<compiler-generated>:0)
7   Contacts Journal CRM                 0x0000000104222c94 __51-[MacContactsViewController importCSVFileSelected:]_block_invoke (MacContactsViewController.m:954)
8   AppKit                               0x00000001bbe8f294 -[NSSavePanel didEndPanelWithReturnCode:] + 84

I can't diagnose the crash because the report doesn't contain any more information. I don't currently have access to the CSV file either, so I don't know what else I can do to prevent it.

What could possibly be causing this crash? Does it not matter that I'm catching the errors it throws, or can the app crash because of something internal to the framework?


Please post the full crash report. See Posting a Crash Report for advice on how to do that.

even though it's inside a try-catch block

Right. That’s because it’s trapping, which results in a machine exception rather than a language exception or a Swift error. For an explanation of the difference, see What is an exception?
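To make the distinction concrete, here is a small sketch in plain Swift. The names HeaderError, checkUniqueThrowing, checkUniqueTrapping, and loadHeader are hypothetical, and this is not TabularData's code; it only contrasts the two mechanisms. A thrown Swift error travels to the nearest catch clause, whereas a failed precondition traps and terminates the process before any catch clause can run.

```swift
// Hypothetical error type, for illustration only.
enum HeaderError: Error {
    case duplicateColumn(String)
}

// Reports a duplicate by throwing a Swift error; a do/catch can handle this.
func checkUniqueThrowing(_ names: [String]) throws {
    var seen = Set<String>()
    for name in names {
        if !seen.insert(name).inserted {
            throw HeaderError.duplicateColumn(name)
        }
    }
}

// Reports a duplicate by trapping. A failed precondition terminates the
// process immediately; no catch clause runs, and the crash report shows a
// trap (on Apple silicon, typically SIGTRAP / EXC_BREAKPOINT).
func checkUniqueTrapping(_ names: [String]) {
    var seen = Set<String>()
    for name in names {
        let isNew = seen.insert(name).inserted
        precondition(isNew, "Duplicate name: \(name)")
    }
}

// Only the thrown error reaches the catch clause. Calling
// checkUniqueTrapping(names) inside this same do block would not be caught.
func loadHeader(_ names: [String]) -> String {
    do {
        try checkUniqueThrowing(names)
        return "ok"
    } catch {
        return "caught: \(error)"
    }
}
```

Here loadHeader(["Name", "Name"]) returns a "caught: ..." string. Swapping in checkUniqueTrapping would instead crash the process, which is the behavior the do/catch in the question can't prevent.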

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Hi, adding the full crash report. Let me know if there's anything in it that helps narrow down the issue.

Thanks for the crash report.

Consider the crashing thread backtrace:

Thread 0 Crashed:
0   TabularData          … __swift_project_boxed_opaque_existential_1 + 8668
1   TabularData          … __swift_memcpy17_8 + 4512
2   TabularData          … __swift_memcpy17_8 + 3396
3   TabularData          … __swift_memcpy17_8 + 2120
4   Contacts Journal CRM … Contacts_Journal_CRM.CJCSVHeaderMapper.loadInitialCSVData…

There’s clearly something wonky going on with the symbolication of frames 3 through 0. The symbols don’t make sense and, even if they did, the large offsets suggest that they’re not the right symbols anyway. That makes it hard to come up with any theories as to what’s going wrong here.

I was hoping that a full crash report would let me do a better job of symbolication, using tools I have access to here at Apple, but that didn’t pan out. Which is weird.

I suspect that this crash report was generated by a third-party crash reporter. If so, can you get an Apple crash report? Ideally that’d involve:

  1. Creating a build of your app that has the third-party crash reporter disabled.

  2. Sending that to your customer.

  3. Having them reproduce the problem.

  4. And sending you the resulting crash report.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

I found a crash report for the same issue through Xcode's Organizer. Would that help? It seems to have more detailed symbols for the TabularData framework. Attaching it here.

That definitely helps. Earlier I wrote:

There’s clearly something wonky going on with the symbolication of frames 3 through 0 …

I was hoping that a full crash report would let me do a better job of symbolication, using tools I have access to here at Apple, but that didn’t pan out.

With this crash report I was able to get better symbols for those frames:

0 TabularData          … DataFrame.append(column:) + 1712 …
1 TabularData          … DataFrame.append(column:) + 356 …
2 TabularData          … static DataFrame.loadCSV(parser:rows:schema:options:) + 700 …
3 TabularData          … DataFrame.init(csvData:columns:rows:types:options:) + 1068 …
4 TabularData          … DataFrame.init(contentsOfCSVFile:columns:rows:types:options:) + 648 …
5 Contacts Journal CRM … CJCSVHeaderMapper.loadInitialCSVData(withURL:) + 1680 …

Remember that we’re hunting for a trap in frame 0, and looking at the code I see two potential causes of that, each with their own message:

  • Can't insert column COL_NAME. Names must be unique within the DataFrame.

  • Can't insert column COL_NAME. The column should have the same number of elements as other columns in the DataFrame.

Of these, the first seems most likely to come up during CSV parsing. Try this:

  1. Manually create a CSV file with duplicate column names.

  2. Run it through your app.

  3. Look at the resulting crash report. Does it match the one coming from your user?
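Step 1 can also be scripted so the repro is easy to rerun. This is a minimal sketch using only Foundation; the file name duplicate-columns.csv and the sample rows are made up for illustration:

```swift
import Foundation

// Hypothetical sample: two columns share the name "Name".
let duplicateHeaderCSV = """
Name,Email,Name
Alice,alice@example.com,Smith
"""

// Writes the sample to the temporary directory and returns its URL.
func writeReproCSV() throws -> URL {
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent("duplicate-columns.csv")
    try duplicateHeaderCSV.write(to: url, atomically: true, encoding: .utf8)
    return url
}
```

You can then pass the returned URL to your own loading code (for example, loadInitialCSVData(withURL:)) and compare the resulting crash report with the user's.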

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Thanks! Yes, I was able to replicate the issue with a CSV file with duplicate column names, as well as with a CSV file with some extra empty columns.

I guess the next question is how best to handle this situation, since the crash happens as soon as I load the CSV file into the TabularData framework. I'm not sure how to 'prepare' the file before loading it, to check for duplicate columns, extra columns, etc., because that would, y'know, require a CSV parsing framework like TabularData! I would actually assume that the TabularData framework would be able to handle these situations, and return an error instead of crashing completely. Is there any other error handling I can do to avoid the crash?

First, consider submitting a bug report and then posting the Feedback ID here. Unfortunately, though, I think the biggest issue may be this line in the documentation:

Import, organize, and prepare a table of data to train a machine learning model.

That sounds like the whole tabular data API is intended as a developer tool rather than a general-purpose CSV parser for production code. So it’s not totally unreasonable if it assumes well-formed data comes from upstream in your workflow and throws a fatal error if this precondition doesn’t hold. Look in your debug output for helpful troubleshooting messages like the ones @eskimo quoted above.

Since you need bulletproof parsing and error handling for untrusted user-supplied data, this API probably isn’t the best tool for the job.
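If you do stay with TabularData, one partial mitigation (not something suggested in this thread) is to read just the header row yourself and reject the file if any column name is duplicated or empty, before handing the URL to DataFrame. This is a minimal sketch under several assumptions: the file is UTF-8, the delimiter is a comma, and the header row fits on one line. splitCSVRecord, headerProblems, and validateCSVHeader are hypothetical helpers, not TabularData API. Empty names are flagged too, because the thread reports that extra empty columns also crash.

```swift
import Foundation

/// Hypothetical helper: splits a single CSV record into fields, honoring
/// double-quoted fields and doubled quotes ("") inside them. It does NOT
/// handle quoted fields that contain line breaks.
func splitCSVRecord<S: StringProtocol>(_ line: S, delimiter: Character = ",") -> [String] {
    var fields: [String] = []
    var current = ""
    var inQuotes = false
    var i = line.startIndex
    while i < line.endIndex {
        let c = line[i]
        if inQuotes {
            if c == "\"" {
                let next = line.index(after: i)
                if next < line.endIndex && line[next] == "\"" {
                    current.append("\"")  // escaped quote inside a quoted field
                    i = next
                } else {
                    inQuotes = false      // closing quote
                }
            } else {
                current.append(c)
            }
        } else if c == "\"" {
            inQuotes = true
        } else if c == delimiter {
            fields.append(current)
            current = ""
        } else {
            current.append(c)
        }
        i = line.index(after: i)
    }
    fields.append(current)
    return fields
}

/// Hypothetical pre-flight check: returns a description of each empty or
/// duplicate name in the header row. An empty array means no problems found.
func headerProblems(in csvText: String, delimiter: Character = ",") -> [String] {
    // Character.isNewline covers "\n", "\r", and "\r\n" (one Character in Swift).
    let headerEnd = csvText.firstIndex(where: { $0.isNewline }) ?? csvText.endIndex
    let names = splitCSVRecord(csvText[..<headerEnd], delimiter: delimiter)
    var problems: [String] = []
    var seen = Set<String>()
    for (index, name) in names.enumerated() {
        if name.allSatisfy({ $0.isWhitespace }) {
            problems.append("Column \(index) has an empty name")
        } else if !seen.insert(name).inserted {
            problems.append("Duplicate column name: \(name)")
        }
    }
    return problems
}

/// Hypothetical wrapper: reads the whole file (assumed UTF-8) and checks its header.
func validateCSVHeader(at url: URL) throws -> [String] {
    let text = try String(contentsOf: url, encoding: .utf8)
    return headerProblems(in: text, delimiter: ",")
}
```

You would call validateCSVHeader(at:) first and only pass the URL on to DataFrame(contentsOfCSVFile:options:) when it returns an empty array, showing the user the problems otherwise. This only targets the header cases reproduced in this thread. It does nothing about the other trap message quoted above (columns with different numbers of elements), and it reads the entire file into memory just to inspect one line.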

I created a bug report: FB10035567

Even for a developer tool, I wouldn't expect it to crash an app just because it detected a duplicate column name. The framework already throws all sorts of parsing errors; for example, if you specify a 'date' column and it can't parse the input in a cell, it throws a failedToParse error. See CSVReadingError for more details.

I would actually assume that the TabularData framework would be able to handle these situations, and return an error instead of crashing completely.

Agreed.

I created a bug report: FB10035567

Thank you.

Is there any other error handling I can do to avoid the crash?

I can’t think of anything off the top of my head, but I’m not really an expert in this framework.

At this point my recommendation is that you… … … wait until after WWDC.

Normally I’d suggest opening a DTS tech support incident so that I, or one of my colleagues, can take a proper look at this. However, opening a TSI right now is pointless because DTS will be too busy to respond until after WWDC. And who knows what developments WWDC will bring, so my advice is that you wait a week (-:

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Accepted Answer

Yes, I was able to replicate the issue with a CSV file with duplicate column names

Please try this out on the macOS 13 beta that we just seeded. My understanding is that this problem is fixed there.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
