TabularData: DataFrame - parsing JSON array that is not at the root level

Question

Created Jul ’21

DataFrame(contentsOfJSONFile: url) (and its MLDataTable equivalent) assumes that the rows to be imported are at the root level of the JSON structure. However, I'm downloading a fairly large dataset that has "header" information, such as date created and validity period, with the targeted array at a subsidiary level. The DataFrame initialiser takes JSONReadingOptions, but none of them address this situation.

It seems that I'm faced with the options of 1) stripping the extraneous data from the JSON file, to leave just the array or 2) decoding the JSON file into a bespoke struct then converting its array into a DataFrame - which removes a lot of the benefits of using DataFrame in the first place.

Any thoughts?

Cheers, Michaela

Answer 1

AncientCoder OP

Jul ’21

I decided to go with stripping the extraneous data, so I created a String extension:

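extension String {
    // Returns the text from the first `from` through the last `to`, inclusive,
    // or "" if either delimiter is missing or they appear in the wrong order.
    func midString(from: Character, to: Character) -> String {
        guard let firstChar = firstIndex(of: from) else { return "" }
        guard let lastChar = lastIndex(of: to), firstChar <= lastChar else { return "" }
        return String(self[firstChar...lastChar])
    }
}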

Then get the JSON array with:

let newJSON = jsonStr?.midString(from: "[", to: "]")
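As a quick check of the approach, here is a minimal sketch run against a made-up payload. The sample JSON, its key names, and the free function bracketedSlice are hypothetical stand-ins (the function mirrors midString so the snippet runs on its own). The last step assumes the DataFrame(jsonData:) initialiser from TabularData and is shown only as a comment.

```swift
import Foundation

// Hypothetical payload: header fields, then the target array under "rows".
let jsonStr = """
{"created": "2021-07-01", "validDays": 7, "rows": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]}
"""

// Stand-in for midString: slice from the first `open` to the last `close`, inclusive.
func bracketedSlice(_ text: String, open: Character, close: Character) -> String? {
    guard let start = text.firstIndex(of: open),
          let end = text.lastIndex(of: close),
          start <= end else { return nil }
    return String(text[start...end])
}

let arrayJSON = bracketedSlice(jsonStr, open: "[", close: "]") ?? "[]"
let arrayData = Data(arrayJSON.utf8)

// Sanity check: the slice parses as a JSON array on its own.
let rows = try JSONSerialization.jsonObject(with: arrayData) as? [[String: Any]]
print(rows?.count ?? 0)

// With TabularData available, the same bytes could then be handed to
// DataFrame(jsonData: arrayData) (assumed here, not executed).
```

Converting the slice to Data rather than writing it back to a file avoids a round trip through disk before building the DataFrame.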

This function is useful whenever a header and footer surround a delimited block, e.g. with the delimiters "{" and "}".
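Delimiter scanning can pick the wrong bracket when a header or footer field itself holds an array. A more defensive sketch is to parse the whole document with Foundation's JSONSerialization, pull out the nested array by key, and re-serialise just that array. The function name nestedArrayData, the error type, and the key "rows" are hypothetical. This is a different technique from the string slice above, and it costs a full parse of the file.

```swift
import Foundation

enum ExtractionError: Error { case notAnObject, missingArray }

// Parse the whole document, pick the array stored under `key`,
// and re-encode only that array as JSON data.
func nestedArrayData(in data: Data, key: String) throws -> Data {
    guard let root = try JSONSerialization.jsonObject(with: data) as? [String: Any] else {
        throw ExtractionError.notAnObject
    }
    guard let rows = root[key] as? [Any] else {
        throw ExtractionError.missingArray
    }
    return try JSONSerialization.data(withJSONObject: rows)
}

// Hypothetical payload whose header also contains an array ("tags"),
// which would mislead a scan for the first "[".
let payload = Data("""
{"tags": ["x", "y"], "rows": [{"id": 1}, {"id": 2}, {"id": 3}], "footer": "end"}
""".utf8)

let rowsData = try nestedArrayData(in: payload, key: "rows")
let decoded = try JSONSerialization.jsonObject(with: rowsData) as? [[String: Any]]
print(decoded?.count ?? 0)
```

The trade-off is memory: the whole file is materialised as Foundation objects before the array is re-encoded, which may matter for a very large download.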

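NOTE: I use lastIndex(of: "]") so that the search for the closing bracket starts at the end of the string and works backwards. The standard library's implementation for bidirectional collections such as String does walk back from endIndex, so the body of the array is not scanned a second time.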

Cheers, Michaela
