CreateML: MLDataTable & DataFrame differences

Question

Created Jul ’21

It seems that a DataFrame (from the TabularData framework) can be used in CreateML in place of an MLDataTable, which makes sense given the description of the TabularData API. However, the two do not behave identically.

One difference is that, on a DataFrame, the randomSplit method returns a tuple of DataFrame slices (DataFrame.Slice), which cannot be passed to MLLinearRegressor without first converting them back to DataFrames (i.e. initialising a new DataFrame from the required slice). With an MLDataTable as the source data, the output of randomSplit can be passed to MLLinearRegressor directly.
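For comparison, the MLDataTable path described above might look like the following. This is only a minimal sketch: the column names "x" and "y" and the toy values are made up for illustration, and it assumes a macOS environment where CreateML is available.

```swift
import CreateML

// Hypothetical toy data (not from the original post): y = 2x + 1.
let xs: [Double] = (0..<100).map { Double($0) }
let ys: [Double] = xs.map { 2.0 * $0 + 1.0 }

// Build an MLDataTable from in-memory columns.
let table = try MLDataTable(dictionary: ["x": xs, "y": ys])

// randomSplit on an MLDataTable returns two MLDataTable values,
// so the training part can be handed to the regressor as-is.
let (tableTrain, tableTest) = table.randomSplit(by: 0.8, seed: 42)
let tableModel = try MLLinearRegressor(trainingData: tableTrain,
                                       targetColumn: "y")

print("train rows:", tableTrain.rows.count, "test rows:", tableTest.rows.count)
```

No conversion step is needed here because both halves of the split are already of the type the regressor accepts.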

I'm interested to hear of any other differences and whether the behaviour described above is a feature or a bug.

TabularData seems to have more features for data manipulation, although I haven't done a systematic comparison. I'm a bit puzzled as to why there are two similar, but separate, frameworks.

Answer 1

Frameworks Engineer OP

Apple

Jan ’22

randomSplit is more efficient in TabularData because it doesn't allocate memory: it returns slices that effectively point to the existing DataFrame. Converting them to DataFrames is the right thing to do for now, until MLLinearRegressor supports taking a slice directly.
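The conversion step can be sketched roughly as follows. The column names and toy values are hypothetical, and the sketch assumes macOS 12 or later, where MLLinearRegressor has an initialiser that takes a DataFrame.

```swift
import CreateML
import TabularData

// Hypothetical toy data (for illustration only): y = 2x + 1.
let xs: [Double] = (0..<100).map { Double($0) }
let ys: [Double] = xs.map { 2.0 * $0 + 1.0 }

var frame = DataFrame()
frame.append(column: Column(name: "x", contents: xs))
frame.append(column: Column(name: "y", contents: ys))

// randomSplit returns a pair of DataFrame.Slice values that refer
// back to `frame` rather than holding their own copy of the rows.
let (trainSlice, testSlice) = frame.randomSplit(by: 0.8, seed: 42)

// MLLinearRegressor expects a DataFrame here, so materialise the slice first.
let trainFrame = DataFrame(trainSlice)
let model = try MLLinearRegressor(trainingData: trainFrame,
                                  targetColumn: "y")

print("train rows:", trainSlice.shape.rows, "test rows:", testSlice.shape.rows)
```

The copy only happens at the point where a full DataFrame is actually needed, which is consistent with the efficiency argument above.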

As for differences, TabularData is more in line with Swift, both in protocols such as Collection and in behaviours such as copy-on-write. Going forward, you should use TabularData if that is an option for you.
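A small sketch of what that looks like in practice, using a made-up single-column DataFrame: a column can be treated like any other Swift collection, and assigning a DataFrame to a new variable gives an independent value.

```swift
import TabularData

// Hypothetical one-column frame, for illustration only.
var original = DataFrame()
original.append(column: Column(name: "x", contents: [1.0, 2.0, 3.0]))

// Columns are collections of optional elements, so the usual
// Collection algorithms (compactMap, reduce, ...) apply.
let column = original["x", Double.self]
let total = column.compactMap { $0 }.reduce(0, +)

// Value semantics: `copy` is an independent value. Storage is
// shared until one side is mutated (copy-on-write).
var copy = original
copy["x", Double.self][0] = 99.0

print(total)
print(original["x", Double.self][0] as Any) // still the original value
print(copy["x", Double.self][0] as Any)     // the mutated value
```

Mutating `copy` leaves `original` untouched, which is the behaviour Swift developers already expect from Array and Dictionary.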
