I have not used MLRegressor, but I have some background in statistics.
The definition of maximumError in MLRegressor is:
"The largest absolute difference between the expected values and the model's predicted values during testing or training."
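To make that definition concrete, here is a minimal sketch in plain Swift of how the two metrics discussed in this thread could be computed by hand. The expected and predicted values are made up for illustration, and this is not necessarily how MLRegressor computes them internally.

```swift
import Foundation

// Hypothetical values for illustration only; not taken from the thread's data set.
let expected: [Double] = [1, 0, 1, 0]
let predicted: [Double] = [0.6, 0.5, 0.45, 0.3]

// Absolute difference between each expected value and its prediction.
let absoluteErrors = zip(expected, predicted).map { abs($0 - $1) }

// Largest single miss, which is what the definition above describes.
let maximumError = absoluteErrors.max() ?? 0

// Square root of the mean of the squared errors.
let sumOfSquares = absoluteErrors.map { $0 * $0 }.reduce(0, +)
let rootMeanSquaredError = sqrt(sumOfSquares / Double(absoluteErrors.count))

print(maximumError, rootMeanSquaredError)
```

Note that a single bad prediction is enough to raise the maximum error, while the root mean squared error reflects all predictions.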
Since the results can only be 0 or 1, that means an expected 1 always gets a prediction above 0.44, and an expected 0 always gets a prediction below 0.56.
So an estimator that compares the prediction to 0.5, deciding 0 if it is below 0.5 and 1 if it is above, should already give a pretty robust prediction.
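The thresholding idea above can be sketched as follows. The function name predictedOutcome and the scores are hypothetical, chosen only for illustration; sending a score of exactly 0.5 to 0 is an arbitrary choice.

```swift
// Hypothetical helper: turns a regressor's continuous output into a 0/1 call.
func predictedOutcome(fromScore score: Double, threshold: Double = 0.5) -> Int {
    return score > threshold ? 1 : 0
}

// Hypothetical scores, for illustration only.
let scores: [Double] = [0.44, 0.56, 0.61, 0.39]
let outcomes = scores.map { predictedOutcome(fromScore: $0) }
print(outcomes) // prints [0, 1, 1, 0]
```

One caveat: a maximum error of 0.56 only guarantees a score above 0.44 for an expected 1, so a score between 0.44 and 0.5 would still end up on the wrong side of the threshold.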
It also means that the winner really is influenced by playing at home or away (the winner is probably the home team more often).
So it is not a coin toss: with a coin toss, the maximum error would be close to 1 (in some cases, predicting 0 where it should be 1).
Have you measured the rootMeanSquaredError? That could help evaluate the error probability of the estimator above.
maximumError was 0.558 and rootMeanSquaredError was 0.498.
So it seems I completely misinterpreted these results. I must dig out my old statistics books!
Could you post the code of your example, so I can play with it a little?
In fact, it is a bit difficult to understand what this model training really means.
What is the output of the model: 0 or 1, or a value between 0 and 1? (It should be a value in between, otherwise the max error would necessarily be 0 or 1.)
But the point here is that 0.56 is the maximum error, not the average error. If the average error were 0.56, it would probably mean the choice is random. That's different from the max error.
A rootMeanSquaredError of 0.498 shows that the error is largely distributed between 0 and 0.56; I would guess an average error (if the model can provide it) of about 0.3.
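One way to sanity-check that guess is to bound the mean absolute error from the two reported metrics. This is a rough sketch that assumes both metrics come from the same evaluation; the bounds are standard inequalities, not something MLRegressor reports. The mean absolute error can never exceed the RMSE, and since no single error exceeds the maximum error, the mean squared error is at most the maximum error times the mean absolute error.

```swift
import Foundation

// Metrics reported in the thread.
let maximumError = 0.558
let rootMeanSquaredError = 0.498

// Mean of the squared errors.
let meanSquaredError = rootMeanSquaredError * rootMeanSquaredError

// Upper bound: the mean absolute error never exceeds the RMSE.
let upperBound = rootMeanSquaredError

// Lower bound: each squared error is at most maximumError times the absolute error,
// so meanSquaredError <= maximumError * meanAbsoluteError.
let lowerBound = meanSquaredError / maximumError

print(String(format: "mean absolute error between %.3f and %.3f", lowerBound, upperBound))
```

With these numbers the mean absolute error would fall between roughly 0.44 and 0.50, so the guess of about 0.3 may be too low.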
I wonder if a regressor is best suited in this case. Do you use a linear regressor (that's what I guess) or a decision tree?
As the real values are discrete, a linear regressor is probably not well suited.
Here's the code from the playground I'm using:
import CreateML
import Foundation

// Load the two CSV files from the playground's Resources folder.
let home_wins_A = Bundle.main.url(forResource: "home_wins_A", withExtension: "csv")
var dataTable_A = try MLDataTable(contentsOf: home_wins_A!)
let home_wins_B = Bundle.main.url(forResource: "home_wins_B", withExtension: "csv")
var dataTable_B = try MLDataTable(contentsOf: home_wins_B!)

// Regression: hold out 20% of the rows for evaluation, train on the rest.
let (evaluationTable_A, trainingTable_A) = dataTable_A.randomSplit(by: 0.2, seed: 5)
let regressor = try MLRegressor(trainingData: trainingTable_A, targetColumn: "result")
let regressorEvaluation = regressor.evaluation(on: evaluationTable_A)
regressorEvaluation.maximumError
regressorEvaluation.rootMeanSquaredError

// Classification: same split, with string labels as the target.
let (evaluationTable_B, trainingTable_B) = dataTable_B.randomSplit(by: 0.2, seed: 5)
let classifier = try MLClassifier(trainingData: trainingTable_B, targetColumn: "result")
let classifierEvaluation = classifier.evaluation(on: evaluationTable_B)
classifierEvaluation.classificationError
The file home_wins_A has three integer fields: home, away, and result. The first two are the id numbers of the teams. The last field is 1 if the home team won, and 0 otherwise. The file home_wins_B has three string fields with the same names. The first two are three-letter abbreviations of the team names (e.g. "MTL" for Montreal). The last field is "W" if the home team won, and "L" otherwise.
Both files were generated from the same data set, which lists the 11,434 games played in the NHL since 2010. Unfortunately, I don't see any way to attach the CSV files to this post.
Here are the results I'm now getting:
regressorEvaluation.maximumError = 0.852
regressorEvaluation.rootMeanSquaredError = 0.496
classifierEvaluation.classificationError = 0.45
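To put these numbers in context, it can help to compare them with a trivial baseline that ignores the teams entirely. The sketch below assumes, purely for illustration, that home teams win 55% of games; that rate is hypothetical and not stated in this thread, so replace it with the actual rate from the data set.

```swift
import Foundation

// Assumed home-win rate; hypothetical, not stated in the thread.
let homeWinRate = 0.55

// A model that always outputs the same score p has squared error
// (1 - p)^2 on home wins and p^2 on home losses.
func constantPredictorRMSE(score p: Double, winRate: Double) -> Double {
    let meanSquaredError = winRate * (1 - p) * (1 - p) + (1 - winRate) * p * p
    return sqrt(meanSquaredError)
}

// Always predicting "W" is wrong whenever the home team loses.
let alwaysHomeClassificationError = 1 - homeWinRate

// RMSE of always predicting the home-win rate itself.
let baselineRMSE = constantPredictorRMSE(score: homeWinRate, winRate: homeWinRate)
print(baselineRMSE, alwaysHomeClassificationError)
```

If the reported metrics turn out to be close to these baseline values, the models may be learning little beyond the overall home-win rate.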
Many thanks for your explanations: they've been very helpful!