The short answer is yes: from iOS 15 you can retrain your model on the device with CreateML, using the historical data your app has accumulated, and CreateML reports metrics on the trained model's accuracy. If you keep a permanent record of each model's accuracy, you can see whether the model's performance improves over time (or doesn't).
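As a rough illustration of that permanent record, here is a minimal sketch in plain Swift. It assumes you extract one accuracy figure per retraining run (for a classifier, for example, 1 minus the `classificationError` that CreateML reports on validation data). The `AccuracyRecord` and `AccuracyHistory` types, the `minGain` threshold and the accuracy values in the usage lines are all hypothetical and invented for illustration, and saving the history to disk is left out.

```swift
// Hypothetical record of one retraining run; Codable so it could be
// persisted (e.g. with JSONEncoder) between app launches.
struct AccuracyRecord: Codable {
    let modelVersion: Int
    let sampleCount: Int   // number of historical rows used for training
    let accuracy: Double   // e.g. 1 - classificationError on held-out data
}

// Hypothetical append-only log of every model's accuracy.
struct AccuracyHistory {
    private(set) var records: [AccuracyRecord] = []

    mutating func append(_ record: AccuracyRecord) {
        records.append(record)
    }

    // Accuracy change from the previous model to the latest one,
    // or nil until at least two models have been recorded.
    var latestChange: Double? {
        guard records.count >= 2 else { return nil }
        return records[records.count - 1].accuracy - records[records.count - 2].accuracy
    }

    // True when the latest model beats the best earlier model by less than `minGain`.
    func hasPlateaued(minGain: Double = 0.01) -> Bool {
        guard records.count >= 2, let latest = records.last else { return false }
        let bestEarlier = records.dropLast().map { $0.accuracy }.max() ?? 0
        return latest.accuracy - bestEarlier < minGain
    }
}

// Illustrative (made-up) accuracies for three successive models.
var history = AccuracyHistory()
history.append(AccuracyRecord(modelVersion: 1, sampleCount: 200, accuracy: 0.71))
history.append(AccuracyRecord(modelVersion: 2, sampleCount: 400, accuracy: 0.78))
history.append(AccuracyRecord(modelVersion: 3, sampleCount: 600, accuracy: 0.785))
print(history.hasPlateaued())   // true: 0.785 is less than 0.01 above 0.78
```

Comparing against the best earlier model, rather than only the previous one, avoids treating a recovery from a bad retrain as genuine progress.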
However, in the example you give (choice of clothing), one has to ask: what data are used to train the model? Or, more pertinently, on what basis does the user decide what to wear? Just recording the attire (pants) and colour (blue) only gives you the outputs (choices) of the process, not the inputs (reasons for wearing something). For myself, I choose my attire depending on the day's activities and the weather: maybe a pantsuit if it's cold, a dress if it's warm. If I have a business appointment, I might choose my power colours. If it's lunch with girlfriends, something floral. So, if your app looks at the day's appointments and the weather forecast, offers a suggested choice of clothing, and then records the actual choice(s), you have a basis for prediction. But a prediction of what? The user's mood? Choosing a dress on a cold day isn't necessarily "wrong".
In your clothing example, I don't think that CreateML and CoreML are the way to go. As you hinted, the real issue is giving the user an informed choice of clothing depending on the circumstances. You can build up a profile of what they wear given the day, their schedule and the weather, and then adjust the suggestions you present accordingly, based on simple descriptive statistics (e.g. how often each outfit is chosen in each situation).
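A minimal sketch of such a frequency-based profile, with no machine learning involved, might look like this. The `Weather` and `Occasion` categories, the `OutfitProfile` type and the outfit names are hypothetical simplifications; a real app would derive the context from the user's calendar and a weather forecast, and would record the outfit the user actually chose.

```swift
// Hypothetical, deliberately coarse context categories.
enum Weather: Hashable { case cold, mild, warm }
enum Occasion: Hashable { case business, social, casual }

struct Context: Hashable {
    let weather: Weather
    let occasion: Occasion
}

// Counts how often each outfit was actually worn in each context.
struct OutfitProfile {
    private var counts: [Context: [String: Int]] = [:]

    mutating func recordChoice(_ outfit: String, in context: Context) {
        counts[context, default: [:]][outfit, default: 0] += 1
    }

    // Outfits for this context, most frequently worn first (ties broken alphabetically).
    func suggestions(for context: Context, limit: Int = 3) -> [String] {
        guard let seen = counts[context] else { return [] }
        let ranked = seen.sorted { a, b in
            a.value != b.value ? a.value > b.value : a.key < b.key
        }
        return ranked.prefix(limit).map { $0.key }
    }
}

var profile = OutfitProfile()
let coldBusiness = Context(weather: .cold, occasion: .business)
profile.recordChoice("navy pantsuit", in: coldBusiness)
profile.recordChoice("navy pantsuit", in: coldBusiness)
profile.recordChoice("grey trousers", in: coldBusiness)
print(profile.suggestions(for: coldBusiness))   // ["navy pantsuit", "grey trousers"]
```

Because the suggestions are just ranked counts, a "surprising" choice (a dress on a cold day) simply becomes one more count rather than an error the model has to explain.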
The key reason to use machine learning is to predict an outcome (output) from a given set of circumstances (inputs), and that prediction needs to be as accurate as possible. A good example is predicting a runner's 10K finish time from their pace, cumulative time, heart rate, etc., at each kilometre, i.e. a progressive prediction (9 predictions in a 10 km race). If the predictions consistently say 36 minutes and the runner finishes in 40, oops! The model needs retraining with more data (and more factors, e.g. hills). With attire, if the user chooses a red dress instead of blue, so what?
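To make the "oops" concrete, here is a minimal sketch of checking a race's progressive predictions against the actual finish time. The `SplitPrediction` type, the use of mean absolute error and the 1-minute tolerance are assumptions chosen for illustration, not anything CreateML prescribes.

```swift
// One progressive prediction: made at `kilometre`, forecasting the finish time in minutes.
struct SplitPrediction {
    let kilometre: Int
    let predictedFinish: Double
}

// Mean absolute error (minutes) of the in-race predictions against the actual finish.
func meanAbsoluteError(_ predictions: [SplitPrediction], actualFinish: Double) -> Double {
    guard !predictions.isEmpty else { return 0 }
    let total = predictions.reduce(0.0) { $0 + abs($1.predictedFinish - actualFinish) }
    return total / Double(predictions.count)
}

// Hypothetical rule: flag the model for retraining when the average miss
// exceeds a tolerance chosen by the developer.
func needsRetraining(_ predictions: [SplitPrediction], actualFinish: Double,
                     toleranceMinutes: Double = 1.0) -> Bool {
    meanAbsoluteError(predictions, actualFinish: actualFinish) > toleranceMinutes
}

// The scenario from the text: nine predictions of 36 minutes, actual finish 40.
let race = (1...9).map { SplitPrediction(kilometre: $0, predictedFinish: 36.0) }
let raceError = meanAbsoluteError(race, actualFinish: 40.0)
print(raceError)                                  // 4.0
print(needsRetraining(race, actualFinish: 40.0))  // true
```

Averaging weights every kilometre equally; you might prefer to weight later predictions more heavily, or use root-mean-squared error to penalise large misses more.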
Best wishes and regards, Michaela