I am training with about 50,000 images and 10 classes for 2000 iterations in the Create ML App. I am seeing swings in loss from 12 down to 3, then jumping back up to 12 again. It then trends back down towards 3. It has done this 10 times in the first 1000 iterations. Is this normal during training? I expected loss to trend towards a lower value. Why does it keep jumping back up? Is there something I can do to improve the training data? Do I have too many images? Could the ordering of my training data be an issue? There are long runs of one label followed by long runs of another, although overall the class counts are balanced. Is it looking at all the training images on each iteration? (I think it may be holding some back for validation, but apart from those.)
Reading up on this, I think I may have two issues. First, there is a bug in the balancing logic in my augmentation code, so my training data is actually unbalanced. Second, this behaviour can arise when the batch size is too small: each iteration then trains on only a small sample of the training set, which will, statistically, be unbalanced. Memory usage is not that high, so I could afford a larger batch size, but I cannot see a way to set it in the Create ML App. I will try to fix the bugs and report back. It looks like reducing the training set size helps, and so does reducing the imbalance.
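For anyone hitting the same thing, here is a minimal sketch of how I am checking for both problems before training: counting images per class to catch a balancing bug, and shuffling the labels so long runs of a single class are broken up before batching. The label names and counts below are hypothetical, not from my actual data set.

```python
import random
from collections import Counter

def class_deviations(labels):
    """Return each class's count and its deviation from a perfectly balanced split."""
    counts = Counter(labels)
    expected = len(labels) / len(counts)
    return {label: (n, n - expected) for label, n in sorted(counts.items())}

def shuffled(labels, seed=0):
    """Shuffle a copy of the label list so batches mix classes instead of
    seeing long same-class runs (deterministic via a fixed seed)."""
    rng = random.Random(seed)
    out = list(labels)
    rng.shuffle(out)
    return out

# Hypothetical data with a balancing bug: "cat" is over-represented.
labels = ["cat"] * 120 + ["dog"] * 100 + ["bird"] * 80

for label, (count, dev) in class_deviations(labels).items():
    print(f"{label}: {count} images ({dev:+.0f} vs. balanced)")

labels = shuffled(labels)  # break up the same-class runs before batching
```

Any class with a large positive or negative deviation points to the balancing bug; if the deviations are near zero but the loss still oscillates, the long same-class runs (fixed by the shuffle) or a too-small batch size are the more likely culprits.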