So what's happening in all three cases is that your model is predicting "left" 100% of the time. That gives you 100% recall, but only for that one class, so it's not a great model: there's no separation between the classes. The F1 score looks okay because every actual "left" example gets labeled "left", but the model isn't really useful. Would you be able to share some of the training data so we can take a closer look? There are a couple of different things that could be happening:
- The model hasn't trained long enough to be useful
- There isn't enough data, and/or the data doesn't contain enough signal to distinguish "left" from "right" (or "other"). If the classes are hard to tell apart, it will be harder to train an accurate model.
- You've found a bug! The data will help us determine if this is the case so that we can fix it.
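To see why the metrics look misleadingly good, here's a small sketch (with made-up labels, assuming scikit-learn is available) of a model that always predicts "left": recall for "left" is perfect, precision is just the class proportion, and the other classes score zero.

```python
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical 3-class dataset where 6 of 10 examples are "left",
# and a degenerate model that predicts "left" every time.
y_true = ["left"] * 6 + ["right"] * 3 + ["other"] * 1
y_pred = ["left"] * 10

prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=["left", "right", "other"], zero_division=0
)

# Recall for "left" is 1.0 (all left examples are caught),
# precision is 0.6 (the fraction of the data that is "left"),
# and "right"/"other" get 0 across the board.
for name, p, r, f in zip(["left", "right", "other"], prec, rec, f1):
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

A per-class breakdown like this (or a confusion matrix) makes the collapse obvious in a way that a single averaged score doesn't.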