I've got a little project running simulations of a tennis match. This mostly involves a great number of loops generating random numbers, if statements to decide who gets the point, and incrementing tallies to keep track of points / games / sets / matches / tournaments.
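Roughly, the innermost loop looks something like this (a simplified sketch with made-up names and probabilities, not the actual code):

```swift
import Foundation

// Simplified sketch of a single game: first to 4 points, win by 2.
// serverWinProbability is a placeholder parameter.
func simulateGame(serverWinProbability: Double) -> Bool {
    var serverPoints = 0
    var returnerPoints = 0
    while max(serverPoints, returnerPoints) < 4
            || abs(serverPoints - returnerPoints) < 2 {
        if drand48() < serverWinProbability {
            serverPoints += 1    // server takes the point
        } else {
            returnerPoints += 1  // returner takes the point
        }
    }
    return serverPoints > returnerPoints  // true if the server holds
}
```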
I run 10000 simulations (to give a resulting probability distribution), and the process takes 21s to come up with an answer when run sequentially as a single task in a task group. So now I decide to run the same number of simulations, but this time adding 2 tasks of 5000 simulations each to the group.
This time the result takes 12.5s. Woohoo I think. That's a 40% reduction in wall clock time.
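The parallel version is structured roughly like this (a minimal sketch, not the project code; simulateOneMatch stands in for the full tournament logic):

```swift
import Foundation

// Placeholder for the real simulation: player 0 wins with probability 0.5.
func simulateOneMatch() -> Int {
    drand48() < 0.5 ? 0 : 1
}

// Split `total` simulations across `taskCount` child tasks and
// merge the per-task win tallies as each task finishes.
func runSimulations(total: Int, taskCount: Int) async -> [Int] {
    let perTask = total / taskCount
    return await withTaskGroup(of: [Int].self) { group in
        for _ in 0..<taskCount {
            group.addTask {
                var wins = [0, 0]
                for _ in 0..<perTask {
                    wins[simulateOneMatch()] += 1
                }
                return wins
            }
        }
        var totals = [0, 0]
        for await wins in group {
            totals[0] += wins[0]
            totals[1] += wins[1]
        }
        return totals
    }
}
```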
Of course the next thought is: if less is more, then how much more will more be?!? So I keep increasing the task count.
To my disappointment, not only did the marginal returns diminish, they actually went negative:
- 1x = 21s
- 2x = 12.5s
- 3x = 9.9s
- 4x = 10.6s
- 5x = 10.7s
- 100x = 13.9s
Can someone tell me whether this is similar to this posting, where the suspicion was memory bandwidth limitations?
I can't really include any code as it is quite long. I can say I'm using drand48 for randomness (as it is much faster than Double.random).
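For context, this is the difference (note that drand48() reads and advances a single process-wide seed buffer, which every task ends up sharing):

```swift
import Foundation

srand48(42)                      // seeds the one global 48-bit state
let a = drand48()                // fast, but all threads share that state
let b = Double.random(in: 0..<1) // slower, uses the system generator
```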
There is one tournament simulator class created for every task, with the match simulator structs it manages being created once but mutated constantly.
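In outline (all names invented for illustration):

```swift
// Value-type match state, created once and reset/mutated on every run.
struct MatchSimulator {
    var pointsA = 0
    var pointsB = 0
    mutating func reset() { pointsA = 0; pointsB = 0 }
}

// One of these reference types exists per task, linking matches so
// that winners of one round feed into the next.
final class TournamentSimulator {
    var matches: [MatchSimulator]
    init(matchCount: Int) {
        matches = Array(repeating: MatchSimulator(), count: matchCount)
    }
}
```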
This is on an M1 Max chip running macOS 12 and Xcode 14 (beta 5). I'm also running the app as an archived (i.e. release-optimised) build.
If memory bandwidth is indeed the most likely problem, then I can live with knowing that. I just don't want to assume it's a bottleneck which can't be overcome and miss an optimisation opportunity in my code.