I recently wrote some code for a basic GAN (I am learning about neural networks, so I'm not an expert) and got very strange results. Unable to debug it, I tested someone else's code that I know works and still got the same results. When running a GAN to generate digits from the MNIST dataset, the images produced each epoch are identical to each other and don't resemble digits at all. An example of the images produced can be seen below.
Rerunning the same code on Google Colab, and locally on my machine with standard TensorFlow (i.e. without the Metal plugin), gives the expected results of images resembling digits.
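For anyone trying to reproduce this, a quick way to confirm whether the Metal GPU is actually visible to TensorFlow in a given environment is something like the snippet below (this is just a device check added for context, not part of the linked notebook):

```python
import tensorflow as tf

# With tensorflow-metal installed, an Apple GPU device should show up in
# this list; a TensorFlow install without the plugin will list only the CPU.
print(tf.__version__)
print(tf.config.list_physical_devices())
```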
The code I used to test this can be found here: https://github.com/PacktPublishing/Deep-Learning-with-TensorFlow-2-and-Keras/blob/master/Chapter%206/VanillaGAN.ipynb
I am using these versions of the relevant software: tensorflow-metal 0.5.0, tensorflow-macos 2.9.2, macOS Monterey 12.3.
I would be grateful if Apple engineers could advise, or give a timeframe for a solution, please.
Hi @90jtip
Thanks for reporting this issue and providing the script to reproduce it. It does look like something is going wrong with the GPU implementation on this network. Based on a quick look, the GAN uses fairly simple layers, so I'm hoping the debugging process won't get too cumbersome. I can't make any promises on the timeline of the resolution here, but in the meantime you can use a with tf.device('/CPU:0'): block to enclose parts of your code, or the whole script, to restrict it to running on the CPU. Since the issue is most likely in the GPU implementation, this should sort out the correctness problem at the cost of longer runtimes from not being able to use the GPU, but hopefully it will be enough while you are learning and working through the examples in the book.
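For example, here is a minimal sketch of how the device scope can be applied; the generator below is only a placeholder, not the notebook's actual model:

```python
import tensorflow as tf

# Everything created and executed inside this scope is pinned to the CPU,
# bypassing the tensorflow-metal GPU kernels while the correctness issue
# is being investigated. The layers and sizes here are placeholders.
with tf.device('/CPU:0'):
    generator = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation='relu', input_shape=(100,)),
        tf.keras.layers.Dense(28 * 28, activation='tanh'),
        tf.keras.layers.Reshape((28, 28, 1)),
    ])
    noise = tf.random.normal([16, 100])
    fake_images = generator(noise)
    print(fake_images.device)  # should report a CPU device
```

The same with block can be wrapped around the training loop (or the whole script body) so that both the generator and discriminator updates run on the CPU.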