Crash when running custom train step and layers
My environment: TensorFlow 2.14, tf-metal 1.1, M3 Max.

I am working on a GAN that makes heavy use of residual sums and concatenations. It trains correctly when using the CPU only. However, if I enable the GPU, it crashes with:

oc("mps_slice_1"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/d615290d-668b-11ee-9734-0697ca55970a/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":359:0)): error: 'mps.slice' op failed: length value 32 does not fit within the dimension size (33) with start value (32)
/AppleInternal/Library/BuildRoots/d615290d-668b-11ee-9734-0697ca55970a/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:2133: failed assertion `Error: MLIR pass manager failed'

Some customizations that I guess might be related to the error:
- tf.bitwise.bitwise_xor, tf.concat, and tf.pad in custom layers
- numpy.random in the train step

Another debugging hint: the "32" in the message is the number of channels in my model's conv layers, and it changes when I change the number of channels.

Does anyone know what is wrong? Thank you so much.
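To make the failing check concrete, here is a minimal sketch of the bounds test that the error message appears to describe. The function name `slice_fits` is hypothetical, and this is only a reading of the message text, not the actual MPSGraph implementation: a slice of a given length starting at a given index must end within the dimension.

```python
def slice_fits(dim_size: int, start: int, length: int) -> bool:
    """Return True if the range [start, start + length) lies inside [0, dim_size).

    Hypothetical helper: it mirrors the wording of the 'mps.slice' error,
    not the real MPSGraph code.
    """
    return start >= 0 and length >= 0 and start + length <= dim_size


# Values taken from the error message: dimension size 33, start 32, length 32.
print(slice_fits(dim_size=33, start=32, length=32))  # 32 + 32 = 64 exceeds 33

# For comparison, slices that would fit in a dimension of size 33.
print(slice_fits(dim_size=33, start=0, length=32))
print(slice_fits(dim_size=33, start=32, length=1))
```

With the reported values the slice would have to end at index 64 in a dimension of size 33, so the check fails. The message alone does not say which op produces the size-33 dimension.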
Dec ’23