Post · Replies · Boosts · Views · Activity
Tensorflow-metal training with l2 regularizer much slower than without regularizer
Hi,

When I try to train ResNet-50 with tensorflow-metal, I found that the l2 regularizer makes each epoch take almost 4x as long (~220ms instead of 60ms). I'm on an M1 Max 16" MBP. It seems like regularization shouldn't add that much time. Is there anything I can do to make it faster?

Here's some sample code that reproduces the issue:

    import tensorflow as tf
    from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, ZeroPadding2D,\
        Flatten, BatchNormalization, AveragePooling2D, Dense, Activation, Add
    from tensorflow.keras.regularizers import l2
    from tensorflow.keras.models import Model
    from tensorflow.keras import activations
    import random
    import numpy as np

    random.seed(1234)
    np.random.seed(1234)
    tf.random.set_seed(1234)

    batch_size = 64

    (train_im, train_lab), (test_im, test_lab) = tf.keras.datasets.cifar10.load_data()
    train_im, test_im = train_im/255.0 , test_im/255.0
    train_lab_categorical = tf.keras.utils.to_categorical(
        train_lab, num_classes=10, dtype='uint8')

    train_DataGen = tf.keras.preprocessing.image.ImageDataGenerator()
    train_set_data = train_DataGen.flow(train_im, train_lab, batch_size=batch_size, shuffle=False)

    # Change this to l2 for it to train much slower
    regularizer = None # l2(0.001)

    def res_identity(x, filters):
        x_skip = x
        f1, f2 = filters

        x = Conv2D(f1, kernel_size=(1, 1), strides=(1, 1), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)
        x = Activation(activations.relu)(x)

        x = Conv2D(f1, kernel_size=(3, 3), strides=(1, 1), padding='same', use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)
        x = Activation(activations.relu)(x)

        x = Conv2D(f2, kernel_size=(1, 1), strides=(1, 1), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)

        x = Add()([x, x_skip])
        x = Activation(activations.relu)(x)

        return x

    def res_conv(x, s, filters):
        x_skip = x
        f1, f2 = filters

        x = Conv2D(f1, kernel_size=(1, 1), strides=(s, s), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)
        x = Activation(activations.relu)(x)

        x = Conv2D(f1, kernel_size=(3, 3), strides=(1, 1), padding='same', use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)
        x = Activation(activations.relu)(x)

        x = Conv2D(f2, kernel_size=(1, 1), strides=(1, 1), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x)
        x = BatchNormalization()(x)

        x_skip = Conv2D(f2, kernel_size=(1, 1), strides=(s, s), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x_skip)
        x_skip = BatchNormalization()(x_skip)

        x = Add()([x, x_skip])
        x = Activation(activations.relu)(x)

        return x

    input = Input(shape=(train_im.shape[1], train_im.shape[2], train_im.shape[3]), batch_size=batch_size)
    x = ZeroPadding2D(padding=(3, 3))(input)

    x = Conv2D(64, kernel_size=(7, 7), strides=(2, 2), use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation(activations.relu)(x)
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)

    x = res_conv(x, s=1, filters=(64, 256))
    x = res_identity(x, filters=(64, 256))
    x = res_identity(x, filters=(64, 256))

    x = res_conv(x, s=2, filters=(128, 512))
    x = res_identity(x, filters=(128, 512))
    x = res_identity(x, filters=(128, 512))
    x = res_identity(x, filters=(128, 512))

    x = res_conv(x, s=2, filters=(256, 1024))
    x = res_identity(x, filters=(256, 1024))
    x = res_identity(x, filters=(256, 1024))
    x = res_identity(x, filters=(256, 1024))
    x = res_identity(x, filters=(256, 1024))
    x = res_identity(x, filters=(256, 1024))

    x = res_conv(x, s=2, filters=(512, 2048))
    x = res_identity(x, filters=(512, 2048))
    x = res_identity(x, filters=(512, 2048))

    x = AveragePooling2D((2, 2), padding='same')(x)
    x = Flatten()(x)
    x = Dense(10, activation='softmax', kernel_initializer='he_normal')(x)

    model = Model(inputs=input, outputs=x, name='Resnet50')

    opt = tf.keras.optimizers.legacy.SGD(learning_rate = 0.01)
    model.compile(loss=tf.keras.losses.CategoricalCrossentropy(reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE), optimizer=opt)

    # Integer division: steps_per_epoch should be a whole number of batches
    model.fit(x=train_im, y=train_lab_categorical, batch_size=batch_size, epochs=150, steps_per_epoch=train_im.shape[0] // batch_size)
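For context on why the slowdown looks out of proportion, here is a minimal sketch of what an L2 kernel regularizer contributes mathematically: a penalty of `lam * sum(w**2)` per kernel, and a gradient term of `2 * lam * w`. It is plain Python over flattened weight lists, and the function names are hypothetical. It does not reproduce or explain the tensorflow-metal behavior; it only shows that the extra work is one elementwise pass per kernel.

```python
def l2_penalty(kernels, lam):
    """Total L2 penalty: lam * sum(w**2), summed over every kernel."""
    return sum(lam * sum(w * w for w in kernel) for kernel in kernels)


def l2_penalty_grad(kernel, lam):
    """Gradient of lam * sum(w**2) with respect to each weight: 2 * lam * w."""
    return [2.0 * lam * w for w in kernel]


def sgd_step(kernel, data_grad, lam, lr):
    """One plain SGD update on the data loss plus the L2 penalty."""
    reg_grad = l2_penalty_grad(kernel, lam)
    return [w - lr * (g + r) for w, g, r in zip(kernel, data_grad, reg_grad)]


if __name__ == "__main__":
    # Two tiny "kernels", flattened to lists of floats.
    kernels = [[1.0, 2.0], [3.0]]
    print(l2_penalty(kernels, 0.001))
    print(sgd_step([1.0, 2.0], [0.0, 0.0], lam=0.5, lr=0.1))
```

With plain SGD the penalty gradient just shrinks each weight by a factor of `1 - 2 * lr * lam` per step, so one experiment worth trying (an assumption, not a confirmed fix) is applying that shrink manually with `regularizer = None` and comparing step times.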
Replies: 0 · Boosts: 0 · Views: 652 · Activity: Nov ’23
MPSGraph randomTensor works for inference but crashes when training
I'm trying to use the randomTensor function from MPSGraph to initialize the weights of a fully connected layer. I can create the graph and run inference using the randomly initialized values, but when I try to train and update these randomly initialized weights, I'm hitting a crash:

    Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 578.

I can train the graph if I instead initialize the weights myself on the CPU, but I thought using the randomTensor functions would be faster and allow initialization to occur on the GPU.

Here's my code for building the graph, including both methods of weight initialization:

    func buildGraph(variables: inout [MPSGraphTensor]) -> (MPSGraphTensor, MPSGraphTensor, MPSGraphTensor, MPSGraphTensor) {
      let inputPlaceholder = graph.placeholder(shape: [2], dataType: .float32, name: nil)
      let labelPlaceholder = graph.placeholder(shape: [1], name: nil)

      // This works for inference but not training
      let descriptor = MPSGraphRandomOpDescriptor(distribution: .uniform, dataType: .float32)!
      let weightTensor = graph.randomTensor(withShape: [2, 1], descriptor: descriptor, seed: 2, name: nil)

      // This works for inference and training
      // let weights = [Float](repeating: 1, count: 2)
      // let weightTensor = graph.variable(with: Data(bytes: weights, count: 2 * MemoryLayout<Float32>.size), shape: [2, 1], dataType: .float32, name: nil)

      variables += [weightTensor]

      let output = graph.matrixMultiplication(primary: inputPlaceholder, secondary: weightTensor, name: nil)
      let loss = graph.softMaxCrossEntropy(output, labels: labelPlaceholder, axis: -1, reuctionType: .sum, name: nil)

      return (inputPlaceholder, labelPlaceholder, output, loss)
    }

And to run the graph I have the following in my sample view controller:

    override func viewDidLoad() {
      super.viewDidLoad()

      var variables: [MPSGraphTensor] = []
      let (inputPlaceholder, labelPlaceholder, output, loss) = buildGraph(variables: &variables)
      let gradients = graph.gradients(of: loss, with: variables, name: nil)
      let learningRate = graph.constant(0.001, dataType: .float32)

      var updateOps: [MPSGraphOperation] = []
      for (key, value) in gradients {
        let updates = graph.stochasticGradientDescent(learningRate: learningRate, values: key, gradient: value, name: nil)
        let assign = graph.assign(key, tensor: updates, name: nil)
        updateOps += [assign]
      }

      let commandBuffer = MPSCommandBuffer(commandBuffer: Self.commandQueue.makeCommandBuffer()!)
      let executionDesc = MPSGraphExecutionDescriptor()
      // The second closure argument (the error) is unused, so it is ignored with `_`
      executionDesc.completionHandler = { (resultsDictionary, _) in
        for (key, value) in resultsDictionary {
          var output: [Float] = [0]
          value.mpsndarray().readBytes(&output, strideBytes: nil)
          print(output)
        }
      }

      let inputDesc = MPSNDArrayDescriptor(dataType: .float32, shape: [2])
      let input = MPSNDArray(device: Self.device, descriptor: inputDesc)
      var inputArray: [Float] = [1, 2]
      input.writeBytes(&inputArray, strideBytes: nil)
      let source = MPSGraphTensorData(input)

      let labelMPSArray = MPSNDArray(device: Self.device, descriptor: MPSNDArrayDescriptor(dataType: .float32, shape: [1]))
      var labelArray: [Float] = [1]
      labelMPSArray.writeBytes(&labelArray, strideBytes: nil)
      let label = MPSGraphTensorData(labelMPSArray)

      // This runs inference and works
      // graph.encode(to: commandBuffer, feeds: [inputPlaceholder: source], targetTensors: [output], targetOperations: [], executionDescriptor: executionDesc)
      //
      // commandBuffer.commit()
      // commandBuffer.waitUntilCompleted()

      // This trains but does not work
      graph.encode(
        to: commandBuffer,
        feeds: [inputPlaceholder: source, labelPlaceholder: label],
        targetTensors: [],
        targetOperations: updateOps,
        executionDescriptor: executionDesc)

      commandBuffer.commit()
      commandBuffer.waitUntilCompleted()
    }

And a few other relevant variables are created at the class scope:

    let graph = MPSGraph()
    static let device = MTLCreateSystemDefaultDevice()!
    static let commandQueue = device.makeCommandQueue()!

How can I use these randomTensor functions on MPSGraph to randomly initialize weights for training?
Replies: 1 · Boosts: 0 · Views: 1.5k · Activity: Apr ’23
Error running UI tests on device with iOS 16 and xctestrun file
When I generate an xctestrun file using Xcode 14 with:

    xcodebuild build-for-testing -project UITestTesting.xcodeproj -scheme UITestTesting -destination 'id=00008020-000545362EE1002E' -derivedDataPath ./build

I'm able to see the xctestrun file at "/Users/noahmartin/Desktop/UITestTesting/Build/Build/Products/UITestTesting_iphoneos16.0-arm64.xctestrun"

However, I can't use it to run tests on my iOS 16 iPad. I'm trying to invoke the test like this:

    xcodebuild test-without-building -xctestrun "/Users/noahmartin/Desktop/UITestTesting/Build/Build/Products/UITestTesting_iphoneos16.0-arm64.xctestrun " -destination 'id=00008020-000545362EE1002E' -derivedDataPath ./build

And getting this error:

    xcodebuild: error: Unable to find a destination matching the provided destination specifier:
        { id:00008020-000545362EE1002E }

    Available destinations for the "Transient Testing" scheme:
        { platform:macOS, arch:arm64e, id:00006001-0012198A0102801E }
        { platform:macOS, arch:arm64, id:00006001-0012198A0102801E }
        { platform:macOS, arch:x86_64, id:00006001-0012198A0102801E }
        { platform:macOS, arch:arm64e, variant:Mac Catalyst, id:00006001-0012198A0102801E }
        { platform:macOS, arch:arm64, variant:Mac Catalyst, id:00006001-0012198A0102801E }
        { platform:macOS, arch:x86_64, variant:Mac Catalyst, id:00006001-0012198A0102801E }
        { platform:macOS, arch:arm64e, variant:DriverKit, id:00006001-0012198A0102801E }
        { platform:macOS, arch:arm64, variant:DriverKit, id:00006001-0012198A0102801E }
        { platform:macOS, arch:arm64, variant:Designed for [iPad,iPhone], id:00006001-0012198A0102801E }
        { platform:iOS Simulator, id:3113F5B7-7358-4ADE-9660-667872D113A7, OS:16.0, name:iPad (9th generation) }
        { platform:iOS Simulator, id:D7232872-AC19-4DF1-BF18-D23C2C2A5BFE, OS:16.0, name:iPad Air (5th generation) }
        { platform:iOS Simulator, id:EE6444F9-2664-4907-9481-08CE44D028E9, OS:16.0, name:iPad Pro (9.7-inch) }
        { platform:iOS Simulator, id:10063708-10B3-4510-A41F-E948B2D09DB3, OS:16.0, name:iPad Pro (11-inch) (3rd generation) }
        { platform:iOS Simulator, id:794714E1-25E0-474A-BA79-F5E0EFA3E646, OS:16.0, name:iPad Pro (12.9-inch) (5th generation) }
        { platform:iOS Simulator, id:809FAD25-CFE5-44CA-B642-0FF2A0E638E5, OS:16.0, name:iPad mini (6th generation) }
        { platform:iOS Simulator, id:055D580A-859B-42F8-B306-BD8CB68BF1CC, OS:16.0, name:iPhone 8 }
        { platform:iOS Simulator, id:228440FE-9536-401D-9221-E909761468FE, OS:16.0, name:iPhone 8 Plus }
        { platform:iOS Simulator, id:D2F30B3E-32F9-44B2-9B02-20A1B6FEDFC3, OS:16.0, name:iPhone 11 }
        { platform:iOS Simulator, id:1677A963-4BBC-4DE3-896B-2D93FBF10A1B, OS:16.0, name:iPhone 11 Pro }
        { platform:iOS Simulator, id:67F569A4-B422-406F-961A-C9093F4EA861, OS:16.0, name:iPhone 11 Pro Max }
        { platform:iOS Simulator, id:927F62C4-CB4E-487A-B021-0A986B9B700A, OS:16.0, name:iPhone 12 }
        { platform:iOS Simulator, id:AAAE026B-A40E-4D84-9447-91B2034B6FDA, OS:16.0, name:iPhone 12 Pro }
        { platform:iOS Simulator, id:96E6D2AB-140B-496F-8C31-71A12412AB0C, OS:16.0, name:iPhone 12 Pro Max }
        { platform:iOS Simulator, id:B68ACAB0-A87A-4384-B9F8-63ECF09D9564, OS:16.0, name:iPhone 12 mini }
        { platform:iOS Simulator, id:E6FFBA89-6B9E-4A76-A99E-0510E1EFE97B, OS:16.0, name:iPhone 13 }
        { platform:iOS Simulator, id:0ECA5023-8C7B-4CFA-B264-E0539509E61A, OS:16.0, name:iPhone 13 Pro }
        { platform:iOS Simulator, id:203AC07C-503F-476E-9F14-DA7FA4D63131, OS:16.0, name:iPhone 13 Pro Max }
        { platform:iOS Simulator, id:16AA1AE3-1D75-4358-A4FB-8A73141A0551, OS:16.0, name:iPhone 13 mini }
        { platform:iOS Simulator, id:FAC28F61-2CF8-40C1-AB2D-64CE6F11B96B, OS:16.0, name:iPhone SE (3rd generation) }

    Ineligible destinations for the "Transient Testing" scheme:
        { platform:iOS, id:dvtdevice-DVTiPhonePlaceholder-iphoneos:placeholder, name:Any iOS Device }
        { platform:iOS Simulator, id:dvtdevice-DVTiOSDeviceSimulatorPlaceholder-iphonesimulator:placeholder, name:Any iOS Simulator Device }
        { platform:macOS, name:Any Mac }
        { platform:macOS, variant:Mac Catalyst, name:Any Mac }
        { platform:tvOS, id:dvtdevice-DVTiOSDevicePlaceholder-appletvos:placeholder, name:Any tvOS Device }
        { platform:tvOS Simulator, id:dvtdevice-DVTiOSDeviceSimulatorPlaceholder-appletvsimulator:placeholder, name:Any tvOS Simulator Device }
        { platform:watchOS, id:dvtdevice-DVTiOSDevicePlaceholder-watchos:placeholder, name:Any watchOS Device }
        { platform:watchOS Simulator, id:dvtdevice-DVTiOSDeviceSimulatorPlaceholder-watchsimulator:placeholder, name:Any watchOS Simulator Device }

I am able to launch the test directly from Xcode, or with the "project" and "scheme" flags and xcodebuild test, but I can't get the xctestrun file to work with iOS 16. This method works fine for iOS 15/Xcode 13. Is there anything I'm missing to make this work with the new betas?

Also filed as FB10129497
Replies: 4 · Boosts: 2 · Views: 3.5k · Activity: Jun ’22
Larger app store download size than local LZFSE compression
I'm investigating why the download size reported by the App Store (shown in an alert when downloading an app in Low Data Mode) is larger than the size I get locally. I've copied the example from https://developer.apple.com/documentation/accelerate/compressing_file_system_directories to compress my already thinned .app with LZFSE. The resulting file is about 10MB smaller than what I see on the App Store. However, if I repeat the process for an app that has a binary of about the same size but fewer resource files in the app bundle, my local archive is within 1MB of the App Store download size. The app thinning report built into Xcode seems to have similar inaccuracies with app bundles containing many files.

I know the App Store builds have additional code signing and encryption, but this is only a problem with apps that contain lots of files (specifically localizable strings) and isn't much of an issue with apps that are mostly a binary file. Are there any other differences between what is downloaded from the App Store and my local .app that could cause the larger App Store download size? Also, is the App Store using LZFSE or another algorithm like GZIP?
Replies: 0 · Boosts: 0 · Views: 731 · Activity: Jan ’22
Slower xctrace export with Xcode 13
I'm using xcrun xctrace export --output results.xml --input test_trace.trace --xpath '//trace-toc[1]/run[1]/data[1]/table' to export data from a test run with Instruments as part of my app's CI. With Xcode 12 this resulted in an XML file that I could parse relatively quickly, but now with Xcode 13 the export process itself takes 90+ seconds and generates a 160 MB XML file for a 10 second recording. I noticed that the table that has grown is the one with the time-sample schema. Just attempting to export this table with --xpath '//trace-toc[1]/run[1]/data[1]/table[4]' takes quite a while. The table has about 790 thousand rows. I'm using a custom instrument based on the Time Profiler, and still have about the same number of stack trace samples in my output. Did anything change in Xcode 13 that caused Instruments to include many more time samples that don't correspond to a stack trace? Is it possible to disable this so that my trace has fewer time samples (while preserving the stack trace frequency) and the XML can be parsed more quickly?
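On the parsing side, one way to cope with a 160 MB export is to stream it instead of loading the whole tree. The sketch below uses the standard library's `xml.etree.ElementTree.iterparse`; the `row` element name is an assumption about the exported XML and should be checked against the actual file. The function name is hypothetical, and this does nothing about the slow export itself.

```python
import io
import xml.etree.ElementTree as ET


def count_rows(source, row_tag="row"):
    """Stream an exported XML file and count elements named row_tag.

    `source` is a path or a binary file object. `row_tag` is an assumed
    element name; adjust it to match the real export.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == row_tag:
            count += 1
            # Drop the row's children and text once it has been handled.
            # The emptied element itself stays attached to its parent.
            elem.clear()
    return count


if __name__ == "__main__":
    # Tiny synthetic document standing in for a real export.
    sample = (
        b"<trace-query-result><node>"
        b"<schema name='time-sample'/>"
        b"<row><a/></row><row/>"
        b"</node></trace-query-result>"
    )
    print(count_rows(io.BytesIO(sample)))
```

The same loop can extract only the columns that matter before clearing each row, which keeps peak memory well below what a full `ET.parse` would need.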
Replies: 0 · Boosts: 0 · Views: 1.2k · Activity: Oct ’21
Profiling iOS apps on M1 with Instruments xctrace invalid code signature
I'm trying to profile app launch of an iOS app running on an M1 Mac. After downloading a Mac-compatible app from the iOS App Store (Airbnb), I'm able to select it in Instruments by selecting the .app in /Applications. This allows me to launch the app and profile it by pressing the record button.

Now I want to do this in an automated way with xctrace, so I ran:

    xcrun xctrace record --template 'App Launch' --device CE3D229D-A2BF-5455-8923-5D49498F06DC --launch -- '/Applications/Airbnb.app'

using the id of my Mac returned from xcrun xctrace list devices. This attempts to launch the app, but it immediately crashes with an error about an invalid code signature.

If I try the same process with a macOS app like Twitter, instead of an iOS app running on Mac, I'm able to launch it with xctrace. I'm also able to attach to Airbnb using the --attach PID option, but this doesn't help for testing app launch.

I also tried launching the internal iOS .app using:

    xcrun xctrace record --template 'App Launch' --device CE3D229D-A2BF-5455-8923-5D49498F06DC --launch -- '/Applications/Airbnb.app/Wrapper/Airbnb.app'

but got the same crash. How can I use the command line to launch an iOS app for profiling with Instruments? This is using Xcode 12.5.1 on macOS 11.5.2
Replies: 2 · Boosts: 0 · Views: 1.7k · Activity: Aug ’21
Any reason not to use dyld fixups on iOS 13+?
I've been experimenting with chained fixups now that they're the default on iOS 15, and saw a reduction of about 1.3 MB in app size because the data used by dyld to find fixup locations is much more compact. It looks like the -fixup_chains linker flag enables LC_DYLD_CHAINED_FIXUPS and LC_DYLD_EXPORTS_TRIE even before iOS 15. I was able to launch an app with this linker flag on iOS 14.6 and iOS 13.4.1 without a problem. However, the Xcode 13 release notes say "This uses different load commands and LINKEDIT data, and won’t run or load on older OS versions." It appears that it actually does run on older OS versions. What problems (if any) are there with using these load commands as far back as iOS 13.0?
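To confirm which binaries actually carry the new load commands, a small Mach-O header walk is enough. This is a minimal sketch that handles only thin, little-endian 64-bit images (a fat binary would need to be thinned first, for example with `lipo -thin`); the function names are hypothetical. The constants are the `loader.h` values: command 0x33 or 0x34 combined with `LC_REQ_DYLD`.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
LC_REQ_DYLD = 0x80000000
LC_DYLD_EXPORTS_TRIE = 0x33 | LC_REQ_DYLD
LC_DYLD_CHAINED_FIXUPS = 0x34 | LC_REQ_DYLD


def load_command_ids(data):
    """Return the `cmd` field of every load command in a thin 64-bit Mach-O."""
    # mach_header_64 is eight 32-bit fields (32 bytes in total).
    magic, _cpu, _sub, _ftype, ncmds, _size, _flags, _rsvd = struct.unpack_from(
        "<8I", data, 0
    )
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O image")
    ids = []
    offset = 32
    for _ in range(ncmds):
        # Every load command starts with (cmd, cmdsize).
        cmd, cmdsize = struct.unpack_from("<2I", data, offset)
        if cmdsize < 8:
            raise ValueError("malformed load command")
        ids.append(cmd)
        offset += cmdsize
    return ids


def uses_chained_fixups(data):
    """True if the image contains an LC_DYLD_CHAINED_FIXUPS load command."""
    return LC_DYLD_CHAINED_FIXUPS in load_command_ids(data)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        print(uses_chained_fixups(f.read()))
```

Running it against the executable inside the .app bundle, built with and without -fixup_chains, gives a quick check that the flag took effect for a given deployment target.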
Replies: 0 · Boosts: 0 · Views: 2.5k · Activity: Jun ’21