Render advanced 3D graphics and perform data-parallel computations on graphics processors with Metal.

Posts under Metal tag

194 Posts

Use Metal to convert HDR Pixelbuffer to SDR Pixelbuffer
I have seen demos that convert HDR video to an SDR pixel buffer using AVAssetReader, AVVideoComposition, AVComposition, and other AVFoundation APIs. But in some cases I want to render the HDR pixel buffer myself and record video:

AVCaptureSession *session = [[AVCaptureSession alloc] init];
session.sessionPreset = AVCaptureSessionPresetHigh;

AVCaptureDevice *videoDevice = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
if ([videoDevice isVideoHDRSupported]) {
    NSError *error = nil;
    if ([videoDevice lockForConfiguration:&error]) {
        videoDevice.automaticallyAdjustsVideoHDREnabled = NO;
        videoDevice.videoHDREnabled = YES; // Enable HDR
        [videoDevice unlockForConfiguration];
    } else {
        NSLog(@"Error: %@", error.localizedDescription);
    }
}

Real-time processing of the HDR data means operating on the video frame data (for example, applying filters) while ensuring that the processing chain preserves 10-bit color depth and HDR metadata. I also want to use the image buffers for object tracking, etc. How can I solve this problem?
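A minimal sketch of one possible direction, assuming Core Image's color matching is acceptable for the HDR-to-SDR tone mapping. The helper name and pixel-buffer setup below are illustrative, not a confirmed solution:

import CoreImage
import CoreVideo

// Hypothetical helper: render an HDR pixel buffer into an 8-bit sRGB buffer.
// Assumes Core Image's color management performs an acceptable HDR-to-SDR mapping.
func makeSDRCopy(of hdrBuffer: CVPixelBuffer, context: CIContext) -> CVPixelBuffer? {
    let width = CVPixelBufferGetWidth(hdrBuffer)
    let height = CVPixelBufferGetHeight(hdrBuffer)

    var sdrBuffer: CVPixelBuffer?
    let attrs = [kCVPixelBufferIOSurfacePropertiesKey: NSDictionary()] as CFDictionary
    CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                        kCVPixelFormatType_32BGRA, attrs, &sdrBuffer)
    guard let output = sdrBuffer,
          let srgb = CGColorSpace(name: CGColorSpace.sRGB) else { return nil }

    // Core Image reads the color space attached to the capture buffer
    // (e.g. BT.2020 HLG) and converts into the 8-bit sRGB destination.
    let image = CIImage(cvPixelBuffer: hdrBuffer)
    context.render(image, to: output, bounds: image.extent, colorSpace: srgb)
    return output
}

The resulting SDR buffers could then be appended to an AVAssetWriter for recording, or handed to the object-tracking stage.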
1 reply · 0 boosts · 270 views · Jan ’25
CATransaction commit() crashed on background thread [EXC_BREAKPOINT: com.apple.root.****-qos.cooperative]
Problem Description

We are developing an app for iOS and iPadOS that involves extensive custom drawing of paths, shapes, text, etc. To improve drawing and rendering speed, we use CARenderer to generate cached images (CGImage) on a background thread. We adopted this approach based on this StackOverflow post: https://stackoverflow.com/a/75497329/9202699.

However, we are experiencing frequent crashes in our production environment that we can hardly reproduce in our development environment. Despite months of debugging and seeking support from DTS and the Apple Feedback platform, we have not been able to fully resolve this issue. Our recent crash reports indicate that the crashes occur when calling CATransaction.commit(). We suspect that CATransaction may not function properly outside the main thread. However, based on feedback from the Apple Feedback platform, we were advised to use CATransaction.begin() and CATransaction.commit() on a background thread. If anyone has any insights, we would greatly appreciate it.

Code Sample

The line CATransaction.commit() is causing the crash: [EXC_BREAKPOINT: com.apple.root.****-qos.cooperative]

private let transactionLock = NSLock() // to ensure one transaction at a time
private let device = MTLCreateSystemDefaultDevice()!

@inline(never)
static func drawOnCGImageWithCARenderer(
    layerRect: CGRect,
    itemsToDraw: [ItemsToDraw]
) -> CGImage? {
    // We have encapsulated everything related to CALayer and its
    // associated creations and manipulations within CATransaction
    // as suggested by engineers from the Apple Feedback Portal.
    transactionLock.lock()
    CATransaction.begin()

    // Create the root layer.
    let layer = CALayer()
    layer.bounds = layerRect
    layer.masksToBounds = true

    // Add one sublayer for each item to draw.
    itemsToDraw.forEach { item in
        // We have thousands or hundreds of thousands of drawing items to add.
        // Each drawing item may produce a CALayer, CAShapeLayer or CATextLayer.
        // This is also why we want to utilise CARenderer to leverage GPU rendering.
        layer.addSublayer(
            item.createCALayerOrCATextLayerOrCAShapeLayer()
        )
    }

    // Create the MTLTexture and CARenderer.
    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .rgba8Unorm,
        width: Int(layer.frame.size.width),
        height: Int(layer.frame.size.height),
        mipmapped: false
    )
    textureDescriptor.usage = [MTLTextureUsage.shaderRead, .shaderWrite, .renderTarget]
    let texture = device.makeTexture(descriptor: textureDescriptor)!

    let renderer = CARenderer(mtlTexture: texture)
    renderer.bounds = layer.frame
    renderer.layer = layer

    /* ********************************************************* */
    // From our crash report, this is where the crash happens.
    CATransaction.commit()
    /* ********************************************************* */

    transactionLock.unlock()

    // Render the layers onto the MTLTexture using CARenderer.
    renderer.beginFrame(atTime: 0, timeStamp: nil)
    renderer.render()
    renderer.endFrame()

    // Read the MTLTexture back into a CIImage.
    guard let colorSpace = CGColorSpace(name: CGColorSpace.sRGB),
          let ciImage = CIImage(mtlTexture: texture, options: [.colorSpace: colorSpace])
    else { return nil }

    // Convert the CIImage to a CGImage.
    let context = CIContext()
    return context.createCGImage(ciImage, from: ciImage.extent)
}
0 replies · 1 boost · 278 views · Jan ’25
Instruments showing incorrect values
Hello, I’m encountering an issue with the Instruments app while running a benchmark on an M2 Ultra Mac Studio. Although I am certain that GPU work involving memory read and write operations is occurring, all related performance counters consistently return 0. Interestingly, this problem does not occur when I run the same code on an M1 MacBook Air, where the counters behave as expected. What could be causing this discrepancy? Any insights or suggestions would be greatly appreciated. Thank you!
0 replies · 0 boosts · 254 views · Jan ’25
Request for gaze data in fully immersive Metal apps
Hi, we are trying to port our Unity app from other XR devices to Vision Pro, so it is much easier for us to use the Metal rendering layer, fully immersive. And to stay true to the platform, we want to keep the gaze/pinch interaction system. But we just noticed that, unlike PolySpatial XR apps, visionOS XR in Metal does not provide gaze info unless the user is actively pinching, which prevents any attempt to give visual feedback on what they are looking at (buttons, etc.). Is this on Apple's roadmap? Thanks
3 replies · 0 boosts · 370 views · 3w
CATransaction commit [Crashed: com.apple.root.user-initiated-qos.cooperative]
Description

We are developing an app for iOS and iPadOS that involves extensive custom drawing of paths, shapes, text, etc. To improve drawing and rendering speed, we use CARenderer to generate cached images (CGImage) on a background thread. We adopted this approach based on this StackOverflow post: https://stackoverflow.com/a/75497329/9202699. However, we are experiencing frequent crashes in our production environment that we cannot reproduce in our development environment. Despite months of debugging and seeking support from DTS and the Apple Feedback platform, we have not been able to fully resolve this issue. Our recent crash reports indicate that the crashes occur when calling CATransaction.commit().

Crash traceback

The method names in this traceback are mapped to those in the code sample below. The app name has been masked.

Crashed: com.apple.root.user-initiated-qos.cooperative
0  MyApp 0x887408 specialized static CAUtils.commitCATransaction() + 4340151304 (<compiler-generated>:4340151304)
1  MyApp 0x887408 specialized static CAUtils.commitCATransaction() + 4340151304 (<compiler-generated>:4340151304)
2  MyApp 0x8874a4 specialized static CAUtils.addDrawingItemsToRenderer(***) + 250 (CAUtils.swift:250)
3  MyApp 0x887710 specialized static CAUtils.drawOnCGImageWithCARenderer(***) + 267 (CAUtils.swift:267)
4  MyApp 0x8878c0 specialized static CAUtils.drawOnCGImageWithCARendererWithRetry(***) + 315 (CAUtils.swift:315)
5  MyApp 0x736294 XXXManager.generateCGImages(***) + 570 (XXXManager.swift:570)
6  MyApp 0x73404c closure #1 in XXXManager.updateCachedCGImages(***) + 427 (XXXManager.swift:427)
7  libswift_Concurrency.dylib 0x61104 swift::runJobInEstablishedExecutorContext(swift::Job*) + 252
8  libswift_Concurrency.dylib 0x62514 swift_job_runImpl(swift::Job*, swift::SerialExecutorRef) + 144
9  libdispatch.dylib 0x15d8c _dispatch_root_queue_drain + 392
10 libdispatch.dylib 0x16590 _dispatch_worker_thread2 + 156
11 libsystem_pthread.dylib 0x4c40 _pthread_wqthread + 228
12 libsystem_pthread.dylib 0x1488 start_wqthread + 8

Code Sample

Below is a sample of our code. While the complete snippet is too long, the issue occurs in addDrawingItemsToRenderer. The other methods are included for completeness and reference.

private let transactionLock = NSLock()
private let deviceLock = NSLock()
private let device = MTLCreateSystemDefaultDevice()!

/// This is the method we call from outside.
@inline(never)
static func drawOnCGImageWithCARenderer(
    layerRect: CGRect,
    drawingItems: [DrawingItem]
) -> CGImage? {
    guard let (texture, renderer) = addDrawingItemsToRenderer(
        layerRect: layerRect,
        drawingItems: drawingItems
    ) else { return nil }

    renderer.beginFrame(atTime: 0, timeStamp: nil)
    renderer.render()
    renderer.endFrame()

    guard let colorSpace = CGColorSpace(name: CGColorSpace.sRGB),
          let ciImage = CIImage(mtlTexture: texture, options: [.colorSpace: colorSpace])
    else { return nil }

    let context = CIContext()
    return context.createCGImage(ciImage, from: ciImage.extent)
}

/// This is the method where the crash happens.
@inline(never)
fileprivate static func addDrawingItemsToRenderer(
    layerRect: CGRect,
    drawingItems: [DrawingItem]
) -> (MTLTexture, CARenderer)? {
    // We have encapsulated everything related to CALayer and its
    // associated creations and manipulations within CATransaction
    // as suggested by engineers from the Apple Feedback Portal.
    beginCATransaction()
    defer {
        commitCATransaction() // The crash happens here
    }
    let (layer, imageWidth, imageHeight) = addDrawingItemsToLayer(layerRect: layerRect, drawingItems: drawingItems)
    return createTextureAndRenderer(
        layer: layer,
        imageWidth: imageWidth,
        imageHeight: imageHeight
    )
}

// Below are all internal methods. We have split the method into very
// granular parts and marked them as @inline(never) to prevent the
// compiler from inlining our code, which may otherwise obscure
// traceback information in our crash reports.

@inline(never)
fileprivate static func beginCATransaction() {
    transactionLock.lock()
    CATransaction.begin()
}

@inline(never)
fileprivate static func commitCATransaction() {
    // From our crash report, we believe the crash happens on this line.
    CATransaction.commit()

    // It is unlikely that the lock causes the crash, as we added it only
    // recently to ensure that there is only one transaction on our
    // background thread. After we added this lock, the crash rate indeed
    // lowered, but it did not fully disappear.
    transactionLock.unlock()
}

// The methods below are provided for reference and completeness. While
// they may have issues, they do not appear in our crash reports as
// frequently as the one caused by CATransaction.commit().

@inline(never)
fileprivate static func addDrawingItemsToLayer(
    layerRect: CGRect,
    drawingItems: [DrawingItem]
) -> (layer: CALayer, imageWidth: CGFloat, imageHeight: CGFloat) {
    let layer = CALayer()
    layer.isGeometryFlipped = SharedAppUtils.isIOS
    layer.anchorPoint = CGPoint.zero
    layer.bounds = layerRect
    layer.masksToBounds = true

    for drawingItem in drawingItems {
        // We have thousands or hundreds of thousands of drawing items to add.
        // Each drawing item may produce a CALayer, CAShapeLayer or CATextLayer.
        // This is also why we want to utilise CARenderer to leverage GPU rendering.
        let sublayerForDrawingItem = drawingItem.createCALayerOrCATextLayerOrCAShapeLayer()
        layer.addSublayer(sublayerForDrawingItem)
    }

    let imageWidth = max(1, layer.frame.size.width * UIScreen.main.scale)
    let imageHeight = max(1, layer.frame.size.height * UIScreen.main.scale)
    layer.transform = CATransform3DMakeScale(UIScreen.main.scale, UIScreen.main.scale, 1)
    layer.frame = .init(origin: .zero, size: .init(width: imageWidth, height: imageHeight))

    return (layer, imageWidth, imageHeight)
}

@inline(never)
fileprivate static func createTextureAndRenderer(
    layer: CALayer,
    imageWidth: CGFloat,
    imageHeight: CGFloat
) -> (MTLTexture, CARenderer)? {
    deviceLock.lock()
    defer { deviceLock.unlock() }

    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .rgba8Unorm,
        width: Int(imageWidth),
        height: Int(imageHeight),
        mipmapped: false
    )
    textureDescriptor.usage = [MTLTextureUsage.shaderRead, .shaderWrite, .renderTarget]
    guard let texture = device.makeTexture(descriptor: textureDescriptor) else { return nil }

    let renderer = CARenderer(mtlTexture: texture)
    renderer.bounds = layer.frame
    renderer.layer = layer
    return (texture, renderer)
}
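One mitigation that might be worth testing, sketched under the assumption (not confirmed above) that the crashes come from transactions interleaving across different cooperative-pool threads: confine every begin/commit pair to a single dedicated serial queue. The type and queue label below are hypothetical:

import QuartzCore
import Dispatch

// Hypothetical wrapper: all CATransaction work happens on one serial queue,
// so begin/commit pairs can never interleave across threads.
enum LayerTransactionQueue {
    private static let queue = DispatchQueue(label: "com.example.layer-transactions")

    static func perform<T>(_ body: () -> T) -> T {
        queue.sync {
            CATransaction.begin()
            defer { CATransaction.commit() }
            return body()
        }
    }
}

Note that calling this from a Swift Concurrency task blocks a cooperative thread while the transaction runs, so it fits a dispatch-based pipeline better than an async one.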
1 reply · 1 boost · 279 views · Jan ’25
Usage of colorCurves CIFilter
How can I use my RGB curve points:

let redCurve = [CIVector(x: 0, y: 0), CIVector(x: 0.235, y: 0.152),
                CIVector(x: 0.5, y: 0.5), CIVector(x: 1, y: 1)]
let greenCurve = [CIVector(x: 0, y: 0), CIVector(x: 0.247, y: 0.196),
                  CIVector(x: 0.5, y: 0.5), CIVector(x: 1, y: 1)]
let blueCurve = [CIVector(x: 0, y: 0), CIVector(x: 0.235, y: 0.184),
                 CIVector(x: 0.466, y: 0.466), CIVector(x: 1, y: 1)]

in the colorCurves filter, which I've found in the Apple docs:

func colorCurves(inputImage: CIImage) -> CIImage {
    let colorCurvesEffect = CIFilter.colorCurves()
    colorCurvesEffect.inputImage = inputImage
    colorCurvesEffect.curvesDomain = CIVector(x: 0, y: 1)
    colorCurvesEffect.curvesData = Data(
        bytes: [Float32]([
            0.0, 0.0, 0.0,
            0.8, 0.8, 0.8,
            1.0, 1.0, 1.0
        ]), count: 36)
    colorCurvesEffect.colorSpace = CGColorSpaceCreateDeviceRGB()
    return colorCurvesEffect.outputImage!
}
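A sketch of one way to bridge the two representations, assuming (the docs excerpt above does not spell this out) that curvesData holds interleaved R, G, B Float32 samples taken at evenly spaced points across curvesDomain: resample each control-point curve with linear interpolation and pack the result. The helper name and sample count are illustrative:

import CoreImage

// Hypothetical helper: linearly interpolate a curve given as (x, y) control
// points, sample all three channels at `samples` evenly spaced x positions,
// and pack them as interleaved R,G,B Float32 values for curvesData.
func curvesData(red: [CIVector], green: [CIVector], blue: [CIVector],
                samples: Int = 8) -> Data {
    func value(at x: CGFloat, in points: [CIVector]) -> Float32 {
        guard let next = points.firstIndex(where: { $0.x >= x }) else {
            return Float32(points.last?.y ?? 0)
        }
        if next == 0 { return Float32(points[0].y) }
        let p0 = points[next - 1], p1 = points[next]
        let t = (x - p0.x) / (p1.x - p0.x) // assumes strictly increasing x values
        return Float32(p0.y + t * (p1.y - p0.y))
    }
    var values: [Float32] = []
    for i in 0..<samples {
        let x = CGFloat(i) / CGFloat(samples - 1)
        values.append(value(at: x, in: red))
        values.append(value(at: x, in: green))
        values.append(value(at: x, in: blue))
    }
    return values.withUnsafeBufferPointer { Data(buffer: $0) }
}

With samples = 8 this produces 8 x 3 Float32 values (96 bytes); the three-point example above uses the same layout at 3 sample positions (36 bytes).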
0 replies · 0 boosts · 229 views · Jan ’25
Swift playground + Metal crashes on Swift 6
The following code crashes (SIGSEGV in lldb-rpc-server) when run as Swift 6, but runs correctly when run as Swift 5 (it is from "Metal by Tutorials"):

import PlaygroundSupport
import MetalKit

print("start")

guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("GPU is not supported")
}

let frame = CGRect(x: 0, y: 0, width: 600, height: 600)
let view = MTKView(frame: frame, device: device)
view.clearColor = MTLClearColor(red: 1, green: 1, blue: 0.8, alpha: 1)

let allocator = MTKMeshBufferAllocator(device: device)
let mdlMesh = MDLMesh(sphereWithExtent: [0.75, 0.75, 0.75],
                      segments: [100, 100],
                      inwardNormals: false,
                      geometryType: .triangles,
                      allocator: allocator)
let mesh = try MTKMesh(mesh: mdlMesh, device: device)

guard let commandQueue = device.makeCommandQueue() else {
    fatalError("Could not create a command queue")
}

let shader = """
#include <metal_stdlib>
using namespace metal;

struct VertexIn {
    float4 position [[attribute(0)]];
};

vertex float4 vertex_main(const VertexIn vertex_in [[stage_in]]) {
    return vertex_in.position;
}

fragment float4 fragment_main() {
    return float4(1, 0, 0, 1);
}
"""

print("A")

let library = try device.makeLibrary(source: shader, options: nil)
let vertexFunction = library.makeFunction(name: "vertex_main")
let fragmentFunction = library.makeFunction(name: "fragment_main")

let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
pipelineDescriptor.vertexFunction = vertexFunction
pipelineDescriptor.fragmentFunction = fragmentFunction

print("X")

pipelineDescriptor.vertexDescriptor = MTKMetalVertexDescriptorFromModelIO(mesh.vertexDescriptor)
let pipelineState = try device.makeRenderPipelineState(descriptor: pipelineDescriptor)

guard let commandBuffer = commandQueue.makeCommandBuffer(),
      let renderPassDescriptor = view.currentRenderPassDescriptor,
      let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)
else { fatalError() }

renderEncoder.setRenderPipelineState(pipelineState)
renderEncoder.setVertexBuffer(mesh.vertexBuffers[0].buffer, offset: 0, index: 0)

guard let submesh = mesh.submeshes.first else { fatalError() }
renderEncoder.drawIndexedPrimitives(type: .triangle,
                                    indexCount: submesh.indexCount,
                                    indexType: submesh.indexType,
                                    indexBuffer: submesh.indexBuffer.buffer,
                                    indexBufferOffset: 0)
renderEncoder.endEncoding()

guard let drawable = view.currentDrawable else { fatalError() }
commandBuffer.present(drawable)
commandBuffer.commit()

print("test")
PlaygroundPage.current.liveView = view

Crash report: https://gist.githubusercontent.com/tumdum/8aa53bc806619c0d21c93a55fae07937/raw/370b00c07b08fff8856f9fc678de9888faa8d06e/crash.log

I'm on macOS 15.1.1 (24B2091) + Xcode 16.2 (16C5032a).
1 reply · 1 boost · 281 views · 5d
How to save a point cloud in the sample code "Capturing depth using the LiDAR camera" with the photoOutput
Hello dear community, I have the sample code from Apple, “Capturing depth using the LiDAR camera”, to access the LiDAR on my iPhone 12 Pro. My goal is to use the photo output function to generate a point cloud from a single image and then save it as a .ply file. So far I have tested different approaches to create a .ply file from the depth map, the intrinsic camera data, and the RGBA values. Unfortunately, I have had no success so far and the result has always been an incorrect point cloud. My question now is whether there are already approaches to this and whether anyone has any experience with it. Thank you very much in advance!
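For reference, the usual unprojection is x = (u - cx) * z / fx and y = (v - cy) * z / fy. A minimal sketch with hypothetical inputs (the depth array, dimensions, and intrinsics are assumptions, not values taken from the sample code):

// Hypothetical input: depth in meters per pixel, row-major, plus camera
// intrinsics (fx, fy, cx, cy) expressed in the depth map's pixel grid.
func makePLY(depth: [Float], width: Int, height: Int,
             fx: Float, fy: Float, cx: Float, cy: Float) -> String {
    var points: [SIMD3<Float>] = []
    for v in 0..<height {
        for u in 0..<width {
            let z = depth[v * width + u]
            guard z > 0, z.isFinite else { continue } // skip invalid depth
            let x = (Float(u) - cx) * z / fx
            let y = (Float(v) - cy) * z / fy
            points.append(SIMD3<Float>(x, y, z))
        }
    }
    var ply = "ply\nformat ascii 1.0\nelement vertex \(points.count)\n"
    ply += "property float x\nproperty float y\nproperty float z\nend_header\n"
    for p in points { ply += "\(p.x) \(p.y) \(p.z)\n" }
    return ply
}

One common cause of distorted clouds is using the intrinsics from AVCameraCalibrationData without scaling them from intrinsicMatrixReferenceDimensions down to the depth map's resolution.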
2 replies · 0 boosts · 401 views · Jan ’25
metal-cpp syntax for MTL::Buffer float2 parameter
I'm trying to pass a buffer of float2 items from the CPU to the GPU. In the kernel I can declare a parameter for the buffer, for example:

device const float2* values

How do I specify float2 as the type for the MTL::Buffer? I managed to get the code to work by "cheating": I defined a simple class that has the same data members as a float2, but there is probably a better way.

class Coord_f {
public:
    float x{0.0f};
    float y{0.0f};
};

Then I allocate like this:

NS::TransferPtr(device->newBuffer(n_elements * sizeof(Coord_f), MTL::ResourceStorageModeManaged))

The metal-cpp headers do not appear to define vector types like float2, but I'm doubtless missing something. Thanks.
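For what it's worth, Apple's simd library (simd/simd.h) defines vector types such as simd::float2 that are usable from C++ and match Metal's float2 layout, so sizeof(simd::float2) can replace the hand-rolled class in the newBuffer call above. A sketch of the equivalent allocation from Swift, where the matching host type is SIMD2<Float>:

import Metal

// SIMD2<Float> matches Metal's float2 (8 bytes, 8-byte aligned), so a buffer
// of these elements can back a `device const float2*` kernel parameter.
let device = MTLCreateSystemDefaultDevice()!
let values: [SIMD2<Float>] = [SIMD2(0, 0), SIMD2(1, 2), SIMD2(3, 4)]
let buffer = device.makeBuffer(
    bytes: values,
    length: values.count * MemoryLayout<SIMD2<Float>>.stride,
    options: .storageModeShared
)!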
2 replies · 0 boosts · 435 views · Jan ’25
Texture Definitions for MPSSVGF Denoise
I am trying to use the SVGF denoiser to denoise my ray-traced shadows (and later other textures as well). I do get a smoothed image, but with wonky denoising. I need the depth-normal textures and motion textures for the SVGF, and I assume that these are badly filled in my case. However, neither the linked documentation nor the WWDC19 video explains how they should be defined. I am looking for answers to the following:

Is depth in the red or the alpha channel of the depth-normal texture?
Are the normals in screen space?
Is depth linear? Is it distance or the z coordinate in view space? Or even logarithmically scaled or something else?
Are the motion vectors supposed to be in pixels per frame? What is the orientation of the axes? Is y up or down?
Are there other restrictions on the formats?

Also, the linked code did not help me (I have not found any SVGF in it so far; also, all the code is in Objective-C++, not Swift, but that's a different topic). So how should I fill these textures? Can someone point me to documentation where these kinds of questions are answered?
0 replies · 0 boosts · 408 views · Dec ’24
MetalTools.framework Missing/Corrupted
Like I said in the title, it looks like MetalTools.framework is missing or corrupted. I think I saw that the symbolic link was broken. They look like aliases in the Finder, but I can't find the original. This was a problem with Ventura (using the last compatible Xcode version) and Sequoia 15.2 (Xcode 16.2). I didn't use Xcode before that. Note that none of my apps need the Metal API (I don't think). I only noticed it when Xcode gave an error regarding Metal. Sorry this is so long; I hope the Terminal info will help. I don't want to reinstall Sequoia, and this has been a problem since at least Ventura. Recommendations?

ls -l /System/Library/PrivateFrameworks/MetalTools.framework/
total 0
lrwxr-xr-x  1 root  wheel   27 Dec  7 01:11 MetalTools -> Versions/Current/MetalTools
lrwxr-xr-x  1 root  wheel   26 Dec  7 01:11 Resources -> Versions/Current/Resources
drwxr-xr-x  4 root  wheel  128 Dec  7 01:11 Versions

ls -la /System/Library/PrivateFrameworks/MetalTools.framework/
total 0
drwxr-xr-x     5 root  wheel    160 Dec  7 01:11 .
drwxr-xr-x  1885 root  wheel  60320 Dec  7 01:11 ..
lrwxr-xr-x     1 root  wheel     27 Dec  7 01:11 MetalTools -> Versions/Current/MetalTools
lrwxr-xr-x     1 root  wheel     26 Dec  7 01:11 Resources -> Versions/Current/Resources
drwxr-xr-x     4 root  wheel    128 Dec  7 01:11 Versions

codesign -v /System/Library/PrivateFrameworks/MetalTools.framework/MetalTools
/System/Library/PrivateFrameworks/MetalTools.framework/MetalTools: No such file or directory

ls -la /System/Library/PrivateFrameworks/MetalTools.framework/Versions/
total 0
drwxr-xr-x  4 root  wheel  128 Dec  7 01:11 .
drwxr-xr-x  5 root  wheel  160 Dec  7 01:11 ..
drwxr-xr-x  4 root  wheel  128 Dec  7 01:11 A
lrwxr-xr-x  1 root  wheel    1 Dec  7 01:11 Current -> A

ls -la /System/Library/PrivateFrameworks/MetalTools.framework/Versions/A/
total 0
drwxr-xr-x   4 root  wheel  128 Dec  7 01:11 .
drwxr-xr-x   4 root  wheel  128 Dec  7 01:11 ..
drwxr-xr-x  10 root  wheel  320 Dec  7 01:11 Resources
drwxr-xr-x   3 root  wheel   96 Dec  7 01:11 _CodeSignature

Note: -rwxr-xr-x 1 root wheel MetalTools should be in the list above (according to ChatGPT).

system_profiler SPDisplaysDataType
Intel UHD Graphics 630 and AMD Radeon Pro 5500M (includes: Metal Support: Metal 3)

Playground code => "Metal is supported." Default device: Apple iOS simulator GPU.

Thanks, Ashley
3 replies · 0 boosts · 461 views · Dec ’24
[visionOS] How to render side-by-side stereo video?
I want to render a 3D/stereoscopic video in an Apple Vision Pro window using RealityKit/RealityView. The video is left-right stereo. The straightforward approach would be to spawn a quad and give it a custom Shader Graph material that has a CameraIndexSwitch, which chooses between the right texture and the left texture. https://i.sstatic.net/XawqjNcg.png The issue I have here is that I have to extract the video frames from my AVSampleBufferVideoRenderer. This should work OK, but not if I'm playing FairPlay content. So my question is: how do I render stereo FairPlay videos in a SwiftUI RealityView?
0 replies · 0 boosts · 423 views · Dec ’24
Performance Regression: In iOS 18.2 3D rotation in volume rendering is not smooth as compared to iOS 18.1
We are a team of engineers working on an app intended to visualize medical images. The situations where the app is used involve time-critical decision making for acute clinical conditions. Stability and performance are of utmost importance and can directly help timely treatment action. The app we are developing uses multiple libraries and tools like vtk, webgl, opengl, webkit, gl-matrix, etc.

The problem can be described as follows: when a 3D volume is rendered in the app and we try to rotate it, the rotation is slow, unresponsive, and laggy. Specifically, we have noticed that on iOS 18.1 the volume rotation is much smoother than on the latest iOS 18.2. Earlier, we faced a somewhat similar issue with iOS 17, but it improved in iOS 18.1. This performance regression is affecting the user experience in our healthcare application. We have taken the cornerstone.js code as a reference, and you can reproduce the issue using the following example: https://www.cornerstonejs.org/live-examples/volumeviewport3d

Steps to Reproduce:
Load the above example on an iPhone running version 18.2 using Safari.
Perform volume rendering using the provided dataset.
Measure the time taken by the volume for each rotate or drag action.
Repeat the same steps on an iPhone running version 18.1 for comparison.

Additional Information:
Devices tested: iPhone 12, iPhone 13, iPhone 14
iOS versions with the issue: 18.2, 18.3 (beta)

I would appreciate any insights or suggestions on how to address this performance regression. If additional information is needed, please let me know. Thank you.
2 replies · 5 boosts · 610 views · 2w
Performance Regression: iPhone Version 18.2 Takes More Time to Render Volume Than Version 18.1
Hello, I am experiencing a performance regression in my application when rendering volumes on iPhone. Specifically, I have noticed that iOS 18.2 takes significantly more time for each render cycle than iOS 18.1.

Details:
Affected versions: iOS 18.2; iOS 18.1 (baseline for comparison)
Issue description: In iOS 18.2, the time taken to render volumes has increased compared to iOS 18.1. This performance regression is affecting the user experience in my application.
Test example: https://www.cornerstonejs.org/live-examples/volumeviewport3d

Steps to Reproduce:
Load the above test example on an iPhone running version 18.2 using Safari.
Perform volume rendering using the provided dataset.
Measure the time taken by the volume for each rotate or drag action.
Repeat the same steps on an iPhone running version 18.1 for comparison.

Additional Information:
Devices tested: iPhone 12, iPhone 13, iPhone 14
iOS versions with the issue: 18.2, 18.3 (beta)

I would appreciate any insights or suggestions on how to address this performance regression. If additional information is needed, please let me know. Thank you.
1 reply · 0 boosts · 419 views · Dec ’24
RealityKit fails with EXC_BAD_ACCESS at CMClockGetAnchorTime in the simulator
Starting with iOS 18.0 beta 1, I've noticed that RealityKit frequently crashes in the simulator when an app launches and presents an ARView. I was able to create a small sample app with repro steps that demonstrates the issue, and I've submitted feedback: FB16144085 I've included a crash log with the feedback. If possible, I'd appreciate it if an Apple engineer could investigate and suggest a workaround. It's awkward to be restricted to the iOS 17 simulator, which does not exhibit this behavior. Please let me know if there's anything I can do to help. Thank you.
0 replies · 0 boosts · 445 views · Dec ’24
How to use imageblock_slice
Is there a working example of imageblock_slice with implicit layout somewhere? I get a compilation error when I write this:

imageblock_slice color_slice = img_blk.slice(frag->color);

Error:
No matching member function for call to 'slice'
candidate template ignored: couldn't infer template argument 'E'
candidate function template not viable: requires 2 arguments, but 1 was provided
Too few template arguments for class template 'imageblock_slice'

It seems the syntax has changed since the Imageblocks presentation: https://developer.apple.com/videos/play/tech-talks/603/ I tried supplying the struct type of the image block between <>, but it still does not work.
1 reply · 0 boosts · 493 views · Dec ’24
Tile Shaders performance when writing to tile texture vs. resolve texture
I am working on a custom resolve tile shader for a client. I see a big difference in performance depending on where we write to:

1. the resolve texture of the color attachment
2. a read/write tile shader texture set via [renderEncoder setTileTexture: myResolvedTexture]

Option 2 is more than twice as slow as option 1. Our compute shader writes to 4 UAVs, so just using the resolve-texture entry is not possible. Why is there such a difference when no more data is being written? Can option 2 be as fast as option 1? I can demonstrate the issue in a modified version of the Multisample code sample.
5 replies · 0 boosts · 412 views · 1w
M1 GPU violates atomic_thread_fence across threadgroups
I have an M1 Pro with a 16-core GPU. When I run a shader with 8193 threads, atomic_thread_fence is violated across the boundary between thread 8191 (the last thread in the 7th threadgroup) and thread 8192 (the first thread in the 9th threadgroup). I've attached the Metal and Swift files, but I'll repost the relevant kernel here. It's a function that launches N threads to iterate through a binary tree from the leaves; the first thread to reach a parent terminates, and the second one populates it with the sum of the node's two children.

// clang-format off
kernel void sum(device const int& size,
                device const int* __restrict__ in,
                device int* __restrict__ out,
                device atomic_int* visited,
                uint i [[thread_position_in_grid]]) {
  // clang-format on
  int val = in[i];
  uint cur = (size + i - 1);
  out[cur] = val;
  atomic_thread_fence(mem_flags::mem_device, memory_order_seq_cst);
  cur = (cur - 1) / 2;
  int proceed = atomic_fetch_add_explicit(&visited[cur], 1, memory_order_relaxed);
  while (proceed == 1) {
    uint left = 2 * cur + 1;
    uint right = 2 * cur + 2;
    uint val_left = out[left];
    uint val_right = out[right];
    uint val_cur = val_left + val_right;
    out[cur] = val_cur;
    if (cur == 0) {
      break;
    }
    cur = (cur - 1) / 2;
    atomic_thread_fence(mem_flags::mem_device, memory_order_seq_cst);
    proceed = atomic_fetch_add_explicit(&visited[cur], 1, memory_order_relaxed);
  }
}

What I'm observing is that thread 8192 hits the atomic_fetch_add first and terminates, while thread 8191 hits it second (observes that thread 8192 had incremented it by 1) and proceeds into the loop. Thread 8191 reads out[16383] (which it populated with 8191) and out[16384] (which thread 8192 populated with 8192 prior to the atomic_thread_fence). Instead of reading 8192 from out[16384], though, it reads 0.

Maybe I'm missing something, but this seems like a pretty clear violation of the atomic_thread_fence, which (I thought) was supposed to guarantee that the write from thread 8192 to out[16384] would be visible to any thread observing the effects of the following atomic_fetch_add. Is atomic_fetch_add not a store operation? Modifying it to an atomic_store or atomic_exchange still results in the bug. Adding another atomic_thread_fence between the atomic_fetch_add and the reading of out also doesn't change anything.

I only begin to observe this on grid sizes of 8193 and upwards. That's 9 threadgroups per grid, which I assume could be related to my M1 Pro GPU having 16 cores. Running the same example on an A17 Pro GPU doesn't show any of this behavior up through a tested grid size of 4194303 (2^22-1), at which point testing larger grid sizes starts to run into other issues, so I can't test anything larger. Removing the atomic_thread_fences on both the M1 and the A17 causes the test to fail at much smaller grid sizes, as expected.

Attachments: sum.metal, main.swift
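For anyone trying to reproduce this, a minimal host-side harness might look like the sketch below. The buffer sizes follow the tree layout the kernel implies (n leaves need 2n-1 output slots, and the visited array covers the internal nodes); the function name "sum" matches the kernel above, but the rest is an assumption about the attached main.swift, not a copy of it:

import Foundation
import Metal

let n = 8193 // first grid size where the issue reportedly appears
let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let library = device.makeDefaultLibrary()! // assumes sum.metal is in the target
let pipeline = try device.makeComputePipelineState(function: library.makeFunction(name: "sum")!)

var size = Int32(n)
let input = [Int32](repeating: 1, count: n) // every leaf is 1, so the root should be n
let sizeBuf = device.makeBuffer(bytes: &size, length: MemoryLayout<Int32>.stride)!
let inBuf = device.makeBuffer(bytes: input, length: n * MemoryLayout<Int32>.stride)!
let outBuf = device.makeBuffer(length: (2 * n - 1) * MemoryLayout<Int32>.stride)!
let visitedBuf = device.makeBuffer(length: n * MemoryLayout<Int32>.stride)!
memset(visitedBuf.contents(), 0, n * MemoryLayout<Int32>.stride) // counters start at 0

let commandBuffer = queue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(sizeBuf, offset: 0, index: 0)
encoder.setBuffer(inBuf, offset: 0, index: 1)
encoder.setBuffer(outBuf, offset: 0, index: 2)
encoder.setBuffer(visitedBuf, offset: 0, index: 3)
encoder.dispatchThreads(MTLSize(width: n, height: 1, depth: 1),
                        threadsPerThreadgroup: MTLSize(width: 1024, height: 1, depth: 1))
encoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()

let root = outBuf.contents().load(as: Int32.self)
print(root == Int32(n) ? "root OK (\(root))" : "MISMATCH: root = \(root), expected \(n)")

This only checks the root; dumping the whole out buffer would expose the intermediate nodes (such as out[16384]) the post discusses.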
2 replies · 0 boosts · 405 views · Dec ’24
Metal GPU capture
I'm trying to get performance information in a metal-cpp program. I'm using Xcode 14.2. In the Counters tab, only "No Data" appears. What could be the problem, and what am I doing wrong? The Metal, MetalKit, and AppKit frameworks are linked. The code is taken from the Apple website, from the "Learn Metal with C++" section.
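If the Counters tab stays empty, one way to rule out capture setup issues is to trigger a GPU trace programmatically and open the resulting document in Xcode. A sketch in Swift (metal-cpp wraps the same API surface, I believe via MTL::CaptureManager; the output path is illustrative):

import Foundation
import Metal

// Sketch: capture one workload to a .gputrace document for inspection in Xcode.
func captureWorkload(device: MTLDevice, work: () -> Void) throws {
    let manager = MTLCaptureManager.shared()
    guard manager.supportsDestination(.gpuTraceDocument) else {
        print("GPU trace documents not supported here")
        return
    }
    let descriptor = MTLCaptureDescriptor()
    descriptor.captureObject = device
    descriptor.destination = .gpuTraceDocument
    descriptor.outputURL = URL(fileURLWithPath: "/tmp/workload.gputrace") // illustrative path
    try manager.startCapture(with: descriptor)
    work() // encode and commit the command buffers to profile
    manager.stopCapture()
}

If the trace opens but the counters still read No Data, it may be worth checking that GPU Frame Capture is set to Metal in the scheme's Options, and trying a newer Xcode, since counter availability varies by GPU and tools version.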
1 reply · 0 boosts · 333 views · Dec ’24