I have seen some demos that convert HDR video to SDR pixel buffers, using APIs such as AVAssetReader, AVVideoComposition, AVComposition, and AVFoundation more generally.
But in some cases I want to render the HDR pixel buffers directly and record the video.
AVCaptureSession *session = [[AVCaptureSession alloc] init];
session.sessionPreset = AVCaptureSessionPresetHigh;

AVCaptureDevice *videoDevice = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
// Video HDR support is a property of the active format, not the device itself.
if (videoDevice.activeFormat.isVideoHDRSupported) {
    NSError *error = nil;
    if ([videoDevice lockForConfiguration:&error]) {
        videoDevice.automaticallyAdjustsVideoHDREnabled = NO;
        videoDevice.videoHDREnabled = YES; // Enable HDR
        [videoDevice unlockForConfiguration];
    } else {
        NSLog(@"Error: %@", error.localizedDescription);
    }
}
Real-time processing of the HDR data requires operating on the video frame data (for example, applying filters) while ensuring that the processing chain supports 10-bit color depth and preserves the HDR metadata. I also want to use the image buffers for object tracking, etc.
How can I solve this problem?
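For illustration, here is a minimal Swift sketch of how a 10-bit HDR capture pipeline might be configured. It assumes the device offers a format that supports HLG BT.2020; the delegate that receives the buffers and the error handling for unsupported devices are elided:

import AVFoundation

// Minimal sketch: select a 10-bit, HLG-capable format and receive HDR pixel buffers.
func configureHDRCapture(session: AVCaptureSession, device: AVCaptureDevice) throws {
    session.beginConfiguration()
    defer { session.commitConfiguration() }

    // Opt out of automatic wide-color configuration so activeColorSpace sticks.
    session.automaticallyConfiguresCaptureDeviceForWideColor = false

    try device.lockForConfiguration()
    defer { device.unlockForConfiguration() }

    // Pick a format whose supported color spaces include HLG BT.2020 (10-bit HDR).
    if let hdrFormat = device.formats.first(where: {
        $0.supportedColorSpaces.contains(.HLG_BT2020)
    }) {
        device.activeFormat = hdrFormat
        device.activeColorSpace = .HLG_BT2020
    }

    // Ask the video data output for 10-bit 4:2:0 buffers so the processing
    // chain (filters, tracking, etc.) keeps the full bit depth.
    let output = AVCaptureVideoDataOutput()
    output.videoSettings = [
        kCVPixelBufferPixelFormatTypeKey as String:
            kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
    ]
    if session.canAddOutput(output) {
        session.addOutput(output)
    }
}

Recording would then go through AVAssetWriter with HEVC and matching color properties; the buffers delivered to the video data output keep their 10-bit depth for filtering or tracking.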
Problem Description
We are developing an app for iOS and iPadOS that involves extensive custom drawing of paths, shapes, texts, etc. To improve drawing and rendering speed, we use CARenderer to generate cached images (CGImage) on a background thread. We adopted this approach based on this StackOverflow post: https://stackoverflow.com/a/75497329/9202699.
However, we are experiencing frequent crashes in our production environment that we can hardly reproduce in our development environment. Despite months of debugging and seeking support from DTS and the Apple Feedback platform, we have not been able to fully resolve this issue. Our recent crash reports indicate that the crashes occur when calling CATransaction.commit().
We suspect that CATransaction may not be functioning properly outside the main thread. However, based on feedback from the Apple Feedback platform, we were advised to use CATransaction.begin() and CATransaction.commit() on a background thread.
If anyone has any insights, we would greatly appreciate it.
Code Sample
The line CATransaction.commit() is causing the crash: [EXC_BREAKPOINT: com.apple.root.****-qos.cooperative]
private let transactionLock = NSLock() // to ensure one transaction at a time
private let device = MTLCreateSystemDefaultDevice()!

@inline(never)
static func drawOnCGImageWithCARenderer(
    layerRect: CGRect,
    itemsToDraw: [ItemsToDraw]
) -> CGImage? {
    // We have encapsulated everything related to CALayer and its
    // associated creations and manipulations within CATransaction
    // as suggested by engineers from Apple Feedback Portal.
    transactionLock.lock()
    CATransaction.begin()

    // Create the root layer.
    let layer = CALayer()
    layer.bounds = layerRect
    layer.masksToBounds = true

    // Add one sublayer for each item to draw.
    itemsToDraw.forEach { item in
        // We have thousands or hundreds of thousands of drawing items to add.
        // Each drawing item may produce a CALayer, CAShapeLayer or CATextLayer.
        // This is also why we want to utilise CARenderer to leverage GPU rendering.
        layer.addSublayer(
            item.createCALayerOrCATextLayerOrCAShapeLayer()
        )
    }

    // Create MTLTexture and CARenderer.
    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .rgba8Unorm,
        width: Int(layer.frame.size.width),
        height: Int(layer.frame.size.height),
        mipmapped: false
    )
    textureDescriptor.usage = [MTLTextureUsage.shaderRead, .shaderWrite, .renderTarget]
    let texture = device.makeTexture(descriptor: textureDescriptor)!
    let renderer = CARenderer(mtlTexture: texture)
    renderer.bounds = layer.frame
    renderer.layer = layer

    /* ********************************************************* */
    // From our crash report, this is where the crash happens.
    CATransaction.commit()
    /* ********************************************************* */
    transactionLock.unlock()

    // Render the layers onto the MTLTexture using CARenderer.
    renderer.beginFrame(atTime: 0, timeStamp: nil)
    renderer.render()
    renderer.endFrame()

    // Draw the MTLTexture into a CIImage.
    guard
        let colorSpace = CGColorSpace(name: CGColorSpace.sRGB),
        let ciImage = CIImage(mtlTexture: texture, options: [.colorSpace: colorSpace]) else {
        return nil
    }

    // Convert CIImage to CGImage.
    let context = CIContext()
    return context.createCGImage(ciImage, from: ciImage.extent)
}
Hello,
I’m encountering an issue with the Instruments app while running a benchmark on an M2 Ultra Mac Studio. Despite being certain that GPU activities involving memory read and write operations are occurring, all related performance counters consistently return 0.
Interestingly, this problem does not occur when using the same code on an M1 MacBook Air, where the counters behave as expected.
What could be causing this discrepancy? Any insights or suggestions would be greatly appreciated.
Thank you!
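As an aside, one way to cross-check what the device itself advertises is to query the Metal counter APIs directly. A small sketch using only plain MTLDevice queries (nothing M2-specific is assumed):

import Metal

// Sketch: print the performance-counter support the GPU advertises.
// If a counter set is missing here, Instruments cannot populate it either.
let device = MTLCreateSystemDefaultDevice()!
print("Device:", device.name)

for counterSet in device.counterSets ?? [] {
    print("Counter set:", counterSet.name)
    for counter in counterSet.counters {
        print("  counter:", counter.name)
    }
}

let samplingPoints: [(String, MTLCounterSamplingPoint)] = [
    ("atStageBoundary", .atStageBoundary),
    ("atDrawBoundary", .atDrawBoundary),
    ("atDispatchBoundary", .atDispatchBoundary),
    ("atBlitBoundary", .atBlitBoundary),
]
for (name, point) in samplingPoints {
    print("supports sampling \(name):", device.supportsCounterSampling(point))
}

Comparing this output between the M1 MacBook Air and the M2 Ultra would at least localize whether the counters are absent at the API level or only in Instruments.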
Hi,
We are trying to port our Unity app from other XR devices to Vision Pro. Thus it's way easier for us to use the Metal rendering layer, fully immersive. And to stay true to the platform, we want to keep the gaze/pinch interaction system.
But we just noticed that, unlike PolySpatial XR apps, visionOS XR in Metal does not provide gaze info unless the user is actively pinching, which prevents any attempt to give visual feedback on what they are looking at (buttons, etc.).
Is this planned on Apple's roadmap?
Thanks
Description
We are developing an app for iOS and iPadOS that involves extensive custom drawing of paths, shapes, texts, etc. To improve drawing and rendering speed, we use CARenderer to generate cached images (CGImage) on a background thread. We adopted this approach based on this StackOverflow post: https://stackoverflow.com/a/75497329/9202699.
However, we are experiencing frequent crashes in our production environment that we cannot reproduce in our development environment. Despite months of debugging and seeking support from DTS and the Apple Feedback platform, we have not been able to fully resolve this issue. Our recent crash reports indicate that the crashes occur when calling CATransaction.commit().
Crash traceback
The method names in this traceback are mapped to those in the code sample below. The app name has been masked.
Crashed: com.apple.root.user-initiated-qos.cooperative
0 MyApp 0x887408 specialized static CAUtils.commitCATransaction() + 4340151304 (<compiler-generated>:4340151304)
1 MyApp 0x887408 specialized static CAUtils.commitCATransaction() + 4340151304 (<compiler-generated>:4340151304)
2 MyApp 0x8874a4 specialized static CAUtils.addDrawingItemsToRenderer(***) + 250 (CAUtils.swift:250)
3 MyApp 0x887710 specialized static CAUtils.drawOnCGImageWithCARenderer(***) + 267 (CAUtils.swift:267)
4 MyApp 0x8878c0 specialized static CAUtils.drawOnCGImageWithCARendererWithRetry(***) + 315 (CAUtils.swift:315)
5 MyApp 0x736294 XXXManager.generateCGImages(***) + 570 (XXXManager.swift:570)
6 MyApp 0x73404c closure #1 in XXXManager.updateCachedCGImages(***) + 427 (XXXManager.swift:427)
7 libswift_Concurrency.dylib 0x61104 swift::runJobInEstablishedExecutorContext(swift::Job*) + 252
8 libswift_Concurrency.dylib 0x62514 swift_job_runImpl(swift::Job*, swift::SerialExecutorRef) + 144
9 libdispatch.dylib 0x15d8c _dispatch_root_queue_drain + 392
10 libdispatch.dylib 0x16590 _dispatch_worker_thread2 + 156
11 libsystem_pthread.dylib 0x4c40 _pthread_wqthread + 228
12 libsystem_pthread.dylib 0x1488 start_wqthread + 8
Code Sample
Below is a sample of our code. The complete snippet is too long to include, but the issue occurs in addDrawingItemsToRenderer. The other methods are provided for completeness and reference purposes.
private let transactionLock = NSLock()
private let deviceLock = NSLock()
private let device = MTLCreateSystemDefaultDevice()!

/// This is the method we call from outside.
@inline(never)
static func drawOnCGImageWithCARenderer(
    layerRect: CGRect,
    drawingItems: [DrawingItem]
) -> CGImage? {
    guard
        let (texture, renderer) = addDrawingItemsToRenderer(
            layerRect: layerRect,
            drawingItems: drawingItems
        ) else {
        return nil
    }
    renderer.beginFrame(atTime: 0, timeStamp: nil)
    renderer.render()
    renderer.endFrame()
    guard
        let colorSpace = CGColorSpace(name: CGColorSpace.sRGB),
        let ciImage = CIImage(mtlTexture: texture, options: [.colorSpace: colorSpace]) else {
        return nil
    }
    let context = CIContext()
    return context.createCGImage(ciImage, from: ciImage.extent)
}
/// This is the method where the crash happens.
@inline(never)
fileprivate static func addDrawingItemsToRenderer(
    layerRect: CGRect,
    drawingItems: [DrawingItem]
) -> (MTLTexture, CARenderer)? {
    // We have encapsulated everything related to CALayer and its
    // associated creations and manipulations within CATransaction
    // as suggested by engineers from Apple Feedback Portal.
    beginCATransaction()
    defer {
        commitCATransaction() // The crash happens here
    }
    let (layer, imageWidth, imageHeight) =
        addDrawingItemsToLayer(layerRect: layerRect, drawingItems: drawingItems)
    return createTextureAndRenderer(
        layer: layer,
        imageWidth: imageWidth,
        imageHeight: imageHeight
    )
}

// Below are all internal methods. We have split the method into very
// granular parts and marked them as @inline(never) to prevent the
// compiler from inlining our code, which may otherwise obscure usage
// traceback information in our crash reports.
@inline(never)
fileprivate static func beginCATransaction() {
    transactionLock.lock()
    CATransaction.begin()
}

@inline(never)
fileprivate static func commitCATransaction() {
    // From our crash report, we believe the crash happens on this line.
    CATransaction.commit()
    // It is unlikely that the lock causes the crash, as we added it only recently
    // to ensure that there is only one transaction on our background thread.
    // After we added this lock, the crash rate did drop, but the crash has still
    // not fully disappeared.
    transactionLock.unlock()
}
--------------------------------
// The methods below are provided for reference and completeness. While
// they may have issues, they do not appear in our crash reports as
// frequently as the one caused by `CATransaction.commit()`.
@inline(never)
fileprivate static func addDrawingItemsToLayer(
    layerRect: CGRect,
    drawingItems: [DrawingItem]
) -> (layer: CALayer, imageWidth: CGFloat, imageHeight: CGFloat) {
    let layer = CALayer()
    layer.isGeometryFlipped = SharedAppUtils.isIOS
    layer.anchorPoint = CGPoint.zero
    layer.bounds = layerRect
    layer.masksToBounds = true
    for drawingItem in drawingItems {
        // We have thousands or hundreds of thousands of drawing items to add.
        // Each drawing item may produce a CALayer, CAShapeLayer or CATextLayer.
        // This is also why we want to utilise CARenderer to leverage GPU rendering.
        let sublayerForDrawingItem =
            drawingItem.createCALayerOrCATextLayerOrCAShapeLayer()
        layer.addSublayer(sublayerForDrawingItem)
    }
    let imageWidth = max(1, layer.frame.size.width * UIScreen.main.scale)
    let imageHeight = max(1, layer.frame.size.height * UIScreen.main.scale)
    layer.transform = CATransform3DMakeScale(UIScreen.main.scale, UIScreen.main.scale, 1)
    layer.frame = .init(origin: .zero, size: .init(width: imageWidth, height: imageHeight))
    return (layer, imageWidth, imageHeight)
}
@inline(never)
fileprivate static func createTextureAndRenderer(
    layer: CALayer,
    imageWidth: CGFloat,
    imageHeight: CGFloat
) -> (MTLTexture, CARenderer)? {
    deviceLock.lock()
    defer {
        deviceLock.unlock()
    }
    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .rgba8Unorm,
        width: Int(imageWidth),
        height: Int(imageHeight),
        mipmapped: false
    )
    textureDescriptor.usage = [MTLTextureUsage.shaderRead, .shaderWrite, .renderTarget]
    guard
        let texture = device.makeTexture(descriptor: textureDescriptor) else {
        return nil
    }
    let renderer = CARenderer(mtlTexture: texture)
    renderer.bounds = layer.frame
    renderer.layer = layer
    return (texture, renderer)
}
How can I use my RGB Curve points:
let redCurve = [CIVector(x: 0, y: 0), CIVector(x: 0.235, y: 0.152), CIVector(x: 0.5, y: 0.5), CIVector(x: 1, y: 1)]
let greenCurve = [CIVector(x: 0, y: 0), CIVector(x: 0.247, y: 0.196), CIVector(x: 0.5, y: 0.5), CIVector(x: 1, y: 1)]
let blueCurve = [CIVector(x: 0, y: 0), CIVector(x: 0.235, y: 0.184), CIVector(x: 0.466, y: 0.466), CIVector(x: 1, y: 1)]
in the colorCurves filter, which I found in the Apple docs:
func colorCurves(inputImage: CIImage) -> CIImage {
    let colorCurvesEffect = CIFilter.colorCurves()
    colorCurvesEffect.inputImage = inputImage
    colorCurvesEffect.curvesDomain = CIVector(x: 0, y: 1)
    colorCurvesEffect.curvesData = Data(
        bytes: [Float32]([
            0.0, 0.0, 0.0,
            0.8, 0.8, 0.8,
            1.0, 1.0, 1.0
        ]), count: 36)
    colorCurvesEffect.colorSpace = CGColorSpaceCreateDeviceRGB()
    return colorCurvesEffect.outputImage!
}
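For what it's worth, a minimal sketch of one way to bridge the two: sample each channel's curve at N evenly spaced positions (plain linear interpolation between the control points is assumed here, so the result will differ slightly from a spline) and pack the samples as interleaved RGB Float32 triples, which is the layout the 36-byte example above uses:

import CoreImage
import CoreImage.CIFilterBuiltins

// Linearly interpolate a curve defined by (x, y) control points at position x.
func sample(_ points: [CIVector], at x: CGFloat) -> Float {
    guard let first = points.first, let last = points.last else { return Float(x) }
    if x <= first.x { return Float(first.y) }
    if x >= last.x { return Float(last.y) }
    for i in 1..<points.count where x <= points[i].x {
        let p0 = points[i - 1], p1 = points[i]
        let t = (x - p0.x) / (p1.x - p0.x)
        return Float(p0.y + t * (p1.y - p0.y))
    }
    return Float(last.y)
}

func colorCurves(inputImage: CIImage,
                 red: [CIVector], green: [CIVector], blue: [CIVector]) -> CIImage {
    let sampleCount = 256
    var table = [Float32]()
    table.reserveCapacity(sampleCount * 3)
    for i in 0..<sampleCount {
        let x = CGFloat(i) / CGFloat(sampleCount - 1)
        table.append(sample(red, at: x))    // R
        table.append(sample(green, at: x))  // G
        table.append(sample(blue, at: x))   // B
    }
    let filter = CIFilter.colorCurves()
    filter.inputImage = inputImage
    filter.curvesDomain = CIVector(x: 0, y: 1)
    filter.curvesData = table.withUnsafeBufferPointer { Data(buffer: $0) }
    filter.colorSpace = CGColorSpaceCreateDeviceRGB()
    return filter.outputImage!
}

A smoother interpolation (for example Catmull-Rom) could be swapped into sample(_:at:) without changing the packing.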
The following code crashes (SIGSEGV in lldb-rpc-server) when run as Swift 6, but runs correctly when run as Swift 5 (it is from "Metal by Tutorials"):
import PlaygroundSupport
import MetalKit
print("start")
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("GPU is not supported")
}
let frame = CGRect(x: 0, y: 0, width: 600, height: 600)
let view = MTKView(frame: frame, device: device)
view.clearColor = MTLClearColor(red: 1, green: 1, blue: 0.8, alpha: 1)
let allocator = MTKMeshBufferAllocator(device: device)
let mdlMesh = MDLMesh(sphereWithExtent: [0.75,0.75,0.75], segments: [100, 100], inwardNormals: false, geometryType: .triangles, allocator: allocator)
let mesh = try MTKMesh(mesh: mdlMesh, device: device)
guard let commandQueue = device.makeCommandQueue() else {
    fatalError("Could not create a command queue")
}
let shader = """
#include <metal_stdlib>
using namespace metal;
struct VertexIn {
    float4 position [[attribute(0)]];
};

vertex float4 vertex_main(const VertexIn vertex_in [[stage_in]])
{
    return vertex_in.position;
}

fragment float4 fragment_main() {
    return float4(1, 0, 0, 1);
}
"""
print("A")
let library = try device.makeLibrary(source: shader, options: nil)
let vertexFunction = library.makeFunction(name: "vertex_main")
let fragmentFunction = library.makeFunction(name: "fragment_main")
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
pipelineDescriptor.vertexFunction = vertexFunction
pipelineDescriptor.fragmentFunction = fragmentFunction
print("X")
pipelineDescriptor.vertexDescriptor = MTKMetalVertexDescriptorFromModelIO(mesh.vertexDescriptor)
let pipelineState = try device.makeRenderPipelineState(descriptor: pipelineDescriptor)
guard let commandBuffer = commandQueue.makeCommandBuffer(),
      let renderPassDescriptor = view.currentRenderPassDescriptor,
      let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)
else {
    fatalError()
}
renderEncoder.setRenderPipelineState(pipelineState)
renderEncoder.setVertexBuffer(mesh.vertexBuffers[0].buffer, offset: 0, index: 0)
guard let submesh = mesh.submeshes.first else {
    fatalError()
}
renderEncoder.drawIndexedPrimitives(type: .triangle, indexCount: submesh.indexCount, indexType: submesh.indexType, indexBuffer: submesh.indexBuffer.buffer, indexBufferOffset: 0)
renderEncoder.endEncoding()
guard let drawable = view.currentDrawable else {
    fatalError()
}
commandBuffer.present(drawable)
commandBuffer.commit()
print("test")
PlaygroundPage.current.liveView = view
Crash report: https://gist.githubusercontent.com/tumdum/8aa53bc806619c0d21c93a55fae07937/raw/370b00c07b08fff8856f9fc678de9888faa8d06e/crash.log
I'm on macOS 15.1.1 (24B2091) + Xcode 16.2 (16C5032a)
Hello dear community,
I have the sample code from Apple, "CapturingDepthUsingLiDAR", to access the LiDAR on my iPhone 12 Pro. My goal is to use the photo output function to generate a point cloud from a single image and then save it as a .ply file. So far I have tested different approaches to create the .ply file from the depth map, the intrinsic camera data, and the RGBA values. Unfortunately, I have had no success, and the result has always been an incorrect point cloud.
My question now is whether there are already approaches to this and whether anyone has any experience with it.
Thank you very much in advance!!!
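For reference, the unprojection step is usually where such point clouds go wrong. With the pinhole model, a depth pixel at (u, v) with depth z in meters maps to camera space as x = (u - cx) * z / fx and y = (v - cy) * z / fy. A hypothetical Swift sketch, assuming a Float32 depth map and an intrinsic matrix from AVCameraCalibrationData that has already been scaled from its reference dimensions to the depth-map resolution:

import AVFoundation
import simd

// Sketch: unproject a Float32 depth map into camera-space 3D points.
// `intrinsics` is the 3x3 camera matrix from AVCameraCalibrationData,
// already rescaled from its reference dimensions to the depth-map size.
func unproject(depthMap: CVPixelBuffer, intrinsics: simd_float3x3) -> [simd_float3] {
    CVPixelBufferLockBaseAddress(depthMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

    let width = CVPixelBufferGetWidth(depthMap)
    let height = CVPixelBufferGetHeight(depthMap)
    let rowBytes = CVPixelBufferGetBytesPerRow(depthMap)
    let base = CVPixelBufferGetBaseAddress(depthMap)!

    let fx = intrinsics[0][0], fy = intrinsics[1][1]
    let cx = intrinsics[2][0], cy = intrinsics[2][1]  // simd matrices are column-major

    var points: [simd_float3] = []
    points.reserveCapacity(width * height)
    for v in 0..<height {
        let row = base.advanced(by: v * rowBytes)
            .assumingMemoryBound(to: Float32.self)
        for u in 0..<width {
            let z = row[u]                            // depth in meters
            guard z.isFinite, z > 0 else { continue }
            let x = (Float(u) - cx) * z / fx
            let y = (Float(v) - cy) * z / fy
            points.append(simd_float3(x, y, z))
        }
    }
    return points
}

Forgetting to rescale the intrinsics (they are delivered relative to intrinsicMatrixReferenceDimensions, not the depth-map size) is a common source of distorted clouds.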
I'm trying to pass a buffer of float2 items from CPU to GPU.
In the kernel, I can provide a parameter for the buffer:
device const float2* values
for example.
How do I specify float2 as the type for the MTL::Buffer?
I managed to get the code to work by "cheating" by defining a simple class that has the same data members as a float2, but there is probably a better way.
class Coord_f { public: float x{0.0f}; float y{0.0f}; };
then using code to allocate like this:
NS::TransferPtr(device->newBuffer(n_elements * sizeof(Coord_f), MTL::ResourceStorageModeManaged))
The headers for metal-cpp do not appear to define vector objects like float2, but I'm doubtless missing something.
Thanks.
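The metal-cpp headers indeed do not define the vector types; on the C++ side those come from the simd library (#include <simd/simd.h>), whose simd::float2 has the same 8-byte size and alignment as Metal's float2, so sizeof(simd::float2) can replace the Coord_f workaround. As a sketch of the same layout question in Swift, where SIMD2<Float> plays the equivalent role:

import Metal
import simd

// Sketch: a buffer of float2 values for a kernel parameter declared as
// `device const float2* values`. SIMD2<Float> (simd_float2) has the same
// 8-byte size and alignment as Metal's float2, so the layouts match.
let device = MTLCreateSystemDefaultDevice()!
let values: [SIMD2<Float>] = (0..<1024).map { i in
    SIMD2<Float>(Float(i), Float(i) * 2)
}
let buffer = device.makeBuffer(
    bytes: values,
    length: values.count * MemoryLayout<SIMD2<Float>>.stride,
    options: .storageModeShared
)!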
I am trying to use the SVGF denoiser to denoise my ray traced shadows (and also other textures later). I do get a smoothed image, but with wonky denoising.
I need the depth-normal textures and motion textures for the SVGF, and I assume that these are badly filled in my case. However, neither the documentation linked above nor the WWDC19 video explains how they should be defined. I am looking for answers to:
Is depth in red or alpha channel for the depth-normal texture?
Are the normals in screen space?
Is depth linear?
Is it distance or z coordinate in view space? Or even logarithmically scaled or something else?
Are the motion vectors supposed to be in pixels per frame?
What is the orientation of the axis? Is y up or down?
Are there are other restrictions on the formats?
Also, the linked code did not help me (I have not found any SVGF in it so far; also, all the code is in Objective-C++, not Swift, but that's a different topic).
So how should I fill these textures?
Can someone point me to documentation where these kinds of questions are answered?
Like I said in the title, it looks like MetalTools.framework is missing or corrupted. I think I saw that the symbolic link was broken. The entries look like aliases in the Finder, but I can't find the original.
This was a problem with Ventura (using the last compatible Xcode version) and Sequoia 15.2 (Xcode 16.2). I didn't use Xcode before that. Note that none of my apps need Metal API (I don't think). I only noticed it when Xcode gave an error regarding Metal.
Sorry this is so long; I hope the Terminal info will help.
I don't want to reinstall Sequoia and this has been a problem since at least Ventura.
Recommendations?
ls -l /System/Library/PrivateFrameworks/MetalTools.framework/
total 0
lrwxr-xr-x 1 root wheel 27 Dec 7 01:11 MetalTools -> Versions/Current/MetalTools
lrwxr-xr-x 1 root wheel 26 Dec 7 01:11 Resources -> Versions/Current/Resources
drwxr-xr-x 4 root wheel 128 Dec 7 01:11 Versions
ls -la /System/Library/PrivateFrameworks/MetalTools.framework/
total 0
drwxr-xr-x 5 root wheel 160 Dec 7 01:11 .
drwxr-xr-x 1885 root wheel 60320 Dec 7 01:11 ..
lrwxr-xr-x 1 root wheel 27 Dec 7 01:11 MetalTools -> Versions/Current/MetalTools
lrwxr-xr-x 1 root wheel 26 Dec 7 01:11 Resources -> Versions/Current/Resources
drwxr-xr-x 4 root wheel 128 Dec 7 01:11 Versions
codesign -v /System/Library/PrivateFrameworks/MetalTools.framework/MetalTools
/System/Library/PrivateFrameworks/MetalTools.framework/MetalTools: No such file or directory
ls -la /System/Library/PrivateFrameworks/MetalTools.framework/Versions/
total 0
drwxr-xr-x 4 root wheel 128 Dec 7 01:11 .
drwxr-xr-x 5 root wheel 160 Dec 7 01:11 ..
drwxr-xr-x 4 root wheel 128 Dec 7 01:11 A
lrwxr-xr-x 1 root wheel 1 Dec 7 01:11 Current -> A
ls -la /System/Library/PrivateFrameworks/MetalTools.framework/Versions/A/
total 0
drwxr-xr-x 4 root wheel 128 Dec 7 01:11 .
drwxr-xr-x 4 root wheel 128 Dec 7 01:11 ..
drwxr-xr-x 10 root wheel 320 Dec 7 01:11 Resources
drwxr-xr-x 3 root wheel 96 Dec 7 01:11 _CodeSignature
Note: a "-rwxr-xr-x 1 root wheel MetalTools" entry should be in the above list (according to ChatGPT).
system_profiler SPDisplaysDataType
Intel UHD Graphics 630 and AMD Radeon Pro 5500M (both report: Metal Support: Metal 3)
Playground code => "Metal is supported." Default device: Apple iOS simulator GPU.
Thanks, Ashley
I want to render a 3D/stereoscopic video in an Apple Vision Pro window using RealityKit/RealityView. The video is left-right stereo. The straightforward approach would be to spawn a quad and give it a custom Shader Graph material that has a CameraIndexSwitch, which chooses between the right texture and the left texture.
https://i.sstatic.net/XawqjNcg.png
The issue I have here is that I have to extract the video frames from my AVSampleBufferVideoRenderer. This should work ok, but not if I'm playing FairPlay content.
So, my question is, how to render stereo FairPlay videos in a SwiftUI RealityView?
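For the non-FairPlay part of the setup, a heavily hedged sketch of the quad-plus-material approach described above; the material path, parameter names, and asset file are all assumptions, and whether decrypted FairPlay frames can be fed in at all is exactly the open question:

import RealityKit

// Sketch: a quad with a Shader Graph material whose CameraIndexSwitch
// selects between "leftTexture" and "rightTexture" parameters.
// "StereoMaterial" in "Materials.usda" is a placeholder asset name.
func makeStereoQuad(left: TextureResource, right: TextureResource) async throws -> ModelEntity {
    var material = try await ShaderGraphMaterial(
        named: "/Root/StereoMaterial",
        from: "Materials.usda",
        in: nil
    )
    try material.setParameter(name: "leftTexture", value: .textureResource(left))
    try material.setParameter(name: "rightTexture", value: .textureResource(right))

    return ModelEntity(
        mesh: .generatePlane(width: 1.6, height: 0.9),
        materials: [material]
    )
}

Per-frame updates would typically go through TextureResource.DrawableQueue rather than regenerating the TextureResource each frame.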
We are a team of engineers working on an app intended to visualize medical images. The app is used in situations that involve time-critical decision making for acute clinical conditions, so stability and performance are of utmost importance and can directly help timely treatment action. The app we are developing uses multiple libraries and tools like vtk, webgl, opengl, webkit, gl-matrix, etc.
The problem can be described as follows: when a 3D volume is rendered in the app and we try to rotate it, the rotation is slow, unresponsive, and laggy. Specifically, we have noticed that on iOS 18.1 the volume rotation is much smoother than on the latest iOS 18.2. Earlier, we faced a somewhat similar issue with iOS 17, but it improved in iOS 18.1. This performance regression is affecting the user experience in our healthcare application.
We have taken reference from the cornerstone.js code and you can reproduce the issue using the following example: https://www.cornerstonejs.org/live-examples/volumeviewport3d
Steps to Reproduce:
Load the above-mentioned test example on an iPhone running version 18.2 using Safari.
Perform volume rendering using the provided dataset.
Measure the time taken by volume for each rotate or drag action.
Repeat the same steps on an iPhone running version 18.1 for comparison.
Additional Information:
Device Model Tested:
iPhone12, iPhone13, iPhone14
iOS Version With Issue:
18.2
18.3(Beta)
I would appreciate any insights or suggestions on how to address this performance regression. If additional information is needed, please let me know.
Thank you.
Hello,
I am experiencing a performance regression in my application when rendering volumes on iPhone. Specifically, I have noticed that iOS version 18.2 takes significantly more time for each render cycle as compared to iOS 18.1.
Details:
Affected Versions:
iOS version 18.2
iOS version 18.1 (baseline for comparison)
Issue Description:
In iOS version 18.2, the time taken to render volumes has increased compared to iOS version 18.1. This performance regression is affecting the user experience in my application.
Test Example:
https://www.cornerstonejs.org/live-examples/volumeviewport3d
Steps to Reproduce:
Load the above test example on an iPhone running version 18.2 using Safari.
Perform volume rendering using the provided dataset.
Measure the time taken by volume for each rotate or drag action.
Repeat the same steps on an iPhone running version 18.1 for comparison.
Additional Information:
Device Model Tested:
iPhone12, iPhone13, iPhone14
iOS Version With Issue:
18.2
18.3(Beta)
I would appreciate any insights or suggestions on how to address this performance regression. If additional information is needed, please let me know.
Thank you.
Starting with iOS 18.0 beta 1, I've noticed that RealityKit frequently crashes in the simulator when an app launches and presents an ARView.
I was able to create a small sample app with repro steps that demonstrates the issue, and I've submitted feedback: FB16144085
I've included a crash log with the feedback.
If possible, I'd appreciate it if an Apple engineer could investigate and suggest a workaround. It's awkward to be restricted to the iOS 17 simulator, which does not exhibit this behavior.
Please let me know if there's anything I can do to help.
Thank you.
Is there a working example of imageblock_slice with implicit layout somewhere?
I get a compilation error when i write this:
imageblock_slice color_slice = img_blk.slice(frag->color);
Error:
No matching member function for call to 'slice'
candidate template ignored: couldn't infer template argument 'E'
candidate function template not viable: requires 2 arguments, but 1 was provided
Too few template arguments for class template 'imageblock_slice'
It seems the syntax has changed since the Imageblocks presentation https://developer.apple.com/videos/play/tech-talks/603/
I tried supplying the struct type of the image block between <> but it still does not work.
I am working on a custom resolve tile shader for a client. I see a big difference in performance depending on where we write to:
1- the resolve texture of the color attachment
2- a read-write tile shader texture set via [renderEncoder setTileTexture:myResolvedTexture atIndex:...]
Option 2 is more than twice as slow as option 1.
Our compute shader writes to 4 UAVs, so just using the resolve-texture entry is not possible.
Why is there such a difference, given that no more data is being written? Can option 2 be as fast as option 1?
I can demonstrate the issue in a modified version of the Multisample code sample.
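For context, a minimal sketch of how option 2 is typically wired up on the host side (a hypothetical reduction, not the client's code; "resolveTile" is a placeholder function name):

import Metal

// Sketch: a tile pipeline with a read-write texture bound as a tile texture.
func makeTilePipeline(device: MTLDevice, library: MTLLibrary) throws -> MTLRenderPipelineState {
    let descriptor = MTLTileRenderPipelineDescriptor()
    descriptor.tileFunction = library.makeFunction(name: "resolveTile")!
    descriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
    descriptor.rasterSampleCount = 4
    descriptor.threadgroupSizeMatchesTileSize = true
    return try device.makeRenderPipelineState(
        tileDescriptor: descriptor, options: [], reflection: nil
    )
}

func encodeResolve(encoder: MTLRenderCommandEncoder,
                   pipeline: MTLRenderPipelineState,
                   target: MTLTexture) {
    encoder.setRenderPipelineState(pipeline)
    // Option 2 from the post: writing to a texture bound as a tile texture.
    encoder.setTileTexture(target, index: 0)
    encoder.dispatchThreadsPerTile(MTLSize(width: 16, height: 16, depth: 1))
}

Option 1, by contrast, only sets storeAction = .multisampleResolve and a resolveTexture on the color attachment and lets the store phase do the write.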
I have an M1 Pro with a 16-core GPU. When I run a shader with 8193 threads, atomic_thread_fence is violated across the boundary between thread 8191 (the last thread in the 7th threadgroup) and 8192 (the first thread in the 9th threadgroup).
I've attached the Metal and Swift files, but I'll repost the relevant kernel here. It's a function that launches N threads to iterate through a binary tree from the leaves, where the first thread to reach a parent terminates and the second one populates it with the sum of the node's two children.
// clang-format off
kernel void sum(device const int& size,
                device const int* __restrict__ in,
                device int* __restrict__ out,
                device atomic_int* visited,
                uint i [[thread_position_in_grid]]) {
    // clang-format on
    int val = in[i];
    uint cur = (size + i - 1);
    out[cur] = val;
    atomic_thread_fence(mem_flags::mem_device, memory_order_seq_cst);
    cur = (cur - 1) / 2;
    int proceed = atomic_fetch_add_explicit(&visited[cur], 1, memory_order_relaxed);
    while (proceed == 1) {
        uint left = 2 * cur + 1;
        uint right = 2 * cur + 2;
        uint val_left = out[left];
        uint val_right = out[right];
        uint val_cur = val_left + val_right;
        out[cur] = val_cur;
        if (cur == 0) {
            break;
        }
        cur = (cur - 1) / 2;
        atomic_thread_fence(mem_flags::mem_device, memory_order_seq_cst);
        proceed = atomic_fetch_add_explicit(&visited[cur], 1, memory_order_relaxed);
    }
}
What I'm observing is that thread 8192 hits the atomic_fetch_add first and terminates, while thread 8191 hits it second (observes that thread 8192 had incremented it by 1) and proceeds into the loop. Thread 8191 reads out[16383] (which it populated with 8191) and out[16384] (which thread 8192 populated with 8192 prior to the atomic_thread_fence). Instead of reading 8192 from out[16384] though, it reads 0.
Maybe I'm missing something but this seems like a pretty clear violation of the atomic_thread_fence which (I thought) was supposed to guarantee that the write from thread 8192 to out[16384] would be visible to any thread observing the effects of the following atomic_fetch_add. Is atomic_fetch_add not a store operation? Modifying it to an atomic_store or atomic_exchange still results in the bug. Adding another atomic_thread_fence between the atomic_fetch_add and reading of out also doesn't change anything.
I only begin to observe this on grid sizes of 8193 and upwards. That's 9 threadgroups per grid, which I assume could be related to my M1 Pro GPU having 16 cores.
Running the same example on an A17 Pro GPU doesn't show any of this behavior up through a tested grid size of 4194303 (2^22-1), at which point testing larger grid sizes starts to run into other issues so I can't test anything larger.
Removing the atomic_thread_fences on both the M1 and A17 cause the test to fail at much smaller grid sizes, as expected.
sum.metal
main.swift
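Since the attachments are not inlined above, here is a hypothetical host-side harness for a kernel like this (not the attached main.swift; buffer sizes follow the tree layout described in the post, 2N-1 nodes for N leaves):

import Metal

// Hypothetical harness for the `sum` kernel above.
let n = 8193                                 // grid size where the failure is observed
let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let library = device.makeDefaultLibrary()!   // assumes sum.metal is compiled into the target
let pipeline = try! device.makeComputePipelineState(
    function: library.makeFunction(name: "sum")!
)

var size = Int32(n)
let input = [Int32](repeating: 1, count: n)  // all-ones leaves: the root should be n
let sizeBuf = device.makeBuffer(bytes: &size, length: 4, options: .storageModeShared)!
let inBuf = device.makeBuffer(bytes: input, length: 4 * n, options: .storageModeShared)!
let outBuf = device.makeBuffer(length: 4 * (2 * n - 1), options: .storageModeShared)!
let visitedBuf = device.makeBuffer(length: 4 * (2 * n - 1), options: .storageModeShared)!

let commandBuffer = queue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(sizeBuf, offset: 0, index: 0)    // indices follow declaration order
encoder.setBuffer(inBuf, offset: 0, index: 1)
encoder.setBuffer(outBuf, offset: 0, index: 2)
encoder.setBuffer(visitedBuf, offset: 0, index: 3)
let width = min(1024, pipeline.maxTotalThreadsPerThreadgroup)
encoder.dispatchThreads(MTLSize(width: n, height: 1, depth: 1),
                        threadsPerThreadgroup: MTLSize(width: width, height: 1, depth: 1))
encoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()

// The root (out[0]) should hold the total if every fence behaved as expected.
let root = outBuf.contents().assumingMemoryBound(to: Int32.self)[0]
print("root =", root, "expected", n)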
Currently looking for Metal developers to port Quake 2 RTX to Metal RT in order to give Apple Silicon Macs an amazing path-tracing demo. This project falls under NightSightProductions, which is also working on a Portal 2 with RTX remaster. If you are interested and want to help further Mac gaming, message me here or on Discord at king_vulpes.
I'm trying to get performance information in a Metal C++ (metal-cpp) program. I'm using Xcode 14.2. In the Counters tab, only the message "No Data" appears. What could be the problem? What am I doing wrong?
The Metal, MetalKit, and AppKit frameworks are linked. The code is taken from the Apple website, from the Learn Metal with C++ section.