Is there a working example of imageblock_slice with implicit layout somewhere?
I get a compilation error when i write this:
imageblock_slilce color_slice = img_blk.slice(frag->color);
Error:
No matching member function for call to 'slice'
candidate template ignored: couldn't infer template argument 'E'
candidate function template not viable: requires 2 arguments, but 1 was provided
Too few template arguments for class template 'imageblock_slice'
It seems the syntax has changed since the Imageblocks presentation https://developer.apple.com/videos/play/tech-talks/603/
I tried supplying the struct type of the image block between <> but it still does not work.
Metal
RSS for tagRender advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Post
Replies
Boosts
Views
Activity
I am working on a custom resolve tile shader for a client. I see a big difference in performance depending on where we write to:
1- the resolve texture of the color attachment
2- a rw tile shader texture set via [renderEncoder setTileTexture: myResolvedTexture]
Option 2 is more than twice as slow than option 1.
Our compute shader writes to 4 UAVs so just using the resolve texture entry is not possible.
Why such a difference as there is no more data being written? Can option 2 be as fast as option 1?
I can demonstrate the issue in a modified version of the Multisample code sample.
I have an M1 Pro with a 16-core GPU. When I run a shader with 8193 threads, atomic_thread_fence is violated across the boundary between thread 8191 (the last thread in the 7th threadgroup) and 8192 (the first thread in the 9th threadgroup).
I've attached the Metal and Swift files, but I'll repost the relevant kernel here. It's a function that launches N threads to iterate through a binary tree from the leaves, where the first thread to reach the parent terminates and the second one populates it with the sum of the nodes two children.
// clang-format off
void sum(device const int& size,
device const int* __restrict__ in,
device int* __restrict__ out,
device atomic_int* visited,
uint i [[thread_position_in_grid]]) {
// clang-format on
int val = in[i];
uint cur = (size + i - 1);
out[cur] = val;
atomic_thread_fence(mem_flags::mem_device, memory_order_seq_cst);
cur = (cur - 1) / 2;
int proceed = atomic_fetch_add_explicit(&visited[cur], 1, memory_order_relaxed);
while (proceed == 1) {
uint left = 2 * cur + 1;
uint right = 2 * cur + 2;
uint val_left = out[left];
uint val_right = out[right];
uint val_cur = val_left + val_right;
out[cur] = val_cur;
if (cur == 0) {
break;
}
cur = (cur - 1) / 2;
atomic_thread_fence(mem_flags::mem_device, memory_order_seq_cst);
proceed = atomic_fetch_add_explicit(&visited[cur], 1, memory_order_relaxed);
}
}
What I'm observing is that thread 8192 hits the atomic_fetch_add first and terminates, while thread 8191 hits it second (observes that thread 8192 had incremented it by 1) and proceeds into the loop. Thread 8191 reads out[16383] (which it populated with 8191) and out[16384] (which thread 8192 populated with 8192 prior to the atomic_thread_fence). Instead of reading 8192 from out[16384] though, it reads 0.
Maybe I'm missing something but this seems like a pretty clear violation of the atomic_thread_fence which (I thought) was supposed to guarantee that the write from thread 8192 to out[16384] would be visible to any thread observing the effects of the following atomic_fetch_add. Is atomic_fetch_add not a store operation? Modifying it to an atomic_store or atomic_exchange still results in the bug. Adding another atomic_thread_fence between the atomic_fetch_add and reading of out also doesn't change anything.
I only begin to observe this on grid sizes of 8193 and upwards. That's 9 threadgroups per grid, which I assume could be related to my M1 Pro GPU having 16 cores.
Running the same example on an A17 Pro GPU doesn't show any of this behavior up through a tested grid size of 4194303 (2^22-1), at which point testing larger grid sizes starts to run into other issues so I can't test anything larger.
Removing the atomic_thread_fences on both the M1 and A17 cause the test to fail at much smaller grid sizes, as expected.
sum.metal
main.swift
I am trying to load some PNG data with MTKTextureLoader newTextureWithData,but the result shows wrong at the alpha area.
Here is the code. I have an image URL, after it downloads successfully, I try to use the data or UIImagePNGRepresentation (image), they all show wrong.
UIImage *tempImg = [UIImage imageWithData:data];
CGImageRef cgRef = tempImg.CGImage;
MTKTextureLoader *loader = [[MTKTextureLoader alloc] initWithDevice:device];
id<MTLTexture> temp1 = [loader newTextureWithData:data options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil];
NSData *tempData = UIImagePNGRepresentation(tempImg);
id<MTLTexture> temp2 = [loader newTextureWithData:tempData options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil];
id<MTLTexture> temp3 = [loader newTextureWithCGImage:cgRef options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil];
}] resume];
I am making a framework in C++ using metal-cpp, basically a small game engine. I am also consequently using metal-cpp-extensions provided in LearnMetalCPP to make applications work.
For one of my classes, I needed to add AppKit.hpp inside a public header file, so I moved it and its associate headers(NSApplication.hpp, NSMenu.hpp, etc.) from Project headers to Public in Build Phases' Headers, however, it started giving me the error "cast of C pointer type 'void *' to Objective-C pointer type 'Class' requires a bridged cast" at several points in the AppKit headers. They don't appear when AppKit and its associates are in the Project headers, or when they are in the Private headers and no headers import it.
I imagined that disabling Objective-C ARC and Using __bridge casts outside of ARC in Build Settings would solve it, but it didn't budge.
I imagined it wouldn't involve actively changing the headers would be the answer, but even if I try to put __bridge before the problematic casts, it didn't recognize __bridge.
How do I solve this? And why is it only happening in Public and not Project headers?
I’ve been trying to run Jurassic World Evolution 2 using the Game Porting Toolkit on macOS, but the game doesn’t launch and crashes immediately. Based on the error and research, it seems the issue is related to missing support for D3D12_TILED_RESOURCES_TIER_2 in the Metal API.
If this is the case, does anyone know if support for tiled resources is planned for future updates of the toolkit? Or are there any potential workarounds for bypassing this limitation?
Hi! I just installed GPTK2 on my new Mac , but the Terminal gave “Error:OpenSSL1.1 has been disabled.”
How should I fix it?Or waiting for the GPTK2 beta4?
Thanks.
Currently looking for Metal developers to port Quake 2 RTX to Metal RT in order to give Apple Silicon Macs an amazing Pathtracing demo, This project falls under NightSightProductions who is also working on a Portal 2 with RTX Remaster. if you are interested and want to help further Mac gaming, message me here or on discord at king_vulpes
Hello, I am using MTKView to display: camera preview & video playback. I am testing on iPhone 16. App crashes at a random moment whenever MTKView is rendering CIImage.
MetalView:
public enum MetalActionType {
case image(CIImage)
case buffer(CVPixelBuffer)
}
public struct MetalView: UIViewRepresentable {
let mtkView = MTKView()
public let actionPublisher: any Publisher<MetalActionType, Never>
public func makeCoordinator() -> Coordinator {
Coordinator(self)
}
public func makeUIView(context: UIViewRepresentableContext<MetalView>) -> MTKView {
guard let metalDevice = MTLCreateSystemDefaultDevice() else {
return mtkView
}
mtkView.device = metalDevice
mtkView.framebufferOnly = false
mtkView.clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 0)
mtkView.drawableSize = mtkView.frame.size
mtkView.delegate = context.coordinator
mtkView.isPaused = true
mtkView.enableSetNeedsDisplay = true
mtkView.preferredFramesPerSecond = 60
context.coordinator.ciContext = CIContext(
mtlDevice: metalDevice, options: [.priorityRequestLow: true, .highQualityDownsample: false])
context.coordinator.metalCommandQueue = metalDevice.makeCommandQueue()
context.coordinator.actionSubscriber = actionPublisher.sink { type in
switch type {
case .buffer(let pixelBuffer):
context.coordinator.updateCIImage(pixelBuffer)
break
case .image(let image):
context.coordinator.updateCIImage(image)
break
}
}
return mtkView
}
public func updateUIView(_ nsView: MTKView, context: UIViewRepresentableContext<MetalView>) {
}
public class Coordinator: NSObject, MTKViewDelegate {
var parent: MetalView
var metalCommandQueue: MTLCommandQueue!
var ciContext: CIContext!
private var image: CIImage? {
didSet {
Task { @MainActor in
self.parent.mtkView.setNeedsDisplay() //<--- call Draw method
}
}
}
var actionSubscriber: (any Combine.Cancellable)?
private let operationQueue = OperationQueue()
init(_ parent: MetalView) {
self.parent = parent
operationQueue.qualityOfService = .background
super.init()
}
public func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
}
public func draw(in view: MTKView) {
guard let drawable = view.currentDrawable, let ciImage = image,
let commandBuffer = metalCommandQueue.makeCommandBuffer(), let ci = ciContext
else {
return
}
//making sure nothing is nil, now we can add the current frame to the operationQueue for processing
operationQueue.addOperation(
MetalOperation(
drawable: drawable, drawableSize: view.drawableSize, ciImage: ciImage,
commandBuffer: commandBuffer, pixelFormat: view.colorPixelFormat, ciContext: ci))
}
//consumed by Subscriber
func updateCIImage(_ img: CIImage) {
image = img
}
//consumed by Subscriber
func updateCIImage(_ buffer: CVPixelBuffer) {
image = CIImage(cvPixelBuffer: buffer)
}
}
}
now the MetalOperation class:
private class MetalOperation: Operation, @unchecked Sendable {
let drawable: CAMetalDrawable
let drawableSize: CGSize
let ciImage: CIImage
let commandBuffer: MTLCommandBuffer
let pixelFormat: MTLPixelFormat
let ciContext: CIContext
init(
drawable: CAMetalDrawable, drawableSize: CGSize, ciImage: CIImage,
commandBuffer: MTLCommandBuffer, pixelFormat: MTLPixelFormat, ciContext: CIContext
) {
self.drawable = drawable
self.drawableSize = drawableSize
self.ciImage = ciImage
self.commandBuffer = commandBuffer
self.pixelFormat = pixelFormat
self.ciContext = ciContext
}
override func main() {
let width = Int(drawableSize.width)
let height = Int(drawableSize.height)
let ciWidth = Int(ciImage.extent.width) //<-- Thread 22: EXC_BAD_ACCESS (code=1, address=0x5e71f5490) A bad access to memory terminated the process.
let ciHeight = Int(ciImage.extent.height)
let destination = CIRenderDestination(
width: width, height: height, pixelFormat: pixelFormat, commandBuffer: commandBuffer,
mtlTextureProvider: { [self] () -> MTLTexture in
return drawable.texture
})
let transform = CGAffineTransform(
scaleX: CGFloat(width) / CGFloat(ciWidth), y: CGFloat(height) / CGFloat(ciHeight))
do {
try ciContext.startTask(toClear: destination)
try ciContext.startTask(toRender: ciImage.transformed(by: transform), to: destination)
} catch {
}
commandBuffer.present(drawable)
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
}
}
Now I am no Metal expert, but I believe it's a very simple execution that shouldn't cause memory leak especially after we have already checked for whether CIImage is nil or not. I have also tried running this code without OperationQueue and also tried with @autoreleasepool but none of them has solved this problem.
Am I missing something?
my app use mtkview to render video, but [MTKView initwithFrame:device] takes 2-3s in some some 2019 macbook pro, system macos 15.0.1.
how can I do?
Project: I have some data wich could be transformed by shader, result may be kept in rgb channels of image. Great.
But now to mix dozens of those results? Not one by one, image after image, but all at once. Something like „complicated average” color of particular pixel from all delivered images.
Is it possible?
I have a very basic usdz file from this repo
I call loadTextures() after loading the usdz via MDLAsset. Inspecting the MDLTexture object I can tell it is assigning a colorspace of linear rgb instead of srgb although the image file in the usdz is srgb.
This causes the textures to ultimately render as over saturated.
In the code I later convert the MDLTexture to MTLTexture via MTKTextureLoader but if I set the srgb option it seems to ignore it.
This significantly impacts the usefulness of Model I/O if it can't load a simple usdz texture correctly. Am I missing something?
Thanks!
I'm writing a swift app that uses metal to render textures to the main view. I currently use a NSViewRepresentable to place a MTKView into the window and a MTKViewDelegate to perform the metal operations. It's running well and I see my metal view being updated.
However, when I close the window (either through the user clicking the close button or by programatically using the appropriate @Environment(\.dismissWindow) private var dismissWindow) and then reopen the window, I no longer receive calls to MTKViewDelegate draw(in mtkView: MTView). If I manually call the MTKView::draw() function my view updates it's content as expected, so it seems to be still be correctly setup / alive.
As best as I can tell the CVDisplayLink created by MTKView is no longer active (or at least that's my understanding of how the MTKView::draw() function is called).
I've setup the MTKView like this
let mtkView = MTKView()
mtkView.delegate = context.coordinator // My custom delegate
mtkView.device = context.coordinator.device // The default metal device
mtkView.preferredFramesPerSecond = 60
mtkView.enableSetNeedsDisplay = false
mtkView.isPaused = false
which I was hoping would call the draw function at 60fps while the view is visible.
I've also verified the values don't change while running.
Does anyone have any ideas on how I could restart the CVDisplayLink or anyother methods to avoid this problem??
Cheers
Jack
Hi, wanted to test if possible to use Mesa3D Dozen driver(Vulkan on D3D12 )+D3DMetal 2b3 to get maybe better Vulkan driver on Wine than default MoltenVK.. this will support Vulkan windows apps via using D3D12Metal..
using vulkan_dzn.dll,dzn_icd.x86_64.json,dxil.dll from x64 folder from: https://github.com/pal1000/mesa-dist-win/releases/download/24.3.0-rc1/mesa3d-24.3.0-rc1-release-msvc.7z
using simple vulkaninfo app and running like:
wine64 vulkaninfo
I get error:
[D3DMetal:LOG:2A825] Unsupported API: CheckFeatureSupport, unhandled support query 53
also seems D3DMetal Wine integration on Whisky doesn't expose d3d12core.dll and d3d12.dll like new Agility D3D12 dlls or VKD3D, so
getting:
MESA: error: Failed to retrieve D3D12GetInterface MESA: error: Failed to load DXCore
but anyways seems to try to load the driver as:
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
full log:
MESA: error: Failed to retrieve D3D12GetInterface MESA: error: Failed to load DXCore WARNING: dzn is not a conformant Vulkan implementation, testing use only. [D3DMetal:LOG:2A825] Unsupported API: CheckFeatureSupport, unhandled support query 53 00bc:fixme:dcomp:DCompositionCreateDevice 0000000000000000, {c37ea93a-e7aa-450d-b16f-9746cb0407f3}, 000000000011E328. MESA: error: Failed to load DXCore WARNING: dzn is not a conformant Vulkan implementation, testing use only. [D3DMetal:LOG:2A825] Unsupported API: CheckFeatureSupport, unhandled support query 53 00bc:fixme:dcomp:DCompositionCreateDevice 0000000000000000, {c37ea93a-e7aa-450d-b16f-9746cb0407f3}, 000000000011E578. ERROR: [Loader Message] Code 0 : setup_loader_term_phys_devs: Call to 'vkEnumeratePhysicalDevices' in ICD c:\windows\system32\.\vulkan_dzn.dll failed with error code -3 ERROR: [Loader Message] Code 0 : setup_loader_term_phys_devs: Failed to detect any valid GPUs in the current config ERROR at C:\j\msdk0\build\Khronos-Tools\repo\vulkaninfo\vulkaninfo.h:241:vkEnumeratePhysicalDevices failed with ERROR_INITIALIZATION_FAILED
I want to turn off my ray-tracing conditionally. There's is_null_acceleration_structure but when I don't bind an acceleration structure (or pass nil to setFragmentAccelerationStructure), I get the following API validation error:
-[MTLDebugRenderCommandEncoder validateCommonDrawErrors:]:5782: failed assertion `Draw Errors Validation
Fragment Function(vol_deferred_lighting): missing instanceAccelerationStructure binding at index 6 for accelerationStructure[0].
I can turn off API validation and it works, but it seems like I should be able to use nil for the acceleration structure w/o triggering a validation error. Seems like a bug, right?
I suppose I can work around this by creating a separate pipeline with the ray-tracing disabled via a function constant instead of using is_null_acceleration_structure.
(Can we get a ray-tracing tag for questions?)
I would like to take YCbCr CVPixelBuffers from AVCaptureVideoDataOutput, apply some processing in RGB space, render to an MTKView, and pass to AVAssetWriter for recording.
Right now, I'm doing this all manually – deswing the incoming data if necessary, choose the right matrix to convert to RGB, apply processing, etc. I also have to convert back to YCbCr before feeding the frames to AVAssetWriter because encoding performs much better if I do. Is there any efficient, built-in way to achieve the same?
I can't use AVCaptureVideoPreviewLayer, since I need to do some further processing before display. I can't use AVCaptureVideoDataOutput's videoSettings to get automatic BGRA conversion because that would lose bit depth for 10 bit video formats (and isn't available on all formats anyway).
I see these Accelerate functions, but they seemingly don't use the GPU, nor do they support all the formats and bit depths I'd need.
I found reference to some undocumented MTLPixelFormats that seem to do exactly what I want, but I don't want to rely on something like this unless it's explicitly endorsed. This would also incur an RGB/YCbCr conversion on every texture read and write, right?
Is there anything I'm missing here?
Ever since the release of iOS18, we've been seeing a new crash related to calling jpegData(compressionQuality:). From reports, this isn't crashing during foreground usage of the app, but instead will prompt the user about a background app crash upon foregrounding. The stacks from crash reports show this crash is happening from a variety of callers, but all go through jpegData(compressionQuality:), down through [HDRImageConverter_Metal init] and end up in apthread_mutex_local call when it crashes.
Attached is a sample crash report from 18.2(22C5125e), but we've been seeing this since the first iOS18 release.
Did something change with around these calls in iOS18 that prohibits their use in the background?
crash.txt
I've noticed a major third-party app has the following flag set to 1/true in its Info.plist: CAMetalLayerWantsCompositingDependencies
Does anyone know if it’s recognized by Core Animation / Metal, and what it’s supposed to do? It might obviously have zero relationship to the OS, defined by that app and for that app... but since it looks very much like an unofficial/undocumented environment setting, it might be great to know what problem it solves. I happen to have issues related to compositing other CALayers over a CAMetalLayer in my app... so this definitely stood out as interesting.
Thank you!
I am building a MacOS desktop app (https://anukari.com) that is using Metal compute to do real-time audio/DSP processing, as I have a problem that is highly parallelizable and too computationally expensive for the CPU.
However it seems that the way in which I am using the GPU, even when my app is fully compute-limited, the OS never increases the power/performance state. Because this is a real-time audio synthesis application, it's a huge problem to not be able to take advantage of the full clock speeds that the GPU is capable of, because the app can't keep up with real-time.
I discovered this issue while profiling the app using Instrument's Metal tracing (and Game tracing) modes. In the profiling configuration under "Metal Application" there is a drop-down to select the "Performance State." If I run the application under Instruments with Performance State set to Maximum, it runs amazingly well, and all my problems go away.
For comparison, when I run the app on its own, outside of Instruments, the expensive GPU computation it's doing takes around 2x as long to complete, meaning that the app performs half as well.
I've done a ton of work to micro-optimize my Metal compute code, based on every scrap of information from the WWDC videos, etc. A problem I'm running into is that I think that the more efficient I make my code, the less it signals to the OS that I want high GPU clock speeds!
I think part of why the OS is confused is that in most use cases, my computation can be done using only a small number of Metal threadgroups. I'm guessing that the OS heuristics see that only a small fraction of the GPU is saturated and fail to scale up the power/clock state.
I'm not sure what to do here; I'm in a bit of a bind. One possibility is that I intentionally schedule busy work -- spin threadgroups just to waste energy and signal to the OS that I need higher clock speeds. This is obviously a really bad idea, but it might work.
Is there any other (better) way for my app to signal to the OS that it is doing real-time latency-sensitive computation on the GPU and needs the clock speeds to be scaled up?
Note that game mode is not really an option, as my app also runs as an AU plugin inside hosts like Garageband, so it can't be made fullscreen, etc.
I have multiple CAMetalLayers that I render content to and noticed that the graphics overview HUD does not function properly when I have more than one CAMetalLayer. The values reported will be very strange. For example, FPS may report 999 or some large negative value. It the HUD simply not designed to work with multiple CAMetalLayers or MTKViews? When I disable all but one of my CAMetalLayers, the HUD works as expected.