I got the above error when using PyTorch with a diffusion model.
The same error comes up in a few games when running through D3DMetal.
Does anyone have any idea what this is about?
Metal
Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Suppose I want to draw a red rectangle onto my render target using a compute shader.
id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder];
[encoder setComputePipelineState:pipelineState];
simd_ushort2 position = simd_make_ushort2(100, 100);
simd_ushort2 size = simd_make_ushort2(50, 50);
[encoder setBytes:&position length:sizeof(position) atIndex:0];
[encoder setTexture:drawable.texture atIndex:0];
[encoder dispatchThreads:MTLSizeMake(size.x, size.y, 1)
   threadsPerThreadgroup:MTLSizeMake(32, 32, 1)];
[encoder endEncoding];
#include <metal_stdlib>
using namespace metal;

kernel void
Compute(ushort2 position_in_grid [[thread_position_in_grid]],
        constant ushort2 &position,
        texture2d<half, access::write> texture)
{
    texture.write(half4(1, 0, 0, 1), position_in_grid + position);
}
This works just fine:
Now, say for whatever reason I want to start using imageblocks in my compute kernel. First, I set the imageblock size on the CPU side:
id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder];
[encoder setComputePipelineState:pipelineState];
MTLSize threadgroupSize = MTLSizeMake(32, 32, 1);
[encoder setImageblockWidth:threadgroupSize.width
                     height:threadgroupSize.height];
simd_ushort2 position = simd_make_ushort2(100, 100);
simd_ushort2 size = simd_make_ushort2(50, 50);
[encoder setBytes:&position length:sizeof(position) atIndex:0];
[encoder setTexture:drawable.texture atIndex:0];
MTLSize gridSize = MTLSizeMake(size.x, size.y, 1);
[encoder dispatchThreads:gridSize threadsPerThreadgroup:threadgroupSize];
And then I update the compute kernel to simply declare the imageblock – note I never actually read from or write to it:
#include <metal_stdlib>
using namespace metal;

struct Foo
{
    int foo;
};

kernel void
Compute(ushort2 position_in_grid [[thread_position_in_grid]],
        constant ushort2 &position,
        texture2d<half, access::write> texture,
        imageblock<Foo> imageblock)
{
    texture.write(half4(1, 0, 0, 1), position_in_grid + position);
}
And now out of nowhere Metal’s shader validation starts complaining about mismatched texture usage flags:
2024-06-22 00:57:15.663132+1000 TextureUsage[80558:4539093] [GPUDebug] Texture usage flags mismatch executing kernel function "Compute" encoder: "1", dispatch: 0
2024-06-22 00:57:15.672004+1000 TextureUsage[80558:4539093] [GPUDebug] Texture usage flags mismatch executing kernel function "Compute" encoder: "1", dispatch: 0
2024-06-22 00:57:15.682422+1000 TextureUsage[80558:4539093] [GPUDebug] Texture usage flags mismatch executing kernel function "Compute" encoder: "1", dispatch: 0
2024-06-22 00:57:15.687587+1000 TextureUsage[80558:4539093] [GPUDebug] Texture usage flags mismatch executing kernel function "Compute" encoder: "1", dispatch: 0
2024-06-22 00:57:15.698106+1000 TextureUsage[80558:4539093] [GPUDebug] Texture usage flags mismatch executing kernel function "Compute" encoder: "1", dispatch: 0
The texture I’m writing to comes from a CAMetalDrawable whose associated CAMetalLayer has framebufferOnly set to NO. What am I missing?
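For reference, the layer that produces the drawable is configured roughly like this (a Swift sketch for brevity; the app itself is Objective-C and sets this up elsewhere):

import Metal
import QuartzCore

// Assumed layer setup: framebufferOnly is NO so the drawable's texture
// can be written from a compute shader.
let layer = CAMetalLayer()
layer.device = MTLCreateSystemDefaultDevice()
layer.pixelFormat = .bgra8Unorm
layer.framebufferOnly = false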
I've got an iOS app that uses MetalKit to display raw video frames coming in from a network source. I read the pixel data from the packets into a single MTLTexture a few rows at a time, and the texture is drawn into an MTKView each time a frame has been completely received over the network. The app works, but only for several seconds (a seemingly random duration) before the MTKView freezes (while packets are still being received).
Watching the debugger while my app was running revealed that the display froze when there was a large spike in memory. The memory profile in Instruments showed that the spike came from the rapid creation of many IOSurfaces and IOAccelerators. Profiling CPU usage shows that CAMetalLayerPrivateNextDrawableLocked is what runs during this rapid creation of surfaces. What does this function do?
Being a complete newbie to iOS programming as a whole, I wonder if this issue comes from a misuse of MetalKit. Below is the code I'm using to render the video frames:
import UIKit
import MetalKit
import Network

class MTKViewController: UIViewController, MTKViewDelegate {
    private var metalView: MTKView!
    private var device = MTLCreateSystemDefaultDevice()
    private var commandQueue: MTLCommandQueue?
    private var renderPipelineState: MTLRenderPipelineState?
    /// Metal texture to be drawn whenever the view controller is asked to render its view.
    private var texture: MTLTexture?
    private var networkListener: NetworkListener!
    private var textureGenerator: TextureGenerator!

    override public func loadView() {
        super.loadView()

        assert(device != nil, "Failed creating a default system Metal device. Please, make sure Metal is available on your hardware.")

        initializeMetalView()
        initializeRenderPipelineState()

        networkListener = NetworkListener()
        textureGenerator = TextureGenerator(width: streamWidth, height: streamHeight, bytesPerPixel: 4, rowsPerPacket: 8, device: device!)

        networkListener.start(port: NWEndpoint.Port(8080))
        networkListener.dataRecievedCallback = { data in
            self.textureGenerator.process(data: data)
        }
        textureGenerator.onTextureBuiltCallback = { texture in
            self.texture = texture
            self.draw(in: self.metalView)
        }

        commandQueue = device?.makeCommandQueue()
    }

    public func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
        /// need implement?
    }

    public func draw(in view: MTKView) {
        guard
            let texture = texture,
            let _ = device
        else { return }

        let commandBuffer = commandQueue!.makeCommandBuffer()!

        guard
            let currentRenderPassDescriptor = metalView.currentRenderPassDescriptor,
            let currentDrawable = metalView.currentDrawable,
            let renderPipelineState = renderPipelineState
        else { return }

        currentRenderPassDescriptor.renderTargetWidth = streamWidth
        currentRenderPassDescriptor.renderTargetHeight = streamHeight

        let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: currentRenderPassDescriptor)!
        encoder.pushDebugGroup("RenderFrame")
        encoder.setRenderPipelineState(renderPipelineState)
        encoder.setFragmentTexture(texture, index: 0)
        encoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4, instanceCount: 1)
        encoder.popDebugGroup()
        encoder.endEncoding()

        commandBuffer.present(currentDrawable)
        commandBuffer.commit()
    }

    private func initializeMetalView() {
        metalView = MTKView(frame: CGRect(x: 0, y: 0, width: streamWidth, height: streamWidth), device: device)
        metalView.delegate = self
        metalView.framebufferOnly = true
        metalView.colorPixelFormat = .bgra8Unorm
        metalView.contentScaleFactor = UIScreen.main.scale
        metalView.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        view.insertSubview(metalView, at: 0)
    }

    /// Initializes the render pipeline state with a default vertex function mapping the texture to the view's frame and a simple fragment function returning the texture pixel's value.
    private func initializeRenderPipelineState() {
        guard let device = device, let library = device.makeDefaultLibrary() else {
            return
        }

        let pipelineDescriptor = MTLRenderPipelineDescriptor()
        pipelineDescriptor.rasterSampleCount = 1
        pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
        pipelineDescriptor.depthAttachmentPixelFormat = .invalid

        /// Vertex function to map the texture to the view controller's view
        pipelineDescriptor.vertexFunction = library.makeFunction(name: "mapTexture")
        /// Fragment function to display texture's pixels in the area bounded by vertices of `mapTexture` shader
        pipelineDescriptor.fragmentFunction = library.makeFunction(name: "displayTexture")

        do {
            renderPipelineState = try device.makeRenderPipelineState(descriptor: pipelineDescriptor)
        }
        catch {
            assertionFailure("Failed creating a render state pipeline. Can't render the texture without one.")
            return
        }
    }
}
My question is simply: what gives?
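For comparison, here is a minimal sketch of letting the MTKView's own display loop drive drawing, with the network callback only publishing the latest texture (adapted from the code above; not necessarily the fix):

// Sketch: instead of calling draw(in:) from the network thread, publish the
// texture and let MTKView redraw on its own display link.
textureGenerator.onTextureBuiltCallback = { [weak self] texture in
    DispatchQueue.main.async {
        self?.texture = texture   // picked up on the next draw(in:) tick
    }
}
// With metalView.isPaused = false and metalView.enableSetNeedsDisplay = false
// (the MTKView defaults), draw(in:) is called at the display refresh rate.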
Hi,
The copyFromBuffer documentation states that on macOS, sourceOffset, destinationOffset, and size "needs to be a multiple of 4, but can be any value in iOS and tvOS".
However, I have noticed that, at least on my M2 Max, this limitation does not seem to exist: there are no warnings and the copy works correctly regardless of the offset value.
I'm curious to know if this is something that should still be avoided. Is the multiple-of-4 limitation specific to non-Apple Silicon devices, so that the note can be ignored on Apple Silicon?
I ask because I am a contributor to Metal.jl, and recently noticed that our tests pass even when copying with copyWithBuffer using offsets and sizes that are not multiples of 4. If that could cause issues or correctness problems, we would need to fix it.
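For concreteness, the kind of copy in question looks roughly like this (a standalone Swift sketch with made-up offsets, not our actual Metal.jl code):

import Metal

// Sketch: a blit copy whose offsets and size are NOT multiples of 4, which
// the docs say macOS requires, yet it appears to work fine on an M2 Max.
let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let source = device.makeBuffer(length: 64, options: .storageModeShared)!
let destination = device.makeBuffer(length: 64, options: .storageModeShared)!

let commandBuffer = queue.makeCommandBuffer()!
let blit = commandBuffer.makeBlitCommandEncoder()!
blit.copy(from: source, sourceOffset: 3, to: destination, destinationOffset: 1, size: 5)
blit.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()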
Thank you.
Christian
Call of Duty: Black Ops 4 crashes right after launch while using Game Porting Toolkit 2. The following errors by wine-preloader can be found in the Console:
error 16:25:59.318998+0300 wine-preloader Unsupported API: IDXGISwapChain3::SetColorSpace1
error 16:26:00.039271+0300 wine-preloader Unsupported API: D3D11 timestamp query
Hi,
Introducing Swift Concurrency to my Metal app has been a bit challenging, as Swift Concurrency is limited by the cooperative thread pool.
GPU work is obviously not CPU-bound, yet waiting on it can block forward progress, especially when using waitUntilCompleted on the command buffer. For concurrent render work this has the potential of underutilizing the CPU and even creating deadlocks.
My question is: what is the Metal team's general recommendation when it comes to concurrency? It seems to me that Dispatch or OperationQueues are still the preferred way to schedule Metal work for maximum performance?
To integrate with Swift Concurrency, my idea is to use continuations that kick off render jobs via Dispatch or OperationQueues. Would this be the best way to bridge async tasks with Metal work?
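Concretely, the bridge I have in mind looks something like this (a sketch that suspends on a completion handler instead of calling waitUntilCompleted, so no cooperative-pool thread blocks):

import Metal

// Sketch: suspend the calling task until the GPU finishes this command buffer.
func commitAndAwait(_ commandBuffer: MTLCommandBuffer) async {
    await withCheckedContinuation { (continuation: CheckedContinuation<Void, Never>) in
        commandBuffer.addCompletedHandler { _ in
            continuation.resume()
        }
        commandBuffer.commit()
    }
}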
Thanks!
I've been upgrading Xcode consistently for years and have never seen Metal shaders behave differently from one version to another until now.
On macOS 14.5 with the Xcode 16 beta, several color outputs suddenly turn out completely black where there should be color. All validation is on and nothing seems to be wrong (and hasn't been since maybe Xcode 11).
I've attached two screenshots. The first is the normal color scheme; the second is from Xcode 16. The settings are exactly the same.
Normal:
Buggy, with black and transparent colors (so it seems like the colors are either overflowing or are all 0s)?
Before I file a bug report or a code-level support request, could I get some thoughts on how to debug this? The only clue I have is that I'm using bindless to multiply color texture samples by color values from my vertex struct. But it still fails even if I use hard-coded values for the texture samples, meaning somehow the color values are not being sent to the shader correctly? This is the most stable part of my rendering pipeline, so I'm surprised if the issue is there.
Thank you.
I have an issue with hand occlusion in immersive mode. I have an entry view for the app and a Metal CompositorLayer (which is the immersive volume) where I have set .upperLimbVisibility(Visibility.hidden). The problem is that when I dismiss the entry view, sometimes it hides the hands and sometimes it doesn't (randomly).
import SwiftUI
import CompositorServices

@main
struct AVPainterApp: App
{
    @State var hand: Int32 = 0

    var body: some Scene
    {
        WindowGroup()
        {
            ContentView(hand: $hand)
        }
        .windowResizability(.contentSize)

        ImmersiveSpace(id: "ImmersiveSpace")
        {
            CompositorLayer(configuration: MetalLayerConfiguration())
            {
                layerRenderer in SpatialSceneRun(layerRenderer, hand)
            }
        }
        .upperLimbVisibility(Visibility.hidden)
        .immersionStyle(selection: .constant(.full), in: .full)
    }
}
It’s great that we’ll be able to use Metal custom renderers in passthrough mode on visionOS.
https://developer.apple.com/wwdc24/10092
This is a lot of complicated set-up, however. It's also unclear how occlusion and custom algorithms / raytracing will work in tandem with scene understanding. May we have a project template and/or sample, preferably with the C API and not just Swift? This would be much appreciated and helpful to everyone who wants this set-up. I'd like to see the whole process.
Thank you for introducing this feature!
I am seeking clarification regarding the new device-coherent memory (buffers and textures) in Metal 3.2. Do I understand the documentation correctly that this feature allows threads from different threadgroups to update data in device memory cooperatively? The documentation mentions, "[results of operations] are visible to other threads across thread groups if you synchronize them properly." How does one do proper synchronization? From what I understand, Metal has no device-scoped barriers.
We're experiencing an issue with wrong SceneKit hit-testing results in iOS 17.2 compared with iOS 16.1 when using either the Metal or OpenGLES2 engine.
Tapping on a 3D model to place an SCNNode:
// pointInScene: tapped point
let hitResults = sceneView.hitTest(pointInScene, options: nil)
return hitResults.first { $0.node.name?.compare("node_name") == .orderedSame }
I've got a full-screen animation of a bunch of circles filled with gradients, with plenty of (careless) overdraw, plus real-time audio processing driving the animation, plus the overhead of SwiftUI's dependency analysis, and that app uses less energy (on iPhone 13) than the Xcode "Metal Game" template, which is a rotating textured cube (a trivial GPU workload). Why is that? How can I investigate further?
Does CoreAnimation have access to a compositor fast-path that a Metal app cannot access?
Maybe another data point: when I do the same circles animation using SwiftUI's Canvas, the energy use is "Very High" and GPU utilization is also quite high. Eventually the phone's thermal state goes to "Serious" and I get a message on the device that "Charging will resume when iPhone returns to normal temperature".
Why do I get this error almost immediately on starting my rendering pass?
2024-05-29 20:02:22.744035-0500 RoomPlanExampleApp[491:10341] [] <<<< AVPointCloudData >>>> Fig assert: "_dataBuffer" at bail (AVPointCloudData.m:217) - (err=0)
2024-05-29 20:02:22.744455-0500 RoomPlanExampleApp[491:10341] [] <<<< AVPointCloudData >>>> Fig assert: "_dataBuffer" at bail (AVPointCloudData.m:217) - (err=0)
2024-05-29 20:05:54.079981-0500 RoomPlanExampleApp[491:10025] [CAMetalLayer nextDrawable] returning nil because allocation failed.
2024-05-29 20:05:54.080144-0500 RoomPlanExampleApp[491:10341] [] <<<< AVPointCloudData >>>> Fig assert: "_dataBuffer" at bail (AVPointCloudData.m:217) - (err=0)
I am a VOIP app developer.
I am planning to develop a VOIP app on iOS using WebRTC that operates in PiP (Picture-in-Picture) mode.
Since MTKView (CAMetalLayer) cannot be used in PiP mode, I am considering using AVSampleBufferDisplayLayer.
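Concretely, the path I have in mind for AVSampleBufferDisplayLayer looks roughly like this (a sketch; the pixel buffers would come from the WebRTC decoder, which is not shown):

import AVFoundation
import CoreMedia

// Sketch: wrap a decoded CVPixelBuffer in a CMSampleBuffer and enqueue it.
func enqueue(_ pixelBuffer: CVPixelBuffer,
             at time: CMTime,
             on layer: AVSampleBufferDisplayLayer) {
    var formatDescription: CMVideoFormatDescription?
    CMVideoFormatDescriptionCreateForImageBuffer(allocator: kCFAllocatorDefault,
                                                 imageBuffer: pixelBuffer,
                                                 formatDescriptionOut: &formatDescription)
    guard let format = formatDescription else { return }

    var timing = CMSampleTimingInfo(duration: .invalid,
                                    presentationTimeStamp: time,
                                    decodeTimeStamp: .invalid)
    var sampleBuffer: CMSampleBuffer?
    CMSampleBufferCreateReadyWithImageBuffer(allocator: kCFAllocatorDefault,
                                             imageBuffer: pixelBuffer,
                                             formatDescription: format,
                                             sampleTiming: &timing,
                                             sampleBufferOut: &sampleBuffer)
    if let sampleBuffer = sampleBuffer, layer.isReadyForMoreMediaData {
        layer.enqueue(sampleBuffer)
    }
}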
Regarding this, I am curious about the performance differences between CAMetalLayer and AVSampleBufferDisplayLayer.
As far as I know, CAMetalLayer utilizes the GPU.
Does AVSampleBufferDisplayLayer also render using the GPU?
If AVSampleBufferDisplayLayer renders using the GPU, will the rendering performance be similar?
=> Based on tests, there seems to be no difference in CPU usage between the two, which leads me to speculate that AVSampleBufferDisplayLayer also uses the GPU.
If both use the GPU and there are no performance differences, is there a significant advantage to using CAMetalLayer?
Thank you in advance.
Hello,
This exact question was already asked in this forum (8 years ago) but I can't find a definitive answer:
Does Metal allow using the same color texture as both an input and output (color attachment) of a fragment shader? Is the behavior defined somewhere?
I believe this results in undefined behavior under both DirectX and OpenGL, so I'd assume the same for Metal, but then why doesn't Metal warn me about this as it does about so many other "misconfigurations"? It also seems to work correctly in my case, as I found out by accident.
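To be concrete, this is the kind of setup I mean (a minimal Swift sketch; the texture is assumed to have both .shaderRead and .renderTarget usage):

import Metal

// Sketch: the same texture is bound as colorAttachments[0] and as a fragment
// shader input within a single render pass.
func encodeFeedbackPass(commandBuffer: MTLCommandBuffer,
                        pipelineState: MTLRenderPipelineState,
                        texture: MTLTexture) {
    let passDescriptor = MTLRenderPassDescriptor()
    passDescriptor.colorAttachments[0].texture = texture
    passDescriptor.colorAttachments[0].loadAction = .load
    passDescriptor.colorAttachments[0].storeAction = .store

    guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: passDescriptor) else { return }
    encoder.setRenderPipelineState(pipelineState)
    encoder.setFragmentTexture(texture, index: 0)   // same texture as the attachment
    encoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4)
    encoder.endEncoding()
}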
Would love to get a clarification!
Thanks ahead!
Hello all
We would like to use AMD's FidelityFX downsampler in our custom game engine, and we are having difficulty implementing it correctly for Metal due to its use of the globallycoherent keyword. We have done an extensive search online but have not succeeded in finding an answer. What we have found is the largely undocumented 'volatile' keyword, so we were hypothesising that marking a texture with 'volatile' (which implies 'device volatile', since it's a texture) could have the same effect, but we are far from convinced it would work. Does anyone have insights into this?
I have provided a test UIKit app which displays three different images, side by side, each inside a separate MTKView. Each image is tagged with a different color profile:
Display P3
uRGB
Test RGB (from an image supplied in Apple's ImageApp sample).
I set up default values for all color spaces and formats. I then check if the image is tagged and, if so, I override those values with state from the tagged color space.
The variables I am setting:
“workingColorSpace” in the Metal CIContext, default = sRGB
“workingFormat” in the Metal CIContext, default = RGBAf
“outputColorSpace” in the Metal CIContext, default = displayP3
“colorPixelFormat” in the MTKView, default = bgra8Unorm
“colorSpace” in a CIRenderDestination that I use in the MTKView delegate draw method
The “colorSpace” default value = CGColorSpaceCreateDeviceRGB()
I also set “pixelFormat” in CIRenderDestination with the MTKView.colorPixelFormat.
If the image is tagged, I override the following values with the tagged colorSpace:
CIContext.workingColorSpace
CIContext.outputColorSpace
CIRenderDestination.colorSpace
If the tagged colorSpace.isWideGamutRGB = true, then I set the CIRenderDestination.colorSpace to extendedSRGB, ignoring the color space in the tagged wide gamut color space, as well as set the colorPixelFormat = bgr10_xr
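Put together, the configuration described above amounts to roughly the following (a condensed Swift sketch, not the full sample project; taggedColorSpace stands in for the color space read from the tagged image, or nil if untagged):

import CoreImage
import MetalKit

// Condensed sketch of the defaults plus the tagged-image overrides described above.
func makeContext(device: MTLDevice, taggedColorSpace: CGColorSpace?) -> CIContext {
    let working = taggedColorSpace ?? CGColorSpace(name: CGColorSpace.sRGB)!
    let output = taggedColorSpace ?? CGColorSpace(name: CGColorSpace.displayP3)!
    return CIContext(mtlDevice: device, options: [
        .workingColorSpace: working,
        .workingFormat: NSNumber(value: CIFormat.RGBAf.rawValue),
        .outputColorSpace: output
    ])
}

func makeDestination(view: MTKView,
                     texture: MTLTexture,
                     commandBuffer: MTLCommandBuffer,
                     taggedColorSpace: CGColorSpace?) -> CIRenderDestination {
    let destination = CIRenderDestination(width: texture.width,
                                          height: texture.height,
                                          pixelFormat: view.colorPixelFormat,
                                          commandBuffer: commandBuffer) { texture }
    destination.colorSpace = taggedColorSpace ?? CGColorSpaceCreateDeviceRGB()
    if let tagged = taggedColorSpace, tagged.isWideGamutRGB {
        view.colorPixelFormat = .bgr10_xr
        destination.colorSpace = CGColorSpace(name: CGColorSpace.extendedSRGB)
    }
    return destination
}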
Results:
The above scenario will properly render the DisplayP3 image, and the uRGB image. The “Test RGB” image fails:
If I do not override the CIRenderDestination.colorSpace with a value from the tagged image, then the “Test RGB” image succeeds, but the “uRGB” image fails to render properly:
Question: Do I have everything hooked up correctly and, if so, why does one image fail, and the other succeed?
Link to sample project:
https://www.dropbox.com/scl/fi/57u2fcrgdvys7jtzykzxt/ColorSpaceTest.zip?rlkey=unjeeiu7mi0wx9wfpylt78nwd&dl=0
After the 4.2.9 build, I have a weird bug. It keeps crashing, and when I read the message, it displays:
validateRenderPassDescriptor:782: failed assertion `RenderPass Descriptor Validation
Texture at colorAttachment[0] has usage (0x01) which doesn't specify MTLTextureUsageRenderTarget (0x04)
This happens when I run in debug mode and try to hook up the motion template. I found out that the output texture created has only the MTLTextureUsageShaderRead usage but not MTLTextureUsageRenderTarget.
Does anyone have the same problem?
I am using FxPlug 4.2.9, Motion 5.7, and Final Cut Pro 10.7.1, running on Sonoma 14.2.1.
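For reference, the assertion is about the usage flags the texture was created with: a texture bound as a color attachment needs MTLTextureUsageRenderTarget. A minimal Swift sketch (not FxPlug-specific) of a descriptor that satisfies the validation:

import Metal

// Sketch: a texture used as colorAttachment[0] must include .renderTarget in
// its usage; .shaderRead alone triggers the assertion quoted above.
let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                          width: 1920,
                                                          height: 1080,
                                                          mipmapped: false)
descriptor.usage = [.shaderRead, .renderTarget]
let device = MTLCreateSystemDefaultDevice()!
let texture = device.makeTexture(descriptor: descriptor)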
When I try to import MetalFX on visionOS, Xcode shows the error "No such module 'MetalFX'", but the documentation says visionOS is supported.
Hi,
I am using Xcode frame capture to profile my app's shaders, and I have some questions about the per-line profiling statistics. Please see the two screenshots first; they are from my compute shader.
Begin:
End:
The first image is the head of the shader. The profile shows that the shader entry function takes 72.44% of the time.
And at the end of the shader, the profile shows that the closing brace '}' takes 60.45%.
Here are my questions:
How should I interpret the profile data? What is the real performance picture of this shader?
Why doesn't the shader entry function take 100% of the time?
Can someone help me answer these questions?
Thanks!
Boson