Currently looking for Metal developers to port Quake 2 RTX to Metal RT in order to give Apple Silicon Macs an amazing path-tracing demo. This project falls under NightSightProductions, who is also working on a Portal 2 with RTX remaster. If you are interested and want to help further Mac gaming, message me here or on Discord at king_vulpes.
Metal
Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Posts under Metal tag
I'm trying to get performance information in a metal-cpp program. I'm using Xcode 14.2. In the Counters tab, only the message "No Data" appears. What could be the problem? What am I doing wrong?
The Metal, MetalKit, and AppKit frameworks are linked. The code is taken from the Apple website, from the Learn Metal with C++ section.
Submitted as FB16052050
I am looking to adopt Machine Learning in a more granular manner, going beyond just using pre-built Metal, Core ML, or Create ML approaches. Specifically, I want to train models using the open-source PyTorch libraries, as these offer greater flexibility compared to Apple's native tools. However, these PyTorch APIs are primarily optimised for NVIDIA GPUs (or TPUs), not Apple's M3 or the Apple Neural Engine (ANE).
My goal is to train the models locally without resorting to cloud-based solutions for training or inference, and to then convert the models into Core ML format for deployment on Apple hardware. This would allow me to leverage Apple's hardware acceleration (via ANE, Metal, and MPS) while maintaining control over the training process in PyTorch.
I want to know:
What are my options for training models in PyTorch on local hardware (Apple M3 or equivalent), and how can I ensure that the PyTorch model can eventually be converted to Core ML without losing flexibility in model training and customisation?
How can I perform training in PyTorch and avoid being restricted to inference-only workflows as Core ML typically allows? Is it possible to use the training capabilities of PyTorch and still get the performance benefits of Apple's hardware for both training and inference?
What are the best practices or tools to ensure that my training pipeline in PyTorch is compatible with Apple's hardware constraints and optimised for local execution?
I'm seeking a practical, cloud-free approach on Apple Hardware only that allows me to train models in PyTorch (keeping control over the training process) while ensuring that they can be deployed efficiently using Core ML on Apple hardware.
Hello, I am using MTKView to display a camera preview and video playback. I am testing on an iPhone 16. The app crashes at a random moment whenever the MTKView is rendering a CIImage.
MetalView:
public enum MetalActionType {
case image(CIImage)
case buffer(CVPixelBuffer)
}
public struct MetalView: UIViewRepresentable {
let mtkView = MTKView()
public let actionPublisher: any Publisher<MetalActionType, Never>
public func makeCoordinator() -> Coordinator {
Coordinator(self)
}
public func makeUIView(context: UIViewRepresentableContext<MetalView>) -> MTKView {
guard let metalDevice = MTLCreateSystemDefaultDevice() else {
return mtkView
}
mtkView.device = metalDevice
mtkView.framebufferOnly = false
mtkView.clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 0)
mtkView.drawableSize = mtkView.frame.size
mtkView.delegate = context.coordinator
mtkView.isPaused = true
mtkView.enableSetNeedsDisplay = true
mtkView.preferredFramesPerSecond = 60
context.coordinator.ciContext = CIContext(
mtlDevice: metalDevice, options: [.priorityRequestLow: true, .highQualityDownsample: false])
context.coordinator.metalCommandQueue = metalDevice.makeCommandQueue()
context.coordinator.actionSubscriber = actionPublisher.sink { type in
switch type {
case .buffer(let pixelBuffer):
context.coordinator.updateCIImage(pixelBuffer)
break
case .image(let image):
context.coordinator.updateCIImage(image)
break
}
}
return mtkView
}
public func updateUIView(_ nsView: MTKView, context: UIViewRepresentableContext<MetalView>) {
}
public class Coordinator: NSObject, MTKViewDelegate {
var parent: MetalView
var metalCommandQueue: MTLCommandQueue!
var ciContext: CIContext!
private var image: CIImage? {
didSet {
Task { @MainActor in
self.parent.mtkView.setNeedsDisplay() //<--- call Draw method
}
}
}
var actionSubscriber: (any Combine.Cancellable)?
private let operationQueue = OperationQueue()
init(_ parent: MetalView) {
self.parent = parent
operationQueue.qualityOfService = .background
super.init()
}
public func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
}
public func draw(in view: MTKView) {
guard let drawable = view.currentDrawable, let ciImage = image,
let commandBuffer = metalCommandQueue.makeCommandBuffer(), let ci = ciContext
else {
return
}
//making sure nothing is nil, now we can add the current frame to the operationQueue for processing
operationQueue.addOperation(
MetalOperation(
drawable: drawable, drawableSize: view.drawableSize, ciImage: ciImage,
commandBuffer: commandBuffer, pixelFormat: view.colorPixelFormat, ciContext: ci))
}
//consumed by Subscriber
func updateCIImage(_ img: CIImage) {
image = img
}
//consumed by Subscriber
func updateCIImage(_ buffer: CVPixelBuffer) {
image = CIImage(cvPixelBuffer: buffer)
}
}
}
now the MetalOperation class:
private class MetalOperation: Operation, @unchecked Sendable {
let drawable: CAMetalDrawable
let drawableSize: CGSize
let ciImage: CIImage
let commandBuffer: MTLCommandBuffer
let pixelFormat: MTLPixelFormat
let ciContext: CIContext
init(
drawable: CAMetalDrawable, drawableSize: CGSize, ciImage: CIImage,
commandBuffer: MTLCommandBuffer, pixelFormat: MTLPixelFormat, ciContext: CIContext
) {
self.drawable = drawable
self.drawableSize = drawableSize
self.ciImage = ciImage
self.commandBuffer = commandBuffer
self.pixelFormat = pixelFormat
self.ciContext = ciContext
}
override func main() {
let width = Int(drawableSize.width)
let height = Int(drawableSize.height)
let ciWidth = Int(ciImage.extent.width) //<-- Thread 22: EXC_BAD_ACCESS (code=1, address=0x5e71f5490) A bad access to memory terminated the process.
let ciHeight = Int(ciImage.extent.height)
let destination = CIRenderDestination(
width: width, height: height, pixelFormat: pixelFormat, commandBuffer: commandBuffer,
mtlTextureProvider: { [self] () -> MTLTexture in
return drawable.texture
})
let transform = CGAffineTransform(
scaleX: CGFloat(width) / CGFloat(ciWidth), y: CGFloat(height) / CGFloat(ciHeight))
do {
try ciContext.startTask(toClear: destination)
try ciContext.startTask(toRender: ciImage.transformed(by: transform), to: destination)
} catch {
}
commandBuffer.present(drawable)
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
}
}
Now I am no Metal expert, but I believe it's a very simple operation that shouldn't cause a memory problem, especially since we have already checked whether the CIImage is nil. I have also tried running this code without the OperationQueue, and with @autoreleasepool, but neither has solved the problem.
Am I missing something?
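One direction worth testing is to keep the Core Image work inside draw(in:), so the drawable and the pixel-buffer-backed CIImage are never handed to a background Operation. A minimal sketch, assuming the same image, ciContext, and metalCommandQueue properties shown above:
public func draw(in view: MTKView) {
    guard let drawable = view.currentDrawable,
          let ciImage = image,
          let commandBuffer = metalCommandQueue.makeCommandBuffer(),
          let ciContext = ciContext else { return }
    let size = view.drawableSize
    // Scale the image to fill the drawable.
    let scaled = ciImage.transformed(by: CGAffineTransform(
        scaleX: size.width / ciImage.extent.width,
        y: size.height / ciImage.extent.height))
    let destination = CIRenderDestination(
        width: Int(size.width), height: Int(size.height),
        pixelFormat: view.colorPixelFormat,
        commandBuffer: commandBuffer) { drawable.texture }
    do {
        try ciContext.startTask(toClear: destination)
        try ciContext.startTask(toRender: scaled, to: destination)
    } catch {
        print("Core Image render failed: \(error)")
    }
    commandBuffer.present(drawable)
    commandBuffer.commit()
    // No waitUntilCompleted(): the command buffer is allowed to complete
    // asynchronously instead of blocking the thread MetalKit calls draw(in:) on.
}
If the background queue is kept, another thing worth checking is whether the camera's CVPixelBuffer backing the CIImage is still valid by the time the Operation actually runs.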
I have been playing around with the idea of drawing directly onto the pixels of the Vision Pro, as I am working on a telepresence app that streams a live stereoscopic feed from an articulated robot neck to the wearer.
I was playing around in the Compositor Services demo and modified it to show the following.
I created a grid pattern using normalized device coordinates (-1 to 1) and it looks great when it shows up in the simulator as shown below.
I wanted to see the effects of lens distortion on the image, so I ran this modified app inside the actual Vision Pro; it seems that each eye has only a portion of this screen visible. I have included a screen capture of a screen recording inside the Vision Pro when running this modified app.
The lines appear straight, which says to me that there must be some automatic pre-distortion correction applied (similar to the image shown below taken from an AVP teardown that I cannot link here).
However, I am wondering why the grid appears cropped, and what defines the bounds of the frame.
I am trying to extract some built-in and custom render passes from SceneKit, so that I can pass them into a Metal pipeline and do some additional work with them.
I have a Metal viewport, and have instantiated an SCNRenderer so that I can render an SCNScene using SceneKit to a texture as part of my Metal draw pass. This works as expected.
Now I want to output multiple textures from the SceneKit render, not just the final color. I want to extract Depth, Normal, Lighting, Colour and a custom SCNTechnique for world position.
I can easily use a SCNTechnique to render one of these to the color output, but it's not clear how I would render multiple passes in one render call.
Is there some way to pass a writable buffer or texture to an SCNTechnique, so that I can populate it in my SCNTechnique shader at render time with the output from the pass? Similar to how one would bind a buffer for a Metal shader. SCNTechnique obfuscates things, so it's not clear how to proceed.
Does anyone have any ideas?
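For reference, an SCNTechnique definition can declare several named color targets and feed one pass's output into a later pass as an input, which may be one way to get multiple buffers out of a single render call. A rough sketch; the pass, target, and shader names here are hypothetical:
import SceneKit

// Hypothetical two-pass technique: render the scene color and a custom
// world-position pass into named targets, then combine them in a quad pass.
let techniqueDefinition: [String: Any] = [
    "passes": [
        "scenePass": [
            "draw": "DRAW_SCENE",
            "outputs": ["color": "sceneColor"]
        ],
        "positionPass": [
            "draw": "DRAW_SCENE",
            "metalVertexShader": "worldPositionVertex",     // assumed shader names
            "metalFragmentShader": "worldPositionFragment",
            "outputs": ["color": "worldPosition"]
        ],
        "combinePass": [
            "draw": "DRAW_QUAD",
            "metalVertexShader": "combineVertex",
            "metalFragmentShader": "combineFragment",
            "inputs": ["colorSampler": "sceneColor",
                       "positionSampler": "worldPosition"],
            "outputs": ["color": "COLOR"]
        ]
    ],
    "sequence": ["scenePass", "positionPass", "combinePass"],
    "targets": [
        "sceneColor": ["type": "color"],
        "worldPosition": ["type": "color", "format": "rgba32f"] // format string assumed; check the SCNTechnique docs
    ]
]

let technique = SCNTechnique(dictionary: techniqueDefinition)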
I have the following TaskExecutor code in Swift 6 and am getting the following error:
//Error
Passing closure as a sending parameter risks causing data races between main actor-isolated code and concurrent execution of the closure.
May I know what is the best way to approach this?
This is the default code generated by Xcode when creating a Vision Pro App using Metal as the Immersive Renderer.
Renderer
@MainActor
static func startRenderLoop(_ layerRenderer: LayerRenderer, appModel: AppModel) {
Task(executorPreference: RendererTaskExecutor.shared) { //Error
let renderer = Renderer(layerRenderer, appModel: appModel)
await renderer.startARSession()
await renderer.renderLoop()
}
}
final class RendererTaskExecutor: TaskExecutor {
private let queue = DispatchQueue(label: "RenderThreadQueue", qos: .userInteractive)
func enqueue(_ job: UnownedJob) {
queue.async {
job.runSynchronously(on: self.asUnownedSerialExecutor())
}
}
func asUnownedSerialExecutor() -> UnownedTaskExecutor {
return UnownedTaskExecutor(ordinary: self)
}
static let shared: RendererTaskExecutor = RendererTaskExecutor()
}
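One restructuring that can silence this class of diagnostic (a sketch, not the official template fix) is to form the Task outside main-actor isolation and transfer the captured values explicitly; it assumes LayerRenderer is safe to transfer and uses the Swift 6 sending modifier for AppModel:
// Hypothetical replacement for the template's startRenderLoop (not the official fix):
// dropping @MainActor keeps the closure from being formed in main-actor isolation,
// and `sending` (Swift 6) transfers the non-Sendable AppModel into the task.
nonisolated static func startRenderLoop(_ layerRenderer: LayerRenderer,
                                        appModel: sending AppModel) {
    Task(executorPreference: RendererTaskExecutor.shared) {
        let renderer = Renderer(layerRenderer, appModel: appModel)
        await renderer.startARSession()
        await renderer.renderLoop()
    }
}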
I have a very basic usdz file from this repo
I call loadTextures() after loading the usdz via MDLAsset. Inspecting the MDLTexture object, I can tell it is assigning a color space of linear RGB instead of sRGB, although the image file in the usdz is sRGB.
This causes the textures to ultimately render as oversaturated.
In the code I later convert the MDLTexture to an MTLTexture via MTKTextureLoader, but if I set the .SRGB option it seems to be ignored.
This significantly impacts the usefulness of Model I/O if it can't load a simple usdz texture correctly. Am I missing something?
Thanks!
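One workaround to test (assuming the pixel data itself is loaded correctly and only the format is linear) is to reinterpret the loaded texture with an sRGB texture view, so sampling applies the sRGB-to-linear conversion:
import Metal

// Hypothetical workaround: view an already-loaded linear texture as its sRGB variant.
// sRGB/linear views of the same base format typically do not require the
// .pixelFormatView usage flag.
func srgbView(of texture: MTLTexture) -> MTLTexture? {
    switch texture.pixelFormat {
    case .rgba8Unorm: return texture.makeTextureView(pixelFormat: .rgba8Unorm_srgb)
    case .bgra8Unorm: return texture.makeTextureView(pixelFormat: .bgra8Unorm_srgb)
    default: return texture // already sRGB or an unrelated format
    }
}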
I'm writing a Swift app that uses Metal to render textures to the main view. I currently use an NSViewRepresentable to place an MTKView into the window and an MTKViewDelegate to perform the Metal operations. It's running well and I see my Metal view being updated.
However, when I close the window (either through the user clicking the close button or programmatically using the appropriate @Environment(\.dismissWindow) private var dismissWindow) and then reopen the window, I no longer receive calls to MTKViewDelegate draw(in mtkView: MTKView). If I manually call the MTKView draw() function, my view updates its content as expected, so it seems to still be correctly set up and alive.
As best as I can tell, the CVDisplayLink created by MTKView is no longer active (or at least that's my understanding of how the MTKView draw() function gets called).
I've setup the MTKView like this
let mtkView = MTKView()
mtkView.delegate = context.coordinator // My custom delegate
mtkView.device = context.coordinator.device // The default metal device
mtkView.preferredFramesPerSecond = 60
mtkView.enableSetNeedsDisplay = false
mtkView.isPaused = false
which I was hoping would call the draw function at 60fps while the view is visible.
I've also verified the values don't change while running.
Does anyone have any ideas on how I could restart the CVDisplayLink, or any other method to avoid this problem?
Cheers
Jack
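Not a confirmed fix, but one workaround to try is to nudge the view when it is re-attached to a window, since the internal display link appears to be tied to window membership. A sketch using an MTKView subclass:
import MetalKit

// Hypothetical workaround: restart drawing whenever the view rejoins a window.
final class RestartingMTKView: MTKView {
    override func viewDidMoveToWindow() {
        super.viewDidMoveToWindow()
        guard window != nil else { return }
        // Toggling isPaused (or re-assigning the delegate) can prod MTKView
        // into re-creating its internal display link on some macOS versions.
        isPaused = true
        isPaused = false
        needsDisplay = true
    }
}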
I am seeking a comprehensive pathway to learning Metal programming on visionOS. The official documentation's Pathway on Metal is insufficient in this regard. I kindly request that someone create a detailed pathway to assist me in this endeavor.
The pathway should encompass the following key areas:
Knowledge Base:
Understand the fundamental principles of Metal and other frameworks, as well as basic concepts, to prepare for future learning.
Metal 3 (very important):
Gain a deep understanding of Metal itself, the programming language used to communicate with the GPU on the device to render graphics. This knowledge forms the foundation for all Metal-related tasks.
Compositor Services and ARKit (important):
Learn how to display Metal scenes within the Vision device's space and enable augmented reality (AR) and hand interaction. This knowledge is essential for creating interactive and immersive experiences.
Metal Performance Shaders:
Acquire expertise in optimizing material rendering to enhance performance.
MetalKit:
Learn how MetalKit simplifies the tasks of displaying your Metal content onscreen.
MetalFX:
Develop proficiency in using MetalFX to improve rendering efficiency and achieve visually stunning effects.
I would appreciate it if you could provide me with a detailed and comprehensive pathway, including the URLs of relevant documents, to guide my learning journey. Thank you for your assistance.
Hi, I'm trying to modify the ScreenCaptureKit sample code by implementing an actor for Metal rendering, but I'm experiencing issues with the frame rendering order.
My app workflow is:
ScreenCapture -> createFrame -> setRenderData
Metal draw callback -> renderAsync (getData from renderData)
I've added timestamps to verify frame ordering. I'm also using a binary search to insert each frame by timestamp, and while the timestamps appear to be in sequence, the actual rendering output seems out of order.
// ScreenCaptureKit sample
func createFrame(for sampleBuffer: CMSampleBuffer) async {
if let surface: IOSurface = getIOSurface(for: sampleBuffer) {
await renderer.setRenderData(surface, timeStamp: sampleBuffer.presentationTimeStamp.seconds)
}
}
class Renderer {
...
func setRenderData(surface: IOSurface, timeStamp: Double) async {
_ = await renderSemaphore.getSetBuffers(
isGet: false,
surface: surface,
timeStamp: timeStamp
)
}
func draw(in view: MTKView) {
Task {
await renderAsync(view)
}
}
func renderAsync(_ view: MTKView) async {
guard await renderSemaphore.beginRender() else { return }
guard let frame = await renderSemaphore.getSetBuffers(
isGet: true, surface: nil, timeStamp: nil
) else {
await renderSemaphore.endRender()
return }
guard let texture = await renderSemaphore.getRenderData(
device: self.device,
surface: frame.surface) else {
await renderSemaphore.endRender()
return
}
guard let commandBuffer = _commandQueue.makeCommandBuffer(),
let renderPassDescriptor = await view.currentRenderPassDescriptor,
let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor) else {
await renderSemaphore.endRender()
return
}
// Shaders ..
renderEncoder.endEncoding()
commandBuffer.addCompletedHandler() { @Sendable (_ commandBuffer)-> Swift.Void in
updateFPS()
}
// commit frame in actor
let success = await renderSemaphore.commitFrame(
timeStamp: frame.timeStamp,
commandBuffer: commandBuffer,
drawable: view.currentDrawable!
)
if !success {
print("Frame dropped due to out-of-order timestamp")
}
await renderSemaphore.endRender()
}
}
actor RenderSemaphore {
private var frameBuffers: [FrameData] = []
private var lastReadTimeStamp: Double = 0.0
private var lastCommittedTimeStamp: Double = 0
private var activeTaskCount = 0
private var activeRenderCount = 0
private let maxTasks = 3
private var textureCache: CVMetalTextureCache?
init() {
}
func initTextureCache(device: MTLDevice) {
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &self.textureCache)
}
func beginRender() -> Bool {
guard activeRenderCount < maxTasks else { return false }
activeRenderCount += 1
return true
}
func endRender() {
if activeRenderCount > 0 {
activeRenderCount -= 1
}
}
func setTextureLoaded(_ loaded: Bool) {
isTextureLoaded = loaded
}
func getSetBuffers(isGet: Bool, surface: IOSurface?, timeStamp: Double?) -> FrameData? {
if isGet {
if !frameBuffers.isEmpty {
let frame = frameBuffers.removeFirst()
if frame.timeStamp > lastReadTimeStamp {
lastReadTimeStamp = frame.timeStamp
print(frame.timeStamp)
return frame
}
}
return nil
} else {
// Set
let frameData = FrameData(
surface: surface!,
timeStamp: timeStamp!
)
// insert to the right position
let insertIndex = binarySearch(for: timeStamp!)
frameBuffers.insert(frameData, at: insertIndex)
return frameData
}
}
private func binarySearch(for timeStamp: Double) -> Int {
var left = 0
var right = frameBuffers.count
while left < right {
let mid = (left + right) / 2
if frameBuffers[mid].timeStamp > timeStamp {
right = mid
} else {
left = mid + 1
}
}
return left
}
// for setRenderDataNormalized
func tryEnterTask() -> Bool {
guard activeTaskCount < maxTasks else { return false }
activeTaskCount += 1
return true
}
func exitTask() {
activeTaskCount -= 1
}
func commitFrame(timeStamp: Double,
commandBuffer: MTLCommandBuffer,
drawable: MTLDrawable) async -> Bool {
guard timeStamp > lastCommittedTimeStamp else {
print("Drop frame at commit: \(timeStamp) <= \(lastCommittedTimeStamp)")
return false
}
commandBuffer.present(drawable)
commandBuffer.commit()
lastCommittedTimeStamp = timeStamp
return true
}
func getRenderData(
device: MTLDevice,
surface: IOSurface,
depthData: [Float]
) -> (MTLTexture, MTLBuffer)? {
let _textureName = "RenderData"
var px: Unmanaged<CVPixelBuffer>?
let status = CVPixelBufferCreateWithIOSurface(kCFAllocatorDefault, surface, nil, &px)
guard status == kCVReturnSuccess, let screenImage = px?.takeRetainedValue() else {
return nil
}
CVMetalTextureCacheFlush(textureCache!, 0)
var texture: CVMetalTexture? = nil
let width = CVPixelBufferGetWidthOfPlane(screenImage, 0)
let height = CVPixelBufferGetHeightOfPlane(screenImage, 0)
let result2 = CVMetalTextureCacheCreateTextureFromImage(
kCFAllocatorDefault,
self.textureCache!,
screenImage,
nil,
MTLPixelFormat.bgra8Unorm,
width,
height,
0, &texture)
guard result2 == kCVReturnSuccess,
let cvTexture = texture,
let mtlTexture = CVMetalTextureGetTexture(cvTexture) else {
return nil
}
mtlTexture.label = _textureName
let depthBuffer = device.makeBuffer(bytes: depthData, length: depthData.count * MemoryLayout<Float>.stride)!
return (mtlTexture, depthBuffer)
}
}
Above is my code. Could someone point out what might be wrong?
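One detail that stands out (offered as a guess rather than a confirmed diagnosis): draw(in:) starts a new unstructured Task per frame, and those tasks are not guaranteed to start or finish in submission order, so frames can reach commitFrame out of order even though the buffer itself is sorted. A sketch of one way to serialize rendering through a single consumer, assuming a Renderer like the one above:
import MetalKit

// Hypothetical driver: all draw requests funnel through one long-lived task,
// so a frame's render finishes before the next one starts.
final class SerialRenderDriver {
    private weak var view: MTKView?
    private var continuation: AsyncStream<Void>.Continuation?

    func start(view: MTKView, renderer: Renderer) {
        self.view = view
        let (stream, continuation) = AsyncStream.makeStream(of: Void.self)
        self.continuation = continuation
        Task { [weak self] in
            for await _ in stream {
                guard let view = self?.view else { continue }
                await renderer.renderAsync(view) // strictly one frame at a time, in order
            }
        }
    }

    // Call this from MTKViewDelegate.draw(in:) instead of spawning a Task per frame.
    func requestFrame() {
        continuation?.yield(())
    }
}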
In our app we use CoreML. But ever since macOS 15.x was released, we started to get a large number of crashes like this:
Incident Identifier: 424041c3-884b-4e50-bb5a-429a83c3e1c8
CrashReporter Key: B914246B-1291-4D44-984D-EDF84B52310E
Hardware Model: Mac14,12
Process: <REMOVED> [1509]
Path: /Applications/<REMOVED>
Identifier: com.<REMOVED>
Version: <REMOVED>
Code Type: arm64
Parent Process: launchd [1]
Date/Time: 2024-11-13T13:23:06.999Z
Launch Time: 2024-11-13T13:22:19Z
OS Version: Mac OS X 15.1.0 (24B83)
Report Version: 104
Exception Type: SIGABRT
Exception Codes: #0 at 0x189042600
Crashed Thread: 36
Thread 36 Crashed:
0 libsystem_kernel.dylib 0x0000000189042600 __pthread_kill + 8
1 libsystem_c.dylib 0x0000000188f87908 abort + 124
2 libsystem_c.dylib 0x0000000188f86c1c __assert_rtn + 280
3 Metal 0x0000000193fdd870 MTLReportFailure.cold.1 + 44
4 Metal 0x0000000193fb9198 MTLReportFailure + 444
5 MetalPerformanceShadersGraph 0x0000000222f78c80 -[MPSGraphExecutable initWithMPSGraphPackageAtURL:compilationDescriptor:] + 296
6 Espresso 0x00000001a290ae3c E5RT::SharedResourceFactory::GetMPSGraphExecutable(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, NSDictionary*) + 932
.
.
.
43 CoreML 0x0000000192d263bc -[MLModelAsset modelWithConfiguration:error:] + 120
44 CoreML 0x0000000192da96d0 +[MLModel modelWithContentsOfURL:configuration:error:] + 176
45 <REMOVED> 0x000000010497b758 -[<REMOVED> <REMOVED>] (<REMOVED>)
No similar crashes on macOS 12-14!
MetalPerformanceShadersGraph.log
Any clue what is causing this?
Thanks! :)
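Not a root-cause fix, but one diagnostic worth trying is loading the model with restricted compute units to see whether the MPSGraphExecutable path shown in the crash log is avoided:
import CoreML

// Hypothetical mitigation/diagnostic: steer Core ML away from the ANE/MPSGraph path.
func loadModel(at compiledModelURL: URL) throws -> MLModel {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = .cpuAndGPU   // or .cpuOnly while narrowing this down
    return try MLModel(contentsOf: compiledModelURL, configuration: configuration)
}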
I want to turn off my ray tracing conditionally. There is is_null_acceleration_structure, but when I don't bind an acceleration structure (or pass nil to setFragmentAccelerationStructure), I get the following API validation error:
-[MTLDebugRenderCommandEncoder validateCommonDrawErrors:]:5782: failed assertion `Draw Errors Validation
Fragment Function(vol_deferred_lighting): missing instanceAccelerationStructure binding at index 6 for accelerationStructure[0].
I can turn off API validation and it works, but it seems like I should be able to use nil for the acceleration structure without triggering a validation error. Seems like a bug, right?
I suppose I can work around this by creating a separate pipeline with the ray-tracing disabled via a function constant instead of using is_null_acceleration_structure.
(Can we get a ray-tracing tag for questions?)
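For completeness, a sketch of the function-constant workaround mentioned above; the constant name is illustrative:
import Metal

// Shader side (MSL): constant bool ray_tracing_enabled [[function_constant(0)]];
// Host side: build a pipeline variant with ray tracing compiled out.
func makeFragmentFunction(library: MTLLibrary, rayTracingEnabled: Bool) throws -> MTLFunction {
    var enabled = rayTracingEnabled
    let constants = MTLFunctionConstantValues()
    constants.setConstantValue(&enabled, type: .bool, withName: "ray_tracing_enabled")
    return try library.makeFunction(name: "vol_deferred_lighting", constantValues: constants)
}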
The Problem
When transitioning between view controllers that each have their own MTKView but share a Metal renderer backend, we run into delegate ownership conflicts. Only one MTKView can successfully render at a time, since setting the delegate on one view requires removing it from the other, leading to paused views during transitions.
For my app, I need to display the same visuals across multiple views and have them all render correctly.
Current Implementation Approach
I've created a container object that manages the MTKView and its relationship with the shared renderer:
class RenderContainer {
let metalView: MTKView
private let renderer: MetalRenderer
func startRendering() {
metalView.delegate = renderer
metalView.isPaused = false
}
func stopRendering() {
metalView.isPaused = true
metalView.delegate = nil
}
}
View controllers manage the rendering lifecycle in their view appearance methods:
override func viewWillAppear(_ animated: Bool) {
super.viewWillAppear(animated)
renderContainer.startRendering()
}
override func viewWillDisappear(_ animated: Bool) {
super.viewWillDisappear(animated)
renderContainer.stopRendering()
}
Observations & Issues
During view controller transitions, one MTKView must stop rendering before the other can start; with the current API design, there is also no guarantee that the old view will stop rendering before the new one starts.
This creates a visual "pop" during animated transitions.
Setting isPaused = true helps prevent unnecessary render calls but doesn't solve the core delegate ownership problem.
The shared renderer maintains its state but can only output to one view at a time.
Questions
What's the recommended approach for handling MTKView delegate ownership during animated transitions?
Are there ways to maintain visual continuity without complex view hierarchies?
Should I consider alternative architectures for sharing the Metal content between views?
Any insights for this scenario would be appreciated.
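One architecture worth considering (a sketch, not an officially recommended pattern): because MTKViewDelegate.draw(in:) receives the view being drawn, a single renderer can remain the delegate of several MTKViews at once, so neither view has to give up the delegate during a transition:
import MetalKit

// Hypothetical shared renderer that can drive any number of MTKViews at once.
final class SharedMetalRenderer: NSObject, MTKViewDelegate {
    private let commandQueue: MTLCommandQueue

    init?(device: MTLDevice) {
        guard let queue = device.makeCommandQueue() else { return nil }
        commandQueue = queue
        super.init()
    }

    func register(_ view: MTKView) {
        view.delegate = self          // every registered view shares this delegate
        view.isPaused = false
    }

    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

    func draw(in view: MTKView) {
        guard let descriptor = view.currentRenderPassDescriptor,
              let drawable = view.currentDrawable,
              let commandBuffer = commandQueue.makeCommandBuffer(),
              let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor) else { return }
        // Encode the shared scene here; per-view state (size, transforms) comes from `view`.
        encoder.endEncoding()
        commandBuffer.present(drawable)
        commandBuffer.commit()
    }
}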
I am using SCNTechnique in combination with ARSCNView.
The technique is doing some minor post-processing. I have written several filter variants for this post-processing, but I'm facing an issue: with one of the filters/fragment shaders, SCNTechnique discards my output and just presents the plain camera feed on screen instead. This is clearly visible in the Metal pipeline, using the GPU frame debugger.
Let me stress that my setup works for 90% of my filters, but not this one and I want to know why.
iOS 18.1, iPhone 13 Mini. Xcode 16.1.
Encoder 0 & 1 are injected by the system.
Render encoder 2 & 3 correspond to my SCNTechnique's render passes: one to manipulate pixel data (darken it in this case) and another to BLIT it back to the main texture. I know the separate buffer is not strictly for this particular operation, but it shouldn't matter.
Note that the issue occurs in encoder 4 (not mine but ARKit's).
In Render Encoder 4, scn_postprocess_AR_fragment handles my texture (#0, ending in f980) and another from the camera feed (Texture 2). I know this pass is typically used for grain because that's what it used to do before I disabled grain on ARSCNView (and the buffer still contains grain parameters).
I have other post-processing filters that work just fine. By what magic is ARKit determining to use Texture 2 instead of my Texture 0?
Sure, I could keep digging into the minute differences between my shaders to find out which LoC affects how some ARKit shader down the line operates, but it's awfully opaque so far.
I plan to create a simple motion graphics application for macOS that animates text and basic shapes and handles audio. I'll use SwiftUI for the UI.
What are the commonly used technologies for rendering animated graphics? Core Animation is suitable for UI animations, but not well suited to exporting animations or controlling them precisely.
Basic requirements:
Timeline user interface
Animation of text and basic shapes
Viewer in SwiftUI GUI with transport control (play, pause, scrub, …)
Export to video file
Is Metal or Core Graphics typically used directly? I want to keep it as simple as possible.
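For the export requirement, one common pattern is to draw each frame offscreen and append it to a movie with AVAssetWriter. A compact sketch, assuming frames are drawn with Core Graphics (a Metal renderer could fill the same pixel buffers instead):
import AVFoundation
import CoreGraphics
import Foundation

// Minimal sketch: render `frameCount` frames at `size` with a caller-supplied
// Core Graphics draw closure and write them to an H.264 movie.
func exportMovie(to url: URL, size: CGSize, fps: Int32, frameCount: Int,
                 draw: (CGContext, Int) -> Void) throws {
    let writer = try AVAssetWriter(outputURL: url, fileType: .mp4)
    let settings: [String: Any] = [
        AVVideoCodecKey: AVVideoCodecType.h264,
        AVVideoWidthKey: Int(size.width),
        AVVideoHeightKey: Int(size.height)
    ]
    let input = AVAssetWriterInput(mediaType: .video, outputSettings: settings)
    let adaptor = AVAssetWriterInputPixelBufferAdaptor(
        assetWriterInput: input,
        sourcePixelBufferAttributes: [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA])
    writer.add(input)
    guard writer.startWriting() else { throw writer.error ?? CocoaError(.fileWriteUnknown) }
    writer.startSession(atSourceTime: .zero)

    for frame in 0..<frameCount {
        while !input.isReadyForMoreMediaData { Thread.sleep(forTimeInterval: 0.001) } // simplistic backpressure
        var pixelBuffer: CVPixelBuffer?
        guard CVPixelBufferCreate(kCFAllocatorDefault, Int(size.width), Int(size.height),
                                  kCVPixelFormatType_32BGRA, nil, &pixelBuffer) == kCVReturnSuccess,
              let buffer = pixelBuffer else { continue }
        CVPixelBufferLockBaseAddress(buffer, [])
        if let context = CGContext(data: CVPixelBufferGetBaseAddress(buffer),
                                   width: Int(size.width), height: Int(size.height),
                                   bitsPerComponent: 8,
                                   bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
                                   space: CGColorSpaceCreateDeviceRGB(),
                                   bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue
                                       | CGBitmapInfo.byteOrder32Little.rawValue) {
            draw(context, frame) // the app's timeline model decides what each frame looks like
        }
        CVPixelBufferUnlockBaseAddress(buffer, [])
        _ = adaptor.append(buffer, withPresentationTime: CMTime(value: CMTimeValue(frame), timescale: fps))
    }
    input.markAsFinished()
    writer.finishWriting { }
}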
I would like to take YCbCr CVPixelBuffers from AVCaptureVideoDataOutput, apply some processing in RGB space, render to an MTKView, and pass to AVAssetWriter for recording.
Right now, I'm doing this all manually – deswing the incoming data if necessary, choose the right matrix to convert to RGB, apply processing, etc. I also have to convert back to YCbCr before feeding the frames to AVAssetWriter because encoding performs much better if I do. Is there any efficient, built-in way to achieve the same?
I can't use AVCaptureVideoPreviewLayer, since I need to do some further processing before display. I can't use AVCaptureVideoDataOutput's videoSettings to get automatic BGRA conversion because that would lose bit depth for 10 bit video formats (and isn't available on all formats anyway).
I see these Accelerate functions, but they seemingly don't use the GPU, nor do they support all the formats and bit depths I'd need.
I found reference to some undocumented MTLPixelFormats that seem to do exactly what I want, but I don't want to rely on something like this unless it's explicitly endorsed. This would also incur an RGB/YCbCr conversion on every texture read and write, right?
Is there anything I'm missing here?
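One built-in path worth evaluating (a sketch, with no claim about matching the performance of a hand-rolled conversion): Core Image picks the correct YCbCr-to-RGB matrix and range from the pixel buffer's attachments, including 10-bit formats, and renders on the GPU through a Metal-backed CIContext:
import CoreImage
import Metal

// Hypothetical processing step: wrap the camera pixel buffer, apply RGB-space
// processing as CIFilters, and render into a Metal texture for display or encoding.
func process(_ pixelBuffer: CVPixelBuffer,
             context: CIContext,
             into texture: MTLTexture,
             commandBuffer: MTLCommandBuffer) {
    // CIImage(cvPixelBuffer:) takes the transfer matrix from the buffer's attachments.
    var image = CIImage(cvPixelBuffer: pixelBuffer)
    image = image.applyingFilter("CIColorControls", parameters: [kCIInputSaturationKey: 1.2])
    context.render(image,
                   to: texture,
                   commandBuffer: commandBuffer,
                   bounds: image.extent,
                   colorSpace: CGColorSpaceCreateDeviceRGB())
}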
I am building a MacOS desktop app (https://anukari.com) that is using Metal compute to do real-time audio/DSP processing, as I have a problem that is highly parallelizable and too computationally expensive for the CPU.
However, it seems that, given the way I am using the GPU, even when my app is fully compute-limited the OS never increases the power/performance state. Because this is a real-time audio synthesis application, it's a huge problem to not be able to take advantage of the full clock speeds that the GPU is capable of, because the app can't keep up with real-time.
I discovered this issue while profiling the app using Instrument's Metal tracing (and Game tracing) modes. In the profiling configuration under "Metal Application" there is a drop-down to select the "Performance State." If I run the application under Instruments with Performance State set to Maximum, it runs amazingly well, and all my problems go away.
For comparison, when I run the app on its own, outside of Instruments, the expensive GPU computation it's doing takes around 2x as long to complete, meaning that the app performs half as well.
I've done a ton of work to micro-optimize my Metal compute code, based on every scrap of information from the WWDC videos, etc. A problem I'm running into is that I think that the more efficient I make my code, the less it signals to the OS that I want high GPU clock speeds!
I think part of why the OS is confused is that in most use cases, my computation can be done using only a small number of Metal threadgroups. I'm guessing that the OS heuristics see that only a small fraction of the GPU is saturated and fail to scale up the power/clock state.
I'm not sure what to do here; I'm in a bit of a bind. One possibility is that I intentionally schedule busy work -- spin threadgroups just to waste energy and signal to the OS that I need higher clock speeds. This is obviously a really bad idea, but it might work.
Is there any other (better) way for my app to signal to the OS that it is doing real-time latency-sensitive computation on the GPU and needs the clock speeds to be scaled up?
Note that game mode is not really an option, as my app also runs as an AU plugin inside hosts like Garageband, so it can't be made fullscreen, etc.
I have to guess which command queue is the one I want to debug in Xcode here:
I have the label set, but it doesn't seem to affect this part of Xcode.
I have multiple CAMetalLayers that I render content to and noticed that the graphics overview HUD does not function properly when I have more than one CAMetalLayer. The reported values are very strange; for example, FPS may report 999 or some large negative value. Is the HUD simply not designed to work with multiple CAMetalLayers or MTKViews? When I disable all but one of my CAMetalLayers, the HUD works as expected.