drewcrawford’s Profile | Apple Developer Forums

How does the performance of MTLIOCommandQueue compare to GCD?

I am implementing asset pipelines for a Metal app and am deciding between the new Metal IO support and more traditional approaches like dispatch_io. Does MTLIOCommandQueue use GCD as its backend, or how do the approaches compare?

Posted by drewcrawford.

`assert` in metal?

Is there an equivalent of assert in Metal? I'm looking for a way to abnormally halt a shader and indicate that it ought to be debugged. assert itself is defined as ((void) 0), so that won't work. __builtin_trap() will halt execution, but there's no indication that the shader trapped, other than the absence of the side effects it would have produced on completion. I'm wondering if there is some "other" way to trap, like emitting an invalid instruction into the shader, or doing something the Metal API validator will flag as illegal.

Metal

Creating a custom stack type with Child : Protocol

I'm trying to create a custom Stack type for my application, exploiting the fact that its children conform to a custom protocol. The desired end goal is

    DStack {
        MyView()
        MyView2()
        //etc.
    }

where MyView, MyView2, etc. conform to some protocol, and in the implementation of DStack we use the protocol in some way on the child elements. For example, we apply a similar configuration to the elements, or interrogate the elements to configure the stack itself.

For my first attempt:

    import SwiftUI
    import PlaygroundSupport

    protocol Foo: View {
        func foo()
    }

    struct MyView1: Foo {
        var body: some View {
            Text("Hello world")
        }
        func foo() {
            preconditionFailure("implement this")
        }
    }

    struct DStack<Content>: View where Content: Foo {
        let builder: () -> Content
        init(@ViewBuilder content: @escaping () -> Content) {
            builder = content
        }
        var body: some View {
            ZStack(content: builder)
            //additional customization here exploiting conformance to Foo
        }
    }

    struct Demo: View {
        var body: some View {
            DStack {
                MyView1()
                MyView1()
            }
        }
    }

    PlaygroundPage.current.setLiveView(Demo())

This produces

    Generic struct 'DStack' requires that 'TupleView<(MyView1, MyView1)>' conform to 'Foo'

Ok, fair enough. The obvious solution here is to apply a conditional conformance TupleView: Foo, which very roughly would be

    //conform TupleView.T
    extension (T,V): Foo where T: Foo, V: Foo { ...}
    //conditionally conform TupleView
    extension TupleView: Foo where T: Foo { ... }

However, since tuples are non-nominal in Swift, we can't do this.

Maybe the problem here is TupleView, and I need to traffic in my own view-group type, perhaps one that models its storage as an array or something. This would require my own function builder. Here's a sketch of that...

    @_functionBuilder
    struct FooBuilder {
        static func buildBlock(_ children: Foo...) -> Foo {
            //...
        }
    }

This produces

    Protocol 'Foo' can only be used as a generic constraint because it has Self or associated type requirements

So we need existentials for both the arguments and the return type. The arguments are "straightforward": just create a type-erased wrapper. The return value... cannot be erased, because eventually we need to create a SwiftUI ZStack, and it wants to know how many arguments there are and so on.

I'm stumped. What's the right way to do this?
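The "type-erased wrapper" half of this can be sketched without SwiftUI. The block below is only an illustration under stated assumptions: `Foo` here is a stand-in protocol with an associated type (mimicking `View.Body`), `foo()` returns a `String` so the result can be inspected, and `AnyFoo` and `collectChildren` are hypothetical names that do not appear in the post. It uses the `@resultBuilder` spelling from newer Swift versions rather than `@_functionBuilder`. It does not address the harder half of the question, which is turning the collected children back into a `ZStack`.

```swift
// Stand-in for `protocol Foo: View`. The associated type mimics View.Body,
// which is what prevents using Foo directly as an existential.
protocol Foo {
    associatedtype Body
    var body: Body { get }
    func foo() -> String   // hypothetical: returns a value so it can be inspected
}

struct MyView1: Foo {
    var body: String { "Hello world" }
    func foo() -> String { "MyView1" }
}

struct MyView2: Foo {
    var body: Int { 42 }
    func foo() -> String { "MyView2" }
}

// Hypothetical type-erased wrapper: captures only the Foo requirement we need.
struct AnyFoo {
    private let fooImpl: () -> String
    init<F: Foo>(_ wrapped: F) { fooImpl = { wrapped.foo() } }
    func foo() -> String { fooImpl() }
}

@resultBuilder
enum FooBuilder {
    // Erase each child as it appears in the builder body.
    static func buildExpression<F: Foo>(_ child: F) -> AnyFoo { AnyFoo(child) }
    // Collect the erased children into an array instead of a tuple.
    static func buildBlock(_ children: AnyFoo...) -> [AnyFoo] { children }
}

// Hypothetical entry point that plays the role of DStack's initializer.
func collectChildren(@FooBuilder _ content: () -> [AnyFoo]) -> [AnyFoo] {
    content()
}

let children = collectChildren {
    MyView1()
    MyView2()
}
print(children.map { $0.foo() })
```

Because the children end up in an array, the stack can interrogate each one through `foo()`, at the cost of losing the static child types that `TupleView` preserves.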

SwiftUI

Float16

Already filed FB7747586 for this, but I’d be interested to know if there’s a good way to use simd functions on half-precision (that is, __fp16 or Float16) types. I have a codebase that I maintain in both C and Metal, and although half is supported on the Metal side, and although Swift now has a Float16 type, the simd framework functions don’t know anything about half-precision and so code that works on one can’t be easily ported.

simd

Synchronizing MTLBuffer in a reader/writer pattern.

I have 2 MTLCommandQueues on 1 device. Each queue involves several passes, and they run at different frequencies. One queue ("the writer") runs more slowly and in some stages writes data to a buffer; the other queue ("the renderer") has a pass that renders this data.

It is ok (and expected, due to the frequency mismatch) that the renderer will render several frames with the same data in the buffer. What isn't ok is rendering a buffer that is in the process of being written.

In pseudocode:

    let encoder = writerQueue.makeCommandEncoder()!
    encodePass1(encoder)
    encodePass2(encoder)
    //do some operation to lock the MTLBuffer
    encodePassPartiallyWritingToBuffer(encoder)
    encodePassCompletingWriteToBuffer(encoder)
    //do some operation to unlock the MTLBuffer
    encodePass5(encoder)
    encoder.endEncoding()
    commandBuffer.commit()

And then on the render side, similarly:

    let encoder = rendererQueue.makeCommandEncoder()!
    draw1(encoder)
    draw2(encoder)
    //do some operation to lock the MTLBuffer
    encodePassRenderingBuffer(encoder)
    //do some operation to unlock the MTLBuffer
    draw4(encoder)
    encoder.endEncoding()
    commandBuffer.commit()

Using MTLFence to synchronize would require a single command queue, and I think a wait/signal pattern would force these into running at the same frequency with 1 read : 1 write, which isn't what I want.

I could use encodeSignalEvent(event, 1) to indicate we are done writing the buffer, but I'm not sure how to use encodeWaitForEvent(event, ?) to limit execution to either the writer or the reader. There doesn't seem to be a "semaphore wait" style operator.

What's the "right" way to synchronize this?

Metal

viewport-based culling for metal shaders

I have a geometry culling question similar to this one: https://developer.apple.com/forums/thread/133745, but my situation is more about culling geometry outside the viewport.

In my case I have a custom geometry; let's say it's a 2D polyline with fixed points for simplicity. (In reality it's a bit more complex than this: the vertices require some computation and sometimes there are faces, but they rarely overlap, so depth-based culling is of limited help.)

    typedef struct PolyLine {
        simd_float2 start;
        simd_float2 a;
        simd_float2 b;
        simd_float2 end;
    } PolyLine;

I have these in a device PolyLine *buffer and encode a draw call with

    _commandEncoder.drawPrimitives(type: .lineStrip, vertexStart: 0, vertexCount: 4, instanceCount: 50000)

If you're zoomed in very closely on this scene, most of the geometry may be well outside the viewing area. In some cases part of a geometry could be visible (such as start->a) but not the rest (a->b and b->end). For geometries with more vertices, it is more likely that only a very small fraction of the full geometry is visible.

Often, I don't cull this at all. Other times, if the shader is more expensive, I will do some kind of bailout check on the whole instance as a prelude in the vertex shader. If the instance is invisible, I choose a vertex output position with some constant value outside the viewport.

What is the best practice for avoiding unnecessary work here? Should I be indirectly encoding draws for each visible instance, or does that introduce more overhead? Is there a best-practice way to tell Metal that a vertex (or an instance) can be discarded, or is picking some faraway position ok?
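For illustration, the per-instance bailout check described above could look something like the following on the CPU side. This is a minimal sketch, not the check from the post: the Swift `PolyLine` mirror, the `Viewport` type, and `mayBeVisible` are hypothetical names, and the test is a conservative bounding-box overlap, so it can report an instance as possibly visible when no segment actually crosses the viewport.

```swift
// Hypothetical CPU-side mirror of the PolyLine struct.
struct PolyLine {
    var start: SIMD2<Float>
    var a: SIMD2<Float>
    var b: SIMD2<Float>
    var end: SIMD2<Float>
}

// Hypothetical axis-aligned viewport, in the same coordinate space as the points.
struct Viewport {
    var min: SIMD2<Float>
    var max: SIMD2<Float>
}

/// Conservative test: returns false only when the polyline's bounding box
/// lies entirely outside the viewport.
func mayBeVisible(_ line: PolyLine, in viewport: Viewport) -> Bool {
    let points = [line.start, line.a, line.b, line.end]
    var lo = points[0]
    var hi = points[0]
    for p in points.dropFirst() {
        lo.x = min(lo.x, p.x); lo.y = min(lo.y, p.y)
        hi.x = max(hi.x, p.x); hi.y = max(hi.y, p.y)
    }
    // Two axis-aligned boxes overlap when they overlap on both axes.
    return lo.x <= viewport.max.x && hi.x >= viewport.min.x
        && lo.y <= viewport.max.y && hi.y >= viewport.min.y
}

let viewport = Viewport(min: [0, 0], max: [1, 1])

// Entirely to the right of the viewport.
let outside = PolyLine(start: [2, 0], a: [3, 0.5], b: [4, 0.2], end: [5, 1])
// Starts outside, but the start->a segment enters the viewport.
let partial = PolyLine(start: [-1, 0.5], a: [0.5, 0.5], b: [2, 2], end: [3, 3])

let instances = [outside, partial]
let visible = instances.filter { mayBeVisible($0, in: viewport) }
print(visible.count)
```

The same comparison could run as the prelude in the vertex shader, or a list like `visible` could feed indirectly encoded draws; which is cheaper is exactly the open question in the post.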

Metal

Buffer preloading failed

Someone in labs suggested I post here. I have this shader, which is derived from an old Apple sample project: https://developer.apple.com/documentation/metal/basic_tasks_and_concepts/performing_calculations_on_a_gpu. I can post my updated project if helpful; it's also attached to FB7741493.

    kernel void add_arrays(constant const float* inA,
                           constant const float* inB,
                           device float* result,
                           uint index [[thread_position_in_grid]])
    {
        //work around "prevented device address mode store/load"
        int index_signed = index;
        result[index_signed] = inA[index_signed] + inB[index_signed];
    }

Xcode gives me the following remark:

    Buffer Preloading Failed
    Make sure your data size is a multiple of 4 bytes and aligned to 4 bytes and try using a simple access pattern. For constant buffers, try using a fixed buffer size.
    inA could not be promoted
    inB could not be promoted
    result could not be promoted

What is this remark trying to warn me about? Floats are 4 bytes and aligned to 4 bytes. This is a very simple access pattern. Is this the best practice for laying out data for this kind of computation, or how could it be improved?

I also tried using texture buffers (which don't produce this remark, but are not clearly faster and are often slower), and providing the input data in a [[stage_in]] (which shows a similar remark).

Metal

