I'm trying to obtain the camera intrinsic matrix for each video frame of an AVCaptureSession (the same intrinsic matrix that ARKit provides). However, the isCameraIntrinsicMatrixDeliverySupported property of AVCaptureConnection is false in my use case.

The documentation of the property says: "This property's value is true only if both the connection's input device format and output class support delivery of camera intrinsics."

How do I know which device formats support delivery of the intrinsic matrix? What do I need to do to be able to enable intrinsic matrix delivery?

Simple code to illustrate my problem:

```swift
import UIKit
import AVFoundation

class ViewController: UIViewController {
    var sess: AVCaptureSession!
    var sessOut: AVCaptureVideoDataOutput!
    var prevLayer: AVCaptureVideoPreviewLayer!

    override func viewDidLoad() {
        super.viewDidLoad()
        sess = AVCaptureSession()
        let device = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .back)
        let input = try! AVCaptureDeviceInput(device: device!)
        sess.addInput(input)
        sessOut = AVCaptureVideoDataOutput()
        sess.addOutput(sessOut)
        sessOut.connections.first?.videoOrientation = .landscapeRight
        sessOut.connections.first?.preferredVideoStabilizationMode = .cinematic
        print(sessOut.connections.first?.isCameraIntrinsicMatrixDeliverySupported) // <-- false - why?
        prevLayer = AVCaptureVideoPreviewLayer(session: sess)
        prevLayer.frame = self.view.frame
        prevLayer.videoGravity = .resizeAspectFill
        prevLayer.connection?.videoOrientation = .landscapeRight
        self.view.layer.addSublayer(prevLayer)
        sess.startRunning()
    }
}
```
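For reference, here is a minimal sketch of how intrinsic delivery is typically opted into and read back once a connection does report support. It uses the iOS 11+ AVFoundation/CoreMedia API names (isCameraIntrinsicMatrixDeliveryEnabled, kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix); treat it as a sketch, not an answer to why the flag is false above:

```swift
import AVFoundation
import simd

// Delivery is opted into per connection; it must be enabled while the
// connection actually reports support (i.e. after session configuration).
func enableIntrinsicDelivery(on output: AVCaptureVideoDataOutput) {
    if let connection = output.connection(with: .video),
       connection.isCameraIntrinsicMatrixDeliverySupported {
        connection.isCameraIntrinsicMatrixDeliveryEnabled = true
    }
}

// In the AVCaptureVideoDataOutputSampleBufferDelegate callback, the matrix
// arrives as a CFData attachment wrapping a matrix_float3x3.
func intrinsicMatrix(from sampleBuffer: CMSampleBuffer) -> matrix_float3x3? {
    guard let data = CMGetAttachment(sampleBuffer,
                                     key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix,
                                     attachmentModeOut: nil) as? Data else { return nil }
    return data.withUnsafeBytes { $0.load(as: matrix_float3x3.self) }
}
```

Note that the attachment is only present on sample buffers delivered after enabling the flag, so the enable call needs to happen before startRunning().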
Situation (simplified):

There are 3 Metal buffers (created at app initialization via MTLDevice, not via MTLHeap):

- Buffer A: contains a total of n float3 elements (which may represent particle positions)
- Buffer B: a double-buffered copy of A
- Buffer C: contains a total of n float elements (the i-th element of C corresponds to the i-th element of A (and also B))

I also have 3 kernels (Metal compute functions) which manipulate the 3 buffers:

- Kernel preprocess(A, B): its parameters are A and B. It reads (old) values from A and writes updated values into B. [1 thread = 1 element]
- Kernel process(B, C): this is a very computation-expensive/time-consuming kernel. (Just to be clear: each element of buffer B represents a particle position.) For each element/particle of B, the kernel finds the particle's nearest neighbors (i.e. their positions). After the neighbors are found, it reduces their positions into one float value, which is written to buffer C. [1 thread = 1 element (i.e. particle)]
- Kernel postprocess(A, B, C): it reads elements from B and C, combines them in a simple way and writes the result back to A (something like A[i] = B[i]*C[i]). [1 thread = 1 element]

Problem:

The kernel process takes a serious amount of time to complete, because for each particle it iterates over a range of B's values. Particles that are near each other (i.e. adjacent threads) iterate over very similar ranges of B, so there is heavy memory reuse of B between adjacent threads. The kernel is declared similarly to this:

```metal
kernel void process(..., device float3* B, ...) {...}
```

Question:

Since the kernel process repeatedly loads the same regions of B (even though I use threadgroup memory), and B is read-only in this kernel, I thought it would be a good idea for performance to cache buffer B, i.e. to transfer it into a highly cached portion of memory optimized for repeated reads. AFAIK such a portion of memory is the constant address space. Is it possible to do the following? And if it is, then how?

1. Call the kernel preprocess(A, B). [This updates the contents of buffer B, so buffer B cannot be in the constant address space.]
2. Transfer the buffer B (containing the new, updated values from the previous step) into the constant address space. <——— How do I do this?
3. Call the kernel process(B, C). [Read values repeatedly from buffer B, which is now in the constant address space, improving performance.]
4. Call the kernel postprocess(A, B, C).

Extra question:

When I create a buffer at app initialization, I don't specify which address space it belongs to, so when is it decided whether a buffer belongs to the device or the constant address space? Is it per command buffer, per encoder, or per individual pipeline state of one encoder? Or does a buffer's position among the memory address spaces change throughout the app's lifetime?

Example: say I have 1 buffer and 2 kernels:

```swift
// CPU side:
var data = device.newBuffer(...)
commandEncoder.setComputePipelineState(fooPipeline)
commandEncoder.setBuffer(data, offset: 0, at: 0)
commandEncoder.dispatchThreadgroups(...)
commandEncoder.setComputePipelineState(barPipeline)
commandEncoder.setBuffer(data, offset: 0, at: 0)
commandEncoder.dispatchThreadgroups(...)
commandEncoder.endEncoding()
```

```metal
// GPU side
kernel void foo(device float* data) {...}   // foo writes to data
kernel void bar(constant float& data) {...} // bar only reads from data
```

Where is the buffer data stored: in the device address space or the (cached) constant address space? What would happen if I swapped the order in which foo and bar are dispatched?
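For context on step 2 above: in the Metal Shading Language the address space is part of each kernel parameter's declaration, not a property of the MTLBuffer itself, so the same buffer can be bound unchanged and simply read through a constant-qualified parameter. A sketch of what process could look like under that approach (the body is a placeholder; the real neighbor search from the post is omitted):

```metal
#include <metal_stdlib>
using namespace metal;

// Sketch: B is declared in the constant address space for this kernel only.
// The same MTLBuffer written by preprocess() can be bound at index 1 here
// with no separate "transfer" step on the CPU side.
kernel void process(device   float*  C  [[buffer(0)]],
                    constant float3* B  [[buffer(1)]],
                    uint             id [[thread_position_in_grid]])
{
    float3 p = B[id];         // read-only access through the constant space
    C[id] = p.x + p.y + p.z;  // placeholder for the real reduction
}
```

Whether constant-space reads are actually faster than device-space reads for a given access pattern is hardware-dependent, so this is something to profile rather than assume.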