Below, the sampleBufferProcessor closure is where the Vision body pose detection occurs.
/// Transfers the sample data from the AVAssetReaderOutput to the AVAssetWriterInput,
/// processing each buffer via a SampleBufferProcessor.
///
/// - Parameters:
/// - readerOutput: The source sample data.
/// - writerInput: The destination for the sample data.
/// - queue: The dispatch queue on which the writer input invokes the closure.
/// - completionHandler: The completion handler to run when the transfer finishes.
/// - Tag: transferSamplesAsynchronously
private func transferSamplesAsynchronously(from readerOutput: AVAssetReaderOutput,
to writerInput: AVAssetWriterInput,
onQueue queue: DispatchQueue,
sampleBufferProcessor: SampleBufferProcessor,
completionHandler: @escaping () -> Void) {
/*
The writerInput continuously invokes this closure until finished or
cancelled. It throws an NSInternalInconsistencyException if called more
than once for the same writer.
*/
writerInput.requestMediaDataWhenReady(on: queue) {
var isDone = false
/*
While the writerInput accepts more data, process the sampleBuffer
and then transfer the processed sample to the writerInput.
*/
while writerInput.isReadyForMoreMediaData {
if self.isCancelled {
isDone = true
break
}
// Get the next sample from the asset reader output.
guard let sampleBuffer = readerOutput.copyNextSampleBuffer() else {
// The asset reader output has no more samples to vend.
isDone = true
break
}
// Process the sample, if requested.
do {
try sampleBufferProcessor(sampleBuffer)
} catch {
/*
The `readingAndWritingDidFinish()` function picks up this
error.
*/
self.sampleTransferError = error
isDone = true
}
// Append the sample to the asset writer input.
guard writerInput.append(sampleBuffer) else {
/*
The writer could not append the sample buffer.
The `readingAndWritingDidFinish()` function handles any
error information from the asset writer.
*/
isDone = true
break
}
}
if isDone {
/*
Calling `markAsFinished()` on the asset writer input does the
following:
1. Unblocks any other inputs needing more samples.
2. Cancels further invocations of this "request media data"
callback block.
*/
writerInput.markAsFinished()
/*
Tell the caller the reader output and writer input finished
transferring samples.
*/
completionHandler()
}
}
}
The processor closure runs body pose detection on every sample buffer; the resulting VNHumanBodyPoseObservation results are then fed into a custom Core ML action classifier in the VNDetectHumanBodyPoseRequest completion handler.
private func videoProcessorForActivityClassification() -> SampleBufferProcessor {
let videoProcessor: SampleBufferProcessor = { sampleBuffer in
do {
let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer)
try requestHandler.perform([self.detectHumanBodyPoseRequest])
} catch {
print("Unable to perform the request: \(error.localizedDescription).")
}
}
return videoProcessor
}
How could I improve the performance of this pipeline? Testing with an hour-long 4K video at 60 FPS, processing took several hours running as a Mac Catalyst app on an M1 Max.
I just got an app feature working where the user imports a video file, each frame is fed to a custom action classifier, and then only the frames classified with a certain action are exported.
However, I'm finding that processing a one-hour 4K video at 60 FPS takes an unreasonably long time: it's been running for 7 hours now on a MacBook Pro with an M1 Max, as a Mac Catalyst app. Are there any techniques or general guidance that would help improve performance? As much as possible I'd like to preserve the input video quality, especially the frame rate. A one-hour video is expected, since it's of a tennis session (anywhere from 10 minutes to a couple of hours). I made the body pose action classifier with Create ML.
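For context, one idea I've been sketching (purely tentative: it assumes the action classifier still receives enough poses per prediction window when frames are skipped, and poseSampleStride is a name I made up) is to stride the Vision work inside the processor closure instead of running body pose detection on every one of the 60 FPS frames:
import Vision

/// Hypothetical variant of videoProcessorForActivityClassification() that only runs
/// body pose detection on every `poseSampleStride`-th frame. Assumes
/// SampleBufferProcessor is a (CMSampleBuffer) throws -> Void closure, as it's used above.
private func stridedVideoProcessor(poseSampleStride: Int = 2) -> SampleBufferProcessor {
    var frameIndex = 0
    return { sampleBuffer in
        defer { frameIndex += 1 }
        // Skip Vision entirely on frames outside the stride; they still get appended to the writer.
        guard frameIndex % poseSampleStride == 0 else { return }
        let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer)
        try requestHandler.perform([self.detectHumanBodyPoseRequest])
    }
}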
After creating a custom action classifier in Create ML, previewing it (see the bottom of the page) with an input video shows the label associated with a segment of the video. What would be a good way to store the duration for a given label, say, each CMTimeRange of a segment of video frames classified as containing "Jumping Jacks"?
I previously found that storing time ranges of trajectory results was convenient, since each VNTrajectoryObservation vended by Apple had an associated CMTimeRange.
However, using my custom action classifier instead, each VNObservation result's CMTimeRange has a duration value that's always 0.
func completionHandler(request: VNRequest, error: Error?) {
guard let results = request.results as? [VNHumanBodyPoseObservation] else {
return
}
if let result = results.first {
storeObservation(result)
}
do {
for result in results where try self.getLastTennisActionType(from: [result]) == .playing {
var fileRelativeTimeRange = result.timeRange
fileRelativeTimeRange.start = fileRelativeTimeRange.start - self.assetWriterStartTime
self.timeRangesOfInterest[Int(fileRelativeTimeRange.start.seconds)] = fileRelativeTimeRange
}
} catch {
print("Unable to perform the request: \(error.localizedDescription).")
}
}
In this case I'm interested in frames with the label "Playing", and I can classify them successfully, but I'm not sure where to go from here to track the duration of video segments made of consecutive frames with that label.
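The direction I've been sketching (not something I've validated) is to accumulate the per-frame timestamps myself and close a segment whenever a gap appears; LabelSegmentAccumulator and the frameGapTolerance value below are made up for illustration:
import CoreMedia

/// Hypothetical accumulator that merges per-frame times carrying the "Playing" label
/// into contiguous CMTimeRanges.
struct LabelSegmentAccumulator {
    var segments: [CMTimeRange] = []
    var currentSegment: CMTimeRange?
    /// Maximum gap between consecutive labeled frames before a new segment starts (made-up value).
    let frameGapTolerance = CMTime(value: 1, timescale: 30)

    /// Extends the open segment if this frame is contiguous with it; otherwise closes it and opens a new one.
    mutating func addLabeledFrame(at time: CMTime, duration: CMTime) {
        if let segment = currentSegment, time - segment.end <= frameGapTolerance {
            currentSegment = CMTimeRange(start: segment.start, end: time + duration)
        } else {
            if let segment = currentSegment { segments.append(segment) }
            currentSegment = CMTimeRange(start: time, duration: duration)
        }
    }

    /// Call once after the last frame so the final open segment is recorded.
    mutating func finish() {
        if let segment = currentSegment { segments.append(segment) }
        currentSegment = nil
    }
}
I'd feed it each result.timeRange.start (plus the frame duration) from the completion handler whenever a frame classifies as "Playing", then call finish() once reading ends and use segments in place of timeRangesOfInterest.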
Modifying guidance given in an answer on AVFoundation + Vision trajectory detection, I'm instead saving time ranges of frames that have a specific ML label from my custom action classifier:
private lazy var detectHumanBodyPoseRequest: VNDetectHumanBodyPoseRequest = {
let detectHumanBodyPoseRequest = VNDetectHumanBodyPoseRequest(completionHandler: completionHandler)
return detectHumanBodyPoseRequest
}()
var timeRangesOfInterest: [Int : CMTimeRange] = [:]
private func readingAndWritingDidFinish(assetReaderWriter: AVAssetReaderWriter,
completionHandler: @escaping FinishHandler) {
if isCancelled {
completionHandler(.success(.cancelled))
return
}
// Handle any error during processing of the video.
guard sampleTransferError == nil else {
assetReaderWriter.cancel()
completionHandler(.failure(sampleTransferError!))
return
}
// Evaluate the result reading the samples.
let result = assetReaderWriter.readingCompleted()
if case .failure = result {
completionHandler(result)
return
}
/*
Finish writing, and asynchronously evaluate the results from writing
the samples.
*/
assetReaderWriter.writingCompleted { result in
self.exportVideoTimeRanges(timeRanges: self.timeRangesOfInterest.map { $0.value }) { result in
completionHandler(result)
}
}
}
func exportVideoTimeRanges(timeRanges: [CMTimeRange], completion: @escaping (Result<OperationStatus, Error>) -> Void) {
let inputVideoTrack = self.asset.tracks(withMediaType: .video).first!
let composition = AVMutableComposition()
let compositionTrack = composition.addMutableTrack(withMediaType: .video, preferredTrackID: kCMPersistentTrackID_Invalid)!
var insertionPoint: CMTime = .zero
for timeRange in timeRanges {
try! compositionTrack.insertTimeRange(timeRange, of: inputVideoTrack, at: insertionPoint)
insertionPoint = insertionPoint + timeRange.duration
}
let exportSession = AVAssetExportSession(asset: composition, presetName: AVAssetExportPresetHighestQuality)!
try? FileManager.default.removeItem(at: self.outputURL)
exportSession.outputURL = self.outputURL
exportSession.outputFileType = .mov
exportSession.exportAsynchronously {
var result: Result<OperationStatus, Error>
switch exportSession.status {
case .completed:
result = .success(.completed)
case .cancelled:
result = .success(.cancelled)
case .failed:
// The `error` property is non-nil in the `.failed` status.
result = .failure(exportSession.error!)
default:
fatalError("Unexpected terminal export session status: \(exportSession.status).")
}
print("export finished: \(exportSession.status.rawValue) - \(exportSession.error)")
completion(result)
}
}
This worked fine with results vended from Apple's trajectory detection. However, using my custom action classifier TennisActionClassifier (a Core ML model exported from Create ML), I get the console error getSubtractiveDecodeDuration signalled err=-16364 (kMediaSampleTimingGeneratorError_InvalidTimeStamp) (Decode timestamp is earlier than previous sample's decode timestamp.) at MediaSampleTimingGenerator.c:180. Why might this be?
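One thing I plan to try (a small, unverified sketch) is ordering and sanity-checking the stored ranges before the insertTimeRange loop, since the dictionary's values come back unordered and, unlike the trajectory results, these per-frame ranges can be zero-length:
// Hypothetical pre-pass before building the composition: drop zero-length ranges
// and insert them in ascending start order so decode timestamps stay monotonic.
let orderedRanges = timeRangesOfInterest.values
    .filter { $0.duration > .zero }
    .sorted { $0.start < $1.start }
I'd then pass orderedRanges to exportVideoTimeRanges(timeRanges:completion:) instead of the raw map of dictionary values.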
I followed Apple's guidance in their articles Creating an Action Classifier Model, Gathering Training Videos for an Action Classifier, and Building an Action Classifier Data Source. With this Core ML model file now imported in Xcode, how do I use it to classify video frames?
For each video frame I call
do {
let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer)
try requestHandler.perform([self.detectHumanBodyPoseRequest])
} catch {
print("Unable to perform the request: \(error.localizedDescription).")
}
But it's unclear to me how to use the results of the VNDetectHumanBodyPoseRequest, which come back as [VNHumanBodyPoseObservation]?. How would I feed the results into my custom classifier, which has an automatically generated model class, TennisActionClassifier.swift? The classifier is for making predictions on the frame's body poses, labeling the actions as either playing a rally/point or not playing.
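What I've pieced together so far, written out as a hedged sketch rather than working code: collect each observation's keypointsMultiArray() into a rolling window, merge the window into a single MLMultiArray, and hand that to the generated class. The input name poses, the 60-frame window length, and the output's label property are assumptions that depend on how the classifier was configured in Create ML.
import CoreML
import Vision

private var posesWindow: [MLMultiArray] = []
private let predictionWindowLength = 60  // Assumption: matches the classifier's training window.
private lazy var actionClassifier = try? TennisActionClassifier(configuration: MLModelConfiguration())

/// Hypothetical helper: accumulates per-frame poses and classifies once the window fills.
func classify(observation: VNHumanBodyPoseObservation) throws {
    // Convert the body pose observation into the multi-array layout Create ML expects.
    posesWindow.append(try observation.keypointsMultiArray())
    guard posesWindow.count == predictionWindowLength, let classifier = actionClassifier else { return }

    // Merge the window of per-frame pose arrays into one classifier input.
    let input = MLMultiArray(concatenating: posesWindow, axis: 0, dataType: .float32)
    let output = try classifier.prediction(poses: input)
    print("Predicted action: \(output.label)")

    posesWindow.removeAll()
}
Is that roughly the intended shape, or is there a more direct path from the Vision results to the generated model class?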
My goal is to mark any tennis video's timestamps of both the start of each rally/point and the end of each rally/point. I tried trajectory detection, but the "end time" is when the ball bounces rather than when the rally/point ends. I'm not quite sure what direction to go from here to improve on this. Would action classification of body poses in each frame (two classes, "playing" and "not playing") be the best way to split the video into segments? A different technique?
I'm building a feature to automatically edit out all the downtime of a tennis video. I have a partial implementation that stores the start and end times of Vision trajectory detections and writes only those segments to an AVFoundation export session.
I've encountered a major issue: the returned trajectories end whenever the ball bounces, so each segment is just one tennis shot and nowhere close to an entire rally with multiple bounces. I'm unsure whether I should continue down the trajectory route, maybe stitching the trajectories together and somehow only splitting at the start and end of a rally.
Any general guidance would be appreciated.
Is there a different Vision or ML approach that would more accurately model the start and end time of a rally? I considered creating a custom action classifier to classify frames to be either "playing tennis" or "inactivity," but I started with Apple's trajectory detection since it was already built and trained. Maybe a custom classifier would be needed, but not sure.
I am saving time ranges from an input video asset where trajectories are found, then exporting only those segments to an output video file.
Currently I track these time ranges in a stored property var timeRangesOfInterest: [Double : CMTimeRange], which is set in the trajectory request's completion handler
func completionHandler(request: VNRequest, error: Error?) {
guard let request = request as? VNDetectTrajectoriesRequest else { return }
if let results = request.results,
results.count > 0 {
for result in results {
var timeRange = result.timeRange
timeRange.start = timeRange.start - self.assetWriterStartTime
self.timeRangesOfInterest[timeRange.start.seconds] = timeRange
}
}
}
Then these time ranges of interest are used in an export session to only export those segments
/*
Finish writing, and asynchronously evaluate the results from writing
the samples.
*/
assetReaderWriter.writingCompleted { result in
self.exportVideoTimeRanges(timeRanges: self.timeRangesOfInterest.map { $0.1 }) { result in
completionHandler(result)
}
}
Unfortunately, however, I'm getting repeated trajectory video segments in the output video. Is this maybe because trajectory requests return "in progress" repeated trajectory results with slightly different time range start times? What might be a good strategy for avoiding or removing them? I've also noticed trajectory segments appearing out of order in the output.
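Here's how I'm thinking about collapsing the repeats before export, as a hedged sketch (mergedTimeRanges is a helper I made up): sort the ranges by start time, then fold any range that overlaps or touches the previous one into it.
import CoreMedia

/// Hypothetical helper: merges overlapping or adjacent time ranges so repeated
/// "in progress" trajectory updates collapse into a single segment.
func mergedTimeRanges(_ ranges: [CMTimeRange]) -> [CMTimeRange] {
    let sorted = ranges.sorted { $0.start < $1.start }
    var merged: [CMTimeRange] = []
    for range in sorted {
        if let last = merged.last, range.start <= last.end {
            // Overlapping or adjacent: extend the previous range instead of appending a duplicate.
            merged[merged.count - 1] = CMTimeRange(start: last.start, end: max(last.end, range.end))
        } else {
            merged.append(range)
        }
    }
    return merged
}
I'd then pass mergedTimeRanges(Array(timeRangesOfInterest.values)) to the export instead of the raw dictionary values, which should also fix the out-of-order segments.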
Given an AVAsset, I'm performing a Vision trajectory request on it and would like to write out a video asset that only contains frames with trajectories (filter out downtime in sports footage where there's no ball moving).
I'm unsure what would be a good approach, but as a starting point I tried the following pipeline:
Copy sample buffer from the source AVAssetReaderOutput.
Perform trajectory request on a vision handler parameterized by the sample buffer.
For each resulting VNTrajectoryObservation (trajectory detected), use its associated CMTimeRange to configure a new AVAssetReader set to that time range.
Append the time range constrained sample buffer to one AVAssetWriterInput until the forEach is complete.
In code:
private func transferSamplesAsynchronously(from readerOutput: AVAssetReaderOutput,
to writerInput: AVAssetWriterInput,
onQueue queue: DispatchQueue,
sampleBufferProcessor: SampleBufferProcessor,
completionHandler: @escaping () -> Void) {
/*
The writerInput continuously invokes this closure until finished or
cancelled. It throws an NSInternalInconsistencyException if called more
than once for the same writer.
*/
writerInput.requestMediaDataWhenReady(on: queue) {
var isDone = false
/*
While the writerInput accepts more data, process the sampleBuffer
and then transfer the processed sample to the writerInput.
*/
while writerInput.isReadyForMoreMediaData {
if self.isCancelled {
isDone = true
break
}
// Get the next sample from the asset reader output.
guard let sampleBuffer = readerOutput.copyNextSampleBuffer() else {
// The asset reader output has no more samples to vend.
isDone = true
break
}
let visionHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: self.orientation, options: [:])
do {
try visionHandler.perform([self.detectTrajectoryRequest])
if let results = self.detectTrajectoryRequest.results {
try results.forEach { result in
let assetReader = try AVAssetReader(asset: self.asset)
assetReader.timeRange = result.timeRange
let trackOutput = AVTrackOutputs.firstTrackOutput(ofType: .video, fromTracks: self.asset.tracks,
withOutputSettings: nil)
assetReader.add(trackOutput)
assetReader.startReading()
guard let sampleBuffer = trackOutput.copyNextSampleBuffer() else {
// The asset reader output has no more samples to vend.
isDone = true
return
}
// Append the sample to the asset writer input.
guard writerInput.append(sampleBuffer) else {
/*
The writer could not append the sample buffer.
The `readingAndWritingDidFinish()` function handles any
error information from the asset writer.
*/
isDone = true
return
}
}
}
} catch {
print(error)
}
}
if isDone {
/*
Calling `markAsFinished()` on the asset writer input does the
following:
1. Unblocks any other inputs needing more samples.
2. Cancels further invocations of this "request media data"
callback block.
*/
writerInput.markAsFinished()
/*
Tell the caller the reader output and writer input finished
transferring samples.
*/
completionHandler()
}
}
}
private func readingAndWritingDidFinish(assetReaderWriter: AVAssetReaderWriter,
completionHandler: @escaping FinishHandler) {
if isCancelled {
completionHandler(.success(.cancelled))
return
}
// Handle any error during processing of the video.
guard sampleTransferError == nil else {
assetReaderWriter.cancel()
completionHandler(.failure(sampleTransferError!))
return
}
// Evaluate the result reading the samples.
let result = assetReaderWriter.readingCompleted()
if case .failure = result {
completionHandler(result)
return
}
/*
Finish writing, and asynchronously evaluate the results from writing
the samples.
*/
assetReaderWriter.writingCompleted { result in
completionHandler(result)
return
}
}
When run, I observe the following: no error is caught in the first catch clause, none are caught in readingAndWritingDidFinish(assetReaderWriter:completionHandler:), and its completion handler is called.
Help with any of the following questions would be appreciated:
What is causing what appears to be indefinite loading?
How might I isolate the problem further?
Am I misusing or misunderstanding how to selectively read from time ranges of AVAssetReader objects?
Should I forego the AVAssetReader / AVAssetWriter route entirely, and use the time ranges with AVAssetExportSession instead? I don't know how the two approaches compare, or what to consider when choosing between the two.
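For the last question, here's roughly what the export-session route looks like for a single contiguous range, as a hedged sketch (outputURL and the range are placeholders; multiple ranges would still need an AVMutableComposition):
import AVFoundation

/// Hypothetical single-range export: the session handles only the requested portion of the asset.
func exportSingleRange(of asset: AVAsset, timeRange: CMTimeRange, to outputURL: URL) {
    guard let session = AVAssetExportSession(asset: asset,
                                             presetName: AVAssetExportPresetHighestQuality) else { return }
    session.outputURL = outputURL
    session.outputFileType = .mov
    // Constrain the export to just this time range of the source asset.
    session.timeRange = timeRange
    session.exportAsynchronously {
        print("Export status: \(session.status.rawValue), error: \(String(describing: session.error))")
    }
}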
Say you have a pinch gesture recognizer and pan gesture recognizer on an image view:
@IBAction func pinchPiece(_ pinchGestureRecognizer: UIPinchGestureRecognizer) {
guard pinchGestureRecognizer.state == .began || pinchGestureRecognizer.state == .changed,
let piece = pinchGestureRecognizer.view else {
// After pinch releases, zoom back out.
if pinchGestureRecognizer.state == .ended {
UIView.animate(withDuration: 0.3, animations: {
pinchGestureRecognizer.view?.transform = CGAffineTransform.identity
})
}
return
}
adjustAnchor(for: pinchGestureRecognizer)
let scale = pinchGestureRecognizer.scale
piece.transform = piece.transform.scaledBy(x: scale, y: scale)
pinchGestureRecognizer.scale = 1 // Clear scale so that it is the right delta next time.
}
@IBAction func panPiece(_ panGestureRecognizer: UIPanGestureRecognizer) {
guard panGestureRecognizer.state == .began || panGestureRecognizer.state == .changed,
let piece = panGestureRecognizer.view else {
return
}
let translation = panGestureRecognizer.translation(in: piece.superview)
piece.center = CGPoint(x: piece.center.x + translation.x, y: piece.center.y + translation.y)
panGestureRecognizer.setTranslation(.zero, in: piece.superview)
}
public func gestureRecognizer(_ gestureRecognizer: UIGestureRecognizer,
shouldRecognizeSimultaneouslyWith otherGestureRecognizer: UIGestureRecognizer) -> Bool {
true
}
The pinch gesture's view resets to its original state after the gesture is done, which occurs in its else clause. What would be a good way to do the same for the pan gesture recognizer? Ideally I'd like the gesture recognizers to be in an extension of UIImageView, which would also mean I can't add a stored property to the extension for tracking the initial state of the image view.
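One direction I've considered, as a rough sketch (not verified against the anchor-point adjustment in adjustAnchor(for:)): drive the pan through the view's transform rather than its center, so that resetting to .identity on .ended undoes both the pinch and the pan without any stored state. Concatenating the translation keeps it in the superview's coordinate space, independent of the current scale.
@IBAction func panPiece(_ panGestureRecognizer: UIPanGestureRecognizer) {
    guard let piece = panGestureRecognizer.view else { return }
    switch panGestureRecognizer.state {
    case .began, .changed:
        let translation = panGestureRecognizer.translation(in: piece.superview)
        // Apply the translation after the existing transform so scaling doesn't distort it.
        piece.transform = piece.transform.concatenating(
            CGAffineTransform(translationX: translation.x, y: translation.y))
        panGestureRecognizer.setTranslation(.zero, in: piece.superview)
    case .ended, .cancelled:
        // Mirror the pinch handler: animate back to the untransformed state.
        UIView.animate(withDuration: 0.3) {
            piece.transform = .identity
        }
    default:
        break
    }
}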
How do you only accept pan gestures when the user is in the process of a pinch gesture? In other words, I'd like to avoid delivering one-finger pan gestures.
@IBAction func pinchPiece(_ pinchGestureRecognizer: UIPinchGestureRecognizer) {
guard pinchGestureRecognizer.state == .began || pinchGestureRecognizer.state == .changed,
let piece = pinchGestureRecognizer.view else {
// After pinch releases, zoom back out.
if pinchGestureRecognizer.state == .ended {
UIView.animate(withDuration: 0.3, animations: {
pinchGestureRecognizer.view?.transform = CGAffineTransform.identity
})
}
return
}
adjustAnchor(for: pinchGestureRecognizer)
let scale = pinchGestureRecognizer.scale
piece.transform = piece.transform.scaledBy(x: scale, y: scale)
pinchGestureRecognizer.scale = 1 // Clear scale so that it is the right delta next time.
}
@IBAction func panPiece(_ panGestureRecognizer: UIPanGestureRecognizer) {
guard panGestureRecognizer.state == .began || panGestureRecognizer.state == .changed,
let piece = panGestureRecognizer.view else {
return
}
let translation = panGestureRecognizer.translation(in: piece.superview)
piece.center = CGPoint(x: piece.center.x + translation.x, y: piece.center.y + translation.y)
panGestureRecognizer.setTranslation(.zero, in: piece.superview)
}
public func gestureRecognizer(_ gestureRecognizer: UIGestureRecognizer,
shouldRecognizeSimultaneouslyWith otherGestureRecognizer: UIGestureRecognizer) -> Bool {
true
}
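The simplest direction I've found so far, as a sketch (panGestureRecognizer here is a hypothetical @IBOutlet; mine are currently wired up only as @IBActions in Interface Builder): require two touches on the pan recognizer so one-finger drags are never delivered, which effectively limits panning to while a pinch is in progress.
@IBOutlet var panGestureRecognizer: UIPanGestureRecognizer!

override func viewDidLoad() {
    super.viewDidLoad()
    // Only deliver pans performed with exactly two fingers.
    panGestureRecognizer.minimumNumberOfTouches = 2
    panGestureRecognizer.maximumNumberOfTouches = 2
}
An alternative would be a gestureRecognizerShouldBegin(_:) delegate override that checks the pinch recognizer's state, but the touch-count route needs no extra state.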
Apple's sample code "AVReaderWriter: Offline Audio / Video Processing" has the following listing
let writingGroup = dispatch_group_create()
// Transfer data from input file to output file.
self.transferVideoTracks(videoReaderOutputsAndWriterInputs, group: writingGroup)
self.transferPassthroughTracks(passthroughReaderOutputsAndWriterInputs, group: writingGroup)
// Handle completion.
let queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)
dispatch_group_notify(writingGroup, queue) {
// `readingAndWritingDidFinish()` is guaranteed to call `finish()` exactly once.
self.readingAndWritingDidFinish(assetReader, assetWriter: assetWriter)
}
in CynanifyOperation.swift (an NSOperation subclass that stylizes imported video and exports it). How would I go about writing this part in modern Swift so that it compiles and works?
I've tried writing this as
let writingGroup = DispatchGroup()
// Transfer data from input file to output file.
self.transferVideoTracks(videoReaderOutputsAndWriterInputs: videoReaderOutputsAndWriterInputs, group: writingGroup)
self.transferPassthroughTracks(passthroughReaderOutputsAndWriterInputs: passthroughReaderOutputsAndWriterInputs, group: writingGroup)
// Handle completion.
writingGroup.notify(queue: .global()) {
// `readingAndWritingDidFinish()` is guaranteed to call `finish()` exactly once.
self.readingAndWritingDidFinish(assetReader: assetReader, assetWriter: assetWriter)
}
However, it's taking an extremely long time for self.readingAndWritingDidFinish(assetReader: assetReader, assetWriter: assetWriter) to be called, and my UI is stuck in the ProgressViewController with a loading spinner. Is there something I wrote incorrectly or missed conceptually in the Swift 5 version?
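For reference, here is the enter()/leave() pairing I'd expect inside each transfer method, as a hedged sketch (the pairs tuple type and the transfer(_:to:completion:) call are placeholders for whatever my modernized transferVideoTracks actually does). The point is that notify(queue:) only fires once every enter() has a matching leave(), so if any per-track completion handler never runs, readingAndWritingDidFinish(assetReader:assetWriter:) is never called.
func transferTracks(pairs: [(AVAssetReaderOutput, AVAssetWriterInput)], group: DispatchGroup) {
    for (readerOutput, writerInput) in pairs {
        group.enter()
        // `transfer(_:to:completion:)` stands in for the real per-track transfer call.
        transfer(readerOutput, to: writerInput) {
            // Exactly one leave() per enter(); an unbalanced pair stalls the notify block forever.
            group.leave()
        }
    }
}
Beyond the pairing, the other thing I'd double-check is that startReading() and startWriting() run before the transfers begin.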
Apple's sample code Identifying Trajectories in Video contains the following delegate callback:
func cameraViewController(_ controller: CameraViewController, didReceiveBuffer buffer: CMSampleBuffer, orientation: CGImagePropertyOrientation) {
let visionHandler = VNImageRequestHandler(cmSampleBuffer: buffer, orientation: orientation, options: [:])
if gameManager.stateMachine.currentState is GameManager.TrackThrowsState {
DispatchQueue.main.async {
// Get the frame of rendered view
let normalizedFrame = CGRect(x: 0, y: 0, width: 1, height: 1)
self.jointSegmentView.frame = controller.viewRectForVisionRect(normalizedFrame)
self.trajectoryView.frame = controller.viewRectForVisionRect(normalizedFrame)
}
// Perform the trajectory request in a separate dispatch queue.
trajectoryQueue.async {
do {
try visionHandler.perform([self.detectTrajectoryRequest])
if let results = self.detectTrajectoryRequest.results {
DispatchQueue.main.async {
self.processTrajectoryObservations(controller, results)
}
}
} catch {
AppError.display(error, inViewController: self)
}
}
}
}
However, instead of drawing UI whenever detectTrajectoryRequest.results exist (https://developer.apple.com/documentation/vision/vndetecttrajectoriesrequest/3675672-results), I'm interested in using the CMTimeRange provided by each result to construct a new video. In effect, this would filter down the original video to only frames with trajectories. How might I accomplish this, perhaps through writing only specific time ranges' frames from one AVFoundation video to a new AVFoundation video?
I'd like a user's upload operation that's started in the foreground to continue when they leave the app. Apple's article Extending Your App's Background Execution Time has the following code listing
func sendDataToServer( data : NSData ) {
// Perform the task on a background queue.
DispatchQueue.global().async {
// Request the task assertion and save the ID.
self.backgroundTaskID = UIApplication.shared.beginBackgroundTask(withName: "Finish Network Tasks") {
// End the task if time expires.
UIApplication.shared.endBackgroundTask(self.backgroundTaskID!)
self.backgroundTaskID = UIBackgroundTaskInvalid
}
// Send the data synchronously.
self.sendAppDataToServer( data: data)
// End the task assertion.
UIApplication.shared.endBackgroundTask(self.backgroundTaskID!)
self.backgroundTaskID = UIBackgroundTaskInvalid
}
}
The call to self.sendAppDataToServer(data: data) is unclear. Is this where the upload operation would go, wrapped in DispatchQueue.global().sync { }?
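My current reading, as a hedged sketch (the URL is a placeholder, and none of this is confirmed as what the article intends): sendAppDataToServer(data:) is just a synchronous method that runs on the queue the listing already dispatched to, so no extra sync { } wrapper should be needed. A semaphore-wrapped URLSession upload is one way to make it synchronous:
func sendAppDataToServer(data: Data) {
    let semaphore = DispatchSemaphore(value: 0)
    var request = URLRequest(url: URL(string: "https://example.com/upload")!)  // Placeholder endpoint.
    request.httpMethod = "POST"

    let task = URLSession.shared.uploadTask(with: request, from: data) { _, _, error in
        if let error = error {
            print("Upload failed: \(error.localizedDescription)")
        }
        semaphore.signal()
    }
    task.resume()
    // Block this (background) queue until the upload finishes, keeping the
    // surrounding begin/endBackgroundTask calls correctly ordered.
    semaphore.wait()
}
If the upload really needs to survive the app being suspended, a background URLSessionConfiguration would probably be a better fit than extending foreground execution time.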
The Swift book says that "to prevent strong reference cycles, delegates are declared as weak references."
protocol SomeDelegate: AnyObject {
}
class ViewController: UIViewController, SomeDelegate {
weak var delegate: SomeDelegate?
override func viewDidLoad() {
delegate = self
}
}
Say the class parameterizes a struct with that delegate
class ViewController: UIViewController, SomeDelegate {
weak var delegate: SomeDelegate?
override func viewDidLoad() {
delegate = self
let exampleView = ExampleView(delegate: delegate)
let hostingController = UIHostingController(rootView: exampleView)
self.present(hostingController, animated: true)
}
}
struct ExampleView: View {
var delegate: SomeDelegate!
var body: some View {
Text("")
}
}
Should the delegate property in the struct also be marked with weak?