Feasibility of Real-Time Object Detection in Live Video with Core ML on M1 Pro and A-Series Chips

Question

Blume OP

Created Nov ’24

Hello,

I am exploring real-time object detection on live video streams, with each detected object replaced or overlaid by another shape, for an iOS app using the Core ML and Vision frameworks. My goal is high-speed detection with no noticeable latency, comparable to how an operating system services page faults or associative-cache lookups, but applied to video processing.

Given that this requires consistent, real-time model inference, I’m curious about how well the Neural Engine or GPU can handle such tasks on A-series chips in iPhones versus M-series chips (specifically M1 Pro and possibly M4) in MacBooks. Here are a few specific points I’d like insight on:

Hardware Suitability: How feasible is real-time object detection with Core ML on the Neural Engine (i.e., can it maintain low latency)? Would M-series chips (e.g., M1 Pro or newer) offer a tangible benefit for this type of task compared to the A-series chips in mobile devices? Which A-series and M-series chips would be the minimum recommended for such a task?
Performance Expectations: For continuous, live video object detection, what would be the expected frame rate or latency using an optimized Core ML model? Has anyone benchmarked such applications, and is the M-series required to achieve smooth, real-time processing?
Differences Across Apple Hardware: How does performance scale between the A-series Neural Engine and M-series GPU and Neural Engine? Is the M-series vastly superior for real-time Core ML tasks like object detection on live video feeds?

If anyone has attempted live object detection on these chips, any insights on real-time performance, limitations, or optimizations would be highly appreciated.
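For reference, "real-time" here can be read as a per-frame time budget: about 33.3 ms per frame at 30 fps and about 16.7 ms at 60 fps. The sketch below shows that arithmetic plus one possible way to time a Core ML detection model through Vision. It is a minimal sketch under assumptions, not a benchmark: the helper names (`frameBudgetMs`, `sustains`, `makeDetectionRequest`, `detect`) are hypothetical, `modelURL` is assumed to point at a compiled `.mlmodelc` object-detection model, and the Vision part only compiles on Apple platforms.

```swift
import Foundation

/// Hypothetical helper: milliseconds available per frame at a target frame rate.
func frameBudgetMs(fps: Double) -> Double {
    return 1000.0 / fps
}

/// Hypothetical helper: true when the slowest measured frame still fits the budget.
/// Returns false when there are no measurements.
func sustains(fps: Double, latenciesMs: [Double]) -> Bool {
    guard let worst = latenciesMs.max() else { return false }
    return worst <= frameBudgetMs(fps: fps)
}

#if canImport(Vision) && canImport(CoreML)
import CoreML
import Vision

/// Builds a Vision request around a compiled Core ML model.
/// Assumption: `modelURL` points at a compiled .mlmodelc bundle.
func makeDetectionRequest(modelURL: URL) throws -> VNCoreMLRequest {
    let config = MLModelConfiguration()
    config.computeUnits = .all  // Core ML may use the CPU, GPU, and Neural Engine
    let model = try MLModel(contentsOf: modelURL, configuration: config)
    let request = VNCoreMLRequest(model: try VNCoreMLModel(for: model))
    request.imageCropAndScaleOption = .scaleFill
    return request
}

/// Runs the request on one frame; returns the detections and the elapsed milliseconds.
func detect(in pixelBuffer: CVPixelBuffer,
            using request: VNCoreMLRequest) throws
    -> (objects: [VNRecognizedObjectObservation], ms: Double) {
    let start = Date()
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try handler.perform([request])
    let ms = Date().timeIntervalSince(start) * 1000
    let objects = (request.results as? [VNRecognizedObjectObservation]) ?? []
    return (objects, ms)
}
#endif
```

Collecting the `ms` values from `detect` over a few hundred frames on each target device and passing them to `sustains(fps:latenciesMs:)` gives a rough yes/no answer for that device; switching `computeUnits` (for example to `.cpuAndGPU`) is one way to compare the GPU against the Neural Engine.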

Please refer to: Apple APIs

Thank you in advance for your help!

Answer 1

Blume OP

Nov ’24

In WWDC 2024 session 10223, Apple presented several ML-powered frameworks, including Vision, Natural Language, Sound, Speech, and Translation. These can be used concurrently within an app, allowing tasks to run in parallel.

Could you provide guidance on best practices for performing multiple Core ML tasks simultaneously on iPhone and MacBook hardware? Specifically, I’d like to know about the feasibility and efficiency of running three parallel tasks on different Apple chipsets (A-series for iPhone and M-series for MacBook). Are there particular recommendations for optimizing parallel tasks on these chips to maintain performance and battery efficiency?
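To make the "three parallel tasks" scenario concrete, here is a minimal sketch using Swift structured concurrency. The three functions are hypothetical placeholders standing in for real framework calls (for example a Vision request, a sound classification, and a text-tagging call), and their return values are made up for illustration. The task group only expresses concurrency at the app level; how the system schedules the underlying work across the CPU, GPU, and Neural Engine is not controlled by this code.

```swift
import Foundation

// Hypothetical placeholder workloads; each stands in for one ML request.
func detectObjects() async -> String { return "objects" }
func classifySound() async -> String { return "sound" }
func tagText() async -> String { return "text" }

/// Runs the three placeholder tasks concurrently and collects results by task name.
func runAllTasks() async -> [String: String] {
    return await withTaskGroup(of: (String, String).self,
                               returning: [String: String].self) { group in
        group.addTask { ("vision", await detectObjects()) }
        group.addTask { ("sound", await classifySound()) }
        group.addTask { ("language", await tagText()) }

        // Child tasks finish in any order, so results are keyed by name.
        var results: [String: String] = [:]
        for await (name, value) in group {
            results[name] = value
        }
        return results
    }
}
```

Calling `await runAllTasks()` from an async context returns a dictionary with one entry per task. Measuring each task alone and then all three together on the target device would show how much the tasks contend for the same hardware.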

Thank you!

Answer 2

ondeviceguy

3w

Hey @Blume - We've worked with quite a few apps deploying real-time object detection models, and do a lot of performance benchmarking (inference speed, resource utilization, etc.).

Feel free to reach out on our website (and we can chat via email): https://www.runlocal.ai/

Answer 3

Blume OP

3w

Hey @ondeviceguy,

Your platform looks impressive and fills an important gap for evaluating model and app performance across devices. It clearly requires a significant investment in hardware and expertise, and I can see how it serves as a valuable tool for developers.

A few considerations that might enhance its utility:

Ensuring diverse input payloads to guarantee balanced and representative analysis.
Clear guidance for integrating performance insights into early-stage design decisions to optimize resource consumption effectively.
Highlighting any accreditations or standards compliance to build additional trust in the results provided.

Currently, I'm in the app design phase, considering how hardware resource consumption fits into the architecture, especially when working with streams of frame buffers. Once the design decisions are made and the app is implemented with the models in action, I plan to consult this platform to further evaluate performance across devices. This will help ensure the app is optimized to run effectively on various hardware configurations.

Thanks for sharing—this looks like a much-needed resource for the developer community!
