Efficient Dispatch for Cross-Platform Software on AMP Systems

The "Explore the new system architecture of Apple Silicon Macs" video emphasizes using GCD to schedule work for optimal performance on devices with asymmetric multiprocessing (AMP), i.e. a mix of performance (P) and efficiency (E) cores.
What should I do when developing portable code that can't reasonably use GCD? For instance, I'd like to optimize threading behavior both in ffmpeg and in my server application's own multipurpose thread pool code. ffmpeg can have very high performance requirements and may use either frame-based or slice-based threading. Frame threading will likely need to stay on the P cores at all times to avoid increasing latency and bottlenecking the pipeline, while slice threading may be able to process some tasks on E cores.
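To make the question concrete, here is a minimal sketch of the kind of hook I have in mind for a pthread-based pool. It assumes the pthread QoS calls in `<pthread/qos.h>` are the relevant knob on macOS; how each QoS class actually maps onto P and E cores is exactly what I am unsure about. The names `work_hint`, `apply_work_hint`, and `run_with_hint` are hypothetical and exist only for illustration; they are not part of ffmpeg or of any Apple API.

```c
#include <pthread.h>
#ifdef __APPLE__
#include <pthread/qos.h>
#endif

/* Hypothetical hint describing what a worker thread is used for. */
typedef enum {
    WORK_LATENCY_CRITICAL, /* e.g. frame threading: must not stall the pipeline */
    WORK_THROUGHPUT        /* e.g. slice tasks that might tolerate slower cores */
} work_hint;

/* Tag the calling thread with a QoS class on Apple platforms.
 * The choice of classes here is an assumption, not documented guidance.
 * On other platforms this is a no-op so the pool code stays portable.
 * Returns 0 on success, an errno-style value otherwise. */
static int apply_work_hint(work_hint hint)
{
#ifdef __APPLE__
    qos_class_t qos = (hint == WORK_LATENCY_CRITICAL)
                          ? QOS_CLASS_USER_INTERACTIVE
                          : QOS_CLASS_UTILITY;
    return pthread_set_qos_class_self_np(qos, 0);
#else
    (void)hint;
    return 0;
#endif
}

typedef struct {
    work_hint hint;
    int rc; /* result of apply_work_hint, reported back to the caller */
} worker_arg;

static void *worker(void *p)
{
    worker_arg *arg = (worker_arg *)p;
    arg->rc = apply_work_hint(arg->hint);
    /* ... the pool's real work loop would run here ... */
    return NULL;
}

/* Spawn one worker with the given hint, wait for it, and return the
 * status of the QoS call (or -1 if the thread could not be created). */
static int run_with_hint(work_hint hint)
{
    worker_arg arg = { hint, -1 };
    pthread_t thread;
    if (pthread_create(&thread, NULL, worker, &arg) != 0)
        return -1;
    pthread_join(thread, NULL);
    return arg.rc;
}
```

In this sketch a frame-threading worker would call `apply_work_hint(WORK_LATENCY_CRITICAL)` once at startup and a slice worker would use `WORK_THROUGHPUT`. Whether that is the recommended approach, or whether something finer-grained is expected, is what I would like to have confirmed.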
Is there any documentation available explaining how best to dispatch tasks when an application needs to use a custom multithreading solution?