Post

Replies

Boosts

Views

Activity

M1 Pro / Max / Ultra Thread Affinity (e.g. in OpenMP) and scheduler core migration
I'm trying to hint to the task scheduler that some threads should be scheduled together, using the thread_policy_set API with THREAD_AFFINITY_POLICY (in the absence of a "real" thread-to-core affinity API). All the examples mention setting the policy after creation but before execution of the task(s). Unfortunately, I'm not creating these tasks (OpenMP is), and when I try to use the API on an already running thread, I get a return value of KERN_INVALID_ARGUMENT (= 4):

thread_affinity_policy_data_t policy = { 1 };
auto r = thread_policy_set(mach_task_self(), THREAD_AFFINITY_POLICY, (thread_policy_t)&policy, THREAD_AFFINITY_POLICY_COUNT);

When I replace mach_task_self() with pthread_mach_thread_np(pthread_self()), I get a KERN_NOT_SUPPORTED error instead (= 46, "Empty thread activation (No thread linked to it)"). Has anyone used these APIs successfully on an already running thread?

Background: The code I'm working on divides a problem set into a small number of roughly equal-sized pieces (e.g. 8 or 16; this is an input parameter derived from the number of cores to be utilized). These pieces are not entirely independent but need to be processed in lock-step (as data from neighboring pieces is occasionally accessed). Sometimes, when a neighboring piece isn't ready for a fairly long time, we call std::this_thread::yield(), which unfortunately seems to indicate to the scheduler that this thread should move to the efficiency cores. That wreaks havoc with the assumption that the computation over each piece takes roughly the same amount of time, so that all threads can remain in lock-step. :(

A similar (?) problem seems to happen with OpenMP barriers, which have terrible performance (on the M1 Ultra at least) unless KMP_USE_YIELD=0 is used (for the OpenMP run-time from LLVM). Can this automatic migration (note: not the relinquishing of the remaining time-slice) be prevented?
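For illustration, the lock-step wait described in the background could poll a neighbor's progress counter for a bounded number of iterations instead of calling std::this_thread::yield(). This is only a sketch: spin_until_ready, publish_step, and the per-piece step counter are hypothetical names, not part of the original code, and whether busy-waiting actually keeps a thread on a performance core is an assumption that would need to be measured.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: busy-waits until a neighboring piece has reached
// `needed_step`, without ever calling std::this_thread::yield().
// Returns true if the neighbor got there within `max_spins` polls,
// false if the spin budget ran out (the caller decides what to do then).
inline bool spin_until_ready(const std::atomic<std::uint64_t>& neighbor_step,
                             std::uint64_t needed_step,
                             std::uint64_t max_spins)
{
    assert(max_spins > 0 && "a zero budget would never poll the neighbor");
    for (std::uint64_t i = 0; i < max_spins; ++i) {
        // acquire pairs with the release store in publish_step()
        if (neighbor_step.load(std::memory_order_acquire) >= needed_step)
            return true;
    }
    return false;
}

// Hypothetical counterpart: a piece announces that it has finished `step`.
inline void publish_step(std::atomic<std::uint64_t>& my_step, std::uint64_t step)
{
    my_step.store(step, std::memory_order_release);
}
```

Each worker would call publish_step() after finishing its piece for a step and spin_until_ready() before reading a neighbor's data. The spin budget trades wasted cycles against the risk of being treated as idle, which is roughly the trade-off that KMP_USE_YIELD=0 makes for the OpenMP barriers.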
Replies: 4 · Boosts: 0 · Views: 2.8k · Activity: Mar ’22
How to bind threads to performance (P) or efficiency (E) cores?
For some simulation work-loads I have, I would like to use the system to its full potential and therefore use both P and E cores. Splitting the work-load into individual tasks is not easily possible (the threads communicate with each other and run in semi-lockstep). I can allocate smaller portions of the domain to the E cores (and iteratively adjust this so they take the same amount of time as the P cores).

But in order for this to work well, I need to ensure that a given thread (with its associated workload) is bound to the right type of core: *either* the performance cores (doing larger chunks of the domain) or the efficiency cores (doing smaller chunks of the domain).

What's the best way to do this? So far, I don't think thread-to-core affinity has been something that could be chosen in macOS. The documentation mentions the QoS classes, but which class(es) (or relative priorities) would I pick?

pthread_set_qos_class_self_np(QOS_CLASS_UTILITY, 0);

The existing classifications don't really map well: the work is user-initiated (i.e. the user launched a console application), but it is not a GUI program. Would I use 4 threads with QOS_CLASS_UTILITY and 4 with QOS_CLASS_BACKGROUND? Would I just use UTILITY with a relative priority for performance vs. efficiency cores?
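The iterative adjustment mentioned above (giving slower threads smaller portions until all threads take the same time) could look roughly like the following. This is a sketch under assumptions: rebalance_shares is a hypothetical name, the domain is assumed to be divisible into arbitrary fractions, and each thread's throughput is assumed to stay roughly constant between iterations, which only holds if threads stay on the same type of core.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical rebalancing step: `share[i]` is the fraction of the domain
// thread i processed in the last iteration, `seconds[i]` is how long it took.
// Each thread's new share is proportional to its measured throughput
// (share / seconds), normalized so that the shares sum to 1 again.
inline std::vector<double> rebalance_shares(const std::vector<double>& share,
                                            const std::vector<double>& seconds)
{
    assert(share.size() == seconds.size() && !share.empty());
    std::vector<double> next(share.size());
    double total = 0.0;
    for (std::size_t i = 0; i < share.size(); ++i) {
        assert(share[i] > 0.0 && seconds[i] > 0.0);
        next[i] = share[i] / seconds[i];  // measured throughput of thread i
        total += next[i];
    }
    for (double& s : next) s /= total;    // normalize to a sum of 1
    return next;
}
```

For example, two threads that each took half the domain, where the second needed three times as long, would be reassigned 75% and 25% of the domain. With noisy timings it may help to blend the new shares with the old ones rather than jumping directly.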
Replies: 7 · Boosts: 0 · Views: 5.5k · Activity: Feb ’21
Override title for "NowPlaying" on macOS when using AVPlayerView?
I'm using AVPlayerView to display a video whose title I want to override in the system-wide NowPlaying view (e.g. on Big Sur in the Notification Center). On iOS / tvOS / Catalyst, this can (supposedly) be done by setting the AVPlayerItem's externalMetadata as desired, but this property is unsupported on macOS. What's the supported way of doing this for a "normal" AppKit app? My simple attempt of manually setting the information via MPNowPlayingInfoCenter didn't work; I assume that's getting overwritten by the "automatic" support from AVPlayer(View) with the faulty (empty) title from the actual video. Any pointers?
Replies: 1 · Boosts: 0 · Views: 1.4k · Activity: Dec ’20
How to measure kAudioDevicePropertyIOCycleUsage with asymmetric cores?
My (Mac) application is latency-sensitive, so I let the AudioServer know (via kAudioDevicePropertyIOCycleUsage) how late it can call my thread to provide output for the device. So far, I've done this by "benchmarking" a worst-case work-load when setting up my IOProc func (see https://github.com/q-p/SoundPusher/blob/master/SoundPusher/DigitalOutputContext.cpp#L97 if you're curious). How would I do this now with potentially asymmetric cores? I would like my benchmark to run under the same performance characteristics as the "real output" case, but without actually having a real deadline or having to produce real output.
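As a rough illustration of the benchmark-based approach, a measured worst-case processing time can be turned into a fraction of the IO cycle as shown below. This sketch does not answer the asymmetric-core question itself. The name io_cycle_usage, its parameters, and the safety factor are assumptions for illustration and are not taken from the linked code; the result is clamped to the 0 to 1 range that the property expects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts a benchmarked worst-case processing time into
// a fraction of one IO cycle, clamped to [0, 1].
// `safety_factor` (values > 1 add headroom) is an assumption of this sketch.
inline float io_cycle_usage(double worst_case_seconds,
                            std::uint32_t buffer_frames,
                            double sample_rate_hz,
                            double safety_factor)
{
    assert(worst_case_seconds >= 0.0 && buffer_frames > 0);
    assert(sample_rate_hz > 0.0 && safety_factor > 0.0);
    // Duration of one IO cycle for the given buffer size and sample rate.
    const double cycle_seconds = buffer_frames / sample_rate_hz;
    const double fraction = worst_case_seconds * safety_factor / cycle_seconds;
    return static_cast<float>(std::min(1.0, std::max(0.0, fraction)));
}
```

With a 512-frame buffer at 48 kHz, a measured worst case of 1 ms and a safety factor of 2 give a usage of about 0.19. If the benchmark happened to run on a faster core than the real IOProc, the measured time would be too optimistic, which is the concern raised above.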
Replies: 3 · Boosts: 0 · Views: 986 · Activity: Jun ’20
AudioServerPlugIn: HALC_ProxyObject::GetPropertyDataSize ('stm#', 'inpt', 0, AI32) error ('who?')
As soon as my AudioServerPlugIn (as well as all samples from Apple, e.g. NullAudio or SimpleAudio) is installed, I see the following errors in Console.app:

default 19:43:34.953638+0200 systemsoundserverd HALC_ProxyObject::GetPropertyDataSize ('stm#', 'inpt', 0, AI32): got an error from the server, 0x77686F3F
error 19:43:34.953574+0200 coreaudiod HALS_Object_GetPropertyData_DAI32: the object does not implement the property
default 19:43:34.953814+0200 systemsoundserverd HALC_ProxyObject::GetPropertyData ('stm#', 'inpt', 0, DAI32): got an error from the server, 0x77686F3F

(Those 4CCs are kAudioDevicePropertyStreams and kAudioObjectPropertyScopeInput.)

Looking at the code, HasProperty / GetPropertyDataSize / GetPropertyData all seem to be implemented correctly for those (the same as they are for NullAudio), but this error still gets thrown. The same is true for the NullAudio sample. (I'm not even sure the messages come from the AudioServerPlugIn, but if I remove any HAL drivers and relaunch coreaudiod, the messages disappear.)

Does anyone know what the actual requests might be? Debugging (or even just logging) from an AudioServerPlugIn seems impossible, so I cannot see which queries arrive at my plugin or what I might be missing. Does anyone have any ideas?
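The hexadecimal status in those log lines is itself a four-character code: 0x77686F3F decodes to 'who?', the value from the post title. In the CoreAudio headers that value is kAudioHardwareUnknownPropertyError, which matches the coreaudiod message about the property not being implemented. A small decoder makes such codes readable; the helper names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Decodes a 32-bit four-character code, most significant byte first.
// Non-printable bytes are shown as '.', a choice made for this sketch.
inline std::string fourcc_to_string(std::uint32_t code)
{
    std::string out(4, '.');
    for (int i = 0; i < 4; ++i) {
        const unsigned char c =
            static_cast<unsigned char>((code >> (24 - 8 * i)) & 0xFFu);
        if (c >= 0x20 && c <= 0x7E) out[i] = static_cast<char>(c);
    }
    return out;
}

// Encodes exactly four characters back into the 32-bit code.
inline std::uint32_t string_to_fourcc(const std::string& s)
{
    assert(s.size() == 4);
    std::uint32_t code = 0;
    for (unsigned char c : s) code = (code << 8) | c;
    return code;
}
```

Passing 0x77686F3F to fourcc_to_string() yields "who?", and string_to_fourcc("stm#") gives 0x73746D23, the numeric form of the property selector in the log.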
Replies: 3 · Boosts: 0 · Views: 2.8k · Activity: May ’20