The problem appears to be gone in macOS 15.0 beta (24A5320a); now let's hope it stays this way until release.
I'm adding this here in case someone ends up on this thread through a search while investigating similar crashes.
Through the use of the "Record reference counts" option in the Allocations tool, filtering by type ("CGColorSpace"), and some logging/breakpoints courtesy of associated objects, I think I have a better picture of what's going on in Sequoia.
Is there really a regression in CGColorSpaceCreateWithICCData on macOS 15 Sequoia?
To the best of my knowledge, yes, there still appears to be a regression in Sequoia, but it isn't in CGColorSpaceCreateWithICCData. One of the APIs our software uses seems to have gained an extra CFRelease() as compared to previous versions of macOS.
Because our software uses almost every graphics API present in macOS, it's been very hard to pinpoint which API might be responsible (some call stacks are in private APIs that aren't easily understood). The list the Allocations tool produces of every CFRetain/CFRelease called on each color space instance is great, but in my case so long as to be impractical. I hope others have an easier environment in which to debug this issue.
If there is nothing wrong with CGColorSpaceCreateWithICCData, what’s really going on?
It appears to be just one CFRelease too many, but there is a reason it was discovered through CGColorSpaceCreateWithICCData. Most of the time, software ends up using one of the system-provided color spaces created by name (e.g. kCGColorSpaceSRGB). These are singletons with an effectively infinite retain count, so calling an extra CFRelease on one of them will never produce a crash. Our software, in some circumstances, allocates its own CGColorSpaceRef from ICC data. Whether or not CoreGraphics maintains a cache of these instances, they are regular (non-singleton) instances. So if some API somewhere is responsible for an extra CFRelease, the last unlucky caller to call CFRelease will cause a crash. In your own code, you would notice the reference count of an instance created via CGColorSpaceCreateWithICCData go up and down as it gets passed around, but eventually and unpredictably, when the last use of that color space completes, you're likely to see a crash with a stack trace that includes CF_IS_OBJC.
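To illustrate the pattern (a hypothetical sketch, not our actual code; the profile path and function name are placeholders):
#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>

static void DemonstrateOverRelease(void) {
    // A non-singleton color space created from ICC data, unlike the named
    // singletons (e.g. kCGColorSpaceSRGB) whose retain count is effectively infinite.
    NSData *iccData = [NSData dataWithContentsOfFile:@"/path/to/profile.icc"]; // hypothetical path
    CGColorSpaceRef colorSpace = CGColorSpaceCreateWithICCData((__bridge CFDataRef)iccData);

    // ...the color space is passed around; every consumer balances its own
    // CFRetain with a CFRelease...

    // If some system API slipped in an extra CFRelease along the way, this
    // final, perfectly balanced release is the one that crashes in CF_IS_OBJC.
    CGColorSpaceRelease(colorSpace);
}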
There aren't many ways to create color spaces that aren't singletons, and I suspect color spaces created as derivatives of singletons might also be affected. If you stumbled on this page and your code uses the likes of CGColorSpaceCreateLinearized or CGColorSpaceCreateCopyWithStandardRange, then perhaps you have just confirmed that derived color spaces are indeed affected by this problem, even when derived from singletons.
Is a workaround possible?
The only brute-force solution that seems available is to maintain your own singletons for every color space created from ICC data, or through APIs that derive a new color space from an existing instance. Assuming you're working with a finite set of ICC profiles, or need a limited set of color spaces derived from others, this global list shouldn't grow too large or grow indefinitely.
Core Foundation has no CFMakeUncollectable() or CFSetRetainCount(+infinity), so not only do you need to maintain a reference to these color spaces in a global pool, you also need to periodically top up their retain counts. Assuming there really is a regression in macOS and not in my code, some graphics API somewhere calls an extra CFRelease any time you use it, so make sure the retain count always stays high enough to never reach zero (e.g. periodically ensure it's at least 1000).
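A minimal sketch of that brute-force pool (the function name, locking choice and padding count of 1000 are my own illustrative assumptions, not a vetted implementation):
#import <CoreGraphics/CoreGraphics.h>
#import <os/lock.h>

static CFMutableDictionaryRef sColorSpacePool; // ICC profile data -> CGColorSpaceRef
static os_unfair_lock sPoolLock = OS_UNFAIR_LOCK_INIT;

CGColorSpaceRef PooledColorSpaceFromICCData(CFDataRef iccData) {
    os_unfair_lock_lock(&sPoolLock);
    if (sColorSpacePool == NULL) {
        sColorSpacePool = CFDictionaryCreateMutable(kCFAllocatorDefault, 0,
            &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);
    }
    CGColorSpaceRef colorSpace =
        (CGColorSpaceRef)CFDictionaryGetValue(sColorSpacePool, iccData);
    if (colorSpace == NULL) {
        colorSpace = CGColorSpaceCreateWithICCData(iccData);
        CFDictionarySetValue(sColorSpacePool, iccData, colorSpace);
        // Pad the retain count so a stray CFRelease somewhere in the system
        // can never bring it down to zero (re-pad periodically elsewhere).
        for (int i = 0; i < 1000; i++) {
            CFRetain(colorSpace);
        }
    }
    os_unfair_lock_unlock(&sPoolLock);
    return colorSpace; // callers must treat this as borrowed: never CFRelease it
}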
I'm still hoping the problem is actually in my own code (I'm on the third review already), but just in case this is a real regression in macOS, I hope the above helps others.
Thank you, that is great to know, and even better to learn about the Allocations tool "Record reference counts" option. The problem is still 100% reproducible for me on the latest Sequoia seed (24A5309e) in two different apps (After Effects and Premiere Pro, where our third-party code is loaded) and I need to build a reproducible case that doesn't involve two behemoths :-)
That is correct! I managed to associate a regular Obj-C object (via OBJC_ASSOCIATION_RETAIN, the atomic retain policy) with a CGColorSpaceRef, in the hope of being able to set a breakpoint on the deallocation of the CGColorSpaceRef. My intention is to see exactly what the stack looks like when the object goes away.
Before I continue along this path, it helped to know whether that machinery is just as reliable with bridgeable CFTypes as it is with regular Obj-C instances.
I tried turning NSZombies on, but I don't think they have any effect on CFTypes either.
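For anyone wanting to try the same trick, this is roughly what I mean (a sketch; the sentinel class name, key and helper are made up for illustration):
#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>
#import <objc/runtime.h>

// Sentinel whose -dealloc runs when the host CGColorSpaceRef is destroyed:
// break (or log) in -dealloc to capture the stack of the final release.
@interface ColorSpaceSentinel : NSObject
@end

@implementation ColorSpaceSentinel
- (void)dealloc {
    // Breakpoint here: the backtrace shows who performed the final CFRelease.
}
@end

static const void *kSentinelKey = &kSentinelKey;

static void AttachSentinel(CGColorSpaceRef colorSpace) {
    objc_setAssociatedObject((__bridge id)colorSpace, kSentinelKey,
                             [[ColorSpaceSentinel alloc] init],
                             OBJC_ASSOCIATION_RETAIN); // the atomic retain policy
}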
The issue has been filed as FB14474267. I just attached a crash report to it. It didn't occur to me that it was needed because luckily the steps to reproduce are very simple:
1. Create a new macOS application project, using Obj-C + XIB, and skip Core Data.
2. Add a .r file to the project. Unless you do so, Xcode leaves the option to add the Build Carbon Resources phase grayed out. The .r file can be empty.
3. Under the Build Phases section of your target, add a Build Carbon Resources phase.
4. Attempt to switch to the Build Settings pane for that same target, and observe the crash.
Thank you for looking at this so promptly!
Since I've come to depend on threads such as this one as essential, living, breathing documentation, I wanted to add an epilogue for whoever stumbles on it. I did end up removing any code that skipped locking, allowing me to re-enable TSAN everywhere. Recall that my very first snippet of code relied on the following:
Another area where TSAN is clearly upset at my codebase is the more "clever" techniques that rely on the atomicity of pointer-sized assignments on both Intel and ARM 64-bit architectures. The reason you can get away with unguarded reads in (probably all) these techniques is that the bit pattern of a pointer-sized value will never contain a partially written sequence of bits.
Slightly rephrased: you can read a lazily-allocated variable without a lock because it will either be nil or contain a fully initialized, retained reference to your object. Only when the unguarded read returns nil do you pay the price of the expensive lock (@synchronized). The second, identical nil check inside the protected code avoids allocating and initializing the variable twice.
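Reconstructed from that description, the original pattern looked roughly like this (the initializer is a hypothetical stand-in):
- (id)myLazyValue {
    if (_myLazyValue == nil) {              // unguarded read: no lock in the common case
        @synchronized(self) {
            if (_myLazyValue == nil) {      // second check avoids double initialization
                _myLazyValue = [[NSObject alloc] init]; // hypothetical stand-in
            }
        }
    }
    return _myLazyValue;
}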
That original assumption was wrong in more ways than just the one Quinn pointed out. Some digging around revealed that when optimizing code for size, the compiler may in fact update a single pointer-sized variable in two instructions. This means a different thread might observe a partial bit pattern, with only one of the two writes having landed (obviously unsafe!). I don't know what preconditions need to be in place for the compiler to do this, and it isn't terribly interesting to me beyond the fact that it can happen, and therefore your code cannot assume otherwise. Apparently this same "clever" assumption was at some point made in the Obj-C runtime itself, causing problems. I wish I could locate the original Twitter/Mastodon post, but I wasn't keeping good records during my sleuthing.
In my original snippet of pseudo-code, you could explicitly update the pointer-sized value atomically:
- (id)myLazyValue {
    if (_myLazyValue == nil) {
        @synchronized(self) {
            if (_myLazyValue == nil) {
                // Hypothetical initializer standing in for <allocate and initialize>.
                id value = [[NSObject alloc] init];
                // The fully initialized pointer is first kept on the stack, then
                // published to the shared ivar in a single release-ordered write,
                // so readers can never observe a partial bit pattern.
                // (Assumes MRC; under ARC the atomic builtin cannot take the
                // address of a __strong id without bridging.)
                __atomic_store_n(&_myLazyValue, value, __ATOMIC_RELEASE);
            }
        }
    }
    return _myLazyValue;
}
...and in principle that works around the compiler potentially updating the pointer-sized value through multiple instructions, but at least for my use case this falls firmly into the "overkill" category. It is still not a safe design, in the sense that it would trigger TSAN just the same.
The bigger lesson (at least for me) is that TSAN isn't even the main reason to abandon these techniques. Writing "safe by design" code means you don't have to know or worry about edge cases in the compiler, or whether the target ISA even gives the compiler the option to do something truly unexpected. The risk/reward of trying something like this seems tipped heavily towards risk.
As to the poor performance of @synchronized: it is not horrible, but you'll find plenty of blog posts comparing the various primitives/techniques on OS X, and yes, this rare bit of syntactic sugar doesn't have many supporters.
Sorry, I should have been a lot clearer about what I meant. Let me rephrase: no matter how "clever" a technique is, and whether it may ultimately guarantee safe access to shared state under any circumstance, any time the same memory location is read or written concurrently by two separate threads, TSAN will flag it as a violation. My question was whether atomic operations (e.g. atomic_load/_fetch_add/_fetch_sub) would ever trigger TSAN. They clearly touch a shared memory location concurrently, but their very nature is to do so predictably. My guess is they shouldn't upset TSAN, but if you google related keywords you'll end up reading about whether false positives are possible (perhaps only in previous versions of TSAN?).
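For instance, my assumption is that something like this comes out clean under TSAN, because C11 atomics are part of the memory model it understands (an assumption, not a tested claim):
#include <stdatomic.h>

static atomic_int sSharedCounter;

// Concurrent callers touch the same memory location, but through atomic
// operations whose ordering TSAN can reason about, so no race is reported.
void IncrementSharedCounter(void) {
    atomic_fetch_add_explicit(&sSharedCounter, 1, memory_order_relaxed);
}

int ReadSharedCounter(void) {
    return atomic_load_explicit(&sSharedCounter, memory_order_relaxed);
}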
Since this thread is turning out more interesting than I had hoped, it goes without saying that Quinn's advice is sound as always: all "clever" techniques are a difficult bet, not just that the code is safe today, but that it will forever remain safe in light of evolving compiler technology. Since his first reply I've already guilt-added some os_unfair_lock/unlock calls to keep TSAN watching over functions I had previously disabled it on.
My use of @synchronized() in that particular example is made with full awareness of its poor performance: does it matter how bad the implementation is when you pay the price only once in most cases? In some parts of my codebase, relying on the syntactic sugar seems fair. We use GCD queues, os_unfair_lock, atomic_int and pthreads elsewhere, as appropriate. I assume (wrongly?) that @synchronized uses NSRecursiveLock or something similar behind the scenes. Would that be the reason it is slow? With so much else in the Obj-C runtime having been optimized to the extreme, one wonders why this particular hole was left alone.
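For context, my (unverified) understanding is that @synchronized compiles down to objc_sync_enter/objc_sync_exit, which look up a recursive lock keyed on the object's address and wrap the body in exception-handling scaffolding, which presumably is where the cost comes from. Roughly:
#import <objc/objc-sync.h>

- (void)doWorkSafely {
    // What @synchronized(self) { ... } roughly desugars to:
    objc_sync_enter(self);      // finds/creates a recursive lock keyed on 'self'
    @try {
        // ...critical section...
    } @finally {
        objc_sync_exit(self);   // released even if the body throws
    }
}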
Another area where TSAN is clearly upset at my codebase is the more "clever" techniques that rely on the atomicity of pointer-sized assignments on both Intel and ARM 64-bit architectures. The reason you can get away with unguarded reads in (probably all) these techniques is that the bit pattern of a pointer-sized value will never contain a partially written sequence of bits. It will either contain all bits from the last-executed write, or all bits from the previous value. The bits you read on any thread will never be the result of interleaving previous state with state from an "in-flight" CPU instruction. You can have some fun with this, again with the noble goal of avoiding any and all locking primitives in your most common access pattern. To be clear: this liberally defined "atomicity" of a CPU instruction is a very, very thin foundation on which to be clever. But there are legitimate cases where squeezing out some extra performance makes sense for us, and I assume for other multi-threaded, graphics-intensive code aiming for the highest FPS possible.
My original question/hope was indeed one that makes sense in my domain: can one selectively disable TSAN on a statement-by-statement basis, or is one forced to snooze it for an entire function body? It seems that right now, only the blunt option is available.
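For completeness, the blunt option I'm referring to is Clang's function-level attribute; as far as I can tell there is no statement-level equivalent:
// Exempts the whole function from TSAN instrumentation; there is no way to
// exempt just the single racy statement inside it.
__attribute__((no_sanitize("thread")))
static void *UnguardedPointerRead(void *const *slot) {
    return *slot; // the deliberately unguarded read
}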
@endecotp: I assume std::call_once() uses the same technique and therefore has the same excellent performance as dispatch_once. I am not against it, but would rather keep my sources Obj-C only.
Why would anyone prefer @synchronized over dispatch_once? I sometimes do, when performance isn't an issue, so that the code is easier to read. Syntactic sugar is very sweet, and one wonders why they stopped improving Obj-C in this regard. Anyone who follows Swift language "evolution" will readily notice that they entertain language-wide changes for almost any scenario they come across. The line of where to stop adding syntactic sugar is rarely drawn, and each release adds more and more scenarios where you write less code (this trend troubles me in profound ways, but I like it :-)).
There is something to dislike about dispatch_once and std::call_once() too: that pesky statically allocated token that sits there and, if you leave it at file scope, has the potential to be reused accidentally, be it from an auto-complete typo or by carelessly copying/pasting code. My preference is to scope a dispatch_once token within the body of a block defined locally to the function. This involves C macros, though. For example, if you want a lazily allocated, static/global variable:
#define UNPACK(...) __VA_ARGS__

#define ONCE(body) (^typeof((body)) { \
    static typeof((body)) value; \
    static dispatch_once_t once; \
    dispatch_once(&once, ^{ value = (body); }); \
    return value; \
})()

/// You would only use the following macro. The above two just support the pattern.
#define Once(...) ONCE(UNPACK(__VA_ARGS__))
The goal is that the dispatch_once_t variable is static and, crucially, starts as 0 when your process is loaded, yet is invisible outside the scope of the locally defined block. No variable name "leaks" outside that scope, and the resulting code (e.g. someVar = Once([[MyClass alloc] init])) can be copied/pasted at will with no fear of accidentally reusing the same static token. (Again, in the hope I'm not overlooking something horrible...)
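A hypothetical usage example (the class and method are made up), showing how it reads at the call site:
- (NSDateFormatter *)sharedDateFormatter {
    // Each expansion of Once() owns its private static token, so this line
    // can be copied/pasted elsewhere without ever sharing that token.
    return Once([[NSDateFormatter alloc] init]);
}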
Thanks Quinn! Will take your advice to heart. Would you say that only atomic operations are reasonably immune from being flagged as TSAN evolves?
Are you on the latest macOS Sonoma 14.2 Beta, by any chance? I'm currently investigating some unexpected/odd problems and they seem related to the -[CIImage imageByClampingToExtent] / clampedToExtent() method too.
Same problem here, filed as FB13323831.
Seems slower than Beta 1, possibly due to some specific regressions. For example: up to 20 seconds between selecting Edit All in Scope and actually being able to edit the symbol. This data point stands out to me because it's an operation I was doing many times a day, at least until Beta 2. 🤷♂️
It would be far from trivial to simply keep Quartz Composer running on modern macOS, mostly because of its foundations in the now-ancient OpenGL 1.x APIs, its support for OpenCL, its ghastly inclusion of JavaScript, and a scene graph centered around the fixed-functionality pipeline. With that in mind, a rewrite in Metal was always the only sensible option, even if it meant abandoning any hope of backward compatibility with existing QTZ files. That is exactly what some companies ended up doing to preserve everything that was truly great about QC, while tailoring it to slightly different goals. Once you experience how wonderful it is to prototype effects through node-based compositing ("no-code" environments, in modern lingo), it's hard to go back. Take a look here:
https://fxfactory.com/fxcore/
For posterity, if you follow some of the previous advice in this thread and use the ADDITIONAL_SDKS build setting to tell Xcode about your local copy of the "FxPlug.sdk" folder, you'll have to include this additional line in your xcconfig file:
FRAMEWORK_SEARCH_PATHS = $(inherited) "/Library/Frameworks"
This is because the FxPlug.sdk folder does not contain the frameworks at its root: FxPlug.framework and PlugInManager.framework are nested under Library/Frameworks inside the SDK folder.
You don't have to use xcconfig files, strictly speaking... both the ADDITIONAL_SDKS and FRAMEWORK_SEARCH_PATHS build settings are available through Xcode’s Build Settings UI.
@fuseaudiolabs the error message "The binary uses an SDK older than the 10.9 SDK." suggests to me that you might be using an older version of the SDK, or at least that the PlugInManager.framework you are copying to your built products comes from an older SDK.
With any recent version of the FxPlug SDK and Xcode, it is no longer necessary to add custom build phases to get plugins recognized and loaded by the host app. You can simply link with the frameworks, and under the General → Frameworks and Libraries section of your project configuration, select the "Embed & Sign" option. Under Build Phases, the part where the SDK frameworks are copied to the Frameworks directory inside your bundle should have the Code Sign on Copy option enabled.
First, I would make sure you have the very latest version of the FxPlug SDK. This message thread pre-dates the reorganization of the SDK, following Apple’s newest conventions. Besides letting the FxPlug SDK install the "FxPlug.sdk" folder where it wants, you could make your own local copy (so it's part of your repository) and add a build configuration setting so that Xcode can locate it when building your target, for example:
ADDITIONAL_SDKS = "$(PROJECT_DIR)/SDK/FxPlug/FxPlug.sdk"
I haven't had the time to file a bug yet, but this might help anyone else who bumps into this...
The problem seems to be triggered by the target being set to be deployed. I'm referring to the Build Settings whose xcconfig equivalents are DEPLOYMENT_LOCATION and INSTALL_PATH, e.g.:
DEPLOYMENT_LOCATION = YES
INSTALL_PATH = $(LOCAL_APPS_DIR)
The specific deployment location might also play a role, though this is speculation "inspired" by macOS security policies.
In our case, the framework whose default Metal library was not being created was set up to be deployed to the local domain's Frameworks directory, i.e.
INSTALL_PATH = $(LOCAL_LIBRARY_DIR)/Frameworks
...and this is enough to trigger the failure.
This problem seems to affect more than just the Metal compiler. I noticed it with the Storyboard compiler in a simple Swift app, too. Since the app was set to be deployed to $(LOCAL_APPS_DIR), the storyboardc invocation failed to copy the compiled storyboard into the app bundle.
Whether this is a bug in recent versions of Xcode, or a side effect of these various tools being unable to touch files at those locations, the solution is to add build phases that "do what the underlying compiler is no longer doing" ;-)