Posts

Post not yet marked as solved
0 Replies
323 Views
Upstream went and added mutex_enter_interruptible(), which maps to mutex_lock_interruptible() on Linux and sx_xlock_sig(lock) on FreeBSD. I was simply going to point it at lck_mtx_lock() and call it a day, ignoring the interruptible bit, but I am curious whether there is a way to achieve something similar on XNU. In this case, the goal is to be able to hit ^C in userland, get a signal, and have lck_mtx_lock() (or a variant) give up and return an error.
Posted
by lundman.
Post not yet marked as solved
3 Replies
1.8k Views
Trying to get some minimum development working again, I've been waiting to be able to run macOS in VMs on M1. Currently both VirtualBuddy and UTM can install macOS, and I can go to Recovery Boot to disable SIP and enable 3rd party extensions.
My M1 runs:
ProductVersion: 13.0
BuildVersion: 22A5331f
I've tested VM macOS versions of Monterey and Ventura. Here is my old kext (known to be working) loaded on M1 (Ventura) bare-metal:
250 0 0xfffffe0006b70000 0x862ac 0x862ac org.openzfsonosx.zfs (2.1.0) BE4DF1D3-FF77-3E58-BC9A-C0B8E175DD97 <21 7 5 4 3 1>
The same pkg, using the same steps in the VM, will after clicking Allow, ask to reboot (suspiciously fast), then come up with:
System Extension Error: An error occurred with your system extensions during startup and they need to be rebuilt before they can be used.
Of course clicking Allow just does the same: reboot, fail, ask to approve again, reboot, fail...
Directly on the hardware, the dialog "rebuilding cache" pops up for a few seconds, but with the VMs I do not see it. I'm unfamiliar with the new system, so I'm not sure which log files to look at, but here is the output from kmutil log, both at Allow and after reboot: https://www.lundman.net/kmutil-log.txt
If I was going to make an uneducated guess and pull out some lines at random, maybe:
2022-08-29 20:01:13.169897+0900 0x251 Error 0x0 100 0 kernelmanagerd: Kcgen roundtrip failed with: Boot policy error: Error creating linked manifest: code BOOTPOLICY_ERROR_ACM
2022-08-29 20:01:13.170200+0900 0x251 Error 0x0 100 0 kernelmanagerd: Kcgen roundtrip failed checkpoint saveAuxkc: status:error fatalError:Optional("Boot policy error: Error creating linked manifest: code BOOTPOLICY_ERROR_ACM")
2022-08-29 20:01:13.170201+0900 0x251 Error 0x0 100 0 kernelmanagerd: Kcgen roundtrip failed: missing last checkpoint or errors found
2022-08-29 20:01:13.170242+0900 0x251 Default 0x0 100 0 kernelmanagerd: Deleting Preboot content
Any workarounds? Loading kexts on my only M1 is a hard way to develop.
Posted
by lundman.
Post not yet marked as solved
1 Replies
554 Views
So, the M2 came with 22D68 on it, and the closest KDK is 22D49. In the Intel days, I would just copy the KDK 22D49 to the system and reboot into it. Can we still achieve this on arm64? Recovery boot, disable authenticated-root, something-something snapshot...? Has anyone figured it out already?
Posted
by lundman.
Post not yet marked as solved
1 Replies
840 Views
So we have produced kexts that run well on Intel and arm64: on (for example) an MBP/M1 (all macOS versions currently available) and on a MacStudio (Monterey). But on MacStudio + Ventura it enters a boot panic loop. One example is:
"build" : "macOS 13.1 (22C5033e)", "product" : "Mac13,2", "socId" : "0x00006002", "kernel" : "Darwin Kernel Version 22.2.0: Sun Oct 16 18:09:52 PDT 2022; root:xnu-8792.60.32.0.1~11\/RELEASE_ARM64_T6000", "incident" : "8D3814E3-DCBB-42A6-AACF-C37F66D6BBC8", "crashReporterKey" : "FF922DC9-99E1-68B9-75FB-9427F2BBF431", "date" : "2022-10-28 00:12:53.22 +0100", "panicString" : "panic(cpu 6 caller 0xfffffe001e4b11e8): \"apciec[pcic2-bridge]::handleInterrupt: Request address is greater than 32 bits linksts=0x99000001 pcielint=0x02220060 linkcdmsts=0x00000000 (ltssm 0x11=L0)\\n\" @AppleT8103PCIeCPort.cpp:1301\n Debugger message: panic\nMemory ID: 0x6\nOS release type: User\nOS version: 22C5033e\nKernel version: Darwin Kernel Version 22.2.0: Sun Oct 16 18:09:52 PDT 2022; root:xnu-8792.60.32.0.1~11\/RELEASE_ARM64_T6000\nFileset Kernelcache UUID: D767CC1C43ABBCC48AD47B6010804F47\nKernel UUID: 99C80004-214F-342C-ADF2-402BC1EAC155\nBoot session UUID: 8D3814E3-DCBB-42A6-AACF-C37F66D6BBC8\niBoot version: iBoot-8419.60.31\nsecure boot?: YES\nroots installed: 0\nPaniclog version: 14\nKernelCache slide: 0x00000000149f4000\nKernelCache base: 0xfffffe001b9f8000\nKernel slide: 0x0000000015c54000\nKernel text base: 0xfffffe001cc58000\nKernel text exec slide: 0x0000000015d40000\nKernel text exec base: 0xfffffe001cd44000\nmach_absolute_time: 0x1506187a\nEpoch Time: sec usec\n Boot : 0x635b102d 0x0009de48\n Sleep : 0x00000000 0x00000000\n Wake : 0x00000000 0x00000000\n Calendar: 0x635b1035 0x000bfb29\n\nZone info:\n Zone map: 0xfffffe10219f0000 - 0xfffffe30219f0000\n . VM : 0xfffffe10219f0000 - 0xfffffe14ee6bc000\n . RO : 0xfffffe14ee6bc000 - 0xfffffe1688054000\n . GEN0 : 0xfffffe1688054000 - 0xfffffe1b54d20000\n . GEN1 : 0xfffffe1b54d20000 - 0xfffffe20219ec000\n . 
GEN2 : 0xfffffe20219ec000 - 0xfffffe24ee6b8000\n . GEN3 : 0xfffffe24ee6b8000 - 0xfffffe29bb384000\n . DATA : 0xfffffe29bb384000 - 0xfffffe30219f0000\n Metadata: 0xfffffe3021a00000 - 0xfffffe3029a00000\n Bitmaps : 0xfffffe3029a00000 - 0xfffffe3049a00000\n\nTPIDRx_ELy = {1: 0xfffffe29bae26818 0: 0x0000000000000006 0ro: 0x0000000000000000 }\nCORE 0 PVH locks held: None\nCORE 1 PVH locks held: None\nCORE 2 PVH locks held: None\nCORE 0: PC=0x00000001af0305ec, LR=0x00000001af0304e8, FP=0x000000016dda6200\nCORE 1: PC=0x00000001aeee1c94, LR=0x00000001bb020624, FP=0x000000016e206140\nCORE 2: PC=0xfffffe001d4a15e0, LR=0xfffffe001d4a14cc, FP=0xfffffe8ff2263d30\nCORE 3: PC=0xfffffe001cd9c9f8, LR=0xfffffe001d6eac6c, FP=0xfffffe80212ab920\nCORE 4: PC=0xfffffe001d416704, LR=0xfffffe001d4a265c, FP=0xfffffe802138fd50\nCORE 5: PC=0xfffffe001cd9c974, LR=0xfffffe001f14aca0, FP=0xfffffe8020a4ba50\nCORE 6 is the one that panicked. Check the full backtrace for details.\nCORE 7: PC=0xfffffe001cdbbad0, LR=0xfffffe001cdbbb88, FP=0xfffffe8ff1f6fd90\nCORE 8: PC=0xfffffe001cddfca4, LR=0xfffffe001cddfca4, FP=0xfffffe8020edff00\nCORE 9: PC=0xfffffe001cddfca4, LR=0xfffffe001cddfca4, FP=0xfffffe8ff2197f00\nCORE 10: PC=0xfffffe001cfd1054, LR=0xfffffe001f9d7d14, FP=0xfffffe8ff1f87610\nCORE 11: PC=0x00000001b2064b88, LR=0x00000001b2064af0, FP=0x000000016f3e5860\nCORE 12: PC=0xfffffe001cddfca4, LR=0xfffffe001cddfca4, FP=0xfffffe8021543f00\nCORE 13: PC=0xfffffe001cddfca4, LR=0xfffffe001cddfca4, FP=0xfffffe8020963f00\nCORE 14: PC=0xfffffe001cddfca4, LR=0xfffffe001cddfca4, FP=0xfffffe8020adff00\nCORE 15: PC=0xfffffe001cddfca4, LR=0xfffffe001cddfca4, FP=0xfffffe8020a03f00\nCORE 16: PC=0xfffffe001cddfca4, LR=0xfffffe001cddfca4, FP=0xfffffe8ff1f4bf00\nCORE 17: PC=0xfffffe001cddfca4, LR=0xfffffe001cddfca4, FP=0xfffffe802146bf00\nCORE 18: PC=0xfffffe001cddfca4, LR=0xfffffe001cddfca4, FP=0xfffffe8ff21c7f00\nCORE 19: PC=0xfffffe001cddfca4, LR=0xfffffe001cddfca4, FP=0xfffffe88090fff00\nCompressor Info: 0% of 
compressed pages limit (OK) and 0% of segments limit (OK) with 0 swapfiles and OK swap space\nPanicked task 0xfffffe168805d678: 0 pages, 1020 threads: pid 0: kernel_task\nPanicked thread: 0xfffffe29bae26818, backtrace: 0xfffffe80213176a0, tid: 285\n\t\t lr: 0xfffffe001cda2adc fp: 0xfffffe8021317710\n\t\t lr: 0xfffffe001cda2884 fp: 0xfffffe8021317790\n\t\t lr: 0xfffffe001cf06a18 fp: 0xfffffe80213177b0\n\t\t lr: 0xfffffe001cef7f88 fp: 0xfffffe8021317820\n\t\t lr: 0xfffffe001cef588c fp: 0xfffffe80213178e0\n\t\t lr: 0xfffffe001cd4b7f8 fp: 0xfffffe80213178f0\n\t\t lr: 0xfffffe001cda220c fp: 0xfffffe8021317ca0\n\t\t lr: 0xfffffe001d5e6b74 fp: 0xfffffe8021317cc0\n\t\t lr: 0xfffffe001e4b11e8 fp: 0xfffffe8021317d80\n\t\t lr: 0xfffffe001e4be774 fp: 0xfffffe8021317e50\n\t\t lr: 0xfffffe001d4ea934 fp: 0xfffffe8021317ea0\n\t\t lr: 0xfffffe001d4e6b54 fp: 0xfffffe8021317ee0\n\t\t lr: 0xfffffe001d4e779c fp: 0xfffffe8021317f20\n\t\t lr: 0xfffffe001cd54e98 fp: 0x0000000000000000\n Kernel Extensions in backtrace:\n com.apple.driver.AppleT6000PCIeC(1.0)[D6E00E5A-7BC8-33A5-9CDC-C4CF13DB1C01]@0xfffffe001e4a2250->0xfffffe001e4c7fa3\n dependency: com.apple.driver.AppleARMPlatform(1.0.2)[F8A12C7A-9C6E-3DCC-A2C3-56A2050A7E73]@0xfffffe001d78def0->0xfffffe001d7dc45f\n dependency: com.apple.driver.AppleEmbeddedPCIE(1)[2BA5358A-87CA-33DF-A148-FFE0C2733367]@0xfffffe001ddc8400->0xfffffe001dde2817\n dependency: com.apple.driver.ApplePIODMA(1)[7CB8682A-FA16-3841-9E30-B6032E5C6925]@0xfffffe001e1d7e60->0xfffffe001e1dc5eb\n dependency: com.apple.driver.IODARTFamily(1)[5C3EEB18-7785-328B-B994-3B0DA5B7D2FE]@0xfffffe001edfd870->0xfffffe001ee1135f\n dependency: com.apple.iokit.IOPCIFamily(2.9)[D6265950-5027-3125-A532-DC325E6A0639]@0xfffffe001f176220->0xfffffe001f1a12a3\n dependency: com.apple.iokit.IOReportFamily(47)[ED601736-6073-3B7D-A228-0FEC344992FA]@0xfffffe001f1a4ee0->0xfffffe001f1a7ecb\n dependency: 
com.apple.iokit.IOThunderboltFamily(9.3.3)[4F558D51-13A7-3E87-AAD6-23E0A7C0A146]@0xfffffe001f2a02c0->0xfffffe001f3dbd4f\n\nlast started kext at 341403491: com.apple.UVCService\t1 (addr 0xfffffe001c13a690, size 1772)\nloaded
Presumably our kext is guilty of something, but it isn't (obviously) involved yet. The machine generally boots without our kext, but not always. The same kext works on Monterey, as well as on Ventura on M1. Are there any particular areas to look at for this kind of panic? Are there hints in the stack? (Can we resolve the panic stack yet?)
Posted
by lundman.
Post not yet marked as solved
0 Replies
943 Views
The arm64 panic logs are new and a bit different. They have a whole bunch of information, which is nice, but sometimes I get something like:
panic(cpu 11 caller 0xfffffe0013d81f1c): Kernel data abort. at pc 0xfffffe001512adb4, lr 0xfffffe001512ad9c Debugger message: panic\n Memory ID: 0x6\n OS release type: User\n OS version: 21G115\n Kernel version: Darwin Kernel Version 21.6.0: Mon Aug 22 20:19:52 PDT 2022; root:xnu-8020.140.49~2\/RELEASE_ARM64_T6000\n Fileset Kernelcache UUID: 39A7E336B0FAA0022B3764E49DFF29D2\n Kernel UUID: 778CC57A-CF0B-3D35-8EE8-5035142D0177\n iBoot version: iBoot-7459.141.1\n secure boot?: YES\n Paniclog version: 13\n KernelCache slide: 0x000000000bc48000\n KernelCache base: 0xfffffe0012c4c000\n Kernel slide: 0x000000000c40c000\n Kernel text base: 0xfffffe0013410000\n Kernel text exec slide: 0x000000000c4f4000\n Kernel text exec base: 0xfffffe00134f8000 ktrace: 0xfffffe180eaaea80, tid: 144477\n\t\t lr: 0xfffffe0013551400 fp: 0xfffffe180eaaeaf0\n\t\t lr: 0xfffffe00135510c8 fp: 0xfffffe180eaaeb60\n\t\t lr: 0xfffffe001369733c fp: 0xfffffe180eaaeb80\n\t\t lr: 0xfffffe00136890cc fp: 0xfffffe180eaaebf0\n\t\t lr: 0xfffffe0013686cb0 fp: 0xfffffe180eaaecb0\n\t\t lr: 0xfffffe00134ff7f8 fp: 0xfffffe180eaaecc0\n\t\t lr: 0xfffffe0013550d4c fp: 0xfffffe180eaaf060\n\t\t lr: 0xfffffe0013550d4c fp: 0xfffffe180eaaf0d0\n\t\t lr: 0xfffffe0013d7954c fp: 0xfffffe180eaaf0f0\n\t\t lr: 0xfffffe0013d81f1c fp: 0xfffffe180eaaf270\n\t\t lr: 0xfffffe0013688ecc fp: 0xfffffe180eaaf2e0\n\t\t lr: 0xfffffe0013686fb4 fp: 0xfffffe180eaaf3a0\n\t\t lr: 0xfffffe00134ff7f8 fp: 0xfffffe180eaaf3b0\n\t\t lr: 0xfffffe001512ad9c fp: 0xfffffe180eaaf740\n\t\t lr: 0xfffffe001515ac20 fp: 0xfffffe180eaaf7a0\n\t\t lr: 0xfffffe001511a03c fp: 0xfffffe180eaaf9a0\n\t\t lr: 0xfffffe001511dc78 fp: 0xfffffe180eaafa10\n\t\t lr: 0xfffffe0015148d14 fp: 0xfffffe180eaafa40\n\t\t lr: 0xfffffe00137b8b24 fp: 0xfffffe180eaafad0\n\t\t lr: 0xfffffe0015145c4c fp: 0xfffffe180eaafce0\n\t\t lr: 0xfffffe00137cc864 fp: 0xfffffe180eaafd20\n\t\t lr: 0xfffffe00137b88c8 fp: 0xfffffe180eaafda0\n\t\t lr: 0xfffffe00137cc7ac fp: 0xfffffe180eaafdb0\n\t\t lr: 0xfffffe0013bbaa28 fp: 0xfffffe180eaafe50\n\t\t lr: 0xfffffe0013686d84 fp: 0xfffffe180eaaff10\n\t\t lr: 0xfffffe00134ff7f8 fp: 0xfffffe180eaaff20\n Kernel Extensions in backtrace:\n com.apple.filesystems.hfs.kext(583.100.10)[45F25204-8A60-3A88-B71F-974BDDBDB3BF]@0xfffffe00151148a0->0xfffffe00151634e3\n dependency: com.apple.filesystems.hfs.encodings.kext(1)[4183166A-286A-3CEB-8C2C-AF85AA1F4D16]@0xfffffe00151634f0->0xfffffe001516441f\n\n last started kext at 3074954554: com.apple.filesystems.smbfs\t4.0 (addr 0xfffffe00133f4c30, size 65195)\n loaded kexts:\n org.openzfsonosx.zfs\t2.1.99\n com.apple.filesystems.smbfs\t
So if you are really lucky, it will list the address of your kext here; in this case, it lists just com.apple.filesystems.hfs.kext. But nearly all the time, you have no way to get the load address for org.openzfsonosx.zfs, which I think means I cannot look up symbols, or do anything useful at all. I think HFS called into ZFS and we returned something cursed. Would it be possible to have the load addresses listed in the large list of loaded kexts?
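To show what can and cannot be recovered from a log like this, here is a rough userland sketch in C that sorts backtrace frames by the ranges printed under "Kernel Extensions in backtrace". The struct, the function names, and the choice to treat the address after "->" as inclusive are all my own assumptions for illustration; nothing here is an Apple API.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One "Kernel Extensions in backtrace" entry (hypothetical helper type). */
struct kext_range {
    const char *name;
    uint64_t start;
    uint64_t end;   /* address printed after "->", assumed inclusive here */
};

/* The two ranges the panic log above happens to list. */
const struct kext_range hfs_panic_ranges[] = {
    { "com.apple.filesystems.hfs.kext",
      0xfffffe00151148a0ULL, 0xfffffe00151634e3ULL },
    { "com.apple.filesystems.hfs.encodings.kext",
      0xfffffe00151634f0ULL, 0xfffffe001516441fULL },
};

/* Return the listed range containing addr, or NULL if none covers it. */
const struct kext_range *
find_kext(const struct kext_range *ranges, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= ranges[i].start && addr <= ranges[i].end)
            return &ranges[i];
    }
    return NULL;
}

/* Offset of addr from the start of its kext range. */
uint64_t
kext_offset(const struct kext_range *r, uint64_t addr)
{
    assert(addr >= r->start && addr <= r->end);
    return addr - r->start;
}

/* Print one frame, with the owning kext when the log lets us determine it. */
void
print_frame(const struct kext_range *ranges, size_t n, uint64_t lr)
{
    const struct kext_range *r = find_kext(ranges, n, lr);
    if (r != NULL)
        printf("0x%016" PRIx64 "  %s + 0x%" PRIx64 "\n",
            lr, r->name, kext_offset(r, lr));
    else
        printf("0x%016" PRIx64 "  (not in any listed kext)\n", lr);
}
```

With the ranges above, the frame at lr 0xfffffe001512ad9c lands in the HFS kext at offset 0x164fc, while a frame such as 0xfffffe0013551400 matches nothing listed. Without a printed load address for org.openzfsonosx.zfs, none of its frames could be placed this way, which is the gap the post is asking about.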
Posted
by lundman.
Post not yet marked as solved
0 Replies
598 Views
Not really a question. As part of porting code from other platforms (FreeBSD and Linux), there is a #define macro used to specify module parameters. It is desirable for these new sysctls to show up automatically when "upstream" adds them (without having to manually maintain a list). This is usually done with "Linker Sets", but they are not available in kexts, mostly due to __mh_execute_header. I took a different approach with:
#define ZFS_MODULE_PARAM(scope_prefix, name_prefix, name, type, perm, desc) \
    SYSCTL_DECL( _kstat_zfs_darwin_tunable_ ## scope_prefix); \
    SYSCTL_##type( _kstat_zfs_darwin_tunable_ ## scope_prefix, OID_AUTO, name, perm, \
        &name_prefix ## name, 0, desc) ; \
    __attribute__((constructor)) void \
    _zcnst_sysctl__kstat_zfs_darwin_tunable_ ## scope_prefix ## _ ## name (void) \
    { \
        sysctl_register_oid(&sysctl__kstat_zfs_darwin_tunable_ ## scope_prefix ## _ ## name ); \
    } \
    __attribute__((destructor)) void \
    _zdest_sysctl__kstat_zfs_darwin_tunable_ ## scope_prefix ## _ ## name (void) \
    { \
        sysctl_unregister_oid(&sysctl__kstat_zfs_darwin_tunable_ ## scope_prefix ## _ ## name ); \
    }
I.e., when the macro is used, I put __attribute__((constructor)) on a function named after the sysctl, which is then called automatically on kext load, and each one of those functions calls sysctl_register_oid(). Likewise for destructor / unregister. So far it works quite well. Any known drawbacks? I've not tested it on M1.
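For anyone who wants to poke at the idea outside a kext, the same pattern can be sketched in plain userland C. This is only an illustration: param_register(), param_lookup(), MODULE_PARAM and the demo variables are hypothetical stand-ins for sysctl_register_oid() and ZFS_MODULE_PARAM, and it assumes a GCC or Clang toolchain that supports __attribute__((constructor)).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PARAMS 32

/* Hypothetical stand-in for a sysctl OID: a name and a pointer to the tunable. */
struct param {
    const char *name;
    int *value;
};

struct param param_table[MAX_PARAMS];
size_t param_count;

/* Stand-in for sysctl_register_oid(): append one entry to the table. */
void
param_register(const char *name, int *value)
{
    assert(param_count < MAX_PARAMS);
    param_table[param_count].name = name;
    param_table[param_count].value = value;
    param_count++;
}

/* Each use of the macro emits a constructor that registers one tunable. */
#define MODULE_PARAM(name)                          \
    __attribute__((constructor)) static void        \
    reg_ ## name(void)                              \
    {                                               \
        param_register(#name, &name);               \
    }

int zfs_demo_a = 7;
MODULE_PARAM(zfs_demo_a)

int zfs_demo_b = 11;
MODULE_PARAM(zfs_demo_b)

/* Find a registered tunable by name, or return NULL if it was never registered. */
int *
param_lookup(const char *name)
{
    for (size_t i = 0; i < param_count; i++) {
        if (strcmp(param_table[i].name, name) == 0)
            return param_table[i].value;
    }
    return NULL;
}
```

One thing the sketch makes visible: it relies on the table being usable before any constructor runs, and the order in which constructors from different source files run is not something the code controls. Here that is harmless because the table is zero-initialized static storage and lookups do not depend on order.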
Posted
by lundman.
Post not yet marked as solved
6 Replies
1.1k Views
Having a peculiar issue trying to support the use of O_EXCL (fail if O_CREAT is set and the file exists). It will fail the first time, then if the call is repeated, it works as expected. It is not entirely clear how macOS should handle O_EXCL. It has been mentioned that vnop_create() should always return EEXIST: does that mean that even in the success case it should return EEXIST instead of 0? That seems odd.
Output of test program is:
# (1) Create the file with (O_WRONLY|O_CREAT).
open okay write okay close okay
86 -rw-r----- 1 501 0 29 Jan 12 17:08 /Volumes/BOOM/teest.out
Deleting /Volumes/BOOM/teest.out
# (2) Try creating with (O_WRONLY|O_CREAT|O_EXCL).
writef: Stale NFS file handle
436207628 87 ---------- 1 501 wheel 0 0 "Jul 9 07:53:53 2037" "Jan 12 17:09:02 2022" "Jan 12 17:09:02 2022" "Jan 1 09:00:00 1970" 1048576 0 0 /Volumes/BOOM/teest.out
So, since the file is deleted in between the tests, O_EXCL shouldn't really kick in here, and yet something goes wrong. The NFS server sends ESTALE to the NFS client. The dtrace stack is:
Stack:
kernel.development`nfsrv_setattr+0x7c6
kernel.development`nfssvc_nfsd+0xbdc
kernel.development`nfssvc+0x106
kernel.development`unix_syscall64+0x2ba
kernel.development`hndl_unix_scall64+0x16
Result:
0 259014 nfsrv_setattr: entry
0 259014 mac_vnode_check_open:entry
0 259015 hook_vnode_check_open:return 2 nfsd
0 259015 mac_vnode_check_open:return 2 nfsd
0 229396 nfsrv_rephead:entry
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
0: 46 00 00 00 F...
So, nfsrv_setattr() replies with 0x46/70 (ESTALE), seemingly because the call to hook_vnode_check_open() returns 2 (ENOENT). Why, though? The file was removed, and I verified the cache has no entry. Then it was created again, and I confirmed it IS in the cache.
<zfs`zfs_vnop_remove (zfs_vnops_osx.c:1700)> zfs_vnop_remove error 0: checking cache: NOTFOUND
<zfs`zfs_vnop_create (zfs_vnops_osx.c:1427)> *** zfs_vnop_create: with 1: EXCL
<zfs`zfs_create (zfs_vnops_os.c:660)> zfs_create: zp is here 0x0
<zfs`zfs_vnop_create (zfs_vnops_osx.c:1458)> ** zfs_vnop_create created id 82
<zfs`zfs_vnop_create (zfs_vnops_osx.c:1475)> zfs_vnop_create error -1: checking cache: FOUND
I am also having trouble finding where the code for hook_vnode_check_open comes from. The failing call in the NFS server is:
if (!error && mac_vnode_check_open(ctx, vp, FREAD | FWRITE)) { error = ESTALE; }
So uh, why? If I let the test run again, this time the file exists, and it returns EEXIST as expected. If I run the first test twice, i.e., without O_EXCL, both work. So it seems to only go wrong with O_EXCL when the file doesn't exist. It is curious that the NFS server figures out that exclusive is set, then clears va_mode:
case NFS_CREATE_EXCLUSIVE: exclusive_flag = 1; if (vp == NULL) { VATTR_SET(vap, va_mode, 0);
But it doesn't use exclusive_flag until after calling VNOP_CREATE(), and it doesn't pass it either.
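The test program itself isn't included in the post. A minimal sketch of the same sequence, assuming plain POSIX open()/write()/close() and with both helper names invented here, might look like this:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open path with O_WRONLY|O_CREAT plus extra_flags, write a few bytes, close.
 * Returns 0 on success, otherwise the errno of the first failing call.
 */
int
create_and_write(const char *path, int extra_flags)
{
    int fd = open(path, O_WRONLY | O_CREAT | extra_flags, 0640);
    if (fd < 0)
        return errno;

    int err = 0;
    ssize_t n = write(fd, "hello\n", 6);
    if (n < 0)
        err = errno;
    else if (n != 6)
        err = EIO;      /* short write, reported as EIO in this sketch */

    if (close(fd) != 0 && err == 0)
        err = errno;
    return err;
}

/*
 * results[0]: create without O_EXCL
 * results[1]: delete, then create with O_EXCL (file is absent)
 * results[2]: create with O_EXCL again (file is now present)
 */
void
excl_sequence(const char *path, int results[3])
{
    assert(path != NULL && results != NULL);
    (void) unlink(path);
    results[0] = create_and_write(path, 0);
    (void) unlink(path);
    results[1] = create_and_write(path, O_EXCL);
    results[2] = create_and_write(path, O_EXCL);
    (void) unlink(path);
}
```

On a filesystem where O_EXCL behaves normally, the expected results are 0, 0 and EEXIST. The failure described above would show up as a non-zero results[1] (ESTALE through NFS) even though the file was removed beforehand.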
Posted
by lundman.
Post not yet marked as solved
0 Replies
613 Views
Ever since 10.15.5 (I think it was) brought in the new proc_lock_ APIs, it has been quite easy to deadlock namei() lookups and mount at the same time.
Stack 1
*1000 unix_syscall64 + 698 (kernel.development + 9558170) [0xffffff8000b1d89a]
*1000 lstat64 + 47 (kernel.development + 4947279) [0xffffff80006b7d4f]
*1000 fstatat_internal + 327 (kernel.development + 4944567) [0xffffff80006b72b7]
*1000 nameiat + 117 (kernel.development + 4919557) [0xffffff80006b1105]
*1000 namei + 3857 (kernel.development + 4813841) [0xffffff8000697411]
*1000 lookup + 1842 (kernel.development + 4817810) [0xffffff8000698392]
*1000 lookup_handle_found_vnode + 677 (kernel.development + 4814677) [0xffffff8000697755]
*1000 vfs_busy + 79 (kernel.development + 4847775) [0xffffff800069f89f]
*1000 IORWLockRead + 738 (kernel.development + 3527154) [0xffffff800055d1f2]
Stack 2
1000 mount + 10 (libsystem_kernel.dylib + 41114) [0x7fff72fc109a]
*1000 hndl_unix_scall64 + 22 (kernel.development + 1622534) [0xffffff800038c206]
*1000 unix_syscall64 + 698 (kernel.development + 9558170) [0xffffff8000b1d89a]
*1000 mount + 78 (kernel.development + 4901838) [0xffffff80006acbce]
*1000 __mac_mount + 1330 (kernel.development + 4903186) [0xffffff80006ad112]
*1000 mount_common + 4860 (kernel.development + 4897964) [0xffffff80006abcac]
*1000 checkdirs + 115 (kernel.development + 4901059) [0xffffff80006ac8c3]
*1000 proc_iterate + 892 (kernel.development + 8110892) [0xffffff80009bc32c]
*1000 checkdirs_callback + 139 (kernel.development + 4901547) [0xffffff80006acaab]
*1000 IORWLockWrite + 1240 (kernel.development + 3528664) [0xffffff800055d7d8]
The mount call will vfs_busy() and then wait for proc_dirs_lock_exclusive() (IORWLockWrite). Whereas stat will grab proc_dirs_lock_shared() in namei(), then, because it needs to cross a mountpoint, it calls lookup_traverse_mountpoints(), which calls vfs_busy(). Classic A-B, B-A deadlock.
I'm having a hard time trying to 1) avoid it, or 2) detect that it will happen, since everything is opaque, and a flag like NOCROSSMNT is not something I can set.
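To spell out the ordering problem, here is a tiny bookkeeping sketch in C. It is purely illustrative: the lock IDs and record_order() are made up, it only catches a direct two-lock inversion, it is not thread-safe, and a kext has no way to hook these XNU locks like this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical IDs for the two locks involved in the stacks above. */
enum demo_lock {
    LOCK_VFS_BUSY = 0,      /* the mount's busy lock taken by vfs_busy() */
    LOCK_PROC_DIRS = 1,     /* the per-process dirs lock */
    DEMO_LOCK_COUNT
};

/* taken_while_holding[a][b] is true once b was acquired while a was held. */
bool taken_while_holding[DEMO_LOCK_COUNT][DEMO_LOCK_COUNT];

/*
 * Record "second acquired while first is held".
 * Returns true if the opposite order was recorded earlier, which is the
 * A-B / B-A pattern that can deadlock when two threads race.
 */
bool
record_order(enum demo_lock first, enum demo_lock second)
{
    assert(first >= 0 && first < DEMO_LOCK_COUNT);
    assert(second >= 0 && second < DEMO_LOCK_COUNT);
    taken_while_holding[first][second] = true;
    return taken_while_holding[second][first];
}
```

Feeding it the mount path first (vfs_busy, then the dirs lock) reports nothing; feeding it the lookup path afterwards (dirs lock, then vfs_busy) returns true, which is exactly the inversion in the two stacks.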
Posted
by lundman.
Post not yet marked as solved
2 Replies
1.4k Views
So what is the current status of symbolication on the M1? When I trigger something like:
panic(cpu 5 caller 0xfffffe0027b72dc8): Break 0xC472 instruction exception from kernel. Ptrauth failure with DA key resulted in 0xbffffe16708b1aa0 at pc 0xfffffe002763c748, lr 0xfffffe00266449d4 (saved state: 0xfffffe30b4fc3470)
OS version: 20E241
Kernel version: Darwin Kernel Version 20.4.0: Thu Apr 22 21:46:41 PDT 2021; root:xnu-7195.101.2~1/RELEASE_ARM64_T8101
Fileset Kernelcache UUID: 0B829878C98BF0B6E3AF7BF571B60BF2
Kernel UUID: 1DC99FEF-0771-3229-974C-9B18710700AE
KernelCache slide: 0x000000001f764000
KernelCache base: 0xfffffe0026768000
Kernel slide: 0x00000000202a4000
Kernel text base: 0xfffffe00272a8000
Kernel text exec base: 0xfffffe0027370000
Panicked task 0xfffffe166ef76730: 251 pages, 1 threads: pid 1007: zfs
Panicked thread: 0xfffffe166acb1980, backtrace: 0xfffffe30b4fc2b80, tid: 10850
lr: 0xfffffe00273be920 fp: 0xfffffe30b4fc2bf0
lr: 0xfffffe00266449d4 fp: 0xfffffe30b4fc3800
lr: 0xfffffe002650ab60 fp: 0xfffffe30b4fc3830
lr: 0xfffffe002650fad4 fp: 0xfffffe30b4fc3900
lr: 0xfffffe002650dc88 fp: 0xfffffe30b4fc39e0
lr: 0xfffffe0026517798 fp: 0xfffffe30b4fc3a10
Kernel Extensions in backtrace:
org.openzfsonosx.zfs(2.0)[EB1A7CDB-C33F-3E0A-A7C2-316765670F52]@0xfffffe002641c000-0xfffffe0026647fff
It would be nice to be able to look those symbols up. But both atos and lldb give "clearly not the correct symbols" for both the kext and the kernel:
atos -o /Library/Extensions/zfs.kext/Contents/MacOS/zfs -arch arm64e -l 0xfffffe002641c000 0xfffffe00266449d4 0xfffffe002650ab60 0xfffffe002650fad4 0xfffffe002650dc88 0xfffffe0026517798 0xfffffe002763f82c
ZSTD_compressBlock_btopt (in zfs) + 140
dsl_dataset_get_holds (in zfs) (dsl_userhold.c:677)
ldi_open_by_name (in zfs) (ldi_osx.c:1906)
hkdf_sha512 (in zfs) (hkdf.c:162)
handle_unmap_iokit (in zfs) (ldi_iokit.cpp:2008)
vmem_init.initial_default_block (in zfs) + 12695596
Almost so random it could be ASLR.
Annoyingly, keepsyms=1 does not work here (or with this type of crash?) and debug=0x144 is ignored (it just boots again).
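As a sanity check on the numbers in that log, the address arithmetic can be written down as a small C sketch. The helper names are invented here, and the code makes no claim about how atos or lldb compute things internally; it is only subtraction and a range test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr lies within [start, end], both ends inclusive. */
bool
addr_within(uint64_t start, uint64_t end, uint64_t addr)
{
    assert(start <= end);
    return addr >= start && addr <= end;
}

/* Remove a slide from a runtime address to get the link-time address. */
uint64_t
remove_slide(uint64_t runtime_addr, uint64_t slide)
{
    assert(runtime_addr >= slide);
    return runtime_addr - slide;
}

/* Offset of a frame from the load address printed for the kext. */
uint64_t
offset_from_load(uint64_t load_addr, uint64_t addr)
{
    assert(addr >= load_addr);
    return addr - load_addr;
}
```

Plugging in the values above: "Kernel text base" minus "Kernel slide" and "KernelCache base" minus "KernelCache slide" both come out to 0xfffffe0007004000, so the printed slides are at least self-consistent. The first kext frame, 0xfffffe00266449d4, sits at offset 0x2289d4 from the load address 0xfffffe002641c000. The last address passed to atos, 0xfffffe002763f82c, falls outside the printed zfs range, so it would need to be looked up against the kernel rather than the kext.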
Posted
by lundman.
Post not yet marked as solved
2 Replies
748 Views
This bug report is from Catalina, but we have confirmed it happens in BigSur as well; it is just tedious to do kext work in BigSur. The following process:
zpool create mypool disk1
chown -R lundman /Volumes/mypool
chown: /Volumes/mypool/.Spotlight-V100/Store-V2: No such file or directory
chown: /Volumes/mypool/.Spotlight-V100/VolumeConfiguration.plist: No such file or directory
chown: /Volumes/mypool/.fseventsd: No such file or directory
Create a new filesystem, mount it, try to chown -R, and get errors. The names of the files that error stay the same for subsequent chown runs, but different files may fail if I re-create the filesystem. Then do:
ssh localhost chown -R lundman /Volumes/mypool
So ssh to the exact same machine, and chown runs fine. It does something differently if I'm on the UI vs. if I'm ssh'ed in (ssh on same UI or remote, ssh fixes it). The errored files stat just fine, and you can chown them just fine (without -R). Even after doing a working chown -R over ssh, the UI chown -R will still fail. Digging as deep as I can with dtrace, I have traced it to:
lookup:return 2 chown
namei:return 2 chown
vn_open_auth:return 2 chown
So it isn't even reaching VNOP_LOOKUP() in my filesystem yet. (But perhaps readdir could be returning something bad?)
So, triggering a panic when it is about to return ENOENT:
dtrace -** 'lookup:return {printf("%d %s", arg1,execname); if (execname =="chown" && arg1 == 2 && val++ == 10) { printf("This one"); panic()}}'
: mach_kernel : trap_from_kernel + 0x26
: mach_kernel : _lookup + 0x208
: mach_kernel : _namei + 0xea6
: mach_kernel : _nameiat + 0x75
: mach_kernel : _fstatat_internal + 0x147
: mach_kernel : _stat64 + 0x2f
frame #13: 0xffffff800489ff88 kernel.development`lookup(ndp=<unavailable>) at vfs_lookup.c:1457:1 [opt]
(lldb) p *ndp
(nameidata) $1 = {
  ni_dirp = 140556031248840
  ni_segflg = UIO_USERSPACE64
  ni_op = OP_SETATTR
  ni_startdir = 0x0000000000000000
  ni_rootdir = 0xffffff801f23d700
  ni_usedvp = 0x0000000000000000
  ni_vp = 0x0000000000000000
  ni_dvp = 0xffffff801f552700
  ni_pathlen = 1
  ni_next = 0xffffff8077d4bc1a <no value available>
  ni_pathbuf = { [0] = '.' [1] = 'f' [2] = 's' [3] = 'e' [4] = 'v' [5] = 'e' [6] = 'n' [7] = 't' [8] = 's' [9] = 'd' [10] = '\0' [255] = '\0' }
  ni_loopcnt = 0
  ni_cnd = {
    cn_nameiop = 0
    cn_flags = 1097792
    cn_context = 0xffffff80262c2120
    cn_ndp = 0xffffff8077d4bbc8
    cn_pnbuf = 0xffffff8077d4bc10 ".fseventsd"
    cn_pnlen = 256
    cn_nameptr = 0xffffff8077d4bc10 ".fseventsd"
    cn_namelen = 10
    cn_hash = 1753311157
    cn_consume = 0
  }
  ni_flag = 0
  ni_ncgeneration = 0
}
(lldb) p *ndp->ni_cnd.cn_context
(vfs_context) $2 = {
  vc_thread = 0xffffff80206b8550
  vc_ucred = 0xffffff80254d1490
(lldb) p *ndp->ni_dvp
  v_name = 0xffffff801f23b500 "Volumes"
(lldb) frame variable
(int) wantparent = 6
(int) docache = 1
Nothing stands out to my green eyes, but it is annoying that I cannot see most variables. It is time to boot kernel.debug instead. But unfortunately, the chown -R failure does not happen when booting kernel.debug! D'oh. I tested re-creating and running chown -R 4 times before it had a panic with xnu_debug/xnu-6153.101.5/osfmk/kern/thread.c:2535 Assertion failed: io_tier < IO_NUM_PRIORITIES, called from _apfs_vnop_strategy(), which is probably unrelated.
I don't think I've come across a problem with my filesystem that changed depending on whether I had ssh'ed in. Using the UI vs. ssh presumably changes the context? But it must be related to my code, since it doesn't happen with hfs.
Posted
by lundman.
Post not yet marked as solved
0 Replies
759 Views
The userland code can pass an fd (file descriptor) into the kernel to do some IO on (file_vnode_withvid() + vn_rdwr()), but the "other platforms" can just access the equivalent of fp->fp_glob->fg_offset to know what offset we should start from. I believe that all those structs are opaque. I don't see a method for accessing the offset of procfd/fp/fp_glob. There are various functions like fill_fileinfo(), but it looks like none of the *info functions are exported. I was wondering if I could end up in vn_read() with FOF_OFFSET in flags, as that seems to set uio_offset to the fg_offset, and issue a zero-length read, but I don't think I can get there from an fd. It has to come from fo_read(), which is not exported. Any other ideas? Obviously, since I pass the fd from userland, I can also pass the offset, and I will probably end up doing that. It would just be a smaller "change" if I could find the offset from the kernel.
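For the fallback of passing the offset along with the fd, the userland side could be as small as the following C sketch. The struct and function name are hypothetical, and how the pair would actually travel into the kext (ioctl argument or similar) is left out; the only real call is lseek() with SEEK_CUR, which reports the current offset without moving it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical payload handed to the kernel alongside the request. */
struct fd_with_offset {
    int fd;
    off_t offset;
};

/*
 * Capture the descriptor's current file offset.
 * Returns 0 on success, or -1 if the descriptor is not seekable
 * (pipes and sockets fail with ESPIPE, for example).
 */
int
capture_offset(int fd, struct fd_with_offset *out)
{
    assert(out != NULL);
    off_t off = lseek(fd, 0, SEEK_CUR);
    if (off == (off_t)-1)
        return -1;
    out->fd = fd;
    out->offset = off;
    return 0;
}
```

The obvious caveat is that the offset is a snapshot: if another thread or process sharing the descriptor moves it between the lseek() and the kernel-side IO, the two will disagree, which reading fg_offset in the kernel would not suffer from.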
Posted
by lundman.
Post not yet marked as solved
2 Replies
1.3k Views
I've been working hard trying to get rid of all the kernel functions that we aren't allowed to call, and now have only a handful left. It loads fine on Intel, but not on arm64e.
2: Could not use 'net.lundman.zfs' because: Failed to bind '_cpu_number' in 'net.lundman.zfs' (at offset 0x3c0 in __DATA_CONST, __got) as could not find a kext which exports this symbol
For arm64e: 6 symbols not found in any library kext:
_vnop_getnamedstream_desc
_vnop_removenamedstream_desc
_kmem_alloc
_vnop_makenamedstream_desc
_kmem_free
_cpu_number
The documentation suggests I should use kmem_alloc(), and it is certainly in the t8101 kernel. I suppose it is in com.apple.kpi.unsupported. Does that mean I'm not allowed to call it, or that I should use some other method to allocate memory? The dependency list is:
<key>OSBundleLibraries</key>
<dict>
<key>com.apple.iokit.IOStorageFamily</key> <string>1.6</string>
<key>com.apple.iokit.IOAVFamily</key> <string>1.0.0</string>
<key>com.apple.kpi.bsd</key> <string>8.0.0</string>
<key>com.apple.kpi.iokit</key> <string>8.0.0</string>
<key>com.apple.kpi.libkern</key> <string>10.0</string>
<key>com.apple.kpi.mach</key> <string>8.0.0</string>
<key>com.apple.kpi.unsupported</key> <string>8.0.0</string>
</dict>
(I think for the namedstream issues, perhaps that has been removed on arm, so we can just go without.) cpu_number() I can probably live without; it is mostly used to spread out used locks semi-randomly. But I gotsa get me some memory!
Lund
Posted
by lundman.
Post not yet marked as solved
0 Replies
519 Views
Having issues calling kauth_cred_getgroups() as a non-root cred_t from BigSur. I get a panic:
0xffffffa843a737b0 : 0x0
0xffffffa843a738e0 : 0xffffff7fa5ab889e net.lundman.zfs : _dsl_load_user_sets + 0xbe
> 126 ret = kauth_cred_getgroups((kauth_cred_t)cr, gids, &count);
I see nothing suspicious with the arguments either:
(lldb) p *cr
(cred_t) $4 = {
  cr_link = {
    le_next = 0xffffff868f0ac370
    le_prev = 0xffffff80056582d0
  }
  cr_ref = 52
  cr_posix = {
    cr_uid = 501
    cr_ruid = 501
    cr_svuid = 501
    cr_ngroups = 16
    cr_groups = {
      [0] = 20
      [1] = 12
      [2] = 61
      [3] = 79
      [4] = 80
      [5] = 81
      [6] = 98
      [7] = 701
      [8] = 33
      [9] = 100
      [10] = 204
      [11] = 250
      [12] = 395
      [13] = 398
      [14] = 399
      [15] = 400
    }
    cr_rgid = 20
    cr_svgid = 20
    cr_gmuid = 501
    cr_flags = 2
  }
  cr_label = 0xffffff868fdb41c0
  cr_audit = {
    as_aia_p = 0xffffff934aef0a18
    as_mask = (am_success = 12288, am_failure = 12288)
  }
}
(lldb) p gids
(gid_t [16]) $1 = {
  [0] = 0
  [1] = 0
  [2] = 0
  [3] = 0
  [4] = 0
  [5] = 0
  [6] = 0
  [7] = 0
  [8] = 0
  [9] = 0
  [10] = 0
  [11] = 0
  [12] = 0
  [13] = 0
  [14] = 0
  [15] = 0
}
(lldb) p count
(int) $2 = 16
It works every time if I am root, but will panic as non-root. The stack having NULL in it is also odd. It runs on Catalina and before.
Posted
by lundman.