Getting inode number from URL

I couldn't find any other way of getting the inode number without using FileManager.attributesOfItem(atPath: url.path)[.systemFileNumber]. I'm already using FileManager.enumerator(at:includingPropertiesForKeys:errorHandler:) for enumerating large directories and using that other FileManager method only for accessing the inode number doubles the scan time. I looked for a URLResourceKey but there doesn't seem to be any. I would be really grateful for any kind of help.

What are you using the inode number for?

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@apple.com"
I'm building an app that makes periodic backups and would like to know what files have changed between two backups (in particular when a file has been renamed so that I can rename it as well on the backup drive instead of removing the old file and copying the new one over again), which as far as I understand is only possible by comparing the inode numbers.
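
To make the goal concrete, here is a minimal sketch of rename detection between two scans. It assumes each scan has already produced a dictionary from inode number to relative path; `Snapshot` and `detectRenames` are hypothetical names, not part of any API. It also assumes inode numbers stay stable between scans, which is not guaranteed on every file system.

```swift
/// Hypothetical snapshot type: inode number -> path relative to the backup root.
typealias Snapshot = [UInt64: String]

/// Returns (from, to) pairs for inodes that appear in both scans under different paths.
func detectRenames(previous: Snapshot, current: Snapshot) -> [(from: String, to: String)] {
    var renames: [(from: String, to: String)] = []
    for (inode, newPath) in current {
        // Same inode, different path: treat it as a rename (or move).
        if let oldPath = previous[inode], oldPath != newPath {
            renames.append((from: oldPath, to: newPath))
        }
    }
    // Sort for a deterministic order; dictionary iteration order is unspecified.
    return renames.sorted { $0.from < $1.from }
}

let previous: Snapshot = [100: "notes.txt", 101: "photos/a.jpg", 102: "old.doc"]
let current: Snapshot = [100: "notes.txt", 101: "photos/b.jpg", 103: "new.doc"]
for rename in detectRenames(previous: previous, current: current) {
    print("rename \(rename.from) -> \(rename.to)")
}
```

Inodes that appear only in `previous` (102 here) would be deletions, and inodes that appear only in `current` (103) would be new files to copy.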
Unfortunately there isn’t a public URL property that matches inode semantics. Normally I suggest using NSURLFileResourceIdentifierKey but that’s explicitly documented to not persist across restart.

You may be able to find a faster way to get the inode with a separate system call (stat being the obvious choice). If that still performs too badly for your requirements, you can always drop down to getdirentriesattr.
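
As a rough illustration of the separate-system-call route, the sketch below reads the inode number with `lstat`. The helper name `inodeNumber(atPath:)` is made up for this example, and whether it is fast enough for a large scan would need measuring.

```swift
import Foundation

/// Returns the inode number of the item at `path`, without following a final symlink.
/// (Hypothetical helper for illustration.)
func inodeNumber(atPath path: String) throws -> UInt64 {
    var info = stat()
    guard lstat(path, &info) == 0 else {
        // Wrap the POSIX error code in an NSError, as the rest of the thread does.
        throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno), userInfo: nil)
    }
    return UInt64(info.st_ino)
}

let inode = try inodeNumber(atPath: NSTemporaryDirectory())
print("inode: \(inode)")
```

This still costs one extra system call per item on top of whatever the enumerator does, so it reduces the overhead rather than removing it.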

Finally, I encourage you to file an enhancement request against NSURL for a public inode property. It’d be nice to have.

Please post your bug number, just for the record.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@apple.com"
Thank you, I already filed FB8308273.
The problem, even when using only stat, is that FileManager will already have made its own call to get the other attributes, so the scan always takes approximately twice as long.
What is the difference between getdirentriesattr and getattrlistbulk? Are there any examples in Swift? I already tried using getattrlistbulk in the past, but could only find examples written in Objective-C which were quite difficult to translate to Swift.
Also, do you know why I'm not getting any email notification when a new answer is posted? I was told in the past that there should be a setting for this, but I cannot find any. I have to keep refreshing this page every now and then to check for new answers.

do you know why I'm not getting any email notification when a new answer is posted?

That is one of the features we lost in the recent transition to the new DevForums platform. Normally I’d recommend that you file a bug against DevForums itself, but in this case the folks responsible are well aware of the need for this.



so it's always approximately double the scan time.

I’m surprised it doubles but, yeah, it’s definitely going to hurt.

What is the difference between getdirentriesattr and getattrlistbulk?

Sorry, my bad. getdirentriesattr was deprecated in favour of getattrlistbulk but my brain is stuck in the old world.

Are there any examples in Swift?

Oi vey! That’s going to be painful. Pasted in below is a Swift port of the example from the getattrlistbulk man page. If I were doing this from scratch in Swift I’d probably use a different approach, but this should be enough to get you going.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@apple.com"
Accepted Answer
Code Block
import Foundation

/// Adapts `getattrlistbulk` to use Swift-style errors.
func getattrlistbulk2(_ dirFD: CInt, _ attrListPtr: UnsafeMutablePointer<attrlist>, _ attrBuf: UnsafeMutableRawBufferPointer, _ options: UInt64) throws -> Int {
    while true {
        let result = getattrlistbulk(dirFD, attrListPtr, attrBuf.baseAddress!, attrBuf.count, options)
        if result >= 0 {
            return Int(result)
        }
        // Capture `errno` once, before anything else can overwrite it.
        let err = errno
        if err != EINTR {
            throw NSError(domain: NSPOSIXErrorDomain, code: Int(err), userInfo: nil)
        }
        // continue on `EINTR`
    }
}

/// A copy of `vtype`.
///
/// This system enum is not available to Swift. In a real project you’d access it via a
/// bridging header.
enum vtype: fsobj_type_t {
    /* 0 */
    case VNON
    /* 1 - 5 */
    case VREG; case VDIR; case VBLK; case VCHR; case VLNK
    /* 6 - 10 */
    case VSOCK; case VFIFO; case VBAD; case VSTR; case VCPLX
}

/// Prints the attributes in the supplied buffer.
func printAttrs(_ itemCount: Int, _ attrBuf: UnsafeMutableRawBufferPointer) {
    var entryStart = attrBuf.baseAddress!
    for _ in 0..<itemCount {
        // Each entry starts with its total length, followed by the returned-attributes set.
        var field = entryStart
        let length = Int(field.load(as: UInt32.self))
        field += MemoryLayout<UInt32>.size
        entryStart += length
        let returned = field.load(as: attribute_set_t.self)
        field += MemoryLayout<attribute_set_t>.size
        var error: UInt32 = 0
        if (returned.commonattr & attrgroup_t(bitPattern: ATTR_CMN_ERROR)) != 0 {
            error = field.load(as: UInt32.self)
            field += MemoryLayout<UInt32>.size
        }
        var name: String = ""
        if (returned.commonattr & attrgroup_t(bitPattern: ATTR_CMN_NAME)) != 0 {
            // The name is stored out of line; the offset is relative to the reference itself.
            let base = field
            let nameInfo = field.load(as: attrreference_t.self)
            field += MemoryLayout<attrreference_t>.size
            name = String(cString: (base + Int(nameInfo.attr_dataoffset)).assumingMemoryBound(to: CChar.self))
        }
        if error != 0 {
            print("Error in reading attributes for directory entry \(error)")
            continue
        }
        var objectType: fsobj_type_t = vtype.VNON.rawValue
        if (returned.commonattr & attrgroup_t(bitPattern: ATTR_CMN_OBJTYPE)) != 0 {
            objectType = field.load(as: fsobj_type_t.self)
            field += MemoryLayout<fsobj_type_t>.size
            switch objectType {
            case vtype.VREG.rawValue:
                print("file \(name)")
            case vtype.VDIR.rawValue:
                print(" dir \(name)")
            default:
                print(" *** \(name)")
            }
        }
    }
}

func demo(_ dirPath: String) throws {
    let dirFD = open(dirPath, O_RDONLY)
    guard dirFD >= 0 else {
        throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno), userInfo: nil)
    }
    defer {
        let junk = close(dirFD)
        assert(junk == 0)
    }
    var attrList = attrlist()
    attrList.bitmapcount = u_short(ATTR_BIT_MAP_COUNT)
    attrList.commonattr =
        attrgroup_t(ATTR_CMN_RETURNED_ATTRS) |
        attrgroup_t(bitPattern: ATTR_CMN_NAME) |
        attrgroup_t(bitPattern: ATTR_CMN_ERROR) |
        attrgroup_t(bitPattern: ATTR_CMN_OBJTYPE)
    let attrBuf = UnsafeMutableRawBufferPointer.allocate(byteCount: 256, alignment: 16)
    defer {
        attrBuf.deallocate()
    }
    // Keep asking for entries until the directory is exhausted.
    while true {
        let itemCount = try getattrlistbulk2(dirFD, &attrList, attrBuf, 0)
        guard itemCount > 0 else {
            return
        }
        printAttrs(itemCount, attrBuf)
    }
}

Thank you so much! Although I don't understand why you use a while loop inside getattrlistbulk2. Shouldn't it be like this:
Code Block
    func getattrlistbulk2(_ dirFD: CInt, _ attrListPtr: UnsafeMutablePointer<attrlist>, _ attrBuf: UnsafeMutableRawBufferPointer, _ options: UInt64) throws -> Int {
        let result = getattrlistbulk(dirFD, attrListPtr, attrBuf.baseAddress!, attrBuf.count, 0)
        if result >= 0 {
            return Int(result)
        }
        let err = errno
        if err != EINTR {
            throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno), userInfo: nil)
        }
        return 0
    }


I kept wondering why your code is slightly slower than enumeration done with FileManager. It turns out that using let attrBuf = UnsafeMutableRawBufferPointer.allocate(byteCount: 1024, alignment: 16) instead of let attrBuf = UnsafeMutableRawBufferPointer.allocate(byteCount: 256, alignment: 16) is about 25% faster when scanning my Documents directory.
In order to get the inode I use
Code Block
var fileId: UInt32 = 0
if (returned.commonattr & attrgroup_t(bitPattern: ATTR_CMN_FILEID)) != 0 {
    fileId = field.load(as: UInt32.self)
    field += MemoryLayout<UInt32>.size
}

which seems to work fine (at least for all files in my Documents directory it's equal to FileManager.attributesOfItem(atPath: url.path)[.systemFileNumber]). Is UInt32 the right type? (The documentation says it should be u_int64_t, but using UInt64 crashes at runtime at the line fileId = field.load(as: UInt64.self) with an error Fatal error: load from misaligned raw pointer.)

I'm also having difficulties reading the modification date. With this code
Code Block
attrList.commonattr =
    attrgroup_t(ATTR_CMN_RETURNED_ATTRS) |
    attrgroup_t(bitPattern: ATTR_CMN_NAME) |
    attrgroup_t(bitPattern: ATTR_CMN_ERROR) |
    attrgroup_t(bitPattern: ATTR_CMN_OBJTYPE) |
    attrgroup_t(bitPattern: ATTR_CMN_MODTIME) |
    attrgroup_t(bitPattern: ATTR_CMN_FILEID)

and this code appended to the end of printAttrs
Code Block
var modtime: timespec
if (returned.commonattr & attrgroup_t(bitPattern: ATTR_CMN_MODTIME)) != 0 {
    modtime = field.load(as: timespec.self)
    field += MemoryLayout<timespec>.size
}
var fileId: UInt32 = 0
if (returned.commonattr & attrgroup_t(bitPattern: ATTR_CMN_FILEID)) != 0 {
    fileId = field.load(as: UInt32.self)
    field += MemoryLayout<UInt32>.size
}

I get again the same error Fatal error: load from misaligned raw pointer at the line modtime = field.load(as: timespec.self). Any idea what the problem could be?
A few things to add here:
  • Going forward, you might want to take a look at "fileContentIdentifierKey". It isn't documented, but it uniquely identifies file "contents" so any two (or more) files on the same volume with identical values for that key have identical file contents (they're clones of each other).

<https://developer.apple.com/documentation/foundation/urlresourcekey/3616043-filecontentidentifierkey>
  • For context, the reason "getattrlistbulk" exists is specifically to speed up retrieval by allowing the file system to retrieve and return more data in a single "pass". So a larger buffer will be faster.

Commenting on Quinn's comment here:

Normally I suggest using NSURLFileResourceIdentifierKey but that’s explicitly documented to not persist across restart.

If you look at the value "NSURLFileResourceIdentifierKey" returns and compare it to stat, it's pretty easy to tell what it's actually returning (ignoring the fact that little endian is "dumb"). The reason this matters is that the same things that would cause it to "not persist across reboots" can also distort systemFileNumber. Notably, you won't get reliably persistent values from all file systems.
-Kevin


Shouldn't it be like this

No. EINTR means that the system call blocked waiting for I/O and, while the thread was blocked, a signal was delivered to the thread that caused it to leave the kernel. The correct response is to retry the system call.

This is one of the many miseries mysteries of POSIX APIs (-:
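
The retry rule can be factored into a small generic wrapper. This is only a sketch: `retryingOnEINTR` is a hypothetical helper name, and the "system call" in the usage part is simulated by a closure that sets `errno` itself so that the behaviour can be observed without real signals.

```swift
import Foundation

/// Calls `body` until it succeeds (returns >= 0) or fails with an error other than EINTR.
func retryingOnEINTR(_ body: () -> Int32) throws -> Int32 {
    while true {
        let result = body()
        if result >= 0 { return result }
        let err = errno  // capture immediately; later calls may overwrite errno
        if err != EINTR {
            throw NSError(domain: NSPOSIXErrorDomain, code: Int(err), userInfo: nil)
        }
        // EINTR: a signal interrupted the call before it completed, so try again.
    }
}

// Simulated system call: interrupted twice, then succeeds.
var attempts = 0
let value = try retryingOnEINTR {
    attempts += 1
    if attempts < 3 {
        errno = EINTR
        return -1
    }
    return 7
}
print("succeeded with \(value) after \(attempts) attempts")
```

Returning 0 on EINTR, as in the proposed variant, would make the caller believe the directory was exhausted and silently stop the scan early.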

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@apple.com"
Thanks Kevin, for apps targeting macOS 11+ fileContentIdentifierKey will be useful, but for my existing apps that still run on older systems I will probably have to use getattrlistbulk. Does that key effectively return the inode, or what kind of identifier is it? It would be helpful to mention it in the documentation.

Thanks also Quinn, I didn't know the meaning of EINTR. If someone could have a look at the issues with ATTR_CMN_MODTIME and ATTR_CMN_FILEID I mentioned in my previous post, that would be great.
Your misaligned raw pointer access problems are coming out of the Swift standard library. When you read from a raw pointer using load(as:), the pointer must be aligned to match the alignment of the target value.

Code Block
import Foundation

print(MemoryLayout<UInt64>.alignment) // prints 8
let d = Data([0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c])
d.withUnsafeBytes { buf in
    let pPlus0 = buf.baseAddress!
    let pPlus4 = pPlus0 + 4
    let vAtPPlus0 = pPlus0.load(as: UInt64.self)
    print("0x\(String(vAtPPlus0, radix: 16))") // prints: 0x807060504030201
    let vAtPPlus4 = pPlus4.load(as: UInt64.self) // traps
    print("0x\(String(vAtPPlus4, radix: 16))")
}


The alignment of UInt64 is 8, so accessing pPlus4 as a UInt64 traps.

So, what alignment does getattrlistbulk guarantee? The getattrlistbulk man page says:

The attributes for any given directory entry are grouped together and packed in exactly the same way as they are returned from getattrlist and are subject to exactly the same alignment specifications and restrictions.

The getattrlist man page says:

Each attribute is aligned to a 4-byte boundary (including 64-bit data types).

So it seems that misaligned values are a fact of life with getattrlist. To resolve this, you’ll have to copy the bytes to an aligned buffer and then load them from that. So, you could allocate a buffer like this:

Code Block
let tmp = UnsafeMutableRawBufferPointer.allocate(byteCount: 256, alignment: 16)


And then replace the load(as:) call with a call to something like this:

Code Block
func readUnaligned<Result>(pointer: UnsafeRawPointer, as: Result.Type) -> Result {
    tmp.copyMemory(from: UnsafeRawBufferPointer(start: pointer, count: MemoryLayout<Result>.size))
    return tmp.baseAddress!.load(as: Result.self)
}
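
For reference, here is a self-contained variant of the same copy-then-load idea that does not rely on a shared `tmp` buffer. It is a sketch, not the only way to do it: the byte array below is a simulated entry (a 4-byte field followed by a 64-bit value, similar in spirit to a `u_int64_t` such as ATTR_CMN_FILEID landing on a 4-byte boundary), not real getattrlistbulk output.

```swift
import Foundation

/// Copies `MemoryLayout<T>.size` bytes from `pointer` into freshly allocated, correctly
/// aligned storage and returns the value. Only valid for trivial (plain-data) types
/// such as integers or `timespec`.
func readUnaligned<T>(_ pointer: UnsafeRawPointer, as type: T.Type) -> T {
    let scratch = UnsafeMutablePointer<T>.allocate(capacity: 1)
    defer { scratch.deallocate() }
    memcpy(scratch, pointer, MemoryLayout<T>.size)
    return scratch.pointee
}

// Simulated entry: 4 bytes of filler, then a little-endian UInt64 at offset 4.
var bytes: [UInt8] = [0xAA, 0xBB, 0xCC, 0xDD]
withUnsafeBytes(of: UInt64(123_456_789_012).littleEndian) { bytes.append(contentsOf: $0) }

let fileID = bytes.withUnsafeBytes { buf in
    UInt64(littleEndian: readUnaligned(buf.baseAddress! + 4, as: UInt64.self))
}
print(fileID)  // prints 123456789012
```

Allocating per call is simple but not free; in a hot loop a reused scratch buffer like `tmp` avoids that cost. With Swift 5.7 or later, the standard library's `loadUnaligned(fromByteOffset:as:)` on raw pointers should make a helper like this unnecessary.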


Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@apple.com"
Thank you, that was it.
I hope I'm now facing the last issue with this solution: thanks to your code we have name = String(cString: (base + Int(nameInfo.attr_dataoffset)).assumingMemoryBound(to: CChar.self)). I need the absolute path so I did let path = "\(dirPath)/\(name)", but this produces some kind of string that is not "fast": using the Instruments app I found out that when using path as a key for a Swift dictionary with some custom object as the value, the scan is more than double the time than if I use let path = String(format: "%@/%@", directoryPath, name) or even let path = URL(fileURLWithPath: "\(dirPath)/\(name)").path. Is there an easy explanation for this? I would have thought that string interpolation should produce the same kind of string as when using String(format:), only more efficiently.