I need to read the first half and the second half of a file concurrently. Are there any best practices for this scenario?
BTW, I did research on DispatchIO but it turned out DispatchIO is all about asynchronous operations on a single file handle; it cannot perform parallel reading.
// naive approach
Task {
    fileHandle.read(into: buffer)
}
Task {
    // seek to file size / 2
    fileHandle.read(into: buffer)
}
The fundamental primitive you’re looking for here is pread(…), which lets you supply a file offset. This avoids the current file position, which is shared mutable state that’s an obvious source of concurrency problems. See the pread man page for details.
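To make the pread idea concrete, here is a minimal sketch of a positioned read in Swift. The helper name readChunk(fd:offset:length:) is hypothetical and not from this thread; the only real API it relies on is the POSIX pread call. It assumes the descriptor was opened elsewhere, and it still blocks the calling thread for the duration of the read.

```swift
import Foundation

/// Hypothetical helper: reads up to `length` bytes starting at `offset`.
/// pread never reads or updates the descriptor's shared file position, so
/// several threads can call this on the same descriptor at the same time.
func readChunk(fd: Int32, offset: off_t, length: Int) throws -> Data {
    var data = Data(count: length)
    var total = 0
    while total < length {
        let n = data.withUnsafeMutableBytes { buf in
            pread(fd, buf.baseAddress! + total, length - total, offset + off_t(total))
        }
        if n < 0 {
            let err = errno
            if err == EINTR { continue }          // interrupted, try again
            throw POSIXError(POSIXErrorCode(rawValue: err) ?? .EIO)
        }
        if n == 0 { break }                       // reached end of file
        total += n
    }
    data.count = total                            // trim if the file ended early
    return data
}
```

Two calls, one with offset 0 and one with offset size / 2, can then run on separate threads against the same descriptor without any seek.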
Dispatch I/O supports this concept via its offset parameter. See the discussion in the read(offset:length:queue:ioHandler:) documentation.
it turned out DispatchIO is all about asynchronous operations on a single file handle; it cannot perform parallel reading.
That’s not true. You just need multiple channels.
Presumably you’re trying to work with a large file. If so, the approach you’re suggesting is a mistake because each of those tasks ends up blocking a Swift concurrency thread in the read(…) system call for the duration of the read. There’s only a limited number of such threads (usually one per core) and you don’t want to block them waiting for ‘slow’ I/O [1].
If you want to do this using Swift concurrency, it’s best to bridge to the Dispatch I/O async interface using a continuation.
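As a rough illustration of that bridging, the sketch below wraps one Dispatch I/O read in a checked continuation and then reads both halves with async let. The names ChunkBuffer, readRange(path:offset:length:) and readBothHalves(path:size:) are hypothetical. It assumes one descriptor and one random-access channel per range, and the whole range is buffered in memory, which may not be what you want for very large files.

```swift
import Foundation

/// Accumulates chunks; only touched from the serial I/O queue below.
final class ChunkBuffer: @unchecked Sendable {
    var data = Data()
}

/// Hypothetical helper: reads `length` bytes at `offset` via Dispatch I/O and
/// resumes the awaiting task when the read completes or fails.
func readRange(path: String, offset: off_t, length: Int) async throws -> Data {
    let queue = DispatchQueue(label: "example.read-range")
    return try await withCheckedThrowingContinuation { continuation in
        let fd = open(path, O_RDONLY)
        guard fd >= 0 else {
            continuation.resume(throwing: POSIXError(POSIXErrorCode(rawValue: errno) ?? .EIO))
            return
        }
        // The channel owns the descriptor until its cleanup handler runs.
        let channel = DispatchIO(type: .random, fileDescriptor: fd, queue: queue) { _ in
            _ = close(fd)
        }
        let buffer = ChunkBuffer()
        channel.read(offset: offset, length: length, queue: queue) { done, data, error in
            // The handler may run several times with partial data.
            if let data = data { buffer.data.append(contentsOf: data) }
            guard done else { return }
            channel.close()
            if error != 0 {
                continuation.resume(throwing: POSIXError(POSIXErrorCode(rawValue: error) ?? .EIO))
            } else {
                continuation.resume(returning: buffer.data)
            }
        }
    }
}

/// Reads the two halves concurrently without blocking a Swift concurrency thread.
func readBothHalves(path: String, size: Int) async throws -> (Data, Data) {
    let half = size / 2
    async let first = readRange(path: path, offset: 0, length: half)
    async let second = readRange(path: path, offset: off_t(half), length: size - half)
    return try await (first, second)
}
```

The continuation is resumed exactly once, on the final call to the I/O handler, so the awaiting task is suspended rather than blocked while the reads are in flight.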
endecotp wrote:
Have you considered memory-mapping it?
Memory mapping is an option, but it’s an option with some sharp edges. For example:
- It’s only safe if the file is on a volume, like the root volume, where a disk error is both unlikely and fatal. If the file is on, say, a network volume or a removable volume, memory mapping is problematic because any disk error translates to a machine exception.
- For large files there’s a risk of running out of address space. This is particularly problematic on iOS.
- Everything goes through the unified buffer cache (UBC), which is less than ideal if you’re streaming through a large file.
Memory mapping works best when the volume is safe, the file is of a reasonable size, and your access pattern is random-ish. If you’re streaming through a large file, Dispatch I/O with noncached reads is a much better choice.
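The post does not say how to set up noncached reads. One common way on Apple platforms, shown here as an assumption rather than as the method meant above, is to set F_NOCACHE on the descriptor with fcntl before handing it to a Dispatch I/O channel. The helper name openNoncached(path:) is hypothetical.

```swift
import Foundation

/// Hypothetical helper: opens `path` read-only and asks the kernel not to
/// keep the data it reads in the unified buffer cache (Apple platforms only).
func openNoncached(path: String) throws -> Int32 {
    let fd = open(path, O_RDONLY)
    guard fd >= 0 else {
        throw POSIXError(POSIXErrorCode(rawValue: errno) ?? .EIO)
    }
    // F_NOCACHE is a Darwin fcntl command; a nonzero argument turns caching off.
    guard fcntl(fd, F_NOCACHE, 1) != -1 else {
        let code = POSIXErrorCode(rawValue: errno) ?? .EIO
        _ = close(fd)
        throw POSIXError(code)
    }
    return fd
}
```

The returned descriptor can then be passed to DispatchIO(type:fileDescriptor:queue:cleanupHandler:) in place of a plain open(…) result.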
Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
[1] The definition of slow is a bit fuzzy here, but I’m presuming that these are large files and thus your reads will block for a while.