It’s important when the buffer size is less than the file size because it keeps all transfers aligned within the file, which facilitates uncached I/O.
Good, that's what I was aiming at: use a block size equal to the file size up to a maximum power of 2. I don't think I was really trying to optimize the buffer allocation time; rather it didn't feel right to allocate an arbitrarily large buffer. After all, space is constrained and there is a reason why we can specify the block size, or it would automatically be set to infinity... right?
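A minimal sketch of that rule, assuming the cap is a power of two chosen by the caller. The helper name `chooseBlockSize` and the 16 MiB default are hypothetical and not part of any API:

```swift
/// Hypothetical helper: use the file size itself for small files,
/// otherwise fall back to the cap (assumed to be a power of two).
func chooseBlockSize(fileSize: Int, maxBlockSize: Int = 1 << 24) -> Int {
    precondition(fileSize >= 0, "file size must not be negative")
    precondition(maxBlockSize > 0 && (maxBlockSize & (maxBlockSize - 1)) == 0,
                 "maxBlockSize is assumed to be a power of two")
    // An empty file still needs a non-zero buffer; 1 byte is an arbitrary floor.
    return min(max(fileSize, 1), maxBlockSize)
}

print(chooseBlockSize(fileSize: 1_000))        // prints 1000
print(chooseBlockSize(fileSize: 100_000_000))  // prints 16777216
```

Whether the cap should be 16 MiB or something else is exactly what the measurements are meant to show.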
it’s time to benchmark and use that to guide your optimisation
Here are my results when copying a file with a given size using varying block sizes and 5 repetitions. (Explicitly writing random data to the file makes the file creation quite slow, but simply writing a buffer with uninitialized data seems to work as well.)
Different file sizes seem to follow a similar trend. Using a file size that is a multiple of 1000 or a multiple of the page size also doesn't seem to make a difference in the overall trend.
The lowest block sizes, 1'024 and 2'048, seem to be special and cause a very fast copy.
From 4'096 upwards, the time decreases...
...up to 65'536, where it suddenly increases again, but from then it definitely decreases.
The bigger the file, the higher the block size needs to be to notice a difference.
With a 100 MB file, increasing the block size from 1'048'576 to 2'097'152 makes the operation about twice as fast, with little improvement above that block size.
With a 1 GB file, increasing the block size from 1'048'576 to 8'388'608 makes the operation about twice as fast, with little improvement above that block size.
Without using F_NOCACHE, the operation gradually gets faster as the block size increases from 1'048'576, and then gets slower again from 8'388'608 upwards. I'm not sure whether that means anything.
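Since these observations rest on only five repetitions per block size, a robust summary such as the median might make the trend easier to read than individual runs. A minimal sketch, assuming the timings are stored as `[[TimeInterval]]` indexed first by block size and then by repetition. The `median` helper is hypothetical and the numbers are made up for illustration, not measured:

```swift
import Foundation

/// Hypothetical helper: median of a non-empty list of timings.
func median(_ values: [TimeInterval]) -> TimeInterval {
    precondition(!values.isEmpty, "need at least one measurement")
    let sorted = values.sorted()
    let mid = sorted.count / 2
    // Odd count: middle element. Even count: mean of the two middle elements.
    return sorted.count % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

// Made-up timings in seconds, for illustration only (not measured results).
let sampleBlockSizes = [1 << 20, 1 << 21]
let sampleTimes: [[TimeInterval]] = [
    [0.5, 0.25, 0.75, 0.5, 1.0],
    [0.25, 0.25, 0.5, 0.125, 0.25],
]

for (blockSize, runs) in zip(sampleBlockSizes, sampleTimes) {
    print(blockSize, median(runs))
}
```

The median ignores a single slow outlier, which an average over five runs would not.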
Here are the graphs for a 100 MB and a 1 GB file.
Copying a 100 MB file:
Copying a 1 GB file:
Copying a 100 MB file created without F_NOCACHE:
And here is the code:
import Cocoa
class AppDelegate: NSObject, NSApplicationDelegate {
func applicationDidFinishLaunching(_ aNotification: Notification) {
print("page size", getpagesize())
let openPanel = NSOpenPanel()
openPanel.canChooseDirectories = true
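// Bail out if the panel is cancelled instead of indexing into an empty `urls` array.
guard openPanel.runModal() == .OK, let url = openPanel.url else { return }
test(url: url)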
}
func test(url: URL) {
let source = url.appendingPathComponent("file source")
let destination = url.appendingPathComponent("file destination")
let fileSizes = [1_000, 1_000_000, 10_000_000, 100_000_000, 1_000_000_000, 10_000_000_000, Int(getpagesize()) * 10_000]
let blockSizes: [Int32] = (10..<31).map({ 1 << $0 })
let repetitions = 5
var times = [[TimeInterval]](repeating: [TimeInterval](repeating: 0, count: repetitions), count: blockSizes.count)
for fileSize in fileSizes {
print("fileSize", fileSize)
for (i, blockSize) in blockSizes.enumerated() {
print("blockSize", blockSize)
for j in 0..<repetitions {
try? FileManager.default.removeItem(at: destination)
// Recreate the source file before every run and time that step separately.
var date = Date()
print("create", terminator: " ")
createFile(source: source, size: fileSize)
print(-date.timeIntervalSinceNow)
// Time only the copy itself.
date = Date()
print("copy", terminator: " ")
do {
try copy(source: source, destination: destination, blockSize: blockSize)
} catch {
preconditionFailure(error.localizedDescription)
}
let time = -date.timeIntervalSinceNow
times[i][j] = time
print(time)
}
// Average the copy timings only, excluding the time spent creating the source file.
let average = times[i].reduce(0, +) / Double(repetitions)
print("average copy", average)
print()
}
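// Each data row starts with the repetition index, so the header needs a leading label to line up.
let header = (["repetition"] + blockSizes.map({ NumberFormatter.localizedString(from: $0 as NSNumber, number: .decimal) })).joined(separator: "\t")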
try! Data(([header] + (0..<repetitions).map { j in
(["\(j)"] + (0..<blockSizes.count).map { i in
return timeToString(times[i][j])
}).joined(separator: "\t")
}).joined(separator: "\n").utf8).write(to: url.appendingPathComponent("results \(fileSize).tsv"))
}
}
func timeToString(_ time: TimeInterval) -> String {
return String(format: "%.6f", time)
}
func createFile(source: URL, size: Int) {
let buffer = UnsafeMutableRawBufferPointer.allocate(byteCount: size, alignment: Int(getpagesize()))
// Free the buffer on exit; otherwise every call leaks `size` bytes.
defer { buffer.deallocate() }
// for i in 0..<size {
// buffer[i] = UInt8.random(in: 0...255)
// }
let fp = fopen(source.path, "w")
let success = fcntl(fileno(fp), F_NOCACHE, 1)
assert(success == 0)
let bytes = fwrite(buffer.baseAddress!, 1, size, fp)
assert(bytes == size)
fclose(fp)
}
func copy(source: URL, destination: URL, blockSize: Int32) throws {
try source.withUnsafeFileSystemRepresentation { sourcePath in
try destination.withUnsafeFileSystemRepresentation { destinationPath in
let state = copyfile_state_alloc()
defer {
copyfile_state_free(state)
}
var blockSize = blockSize
let flags = copyfile_flags_t(COPYFILE_ALL | COPYFILE_NOFOLLOW | COPYFILE_EXCL)
// Set the block size, the status callback and its context, then copy.
// Each call returns 0 on success; on failure errno holds the reason.
guard copyfile_state_set(state, UInt32(COPYFILE_STATE_BSIZE), &blockSize) == 0,
copyfile_state_set(state, UInt32(COPYFILE_STATE_STATUS_CB), unsafeBitCast(copyfileCallback, to: UnsafeRawPointer.self)) == 0,
copyfile_state_set(state, UInt32(COPYFILE_STATE_STATUS_CTX), unsafeBitCast(self, to: UnsafeRawPointer.self)) == 0,
copyfile(sourcePath, destinationPath, state, flags) == 0 else {
throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno))
}
}
}
}
private let copyfileCallback: copyfile_callback_t = { what, stage, state, src, dst, ctx in
if what == COPYFILE_COPY_DATA {
if stage == COPYFILE_ERR {
return COPYFILE_QUIT
}
}
return COPYFILE_CONTINUE
}
}
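The expression in `test(url:)` that builds the results table is fairly dense, so here is a sketch of the same tab-separated layout written step by step. The `makeTSV` helper is hypothetical. Two things differ from the code above and are assumptions of the sketch: block sizes are printed without locale-specific grouping, and the header gets a leading `repetition` cell so that it lines up with the data rows.

```swift
import Foundation

/// Hypothetical helper: one header row of block sizes, then one row per
/// repetition. `times[i][j]` is the time for block size i in repetition j.
func makeTSV(blockSizes: [Int32], times: [[TimeInterval]]) -> String {
    let repetitions = times.first?.count ?? 0
    let header = ["repetition"] + blockSizes.map { String($0) }
    var rows = [header.joined(separator: "\t")]
    for j in 0..<repetitions {
        // First cell is the repetition index, then one timing per block size.
        let cells = ["\(j)"] + times.map { String(format: "%.6f", $0[j]) }
        rows.append(cells.joined(separator: "\t"))
    }
    return rows.joined(separator: "\n")
}

print(makeTSV(blockSizes: [1024, 2048], times: [[0.5, 1.0], [0.25, 2.0]]))
```

The resulting string could then be written to disk with `Data(tsv.utf8).write(to:)`, as the original code does.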