Very low disk performance on certain write patterns.

We are observing an issue where certain file write patterns are inexplicably slow, with writes taking in the range of 100ms instead of the low hundreds of microseconds. We are running macOS 10.14.6 on unencrypted APFS volumes, and we've tested on a Mac mini (2018) and an MBP 15" (2017) with very similar results. Both of these systems are capable of write throughput in the 2~3GB/s range, and we do see that in general. Certain IO patterns, however, slow the system to a crawl, namely around 10MB/s. We first observed this in our own application and were then able to reproduce the scenario using FIO (https://github.com/axboe/fio) to illustrate the problem. We basically generate a largish file, say 16GB, which doesn't fit entirely in the disk cache, and then start writing sequentially to it, using large buffered IOs and sending multiple IOs at a time. Here is the fio job file:


----------------------------
[global]
name=fio-seq-write
rw=write
bs=1024K
group_reporting=1
ioengine=posixaio

[fio-seq-write]
size=16G
iodepth=4
------------------------------
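
For anyone without fio at hand, the same write pattern can be approximated in a few lines of Python. This is only a rough sketch, not what fio does internally: it uses a thread pool of `depth` workers calling `os.pwrite` in place of the posixaio engine, and the names `timed_pwrite` and `sequential_write` are made up for illustration. The default size is tiny so it finishes quickly; the job above used 16G.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1024 * 1024  # matches bs=1024K in the job file


def timed_pwrite(fd, buf, offset):
    """Write one block at the given offset and return the elapsed seconds."""
    start = time.perf_counter()
    written = 0
    while written < len(buf):  # pwrite may write fewer bytes than requested
        written += os.pwrite(fd, buf[written:], offset + written)
    return time.perf_counter() - start


def sequential_write(path, total_bytes, depth=4, block_size=BLOCK_SIZE):
    """Write `path` front to back with up to `depth` buffered writes in flight.

    Assumption: worker threads stand in for fio's posixaio queue depth.
    The size is rounded up to whole blocks.
    Returns (bytes_written, wall_seconds, per_write_latencies).
    """
    buf = b"\xa5" * block_size
    offsets = range(0, total_bytes, block_size)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=depth) as pool:
            latencies = list(pool.map(lambda off: timed_pwrite(fd, buf, off), offsets))
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return len(latencies) * block_size, elapsed, latencies


if __name__ == "__main__":
    size = 64 * BLOCK_SIZE  # small demo size, far below the 16G used above
    with tempfile.TemporaryDirectory() as tmp:
        nbytes, secs, lat = sequential_write(os.path.join(tmp, "seq-write.bin"), size)
    print(f"{nbytes / secs / 2**20:.1f} MiB/s, slowest write {max(lat) * 1e3:.2f} ms")
```

To get anywhere near the scenario described here, the size has to exceed what the file cache can hold, and the per-write latencies are the interesting part: they are where a jump from hundreds of microseconds to around 100ms would show up.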


Here is a typical result:


Run status group 0 (all jobs):

WRITE: bw=10.5MiB/s (11.0MB/s), 10.5MiB/s-10.5MiB/s (11.0MB/s-11.0MB/s), io=6325MiB (6632MB), run=600665-600665msec
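
As a sanity check, the bandwidth figure follows directly from the `io` and `run` values in that summary line. The snippet below just redoes the arithmetic; the only inputs are the numbers fio printed and the 16G size from the job file.

```python
MIB = 2**20
MB = 10**6

io_mib = 6325         # io=6325MiB from the fio summary
runtime_ms = 600665   # run=600665-600665msec
file_mib = 16 * 1024  # size=16G from the job file

seconds = runtime_ms / 1000
bw_mib_s = io_mib / seconds
bw_mb_s = io_mib * MIB / MB / seconds
fraction_written = io_mib / file_mib

print(f"{bw_mib_s:.1f} MiB/s ({bw_mb_s:.1f} MB/s)")       # 10.5 MiB/s (11.0 MB/s)
print(f"{fraction_written:.1%} of the file in {seconds:.0f} s")  # 38.6% of the file in 601 s
```

In other words, after roughly ten minutes the job had written well under half of the 16G file.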


Note that switching to direct IO (direct=1) immediately brings the throughput up to 2GB/s+. Lowering the iodepth to 1 also brings it up to 1~1.5GB/s, which is to be expected with a single IO. Reducing the file size (I guess so that the entire file fits in the cache) also gives a dramatic performance boost.
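
For reference, macOS has no O_DIRECT open flag; the usual way to ask for uncached IO on a descriptor is the `F_NOCACHE` fcntl command, and I am assuming that is what direct=1 ends up using on this platform. A minimal sketch of that, with a hypothetical helper name `open_for_write`, and a fallback so it still runs where `F_NOCACHE` is not defined:

```python
import fcntl
import os

# F_NOCACHE is a macOS fcntl command; Python only exposes the constant on
# platforms that define it, so look it up defensively.
F_NOCACHE = getattr(fcntl, "F_NOCACHE", None)


def open_for_write(path, bypass_cache=False):
    """Open `path` for writing, optionally asking the OS not to cache its data.

    Returns (fd, cache_bypassed). Where F_NOCACHE is unavailable the file is
    opened normally and cache_bypassed is False.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    bypassed = False
    if bypass_cache and F_NOCACHE is not None:
        fcntl.fcntl(fd, F_NOCACHE, 1)  # 1 turns data caching off for this descriptor
        bypassed = True
    return fd, bypassed
```

This only sidesteps the slow path; it is not a substitute for buffered IO in applications that rely on the cache for reads.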


All in all, we are forced to conclude that there must be some performance issue in the disk cache on macOS to explain this behavior. We haven't tested with prior OS versions or HFS+, so we're not sure whether the issue is specific to the OS or the file system.



Thanks in advance,

George

Replies

I doubt that many people have ever heard of FIO. I guess use direct IO then? What's your question? Unless you do something specific to avoid it, your I/O is going to use mmap. That means you are essentially running all your I/O through the virtual memory system. That works great for small files, but is really, really slow for large ones.


This is just a quirky aspect of programming that most people don't like to talk about. Just as most developers simply don't know how to manage memory on their own, most developers also don't know how to do I/O. So the operating system, developer toolchain, etc. performs all kinds of gymnastics to ensure that end users get a system that runs halfway decently anyway.

FIO is a performance tool that everyone in the storage industry is familiar with, which is why I used it to illustrate a problem with the OS itself.


The issue is that the system does not run decently; it exhibits pathologically low performance. The difference between 2GB/s and 10MB/s is 200x. The vfs cache is hopelessly broken in Mojave 10.14.6. I've also verified that the problem isn't there in 10.14.2, so it got broken somewhere in between. It's also worth pointing out that during the entire FIO run the CPU is pegged at 100% in the kernel, which is wrong for a workload that is essentially IO-bound (vfs cache processing notwithstanding).


This bug has a severe impact on any application that deals with large files that won't fit neatly in the buffer cache. And no, you can't just turn off caching, because that has its own implications for performance. The file system cache is there for a reason.


My question is: Can someone from the kernel group at Apple take a look and let us know when we will have a fix?

Maybe it is just a bug in FIO. Their Mac support doesn't look very strong. They don't list a Mac binary, although they do say it runs on "OSX". Apple hasn't used that name since 2015. Perhaps the problem is APFS and/or local snapshots.


I don't see any Mac-related issues on the FIO page. Maybe you should create one to let them know. You are welcome to create feedback with Apple too. Considering the fact that no one has mentioned, or noticed, any kind of problem, I wouldn't go marking the calendar if I were you.

From my original post: "We first observed this in our own application and then were able to reproduce the scenario using FIO".

From your original post: "Both of these systems are capable of reaching write throughput in the 2~3GB range and we do see that in general."


Sounds good!


"Certain IO patterns however, bring the system down to a crawl"


OK. Then don't use those patterns.


"reproduce the scenario using FIO"


OK. Then don't use that either.


Sorry, but I don't know what to tell you. It sounds like you are describing standard behaviour for the past two decades. If the buffered, memory-mapped I/O layer is having trouble sustaining multiple I/O requests, then don't do that. Either don't issue that many I/O requests or manage it yourself at a low level. This is a consumer operating system. It is designed so that new programmers using Swift won't crash their own programs and bring down the whole system. It was designed for mobile devices and then ported to PCs, a little bit every year. It is all implemented on a brand-new filesystem that should be stable in a couple of years now. By all means, file a bug report. If you can find a couple of hundred thousand people who experience the same bug, you might get some traction and get it fixed in a few months. Until then, you are going to have to find a workaround.

I just found out that the issue has been confirmed by Apple and a supplemental update has been issued. Quoting from the link at the bottom:

"In addition, it fixes a bug that can degrade performance when working with very large files and another that could prevent Pages, Keynote, Numbers, iMovie, and GarageBand from updating."

https://support.apple.com/en-us/HT209149

Unfortunately, the fix, while an improvement, is pretty shoddy. As I reported earlier, the performance with 10.14.6 was steady at ~10MB/s. After the supplemental update, the performance now fluctuates wildly between 2.2GB/s and 10MB/s. CPU utilization also fluctuates between 8% and 100%, so all in all it is still subpar performance compared to 10.14.2.