I heard that MetalFX TAA utilizes the NPU.
I wonder whether SpatialScaling also utilizes the NPU.
I highly doubt that MetalFX utilizes the ANE; see https://developer.apple.com/forums/thread/707667 for more information. The reason is that switching contexts between accelerators incurs a lot of overhead, and the latency might be several milliseconds. Even if the Neural Engine has higher throughput, it's harder to access and less programmable. Furthermore, Apple GPUs, starting with the Apple7 generation, have hardware acceleration for matrix multiplication. It's called simdgroup_matrix and is documented in the MSL specification. It increases ALU utilization from 25% to 80%. The fact that MetalFX is limited to Apple7 and Apple8 GPUs, the only GPUs with simdgroup_matrix, further supports this hypothesis.
More explanation on how powerful simdgroup_matrix is: the M1 Max has a GPU with 10 TFLOPS F32. Double that equals 20 TFLOPS F16, and 80% of that is 16 TFLOPS F16. This is more processing power than the A14/M1's ANE, which is 11 TFLOPS F16. This could explain why Apple currently limits MetalFX to high-end Macs, where the GPU is more powerful than the ANE. On an A14/A15, it might be more power-efficient to use an image upscaling CoreML model on the ANE.
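
To make the arithmetic above easy to check, here is a minimal sketch in Python. It uses only the numbers quoted in this post, which are rough estimates rather than measurements. The constant and function names are made up for illustration, and the 2x F16 rate is this post's assumption, not a documented figure.

```python
# Figures quoted in the post; treat them as rough estimates.
M1_MAX_GPU_F32_TFLOPS = 10.0          # M1 Max GPU, F32
ANE_F16_TFLOPS = 11.0                 # A14/M1 Neural Engine, F16
F16_SPEEDUP = 2.0                     # assumption from the post: F16 runs at twice the F32 rate
SIMDGROUP_MATRIX_UTILIZATION = 0.80   # ALU utilization with simdgroup_matrix, per the post


def effective_f16_tflops(f32_tflops,
                         f16_speedup=F16_SPEEDUP,
                         utilization=SIMDGROUP_MATRIX_UTILIZATION):
    """Scale the peak F16 rate (F32 rate times speedup) by ALU utilization."""
    return f32_tflops * f16_speedup * utilization


gpu_effective = effective_f16_tflops(M1_MAX_GPU_F32_TFLOPS)
gpu_to_ane_ratio = gpu_effective / ANE_F16_TFLOPS

print(f"M1 Max GPU, effective F16: {gpu_effective:.1f} TFLOPS")
print(f"ANE, F16: {ANE_F16_TFLOPS:.1f} TFLOPS")
print(f"GPU / ANE ratio: {gpu_to_ane_ratio:.2f}")
```

With these inputs the GPU comes out at roughly 1.45 times the ANE figure. Passing a smaller F32 number for an A14/A15-class GPU to effective_f16_tflops would show where the comparison flips in favor of the ANE.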