I think you have a slightly wrong idea of what the Neural Engine is. It has nothing in common with a CPU or GPU: it's just a machine learning accelerator with a very limited area of application, and not even every layer type in your model can run on it. Look at this FAQ for a better understanding of what it is and what it can do.
Q: Is there any low-level API to create my very own work-loads?
A: Yes and no. The low-level AppleNeuralEngine.framework is private to Apple and you can't use it.
But:
take a look at ANE Tools, a compiler and decompiler for the Neural Engine. There is also coremltools, which helps you bring models over from TensorFlow and PyTorch.
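To give an idea of the coremltools route, here is a minimal sketch in Python. It assumes you have torch and coremltools installed and a PyTorch model that can be traced; the function name convert_for_ane is made up for this example. Note that you can only allow Core ML to use the Neural Engine, you can't force it: Core ML decides per layer where things actually run.

```python
def convert_for_ane(model, example_input):
    """Convert a traceable PyTorch model to Core ML, allowing all compute units.

    convert_for_ane is a hypothetical helper name, not part of any library.
    Imports are done here so the sketch can be read without the packages installed.
    """
    import torch
    import coremltools as ct

    model.eval()
    # coremltools expects a TorchScript model, so trace it with a sample input.
    traced = torch.jit.trace(model, example_input)

    # ComputeUnit.ALL lets Core ML choose between CPU, GPU and Neural Engine.
    # It is a permission, not a guarantee that the ANE will be used.
    return ct.convert(
        traced,
        inputs=[ct.TensorType(shape=example_input.shape)],
        convert_to="mlprogram",
        compute_units=ct.ComputeUnit.ALL,
    )
```

You would call it with something like convert_for_ane(my_model, torch.rand(1, 3, 224, 224)) and then save the result with .save("MyModel.mlpackage"); the model and the input shape here are placeholders.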
Q: Can I use the Neural Engine to offload the CPU? I am especially interested in parallelism using threads.
A: Basically, no. The ANE can't execute CPU/GPU code and doesn't have threads. It operates on a layer connectivity map and network weights.
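To make the difference concrete, here is a toy sketch in plain Python of what "a connectivity map" looks like compared to code. Everything in it is an assumption for illustration: the layer names, the ANE_SUPPORTED set and the assign_devices function are invented, and the real list of supported layers is not documented by Apple.

```python
# Hypothetical set of layer types the accelerator can run (made up for this example).
ANE_SUPPORTED = {"conv", "relu", "linear"}

# A network as data: each entry names a layer, its type and where its input comes from.
# There are no instructions or threads here, only a graph description.
model = [
    {"name": "conv1", "type": "conv", "inputs": ["image"]},
    {"name": "act1", "type": "relu", "inputs": ["conv1"]},
    {"name": "custom1", "type": "custom_op", "inputs": ["act1"]},
    {"name": "fc1", "type": "linear", "inputs": ["custom1"]},
]


def assign_devices(layers, supported=ANE_SUPPORTED):
    """Map each layer name to "ANE" if its type is supported, otherwise "CPU"."""
    return {
        layer["name"]: ("ANE" if layer["type"] in supported else "CPU")
        for layer in layers
    }


placement = assign_devices(model)
print(placement)
```

The unsupported custom_op layer falls back to the CPU while the rest stays on the accelerator, which is roughly why arbitrary CPU workloads have nothing to map onto.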
One more thing: vDSP and vecLib DON'T use the ANE.