Would Accelerate improve this bit permutation function?

I’m working on a shuffling program that treats an array of bytes as a bit array with four rows and randomly permutes each four-bit column. Here’s its current setup, in Swift 3:


```swift
import Dispatch

// permutation shift amounts for four items: row r's bit goes to slot (r + shift) % 4
let perms = [
    (0, 0, 0, 0), (0, 0, 1, 3), (0, 1, 3, 0), (0, 2, 3, 3), (0, 1, 1, 2), (0, 2, 0, 2),
    (1, 3, 0, 0), (1, 3, 1, 3), (2, 3, 3, 0), (3, 3, 3, 3), (2, 3, 1, 2), (3, 3, 0, 2),
    (1, 1, 2, 0), (1, 2, 2, 3), (2, 0, 2, 0), (3, 0, 2, 3), (2, 2, 2, 2), (3, 1, 2, 2),
    (1, 1, 1, 1), (1, 2, 0, 1), (2, 0, 1, 1), (3, 0, 0, 1), (2, 2, 3, 1), (3, 1, 3, 1)
]

// bit at 'index' (bit 0 is the high bit of byte 0); the buffer is passed by value because it
// only points at the bytes, and an 'inout' access from two threads at once would conflict
func getByteBits(_ bytearray: UnsafeMutableBufferPointer<UInt8>, _ index: Int) -> UInt8 {
    let (byte, bit) = (index >> 3, UInt8(7 - (index & 7)))
    return (bytearray[byte] >> bit) & 1
}

// mask selecting the bit at 'index' within its byte
func getMask(_ index: Int) -> UInt8 {
    let bit = UInt8(7 - (index & 7))
    return 1 << bit
}

// 'bytearray' is an array of UInt8s; 'randarray' is an array of random UInt32s with at least
// 2 * bytearray.count elements (one per column). bytearray.count should be a multiple of 8,
// otherwise the two threads can end up writing to the same byte.
func matrixQuadShuffle(_ bytearray: inout [UInt8], randarray: inout [UInt32]) {

    let bitLen = bytearray.count * 8
    let bit8Len = bytearray.count

    // done in parallel: each of the two iterations handles half of the columns
    bytearray.withUnsafeMutableBufferPointer { bBuff in
        DispatchQueue.concurrentPerform(iterations: 2) { k in

            // array for holding shifted bits within each iteration
            var shiftArray = [UInt8](repeating: 0, count: 4)

            // starting bit index of this iteration's half in each of the four rows
            let piece = k * bit8Len
            var bitIndexes = [0 + piece, (bitLen/4) + piece, (bitLen/2) + piece, (3 * bitLen/4) + piece]

            for i in 0..<bit8Len {

                // get value 0...23 from random array (UInt32.max / 178956971 == 23);
                // yes, there's a little bias here
                let randValue = Int(randarray[i + piece]/178956971)

                // look up the shift amounts for this column
                let shiftValues = perms[randValue]

                // from each index in bytearray, get bit value
                // put bit values in 4-array, shifted by shift amount
                shiftArray[(0 + shiftValues.0) % 4] = getByteBits(bBuff, bitIndexes[0])
                shiftArray[(1 + shiftValues.1) % 4] = getByteBits(bBuff, bitIndexes[1])
                shiftArray[(2 + shiftValues.2) % 4] = getByteBits(bBuff, bitIndexes[2])
                shiftArray[(3 + shiftValues.3) % 4] = getByteBits(bBuff, bitIndexes[3])

                for j in 0..<4 {
                    // clear bit values at indices
                    bBuff[bitIndexes[j]/8] &= ~(getMask(bitIndexes[j]))
                    // put in new bit values (using OR)
                    bBuff[bitIndexes[j]/8] |= (shiftArray[j] << (UInt8(7 - (bitIndexes[j] & 7))))
                    // advance indices by one
                    bitIndexes[j] += 1
                }
            }
        }
    }
}
```

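To make the layout concrete: with `n` bytes, each row is `bitLen / 4 == 2 * n` bits long, so column `c` consists of bits `c`, `c + 2n`, `c + 4n` and `c + 6n`, and each `perms` entry sends row `r`'s bit to slot `(r + shift) % 4`. The sketch below (the helper names `columnBitIndexes`, `destinations` and `key` are hypothetical, not part of the code above) checks that the 24 entries are 24 distinct permutations and quantifies the bias mentioned in the comment: `178956971 * 24 == 4294967304` exceeds `UInt32.max`, so the quotient tops out at 23, and index 23 receives 8 fewer of the 2^32 possible inputs than each other index.

```swift
// Same table as in matrixQuadShuffle.
let perms = [
    (0, 0, 0, 0), (0, 0, 1, 3), (0, 1, 3, 0), (0, 2, 3, 3), (0, 1, 1, 2), (0, 2, 0, 2),
    (1, 3, 0, 0), (1, 3, 1, 3), (2, 3, 3, 0), (3, 3, 3, 3), (2, 3, 1, 2), (3, 3, 0, 2),
    (1, 1, 2, 0), (1, 2, 2, 3), (2, 0, 2, 0), (3, 0, 2, 3), (2, 2, 2, 2), (3, 1, 2, 2),
    (1, 1, 1, 1), (1, 2, 0, 1), (2, 0, 1, 1), (3, 0, 0, 1), (2, 2, 3, 1), (3, 1, 3, 1)
]

// Hypothetical helper: the four bit indexes forming column c of an n-byte array.
// Each row is bitLen / 4 == 2 * n bits long; bit 0 is the high bit of byte 0.
func columnBitIndexes(byteCount n: Int, column c: Int) -> [Int] {
    let rowLen = 2 * n
    return [c, c + rowLen, c + 2 * rowLen, c + 3 * rowLen]
}

// Hypothetical helper: the shiftArray slot that each row's bit is written to.
func destinations(_ s: (Int, Int, Int, Int)) -> [Int] {
    return [(0 + s.0) % 4, (1 + s.1) % 4, (2 + s.2) % 4, (3 + s.3) % 4]
}

// Hypothetical helper: pack a 4-element destination list into one Int for use in a Set.
func key(_ d: [Int]) -> Int {
    return d.reduce(0) { $0 * 4 + $1 }
}

let divisor: UInt32 = 178_956_971
let dests = perms.map(destinations)

// Each entry must hit slots 0...3 exactly once, and no two entries may coincide.
let everyEntryIsPermutation = !dests.contains(where: { Set($0).count != 4 })
let distinctPermutations = Set(dests.map(key)).count

// Largest table index the division can produce, and how many of the 2^32
// possible UInt32 values land on each of indexes 0...22 versus index 23.
let largestIndex = UInt32.max / divisor
let countPerIndex = UInt64(divisor)
let countForLastIndex = UInt64(UInt32.max) + 1 - 23 * countPerIndex

print(columnBitIndexes(byteCount: 16, column: 5))                  // [5, 37, 69, 101]
print(everyEntryIsPermutation, distinctPermutations, largestIndex) // true 24 23
print(countPerIndex, countForLastIndex)                            // 178956971 178956963
```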

It works quite well; I can even get a little extra speed by making ‘bitIndexes’ a tuple and unrolling the inner `for j in 0..<4` loop. I wonder, though, whether I could use the Accelerate framework to improve the function’s performance. I briefly looked at it in a previous question but didn't end up using it; I'm still not very familiar with the framework, and I’m not sure how things might change in Swift 4. Would Accelerate and its various functions be of any use here, or should I just leave well enough alone?
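A rough, single-threaded sketch of the tuple-and-unrolled variant mentioned above follows. The name `quadShuffleUnrolled` and its signature are hypothetical; it assumes the same MSB-first bit numbering, the same `perms` table and a `randarray` with at least `2 * bytes.count` entries. Since `perms[18]` is `(1, 1, 1, 1)`, a cyclic shift of each column, applying it four times should give back the original bytes, which makes a convenient sanity check.

```swift
// Same table as in matrixQuadShuffle.
let perms = [
    (0, 0, 0, 0), (0, 0, 1, 3), (0, 1, 3, 0), (0, 2, 3, 3), (0, 1, 1, 2), (0, 2, 0, 2),
    (1, 3, 0, 0), (1, 3, 1, 3), (2, 3, 3, 0), (3, 3, 3, 3), (2, 3, 1, 2), (3, 3, 0, 2),
    (1, 1, 2, 0), (1, 2, 2, 3), (2, 0, 2, 0), (3, 0, 2, 3), (2, 2, 2, 2), (3, 1, 2, 2),
    (1, 1, 1, 1), (1, 2, 0, 1), (2, 0, 1, 1), (3, 0, 0, 1), (2, 2, 3, 1), (3, 1, 3, 1)
]

// Hypothetical single-threaded variant: bit indexes kept in a tuple, write-back unrolled.
// Assumes bit 0 is the high bit of byte 0 and randarray.count >= 2 * bytes.count.
func quadShuffleUnrolled(_ bytes: inout [UInt8], _ randarray: [UInt32]) {
    let rowLen = bytes.count * 2   // bitLen / 4, which is also the number of columns
    bytes.withUnsafeMutableBufferPointer { buffer in
        let p = buffer   // plain copy of the pointer for the nested helpers

        // Read the bit at bit index i.
        func bit(_ i: Int) -> UInt8 {
            return (p[i >> 3] >> UInt8(7 - (i & 7))) & 1
        }
        // Overwrite the bit at bit index i with v (0 or 1).
        func setBit(_ i: Int, _ v: UInt8) {
            let shift = UInt8(7 - (i & 7))
            let mask: UInt8 = 1 << shift
            p[i >> 3] = (p[i >> 3] & ~mask) | (v << shift)
        }

        var slots = [UInt8](repeating: 0, count: 4)
        var idx = (0, rowLen, 2 * rowLen, 3 * rowLen)
        for col in 0..<rowLen {
            let s = perms[Int(randarray[col] / 178_956_971)]
            // Gather: row r's bit goes to slot (r + shift) % 4.
            slots[(0 + s.0) % 4] = bit(idx.0)
            slots[(1 + s.1) % 4] = bit(idx.1)
            slots[(2 + s.2) % 4] = bit(idx.2)
            slots[(3 + s.3) % 4] = bit(idx.3)
            // Scatter: write slot j back to row j, then advance every row by one column.
            setBit(idx.0, slots[0])
            setBit(idx.1, slots[1])
            setBit(idx.2, slots[2])
            setBit(idx.3, slots[3])
            idx = (idx.0 + 1, idx.1 + 1, idx.2 + 1, idx.3 + 1)
        }
    }
}

// Usage: four cyclic shifts of every column restore the input.
let original: [UInt8] = [0xDE, 0xAD, 0xBE, 0xEF]
var shuffled = original
let cycle = [UInt32](repeating: 18 * 178_956_971, count: 2 * original.count)
for _ in 0..<4 { quadShuffleUnrolled(&shuffled, cycle) }
print(shuffled == original)   // true
```

Keeping `slots` as a small array allocated once outside the loop mirrors the original `shiftArray`; a tuple would need a `switch` to be indexed by the computed slot number.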