1 Reply
      Latest reply on Dec 7, 2019 8:14 AM by chris@interealtime
      maccan Level 1 (0 points)

        My MacBook Pro 2018 has two GPUs:

        AMD Radeon Pro 555X and Intel(R) UHD Graphics 630.

        I assumed the AMD 555X would outperform the Intel(R) UHD Graphics 630.

         

        However, I observed a huge performance difference for Metal Performance Shaders (MPS) between the two GPUs.

        The Intel GPU runs the simple test code (an MPSMatrixMultiplication) about 3.4 times faster than the AMD 555X.

         

        You can compile the attached code in Terminal with 'swiftc -O matrixMul.swift'

        and run it by executing ./matrixMul

         

        In the test code, I can select execution on the AMD 555X with the statement

        let device = devices[0] // AMD Radeon Pro 555X

         

        and I get the following:

         

        start calculation on GPU-device <BronzeMtlDevice: 0x1071bf000>

            name = AMD Radeon Pro 555X

        ...

        GPU execution time = 12.612 seconds

         

        The Intel(R) UHD Graphics 630 is selected by

        let device = devices[1] // Intel(R) UHD Graphics 630

         

        and I get

         

        start calculation on GPU-device <MTLIGAccelDevice: 0x10f9c5000>

            name = Intel(R) UHD Graphics 630

        ...

        GPU execution time = 3.735 seconds

         

        As you can see, the Intel UHD 630 performed the MPSMatrixMultiplication about 3.4 times faster than the AMD 555X.

        I expected the AMD 555X to be more powerful than the Intel UHD 630, but this test shows the opposite.

        Any ideas why?
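To put the two timings in perspective, here is a minimal sketch of the arithmetic behind the comparison. It uses only the numbers reported above (12.612 s, 3.735 s, n = 8192). The operation count of 2 * n^3 for a dense n x n product is an assumption of this sketch (the usual textbook estimate), not something MPS reports, so the GFLOP/s figures are rough effective rates rather than measured hardware throughput.

```swift
import Foundation

// Wall-clock times reported by the two runs above (seconds).
let amdSeconds = 12.612
let intelSeconds = 3.735

// Matrix dimension used in the test code.
let dimension = 8192.0

// Assumption: a dense n x n matrix product costs about 2 * n^3
// floating-point operations (one multiply and one add per inner step).
let totalFlops = 2.0 * dimension * dimension * dimension

// How many times faster the Intel run finished.
let speedup = amdSeconds / intelSeconds

// Effective throughput of each run in GFLOP/s.
let amdGflops = totalFlops / amdSeconds / 1e9
let intelGflops = totalFlops / intelSeconds / 1e9

print(String(format: "speedup (Intel vs. AMD): %.2fx", speedup))
print(String(format: "AMD 555X effective rate: %.1f GFLOP/s", amdGflops))
print(String(format: "Intel UHD 630 effective rate: %.1f GFLOP/s", intelGflops))
```

Both rates are computed from the time between commit() and waitUntilCompleted(), so they include any scheduling and data-access overhead, not only the multiplication itself.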

         

        -------------------- test code

         

        import Foundation // drand48, DispatchTime, String(format:), exit

        import Metal

        import MetalPerformanceShaders


        let devices = MTLCopyAllDevices()

        print("available GPUs")

        for d in devices {

          print(d)

        }

        let device = devices[0] // AMD Radeon Pro 555X

        //let device = devices[1] // Intel(R) UHD Graphics 630

        let commandQueue = device.makeCommandQueue()!;

        let commandBuffer = commandQueue.makeCommandBuffer()!;


        let n = 8192 // matrix dimension (n x n)

        let rowsA = n

        let columnsA = n

        let rowsB = n

        let columnsB = n

        let rowsC = n

        let columnsC = n


        // matrix A data

        var arrayA = [Float](repeating: 1, count: rowsA * columnsA)

        for i in 0..<arrayA.count {

          arrayA[i] = Float(2 * drand48() - 1)

        }


        // matrix B data

        var arrayB = [Float](repeating: 2, count: rowsB * columnsB)

        for i in 0..<arrayB.count {

          arrayB[i] = Float(2 * drand48() - 1)

        }


        // MTL data buffers for Matrices A,B,C

        let bufferA = device.makeBuffer(bytes: arrayA,

          length: rowsA * columnsA * MemoryLayout<Float>.stride,

          options: [])!;


        let bufferB = device.makeBuffer(bytes: arrayB,

          length: rowsB * columnsB * MemoryLayout<Float>.stride,

          options: [])!;


        let bufferC = device.makeBuffer(length: rowsC * columnsC * MemoryLayout<Float>.stride,

          options: [])!;


        // Matrix descriptions

        let descA = MPSMatrixDescriptor(dimensions: rowsA, columns: columnsA,

          rowBytes: columnsA * MemoryLayout<Float>.stride,

          dataType: .float32);


        let descB = MPSMatrixDescriptor(dimensions: rowsB, columns: columnsB,

          rowBytes: columnsB * MemoryLayout<Float>.stride,

          dataType: .float32);


        let descC = MPSMatrixDescriptor(dimensions: rowsC, columns: columnsC,

          rowBytes: columnsC * MemoryLayout<Float>.stride,

          dataType: .float32);


        // MPS matrix objects wrapping the MTL buffers

        let matrixA = MPSMatrix(buffer: bufferA, descriptor: descA);

        let matrixB = MPSMatrix(buffer: bufferB, descriptor: descB);

        let matrixC = MPSMatrix(buffer: bufferC, descriptor: descC);


        let matrixMultiplication = MPSMatrixMultiplication(device: device,

          transposeLeft: false, transposeRight: false,

          resultRows: rowsC, resultColumns: columnsC,

          interiorColumns: columnsA, alpha: 1, beta: 0);


        matrixMultiplication.encode(commandBuffer: commandBuffer, leftMatrix: matrixA,

          rightMatrix: matrixB, resultMatrix: matrixC);


        print("start calculation on GPU-device \(device)")

        let start = DispatchTime.now().uptimeNanoseconds;

        commandBuffer.commit()

        commandBuffer.waitUntilCompleted()

        let end = DispatchTime.now().uptimeNanoseconds

        let execTime = String(format: "%.3f", 1e-9 * Double(end - start))


        // we look at the result

        let rawPointer = matrixC.data.contents();

        let count = matrixC.rows * matrixC.columns;

        let typedPointer = rawPointer.bindMemory(to: Float.self, capacity: count);

        let bufferedPointer = UnsafeBufferPointer(start: typedPointer, count: count);


        // Print the first 5 results, to make sure it's not all 0s or NaNs.

        print("\nFirst 5 elements:")

        for i in 0..<5 {

          print("element \(i) =", bufferedPointer[i]);

        }

        print("...")

        print("last element =", bufferedPointer[n * n - 1]);

        print("...")

        print("GPU execution time = \(execTime) seconds")

        exit(0)

        ------------------ end test-code
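The test code selects the GPU by its position in the array returned by MTLCopyAllDevices(), which relies on the enumeration order being the same on every machine. As a hedged alternative, the sketch below picks a device by its isLowPower property, which on dual-GPU Macs is expected to be true for the integrated GPU. The helper name pickDevice is hypothetical and not part of the original test code. On systems with an external GPU, more than one device may report isLowPower == false, so this is only a starting point.

```swift
import Metal

// Hypothetical helper (not in the original test code): return the first
// device whose power class matches, instead of relying on array order.
func pickDevice(lowPower: Bool, from devices: [MTLDevice]) -> MTLDevice? {
    return devices.first { $0.isLowPower == lowPower }
}

let allDevices = MTLCopyAllDevices()

// Assumption: on a dual-GPU MacBook Pro the discrete GPU reports
// isLowPower == false and the integrated GPU reports isLowPower == true.
if let discrete = pickDevice(lowPower: false, from: allDevices) {
    print("discrete GPU:", discrete.name)
}
if let integrated = pickDevice(lowPower: true, from: allDevices) {
    print("integrated GPU:", integrated.name)
}
```

Either result could then be used in place of devices[0] or devices[1] when creating the command queue.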