Accelerate


Perform large-scale mathematical computations and image calculations with high-performance, energy-efficient code using Accelerate.

Posts under Accelerate tag

22 Posts

Post

Replies

Boosts

Views

Activity

Segmentation Fault in np.matmul on macOS 15.2 with Accelerate BLAS
I'm encountering a segmentation fault when using np.matmul with relatively small arrays on macOS 15.2. The issue only occurs in specific scenarios and results in a crash with the following error:

    Exception Type: EXC_BAD_ACCESS (SIGSEGV)
    Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000110
    Termination Reason: Namespace SIGNAL, Code 11 Segmentation fault: 11

Full error log: Gist link

The crash consistently occurs on a specific line where np.matmul is called, despite similar np.matmul operations succeeding earlier in the same script. The issue cannot be reproduced in a separate script that contains identical operations. When I build the NumPy wheel using OpenBLAS, this issue no longer arises, which leads me to believe that it is related to a problem with Accelerate.

Environment

    NumPy Version: 2.1.3
    Python Version: 3.12.7
    OS Version: macOS 15.2

BLAS Configuration:

    Build Dependencies:
      blas:
        detection method: system
        found: true
        include directory: unknown
        lib directory: unknown
        name: accelerate
        openblas configuration: unknown
        pc file directory: unknown
        version: unknown
      lapack:
        detection method: system
        found: true
        include directory: unknown
        lib directory: unknown
        name: accelerate
        openblas configuration: unknown
        pc file directory: unknown
        version: unknown
    Compilers:
      c:
        commands: cc
        linker: ld64
        name: clang
        version: 15.0.0
      c++:
        commands: c++
        linker: ld64
        name: clang
        version: 15.0.0
      cython:
        commands: cython
        linker: cython
        name: cython
        version: 3.0.11
    Machine Information:
      build:
        cpu: aarch64
        endian: little
        family: aarch64
        system: darwin
      host:
        cpu: aarch64
        endian: little
        family: aarch64
        system: darwin
1
0
95
4d
Polynomial Coefficients calculation
How can I calculate polynomial coefficients for these Tone Curve points?

    // • Red channel: (0, 0), (60, 39), (128, 128), (255, 255)
    // • Green channel: (0, 0), (63, 50), (128, 128), (255, 255)
    // • Blue channel: (0, 0), (60, 47), (119, 119), (255, 255)

CIFilter:

    func colorCrossPolynomial(inputImage: CIImage) -> CIImage? {
        let colorCrossPolynomial = CIFilter.colorCrossPolynomial()
        let redfloatArr: [CGFloat] = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
        let greenfloatArr: [CGFloat] = [0, 1, 1, 0, 0, 0, 0, 0, 0, 1]
        let bluefloatArr: [CGFloat] = [0, 0, 1, 0, 0, 0, 0, 1, 1, 0]
        colorCrossPolynomial.inputImage = inputImage
        colorCrossPolynomial.blueCoefficients = CIVector(values: bluefloatArr, count: bluefloatArr.count)
        colorCrossPolynomial.redCoefficients = CIVector(values: redfloatArr, count: redfloatArr.count)
        colorCrossPolynomial.greenCoefficients = CIVector(values: greenfloatArr, count: greenfloatArr.count)
        return colorCrossPolynomial.outputImage
    }
1
0
143
6d
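One way to approach the question above, sketched under assumptions: fit one cubic per channel through the four control points after normalizing them from 0...255 to 0...1. The helper names `cubic_through` and `evaluate` are hypothetical. Mapping the four resulting coefficients onto a specific Core Image filter (for example the four-element coefficient vectors of `CIColorPolynomial`) is an assumption to check against the Core Image documentation; `colorCrossPolynomial` takes ten cross-term coefficients per channel, which a per-channel curve does not need.

```python
from fractions import Fraction

def cubic_through(points):
    """Return [a0, a1, a2, a3] so that a0 + a1*x + a2*x^2 + a3*x^3 passes through 4 points."""
    # Normalize 0...255 control points to 0...1 using exact fractions.
    xs = [Fraction(x, 255) for x, _ in points]
    ys = [Fraction(y, 255) for _, y in points]
    n = len(points)
    # Augmented Vandermonde system: row i is [1, x, x^2, x^3 | y].
    rows = [[xs[i] ** p for p in range(n)] + [ys[i]] for i in range(n)]
    # Gauss-Jordan elimination; arithmetic is exact, so any nonzero pivot works.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [float(rows[i][n]) for i in range(n)]

def evaluate(coeffs, x):
    """Evaluate the polynomial at a normalized input x in 0...1."""
    return sum(c * x ** p for p, c in enumerate(coeffs))

red = cubic_through([(0, 0), (60, 39), (128, 128), (255, 255)])
green = cubic_through([(0, 0), (63, 50), (128, 128), (255, 255)])
blue = cubic_through([(0, 0), (60, 47), (119, 119), (255, 255)])

for name, coeffs in (("red", red), ("green", green), ("blue", blue)):
    print(name, coeffs)
```

A cubic through four points reproduces the points exactly but is not guaranteed to be monotonic between them, so it is worth plotting the curve before using the coefficients.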
Questions about calculating the square root using Accelerate
I am currently studying the Accelerate library by referring to Apple documentation. Here is the link to the referenced document: https://developer.apple.com/documentation/accelerate/veclib/vforce

When I executed the sample code provided at the bottom of the document, I found a case where the results were different.

    let n = 10_000
    let x = (0..<n).map { _ in
        Float.random(in: 1 ... 10_000)
    }
    let y = x.map {
        return sqrt($0)
    }

and

    let y = [Float](unsafeUninitializedCapacity: n) { buffer, initializedCount in
        vForce.sqrt(x, result: &buffer)
        initializedCount = n
    }

The code below is provided to observe the issue described above.

    import Accelerate

    Task {
        let n = 1//10_000
        let x = (0..<n).map { _ in
            Float(6737.015)//Float.random(in: 1 ... 10_000)
        }
        let y = x.map {
            return sqrt($0)
        }
        try? await Task.sleep(nanoseconds: 1_000_000_000)
        let z = [Float](unsafeUninitializedCapacity: n) { buffer, initializedCount in
            vForce.sqrt(x, result: &buffer)
            initializedCount = n
        }
    }

For a value of 6737.015, calculating the square root with the sqrt(_:) function gives the result 82.07932, while the vForce.sqrt(_:result:) function gives the result 82.07933. Using a calculator, the value comes out as 82.07932139, which shows that the result from vForce is incorrect. Could you explain the reason behind this difference?
2
0
199
1w
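A quick way to see how far apart the two results in the post above are, sketched in Python so it runs anywhere. Assumptions: Swift's `Float` is IEEE 754 single precision, and the helper names `to_f32` and `next_f32_up` are hypothetical. The sketch rounds 6737.015 to the nearest single-precision value (which is what both Swift calls receive), takes the square root in double precision, and compares the nearest single-precision result with its upper neighbor.

```python
import math
import struct

def to_f32(value):
    """Round a Python float (a double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def next_f32_up(value):
    """Return the next representable single-precision value above a positive value."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

x = to_f32(6737.015)             # the input as a Float
reference = math.sqrt(x)         # double-precision square root of that input
nearest = to_f32(reference)      # correctly rounded single-precision result
neighbor = next_f32_up(nearest)  # the next Float above it
ulp = neighbor - nearest         # spacing of Float values near the result

print(f"input as Float : {x!r}")
print(f"sqrt in double : {reference!r}")
print(f"nearest Float  : {nearest!r} (error {abs(nearest - reference) / ulp:.2f} ulp)")
print(f"next Float up  : {neighbor!r} (error {abs(neighbor - reference) / ulp:.2f} ulp)")
```

The two printed candidates are adjacent single-precision values, so the results in the post differ by one unit in the last place. Note also that the calculator figure uses 6737.015 exactly, while the value stored in a Float is 6737.01513671875. Whether a last-place difference is within what vForce promises is something to confirm in the vForce documentation or headers.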
Strange results from Accelerate DFT
Not sure if this is the right forum (don't see one for Maths...). There's no double-precision version of the newest Swift API for doing a DCT, so I tried to roll my own. Was expecting to have to scale the output, but by a constant factor... The results are out by a factor of about 0.2 at the start, and this falls smoothly to about 0.0067 by the end of the "period". I've included a direct calculation of the normalised DCT, as this was the kind of scaling error I was expecting to see.

I think I'm doing it right: mirror the data into an array twice the size of the original, pass that in as the real part, pass zeroes in as the imaginary part, then take the real result. It's alternating zeroes all right, and it's in sync, but ... Do I need to scale the input? Chebyshev nodes. I'm at a loss. Anybody know what I'm doing wrong?

PS Sorry for the huge long post.

    // Unnormalized values from SciPy, to save rewrites.
    // scipy.fft.dct([x for x in range(1,49)], type=2, norm=None)
    let fromSciPy = [ 2.35200000e+03, ... -3.27541474e-02]

    // Direct calculation which produces scaled output that agrees with SciPy
    // (although the zeroes don't seem to be exactly zero...)
    func dctII(input: [Double]) -> [Double] {
        let N = input.count
        let n = 1 / (2 * Double(N))
        let factor = sqrt(2.0 / Double(N))
        var result = (0..<N).map { k in
            factor * (0..<N).reduce(into: Double()) { sum_k, j in
                sum_k += input[j] * cos(Double.pi * Double(k) * (2 * Double(j) + 1) * n)
            }
        }
        result[0] /= sqrt(2.0)
        return result
    }

    let format = FloatingPointFormatStyle<Double>()...
    let factor = FloatingPointFormatStyle<Double>()...

    // According to the docs, an acceptable size.
    // .init() would fail if it wasn't. Effect's the same with longer sequences.
    let N = 48
    let input = (1...N).map { Double($0) }

    // The bit that doesn't work.
    func dctII_viaDFT(input: [Double]) -> [Double] {
        let N = input.count

        // Initialize DFT object for 2N points
        guard let dft = try? vDSP.DiscreteFourierTransform(
            count: N * 2,
            direction: .forward,
            transformType: .complexComplex,
            ofType: Double.self
        ) else {
            fatalError("Failed to create DFT object")
        }

        // Extend the input signal to enforce even symmetry
        var real = [Double](repeating: 0, count: N * 2)
        var imag = [Double](repeating: 0, count: N * 2)
        for i in 0..<N {
            real[i] = input[i]
            real[(N * 2) - 1 - i] = input[i]
        }

        // Compute the DFT
        let (re, im) = dft.transform(real: real, imaginary: imag)

        // Extract DCT-II coefficients from the real part of the first N terms
        var dctCoefficients = [Double](repeating: 0, count: N)
        for k in 0..<N {
            dctCoefficients[k] = re[k]
        }

        // Normalize to match SciPy's orthogonal normalization
        let scaleFactor = sqrt(2 / Double(N))
        dctCoefficients[0] *= scaleFactor
        for k in 1..<N {
            dctCoefficients[k] *= scaleFactor
        }
        return dctCoefficients
    }

    let viaFFT = dctII_viaDFT(input: input)
    let direct = dctII(input: input)

    print(" SciPy Direct Drct/SciPy DFT DFT/SciPy FT/Direct")
    for i in direct.indices where viaFFT[i] != 0 {
        let truth = fromSciPy[i]
        let naive = direct[i]
        let weird = viaFFT[i]
        print("\(truth.formatted(format))\t\(naive.formatted(format))\t\((truth / naive).formatted(factor))\t\(weird.formatted(format))\t\t\((weird / truth).formatted(factor))\t\((weird / naive).formatted(factor))")
    }

And here are the results (I've missed out the zero terms):

    SciPy        Direct       Drct/SciPy  DFT          DFT/SciPy  FT/Direct
    +352.000000  +169.740979  3.856406    +480.099990  0.204124   2.828427
    -933.609299  -095.286100  9.797959    -190.470165  0.204015   1.998929
    -103.585662  -010.572167  9.797959    -021.042519  0.203141   1.990369
    -037.182805  -003.794954  9.797959    -007.488532  0.201398   1.973287
    -018.886898  -001.927636  9.797959    -003.754560  0.198792   1.947754
    -011.356294  -001.159047  9.797959    -002.218278  0.195335   1.913881
    -007.542756  -000.769829  9.797959    -001.440976  0.191041   1.871812
    -005.347733  -000.545801  9.797959    -000.994300  0.185929   1.821728
    -003.968777  -000.405062  9.797959    -000.714465  0.180021   1.763843
    -003.045311  -000.310811  9.797959    -000.527882  0.173343   1.698404
    -002.395797  -000.244520  9.797959    -000.397515  0.165922   1.625693
    -001.920738  -000.196035  9.797959    -000.303073  0.157790   1.546021
    -001.561880  -000.159409  9.797959    -000.232693  0.148983   1.459728
    -001.283256  -000.130972  9.797959    -000.179063  0.139538   1.367185
    -001.061666  -000.108356  9.797959    -000.137480  0.129495   1.268787
    -000.881581  -000.089976  9.797959    -000.104818  0.118898   1.164955
    -000.732264  -000.074736  9.797959    -000.078932  0.107791   1.056136
    -000.606076  -000.061857  9.797959    -000.058319  0.096223   0.942793
    -000.497433  -000.050769  9.797959    -000.041906  0.084243   0.825414
    -000.402149  -000.041044  9.797959    -000.028916  0.071903   0.704500
    -000.316996  -000.032353  9.797959    -000.018783  0.059254   0.580569
    -000.239422  -000.024436  9.797959    -000.011098  0.046352   0.454153
    -000.167336  -000.017079  9.797959    -000.005564  0.033251   0.325791
    -000.098968  -000.010101  9.797959    -000.001980  0.020008   0.196034
    -000.032754  -000.003343  9.797959    -000.000219  0.006679   0.065438
4
0
270
Dec ’24
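A possible explanation for the post above, offered as a sketch rather than a diagnosis of vDSP itself: with the mirroring used there (x followed by x reversed), the 2N-point DFT equals the unnormalized DCT-II multiplied by a phase factor exp(iπk/(2N)). Taking only the real part therefore scales coefficient k by cos(πk/(2N)), which falls smoothly from 1 toward 0. Multiplying by exp(-iπk/(2N)) before taking the real part removes that factor. The Python below uses a naive DFT in place of vDSP, and all function names are hypothetical.

```python
import cmath
import math

def dct2_direct(x):
    """Unnormalized DCT-II: X[k] = 2 * sum_t x[t] * cos(pi * k * (2t + 1) / (2N))."""
    n = len(x)
    return [2.0 * sum(x[t] * math.cos(math.pi * k * (2 * t + 1) / (2 * n))
                      for t in range(n))
            for k in range(n)]

def dft(y):
    """Naive forward DFT, O(M^2); adequate for a 96-point check."""
    m = len(y)
    return [sum(y[t] * cmath.exp(-2j * math.pi * k * t / m) for t in range(m))
            for k in range(m)]

def dct2_via_dft(x):
    """DCT-II from the DFT of the mirrored signal, with the half-sample phase removed."""
    n = len(x)
    mirrored = list(x) + list(reversed(x))
    spectrum = dft(mirrored)
    return [(cmath.exp(-1j * math.pi * k / (2 * n)) * spectrum[k]).real
            for k in range(n)]

signal = [float(v) for v in range(1, 49)]
direct = dct2_direct(signal)
via_dft = dct2_via_dft(signal)

# What the real part alone gives, for comparison with the table in the post.
real_only = [c.real for c in dft(signal + signal[::-1])[:len(signal)]]

worst = max(abs(a - b) for a, b in zip(direct, via_dft))
print("largest difference between direct and DFT-based DCT:", worst)
```

This appears consistent with the table in the post: the DFT/SciPy column starts at 0.204124, which is sqrt(2/48), and ends at 0.006679, which is close to 0.204124 × cos(47π/96).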
Integer arithmetic with Accelerate
Almost all the functions in Accelerate are for single precision (Float) and double precision (Double) operations. However, I stumbled upon three integer arithmetic functions which operate on Int32 values. Are there any more functions in Accelerate that operate on integer values? If not, then why aren't there more functions that work with integers?
1
0
312
Oct ’24
vImageConverter_CreateWithCGImageFormat Fails with kvImageInvalidImageFormat When Trying to Convert CMYK to RGB
So I get JPEG data in my app. Previously I was using the higher-level NSBitmapImageRep API and just feeding the JPEG data to it. But now I've noticed that on Sonoma, if I get a JPEG in the CMYK color space, the NSBitmapImageRep renders mostly black and is corrupted. So I'm trying to drop down to the lower-level APIs. Specifically, I grab a CGImageRef and am trying to use the Accelerate API to convert it to another format (to hopefully work around the issue):

    CGImageRef sourceCGImage = CGImageCreateWithJPEGDataProvider(jpegDataProvider, NULL, shouldInterpolate, kCGRenderingIntentDefault);

Now I use vImageConverter_CreateWithCGImageFormat with the following values for source and destination formats.

Source format (derived from sourceCGImage):

    bitsPerComponent = 8
    bitsPerPixel = 32
    colorSpace = (kCGColorSpaceICCBased; kCGColorSpaceModelCMYK; Generic CMYK Profile)
    bitmapInfo = kCGBitmapByteOrderDefault
    version = 0
    decode = 0x000060000147f780
    renderingIntent = kCGRenderingIntentDefault

Destination format:

    bitsPerComponent = 8
    bitsPerPixel = 24
    colorSpace = (DeviceRGB)
    bitmapInfo = 8197
    version = 0
    decode = 0x0000000000000000
    renderingIntent = kCGRenderingIntentDefault

But vImageConverter_CreateWithCGImageFormat fails with kvImageInvalidImageFormat. If I change the destination format to use 32 bitsPerPixel and use alpha in the bitmap info, vImageConverter_CreateWithCGImageFormat does not return an error, but I get a black image just like with NSBitmapImageRep.
13
0
955
Nov ’24
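To make the destination `bitmapInfo = 8197` in the post above easier to read, here is a small decoder sketch. The mask and enum values are transcribed from memory of CGImage.h and should be verified against the header; the function name `decode_bitmap_info` is hypothetical.

```python
# Values as recalled from CGImage.h (assumption: verify against the SDK header).
ALPHA_INFO_MASK = 0x1F      # kCGBitmapAlphaInfoMask
FLOAT_COMPONENTS = 1 << 8   # kCGBitmapFloatComponents
BYTE_ORDER_MASK = 0x7000    # kCGBitmapByteOrderMask

ALPHA_NAMES = {
    0: "none", 1: "premultipliedLast", 2: "premultipliedFirst", 3: "last",
    4: "first", 5: "noneSkipLast", 6: "noneSkipFirst", 7: "alphaOnly",
}
BYTE_ORDER_NAMES = {
    0 << 12: "default", 1 << 12: "16Little", 2 << 12: "32Little",
    3 << 12: "16Big", 4 << 12: "32Big",
}

def decode_bitmap_info(value):
    """Split a CGBitmapInfo integer into its alpha, byte-order, and float fields."""
    return {
        "alpha": ALPHA_NAMES.get(value & ALPHA_INFO_MASK, "unknown"),
        "byte_order": BYTE_ORDER_NAMES.get(value & BYTE_ORDER_MASK, "unknown"),
        "float_components": bool(value & FLOAT_COMPONENTS),
    }

print(decode_bitmap_info(8197))
```

If those constants are right, 8197 combines a 32-bit little-endian byte order with a skipped fourth byte, which describes a 32-bit pixel rather than the 24 bits per pixel in the destination format. That mismatch is one candidate for the kvImageInvalidImageFormat error, though it would not by itself explain the black output.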
MLTensor computation took more time than expected.
    func testMLTensor() {
        let t1 = MLTensor(shape: [2000, 1], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 2000), scalarType: Float.self)
        let t2 = MLTensor(shape: [1, 3000], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 3000), scalarType: Float.self)
        for _ in 0...50 {
            let t = Date()
            let x = (t1 * t2)
            print("MLTensor", t.timeIntervalSinceNow * 1000, "ms")
        }
    }

    testMLTensor()

The above code took more time than expected, especially in the early stage of iteration.
1
0
577
Aug ’24
Documentation and usage of BNNS.NormalizationLayer
Hello everybody,

I am running into an error with BNNS.NormalizationLayer. It appears to work only with .vector; matrix shapes throw layerApplyFail during training. Inference doesn't throw, but the output stays the same.

How do I correctly use BNNS.NormalizationLayer with matrix shapes? How do I debug the layerApplyFail exception?

Thanks

    let array: [Float32] = [
        01, 02, 03, 04, 05, 06,
        07, 08, 09, 10, 11, 12,
        13, 14, 15, 16, 17, 18,
    ]

    // let inputShape: BNNS.Shape = .vector(6 * 3) // works
    let inputShape: BNNS.Shape = .matrixColumnMajor(6, 3)

    let input = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float32.self, shape: inputShape)
    let output = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float32.self, shape: inputShape)
    let beta = BNNSNDArrayDescriptor.allocate(repeating: Float32(0), shape: inputShape, batchSize: 1)
    let gamma = BNNSNDArrayDescriptor.allocate(repeating: Float32(1), shape: inputShape, batchSize: 1)
    let activation: BNNS.ActivationFunction = .identity

    let layer = BNNS.NormalizationLayer(type: .layer(normalizationAxis: 0), input: input, output: output, beta: beta, gamma: gamma, epsilon: 1e-12, activation: activation)!

    let layerInput = BNNSNDArrayDescriptor.allocate(initializingFrom: array, shape: inputShape)
    let layerOutput = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float32.self, shape: inputShape)

    // try layer.apply(batchSize: 1, input: layerInput, output: layerOutput, for: .inference) // No throw
    try layer.apply(batchSize: 1, input: layerInput, output: layerOutput, for: .training)

    _ = layerOutput.makeArray(of: Float32.self) // All zeros when .inference
1
0
613
Jul ’24
Performant alternative to scaling a CIImage / PixelBuffer
Hey, I’m building a camera app where I am applying real-time effects to the viewfinder. One of those effects is a variable blur, so to improve performance I am scaling down the input image using CIFilter.lanczosScaleTransform(). This works fine and runs at 30 FPS, but when running the Metal profiler I can see that the scaling transforms use a lot of GPU time, almost as much as the variable blur. Is there a more efficient way to do this?

The simplified chain is like this:

- Scale down viewFinder CVPixelBuffer (CIFilter.lanczosScaleTransform)
- Scale up depthMap CVPixelBuffer to match viewFinder size (CIFilter.lanczosScaleTransform)
- Create CIImages from both CVPixelBuffers
- Apply VariableDepthBlur (CIFilter.maskedVariableBlur)
- Scale up final image to Metal view size (CIFilter.lanczosScaleTransform)
- Render CIImage to a MTKView using CIRenderDestination

From some research, I wonder if scaling the CVPixelBuffer using the Accelerate framework would be faster? Also, instead of scaling the final image, perhaps I could offload this to the Metal view?

Any pointers greatly appreciated!
2
0
728
Jul ’24
Peculiar EXC_BAD_ACCESS, involving sparse matrices
Hello all,

Currently, I'm working on an iOS app that performs measurements and shows the results to the user in a graph. I use a Savitzky-Golay filter to filter out noise, so that the graph is nice and smooth. However, the code that calculates the Savitzky-Golay coefficients using sparse matrices crashes sometimes, throwing an EXC_BAD_ACCESS. I tried to find out what the problem is by turning on Address Sanitizer and Thread Sanitizer, but, for some reason, the bad access exception isn't thrown when either of these is on. What else could I try to trace back the problem?

Thanks in advance,
CaS

To reproduce the error, run the following:

    import SwiftUI
    import Accelerate

    struct ContentView: View {
        var body: some View {
            VStack {
                Button("Try", action: test)
            }
            .padding()
        }

        func test() {
            for windowLength in 3...100 {
                let coeffs = SavitzkyGolay.coefficients(windowLength: windowLength, polynomialOrder: 2)
                print(coeffs)
            }
        }
    }

    class SavitzkyGolay {
        static func coefficients(windowLength: Int, polynomialOrder: Int, derivativeOrder: Int = 0, delta: Int = 1) -> [Double] {
            let (halfWindow, remainder) = windowLength.quotientAndRemainder(dividingBy: 2)
            var pos = Double(halfWindow)
            if remainder == 0 {
                pos -= 0.5
            }
            let X = [Double](stride(from: Double(windowLength) - pos - 1, through: -pos, by: -1))
            let P = [Double](stride(from: 0, through: Double(polynomialOrder), by: 1))
            let A = P.map { exponent in
                X.map { pow($0, exponent) }
            }
            var B = [Double](repeating: 0, count: polynomialOrder + 1)
            B[derivativeOrder] = Double(factorial(derivativeOrder)) / pow(Double(delta), Double(derivativeOrder))
            return leastSquaresSolution(A: A, B: B)
        }

        static func leastSquaresSolution(A: [[Double]], B: [Double]) -> [Double] {
            let sparseA = A.sparseMatrix()
            var sparseAValuesCopy = sparseA.values
            var xValues = [Double](repeating: 0, count: A.transpose().count)
            var bValues = B
            sparseAValuesCopy.withUnsafeMutableBufferPointer { valuesPtr in
                let a = SparseMatrix_Double(
                    structure: sparseA.structure,
                    data: valuesPtr.baseAddress!
                )
                bValues.withUnsafeMutableBufferPointer { bPtr in
                    xValues.withUnsafeMutableBufferPointer { xPtr in
                        let b = DenseVector_Double(
                            count: Int32(B.count),
                            data: bPtr.baseAddress!
                        )
                        let x = DenseVector_Double(
                            count: Int32(A.transpose().count),
                            data: xPtr.baseAddress!
                        )
                        #warning("EXC_BAD_ACCESS is thrown below")
                        print("This code is executed...")
                        let status = SparseSolve(SparseLSMR(), a, b, x, SparsePreconditionerDiagScaling)
                        print("...but, if an EXC_BAD_ACCESS is thrown, this code isn't")
                        if status != SparseIterativeConverged {
                            fatalError("Failed to converge. Returned with error \(status).")
                        }
                    }
                }
            }
            return xValues
        }
    }

    func factorial(_ n: Int) -> Int {
        n < 2 ? 1 : n * factorial(n - 1)
    }

    extension Array where Element == [Double] {
        func sparseMatrix() -> (structure: SparseMatrixStructure, values: [Double]) {
            let columns = self.transpose()
            var rowIndices: [Int32] = columns.map { column in
                column.indices.compactMap { indexInColumn in
                    if column[indexInColumn] != 0 {
                        return Int32(indexInColumn)
                    }
                    return nil
                }
            }.reduce([], +)
            let sparseColumns = columns.map { column in
                column.compactMap {
                    if $0 != 0 {
                        return $0
                    }
                    return nil
                }
            }
            var counter = 0
            var columnStarts = [Int]()
            for sparseColumn in sparseColumns {
                columnStarts.append(counter)
                counter += sparseColumn.count
            }
            let reducedSparseColumns = sparseColumns.reduce([], +)
            columnStarts.append(reducedSparseColumns.count)
            let structure: SparseMatrixStructure = rowIndices.withUnsafeMutableBufferPointer { rowIndicesPtr in
                columnStarts.withUnsafeMutableBufferPointer { columnStartsPtr in
                    let attributes = SparseAttributes_t()
                    return SparseMatrixStructure(
                        rowCount: Int32(self.count),
                        columnCount: Int32(columns.count),
                        columnStarts: columnStartsPtr.baseAddress!,
                        rowIndices: rowIndicesPtr.baseAddress!,
                        attributes: attributes,
                        blockSize: 1
                    )
                }
            }
            return (structure, reducedSparseColumns)
        }

        func transpose() -> Self {
            let columns = self.count
            let rows = self.reduce(0) { Swift.max($0, $1.count) }
            return (0 ..< rows).reduce(into: []) { result, row in
                result.append((0 ..< columns).reduce(into: []) { result, column in
                    result.append(row < self[column].count ? self[column][row] : 0)
                })
            }
        }
    }
11
0
1k
Jul ’24
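For checking the arrays that the `sparseMatrix()` extension in the post above is meant to produce, here is a minimal compressed-sparse-column (CSC) builder in Python. It is only a reference for the data layout (column starts, row indices, values), not a port of the Accelerate API; the name `to_csc` is hypothetical.

```python
def to_csc(dense_rows):
    """Convert a dense matrix (a list of rows) to CSC arrays."""
    row_count = len(dense_rows)
    column_count = len(dense_rows[0]) if dense_rows else 0
    column_starts = [0]   # column c occupies values[column_starts[c]:column_starts[c + 1]]
    row_indices = []      # row of each stored value
    values = []           # nonzero entries, column by column
    for col in range(column_count):
        for row in range(row_count):
            entry = dense_rows[row][col]
            if entry != 0:
                row_indices.append(row)
                values.append(entry)
        column_starts.append(len(values))
    return column_starts, row_indices, values

starts, rows, vals = to_csc([[1.0, 0.0, 2.0],
                             [0.0, 3.0, 0.0]])
print(starts, rows, vals)
```

One thing worth checking in the Swift code, offered as an assumption rather than a confirmed cause: the pointers passed to `SparseMatrixStructure` come from `withUnsafeMutableBufferPointer` closures, and Swift only guarantees those pointers for the duration of the closure. The structure is returned and used later in `SparseSolve`, after the closures have ended, which could produce intermittent bad accesses.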
Data storage for a Matrix struct when working with Accelerate
I have a Matrix structure as defined below for working with 2D numerical data in Accelerate. The underlying numerical data in this Matrix struct is stored as an Array.

    struct Matrix<T> {
        let rows: Int
        let columns: Int
        var data: [T]

        init(rows: Int, columns: Int, fill: T) {
            self.rows = rows
            self.columns = columns
            self.data = Array(repeating: fill, count: rows * columns)
        }

        init(rows: Int, columns: Int, source: (inout UnsafeMutableBufferPointer<T>) -> Void) {
            self.rows = rows
            self.columns = columns
            self.data = Array(unsafeUninitializedCapacity: rows * columns) { buffer, initializedCount in
                source(&buffer)
                initializedCount = rows * columns
            }
        }

        subscript(row: Int, column: Int) -> T {
            get { return self.data[(row * self.columns) + column] }
            set { self.data[(row * self.columns) + column] = newValue }
        }
    }

Multiplication is implemented by the functions shown below.

    import Accelerate

    infix operator .*

    func .* (lhs: Matrix<Double>, rhs: Matrix<Double>) -> Matrix<Double> {
        precondition(lhs.rows == rhs.rows && lhs.columns == rhs.columns, "Matrices must have same dimensions")
        let result = Matrix<Double>(rows: lhs.rows, columns: rhs.columns) { buffer in
            vDSP.multiply(lhs.data, rhs.data, result: &buffer)
        }
        return result
    }

    func * (lhs: Matrix<Double>, rhs: Matrix<Double>) -> Matrix<Double> {
        precondition(lhs.columns == rhs.rows, "Number of columns in left matrix must equal number of rows in right matrix")
        var a = lhs.data
        var b = rhs.data
        let m = lhs.rows     // number of rows in matrices A and C
        let n = rhs.columns  // number of columns in matrices B and C
        let k = lhs.columns  // number of columns in matrix A; number of rows in matrix B
        let alpha = 1.0
        let beta = 0.0
        // matrix multiplication where C ← αAB + βC
        let c = Matrix<Double>(rows: lhs.rows, columns: rhs.columns) { buffer in
            cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, &a, k, &b, n, beta, buffer.baseAddress, n)
        }
        return c
    }

I can also define a Matrix structure where the underlying data is an UnsafeMutableBufferPointer. The buffer is handled by the MatrixData class.

    struct Matrix<T> {
        let rows: Int
        let columns: Int
        var data: MatrixData<T>

        init(rows: Int, columns: Int, fill: T) {
            self.rows = rows
            self.columns = columns
            self.data = MatrixData(count: rows * columns, fill: fill)
        }

        init(rows: Int, columns: Int) {
            self.rows = rows
            self.columns = columns
            self.data = MatrixData(count: rows * columns)
        }

        subscript(row: Int, column: Int) -> T {
            get { return self.data.buffer[(row * self.columns) + column] }
            set { self.data.buffer[(row * self.columns) + column] = newValue }
        }
    }

    class MatrixData<T> {
        var buffer: UnsafeMutableBufferPointer<T>
        var baseAddress: UnsafeMutablePointer<T> {
            get { self.buffer.baseAddress! }
        }

        init(count: Int, fill: T) {
            let start = UnsafeMutablePointer<T>.allocate(capacity: count)
            self.buffer = UnsafeMutableBufferPointer(start: start, count: count)
            self.buffer.initialize(repeating: fill)
        }

        init(count: Int) {
            let start = UnsafeMutablePointer<T>.allocate(capacity: count)
            self.buffer = UnsafeMutableBufferPointer(start: start, count: count)
        }

        deinit {
            self.buffer.deinitialize()
            self.buffer.deallocate()
        }
    }

Multiplication for this approach is implemented by the functions shown here.

    import Accelerate

    infix operator .*

    func .* (lhs: Matrix<Double>, rhs: Matrix<Double>) -> Matrix<Double> {
        precondition(lhs.rows == rhs.rows && lhs.columns == rhs.columns, "Matrices must have same dimensions")
        let result = Matrix<Double>(rows: lhs.rows, columns: lhs.columns)
        vDSP.multiply(lhs.data.buffer, rhs.data.buffer, result: &result.data.buffer)
        return result
    }

    func * (lhs: Matrix<Double>, rhs: Matrix<Double>) -> Matrix<Double> {
        precondition(lhs.columns == rhs.rows, "Number of columns in left matrix must equal number of rows in right matrix")
        let a = lhs.data.baseAddress
        let b = rhs.data.baseAddress
        let m = lhs.rows     // number of rows in matrices A and C
        let n = rhs.columns  // number of columns in matrices B and C
        let k = lhs.columns  // number of columns in matrix A; number of rows in matrix B
        let alpha = 1.0
        let beta = 0.0
        // matrix multiplication where C ← αAB + βC
        let c = Matrix<Double>(rows: lhs.rows, columns: rhs.columns)
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, a, k, b, n, beta, c.data.baseAddress, n)
        return c
    }

Both of these approaches give me similar performance. The only difference that I have noticed is the matrix buffer approach allows for reference semantics. For example, the code below uses half the memory with the matrix buffer approach compared to the matrix array approach. This is because b acts as a reference to a using the matrix buffer approach; otherwise, the matrix array approach makes a full copy of a.

    let n = 10_000
    let a = Matrix<Double>(rows: n, columns: n, fill: 0)
    var b = a
    b[0, 0] = 99
    b[0, 1] = 22

Other than reference semantics, are there any reasons to use one of these approaches over the other?
3
0
649
Jun ’24
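Whichever storage is chosen in the post above, a plain reference implementation can help confirm that both `*` operators produce the same result. The sketch below mirrors the argument order of the `cblas_dgemm` call in the post (row-major, no transposes, with lda = k, ldb = n, ldc = n) using flat Python lists; `gemm_row_major` is a hypothetical name and the code is a checking aid, not a replacement for BLAS.

```python
def gemm_row_major(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Compute C <- alpha * A * B + beta * C for row-major flat lists, no transposes."""
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # A[i, p] lives at a[i * lda + p]; B[p, j] lives at b[p * ldb + j].
                acc += a[i * lda + p] * b[p * ldb + j]
            c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j]
    return c

# 2x3 times 3x2, both stored row-major.
a = [1.0, 2.0, 3.0,
     4.0, 5.0, 6.0]
b = [7.0, 8.0,
     9.0, 10.0,
     11.0, 12.0]
c = gemm_row_major(2, 2, 3, 1.0, a, 3, b, 2, 0.0, [0.0] * 4, 2)
print(c)
```

Comparing a few small products like this against each Swift implementation is a cheap way to catch a swapped leading dimension before benchmarking.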
Problem with vImagePiecewiseGamma_Planar8
In our app we use the following function for inverting a CGImageRef using vImage. The workflow is an Objective-C version of the code in the AdjustingTheBrightnessAndContrastOfAnImage sample from Apple:

    CGImageRef InvertImage( CGImageRef frameImageRef )
    {
        CGImageRef resultImage = nil;
        CGBitmapInfo imgBitmapInfo = CGImageGetBitmapInfo( frameImageRef );
        size_t img_bPC = CGImageGetBitsPerComponent( frameImageRef );
        size_t img_bPP = CGImageGetBitsPerPixel( frameImageRef );

        vImage_CGImageFormat invIFormat;
        invIFormat.bitsPerComponent = img_bPC;
        invIFormat.bitsPerPixel = img_bPP;
        invIFormat.colorSpace = (img_bPP == 8) ? gDeviceGrayColorSpaceRef : gDeviceRGBColorSpaceRef;
        invIFormat.bitmapInfo = imgBitmapInfo;
        invIFormat.version = 0;
        invIFormat.decode = 0;
        invIFormat.renderingIntent = kCGRenderingIntentDefault;

        vImage_Buffer sourceVImageBuffer;
        vImage_Error viErr = vImageBuffer_InitWithCGImage( &sourceVImageBuffer, &invIFormat, nil, frameImageRef, kvImageNoFlags );
        if (viErr == kvImageNoError)
        {
            vImage_Buffer destinationVImageBuffer;
            viErr = vImageBuffer_Init( &destinationVImageBuffer, sourceVImageBuffer.height, sourceVImageBuffer.width, img_bPP, kvImageNoFlags );
            if (viErr == kvImageNoError)
            {
                float linearCoeffs[2] = { -1.0, 1.0 };
                float expoCoeffs[3] = { 1.0, 0.0, 0.0 };
                float gamma = 0.0;
                Pixel_8 boundary = 255;
                viErr = vImagePiecewiseGamma_Planar8( &sourceVImageBuffer, &destinationVImageBuffer, expoCoeffs, gamma, linearCoeffs, boundary, kvImageNoFlags );
                if (viErr == kvImageNoError)
                {
                    CGImageRef newImgRef = vImageCreateCGImageFromBuffer( &destinationVImageBuffer, &invIFormat, nil, nil, kvImageNoFlags, &viErr );
                    if (viErr == kvImageNoError)
                        resultImage = newImgRef;
                }
                free( destinationVImageBuffer.data );
            }
            free( sourceVImageBuffer.data );
        }
        return resultImage;
    }

The function works fine for 8-bit monochrome images. When I try it with 24-bit RGB images, I get no errors from any of the calls, but only one third of the image is inverted as expected.

What am I missing? I suspect I might have to use a different function for 24-bit images (instead of vImagePiecewiseGamma_Planar8), but I cannot find which one in the headers. Thanks.
9
0
867
May ’24
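A likely reading of the "one third" symptom in the post above, sketched in Python under assumptions: a Planar8 routine treats each row as `width` one-byte pixels, while an interleaved 24-bit RGB row holds `width * 3` bytes, so only the first third of each row is processed. The function `invert_planar8` below is a hypothetical stand-in for a planar operation, not the vImage call itself.

```python
def invert_planar8(data, row_bytes, width, height):
    """Invert `width` bytes in each of `height` rows of an 8-bit buffer, in place."""
    for row in range(height):
        start = row * row_bytes
        for offset in range(start, start + width):
            data[offset] = 255 - data[offset]
    return data

width, height, channels = 4, 2, 3
row_bytes = width * channels
pixels = bytearray(range(row_bytes * height))  # interleaved RGBRGB... test pattern

# Passing the pixel width: only the first third of each row changes.
partial = invert_planar8(bytearray(pixels), row_bytes, width, height)

# Passing width * channels: every byte of every row is inverted.
full = invert_planar8(bytearray(pixels), row_bytes, width * channels, height)

print(list(partial[:row_bytes]))
print(list(full[:row_bytes]))
```

Because inversion is identical for every channel, describing the interleaved buffer to the planar function with its width multiplied by the channel count (same data pointer and rowBytes) may be enough here. That shortcut would not apply to an operation that treats channels differently, and it should be checked against the vImage documentation before relying on it.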
How to make AppleArchive + ZLIB compatible with non-Apple systems?
I very much love the performance of AppleArchive and how approachable it is, and believe it to be one of the most underrated frameworks in the SDK. In a scenario quite typical, I need to compress files and submit them to a back end, where the server handling the files is not an Apple platform. Obviously, individual files compressed with AA will not be compatible with other systems out of the box, but there are compatible compression algorithms. ZLIB is recommended for cases where cross-platform compatibility is necessary. As I understand it, AA adds additional headers to files in order to support preservation of file attributes, ownership and other data. Following the steps outlined in the docs, I've written code to compress single files. I can easily compress and decompress using AA without issue. To create a proof-of-concept, I've written some code in python using its zlib module. In order to get to the compressed data, it's necessary to handle the AA header fields. The first 64 bytes of a compressed file appear as follows: AA documentation states that ZLIB Level 5 compression is used, and comes in the form of raw DEFLATE data prefixed with two header bytes. In this case, these bytes are 78 5e, which begin at the 28th byte and appear as x^ above. My hope was that seeking to the start of the compressed data, then passing what remains to a decompressor object initialized with the correct WBITS would work. It works fantastically for files 1MB or less in size. Files which are larger only decompress the first megabyte. The decompressor object is reaching EOF, and I've tried various ways of attempting to seek to and concatenate the other blocks, but to no avail. Using the older Compression framework and the method specified here, with the same algorithm, yields different results. I can decompress files of any size using python's zlib module. 
My assumption is that AppleArchive is doing something differently in order to support its multithreading capabilities, perhaps even with asymmetric encoding where the blocks are not ordered. Is there a solution to this problem? If not, why would one ever use ZLIB versus the much more efficient LZFSE? I could use the older Compression API, but it is significantly slower compressing synchronously, and performance is critical with the application I am adding this feature to.
1
0
883
Apr ’24
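If the archive in the post above stores each block as its own complete zlib stream, that would explain a decompressor reaching EOF after the first megabyte. The sketch below shows the general technique for reading back-to-back zlib streams with Python's `zlib` module by restarting on `unused_data`. It assumes the streams are directly adjacent; any per-block framing that AppleArchive inserts between them is not known here and would have to be skipped first. The function name is hypothetical.

```python
import zlib

def decompress_concatenated(blob, wbits=zlib.MAX_WBITS):
    """Decompress back-to-back zlib streams, restarting after each end-of-stream."""
    output = bytearray()
    remaining = blob
    while remaining:
        decompressor = zlib.decompressobj(wbits)
        output += decompressor.decompress(remaining)
        output += decompressor.flush()
        if not decompressor.eof:
            raise ValueError("truncated zlib stream")
        # Bytes after the end of this stream belong to the next one.
        remaining = decompressor.unused_data
    return bytes(output)

# Two independent level-5 streams glued together, standing in for two blocks.
blob = zlib.compress(b"a" * 100, 5) + zlib.compress(b"b" * 50, 5)
print(len(decompress_concatenated(blob)))
```

If `zlib.error` is raised on the second iteration, the bytes following the first stream are probably block metadata rather than another `78 5e` header, and inspecting `unused_data` at that point shows what needs to be skipped.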
How to get the position of dominant colors in CGImage?
So, my app needs to find the dominant palette and the position in the image of the k most dominant colors. I followed the very useful sample project from the vImage documentation https://developer.apple.com/documentation/accelerate/bnns/calculating_the_dominant_colors_in_an_image and the algorithm works fine, although I can't wrap my head around how I should go about linking said colors with a point in the image. Since the algorithm works by filling storages first, I also tried filling an array of CGPoints called locationStorage and working with that:

    //filling the array
    for i in 0...width {
        for j in 0...height {
            locationStorage.append(
                CGPoint(x: i, y: j))
        }
    . . .
    //working with the array
    let randomIndex = Int.random(in: 0 ..< width * height)
    centroids.append(Centroid(red: redStorage[randomIndex],
                              green: greenStorage[randomIndex],
                              blue: blueStorage[randomIndex],
                              position: locationStorage[randomIndex]))
    }

    struct Centroid {
        /// The red channel value.
        var red: Float
        /// The green channel value.
        var green: Float
        /// The blue channel value.
        var blue: Float
        /// The number of pixels assigned to this cluster center.
        var pixelCount: Int = 0
        var position: CGPoint = CGPointZero

        init(red: Float, green: Float, blue: Float, position: CGPoint) {
            self.red = red
            self.green = green
            self.blue = blue
            self.position = position
        }
    }

However, it's not accurate. I also tried brute-forcing every pixel in the image to get as close as possible to each color, but I think it's too slow. What do you think my approach should be? Let me know if you need additional info. Please be kind, I'm learning Swift.
3
0
866
Apr ’24
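One possible approach to the question above, sketched under assumptions: if the color storages are row-major (pixel `index` is at x = index % width, y = index / width) and the k-means step yields a cluster index per pixel, a position for each cluster can be computed by averaging the coordinates of its pixels, with no separate location array. The names `index_to_point`, `cluster_positions`, and `assignments` are hypothetical.

```python
def index_to_point(index, width):
    """Map a row-major pixel index to (x, y)."""
    y, x = divmod(index, width)
    return x, y

def cluster_positions(assignments, width, cluster_count):
    """Average the (x, y) positions of the pixels assigned to each cluster."""
    sums = [[0.0, 0.0, 0] for _ in range(cluster_count)]
    for index, cluster in enumerate(assignments):
        x, y = index_to_point(index, width)
        sums[cluster][0] += x
        sums[cluster][1] += y
        sums[cluster][2] += 1
    # Clusters with no pixels have no meaningful position.
    return [(sx / count, sy / count) if count else None for sx, sy, count in sums]

# A 2x2 image: the top row belongs to cluster 0, the bottom row to cluster 1.
print(cluster_positions([0, 0, 1, 1], width=2, cluster_count=2))
```

Two notes on the code in the post: the loops `0...width` and `0...height` are inclusive, so they append (width + 1) × (height + 1) points, and they fill the array column by column, which would not line up with row-major color storages of width × height elements. Also, an average position can land on a pixel of a different color when a color is scattered across the image; picking the assigned pixel nearest to that average is one alternative.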
vDSP_conv returning wrong results
Hi! We're trying to calculate the delay between two audio inputs, represented by float arrays, by getting their maximum correlation, using vDSP_conv. Our solution is very similar to the one in the first answer here, only we are looking at a 0..5000 radius to find the delay in ms: https://stackoverflow.com/questions/65571299/swift-read-two-audio-files-and-calculate-their-cross-correlation

The problem is that we had mixed results: sometimes the calculated delay is OK, but other times it isn't. Our best guess is that there is some overflow error happening, since the arrays we're working with can be pretty large (they can have around 4-5 million values). If we use a simple foreach to calculate these correlations we get good results, but obviously that is quite slow. Did anyone have similar problems?
1
0
842
Feb ’24
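For comparing against the vDSP_conv output in the post above, a small reference implementation of the lag search may help. The sketch below is plain Python with a hypothetical function name; it accumulates each correlation sum in double precision with `math.fsum`, so it can serve as a baseline on short excerpts where the fast path gives a suspicious delay.

```python
import math

def best_delay(reference, delayed, max_lag):
    """Return the lag in 0...max_lag that maximizes sum(reference[i] * delayed[i + lag])."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        overlap = min(len(reference), len(delayed) - lag)
        if overlap <= 0:
            break
        score = math.fsum(reference[i] * delayed[i + lag] for i in range(overlap))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# A chirp-like test signal and a copy delayed by 37 samples.
reference = [math.sin(0.37 * i * i) for i in range(200)]
delayed = [0.0] * 37 + reference
print(best_delay(reference, delayed, 100))
```

If the baseline and vDSP_conv disagree only on long inputs, one thing to look at (an assumption, not a confirmed cause) is single-precision accumulation: summing millions of Float products loses precision through rounding rather than overflow, which can flatten the peak. Checking that the signal buffer is at least as long as the result count plus the filter length minus one is also worthwhile.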