Posts

Post not yet marked as solved
1 Replies
2.2k Views
Hi, I'm writing performance tests using facilities provided by XCTest with Xcode 10.2:

func testPerformanceExample() {
    self.measure {
        ...
    }
}

My issue is that the code I measure is supposed to be very fast (around 1ms) and I want the test to verify that no regression is introduced. However, making the code much slower currently does not trigger a test failure, although Xcode clearly notices that it is slower, as shown in these screenshots:

https://artoverflow.io/downloads/worse.png
https://artoverflow.io/downloads/performance_result.png

The test logs contain this:

Test Case '-[DummyTests.DrawingDrawableTests testPerformanceExample]' started.
[...]/DrawingDrawableTests.swift:132: Test Case '-[DummyTests.DrawingDrawableTests testPerformanceExample]' measured [Time, seconds] average: 0.006, relative standard deviation: 14.419%, values: [0.008725, 0.005750, 0.005917, 0.005780, 0.005808, 0.005798, 0.005741, 0.005819, 0.005837, 0.005759], performanceMetricID:com.apple.XCTPerformanceMetric_WallClockTime, baselineName: "Local Baseline", baselineAverage: 0.002, maxPercentRegression: 10.000%, maxPercentRelativeStandardDeviation: 10.000%, maxRegression: 0.100, maxStandardDeviation: 0.100
Test Case '-[DummyTests.DrawingDrawableTests testPerformanceExample]' passed (0.471 seconds).

I notice that they contain "maxRegression: 0.100", which looks to be an absolute threshold in seconds. And indeed, making my block take more than 0.1s actually makes the performance test fail. But this makes XCTestCase.measure() pretty useless for anyone who wants to test really optimized code. In the current case this is for real-time rendering, and I want the test to detect when the implementation is not able to reach 60 fps.

Of course I could manually run the app and check, but I want to reduce manual testing time. That's where XCTest is supposed to help.

Is there any way to configure this maxRegression or, more generally, to make XCTestCase.measure() usable for fast blocks?

At the moment the only workaround I have is to artificially increase the amount of work being done in the measure blocks, but this has 2 very annoying drawbacks:
- the test is much slower than needed
- this arbitrarily chosen amount of work only allows detecting regressions on my own hardware. If the test is run on faster hardware, the amount of measured work needs to be increased again, so this makes my test code not future-proof at all.
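A sketch of another possible workaround, assuming the work under test can be wrapped in a helper: time the block with a monotonic clock and assert directly against an explicit per-frame budget, bypassing the baseline thresholds entirely. renderOneFrame(), the iteration count, and the 16.7 ms budget below are hypothetical placeholders, not code from the original post.

import Dispatch
import XCTest

final class FrameBudgetTests: XCTestCase {
    // Hypothetical stand-in for the real per-frame work whose regressions should be caught.
    func renderOneFrame() {
        // ... real rendering code goes here ...
    }

    func testAverageFrameTimeStaysUnder60fpsBudget() {
        let iterations = 200
        let start = DispatchTime.now().uptimeNanoseconds
        for _ in 0..<iterations {
            renderOneFrame()
        }
        let elapsedSeconds = Double(DispatchTime.now().uptimeNanoseconds - start) / 1_000_000_000.0
        let averageSeconds = elapsedSeconds / Double(iterations)

        // 60 fps leaves roughly 16.7 ms per frame; fail as soon as the average exceeds it,
        // independently of any Xcode baseline, maxRegression or maxPercentRegression setting.
        XCTAssertLessThan(averageSeconds, 1.0 / 60.0,
                          "Average frame time is \(averageSeconds * 1000) ms, above the 16.7 ms budget")
    }
}

This trades the statistical reporting of measure() (10 runs, relative standard deviation, per-device baselines) for a hard threshold, so the budget needs enough headroom to absorb variance on slower CI machines.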
Posted by Ceylo. Last updated.
Post not yet marked as solved
0 Replies
808 Views
Hello, I made some modifications to a fragment shader to blend 4 textures instead of 2, and this made the shader awfully slow (32ms vs 8ms).

The input textures are 4K x 4K, RGBA8Unorm, which makes 64MB per texture, or 256MB for the 4 textures. At 60fps this would require about 15GB/s of bandwidth.

The test hardware is a 2015 MBP with an Intel i5-5257U and Iris 6100 graphics. According to Intel ARK, the max memory bandwidth for this CPU is 25.6GB/s. I assume (but I'm not so sure) that the memory bandwidth for the integrated GPU is also 25.6GB/s.

At this point I have the impression that my fragment shader (requiring 15GB/s) should run at a solid 60fps on the MBP, but with 32ms per frame (even without taking the WindowServer into account) it's obviously not the case. Here is the Metal fragment shader code:

half4 blendColors(half4 c1, half4 c2)
{
    // From https://en.wikipedia.org/wiki/Alpha_compositing#Alpha_blending
    const half4 dst = c1;
    const half4 src = c2;
    const half outA = src.a + dst.a * (1 - src.a);
    const half3 outRGB = outA == 0 ? half3(0)
        : (src.rgb * src.a + dst.rgb * dst.a * (1 - src.a)) / outA;
    return half4(outRGB, outA);
}

fragment float4 fragmentFunc(RasterizerData in [[stage_in]],
                             constant int& inputCount [[buffer(kInputImageCountIndex)]],
                             array<texture2d<half>, 4> inputs [[texture(kInputImageIndex)]])
{
    constexpr sampler currentSampler(mag_filter::nearest, min_filter::linear, mip_filter::nearest);
    half4 blendedSample(1.0);
    for (int i = 0; i < inputCount; ++i)
    {
        auto layerSample = inputs[i].sample(currentSampler, in.textureCoordinate);
        blendedSample = blendColors(blendedSample, layerSample);
    }
    return float4(blendedSample);
}

Here are the pipeline statistics reported by the GPU Frame Debugger:

https://artoverflow.io/downloads/pipeline%20statistics.png

And the performance metrics:

https://artoverflow.io/downloads/performance%20metrics.png

One very suspicious metric in my opinion is the L3 Cache Miss Rate, which was much lower before I added multiple input textures. This makes sense because each fragment does one sample from a texture, then one sample from another, etc., rather than many consecutive samples from the same single input texture, and every fragment execution does that. Note that the 4 input textures are mipmapped, but this capture was made with the Metal view displayed on a 4K display and the textures sampled without any zoom, so mipmapping should have no effect here.

If I were to reduce this cache miss rate, I would do blending from 2 textures only, but with 3 passes. Like blend tex A & B, then B & C, then C & D. But this implies reading 3 x 2 x 64MB and writing 3 x 64MB per frame. That makes 23GB/s read and 11GB/s write at 60fps. And that's assuming that read-write textures are available (not the case on the Intel GPU I tested). So this would be worse…

Are there recommendations about how to display blended textures more efficiently? I've been looking into MTLBlendFactor and MTLBlendOperation, which I suppose are the same operations as in OpenGL, but as I'm making a drawing app I want to support more blend modes than the ones natively supported. And according to https://gamedev.stackexchange.com/questions/17043/blend-modes-in-cocos2d-with-glblendfunc the built-in blend modes are not enough for that.
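For reference, a minimal sketch of how the fixed-function MTLBlendFactor/MTLBlendOperation path mentioned above could be configured in Swift. The shader function names, the .bgra8Unorm format, and makeBlendingPipeline itself are assumptions for illustration, and this only provides the standard source-over operator, not the extra blend modes a drawing app may need.

import Metal

// Sketch: enable fixed-function "source over" blending on the render pipeline,
// assuming hypothetical vertex/fragment functions and a BGRA8 drawable format.
func makeBlendingPipeline(device: MTLDevice, library: MTLLibrary) throws -> MTLRenderPipelineState {
    let descriptor = MTLRenderPipelineDescriptor()
    descriptor.vertexFunction = library.makeFunction(name: "vertexFunc")      // assumed name
    descriptor.fragmentFunction = library.makeFunction(name: "fragmentFunc")  // assumed name

    let attachment = descriptor.colorAttachments[0]!
    attachment.pixelFormat = .bgra8Unorm

    // Classic source-over compositing done by the blend unit instead of the shader
    // (equivalent to blendColors() above when the destination is opaque).
    attachment.isBlendingEnabled = true
    attachment.rgbBlendOperation = .add
    attachment.alphaBlendOperation = .add
    attachment.sourceRGBBlendFactor = .sourceAlpha
    attachment.sourceAlphaBlendFactor = .one
    attachment.destinationRGBBlendFactor = .oneMinusSourceAlpha
    attachment.destinationAlphaBlendFactor = .oneMinusSourceAlpha

    return try device.makeRenderPipelineState(descriptor: descriptor)
}

With blending enabled on the pipeline, each layer can be drawn in its own pass and combined with the existing attachment contents by the blend unit, so the fragment function only samples one texture per pass.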
Posted by Ceylo. Last updated.
Post not yet marked as solved
2 Replies
1.9k Views
Hello, I'm a bit new to signed macOS app distribution and I'm trying to use the Crashes panel of Xcode Organizer.

Currently it is empty and says "AppName has not been uploaded to App Store Connect to receive crash logs.". I made my app crash on another device where sharing crash data with app developers is enabled in System Preferences > Privacy.

Question is: does this mean that macOS apps distributed outside of the Mac App Store can't receive crash logs in Xcode Organizer? I'm talking about apps distributed with "Developer ID" rather than "Mac App Store" in Organizer's "Distribute App" panel. I've read https://help.apple.com/xcode/mac/current/#/deved2cca77d but it's not clear whether only App Store apps are concerned.
Posted by Ceylo. Last updated.