I have a model that uses 'flatten', and when I converted it to a Core ML model and profiled it in Xcode on an iPhone XR, I noticed that 'flatten' was automatically converted to 'reshape'. However, the NPU does not support 'reshape'.
However, when I took the ResNet50 model from Apple's model page and profiled it in Xcode on the same iPhone XR, I can see a 'flatten' operator that runs on the NPU.
On the other hand, when I used the following code to convert ResNet50 from PyTorch and ran it through the Xcode performance report, the 'flatten' operation was converted to 'reshape', which then ran on the CPU.
So how do I keep the 'flatten' operator when converting to a Core ML model?
coremltools 7.1
iPhone XR
iOS 17.5.1
from torchvision import models
import coremltools as ct
import numpy as np  # needed for the dtype arguments below
import torch

network_name = "my_resnet50"

torch_model = models.resnet50(pretrained=True)
torch_model.eval()

width = 224
height = 224
example_input = torch.rand(1, 3, height, width)
traced_model = torch.jit.trace(torch_model, (example_input,))

model = ct.convert(
    traced_model,
    convert_to="neuralnetwork",
    inputs=[
        ct.TensorType(name="data", shape=example_input.shape, dtype=np.float32)
    ],
    outputs=[
        ct.TensorType(name="output", dtype=np.float32)
    ],
    compute_units=ct.ComputeUnit.CPU_AND_NE,
    minimum_deployment_target=ct.target.iOS14,
)
model.save("my_resnet.mlmodel")
ResNet50 on Resnet50.mlmodel
My Conversion of ResNet50
In Swift,
CVMetalTextureCacheCreateTextureFromImage returns a CVMetalTexture, and CVMetalTexture is a Swift class managed by ARC, so there is no need to call CVBufferRelease manually.
My question is: should I use a variable to keep a strong reference to it until the GPU has finished (i.e., until the addCompletedHandler callback fires)? A sketch of the pattern I mean is below.
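For reference, a minimal Objective-C sketch of the pattern I'm asking about (the encoding step and names like textureCache are placeholders; in Swift the equivalent would simply be capturing the CVMetalTexture in the completion closure, which keeps it alive the same way):

#import <CoreVideo/CoreVideo.h>
#import <Metal/Metal.h>

// Keep the CVMetalTexture alive until the command buffer completes.
static void EncodeWithPixelBuffer(id<MTLCommandBuffer> commandBuffer,
                                  CVMetalTextureCacheRef textureCache,
                                  CVPixelBufferRef pixelBuffer)
{
    size_t width  = CVPixelBufferGetWidth(pixelBuffer);
    size_t height = CVPixelBufferGetHeight(pixelBuffer);

    CVMetalTextureRef cvTexture = NULL;
    CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache, pixelBuffer,
                                              NULL, MTLPixelFormatBGRA8Unorm,
                                              width, height, 0, &cvTexture);
    id<MTLTexture> texture = CVMetalTextureGetTexture(cvTexture);

    // ... encode GPU work that samples `texture` here ...

    // The Create call returned a +1 reference; release it only after the GPU is done.
    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> cb) {
        CVBufferRelease(cvTexture);
    }];
}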
Platform: iPhone XR
System: iOS 17.3.1
Using the iPhone front camera (the normal camera), I configure the data output format as 'kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange' ('420v', video range).
I found that Cb and Cr stay inside [16, 240], but Y falls outside the range [16, 235], e.g. values of 240 or 255.
This means that after converting to RGB, some RGB components can be negative; clamping r, g, b to [0, 255] and then converting the clamped RGB back to YUV gives a YUV that differs from the original.
The maximum difference in the Y channel can be as large as 20.
Both processing on the pure CPU and using a Metal shader give this result.
CVPixelBuffer.h
kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange = '420v', /* Bi-Planar Component Y'CbCr 8-bit 4:2:0, video-range (luma=[16,235] chroma=[16,240]). baseAddr points to a big-endian CVPlanarPixelBufferInfo_YCbCrBiPlanar struct */
// ... some code ...
// configure camera data output format
NSDictionary* options = @{
    (__bridge NSString*)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange),
    //(__bridge NSString*)kCVPixelBufferMetalCompatibilityKey : @(YES),
};
[_videoDataOutput setVideoSettings:options];
// ... some code ...
- (void)captureOutput:(AVCaptureOutput *)output didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CVPixelBufferRef pixelBuffer = imageBuffer;
    CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
    uint8_t* yBase = (uint8_t*)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
    uint8_t* uvBase = (uint8_t*)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1);
    int imageWidth = (int)CVPixelBufferGetWidth(pixelBuffer); // 720
    int imageHeight = (int)CVPixelBufferGetHeight(pixelBuffer); // 1280
    int y_width = (int)CVPixelBufferGetWidthOfPlane (pixelBuffer, 0); // 720
    int y_height = (int)CVPixelBufferGetHeightOfPlane(pixelBuffer, 0); // 1280
    int uv_width = (int)CVPixelBufferGetWidthOfPlane (pixelBuffer, 1); // 360
    int uv_height = (int)CVPixelBufferGetHeightOfPlane(pixelBuffer, 1); // 640
    int y_stride = (int)CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);
    int uv_stride = (int)CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 1); // 768
    // check the Y plane
    if (TRUE) {
        for (int i = 0; i < imageHeight; i++) {
            for (int j = 0; j < imageWidth; j++) {
                uint8_t nv12pixel = *(yBase + y_stride * i + j);
                if (nv12pixel < 16 || nv12pixel > 235) { // nominal range [16, 235]
                    NSLog(@"%s: Y plane out of range, coord (x:%d, y:%d), h-coord (x:%d, y:%d) ; nv12 %u "
                          ,__FUNCTION__
                          ,j ,i
                          ,j/2, i/2
                          ,nv12pixel );
                }
            }
        }
    }
    // unlock pairs with the lock above
    CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
}
// ... some code ...
How to deal with this case ?
Hoping for a reply, thanks.
After installing Xcode,
llvm-dwarfdump, llvm-objdump, etc. can be found under
"/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/",
but llvm-symbolizer is NOT there.
Should I install LLVM myself if I need llvm-symbolizer? Trying brew gives:
brew install llvm
Error: llvm: the bottle needs the Apple Command Line Tools to be installed.
You can install them, if desired, with:
xcode-select --install
If you're feeling brave, you can try to install from source with:
brew install --build-from-source llvm
Device: iPhone XR (iOS 14.2)
Xcode: 13.4.1
Tool: Instruments -- Allocations
When I use 'Allocations' to check for memory leaks,
I found that VM: Stack retains many allocations from pthread_create or pthread_join that have not been released (pthread_create is called either through the POSIX API directly or via std::thread).
I made sure that pthread_create and pthread_join are called in pairs for every thread (no thread exits without being joined); a minimal sketch of the pattern is below.
But the 'Allocations' VM: Stack category shows that something created in pthread_create and something created in pthread_join is not released.
So, will the things created by pthread_create or pthread_join be recycled by the system later?
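The create/join pairing I mean, as a minimal sketch (Worker is a hypothetical thread function):

#include <pthread.h>
#include <stddef.h>

static void *Worker(void *arg)
{
    // ... do some work ...
    return NULL;
}

static void RunOneThread(void)
{
    pthread_t thread;
    if (pthread_create(&thread, NULL, Worker, NULL) == 0) {
        // Every successfully created thread is joined, so nothing should leak.
        pthread_join(thread, NULL);
    }
}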
Xcode 13.4.1, Instruments: System Trace.
In the Narrative view,
it shows data with wall-clock time, like
"00:05.082.832 Called "psynch_cvwait" for 16.32 ms.".
In "Summary: System Calls",
it only summarizes the CPU time of "psynch_cvwait", with no wall-clock time.
But sometimes I want to know the wall-clock time, and for now I have to filter "psynch_cvwait" in the Narrative and add the durations up manually.
Is it possible to add a sum of wall-clock time to "Summary: System Calls"?
Thanks
Following the document and demo
mixing_metal_and_opengl_rendering_in_a_view,
the section "Select a Compatible Pixel Format" only shows MTLPixelFormatBGRA8Unorm-based combinations, as listed below.
If I want to use MTLPixelFormatRGBA8Unorm, how can I find the CoreVideo pixel format and GL format that match MTLPixelFormatRGBA8Unorm? (My untested guess is after the table.)
Thanks in advance.
// Table of equivalent formats across CoreVideo, Metal, and OpenGL
static const AAPLTextureFormatInfo AAPLInteropFormatTable[] =
{
// Core Video Pixel Format, Metal Pixel Format, GL internalformat, GL format, GL type
{ kCVPixelFormatType_32BGRA, MTLPixelFormatBGRA8Unorm, GL_RGBA, GL_BGRA_EXT, GL_UNSIGNED_INT_8_8_8_8_REV },
#if TARGET_IOS
{ kCVPixelFormatType_32BGRA, MTLPixelFormatBGRA8Unorm_sRGB, GL_RGBA, GL_BGRA_EXT, GL_UNSIGNED_INT_8_8_8_8_REV },
#else
{ kCVPixelFormatType_ARGB2101010LEPacked, MTLPixelFormatBGR10A2Unorm, GL_RGB10_A2, GL_BGRA, GL_UNSIGNED_INT_2_10_10_10_REV },
{ kCVPixelFormatType_32BGRA, MTLPixelFormatBGRA8Unorm_sRGB, GL_SRGB8_ALPHA8, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV },
{ kCVPixelFormatType_64RGBAHalf, MTLPixelFormatRGBA16Float, GL_RGBA, GL_RGBA, GL_HALF_FLOAT },
#endif
};
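My untested guess, following the sample's pattern (assuming kCVPixelFormatType_32RGBA, which is byte-order R,G,B,A, is accepted by both texture caches):

// Untested guess following the AAPLTextureFormatInfo pattern above:
// kCVPixelFormatType_32RGBA should pair with MTLPixelFormatRGBA8Unorm
// and plain GL_RGBA / GL_UNSIGNED_BYTE.
static const AAPLTextureFormatInfo AAPLInteropFormatTableRGBA[] =
{
    // Core Video Pixel Format,  Metal Pixel Format,       GL internalformat, GL format, GL type
    { kCVPixelFormatType_32RGBA, MTLPixelFormatRGBA8Unorm, GL_RGBA,           GL_RGBA,   GL_UNSIGNED_BYTE },
};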
Target Platform: iPhone XR
Xcode: 12.4
After setting
"Enable Malloc Scribble"
"Malloc Guard Edges"
"Guard Malloc"
in Diagnostics and
"MallocCheckHeapEach=1"
"MallocCheckHeapSleep=100"
"MallocCheckHeapStart=100000"
in Environment Variables,
I start up the app on the iPhone and get the following information:
xxxx(1394,0x16f933000) malloc: *** MallocCheckHeap: FAILED check at operation #7444968
Stack for last operation where the malloc check succeeded: 0x1aefaed70 0x1aefa2f94 0x112f20540 0x1e904e76c 0x1e905a5e8 0x1e9054bf4 0x1e9035fc0 0x1b5ec57c4 0x112f216c0 0x112f25000 0x112f24e7c 0x1b5ec5268 0x1b5ed1348 0x1b5ed0e40 0x1a0b103f8 0x1a0b0e9a4 0x1a07a751c 0x1a0aef310 0x1a07afb74 0x1a07b6d38 0x1a0b1511c 0x1a0b12b28 0x1a02c2cc8 0x1a02bbac4 0x1a02bc7b0 0x1a0336028 0x1a02bb3c0 0x1a0336b60 0x1a0335344 0x1a03354c0 0x112f1fbcc 0x112f216c0 0x112f29354 0x112f2a0f4 0x112f2b5e4 0x112f36644 0x1e901c804 0x1e902375c
(Use 'atos' for a symbolic stack)
xxxx(1394,0x16f933000) malloc: *** Will sleep for 100 seconds to leave time to attach
xxxx(1394,0x16f933000) malloc: *** check: incorrect tiny region 44, counter=28255155
*** invariant broken for tiny block 0x13628fea0 this msize=0 - size is too small
xxxx(1394,0x16f933000) malloc: *** set a breakpoint in malloc_error_break to debug
xxxx(1394,0x16f933000) malloc: *** sleeping to help debug
Q.1 "Stack for last operation where the malloc check succeeded" means what ?
Q.2 the address is 'stack address' ? e.g 0x1aefaed70.
Following the hints "(Use 'atos' for a symbolic stack) ", I get nothing for 0x1aefaed70
$atos -o ./DerivedData/Build/Products/Debug-iphoneos/xxxx.app.dSYM/Contents/Resources/DWARF/xxxx -arch arm64 -l 0x10225c000 0x10225c000
0x0000000100000000 (in xxxx)
$atos -o ./DerivedData/Build/Products/Debug-iphoneos/xxxx.app.dSYM/Contents/Resources/DWARF/xxxx -arch arm64 -l 0x10225c000 0x1aefaed70
0x1aefaed70
(nothing)
0x10225c000 is load adress getting from AppDelegate after app start up.
// Log the load addresses of the app binary and our dylib among all loaded images.
uint32_t numImages = _dyld_image_count();
for (uint32_t i = 0; i < numImages; i++) {
    const struct mach_header *header = _dyld_get_image_header(i);
    const char *name = _dyld_get_image_name(i);
    const char *p = strrchr(name, '/');
    if (p && (strcmp(p + 1, "xxxx") == 0 || strcmp(p + 1, "libXxx.dylib") == 0)) {
        NSLog(@"module=%s, address=%p", p + 1, header);
    }
}
Following the page
MallocDebug:
after enabling Malloc Scribble,
freed buffers are filled with 0x55,
and newly malloc'd buffers are filled with 0xAA.
Can I change 0x55 and 0xAA to other values, e.g. 0xFF?
I have a simple vertex shader in a .metal file,
and then use
metal.exe -std=ios-metal1.1 -mios-version-min=8.0 -c test.metal -o test.air
metallib.exe test.air -o test.metallib
metal.exe/metallib.exe are under the folder "Metal Developer Tools"/ios/bin/ ("Metal Developer Tools for Windows").
I found that the .metallib file (3434 bytes) is bigger than the original .metal file (995 bytes).
Is that right? How is that explained?
iOS: 14.2, iPhone XR
I downloaded the demo from "https://developer.apple.com/documentation/metal/mixing_metal_and_opengl_rendering_in_a_view?language=objc"
and it plays normally.
Then I changed the code as follows;
the purpose is to switch between "mixed rendering" and "OpenGL-only rendering", and to re-create the AAPLMetalRenderer each time "mixed rendering" is re-entered.
A 'bug' phenomenon appears: each time I go from "OpenGL only" to "mixed rendering", the first frame displays an old picture (the last picture from just before the previous switch from "mixed rendering" to "OpenGL only").
On that first frame the code re-creates the AAPLMetalRenderer and calls drawToInteropTexture, but it seems the interop texture has not been 'updated' yet (or OpenGL's draw does not wait for Metal to finish rendering into the interop texture?).
So my question is: how do Metal and OpenGL synchronize? (The kind of CPU-side wait I have in mind is sketched at the end of this post.)
int counter = 0;
bool currentMetal = false;

- (void)draw:(id)sender
{
    [EAGLContext setCurrentContext:_context];
    counter++;
    counter = counter % 180;
    if (counter < 90)
    {
        bool waitForFinish = false;
        if (!currentMetal) // re-entry into "mixed rendering"
        {
            // re-create the Metal renderer
            _metalRenderer = nil;
            _metalRenderer = [[AAPLMetalRenderer alloc] initWithDevice:_metalDevice colorPixelFormat:AAPLOpenGLViewInteropPixelFormat];
            [_metalRenderer useTextureFromFileAsBaseMap];
            [_metalRenderer resize:AAPLInteropTextureSize];
        }
        currentMetal = true;
        [_metalRenderer drawToInteropTexture:_interopTexture.metalTexture waitForFinish:waitForFinish];
        [_openGLRenderer draw];
    }
    else
    {
        [_metalRenderer justUpdate]; // no Metal rendering
        [_openGLRenderer justClear]; // just clear OpenGL's FBO
        currentMetal = false;
    }
    glBindRenderbuffer(GL_RENDERBUFFER, _colorRenderbuffer);
    [_context presentRenderbuffer:GL_RENDERBUFFER];
}
The _openGLRenderer justClear method is below:
- (void)justClear
{
    glBindFramebuffer(GL_FRAMEBUFFER, _defaultFBOName);
    glClearColor(0.5, 0.5, 0.5, 1);
    glClear(GL_COLOR_BUFFER_BIT);
}
The _metalRenderer justUpdate method is below:
- (void)justUpdate
{
    [self updateState];
}
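For reference, the kind of CPU-side wait I have in mind, as a sketch. It assumes drawToInteropTexture commits a Metal command buffer internally (this is presumably what its waitForFinish: parameter controls; I pass false above), and _commandQueue plus the encoding step are placeholders for whatever the sample actually does:

// Sketch of a CPU-side fence between Metal and OpenGL.
- (void)drawToInteropTextureThenWait
{
    id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
    // ... encode the render pass that renders into _interopTexture.metalTexture ...
    [commandBuffer commit];
    // Block the CPU until Metal has finished writing the texture, so the
    // OpenGL draw issued afterwards samples the updated interop texture.
    [commandBuffer waitUntilCompleted];
}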