NetworkExtension crashes during packet read

We have an L3 VPN that uses the method NetworkExtension provides for reading packet objects before tunnelling them. We are currently seeing a random crash while NetworkExtension reads packet objects, and the crash does not give much detail about what is wrong with the packets. Here is the relevant stack trace, which does not point to anything in my code, making it difficult to find the root cause or fix it.

Code Block
Thread 3 name: Dispatch queue: NEPacketTunnelFlow queue
Thread 3 Crashed:
0 libobjc.A.dylib 0x00000001aca1748c objc_msgSend + 44
1 libobjc.A.dylib 0x00000001aca32d40 objc_getProperty + 140
2 NetworkExtension 0x00000001a9b05874 __61-[NEPacketTunnelFlow readPacketObjectsWithCompletionHandler:]_block_invoke + 120
3 NetworkExtension 0x00000001a9b5bf24 NEVirtualInterfaceReadMultiplePackets + 1024
4 NetworkExtension 0x00000001a9b5f478 __NEVirtualInterfaceCreateReadSource_block_invoke_2 + 76
5 libdispatch.dylib 0x00000001982ce280 _dispatch_client_callout + 16
6 libdispatch.dylib 0x0000000198273390 _dispatch_continuation_pop$VARIANT$mp + 412
7 libdispatch.dylib 0x00000001982840ac _dispatch_source_invoke$VARIANT$mp + 1308
8 libdispatch.dylib 0x0000000198276c94 _dispatch_lane_serial_drain$VARIANT$mp + 300
9 libdispatch.dylib 0x00000001982778a8 _dispatch_lane_invoke$VARIANT$mp + 424
10 libdispatch.dylib 0x0000000198281338 _dispatch_workloop_worker_thread + 712
11 libsystem_pthread.dylib 0x00000001e0e7d5a4 _pthread_wqthread + 272
12 libsystem_pthread.dylib 0x00000001e0e80874 start_wqthread + 8


The complete crash log can be found here:

The crash happens randomly after transferring several MBs of data. I have checked memory pressure, which seems to be well within limits when the crash hits. Any help or hint for figuring out the cause of this crash would be highly appreciated.

Replies

Having a segmentation fault in a Network Extension API call such as:

2 NetworkExtension 0x00000001a9b05874 __61-[NEPacketTunnelFlow readPacketObjectsWithCompletionHandler:]_block_invoke + 120

suggests that there is something wrong at a deeper level here. My advice would be to open a bug report here with this crash log. Another interesting data point would be if you can make this happen by just reading packets from the interface only. I realize your tunnel will not work in this case, but if you just plain read from the interface, do you also see this happening? If you run this test, I would add the result to your bug report.


Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

Another interesting data point would be if you can make this happen by just reading packets from the interface only.

I am not sure what you mean by that. Is there any other class I can read packets from in a packet tunnel? Can you elaborate a bit more on how I can do that?

One more thing I would like to add, not sure if related, is that we have an @autoreleasepool wrapping the code that writes packets back to the tunnelProvider.packetFlow. Could that be the cause of it?

Is there any other class I can read packets from in a packet tunnel? Can you elaborate a bit more on how I can do that?

NEPacketTunnelFlow is the correct class to use to read packets from the virtual interface.

Code Block Objective-C
[self.tunnel.packetFlow readPacketsWithCompletionHandler:^(NSArray<NSData *> * _Nonnull packets,
                                                           NSArray<NSNumber *> * _Nonnull protocols) {
    /* Read only here */
}];


My previous thought was that if you can trigger this crash by just reading these packets and not doing anything with them, this is absolutely a bug.
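As a rough sketch of such a read-only test, the helper below uses readPacketObjectsWithCompletionHandler: (the method that appears in the crashing frame), discards every batch, and then issues the next read. The function name StartReadOnlyLoop is hypothetical, and the sketch assumes it is called once from the provider after the tunnel has started.

```objc
#import <Foundation/Foundation.h>
#import <NetworkExtension/NetworkExtension.h>

// Hypothetical helper for a read-only test: packets are read from the
// virtual interface and dropped without any further processing.
static void StartReadOnlyLoop(NEPacketTunnelFlow *flow)
{
    if (flow == nil) {
        return; // Nothing to read from.
    }
    [flow readPacketObjectsWithCompletionHandler:^(NSArray<NEPacket *> * _Nonnull packets) {
        // Deliberately ignore the packets; the only goal is to see whether
        // the crash still reproduces when nothing else touches them.
        (void)packets;
        // The completion handler fires once per read, so re-arm the read here.
        StartReadOnlyLoop(flow);
    }];
}
```

The tunnel will not pass traffic while this is running, so it is only useful as a diagnostic for the bug report.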

Regarding:

One more thing I would like to add, not sure if related, is that we have an @autoreleasepool wrapping the code that writes packets back to the tunnelProvider.packetFlow. Could that be the cause of it?

Hard to say, but I suspect not as the pointer is not being destroyed so any future processing upstream is not affected. Are you experiencing a memory bottleneck here though that you need to use @autoreleasepool?


Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

My previous thought was that if you can trigger this crash by just reading these packets and not doing anything with them, this is absolutely a bug.

Ok, let me try that out.


Are you experiencing a memory bottleneck here though that you need to use @autoreleasepool?

Yes, if we don't wrap the writePackets call in @autoreleasepool, memory usage keeps increasing and hits the limit in a few minutes. We also checked whether any leaks in our code are causing this, but Instruments doesn't point to any. So far only @autoreleasepool has been able to keep the memory usage in check.
If I may add, I am not in favour of using @autoreleasepool around writing packets. But without it, the NSData and NSArray objects I create for writing packets and protocols never get released. They always appear as persistent memory in Instruments and keep adding to memory usage. Usually I would expect these (local) objects to be autoreleased under ARC.

I noticed another developer reported a similar issue 4 months back. I have found many other threads on similar issues, and all of them suggest using @autoreleasepool. But shouldn't NetworkExtension release those objects once they are consumed? Or is there a better way of handling this? I ask because I suspect @autoreleasepool is sometimes causing random crashes that are not in my code.

A few things here that you can try; first would be to implement flow control on your reads and writes. This would help control how much memory is being used at any given point, because you would have set an upper bound on how much memory you are willing to read before you write and clear out some space. Quinn wrote up an excellent article on this topic, called Network Extension Provider Memory Strategy. Next, instead of using an @autoreleasepool, you could test using a dispatch queue with the autorelease frequency of DISPATCH_AUTORELEASE_FREQUENCY_WORK_ITEM to see if that makes a difference.
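For illustration, a serial queue with that autorelease frequency could be created roughly as follows. The function name MakeWriteBackQueue and the queue label are hypothetical; the two dispatch calls are the standard libdispatch API.

```objc
#import <Foundation/Foundation.h>
#import <dispatch/dispatch.h>

// Hypothetical helper: returns a serial queue that pushes and pops an
// autorelease pool around every block submitted to it.
static dispatch_queue_t MakeWriteBackQueue(void)
{
    dispatch_queue_attr_t attr = dispatch_queue_attr_make_with_autorelease_frequency(
        DISPATCH_QUEUE_SERIAL, DISPATCH_AUTORELEASE_FREQUENCY_WORK_ITEM);
    // The label is a placeholder; use your own reverse-DNS name.
    return dispatch_queue_create("com.example.tunnel.writeback", attr);
}
```

Objects autoreleased inside a block on this queue are drained when that block finishes, rather than at some later point chosen by the system.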


Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

 first would be to implement flow control on your reads and writes

We already have a circular queue with limited capacity to hold the outbound packets read from NE. We do not have much flow control when writing inbound packets back to NE, because we are able to write to NE without any problems. We do not buffer or drop packets; instead, we write each packet to NE as soon as we receive it. I would expect NE to maintain a queue or similar mechanism when receiving inbound packets. Or would you suggest to manage inbound packets differently? Also, the problem here is that NE is retaining the packets passed to it. I do not see any packets persisting in my code. Only the NSData and NSArray objects I write to NE are persistent.


Next, instead of using an @autoreleasepool, you could test using a dispatch queue with the autorelease frequency of DISPATCH_AUTORELEASE_FREQUENCY_WORK_ITEM to see if that makes a difference. 

Wouldn't that be the same as having an @autoreleasepool inside a block scheduled on a different queue? What I have right now is this:

Code Block objc
-(size_t)writePackets:(const void*)buffer type:(uint32_t)type bufSize:(size_t)bufSize
{
    @autoreleasepool {
        NSData *pckts = [NSData dataWithBytes:buffer length:bufSize];
        NSArray *protocols = [NSArray arrayWithObjects:[NSNumber numberWithUnsignedInteger:type], nil];
        [self.packetFlow writePackets:@[pckts] withProtocols:protocols];
    }
    return bufSize;
}


What you are suggesting would result in something like this:

Code Block objc
-(size_t)writePackets:(const void*)buffer type:(uint32_t)type bufSize:(size_t)bufSize
{
    /* Note: only the buffer pointer is captured, so buffer must stay valid until this block runs. */
    dispatch_async(self.writeBackQueue, ^{
        @autoreleasepool { /* This pool is shown only for illustration; the queue attributes would add it implicitly. */
            NSData *pckts = [NSData dataWithBytes:buffer length:bufSize];
            NSArray *protocols = [NSArray arrayWithObjects:[NSNumber numberWithUnsignedInteger:type], nil];
            [self.packetFlow writePackets:@[pckts] withProtocols:protocols];
        }
    });
    return bufSize;
}

I don't see why the @autoreleasepool in the second version would be better than the one in the first. A few hints here would help me understand.

Or would you suggest to manage inbound packets differently?

I would read and then write, and not read again until you have written.

What you are suggesting would result in something like this:

No. I am suggesting that, coupled with the flow control strategy I mentioned above, you use your work-item queue only for writing packets, to see whether this clears memory more quickly than @autoreleasepool does, or whether there is any difference at all.
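One possible shape for this, as a sketch only: copy the bytes before dispatching, then perform just the write on the work-item queue. The function name EnqueuePacketWrite and its parameters are hypothetical, and the queue passed in is assumed to be a serial queue created with the DISPATCH_AUTORELEASE_FREQUENCY_WORK_ITEM attribute. It covers the write side only and does not implement the read-then-write flow control itself.

```objc
#import <Foundation/Foundation.h>
#import <NetworkExtension/NetworkExtension.h>

// Hypothetical helper: hands one inbound packet to the packet flow on a
// dedicated write queue. Returns the number of bytes accepted for writing.
static size_t EnqueuePacketWrite(NEPacketTunnelFlow *flow,
                                 dispatch_queue_t writeBackQueue,
                                 const void *buffer,
                                 size_t bufSize,
                                 uint32_t protocolFamily)
{
    // Copy the bytes now, because the caller may reuse or free the buffer
    // as soon as this function returns. alloc/init gives ARC an owned
    // object instead of one parked in the caller's autorelease pool.
    NSData *packet = [[NSData alloc] initWithBytes:buffer length:bufSize];

    dispatch_async(writeBackQueue, ^{
        // Temporaries created here are drained when this work item ends,
        // assuming the queue was created with the work-item frequency.
        [flow writePackets:@[packet]
             withProtocols:@[@(protocolFamily)]];
    });
    return bufSize;
}
```

Comparing memory in Instruments with this variant against the explicit @autoreleasepool version should show whether the queue-managed pool makes any difference.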


Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com