Intermittently browser is getting disconnected with NetworkExtension

Hi,

Greetings for the day!

We would like to update you that we have created Content Filter NetworkExtension and this extension is working fine till Big Sur M1 however we are facing some strange problem in M1 Monterey. Intermittently, When we try to browse websites, it does not respond and after 3-5 minutes its opened the websites correctly.

We would like to update you that our subclass overrides handleNewFlow, handleInboundDataFromFlow, handleOutboundDataFromFlow, handleInboundDataCompleteForFlow and handleOutboundDataCompleteForFlow. In all these methods we first check whether NEFilterFlow is nil or not and then pauseVerdict and once asynchronous methods completes execution then we call resumeFlow with verdict (allowVerdict/dropVerdict).

When above mentioned issue generated we collected console streaming log and found these lines in the logs (Not from our application):

Ignoring resume command for flow 3c8faf3c4a9f7 which does not exist

Ignoring resume command for flow 3c90795d4d6f9 which does not exist

Ignoring resume command for flow 3c9086d1ede69 which does not exist

Ignoring resume command for flow 3c909b251d53b which does not exist

We are not sure how above line get printed because we don’t have this logs in our source code so we would need your help to understand this problem and resolution so that we can solve this issue.

We have couple of extra queries:

  1. What is flow mentioned in above logs in bold text? Is it NEFilterFlow's identifier or something else?
  2. How we can validate whether NEFilterFlow is valid or not before calling resumeFlow
  3. Why above line is getting printed in log which says flow does not exist.
  4. Is there any timeout maintained by NetworkExtension?
  5. We are using XPC for interprocess communication so our question is that, Is NetworkExtension/XPC maintain the queue size and if it overflow the size then above line is getting printed. If this is the case then how we can handle that?
  6. Is it known issue in NetworkExtension framework itself on M1 Monterey?

Thanks & Regards,

Mohmad Vasim

We are not sure how above line get printed because we don’t have this logs in our source code Why above line is getting printed in log which says flow does not exist.

This log looks like it's being emitted from our system.

How we can validate whether NEFilterFlow is valid or not before calling resumeFlow

There is no API to tell if the flow is valid or not. Certainly if the flow has been dropped then it will be invalid but your app's code should be able to detect this. Also, if your app pauses a UDP based flow for more than 10 seconds then the flow will be dropped automatically, so be mindful of this behavior.

Is there any timeout maintained by NetworkExtension?

TCP based flows can be paused forever, UDP based flows can only be paused for 10 seconds before they are dropped by the system.

We are using XPC for interprocess communication so our question is that, Is NetworkExtension/XPC maintain the queue size and if it overflow the size then above line is getting printed. If this is the case then how we can handle that?

The logs that added in your post do not look like they are the result of XPC issues in your network extension.

Is it known issue in NetworkExtension framework itself on M1 Monterey?

Like every framework Network Extension has bugs. My advice here would be if you can reproduce this issue with the SimpleFirewall example on an M1 Mac then it would be grounds for a bug report.

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

Thanks Matt for your valuable response!

There is no API to tell if the flow is valid or not. Certainly if the flow has been dropped then it will be invalid but your app's code should be able to detect this. Also, if your app pauses a UDP based flow for more than 10 seconds then the flow will be dropped automatically, so be mindful of this behavior.

if app pause UDP flow more than 10 seconds and we see "Ignoring resume command for flow 3c8faf3c4a9f7 which does not exist" does any property of NEFilterFlow say remoteEndpoint or some other property is set to nil which we can check before invoking resumeFlow call.

This log looks like it's being emitted from our system.

As its very intermittent to reproduce the problem It would be appreciated if you elaborate more on this and provide us some hints why and when it will be generated by seeing the system code so that we will focus on that and will try to generate this again.

Thanks & Regards Mohmad Vasim

does any property of NEFilterFlow say remoteEndpoint or some other property is set to nil which we can check before invoking resumeFlow call.

Not that I'm aware of. You could check the flow description in your logs, but I would only use this for debugging purposes.

Regarding:

Ignoring resume command for flow 3c9086d1ede69 which does not exist It would be appreciated if you elaborate more on this and provide us some hints why and when it will be generated by seeing the system code so that we will focus on that and will try to generate this again.

I looked at this under the hood and this log is indeed generated by our Network Extension framework code when resumeFlow is called on a NEFilterFlow that is now consider nil by the provider. It looks like the main reasons this can happen is that flow identifier is lost, or becomes nil because it cannot be picked up as a NEFilterSocketFlow or NEFilterBrowserFlow (NEFilterBrowserFlow being for iOS only).

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

Thanks for your response Meaton!

if app pause UDP flow more than 10 seconds and we see "Ignoring resume command for flow 3c8faf3c4a9f7 which does not exist" does any property of NEFilterFlow say remoteEndpoint or some other property is set to nil which we can check before invoking resumeFlow call.

We have removed monitoring of UDP flow but still we are seeing the issue intermittently. In addition, we have nil check for the flow.

Below is the way that we are using to check the flow whether its nil or not before calling resumeFlow

if (flow.identifier != nil )
 {
       [self resumeFlow:flow withVerdict:[NEFilterNewFlowVerdict dropVerdict]];
 }

But still we can see OS exception in console streaming log and because of this we are thinking that there is delay in opening websites. Here is the OS exception:

error	10:46:45.714736+0530	trustd	__nwlog_err_simulate_crash simulate crash already simulated "nw_parameters_copy_context called with null parameters"
error	10:46:45.715296+0530	trustd	nw_parameters_copy_context called with null parameters, dumping backtrace:
        [x86_64] libnetcore-2750.120.19.0.1
    0   libnetwork.dylib                    0x00007ff8101bc7f8 __nw_create_backtrace_string + 120
    1   libnetwork.dylib                    0x00007ff80f9c79aa nw_parameters_copy_context + 250
    2   libnetwork.dylib                    0x00007ff8100e6658 nw_protocol_ipv4_connected + 88
    3   libnetwork.dylib                    0x00007ff80fe97ea9 nw_channel_connect + 105
    4   libusrtcp.dylib                     0x00007ff818d44477 nw_protocol_tcp_connect + 935
    5   libnetworkextension.dylib           0x00007ff81c9e1793 ne_filter_protocol_connect + 388
    6   libnetworkextension.dylib           0x00007ff81c9e381f ne_filter_process_verdict + 2832
    7   libnetworkextension.dylib           0x00007ff81c9eabc6 __ne_filter_data_protocol_send_new_flow_block_invoke.65 + 148
    8   libdispatch.dylib                   0x00007ff80b73c0cc _dispatch_call_block_and_release + 12
    9   libdispatch.dylib                   0x000<…>

Please check and help us if there is any other way to check whether flow is nil or not before calling resumeFlow API or any other way to avoid such error/exception.

error 10:46:45.715296+0530 trustd nw_parameters_copy_context called with null parameters, dumping backtrace:

This error looks to me like the flow's nw_paramater_t value is not being set correct when trustd is trying to access it. Or it is nil when it's accessed. Is this flow being altered in any way for a particular path?

Hi Meaton,

This error looks to me like the flow's nw_paramater_t value is not being set correct when trustd is trying to access it.

I would like to update you that its not only happening with trustd, its also happening with other process too. Here are couple of process log that we collected from console streaming:

error	10:34:17.607954+0530	com.apple.WebKit.Networking	__nwlog_err_simulate_crash simulate crash already simulated "nw_path_copy_flow_registration called with null context"
error	10:34:17.608175+0530	com.apple.WebKit.Networking	nw_path_copy_flow_registration called with null context, dumping backtrace:
        [arm64] libnetcore-2750.120.19.0.1
    0   libnetwork.dylib                    0x000000019b292f94 __nw_create_backtrace_string + 192
    1   libnetwork.dylib                    0x000000019aad9c84 nw_path_copy_flow_registration + 484
    2   libnetwork.dylib                    0x000000019b1b4644 nw_protocol_ipv4_connected + 108
    3   libnetwork.dylib                    0x000000019af1dde8 nw_channel_connect + 128
    4   libusrtcp.dylib                     0x00000001a3935a18 nw_protocol_tcp_connect + 208
    5   libnetworkextension.dylib           0x00000001a7751bbc ne_filter_protocol_connect + 156
    6   libnetworkextension.dylib           0x00000001a77537d0 ne_filter_process_verdict + 1632
    7   libnetworkextension.dylib           0x00000001a775a1bc __ne_filter_data_protocol_send_new_flow_block_invoke.72 + 160
    8   libdispatch.dylib                   0x000000019664a5f0 _dispatch_call_block_and_release + 32
    9   libdispatch.dylib                <…>
error	10:34:17.615360+0530	nsurlsessiond	nw_parameters_copy_context called with null parameters, dumping backtrace:
        [arm64] libnetcore-2750.120.19.0.1
    0   libnetwork.dylib                    0x000000019b292f94 __nw_create_backtrace_string + 192
    1   libnetwork.dylib                    0x000000019a9ef808 nw_parameters_copy_context + 340
    2   libnetwork.dylib                    0x000000019b1b4638 nw_protocol_ipv4_connected + 96
    3   libnetwork.dylib                    0x000000019af1dde8 nw_channel_connect + 128
    4   libusrtcp.dylib                     0x00000001a3935a18 nw_protocol_tcp_connect + 208
    5   libnetworkextension.dylib           0x00000001a7751bbc ne_filter_protocol_connect + 156
    6   libnetworkextension.dylib           0x00000001a77537d0 ne_filter_process_verdict + 1632
    7   libnetworkextension.dylib           0x00000001a775a1bc __ne_filter_data_protocol_send_new_flow_block_invoke.72 + 160
    8   libdispatch.dylib                   0x000000019664a5f0 _dispatch_call_block_and_release + 32
    9   libdispatch.dylib                   0x0000<…>
fault	10:34:17.615843+0530	nsurlsessiond	nw_path_copy_flow_registration called with null context
error	10:47:42.210335+0530	Sourcetree	__nwlog_err_simulate_crash simulate crash already simulated "nw_parameters_copy_context called with null parameters"
error	10:47:42.210628+0530	Sourcetree	nw_parameters_copy_context called with null parameters, dumping backtrace:
        [arm64] libnetcore-2750.120.19.0.1
    0   libnetwork.dylib                    0x000000019b292f94 __nw_create_backtrace_string + 192
    1   libnetwork.dylib                    0x000000019a9ef808 nw_parameters_copy_context + 340
    2   libnetwork.dylib                    0x000000019b1b4638 nw_protocol_ipv4_connected + 96
    3   libnetwork.dylib                    0x000000019af1dde8 nw_channel_connect + 128
    4   libusrtcp.dylib                     0x00000001a3935a18 nw_protocol_tcp_connect + 208
    5   libnetworkextension.dylib           0x00000001a7751bbc ne_filter_protocol_connect + 156
    6   libnetworkextension.dylib           0x00000001a77537d0 ne_filter_process_verdict + 1632
    7   libnetworkextension.dylib           0x00000001a775a1bc __ne_filter_data_protocol_send_new_flow_block_invoke.72 + 160
    8   libdispatch.dylib                   0x000000019664a5f0 _dispatch_call_block_and_release + 32
    9   libdispatch.dylib                   0x0000<…>
error	10:47:42.210757+0530	Sourcetree	__nwlog_err_simulate_crash simulate crash already simulated "nw_path_copy_flow_registration called with null context"
error	10:47:42.210993+0530	Sourcetree	nw_path_copy_flow_registration called with null context, dumping backtrace:
        [arm64] libnetcore-2750.120.19.0.1
    0   libnetwork.dylib                    0x000000019b292f94 __nw_create_backtrace_string + 192
    1   libnetwork.dylib                    0x000000019aad9c84 nw_path_copy_flow_registration + 484
    2   libnetwork.dylib                    0x000000019b1b4644 nw_protocol_ipv4_connected + 108
    3   libnetwork.dylib                    0x000000019af1dde8 nw_channel_connect + 128
    4   libusrtcp.dylib                     0x00000001a3935a18 nw_protocol_tcp_connect + 208
    5   libnetworkextension.dylib           0x00000001a7751bbc ne_filter_protocol_connect + 156
    6   libnetworkextension.dylib           0x00000001a77537d0 ne_filter_process_verdict + 1632
    7   libnetworkextension.dylib           0x00000001a775a1bc __ne_filter_data_protocol_send_new_flow_block_invoke.72 + 160
    8   libdispatch.dylib                   0x000000019664a5f0 _dispatch_call_block_and_release + 32
    9   libdispatch.dylib                <…>

 Is this flow being altered in any way for a particular path?

No, we are not altering flow, we just check whether flow needs to be allowed or block and based on that we call resumeFlow with verdicts. Could you please help us to know if there is any way or API to check whether flow is valid before calling resumeFlow . Another question we have is that, is there any way we can check the state of flow, whether it's in pause state or not.

Thanks & Regards,

Mohmad Vasim

Could you please help us to know if there is any way or API to check whether flow is valid before calling resumeFlow

I am not aware of a scenario in-which you could make a check in your code to prevent this.

Regarding:

Another question we have is that, is there any way we can check the state of flow, whether it's in pause state or not.

This state is handled internal by the provider so there isn't a good way to determine the flow state.

Regarding:

Could you please help us to know if there is any way or API to check whether flow is valid before calling resumeFlow 

Not that I'm aware of. Question on this; with these client connections that are causing __nwlog_err_simulate_crash, are they going away on the client before the provider has a chance to process the NEFilterDataVerdict? In that case I could see a scenario where this may happen.

Thanks for quick response Meaton!

with these client connections that are causing __nwlog_err_simulate_crash, are they going away on the client before the provider has a chance to process the NEFilterDataVerdict?

it seems Yes. Please see the console streaming log, its getting printed once we call resumeFlow .

default	10:34:17.607249+0530	com.xyz.networkapp.systemextension	***: handleNewFlow: id:(81B3B366-F3AF-4BB2-A9FB-E0D8C7BF0F78) After ScanURLScheme host = (api.bitbucket.org) whetherToDrop = (0). and flow is not nil and allowVerdict and invoking resumeFlow call.

error	10:34:17.607645+0530	com.apple.WebKit.Networking	__nwlog_err_simulate_crash simulate crash already simulated "nw_parameters_copy_context called with null parameters"

error	10:34:17.607916+0530	com.apple.WebKit.Networking	nw_parameters_copy_context called with null parameters, dumping backtrace:

        [arm64] libnetcore-2750.120.19.0.1

    0   libnetwork.dylib                    0x000000019b292f94 __nw_create_backtrace_string + 192

    1   libnetwork.dylib                    0x000000019a9ef808 nw_parameters_copy_context + 340

    2   libnetwork.dylib                    0x000000019b1b4638 nw_protocol_ipv4_connected + 96

    3   libnetwork.dylib                    0x000000019af1dde8 nw_channel_connect + 128

    4   libusrtcp.dylib                     0x00000001a3935a18 nw_protocol_tcp_connect + 208

    5   libnetworkextension.dylib           0x00000001a7751bbc ne_filter_protocol_connect + 156

    6   libnetworkextension.dylib           0x00000001a77537d0 ne_filter_process_verdict + 1632

    7   libnetworkextension.dylib           0x00000001a775a1bc __ne_filter_data_protocol_send_new_flow_block_invoke.72 + 160

    8   libdispatch.dylib                   0x000000019664a5f0 _dispatch_call_block_and_release + 32

    9   libdispatch.dylib                   0x0000<…>

Thanks & Regards,

Mohmad Vasim

Hi Meaton,

Are you seeing this behavior for all client request or only certain ones?

Once issue generated, we are seeing this behaviour for all client request post that and Internet stops working but after sometime(10 -15 minutes) it gets resumed, browsing start working.

No, client is not cancelling the request before provider resume the flow.

Once issue generated, we are seeing this behaviour for all client request post that and Internet stops working but after sometime(10 -15 minutes) it gets resumed, browsing start working.

Okay, thank you for confirming. What I would do from here is reproduce this issue with the SimpleFirewall example and then take a sysdiagnose after you have reproduced the issue. From there open a Feedback with the exact time and date the issue took place as well as some logs from the sysdiagnose illustrating the issue. Please post the Feedback id here.

Thanks for your response, Meaton!

We have opened a Feedback and here is the feedback link - https://feedbackassistant.apple.com/feedback/11523930 and all required details are mentioned in feedback.

Thanks & Regards,

Mohmad Vasim

Thank you for opening the Feedback. I see in your steps to reproduce that you are opening your app and starting it from the terminal. Does this still reproduce if you open the app and start the Network System Extension from finder? The reason I ask is because it is typical for an application to behave different when opened with Finder versus the terminal and I am wonder if it has any affect here? Can you update your Feedback with this information once you have tried it?

Hi Meaton,

We have updated the feedback with requested information.

Intermittently browser is getting disconnected with NetworkExtension
 
 
Q