Avoiding the race between pause() verdict and resumeFlow()

Hello,
Is there a way to avoid a possible race between pausing and resuming the flow in the following example?

Code Block
class FilterDataProvider: NEFilterDataProvider {
    override func handleNewFlow(_ flow: NEFilterFlow) -> NEFilterNewFlowVerdict {
        // This block is submitted for async processing.
        DispatchQueue.global().async {
            let verdict: NEFilterNewFlowVerdict = self.makeDecision(flow) ? .allow() : .drop()
            self.resumeFlow(flow, with: verdict)
        }
        return .pause()
    }
}


Since the flow is paused after the handleNewFlow() function returns, it seems impossible to ensure that resumeFlow() is called after the flow is actually paused.

Is there any suggested way of avoiding this race?

Thanks
It sounds like you are seeing the NEFilterNewFlowVerdict for .allow() / .drop() being returned so fast that it is running into a race condition with .pause(), is that correct? If so, can you return allow/drop directly if the flow has not been paused yet? If not, can you provide more information on what you are seeing?


Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com
Hi, thanks for replying.
It seems to be working consistently, regardless of the order in which pause() and resumeFlow() are called.
See the code example below: usleep() forces the .pause() verdict to be delayed, and the connection is still established successfully.

So the only concern right now is that this might be undefined behavior that just happens to work (the comment on resumeFlow says it is invalid to resume a flow that is not paused), and that there seems to be no way of avoiding it when resumeFlow is called from a different thread.

Code Block swift
    DispatchQueue.global().async {
        let verdict: NEFilterNewFlowVerdict = .allow()
        os_log("flow resumed local endpoint %{public}@, remote endpoint %{public}@", localEndpoint, remoteEndpoint)
        self.resumeFlow(flow, with: verdict)
    }
    // Artificial delay so that resumeFlow() above very likely runs
    // before the .pause() verdict below is returned.
    usleep(1000)
    os_log("flow paused local endpoint %{public}@, remote endpoint %{public}@", localEndpoint, remoteEndpoint)
    return .pause()


Thanks.
I do not think you should have to use usleep here to make this work. The idea behind pausing a flow and then resuming it is to allow your application, or a user, to perform the logic to approve/deny the flow; the NEFilterNewFlowVerdict can then be delivered as drop/allow in the resumeFlow call. Knowing this, I think it would be good to take a look at how you are processing decisions on your flows. Are you running them through any filtering logic? What takes place there for you to make a decision on whether to allow/drop the flow?
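
For reference, that workflow looks roughly like this (just a sketch; RuleEngine and its evaluate(_:completion:) method stand in for whatever rule/approval logic you are running):

Code Block swift
override func handleNewFlow(_ flow: NEFilterFlow) -> NEFilterNewFlowVerdict {
    // Hand the flow to the rule/approval logic; this may take an
    // arbitrarily long time (user prompt, another process, etc.).
    RuleEngine.shared.evaluate(flow) { allow in
        // Deliver the final verdict once the decision is available.
        self.resumeFlow(flow, with: allow ? .allow() : .drop())
    }
    // Tell the framework to hold the flow in the meantime.
    return .pause()
}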


Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com
Right, I used usleep() for illustration purposes only, just to artificially force the "reverse" order of the operations.

Normally the processing within the async block takes longer and the order is "natural", i.e. the pause() verdict is returned first, the flow is paused, and then some time later resumeFlow() is called.
My point is that there is no way to actually enforce this order.

If the API allowed for something like this, there would be no need to worry about ordering:
Code Block swift
self.pauseFlow() // hypothetical: explicitly set the flow state to paused before the flow is handed off to the async handler
DispatchQueue.global().async {
    // do the async processing
    let verdict: NEFilterNewFlowVerdict = ok ? .allow() : .drop()
    self.resumeFlow(flow, with: verdict)
}
return .nop() // hypothetical "no-op" verdict that doesn't change the flow state

Or, if the documentation explicitly allowed calling resumeFlow at any time, stating that a pause verdict does not affect a flow that has already been resumed.

My point is that there is no way to actually enforce this order.

What looks to be happening is that the global queue is running your code block so quickly that it is breaking down the pause -> resume workflow. The resume logic is placed in a completion block with the understanding that filtering rules or some other approval logic will be needed to make a drop/allow decision on the flow. Are you not in need of doing this and can you make an immediate decision about the flow? If so, just do that without performing the pause -> resume workflow.
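
In other words, something along these lines, where pause -> resume is only used when the decision genuinely needs asynchronous work (a rough sketch; quickVerdict(for:) and slowDecision(for:completion:) are hypothetical stand-ins for your own logic):

Code Block swift
override func handleNewFlow(_ flow: NEFilterFlow) -> NEFilterNewFlowVerdict {
    // Fast path: if a decision is already available (e.g. cached),
    // return it directly and skip the pause -> resume workflow.
    if let verdict = self.quickVerdict(for: flow) {
        return verdict
    }
    // Slow path: pause the flow and resume it once the asynchronous
    // decision arrives.
    self.slowDecision(for: flow) { allow in
        self.resumeFlow(flow, with: allow ? .allow() : .drop())
    }
    return .pause()
}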


Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

What looks to be happening is that the global queue is running your code block so quickly that it is breaking down the pause -> resume workflow. The resume logic is placed in a completion block with the understanding that filtering rules or some other approval logic will be needed to make a drop/allow decision on the flow. Are you not in need of doing this and can you make an immediate decision about the flow?

That is exactly my problem... A call to the rule engine is made from inside the async block, but I cannot make any predictions about the timing up front. It could return instantly if, for example, no rules are configured or the result is cached, but it may also require a round trip to a different process or prompting the user, which takes considerable time.
2 stupid questions:
  • is the handleNewFlow() call made on a concurrent queue?

  • assuming it's not: why not just run the async call on the same queue as the handleNewFlow() call? This way, the async block will never run before the return .pause() call.


is the handleNewFlow() call made on a concurrent queue?

According to this answer https://developer.apple.com/forums/thread/656031?answerId=624473022#624473022, handleNewFlow() is called on a queue assigned specifically to the network criteria the network extension is filtering on.
It appears to be serial, so while one flow is being processed, no other flow can be handled. This is why asynchronous processing is required.

assuming it's not: why not just run the async call on the same queue as the handleNewFlow() call? This way, the async block will never run before the return .pause() call.

Theoretically, adding to the same queue could be a workaround... Submit a block to the same serial queue to be executed after handleNewFlow() returns, and then do a dispatch_async() from there. It doesn't seem like a good solution though, because (aside from being kludgy) it relies on undocumented behavior (the queue being serial).
Also, is there a blessed way to submit work to the current queue? dispatch_get_current_queue() has long been deprecated.
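
For illustration, that trampoline would look roughly like this. It is only a sketch: callbackQueue is hypothetical (there is no supported way to obtain the queue the provider callbacks are delivered on), and the whole thing leans on that queue being serial:

Code Block swift
override func handleNewFlow(_ flow: NEFilterFlow) -> NEFilterNewFlowVerdict {
    // Hypothetical: the serial queue handleNewFlow() is being called on.
    callbackQueue.async {
        // On a serial queue this block cannot start until the current
        // handleNewFlow() invocation has returned, so the .pause()
        // verdict has at least been handed back to the framework before
        // the decision work is kicked off.
        DispatchQueue.global().async {
            let verdict: NEFilterNewFlowVerdict = self.makeDecision(flow) ? .allow() : .drop()
            self.resumeFlow(flow, with: verdict)
        }
    }
    return .pause()
}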

The Endpoint Security framework uses a similar callback concept, but this sort of race is not a problem there, because the only way to authorize/deny an activity is to call one of the es_respond_*() functions; the callback itself doesn't need to return a decision.
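
Roughly the shape of the Endpoint Security pattern I mean (a simplified, untested sketch; real code also needs the Endpoint Security entitlement and error handling):

Code Block swift
import Dispatch
import EndpointSecurity

var client: OpaquePointer?
let result = es_new_client(&client) { client, message in
    // The handler returns nothing; the verdict is delivered later via
    // es_respond_auth_result(), so there is no pause/resume ordering
    // to get right.
    es_retain_message(message)
    DispatchQueue.global().async {
        let allow = true // stand-in for the real rule evaluation
        _ = es_respond_auth_result(client, message,
                                   allow ? ES_AUTH_RESULT_ALLOW : ES_AUTH_RESULT_DENY,
                                   false)
        es_release_message(message)
    }
}
if result == ES_NEW_CLIENT_RESULT_SUCCESS, let client = client {
    var events: [es_event_type_t] = [ES_EVENT_TYPE_AUTH_EXEC]
    _ = es_subscribe(client, &events, UInt32(events.count))
}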
If only there were a public queue property associated with the NEFilterFlow object…