I found that the issue is that iOS switches to a different cache policy when there's no response from the DNS proxy:
returnCacheDataElseLoad: Use existing cache data, regardless of age or expiration date, loading from originating source only if there is no cached data.
In normal cases it uses:
useProtocolCachePolicy: Use the caching logic defined in the protocol implementation, if any, for a particular URL load request.
For future reference, one way to handle this is to verify the cache policy for intercepted browser flows within your Content Filter Providers. That way, you can add additional logic for flow filtering or remediation.
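A minimal sketch of such a check, assuming the intercepted flow exposes its URLRequest (as the `request` property of NEFilterBrowserFlow does). The helper name `mayBypassNetwork` is hypothetical and not part of any Apple API:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

/// Returns true when the request's cache policy allows it to be answered
/// from cache without revalidation, so the load may never reach the network
/// (and therefore never trigger a DNS lookup through the proxy).
/// Hypothetical helper for illustration only.
func mayBypassNetwork(_ request: URLRequest) -> Bool {
    switch request.cachePolicy {
    case .returnCacheDataElseLoad, .returnCacheDataDontLoad:
        return true
    default:
        return false
    }
}

// Sample input: a request switched to the policy described above.
var sampleRequest = URLRequest(url: URL(string: "https://example.com")!)
sampleRequest.cachePolicy = .returnCacheDataElseLoad
let needsExtraFiltering = mayBypassNetwork(sampleRequest)
```

In a Filter Data Provider, a result like `needsExtraFiltering` could feed whatever remediation logic you already apply to browser flows.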
After some testing, it appears that the main issue is related to browser caching. When a connection was previously established, the browser uses its private cache to load the resource on subsequent launches, bypassing the NEDNSProxyProvider
Is there a way to handle this behaviour without explicitly blocking socket/browser flows in NEFilterDataProvider?
The app is deployed via MDM, so if a solution involves a configuration payload for supervised devices, that would be a suitable option as well. Thank you!
Thank you! Filed a bug report: FB16148630
It's been a while, but I wanted to give a quick update for anyone who might encounter a similar issue.
In my case, the problem had two parts:
Parsing DNS Packets in a Custom DNS Proxy Provider
If you're using a custom DoH resolver in your DNS Proxy Provider, it's important to parse the entire DNS packet, not just the requested domain. I recommend using the wire format with the application/dns-message MIME type and sending the complete packet to your provider for resolution.
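A rough sketch of forwarding the complete packet in wire format (RFC 8484 POST). The function name, the `doh.example` URL, and the hand-built sample query are assumptions for illustration, not taken from the app:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Wraps a raw DNS query packet in an RFC 8484 POST request.
/// `resolverURL` is a placeholder; substitute your own DoH endpoint.
func makeDoHRequest(packet: Data, resolverURL: URL) -> URLRequest {
    var request = URLRequest(url: resolverURL)
    request.httpMethod = "POST"
    request.setValue("application/dns-message", forHTTPHeaderField: "Content-Type")
    request.setValue("application/dns-message", forHTTPHeaderField: "Accept")
    request.httpBody = packet  // the complete wire-format query, unmodified
    return request
}

// Sample input only: a hand-built query for "example.com", type A, class IN.
var sampleQuery: [UInt8] = [
    0x12, 0x34,  // ID
    0x01, 0x00,  // flags: RD = 1
    0x00, 0x01,  // QDCOUNT = 1
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // ANCOUNT, NSCOUNT, ARCOUNT = 0
]
sampleQuery += [7] + Array("example".utf8)
sampleQuery += [3] + Array("com".utf8)
sampleQuery += [0x00, 0x00, 0x01, 0x00, 0x01]  // root label, QTYPE = A, QCLASS = IN

let dohRequest = makeDoHRequest(
    packet: Data(sampleQuery),
    resolverURL: URL(string: "https://doh.example/dns-query")!  // hypothetical endpoint
)
```

With this approach the response body is already a complete DNS message, so it can be written back to the flow without re-serialising individual fields.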
Managing NWConnection Lifecycles
When using NWConnection for handling connections with local or remote resolvers, ensure proper lifecycle management for each instance of NWConnection. Use stateUpdateHandler to monitor the connection's state and release connections appropriately after they are completed.
A common issue is forgetting to release resources for completed connections, which leads to memory leaks that can be hard to detect. Some resources suggest setting stateUpdateHandler = nil and then calling .cancel() on the connection. However, simply calling .cancel() is enough, as it automatically releases all associated blocks and handlers.
Here's a snippet from the documentation to clarify:
/// Cancel the connection and release all associated handlers.
///
/// Cancel is asynchronous. The last callback will be to the `stateUpdateHandler` with the `.cancelled` state.
/// After that, all handlers are released to break retain cycles.
/// Subsequent calls to `cancel()` are ignored.
final public func cancel()
In my case, these two relatively small problems were causing significant and inconsistent failures within the app.
Thanks to Quinn for all the help!
[quote='810921022, DTS Engineer, /thread/764538?answerId=810921022#810921022']
Does this only happen when you’re debugging with Xcode? Or do you see it during deployment? For example, if you replace an older TestFlight build with a newer one?
[/quote]
It normally doesn't happen during debugging with Xcode. As for deployment, yes: when a user first installs the app on their device, it returns this exception, but toggling Wi-Fi fixes it.
It's not confirmed yet, but I think avoiding in-app packet serialisation might fix the problem. In my previous implementation, I noticed that some fields in the DNS packet were encoded as UInt8 or Bool, whereas Opcode and RCODE are 4-bit fields, Z is normally a 3-bit field, etc.
But after sending the packet back directly from DoH, I don't see any mismatch in the packet structure.
I will be able to verify whether that resolved the issue after testing the new build.
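To illustrate the field widths mentioned above, here is a small sketch that packs and unpacks the 16-bit flags word of the DNS header as laid out in RFC 1035, section 4.1.1. The `DNSFlags` type is hypothetical and is not the app's actual model:

```swift
/// The 16-bit flags word of a DNS header (RFC 1035, section 4.1.1).
/// Hypothetical type for illustration only.
struct DNSFlags {
    var isResponse: Bool          // QR, 1 bit
    var opcode: UInt8             // 4 bits
    var authoritative: Bool       // AA, 1 bit
    var truncated: Bool           // TC, 1 bit
    var recursionDesired: Bool    // RD, 1 bit
    var recursionAvailable: Bool  // RA, 1 bit
    var z: UInt8                  // 3 bits
    var rcode: UInt8              // 4 bits

    /// Unpacks the fields from the flags word (already converted from network byte order).
    init(rawValue: UInt16) {
        isResponse = (rawValue & 0x8000) != 0
        opcode = UInt8((rawValue >> 11) & 0x0F)
        authoritative = (rawValue & 0x0400) != 0
        truncated = (rawValue & 0x0200) != 0
        recursionDesired = (rawValue & 0x0100) != 0
        recursionAvailable = (rawValue & 0x0080) != 0
        z = UInt8((rawValue >> 4) & 0x07)
        rcode = UInt8(rawValue & 0x0F)
    }

    /// Packs the fields back into a single word, masking each to its bit width.
    var rawValue: UInt16 {
        var raw: UInt16 = 0
        if isResponse { raw |= 0x8000 }
        raw |= UInt16(opcode & 0x0F) << 11
        if authoritative { raw |= 0x0400 }
        if truncated { raw |= 0x0200 }
        if recursionDesired { raw |= 0x0100 }
        if recursionAvailable { raw |= 0x0080 }
        raw |= UInt16(z & 0x07) << 4
        raw |= UInt16(rcode & 0x0F)
        return raw
    }
}

// 0x8183: response, RD and RA set, RCODE 3 (name error).
let flags = DNSFlags(rawValue: 0x8183)
```

Storing Opcode, Z, and RCODE in wider types is fine as long as they are masked and shifted into place when the packet is rebuilt, which is the step that is easy to get wrong when serialising by hand.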
Thank you for your response!
It only happens during the first build, or if a new build is initiated from Xcode on top of an existing configuration (the app was previously installed).
After some investigation, I found that the packet hex is slightly different from the system resolver's response. Even then, the issue occurs only during the first build and is fixed by reconnecting to the network.
When this occurs, no URLSession request completes successfully until the network connection is toggled; after that it works fine again.
I found that packet hex is slightly different from system resolver's response
Regarding this, would it make more sense to receive an already serialised packet from DoH by using the dns-message content type? As of right now, all serialisation and deserialisation is handled in the DNS Proxy Provider.
Yes, it seems to work fine with NWConnection
The app is designed to have a selectable resolver (system resolver or custom DoH server). If the system resolver is in use, I use NWConnection; for the DoH resolver it's an HTTPS request with HTTP/3 enabled (the server only listens for HTTP/3).
private func handleNewFlow(_ flow: NEAppProxyUDPFlow) -> Bool {
    Task(priority: .high) { [weak self] in
        await self?.handleNewFlow(flow)
    }
    return true
}

private func handleNewFlow(_ flow: NEAppProxyUDPFlow) async {
    do {
        try await flow.open(withLocalEndpoint: flow.localEndpoint as? NWHostEndpoint)
        let datagrams = try await flow.readDatagrams()
        let results = await datagrams.parallelMap { [weak self] in
            let connection = DatagramConnection($0)
            let connectionType = self?.connectionType
            let resolverType = self?.resolverType
            let serverStatus = self?.serverStatus
            return await connection.transferData(
                status: serverStatus,
                resolverType: resolverType,
                connectionType: connectionType
            )
        }
        try await flow.writeDatagrams(results)
        flow.closeReadWithError(nil)
        flow.closeWriteWithError(nil)
    } catch {
        flow.closeReadWithError(error)
        flow.closeWriteWithError(error)
    }
}
In transferData there is a conditional call to either
private func resolveDatagramWithSystem(datagram: Datagram) async -> Data?
or
private func resolveDatagramWithDoH(
    question: DNSQuestion,
    packet: DNSRR,
    resolver: ProxyResolverType?,
    server: ServerType?
) async -> Data?
Here is what my resolveDatagramWithSystem looks like:
private func resolveDatagramWithSystem(datagram: Datagram) async -> Data? {
    do {
        let connection: NWConnection
        switch datagram.endpoint {
        case let .host(hostEndpoint):
            guard let port = Network.NWEndpoint.Port(hostEndpoint.port) else {
                throw NSError.unknown(thrownBy: Self.self)
            }
            let host = Network.NWEndpoint.Host(hostEndpoint.hostname)
            connection = NWConnection(host: host, port: port, using: .udp)
        case .bonjour:
            throw NSError.unknown(thrownBy: Self.self)
        }
        try await connection.establish(on: .datagramConnection)
        try await connection.send(content: datagram.packet)
        let message = try await connection.receiveMessage()
        let messageData = message.completeContent
        return messageData
    } catch {
        Logger.statistics.error("[DatagramConnection] - Failed to handle connection: \(error, privacy: .public)")
    }
    return nil
}
Hi, it's been a while, but I just wanted to give a quick update on the app and ask a couple of questions.
Ever since I changed the shared container access and the data sharing mechanism between the targets, the app no longer seems to crash with EXC_BREAKPOINT (SIGTRAP). However, the issue with the app not being able to find a server still persists.
Connection 4: received failure notification
Connection 4: failed to connect 12:8, reason 18 446 744 073 709 551 615
Connection 4: encountered error(12:8)
Task <01313C44-8C0D-4B29-8924-AB530B062FB7>.<3> HTTP load failed, 0/0 bytes (error code: 18 446 744 073 709 550 613 [12:8])
Task <01313C44-8C0D-4B29-8924-AB530B062FB7>.<3> finished with error [18 446 744 073 709 550 613] Error Domain=NSURLErrorDomain Code=-1003 "A server with the specified hostname could not be found." UserInfo={_kCFStreamErrorCodeKey=8, NSUnderlyingError=0x10c64cc50 {Error Domain=kCFErrorDomainCFNetwork Code=-1003 "(null)" UserInfo={_kCFStreamErrorDomainKey=12, _kCFStreamErrorCodeKey=8, _NSURLErrorNWResolutionReportKey=Resolved 0 endpoints in 5ms using unknown from cache, _NSURLErrorNWPathKey=satisfied (Path is satisfied), interface: en0[802.11], ipv4, dns, uses wifi}}, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <01313C44-8C0D-4B29-8924-AB530B062FB7>.<3>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
"LocalDataTask <01313C44-8C0D-4B29-8924-AB530B062FB7>.<3>"
), NSLocalizedDescription=A server with the specified hostname could not be found., NSErrorFailingURLStringKey=https://api_url, NSErrorFailingURLKey=https://api_url, _kCFStreamErrorDomainKey=12}
While investigating the issue, I found a couple of Network Extension articles from Apple. I took some advice from those articles regarding networking within an app that includes Network Extensions:
have separate URL session configurations for each target
use timeouts for outgoing requests, etc.
But it didn't really change anything.
Interestingly, just before the failed task occurs, the session protocols are printed as ["-"]. I guess this means the session failed to negotiate a protocol for the outgoing request.
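For reference, the two pieces of advice listed above can be sketched roughly as follows. The helper name and the 10 s / 30 s values are illustrative assumptions, not recommendations from the articles:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Builds a separate session configuration for one target, with explicit timeouts.
/// Hypothetical helper; the timeout values are placeholders.
func makeConfiguration(ephemeral: Bool) -> URLSessionConfiguration {
    let config: URLSessionConfiguration = ephemeral ? .ephemeral : .default
    config.timeoutIntervalForRequest = 10   // seconds to wait for additional data
    config.timeoutIntervalForResource = 30  // seconds allowed for the whole transfer
    return config
}

// One configuration per target, so the extension and the app never share session state.
let proxyConfiguration = makeConfiguration(ephemeral: true)
let appConfiguration = makeConfiguration(ephemeral: false)
```

Each call returns a fresh configuration object, so changes made for one target do not leak into the other.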
Here are the URLSession configurations that I use for the DNS Proxy Provider and my main target:
/// DNSProxy network service
public final class DNSProxyNetworkService: NSObject, Requestable, URLSessionTaskDelegate {
    static let shared = DNSProxyNetworkService()

    lazy var session: URLSession = {
        let config = URLSessionConfiguration.ephemeral
        return URLSession(
            configuration: config,
            delegate: self,
            delegateQueue: nil
        )
    }()
}

extension DNSProxyNetworkService {
    public func urlSession(_ session: URLSession, task: URLSessionTask, didFinishCollecting metrics: URLSessionTaskMetrics) {
        let protocols = metrics.transactionMetrics.map { $0.networkProtocolName ?? "-" }
        Logger.statistics.debug("[DNSProxyNetworkService] – session protocols: \(protocols, privacy: .public)")
    }
}

/// MainTarget network service
public final class MainTargetNetworkService: NSObject, Requestable, URLSessionTaskDelegate {
    static let shared = MainTargetNetworkService()

    lazy var session: URLSession = {
        let config = URLSessionConfiguration.default
        return URLSession(
            configuration: config,
            delegate: self,
            delegateQueue: nil
        )
    }()
}

extension MainTargetNetworkService {
    public func urlSession(_ session: URLSession, task: URLSessionTask, didFinishCollecting metrics: URLSessionTaskMetrics) {
        let protocols = metrics.transactionMetrics.map { $0.networkProtocolName ?? "-" }
        Logger.statistics.debug("[MainTargetNetworkService] – session protocols: \(protocols, privacy: .public)")
    }
}
Note: this issue mostly occurs when the build is initiated from Xcode while the device already has the app installed, or during the initial launch of the first build on the device.
I would be grateful for any advice or suggestions for further investigation of this issue, thank you!
[quote='806531022, DTS Engineer, /thread/764538?answerId=806531022#806531022']
OK. That should be possible by putting the data into an app group. The control provider will have read/write access to that app group; the data provider will only be able to read it.
And, yes, you will need some sort of concurrency control there (-:
[/quote]
Yep, that's exactly how I did it. My concurrency control for now is KVO on the shared container, with a serial queue for reads and async writes for the observed property within the Content Filter scope.
And thank you for your previous response. The number of crashes seems to have decreased a lot in the new build since I added some concurrency control to the DNS Proxy Extension. It still requires some investigation, but overall stability looks better.
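A minimal sketch of the serial-queue pattern described above (synchronous reads, asynchronous writes). The `SerialGuarded` type, the queue label, and the sample data are assumptions for illustration; this only serialises access within a single process:

```swift
import Foundation

/// A value guarded by a private serial queue.
/// Hypothetical type for illustration only.
final class SerialGuarded<Value> {
    private let queue = DispatchQueue(label: "example.serial-guarded")  // placeholder label
    private var value: Value

    init(_ initial: Value) {
        value = initial
    }

    /// Blocks until every previously queued write has run, then returns the value.
    func read() -> Value {
        queue.sync { value }
    }

    /// Returns immediately; the mutation runs later, in order, on the serial queue.
    func write(_ mutate: @escaping (inout Value) -> Void) {
        queue.async { mutate(&self.value) }
    }
}

// Sample usage with a documentation-range address as placeholder data.
let resolvedAddresses = SerialGuarded<[String: String]>([:])
resolvedAddresses.write { $0["example.com"] = "192.0.2.1" }
let current = resolvedAddresses.read()
```

Because the queue is serial and FIFO, a read issued after a write always observes that write. Do not call `read()` from inside a `write` closure, since `sync` on the same serial queue would deadlock.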
[quote='806159022, DTS Engineer, /thread/764538?answerId=806159022#806159022']
OTOH, if you want to write data in one provider and read it in another, things get more complex.
[/quote]
I think that's a great explanation for the problem I had with Core Data. Because my Content Filter is not limited to just the two providers, I think target membership for the custom controllers that add more logic to flow filtering could have granted the main target access to these components. That would explain why I received sandbox restriction errors.
I guess my next steps would be ensuring that concurrent access is handled properly and maybe bringing back Core Data for the Filter Data Provider.
Thank you!
[quote='806159022, DTS Engineer, /thread/764538?answerId=806159022#806159022']
Which provider is writing this data? And which provider is reading it?
[/quote]
In my case, the Filter Control Provider writes data received from the MDM configuration profile, then the Filter Data Provider reads this data and uses it for flow filtering. But my Filter Data Provider also writes some data about intercepted flows, which is later used for resolving them.
I'm not sure if this is right, so please correct me if I am wrong. Could one of the possible causes of my issue also be concurrent access to the same memory address?
Would the ideal approach be to rewrite some of the code to use FileManager under the app group for large data?
If you have other diagnostic tools enabled in Xcode, try disabling them (especially the malloc ones), as they place separate memory allocations on different virtual memory pages. Read more here
After disabling the other tools, clean your build folder and try to run your app again (use a physical device).
Tried to add Address Sanitizer but received the same runtime issue as here
upd: fixed by disabling other diagnostics tools 🥲
I see now. Quick question: could it be related to overuse of UserDefaults?
The reason I am asking is that some of my app's temporary data is stored in UserDefaults.
I will give you an example. Because my app uses a Content Filter (the Filter Data Provider has sandbox restrictions), I wasn't able to use FileManager or Core Data to store some information from the Filter Data Provider because access was denied. So I had to use UserDefaults.
I can't share the whole idea, but in a nutshell I needed to store resolved IPs from flows.
Additionally, I use UserDefaults for some data that is read from the MDM config profile and shared with UI components via KVO.