NSNetServiceBrowserDelegate's didRemoveService not being called

Some terminology first:

Publisher device - device publishing the NSNetService in the local. domain

Browser device - device browsing for service with NSNetServiceBrowser in the local. domain


The issue:

NSNetServiceBrowserDelegate's didRemoveService is not called on the browser device when both the publisher and the browser device are connected to the same router and the publisher device goes off the grid by enabling Airplane mode (I think turning Wi-Fi off has the same effect). When the publisher and the browser are not connected to a router but are communicating directly (peer-to-peer), the issue doesn't reproduce.


The problem is easily reproducible with Apple's WiTap sample code application (which no longer exists in Apple's sample code 😟, but can be found on the net).


Steps to reproduce:

1. Connect both devices to the same router

2. Run the WiTap application on both devices

3. Turn on Airplane mode on the publisher device


Result:

The browser device still displays the publisher's service


Publisher code

NSNetService *server = [[NSNetService alloc] initWithDomain:@"local." type:@"_myservice._tcp." name:@"my-id" port:0];
server.includesPeerToPeer = YES;
server.delegate = self;
[server publishWithOptions:NSNetServiceNoAutoRename|NSNetServiceListenForConnections];
self.server = server;


Browser code

NSNetServiceBrowser *browser = [[NSNetServiceBrowser alloc] init];
browser.includesPeerToPeer = YES;
browser.delegate = self;
[browser searchForServicesOfType:@"_myservice._tcp." inDomain:@"local."];
self.browser = browser;


Any ideas & suggestions will be highly appreciated!


TL;DR

I suspect that the problem is not directly related to NSNetService but rather to the Bonjour implementation. The reason I think so is that after force closing the browser app and re-running it (while the publisher device is still in Airplane mode), the browser device still shows the service, because the didFindService delegate method is called for a service that the publisher no longer publishes.


P.S. The issue reproduces on iOS 12 and below (tested down to iOS 7).


Thanks.

Accepted Reply

why doesn't the issue reproduce when the devices (publisher and browser) are not connected to a router?

I’d have to dig into the details to be sure, but I suspect it’s because in the peer-to-peer case the interface goes down, and that takes all mDNS records discovered over that interface with it.

Regarding the DNSServiceReconfirmRecord call - do you have a reference to a sample code that uses this function?

I have some very old code lying around for this. It’s not in a state I can post, but here’s a snippet:

err = DNSServiceReconfirmRecord(
    0,
    interfaceIndex,
    serviceName,
    kDNSServiceType_PTR,
    kDNSServiceClass_IN,
    (uint16_t) [recordData length],
    [recordData bytes]
);

Let’s look at the notable parameters:

  • As with all <dns_sd.h> calls, the interface index is the value coming back from if_nametoindex.

  • serviceName is a C string holding the FQDN of the Bonjour PTR record. For example, for an SSH service called “Guy Smiley” in the local domain this would be _ssh._tcp.local.

  • recordData is an NSData value containing the ‘stale’ value of the PTR record. Continuing the above example, this would be the labels Guy Smiley, _ssh, _tcp, and local, all packed in standard DNS name form (that is, each label prefixed by a length byte, with a trailing 0x00 to indicate the root name), or:

      0A477579 20536D69 6C657904 5F737368 
      045F7463 70056C6F 63616C00
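
The packing described in the last bullet can be sketched in plain C. This is a minimal illustration, not code from the thread: pack_dns_name is a hypothetical helper, and it assumes each label is already a UTF-8 C string of 1 to 63 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Packs `count` labels into DNS wire format: every label is prefixed by a
 * one-byte length, and the name ends with 0x00 (the root label).
 * Returns the number of bytes written, or 0 if a label is empty, longer
 * than 63 bytes, or the output buffer is too small.
 * NOTE: hypothetical helper for illustration; not part of <dns_sd.h>. */
static size_t pack_dns_name(const char *const labels[], size_t count,
                            uint8_t *out, size_t out_size)
{
    size_t used = 0;

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(labels[i]);   /* length in bytes, not characters */

        /* Leave room for the length byte, the label, and the final 0x00. */
        if (len == 0 || len > 63 || used + 1 + len + 1 > out_size) {
            return 0;
        }
        out[used++] = (uint8_t)len;
        memcpy(out + used, labels[i], len);
        used += len;
    }

    if (used + 1 > out_size) {
        return 0;
    }
    out[used++] = 0x00;                   /* root label terminates the name */
    return used;
}
```

Called with the labels "Guy Smiley", "_ssh", "_tcp" and "local", this should produce the 28 bytes shown in the hex dump above. The resulting buffer is what would be passed as the rdata argument, with its length as rdlen.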
    


What do you think of that approach?

Essentially what you’re doing here is adding a bunch of extra traffic to override the default mDNS TTL values. I don’t think that’s a good idea. Even the lowest-level DNS-SD API (DNSServiceRegister from <dns_sd.h>) does not let you override these defaults because getting these wrong can really punish your network. Section 10 of RFC 6762 lists these default values and explains how mDNS maintains cache coherency.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware
let myEmail = "eskimo" + "1" + "@apple.com"

Replies

This is expected behaviour. When you deregister a Bonjour service, the publishing device sends a ‘goodbye’ packet to let other devices know that the service is gone [1]. If you take the device offline without giving it a chance to send that packet (turning on Airplane Mode, turning off Wi-Fi, walking out of Wi-Fi range), the publishing device does not get an opportunity to send this goodbye packet and thus the service persists on clients until its various records expire.
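
To put rough numbers on "until its various records expire", here is a small C sketch based on my reading of RFC 6762. Section 10 recommends a TTL of 120 seconds for records tied to a host name and 75 minutes for other records, which would include the service PTR record. Section 5.2 has a querier with an active interest re-query at 80%, 85%, 90% and 95% of the TTL before the record is dropped. The names below are hypothetical, the small random jitter the RFC adds is ignored, and real implementations may use different values.

```c
#include <stdint.h>

/* Recommended TTLs from RFC 6762, Section 10 (assumed here; a given
 * implementation may choose other values). */
enum {
    MDNS_TTL_HOST_SECONDS  = 120,      /* records tied to a host name (A, AAAA, SRV) */
    MDNS_TTL_OTHER_SECONDS = 75 * 60   /* other records, e.g. the service PTR record */
};

/* Fills `offsets` with the times, in seconds after a record was received,
 * at which a querier would re-query it: 80%, 85%, 90% and 95% of the TTL.
 * If none of those queries is answered, the record expires at 100%.
 * NOTE: hypothetical helper for illustration only. */
static void mdns_refresh_offsets(uint32_t ttl_seconds, uint32_t offsets[4])
{
    static const uint32_t percent[4] = { 80, 85, 90, 95 };

    for (int i = 0; i < 4; i++) {
        /* 64-bit intermediate avoids overflow for very large TTLs. */
        offsets[i] = (uint32_t)(((uint64_t)ttl_seconds * percent[i]) / 100);
    }
}
```

With a 4500-second TTL the re-queries fall at 3600, 3825, 4050 and 4275 seconds, so under these assumptions a stale PTR record could linger for up to 75 minutes after it was last refreshed.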

Clients must be able to deal with this situation. How you deal with it depends on the specific requirements of your client. In the standard “user chooses a service and you try to connect” scenario, the connection will fail (because the service is offline) and you can deal with that failure as you would any other connection failure.

If a connection to a service fails, you may choose to make a reconfirm record request to indicate to the Bonjour subsystem that there might be something wrong with this service [2]. Doing this is tricky because there’s no high-level API; you have to use the low-level DNSServiceReconfirmRecord call.

Note: It would be nice if NSNetService (and other APIs that resolve then connect) did this for you. Alas, that’s not the case (r. 12123445).

If your client implements some other scenario, please post the details and I can offer advice based on that.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

[1] See Section 10.1 of RFC 6762 Multicast DNS.

[2] You can learn more about this in Section 10.4 of that same RFC.

Thank you, Quinn! As always, your response is very helpful.


I am just curious: why doesn't the issue reproduce when the devices (publisher and browser) are not connected to a router? It would be logical to assume that the issue should reproduce whether or not the devices are connected to the router.


Regarding the DNSServiceReconfirmRecord call - do you have a reference to a sample code that uses this function? I Googled it (and also Binged it, which I rarely do) and I didn't find any sample code. My search wasn't limited to iOS, and still, I didn't find anything useful.


Before I saw your answer, I thought of using a different approach to identify the stale services. I thought of changing the publisher to publish a timestamp in the TXT record at every predefined interval. The browser, on the other hand, could implement some sort of aging algorithm and mark services as stale after noticing that the timestamp in TXT record wasn't updated for some predefined time. I understand that this solution is fragile because it relies on the fact that the (wall) clock on all devices is synchronized. I guess there are more reasons not to pick this approach other than the time thing.


What do you think of that approach?

Eskimo, following your comments, I was able to successfully call DNSServiceReconfirmRecord and witness the stale service go away. Thanks a lot!


Here is the code

  #include <dns_sd.h>
  #include <net/if.h>  // if_nametoindex
  ...
  unsigned int ifindex = if_nametoindex("en0");
  
  NSString *serviceId = @"Guy Smiley";
  NSString *serviceType = @"_ssh._tcp.";
  NSString *serviceDomain = @"local.";
  
  // FQDN of the PTR record, here "_ssh._tcp.local."
  const char *fullName = [[serviceType stringByAppendingString:serviceDomain] UTF8String];
  
  // RDATA parts
  NSArray *typeArray = [serviceType componentsSeparatedByString:@"."];
  NSString *type = typeArray[0];
  NSString *transport = typeArray[1];
  NSString *domain = [serviceDomain componentsSeparatedByString:@"."][0];
  unsigned char zeroByte = 0;
  
  // Label lengths are counted in UTF-8 bytes, not UTF-16 code units
  unsigned char idLength = (unsigned char)[serviceId lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
  unsigned char typeLength = (unsigned char)[type lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
  unsigned char transportLength = (unsigned char)[transport lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
  unsigned char domainLength = (unsigned char)[domain lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
  
  // Each label is prefixed by its length; a zero byte (root label) ends the name
  NSMutableData *rdata = [[NSMutableData alloc] init];
  [rdata appendBytes:&idLength length:1];
  [rdata appendBytes:[serviceId UTF8String] length:idLength];
  [rdata appendBytes:&typeLength length:1];
  [rdata appendBytes:[type UTF8String] length:typeLength];
  [rdata appendBytes:&transportLength length:1];
  [rdata appendBytes:[transport UTF8String] length:transportLength];
  [rdata appendBytes:&domainLength length:1];
  [rdata appendBytes:[domain UTF8String] length:domainLength];
  [rdata appendBytes:&zeroByte length:1];
  
  DNSServiceErrorType err = DNSServiceReconfirmRecord(
    0, 
    ifindex, 
    fullName, 
    kDNSServiceType_PTR, 
    kDNSServiceClass_IN, 
    (uint16_t) rdata.length, 
    [rdata bytes]
  );
  if (err != kDNSServiceErr_NoError) {
    NSLog(@"DNSServiceReconfirmRecord failed with error %d", (int) err);
  }


I’d have to dig into the details to be sure, but I suspect it’s because in the peer-to-peer case the interface goes down, and that takes all mDNS records discovered over that interface with it.

Since the issue doesn't reproduce when the devices are connected peer-to-peer, do you think it will be sufficient to call DNSServiceReconfirmRecord with the index of the "en0" (Wi-Fi) interface? I already started writing code to retrieve the interface name from the socket FD, but now it seems like overkill to me.

Here is the code

Cool. Make sure to test that your code does the right thing if the service name contains a dot, for example, “Guy.Smiley”.
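
To illustrate the dot caveat: in wire format a label is length-prefixed, so the dot inside “Guy.Smiley” is just another byte. In the textual form of a full service name it has to be escaped, otherwise it reads as a label separator. Below is a minimal C sketch of that escaping, covering only dots and backslashes as described in Section 4.3 of RFC 6763. escape_instance_name is a hypothetical helper written for this example; for real use <dns_sd.h> provides DNSServiceConstructFullName, which as far as I know also escapes other characters.

```c
#include <stddef.h>

/* Writes `name` into `out` in escaped textual form, suitable as the first
 * label of a full service name: '.' and '\' are prefixed with a backslash.
 * Returns the length of the escaped string (excluding the NUL), or -1 if
 * `out` is too small.
 * NOTE: hypothetical helper; it does not escape control characters. */
static int escape_instance_name(const char *name, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0) {
        return -1;
    }
    for (const char *p = name; *p != '\0'; p++) {
        size_t need = (*p == '.' || *p == '\\') ? 2 : 1;

        if (used + need + 1 > out_size) {   /* +1 for the terminating NUL */
            return -1;
        }
        if (need == 2) {
            out[used++] = '\\';             /* escape the next character */
        }
        out[used++] = *p;
    }
    out[used] = '\0';
    return (int)used;
}
```

For “Guy.Smiley” this yields Guy\.Smiley (11 characters), while the wire-format label stays a single 10-byte label with a length prefix of 0x0A. Splitting a full name on every dot without honouring the escape would wrongly produce two labels.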

… do you think it will be sufficient to call DNSServiceReconfirmRecord with the index of the "en0" (Wi-Fi) interface?

Not really. There’s no guarantee that en0 will be correct. For example, the user might have Wi-Fi off and instead be connected to Ethernet via a USB dongle.
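
To see why a hard-coded name is fragile, here is a small C sketch that uses the POSIX if_nameindex and if_nametoindex calls to list the interfaces actually present and to detect a failed lookup. The helper names are hypothetical. As far as I know, the DNSServiceBrowse reply callback in <dns_sd.h> reports the interface index each result was found on, which would be one way to avoid guessing; that part is not shown here.

```c
#include <net/if.h>
#include <stdio.h>

/* Prints every interface the system currently knows about, one per line,
 * as "<index>\t<name>". Returns the number of interfaces listed, or -1 if
 * the list could not be obtained.
 * NOTE: hypothetical helper for illustration. */
static int list_interfaces(void)
{
    struct if_nameindex *list = if_nameindex();
    int count = 0;

    if (list == NULL) {
        return -1;
    }
    /* The array is terminated by an entry whose if_index is 0. */
    for (struct if_nameindex *p = list; p->if_index != 0; p++) {
        printf("%u\t%s\n", p->if_index, p->if_name);
        count++;
    }
    if_freenameindex(list);
    return count;
}

/* Looks up an interface index by name. Returns 1 and stores the index in
 * *out_index on success, or 0 if no such interface exists (in which case
 * if_nametoindex returned 0, which is not a usable index). */
static int lookup_interface_index(const char *name, unsigned int *out_index)
{
    unsigned int idx = if_nametoindex(name);

    if (idx == 0) {
        return 0;
    }
    *out_index = idx;
    return 1;
}
```

Checking the result of the lookup matters because a missing "en0" yields index 0, and it would be better to skip the reconfirm request than to issue it with an index that does not identify the interface the record was learned on.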

I already started writing code to retrieve the interface name from the socket FD …

Huh? If the connection attempt failed then the socket won’t have a local address and thus you can’t work out the interface from that.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

Does the DNSServiceReconfirmRecord call affect other devices as well?


According to the Discussion section of the DNSServiceReconfirmRecord documentation:


Causes the record to be flushed from the daemon's cache (as well as all other daemons' caches on the network) if the record is determined to be invalid.

In my case, it seems to affect only the device that makes the call: the stale service is removed there, but other devices (connected to the same router) still see it.

For mDNS, I’d expect it to affect all mDNS responders within multicast range.

To investigate this further you’d need to take a packet trace on the other devices to see if they received the relevant traffic (per the RFC I mentioned in my 23 Jan post).

See Recording a Packet Trace for information about how to get a packet trace on iOS.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"