VPN (Packet Tunnel Provider) over SMB

Some users at a certain company reported they are having problems when using my VPN app for Mac (Packet Tunnel Povider), with SMB:

Initiating an SMB upload will fail and then their VPN client will disconnect, in some cases the computer needs to be reboot.
I've collected logs for my client (nothing suspicious there), and I also got the logs from thier Console.


Those are some relevant lines from the Console, but again I don't see anything suspicious:



default 17:27:16.194234 -0400  secd       Microsoft Outloo[303]/1#16 LF=0 copy_matching Error Domain=NSOSStatusErrorDomain Code=-34018 "Client has neither com.apple.application-identifier, com.apple.security.application-groups nor keychain-access-groups entitlements" UserInfo={NSDescription=Client has neither com.apple.application-identifier, com.apple.security.application-groups nor keychain-access-groups entitlements}
default 17:27:16.252627 -0400  com.apple.WebKit.WebContent Current memory footprint: 14 MB
default 17:27:18.079823 -0400  symptomsd        rssi (-53) or transmitRate (585.000000) changed on interface en1 for BSSID:
default 17:27:18.751723 -0400  coreaudiod         gPTPClockStatistics for 0x995aebd999500000
default 17:27:18.751758 -0400  coreaudiod         Grandmaster Identity: 0x995aebd999500000
default 17:27:18.751781 -0400  coreaudiod         Clock lock state: Locked
default 17:27:18.751809 -0400  coreaudiod         0x995aebd999500000: First Sync 2199023255552,2199023255552,42252520443,42252520443,0x995aebd999500000,0
default 17:27:18.751830 -0400  coreaudiod         Rate Ratio: 2199023255552/2199023255552 (1.000000000000)
default 17:27:18.751845 -0400  coreaudiod         Anchors: 42252520443, 42252520443
default 17:27:18.751861 -0400  coreaudiod         Sync Identity: 0x995aebd999500000.0
default 17:27:19.466668 -0400  SophosConfigD CSSM Exception: -2147413736 CSSMERR_DL_DATASTORE_ALREADY_EXISTS
default 17:27:19.852256 -0400  Core Sync            TCP Conn 0x60800016e100 canceled
default 17:27:19.853018 -0400  Core Sync            TCP Conn [239:0x60000016df80] using empty proxy configuration
default 17:27:19.853033 -0400  Core Sync            Stream client bypassing proxies on TCP Conn [239:0x60000016df80]
default 17:27:19.853045 -0400  Core Sync            TCP Conn 0x60000016df80 started
default 17:27:19.852614 -0400  Core Sync            [242  stream, pid: 565] cancelled
                [242.1 61DEA637-675C-413E-A6A3-AA6E3FA344EC .51750<->]
                Connected Path: satisfied (Path is satisfied), interface: utun1, ipv4, dns
                Duration: 43.817s, DNS @0.000s took 0.597s, TCP @0.598s took 0.580s
                bytes in/out: 9096/2013, packets in/out: 11/7, rtt: 0.666s, retransmitted packets: 0, out-of-order packets: 0
default 17:27:19.853679 -0400  Core Sync            [243  ] start
default 17:27:21.403116 -0400  secd       Microsoft Outloo[303]/1#16 LF=0 copy_matching Error Domain=NSOSStatusErrorDomain Code=-34018 "Client has neither com.apple.application-identifier, com.apple.security.application-groups nor keychain-access-groups entitlements" UserInfo={NSDescription=Client has neither com.apple.application-identifier, com.apple.security.application-groups nor keychain-access-groups entitlements}
default 17:27:21.442425 -0400  nesessionmanager          -[NESMSession setStatus:]:776  NESMVPNSession[MyClient - someuser:76FF4E17-someMoreNumbers1EC359]: status changed to disconnecting
default 17:27:21.450275 -0400  nsurlsessiond    received network changed event
default 17:27:21.467515 -0400  CommCenter     #I DATA.DataNetworkMonitorOSX: handleNetworkStateChanged_sync: nwi_state: 0x7fdf256247f0
default 17:27:21.467577 -0400  CommCenter     #I DATA.DataNetworkMonitorOSX: checkIPConnectivity_sync: ***** ipConnectivityAvailable: true
default 17:27:21.467610 -0400  CommCenter     #I DATA.DataNetworkMonitorOSX: checkIPConnectivity_sync: Previous primary interface '' is DOWN.
default 17:27:21.464626 -0400  airportd                _processIPv4Changes: ARP/NDP offloads disabled, not programming the offload
default 17:27:21.467640 -0400  CommCenter     #I DATA.DataNetworkMonitorOSX: checkIPConnectivity_sync: Primary interface changed to '' with IP family: kDataProtocolFamilyIPv4
default 17:27:21.467692 -0400  CommCenter     #I Firing event 'dataWifiAvailable': with params=

The VPN disconnects at the line starting with default 17:27:21.442425 -0400


Is there something suspicious that I missed in those logs?

Is there anything that might cause this behavior, only on SMB?

When you say that "in some cases the computer needs to be reboot," do you mean that the VPN hangs and the machine needs to be rebooted? Can you tell me more about why the computer needs to be rebooted?


Matt Eaton

DTS Engineering, CoreOS

meaton3 at apple.com

Yes, the VPN hangs, and AFAIK, the reboot is needed because from that moment he can't access the internet.

But I already asked the customer about 'why the reboot', to verify that what I've just said is accurate, I'll update when he'll answer.

So from your answer, I guess there aren't any known issues with VPN over SMB?

It's hard to say one way or the other definitively without more information and a sysdiagnose. One thing that you could try to isolate the problem is using SMB with the system level IKEv2 VPN. IF you can still reproduce the issue there, then there is strong evidence that this is a SMB issue. If you cannot reproduce the issue then it's time to take a look at your VPN app while using SMB.

| So from your answer, I guess there aren't any known issues with VPN over SMB?



Matt Eaton

DTS Engineering, CoreOS

meaton3 at apple.com

Thanks for the advice! I'll ask the customer to to that and report back.

I also found out that this reproduces only on her Mac, while for other users of that company, at the same office, everything works fine with SMB and my app.

Also, another question - if using the IKEv2 they will be able to upload the files, then the problem is at my app. But what's my next step? Can I ask them to create a sysdiganose, or is it just for developers?

Update: I just reproduced it here on my Mac. Using my app for the VPN I connected between my Mac and a Win machine. The issue reproduced only when trying to upload big files (1GB).

At the Console I saw those lines (I copied only the relevant lines), all of them are from the kernel process:



utun_output - ctl_enqueuembuf failed: 55   //this line appears something like 200 times in a row..
smb2fs_smb_cmpd_query_dir: Warning: No lease found for 
smbfs_pack_vap: Missing Finder Info <.DS_Store>
tcp_timers: tcp_output() returned 0 with retransmission timer disabled for 51891 > 443 in state 4, reset timer to 1063
Starting poll type 4
Restarting poll type 4

And after a few seconds, those lines, coming from the process KernelEventAgent:


tid 31b110ce received event(s) VQ_NOTRESP (1)
tid 31b110ce type 'smbfs', mounted on '/Volumes/Roee', from '//user@10.10.1.8/Roee', not responding
tid 31b110ce found 1 filesystem(s) with problem(s)



Next steps: I'll check if it's persistant, and if so - I'll try to reproduce it with the IKEv2 as you said.

Also, I'll create a sysdiagnose.


But the question for now: Is it possible to tell what's the error from the above logs?

There's a few things that stick out to me here. First, is that it looks like SMB disconnected and is going through a reconnection transition. Second, "utun_output - ctl_enqueuembuf failed: 55," being called 200 times is concerning because this usually means that the memory buffer from the kernel control socket to the client is not being freed up, resulting in a continuous error telling the client to free memory. For large transfers like this you are responsible to manage the flow control of your network provider so you do not receive these errors. Quinn wrote an excellent post on this detailing how to manage the outbound data buffered in your network provider.


My recommendation would be to implement this outbound flow control as Quinn mentions in his post and see if that resolves the ctl_enqueuembuf error and also the issues with SMB.


Matt Eaton

DTS Engineering, CoreOS

meaton3 at apple.com

Thanks again @meaton for all the helpful replies!
As I said I would do, I tried to reproduce it a lot of times, including uploading big files (3GB, 5GB) and it didn't reproduce on my Mac.
However, at the customer, it reproduced easily, even with small files - she tries to upload 3 files of ~10mb each, and it happens.
I checked again her Console logs, and no sign for utun_output or ctl_enqueuembuf, but those line seems related:


default 17:27:22.110479 -0400  symptomsd        NWPath: kNetworkPrimary  NWPathStatusSatisfied   interface en1 index 6
default 17:27:22.110535 -0400  symptomsd        primary interface change to en1, type 1
default 17:27:22.110692 -0400  symptomsd        Primary change
default 17:27:22.110754 -0400  symptomsd        noi: NOI: v:0 type:Wifi, isAny:yes, isBuiltin:no, loi:-1, flags:0, event: kNotificationNewPrimaryInterface, oldLoadedLqm: 100, rawLoadedLqm: 100, newLoadedLqm: 100, not posting
default 17:27:22.118243 -0400  apsd      Looking up connection on peer: d573aec0   found 
default 17:27:22.118320 -0400  apsd      Looking up connection on peer: d573aec0   found 
default 17:27:22.118394 -0400  apsd      Looking up connection on peer: d573aec0   found 
default 17:27:22.131586 -0400  mapspushd         Delivering connectionStatusChange from apsd: NO
default 17:27:22.133413 -0400  mapspushd         Delivering connectionStatusChange from apsd: NO
default 17:27:22.152288 -0400  suggestd               Delivering connectionStatusChange from apsd: NO
default 17:27:22.152421 -0400  suggestd               Delivering connectionStatusChange from apsd: NO
default 17:27:22.152456 -0400  suggestd               Delivering connectionStatusChange from apsd: YES
default 17:27:22.152897 -0400  apsd      Looking up connection on peer: d57346a0   found 
default 17:27:22.154376 -0400  kernel   SIOCPROTODETACH_IN6: utun1 error=6
error      17:27:22.154579 -0400  MyAppExtension       interface_get_mtu:3730 SIOCGIFMTU failed: Device not configured
default 17:27:22.155072 -0400  MyAppExtension       PacketTunnelProvider deinit
default 17:27:22.161293 -0400  configd network changed
default 17:27:22.165453 -0400  nsurlsessiond    received network changed event
default 17:27:22.164295 -0400  airportd                _processIPv4Changes: ARP/NDP offloads disabled, not programming the offload
default 17:27:22.154392 -0400  kernel   com.citrix.kernel.dne:
default 17:27:22.154392 -0400  kernel
default 17:27:22.154394 -0400  kernel   socket_event_callback: ifnet_find_by_name("utun1") returned 6
default 17:27:22.154405 -0400  kernel   com.citrix.kernel.dne:
default 17:27:22.154406 -0400  kernel
default 17:27:22.154408 -0400  kernel   socket_event_callback: ifnet_find_by_name("utun1") returned 6
default 17:27:22.154428 -0400  kernel   com.citrix.kernel.dne:
default 17:27:22.154429 -0400  kernel
default 17:27:22.154431 -0400  kernel   socket_event_callback: ifnet_find_by_name("utun1") returned 6
default 17:27:22.227699 -0400  UserEventAgent               -[NEFileHandleMaintainer reset:84   Current file handles for com.apple.networkextension.file-descriptor-maintainer: (
    "Policy Session socket (58)",
    "Network Agent Registration socket (137) 76FF4E17-DE92-4B00-B8CC-EE7A2A1EC359 BFFC9455-2D18-47CF-A618-96170984E670 1"
)
default 17:27:22.227754 -0400  UserEventAgent               -[NEUserEventAgentFileHandleMa:57   File Handle Maintainer listening for readable events on Network Agent Registration socket (137) 76FF4E17-DE92-4B00-B8CC-EE7A2A1EC359 BFFC9455-2D18-47CF-A618-96170984E670 1
default 17:27:22.233833 -0400  assistantd             Delivering connectionStatusChange from apsd: NO





So right before the disconnection, there's something with utun1 error = 6


Since I can't reproduce it anymore, and it doesn't look related to the buffer side (at least not according to the customer's logs), I asked for them to try to reproduce it on their environment.

I'll update with my findings.

And if the logs I just posted above gives any clue - I'll be happy to know.

Thanks!
Edit: Just to add that it's not the case of too much outbound data buffered, I have logs for that scenario, and I can't see it didn't happen here.

Since this is reproducible on one machine, even with small files, I am suspecting that there is possible packet fragmentation going on here and the MTU size is getting lost on the socket between the client and the peer. Checkout the network path that the data traverses and see if the packet's are getting fragmented passing through firewalls or NATs. A packet trace would help you determine this.


default 17:27:22.154376 -0400  kernel   SIOCPROTODETACH_IN6: utun1 error=6 
error      17:27:22.154579 -0400  MyAppExtension       interface_get_mtu:3730 SIOCGIFMTU failed: Device not configured 
default 17:27:22.155072 -0400  MyAppExtension       PacketTunnelProvider deinit 
default 17:27:22.161293 -0400  configd network changed 
default 17:27:22.165453 -0400  nsurlsessiond    received network changed event

Matt Eaton

DTS Engineering, CoreOS

meaton3 at apple.com

Great! It sounds very reasonable, thank you very much for all the help!

No problem at all. Glad to help.


Matt Eaton

DTS Engineering, CoreOS

meaton3 at apple.com

VPN (Packet Tunnel Provider) over SMB
 
 
Q