Delay in APNS notification delivery

One of our customers has reported that pushes are delivered with a big delay. According to our statistics, pushes are sent to APNs in less than 5 seconds, but it sometimes takes up to 5 minutes for notifications to arrive on devices.

The authentication used is certificate-based, although token-based authentication was also tried earlier. They have around 346252 subscribed device tokens (users) to which important pushes are sent.

We use HTTP/2 connections and reuse them. To avoid push bursts over any single connection, we distribute push traffic across different APNs servers.

Sample headers:

[:method: POST, :authority: api.push.apple.com, :path: /3/device/<device>, :scheme: https, apns-expiration: 1723721057, apns-priority: 10, apns-topic: <apns topic>, authorization: bearer <bearer token>]

Sample payload: {"reference":"{"id":"lux.DvKH5JCtCcus5EaW5Houcn","type":"articleReference"}","aps":{"badge":0,"alert":{"body":"„Warum werden nicht ein paar gesetzliche Feiertage gestrichen?“ Munich-Re-Chef Joachim Wenning fordert, dass die Deutschen mehr arbeiten sollten"},"sound":"default","mutable-content":1},"tracking":"{"piano":{"ivw_category":"thema_wirtschaft","pcat":"paid","date_sent":1723629099370,"main_topic":"unternehmen","push_channel":"11124","section":"wirtschaft","object_id":"lux.DvKH5JCtCcus5EaW5Houcn","push_text":"warum_werden_nicht_ein_paar_gesetzliche_feiertage_gestrichen_munich_re_chef_joachim_wenning_fordert_dass_die_deutschen_mehr_arbeiten_sollten_plus"},"ivw":{"ivw_category":"thema_wirtschaft","ivw_code":"spracheDE&#x2F;formatTXT&#x2F;erzeugerRED&#x2F;homepageNO&#x2F;auslieferungMOB&#x2F;appYES&#x2F;paidNO&#x2F;inhaltTHEMA&#x2F;merkmalWIRTSCHAFT&#x2F;ressortWIRTSCHAFT&#x2F;portalAPP"},"firebase":{"ivw_category":"thema_wirtschaft","pcat":"paid","date_sent":1723629099370,"main_topic":"unternehmen","push_channel":"11124","section":"wirtschaft","object_id":"lux.DvKH5JCtCcus5EaW5Houcn","push_text":"warum_werden_nicht_ein_paar_gesetzliche_feiertage_gestrichen_munich_re_chef_joachim_wenning_fordert_dass_die_deutschen_mehr_arbeiten_sollten"}}"}
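The apns-expiration header is a UNIX timestamp in seconds, and a nonzero value asks APNs to store the notification and retry delivery until that time. As a rough sanity check, the sketch below compares it with the date_sent field from the payload. It assumes that the sample headers and sample payload belong to the same push and that date_sent is in epoch milliseconds; neither is stated above, and the function name is made up for illustration.

```python
def retention_window_seconds(apns_expiration_s, date_sent_ms):
    """Seconds between the send time and the apns-expiration time.

    Assumption: date_sent_ms is epoch milliseconds (13 digits in the sample),
    while apns-expiration is epoch seconds.
    """
    sent_s = date_sent_ms // 1000  # drop the millisecond part
    return apns_expiration_s - sent_s


# Values copied from the sample headers and payload above.
window = retention_window_seconds(1723721057, 1723629099370)
print(f"{window} s, about {window / 3600:.1f} h")
```

Under those assumptions the window is roughly 25.5 hours, so a push that could not reach an offline device would still be eligible for delivery minutes or hours later.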

Please let us know what could be the reason and steps we could take to avoid such delivery delays.

A lot depends on the specifics of this delay. Is it only for a few customers? Is the delay anecdotal, or are there specific metrics or logs showing the delayed delivery? When only one or a few deliveries are delayed, the cause is typically an unreliable network connection on the customer side, where APNs will stop retrying for a while and wait for a reliable, persistent network connection to the device.

If it is for all their customers, are they sending all 350K notifications at once? If so, do they have enough resources to do so? Delays in this case are usually caused by insufficient bandwidth or not allocating enough resources. Without sufficient bandwidth and open connections to APNs, what ends up happening is the push server DoS's itself by blocking requests to APNs while waiting for responses, and then the whole thing falls apart.

If they are trying to send all 350K notifications at once (within 1-2 seconds), then they should have dedicated about 350 open HTTP/2 connections and enough compute and network resources, and hosts to handle that kind of load.

Not to mention the bandwidth allocated to send all those requests at once.
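The figures above (350K notifications, about 350 connections) imply roughly 1,000 notifications per connection within the send window. A minimal sketch of that arithmetic follows. The per-connection figure is only an inference from those two numbers, not a documented APNs limit, and the function name is hypothetical.

```python
def connections_needed(total_notifications, per_connection_capacity):
    """Smallest number of connections that covers the whole burst.

    per_connection_capacity is an assumed number of notifications one
    HTTP/2 connection can push through within the target send window.
    """
    if per_connection_capacity <= 0:
        raise ValueError("per_connection_capacity must be positive")
    # Integer ceiling division: round up so no notification is left over.
    return -(-total_notifications // per_connection_capacity)


# 350K notifications at an assumed 1,000 per connection per window.
print(connections_needed(350_000, 1_000))
```

Measuring the real per-connection throughput on the sending hosts and substituting it for the assumed 1,000 gives a more realistic connection count.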

These are common causes we see. If you want us to look at a sample notification that was delayed 5 minutes, we have a pinned post on how to provide diagnostic information so we can check what might be the problem. Please read through the post "If you need assistance debugging your push notification issues" and supply the requested detailed information here for a notification request that was delayed, and we can take a look.


Argun Tekant /  DTS Engineer / Core Technologies

We use at most about 2 threads per CPU core, so on a 16-core machine we use at most 32 threads. Per the guideline, the number of threads and connections should ideally be the same, so the maximum number of connections we make to APNs is also 32. The machine has 64 GB of RAM. Running the command "dig +short api.push.apple.com" suggests there are fewer than 10 available APNs servers.

Do you think the mentioned delivery time is expected with this configuration? Do you recommend one open HTTP/2 connection per 1k notifications in order to deliver notifications in 1-2 seconds? Sending is done asynchronously, without waiting for the response. Flow control is also implemented. What is the maximum supported number of in-flight notifications at a time?
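For reference, the kind of in-flight limiting described here can be sketched as below. This is not our production code. The send function is a fake stand-in for a real APNs client, all names are hypothetical, and the limit of 8 is arbitrary. On a real HTTP/2 connection the cap would normally come from the SETTINGS_MAX_CONCURRENT_STREAMS value the server advertises.

```python
import asyncio


async def send_bounded(tokens, send_one, max_in_flight):
    """Send to every token with at most max_in_flight requests pending."""
    gate = asyncio.Semaphore(max_in_flight)
    statuses = {}

    async def guarded(token):
        async with gate:  # wait here while the in-flight limit is reached
            statuses[token] = await send_one(token)

    await asyncio.gather(*(guarded(t) for t in tokens))
    return statuses


class FakeSender:
    """Stand-in for a real APNs client; records peak concurrency."""

    def __init__(self):
        self.in_flight = 0
        self.peak = 0

    async def __call__(self, token):
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.001)  # simulated network round trip
        self.in_flight -= 1
        return 200  # pretend APNs accepted the request


if __name__ == "__main__":
    sender = FakeSender()
    result = asyncio.run(send_bounded([f"token-{i}" for i in range(50)], sender, 8))
    print(len(result), "sent, peak in flight:", sender.peak)
```

The semaphore keeps requests asynchronous while preventing an unbounded backlog of unanswered requests from piling up on one connection.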

Is there some way to identify when a message was received at APNs? In https://icloud.developer.apple.com/dashboard → push notifications I can see some message delivery status, but it is an aggregation over several days and does not show at what time of day a notification was received by APNs.

The mentioned delay affects only one customer; for others, delivery is much faster.

One of our customers has reported that pushes are delivered with a big delay. According to our statistics, pushes are sent to APNs in less than 5 seconds, but it sometimes takes up to 5 minutes for notifications to arrive on devices.

I have investigated many, many issues like this and, in the VAST majority of cases, they could ALL be summarized as:

  1. A network ("the device did not have a working network connection") or hardware ("the device was not turned on") level issue meant it was unable to communicate with our push servers.

  2. At some later point a working connection was established with our servers.

  3. The push reached the device IMMEDIATELY after that connection was established, typically reaching the target apps in <1s of the connection being reestablished.

  4. ALL significant delivery delays on the device side were caused by the target app itself, NOT the system. In concrete terms, when it takes 6s for a push to reach a voip app AFTER it's reached the iOS device, it's because it took ~6s for that voip app to finish launching and register with PushKit.

Having worked with many developers on this sort of issue, the starting assumption tends to be that the problem is either:

  • The submitting server did "something" wrong and was really slow at delivering the push.

OR

  • Our push server did "something" wrong and delayed the push unnecessarily.

OR

  • The device itself got the push very quickly and "should" have delivered it.

When, in the VAST majority of cases, the problem was actually "the network was in some way broken/unavailable and the push couldn't be delivered at all".

I would also highlight:

The mentioned delay affects only one customer; for others, delivery is much faster.

Is this happening on a specific Wi-Fi network the customer controls? Particularly in a "Wi-Fi only" configuration? Some common issues I've seen:

  • The Wi-Fi network has never been properly surveyed and built out to ensure consistent coverage. This creates "gaps" where the device loses Wi-Fi and our connectivity retry delays "expand" how long the device is offline.

  • The Wi-Fi network is unintentionally misconfigured. For example, Cisco has/had an access point configuration which would ask clients to disassociate after a fixed amount of time (default, ~10 min.) of "inactivity". iOS would disassociate as instructed and remain disassociated... because that's what the AP told us to do. Waking the device would trigger a reconnect, starting the cycle all over again.

  • The underlying network is misconfigured/broken in a way that interferes with normal push activity. For example, I've seen NAT servers which disconnect the "server" side of the connection WITHOUT notifying/closing the client side connection. The device believes it still has a working connection (because the NAT server kept its connection open) and won't notice the issue until it does a periodic refresh.

Note that all of these issues needed to be investigated and resolved at the network itself, not the iOS device.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
