Users of my app have reported that they are sometimes unable to receive Voice-over-IP (VoIP) push notifications when using a SIM (cellular) connection. (There is no problem when using Wi-Fi.)
VoIP push notifications were not received during the following periods. Could you check the diagnostic logs and tell me why my app can't receive VoIP pushes?
[diagnostic logs] https://drive.google.com/drive/folders/1gSAbr1Fy1SrjlmRXuAzoXqiaxnNbFhj8?usp=sharing
[Problem period]
2024/06/17 05:34:59 - 2024/06/17 09:04:42
Number of times that the push server pushed and it received a normal APNs response: 31
Number of times that iPhone received pushes: 0

2024/06/17 23:05:03 - 2024/06/18 09:02:16
Number of times that the push server pushed and it received a normal APNs response: 192
Number of times that iPhone received pushes: 0

2024/06/15 00:35:56 - 2024/06/15 09:55:57
Number of times that the push server pushed and it received a normal APNs response: 138
Number of times that iPhone received pushes: 0
I tried grepping for "apsd[131:", "15:45:11.000+0900", "15:34:22.998+0900", and "09:06:19.991+0900" in *.txt and *.log in all of the sysdiagnoses, but there were no hits. How can I read a sysdiagnose? Do I need to do any further processing on the resulting files after decompressing the tar archive? Also, is it OK for developers to read a sysdiagnose?
A sysdiagnose archive is a standard compressed archive (a .tar.gz file) with a bunch of files in it. Of those files, the largest and most useful is by FAR the one named "system_logs.logarchive". In the vast majority of cases, "sysdiagnose analysis" actually means "open the console logarchive and try to figure out what happened".
The post "Your Friend the System Log" has some good background on how else that file can be processed and manipulated, but most of the time you'll be opening and viewing the file with Console.app. That's what will open it by default if you just double click on the archive.
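If you do want to search the archive from the command line instead of Console.app, plain grep won't find anything because the logarchive is a binary format; the macOS `log show` tool is what reads it. Here's a rough sketch of building such a command for one of the problem periods. The helper name `build_log_show_args` is my own invention, and the `process == "apsd"` predicate is only an example starting point. Also double check which time zone your Mac uses when interpreting the start and end times.

```python
import shlex
from datetime import datetime


def build_log_show_args(archive, start, end, predicate=None):
    """Build an argv list for the macOS `log show` tool (hypothetical helper).

    archive:   path to the system_logs.logarchive from the sysdiagnose
    start/end: datetime objects bounding the time range of interest
    predicate: optional filter string, e.g. 'process == "apsd"'
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    args = [
        "log", "show",
        "--archive", archive,
        "--start", start.strftime(fmt),
        "--end", end.strftime(fmt),
        "--info", "--debug",  # include info and debug level messages
    ]
    if predicate:
        args += ["--predicate", predicate]
    return args


if __name__ == "__main__":
    args = build_log_show_args(
        "system_logs.logarchive",
        datetime(2024, 6, 17, 5, 34, 59),
        datetime(2024, 6, 17, 9, 4, 42),
        predicate='process == "apsd"',
    )
    # Print a shell-ready command; run it on a Mac, since `log` is macOS-only.
    print(shlex.join(args))
```

Redirecting that command's output to a text file gives you something you CAN grep, though for most investigations Console.app is still the easier tool.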
The actual analysis process isn't easy to summarize quickly, as it relies as much on being familiar with how the system operates and logs as on inferring what happened from the information at hand. However, here's my informal "guide" to using Console.app for this sort of analysis:
a) In Big Sur, Console.app got a HUGE but easy to overlook feature. In the bottom left corner of the console window, the "Showing" popup now has an extra item called "Custom", which allows you to specify a particular time range. That SINGLE feature is absolutely essential, as the total log size is often SO large that it's impossible to make useful progress by simply "scrolling blind".
b) On a related note, you can right click on the table header to customize exactly what fields are shown and hidden. Exactly what columns you use is entirely up to you, but my basic list in order is:
- Time -> "when" is obviously critical in this entire process.
- Process/Library/Subsystem -> all of these can be used to change exactly what subset of the log data you're looking at.
- PID:Thread ID -> PID is critical because it lets you differentiate between different launches of the same component, as well as filter down to a specific process launch. Thread ID is a bit less critical, but it can let you trace a thread of "logic" inside a component that's processing multiple things at a time. These can be shown as separate columns, but I generally combine them to reduce the column count.
- Message -> what the system logged, which is obviously critical.
In terms of the basic "flow" I use for looking at this sort of log, this is what I actually do, as well as some tips/tricks:
- Narrowing the time range to what's "interesting" is the single most critical step. This is also why any kind of external information is so helpful. Knowing that an issue occurred within a 5 min. range is enormously helpful.
- Ruthlessly trim the remaining data until I find something "interesting". Note that you can "right click" on any entry and use that to refine/prune what's being shown. For a voip/CallKit issue, right clicking on a callservicesd entry and selecting "Show process callservicesd" is often the first place I start. If the issue is less clear-cut, I'll often work the other way and just hide processes that don't seem "interesting" until something catches my eye.
- The other columns give you other tools you can manipulate the data with. For example, "Library" can be very useful, since it will let you trace activity across processes. If you chose "CallKit", you'd see any message the CallKit framework logged inside your app AND the actual CallKit usage inside callservicesd, which can let you trace things across frameworks. Similarly, if you "know" the problem isn't going to involve any sort of networking, you can simply hide the libraries that aren't relevant.
- The analysis process generally involves narrowing and expanding your search terms and time range as you try and sort out what actually happened. One trick here is that you can cut and paste the contents of the search field. I actually leave a text editor open and use it as a "scratch pad" as I'm working. I'll copy out log messages that seem important, but I'll also save the search string itself, then continue narrowing. I can then go "back" to the previous state by building out the search box myself, instead of doing each of the steps manually.
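As a command-line analogue of the "Show process" and "Library" filtering described above, here's a minimal sketch that builds a predicate string for `log show --predicate`. The helper `build_predicate` is hypothetical, and the field names (`process`, `sender`, `processID`, `eventMessage`) are the ones I understand `log help predicates` to list, so verify them on your own Mac before relying on this.

```python
def build_predicate(process=None, sender=None, pid=None, message_contains=None):
    """Combine optional filters into one predicate string (hypothetical helper).

    process:          process name, e.g. "callservicesd"
    sender:           library/framework that emitted the message, e.g. "CallKit"
    pid:              a specific launch of a process
    message_contains: case-insensitive substring of the logged message
    """
    clauses = []
    if process:
        clauses.append(f'process == "{process}"')
    if sender:
        clauses.append(f'sender == "{sender}"')
    if pid is not None:
        clauses.append(f"processID == {pid}")
    if message_contains:
        clauses.append(f'eventMessage CONTAINS[c] "{message_contains}"')
    # Every supplied filter must match, which mirrors stacking filters in Console.
    return " AND ".join(clauses)


if __name__ == "__main__":
    # Start broad (one process), then narrow to a single library within it.
    print(build_predicate(process="callservicesd"))
    print(build_predicate(process="callservicesd", sender="CallKit"))
```

Saving the generated strings in your scratch pad works the same way as saving the Console search string: you can jump back to an earlier, broader filter without rebuilding it by hand.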
On the "tip" side:
- If you have any sort of in-app logging (and if you don't, you should), make sure your log prints into the general console as well as to whatever other source/location you use. If you do this right, your individual log will have EXACTLY the same timestamps as the console log, which means you can use the standalone log as an "index" into the system log, using the data in your app log to find "interesting" times in the console log.
- Also, I'd recommend logging the "pid" in your own logs as well as whatever else you're logging. That makes it easy to differentiate between different launches of your app and can also be used to quickly narrow data in the same way other metadata can.
- Be aware of what the system can and cannot tell you. The log will tell you what happened in the system and what went wrong in the system. It cannot tell you what happened outside of the system. It can tell you when a push arrived and it can show you how that push was processed. It MIGHT be able to tell you whether or not the device had an active network connection. It CANNOT tell you why a push didn't arrive or what was "wrong" with the network it was connected to.
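To illustrate the "index" idea from the first tip, here's a small sketch that turns lines from an app's own log into time windows you could then plug into Console's "Custom" range. The log line format (timestamp first, then a pid tag), the marker text, and the helper name `windows_from_app_log` are all assumptions for the example; adapt them to whatever your app actually writes.

```python
from datetime import datetime, timedelta


def windows_from_app_log(lines, marker, margin_seconds=30):
    """Return (start, end) windows around app-log lines that contain `marker`.

    Assumes each line begins with 'YYYY-MM-DD HH:MM:SS.mmm+ZZZZ ' followed by
    the message. This format is hypothetical, not something the system defines.
    """
    fmt = "%Y-%m-%d %H:%M:%S.%f%z"
    margin = timedelta(seconds=margin_seconds)
    windows = []
    for line in lines:
        if marker not in line:
            continue
        # The first two space-separated fields are the date and the time.
        date_part, time_part, _rest = line.split(" ", 2)
        stamp = datetime.strptime(f"{date_part} {time_part}", fmt)
        windows.append((stamp - margin, stamp + margin))
    return windows


if __name__ == "__main__":
    sample = [
        "2024-06-17 05:34:59.123+0900 [pid 4321] expected VoIP push not received",
        "2024-06-17 05:35:10.000+0900 [pid 4321] network path changed",
    ]
    for start, end in windows_from_app_log(sample, "VoIP push"):
        print(start.isoformat(), "to", end.isoformat())
```

Each window is a candidate time range to inspect in the system log, and the logged pid tells you which launch of your app to filter on.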
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware