URLSession dataTask does not fetch data

Hi, I have successfully fetched data from a URL, but I cannot figure out why a sub-link on the same domain does not return any data. The result is always nil, even though the response contains bytes (for example, 39962 bytes).

If it is nil, shouldn't that be something like 0 bytes?

Sample url is this.

https://earthquake.phivolcs.dost.gov.ph/2022_Earthquake_Information/September/2022_0922_1059_B2.html

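This is how I fetch the data from the URL:

guard let url = URL(string: urlString) else {
  completion(nil)
  return
}

URLSession.shared.dataTask(with: url) { (data, response, error) in
  // Treat a transport error or a missing body as a failure.
  guard error == nil, let data = data else {
    completion(nil)
    return
  }

  print(data) // prints the byte count, e.g. "39962 bytes"
  // This initializer returns nil when the bytes are not valid UTF-8.
  completion(String(data: data, encoding: .utf8))
}.resume()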

Thoughts? What could be wrong?

If the URL is https://earthquake.phivolcs.dost.gov.ph, I can fetch its data without issues.

  • Any insight into why it works if I set .ascii instead of .utf8 would help.


Replies

why it works if i set .ascii instead of .utf8?

That’s the critical clue here. It turns out that file is not encoded using UTF-8, and contains some bytes that are not a legal UTF-8 encoding. So the file is being downloaded just fine but the string initializer fails because it can’t handle those non-UTF-8 bytes. Note the file actually uses the Windows-1252 encoding, according to this line:

<meta http-equiv=Content-Type content="text/html; charset=windows-1252">

Specifically the issue is around line 551 which contains a degree symbol for latitude and longitude. It’s encoded as single byte B0 in the file, but in UTF-8 that would be the byte sequence C2 B0.
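The byte-level difference described above can be reproduced in isolation. This is a minimal sketch rather than code from the thread: the one-byte Data value is a stand-in for the degree sign in the downloaded file, and the variable names are made up for illustration.

```swift
import Foundation

// Stand-in for the degree sign as stored in the file: the single byte B0.
let cp1252Bytes = Data([0xB0])

// In UTF-8, B0 is a continuation byte with no lead byte before it,
// so the initializer rejects the data and returns nil.
let asUTF8 = String(data: cp1252Bytes, encoding: .utf8)

// Decoding with the encoding the page declares yields U+00B0 DEGREE SIGN.
let asCP1252 = String(data: cp1252Bytes, encoding: .windowsCP1252)

// The same character encoded as UTF-8 takes two bytes: C2 B0.
let utf8Bytes = Array("\u{00B0}".utf8)

print(asUTF8 as Any, asCP1252 as Any, utf8Bytes)
```

This is also why the byte count is non-zero while the string is nil: the download succeeded, and only the conversion to String failed.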

So the most correct encoding to specify would be .windowsCP1252, though you’ve seen that .ascii also happens to work in this case since they are mostly compatible.
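If the encoding is not known ahead of time, one possible approach is to try UTF-8 first and fall back to Windows-1252 only when that fails. The sketch below assumes that fallback order suits the pages being fetched; decodeText is a hypothetical helper name, not part of Foundation.

```swift
import Foundation

/// Hypothetical helper: tries strict UTF-8 first, then a fallback encoding.
/// Returns nil only if both attempts fail.
func decodeText(_ data: Data, fallback: String.Encoding = .windowsCP1252) -> String? {
    if let text = String(data: data, encoding: .utf8) {
        return text
    }
    return String(data: data, encoding: fallback)
}

// "10" followed by a Windows-1252 degree sign (byte B0): not valid UTF-8.
let legacySample = Data([0x31, 0x30, 0xB0])
let legacyText = decodeText(legacySample)

// The same text encoded as UTF-8 (C2 B0) is handled by the first attempt.
let utf8Sample = Data([0x31, 0x30, 0xC2, 0xB0])
let utf8Text = decodeText(utf8Sample)

print(legacyText as Any, utf8Text as Any)
```

In the completion handler from the question, this would replace the String(data:encoding:) call, for example completion(decodeText(data)). Trying UTF-8 first matters because Windows-1252 accepts almost any byte sequence, so using it first would silently garble pages that really are UTF-8.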

  • Funny... thanks to this I just discovered I’ve been typing the wrong “degree” symbol for years. I’ve always used Option+0 to get º but it turns out that’s actually U+00BA MASCULINE ORDINAL INDICATOR. To get U+00B0 DEGREE SIGN (°) you need to type Option+Shift+8.


For https://earthquake.phivolcs.dost.gov.ph the page declares <meta http-equiv="content-type" content="application/xhtml+xml; charset=utf-8" />

For https://earthquake.phivolcs.dost.gov.ph/2022_Earthquake_Information/September/2022_0922_1059_B2.html the page declares <meta http-equiv=Content-Type content="text/html; charset=windows-1252">

Look at the charset: the two pages declare different encodings.

I never thought about that. I assumed that, regardless of which encoding was declared, the content would be compatible with UTF-8, since I didn't see any out-of-the-ordinary characters in there.