JSON from inputStream

is JSONSerialization.jsonObject(with: inputStream) reliable? sometimes it works fine (e.g. with small objects) and sometimes it blocks forever (easier to get the block with big objects). yet sometimes it works ok even with big objects. tried to call it on a different queue - didn't help.

Can you show complete code that reproduces the issue?

the sample code is below. it normally works fine after the first blocking (i.e. after i kill the app and launch it again). i guess this is some sort of caching, despite the "reloadIgnoringLocalAndRemoteCacheData" flag. to defeat this caching just change the url ever so slightly (e.g. increase pageSize by 1). the app works normally with small page sizes. note that i used this particular URL just as a guinea pig for the test.

    import Foundation

    class C: NSObject {
        private var session: URLSession!
        private let someSerialQueue: DispatchQueue
        private let otherSerialQueue: DispatchQueue
        private let someOperationQueue: OperationQueue

        override init() {
            someSerialQueue = DispatchQueue(label: "someSerialQueue")
            otherSerialQueue = DispatchQueue(label: "otherSerialQueue")
            someOperationQueue = OperationQueue()
            someOperationQueue.maxConcurrentOperationCount = 1
            someOperationQueue.underlyingQueue = someSerialQueue
            super.init()
        }
        
        func test() {
            let pageSize = 90
            let testEndpoint = "https://api.github.com/search/repositories?q=created:%3E2021-08-13&sort=stars&order=desc&accept=application/vnd.github.v3+json&per_page=\(pageSize)"
            URLCache.shared.removeAllCachedResponses()
            var request = URLRequest(url: URL(string: testEndpoint)!)
            request.timeoutInterval = 5
            request.httpShouldUsePipelining = true
            request.cachePolicy = .reloadIgnoringLocalAndRemoteCacheData
            session = URLSession(configuration: .default, delegate: self, delegateQueue: someOperationQueue)
            let task = session.dataTask(with: request)
            task.resume()
        }
    }

    extension C: URLSessionTaskDelegate {
        func urlSession(_ session: URLSession, task: URLSessionTask, didCompleteWithError error: Error?) {
            if let error = error {
                print("TaskDelegate: didCompleteWithError: \(error)")
            } else {
                print("TaskDelegate: didComplete with no error")
            }
        }
    }

    extension C: URLSessionDataDelegate {
        
        func urlSession(_ session: URLSession, dataTask: URLSessionDataTask, willCacheResponse proposedResponse: CachedURLResponse, completionHandler: @escaping (CachedURLResponse?) -> Void) {
            completionHandler(nil)
        }
        
        func urlSession(_ session: URLSession, dataTask: URLSessionDataTask, didReceive response: URLResponse, completionHandler: @escaping (URLSession.ResponseDisposition) -> Void) {
            let status = (response as! HTTPURLResponse).statusCode
            print("DataDelegate: didReceive response. status: \(status)")
            completionHandler(.becomeStream)
        }
        
        func urlSession(_ session: URLSession, dataTask: URLSessionDataTask, didBecome streamTask: URLSessionStreamTask) {
            print("DataDelegate: didBecome streamTask")
            streamTask.captureStreams()
        }

        func urlSession(_ session: URLSession, dataTask: URLSessionDataTask, didReceive data: Data) {
            fatalError("DataDelegate: shall not happen")
        }
    }

    extension C: URLSessionStreamDelegate {
        func urlSession(_ session: URLSession, streamTask: URLSessionStreamTask, didBecome inputStream: InputStream, outputStream: OutputStream) {
            print("StreamDelegate: didBecome inputStream / outputStream")
            inputStream.open()
            
            otherSerialQueue.async {
                print("b4 JSONSerialization.jsonObject")
                let object = try! JSONSerialization.jsonObject(with: inputStream, options: [])
                print("after JSONSerialization.jsonObject")
                let len = "\(object)".count
                print("got JSON, \(len) bytes")
            }
        }
    }

    let c = C()
    c.test()
    while true {
        print("run loop run")
        RunLoop.main.run()
    }

also the CPU goes to 98 - 100% when it blocks.

What do you hope to gain by using an async stream as the input for JSONSerialization?

The reason I ask is that, under the covers, the Foundation JSON parser is not a streaming parser; you have to present the parser with all the JSON data at once. So, internally, +JSONObjectWithStream:options:error: reads the stream until EOF, accumulating all the bytes into a Data value, and then calls through to +JSONObjectWithData:options:error: [1]. So there’s generally not a lot of point calling the streaming API:

  • If the JSON comes from an API that can return bulk data, like URLSession, just use that.

  • If the JSON comes from an API that only offers an async stream, the External Accessory framework perhaps, there’s no real benefit to using +JSONObjectWithStream:options:error:; you can just accumulate the bytes yourself.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] You can see this in the Swift open source. Note that the non-open source Foundation uses very different code here, but it uses exactly the same algorithm.
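
A minimal sketch of the first option above (let URLSession deliver the whole body, then parse it in one call) might look like the following. The helper names `parseJSON` and `fetchJSON` are hypothetical and only for illustration; error handling is kept deliberately thin.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession lives here on non-Apple platforms
#endif

// Hypothetical helper (not from the thread): parse a complete JSON payload.
func parseJSON(_ data: Data) throws -> Any {
    return try JSONSerialization.jsonObject(with: data, options: [])
}

// Hypothetical helper: fetch the entire resource, then parse it in one go.
func fetchJSON(from url: URL,
               completion: @escaping (Result<Any, Error>) -> Void) {
    let task = URLSession.shared.dataTask(with: url) { data, _, error in
        if let error = error {
            completion(.failure(error))
            return
        }
        // URLSession has already accumulated the full body into `data`,
        // so no stream handling is needed before parsing.
        do {
            completion(.success(try parseJSON(data ?? Data())))
        } catch {
            completion(.failure(error))
        }
    }
    task.resume()
}
```

Calling `fetchJSON(from:completion:)` with the test endpoint from the sample code would replace the stream task and all of its delegate callbacks.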

the Foundation JSON parser is not a streaming parser; you have to present the parser with all the JSON data at once

hmm. when i called the method from the link you provided (instead of the built-in "jsonObject" method):

    extension JSONSerialization {
        // renamed to "jsonObject2" to avoid collision with the built-in method
        class func jsonObject2(with stream: InputStream, options opt: ReadingOptions = []) throws -> Any {
            var data = Data()
            guard stream.streamStatus == .open || stream.streamStatus == .reading else {
                fatalError("Stream is not available for reading")
            }
            repeat {
                let buffer = try [UInt8](unsafeUninitializedCapacity: 1024) { buf, initializedCount in
                    let bytesRead = stream.read(buf.baseAddress!, maxLength: buf.count)
                    initializedCount = bytesRead
                    guard bytesRead >= 0 else {
                        throw stream.streamError!
                    }
                }
                data.append(buffer, count: buffer.count)
            } while stream.hasBytesAvailable
            return try jsonObject(with: data, options: opt)
        }
    }

it threw "Unexpected end of file while parsing object."... but that's totally expected... the loop breaks by getting zero, which indicates that "the end of the buffer was reached". and it doesn't try to read the whole data to the EOF before feeding it to the json decoder, which obviously makes the json decoder unhappy.

then i simply wrapped this "repeat loop" within another loop:

    var status = stream.streamStatus
    while status != .atEnd && status != .error && status != .closed {
        repeat {
            ...
        } while stream.hasBytesAvailable
        status = stream.streamStatus
    }
    return try jsonObject(with: data, options: opt)

and that actually worked! (although this busy waiting is still not the bestest thing to do.)

looks like built-in "jsonObject" is doing something more naughty which causes it to block. bug?

if i add an extra "hasBytesAvailable" check i can replicate the built-in jsonObject(with: stream) behaviour, and block forever:

    var status = stream.streamStatus
    while status != .atEnd && status != .error && status != .closed {
        repeat {
            if stream.hasBytesAvailable { // *** THIS CHECK
                let buffer = try [UInt8](unsafeUninitializedCapacity: 1024) { buf, initializedCount in
                    let bytesRead = stream.read(buf.baseAddress!, maxLength: buf.count)
                    initializedCount = bytesRead
                    guard bytesRead >= 0 else {
                        throw stream.streamError!
                    }
                }
                data.append(buffer, count: buffer.count)
            }
        } while stream.hasBytesAvailable
        status = stream.streamStatus
    }
    return try jsonObject(with: data, options: opt)

i don't completely understand why this version blocks though. the stream is getting populated (and doing so from a different queue for that matter), so i would expect that even if hasBytesAvailable is "false" temporarily and is false for a few subsequent iterations (or maybe a few million iterations) eventually it shall become "true" once the stream is populated with the next chunk of data, so eventually this loop shall progress to the end. but that's not happening.

possibly related:

hasBytesAvailable
true if the receiver has bytes available to read, otherwise false.
May also return true if a read must be attempted in order to determine the availability of bytes.

the documentation says "may also return true", not "should" nor "must". which i interpret as: even if a read must be attempted in order to determine the availability of bytes, "hasBytesAvailable" may still return "false"... which is quite confusing behaviour to me. it would be much less confusing with "Must return true...". and that "hasBytesAvailable" sometimes can't return the answer (but "read" can) is also quite confusing behaviour in its own right.

when i called the method from the link you provided

I’m sorry but that wasn’t the take-away message I was trying to communicate. Rather, I was saying that you should use URLSession to read the entire JSON resource and then pass that to JSONSerialization. Using URLSession to create an input stream and then manually reading that input stream to EOF and passing the results to JSONSerialization is a very convoluted way to achieve your goal.

Unless, of course, your goal isn’t related to input streams vended by URLSession but rather some other type of input stream. If so, please explain what sort of stream you’re using, because the best way to deal with this depends on whether the underlying stream is asynchronous or not.

The code you’re using right now assumes that the stream is synchronous. URLSession streams are asynchronous, and this is why you’re having all the problems you’re having. The correct fix isn’t to add more loops and busy waiting but to read the stream asynchronously, although that only makes sense if the underlying data is asynchronous. For example, if you’re dealing with a memory stream (created using +[NSInputStream inputStreamWithData:]) there’s no point running it asynchronously.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
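
For the asynchronous case described above, one way to read a stream without busy waiting is to let the run loop deliver stream events and parse only once the end of the stream is reported. The sketch below assumes that approach; the class name `JSONStreamReader` and its completion signature are hypothetical, and the caller is assumed to keep the reader alive until the completion fires.

```swift
import Foundation

// Hypothetical helper (not a Foundation type): collects bytes from an
// asynchronous InputStream and parses them as JSON at end of stream.
final class JSONStreamReader: NSObject, StreamDelegate {
    private var data = Data()
    private var runLoop: RunLoop?
    private var completion: ((Result<Any, Error>) -> Void)?

    init(completion: @escaping (Result<Any, Error>) -> Void) {
        self.completion = completion
        super.init()
    }

    // Schedules the stream so events arrive on `runLoop` instead of polling.
    func start(_ stream: InputStream, on runLoop: RunLoop = .current) {
        self.runLoop = runLoop
        stream.delegate = self
        stream.schedule(in: runLoop, forMode: .default)
        stream.open()
    }

    func stream(_ aStream: Stream, handle eventCode: Stream.Event) {
        guard let input = aStream as? InputStream else { return }
        switch eventCode {
        case .hasBytesAvailable:
            // Read one chunk per event; further events follow if more arrives.
            var buffer = [UInt8](repeating: 0, count: 4096)
            let count = input.read(&buffer, maxLength: buffer.count)
            if count > 0 {
                data.append(contentsOf: buffer[0..<count])
            } else if count < 0 {
                finish(input, .failure(input.streamError ?? CocoaError(.fileReadUnknown)))
            }
        case .endEncountered:
            // All bytes are in; only now hand them to the parser.
            do {
                let object = try JSONSerialization.jsonObject(with: data, options: [])
                finish(input, .success(object))
            } catch {
                finish(input, .failure(error))
            }
        case .errorOccurred:
            finish(input, .failure(input.streamError ?? CocoaError(.fileReadUnknown)))
        default:
            break
        }
    }

    // Tears the stream down and reports the result exactly once.
    private func finish(_ stream: InputStream, _ result: Result<Any, Error>) {
        stream.close()
        if let runLoop = runLoop {
            stream.remove(from: runLoop, forMode: .default)
        }
        stream.delegate = nil
        completion?(result)
        completion = nil
    }
}
```

In the sample code earlier in the thread, this would mean creating a reader in the `didBecome inputStream` callback and calling `start(_:on:)` on a thread whose run loop is running, rather than calling `jsonObject(with:)` on the stream directly.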

Quinn, a few days ago i implemented my solution without relying on input streams and JSONSerialization (in short i had to parse a sequence of JSON structs and realized i can't use JSONDecoder or JSONSerialization out of the box, but have to preprocess the data first to split it into individual json chunks.)

what's still puzzling me is this blocking behaviour (even if i don't use it now). i want to get to the bottom of the blocking issue regardless of json serialization as it reveals my lack of understanding of stream API and it may harm me in a different context one day, so please bear with me. can you please comment on these claims?

(a forenote: the stream in question is obtained from the url session task using the sample code above. so it is a true asynchronous network stream.)

  1. apple's version of JSONSerialization.jsonObject(with: stream) tries to read until EOF... but fails to do this correctly (and thus blocks), and i think i found why, see 4 and 5 below.

  2. the open source version of JSONSerialization.jsonObject(with: stream) doesn't even try to read until EOF (and thus attempts to parse partially read json data).

  3. my "quick & dirty" modification on top of open source version reads until EOF and seems to be working alright. ignore the bad "busy waiting" practice for now, i'm not going to use it in production just trying to simulate the blocking behaviour observed with apple's code.

  4. finally when i added the extra hasBytesAvailable check to (3) i can reproduce the blocking (great!), thus i suspect this is what apple code is doing internally that causes the block.

  5. the blocking itself is quite an interesting phenomenon though. (see ** below)

  6. this behaviour of hasBytesAvailable looks like a bug. at the very least if it is clueless for some reason - it should be returning "true" instead of false to be more in-line with the mentioned documentation comment: "hasBytesAvailable... May also return true if a read must be attempted in order to determine the availability of bytes."

  7. the documentation comment (and the corresponding implementation) would be more sane if it was: "hasBytesAvailable... MUST return true if a read must be attempted in order to determine the availability of bytes". i mean, with the "MUST" the code outlined in 5b is "correct" (and is as good as 5a). with the "MAY" the code in 5b is not correct to begin with... because: "yes, hasBytesAvailable MAY return true if it doesn't know the answer - so the subsequent read could be attempted. but at the same time hasBytesAvailable DOES NOT HAVE TO return true (even if it's clueless), so it MAY as well return false, and so the subsequent read would not even have a chance to run at all"

  8. the very idea that "hasBytesAvailable" might not know the answer better than "read" (and thus has to return spurious "true's") deserves some serious justification... i mean, can it not be reimplemented as a wrapper on top of whatever read is doing inside, just without reading?

**) putting it here as it screws the numbering.

it looks like the code of this form works alright (pseudocode):

5a

    while !stream.eof {
        stream.read() // might return 0 bytes
    }

while code of this form blocks forever:

5b

    while !stream.eof {
        if stream.hasBytesAvailable {
            stream.read()
        }
    }

as if "hasBytesAvailable" is improperly returning "false", even if the read could have returned some data.
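
For reference, form 5a can be written out in Swift roughly as below. This is only a sketch for experimenting with the behaviour discussed here: the function name `drain` is hypothetical, the loop polls rather than waiting for events, and it is not meant as a recommended way to read an asynchronous stream.

```swift
import Foundation

// Hypothetical helper illustrating form 5a: call read(_:maxLength:)
// unconditionally and use streamStatus, not hasBytesAvailable, to decide
// when to stop. The stream is assumed to have been opened by the caller.
func drain(_ stream: InputStream) throws -> Data {
    var data = Data()
    var buffer = [UInt8](repeating: 0, count: 1024)
    var status = stream.streamStatus
    while status != .atEnd && status != .error && status != .closed {
        let bytesRead = stream.read(&buffer, maxLength: buffer.count)
        if bytesRead < 0 {
            // A negative count signals a read error.
            throw stream.streamError ?? CocoaError(.fileReadUnknown)
        }
        // A zero count appends nothing; the status check ends the loop.
        data.append(contentsOf: buffer[0..<bytesRead])
        status = stream.streamStatus
    }
    return data
}
```

Form 5b differs only in wrapping the `read` call in `if stream.hasBytesAvailable { ... }`, which is the check reported above to stall the loop.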

what's still puzzling me is this blocking behaviour (even if i don't use it now).

I understand your desire to drive this issue to a conclusion but I just don’t have time to help you with that here on DevForums. If you want to continue with this, open a DTS tech support incident and we can pick it up in that context.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

filed the relevant bugs against hasBytesAvailable & JSONObjectWithStream.

it's also quite easy to reproduce JSONObjectWithStream's hang with a custom InputStream subclass (details in FB9531840) but it looks like the main culprit is hasBytesAvailable returning false when it should not (FB9531304).

fwiw, the open source version at swift-corelibs-foundation is unfinished (according to its status page "jsonObject(with:options:) with streams remains unimplemented") so it's not surprising that it incorrectly tries to decode partial json without collecting all the bytes until the eof first.

do you know any use cases when +JSONObjectWithStream:… would be useful?

No.

Although that isn’t to say that this will always be the case. For example, if we converted Foundation’s JSON parser to a streaming parser, this API would let you parse large JSON files without having both the file and the parser results in memory at the same time.

if not why does this API exist at all

See below.

and not deprecated?

I can’t give you an official answer to that. However, speaking for myself, I think this falls far below the threshold required for deprecation (although I could imagine it getting deprecated as part of some larger effort).

And speaking of the bigger picture, keep in mind that Foundation’s byte streaming APIs have evolved a lot over time and not always in one consistent direction. If you’d asked me five years ago which Foundation API to use for streaming I would definitely have recommended NSStream. At the time:

  • The alternatives (most notably, NSFileHandle) were stalled in a very bad place. For example, NSFileHandle was basically unusable from Swift because it responded to an error by throwing a language exception.

  • NSStream was a great way to model a TCP stream. Indeed, it was by far the easiest way to run a TLS-over-TCP connection.

  • We had finally fixed NSStream so that you could create a custom stream while colouring within the lines of the public API.

However, these days I’m much less likely to recommend NSStream. Notably:

  • NSFileHandle has received a lot of love recently.

  • NSStream is no longer the way to run a network connection. Indeed, CFSocketStream, the core code underlying its networking support, has been officially deprecated in favour of NWConnection.

  • We have just started down the async/await path. You’re already seeing the impact of this (for example, FileHandle has a bytes property that is, IMO, frikkin’ awesome) and I can only imagine that growing over time.

Right now I’m struggling to think of any scenario where I’d be happy to use NSStream. There are some cases where you have to use it — for example, doing streamed uploads in URLSession — but in such situations I’d be grumbling (and filing enhancement requests for a better API).

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

obviously stream api needs some facelift like async/await to not fall into oblivion completely.

Indeed. Personally I think that AsyncSequence is the right model moving forward, but our API surface is huge and it’ll take a while for all the relevant APIs to catch up. For example, on the networking front, this year’s OS releases have AsyncSequence support in URLSession but not NWConnection (r. 79137885).

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
