Hi
My code reads a bunch of URLs from a file and does something with each one.
```swift
for line in everyLineFromSomeTextFile {
    guard let url = URL(string: line) else { continue }
    doSomethingWith(url)
}
```
I recently noticed that some of the lines in the file had wrongly encoded path portions.
For example:
```
https://www.apple.com/us/search/caf%e9
```
(Or see here if the forum munges that up)
This looks like it was incorrectly percent-encoded from the ISO 8859-1 character set instead of UTF-8, as I am sure the last path component is meant to be 'café' when decoded. In UTF-8, 'é' would be encoded as `%C3%A9`; `%E9` is its single ISO 8859-1 byte.
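To make the encoding difference concrete, here is a small Swift sketch comparing the bytes for 'é' in the two character sets. It only uses standard Foundation calls; the variable names are illustrative.

```swift
import Foundation

// U+00E9 'é', written as an escape so the source file's encoding can't interfere.
let eAcute = "\u{E9}"

// UTF-8 encodes 'é' as two bytes, so a correctly encoded path component uses %C3%A9.
let utf8Bytes = Array(eAcute.utf8)                                   // [0xC3, 0xA9]
let utf8Escaped = eAcute.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)
                                                                     // "%C3%A9"

// ISO 8859-1 (Latin-1) encodes it as the single byte 0xE9, which matches "%e9".
let latin1Bytes = Array(eAcute.data(using: .isoLatin1) ?? Data())    // [0xE9]

print(utf8Bytes, utf8Escaped ?? "nil", latin1Bytes)
```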
However, my code didn't skip that line, because `URL(string: line)` didn't return nil. But the resulting `url` only contained the scheme and the host; the path was empty.
I also tested what `URLComponents` did with that string, and it gave a similar result: a valid scheme and host but an empty path.
However, `URLComponents.percentEncodedPath` actually returns the original malformed path:
```
/us/search/caf%e9
```
To complicate things further:
```swift
url.absoluteString // ==> "https://www.apple.com/us/search/caf%e9"
```
Surely both of those initializers should fail if the string can't be properly and fully parsed?
As it happened, my code went ahead and incorrectly did:
```swift
doSomethingWith(url)
```
where `url` was
```
https://www.apple.com
```
I haven't even looked at what would happen if the host, query or fragment components were also incorrectly encoded in my source.
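Until the initializers reject such strings themselves, one defensive option is to check that each percent-encoded component decodes as UTF-8 before using the URL. This is a minimal sketch, assuming every legitimate line in the file is UTF-8 percent-encoded; `strictURL(from:)` is a hypothetical helper, not a Foundation API. It relies on `String.removingPercentEncoding`, which returns nil when the escapes don't form valid UTF-8. The host is deliberately not checked here.

```swift
import Foundation

// Hypothetical helper (not part of Foundation): returns a URL only if the line
// parses AND its path, query and fragment all percent-decode to valid UTF-8.
func strictURL(from line: String) -> URL? {
    guard let components = URLComponents(string: line),
          let url = components.url else { return nil }

    // Still-encoded forms of the components; absent (nil) ones are skipped.
    let encodedParts: [String?] = [
        components.percentEncodedPath,
        components.percentEncodedQuery,
        components.percentEncodedFragment,
    ]
    for case let part? in encodedParts {
        // nil means the %XX bytes are not valid UTF-8, e.g. a lone "%e9".
        if part.removingPercentEncoding == nil { return nil }
    }
    return url
}

let bad = strictURL(from: "https://www.apple.com/us/search/caf%e9")
let good = strictURL(from: "https://www.apple.com/us/search/caf%C3%A9")
print(bad as Any, good as Any)
```

In the loop, `URL(string: line)` would be swapped for `strictURL(from: line)`, so the `caf%e9` line is skipped rather than passed on.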
I realise that `URL` and `URLComponents` are just wrappers around `NSURL` and `NSURLComponents`, but those behave the same way too.
(the url string wasn't really an Apple one - I used that for simplicity)