Odd behaviour with a GB18030 file that is 1001 bytes long

Question

While working on a text-parsing issue, I found something odd with a file whose size is 1000*n + 1 bytes and which ends with a two-byte character, such as a Chinese character. I read the file into NSData and convert it to an NSString, but converting that NSString back to NSData fails. (error: *** -[__NSCFString dataUsingEncoding:allowLossyConversion:]: didn't convert all characters)

Only 1000 of the file's 1001 bytes come back after reading it. However, if a character is added to or deleted from the file, everything works fine. Does anyone have any idea what causes this?

Demo: https://github.com/singro/TextDecodingDemo

UIKit

Posted by

Singro

Answer 1

It does have the appearance of a bug. In your demo, I replaced the lines that failed (inside the @try) with these:

NSString* content3 = [content substringFromIndex: 1];
NSString* content4 = [content substringToIndex: 1];
NSData *data3 = [content3 dataUsingEncoding:encoding allowLossyConversion:YES];
NSLog(@"data3: %@", data3);
NSData *data4 = [content4 dataUsingEncoding:encoding allowLossyConversion:YES];
NSLog(@"data4: %@", data4);

and that works correctly (and could be the basis of a workaround). You should submit a bug report with your sample project.

Posted by

QuinceyMorris

Answer 2

Awesome.

Posted by

763950084
