Odd behaviour with a GB18030 file that is 1001 bytes in size.

While working on a text-parsing issue, I found something odd with a file whose size is 1000*n + 1 bytes and which ends with a two-byte character, such as a Chinese character. I read NSData from the file and convert it to an NSString, but converting that NSString back to NSData fails. (error: *** -[__NSCFString dataUsingEncoding:allowLossyConversion:]: didn't convert all characters)


Only 1000 of the file's 1001 bytes come through after reading from the file. However, if a character is added to or deleted from the file, everything works fine. Does anyone have any idea what is causing this?


Demo: https://github.com/singro/TextDecodingDemo

Replies

It does have the appearance of a bug. In your demo, I replaced the lines that failed (inside the @try) with these:


NSString* content3 = [content substringFromIndex: 1];
NSString* content4 = [content substringToIndex: 1];
NSData *data3 = [content3 dataUsingEncoding:encoding allowLossyConversion:YES];
NSLog(@"data3: %@", data3);
NSData *data4 = [content4 dataUsingEncoding:encoding allowLossyConversion:YES];
NSLog(@"data4: %@", data4);


and that works correctly (and could be the basis of a workaround). You should submit a bug report with your sample project.
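
A minimal sketch of how that split could be turned into a workaround. The helper name EncodeInTwoPieces is hypothetical (it is not part of the demo or of Foundation), and I have not verified that it avoids the failure for every input. It encodes the first character and the rest of the string separately, as in the lines above, then joins the two results. This only makes sense for encodings such as GB18030 that do not prepend a byte-order mark to each converted piece.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the demo): encode the string in two pieces
// and concatenate the results, mirroring the substring split shown above.
static NSData *EncodeInTwoPieces(NSString *content, NSStringEncoding encoding) {
    // Nothing to split for empty or single-unit strings.
    if (content.length < 2) {
        return [content dataUsingEncoding:encoding allowLossyConversion:YES];
    }

    // Split after the first composed character so a surrogate pair or
    // combining sequence is never cut in half.
    NSRange first = [content rangeOfComposedCharacterSequenceAtIndex:0];
    NSString *head = [content substringToIndex:NSMaxRange(first)];
    NSString *tail = [content substringFromIndex:NSMaxRange(first)];

    NSData *headData = [head dataUsingEncoding:encoding allowLossyConversion:YES];
    NSData *tailData = [tail dataUsingEncoding:encoding allowLossyConversion:YES];
    if (headData == nil || tailData == nil) {
        return nil; // conversion failed for one of the pieces
    }

    NSMutableData *result = [NSMutableData dataWithData:headData];
    [result appendData:tailData];
    return result;
}
```

You would call it as EncodeInTwoPieces(content, encoding), with the same content and encoding variables the demo already uses. Keep in mind that allowLossyConversion:YES can silently substitute characters that the encoding cannot represent.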

Awesome!