mdimport / mdls failure / truncation for multivalued CFNumbers (Array of numbers to spotlight index)

Hello


My application uses HFS+ extended attributes in conjunction with a Spotlight importer plugin to provide attributes that are searchable through the Spotlight/Finder Find UI, as well as attributes that are searchable only through the API.


However, in experimenting with multivalued CFNumber attributes for my mdimporter plugin, I see behaviour where certain serialized plists of NSArrays of CFNumbers are not seen correctly by Spotlight (only the first value is seen by mdls, and the rest of the array is truncated). Other, similarly specified multivalued CFNumber attributes that run through the same code path are seen properly by mdls.

Strange, I know. Let me explain. My application writes HFS+ Extended Attributes via setxattr. We serialize this data as a plist via NSPropertyListSerialization with NSPropertyListBinaryFormat_v1_0.

I can verify that my plist data is exported correctly on my file via xattr command, which for example produces:

xattr -l  /Users/vade/Documents/Repositories/Synopsis/Test\ Suite/Test\ Colors.mov_transcoded_analyzed.mov
com.apple.metadata:info_v002_synopsis_descriptors:
00000000  62 70 6C 69 73 74 30 30 A5 01 02 02 03 04 56 59  |bplist00......VY|
00000010  65 6C 6C 6F 77 56 50 75 72 70 6C 65 56 4F 72 61  |ellowVPurpleVOra|
00000020  6E 67 65 55 57 68 69 74 65 08 0E 15 1C 23 00 00  |ngeUWhite....#..|
00000030  00 00 00 00 01 01 00 00 00 00 00 00 00 05 00 00  |................|
00000040  00 00 00 00 00 00 00 00 00 00 00 00 00 29        |.............)|
0000004e
com.apple.metadata:info_v002_synopsis_dominant_colors:
00000000  62 70 6C 69 73 74 30 30 AF 10 0F 01 02 03 04 05  |bplist00........|
00000010  06 07 08 09 0A 0B 0C 0D 0E 0F 22 3E F7 AC EA 22  |..........">..."|
00000020  3F 39 6A C4 22 3E 6E 6A 1A 22 3E DF D9 DE 22 3E  |?9j.">nj.">...">|
00000030  34 06 DE 22 3F 23 7F D4 22 3E A6 CC 57 22 3E 42  |4.."?#..">..W">B|
00000040  10 7B 22 3E 77 49 C2 22 3F 60 19 EF 22 3E 99 90  |.{">wI."?`..">..|
00000050  AC 22 3E 97 1C 5F 22 3F 4A 7A 4F 22 3F 05 DC 56  |.">.._"?JzO"?..V|
00000060  22 3E EC 46 57 08 1A 1F 24 29 2E 33 38 3D 42 47  |">.FW...$).38=BG|
00000070  4C 51 56 5B 60 00 00 00 00 00 00 01 01 00 00 00  |LQV[`...........|
00000080  00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 65                                   |....e|
00000095


Those two binary plists contain an array of strings (the descriptors) and an array of CFNumbers (the dominant colors). You can verify this by copying the hex, saving it as a file with a .plist extension, and opening it in Xcode. You will see one plist containing an array of strings and one plist containing an array of numbers. So we know the xattr code path is correct.
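
As a cross-check that does not need Xcode, the element encoding can be read straight off the hex dump. In the binary plist format, a marker byte whose high nibble is 0x2 introduces a real number, and the low nibble n gives a payload of 2^n bytes stored big-endian. Every element in the dominant-colors dump above starts with marker 0x22, so each one is a 4-byte (single-precision) real. A minimal C sketch, with helper names made up for illustration, that decodes the first element (22 3E F7 AC EA):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: payload size in bytes for a bplist "real" marker
   (0x2n means 2^n bytes), or 0 if the marker is not a real. */
size_t bplist_real_size(uint8_t marker)
{
    if ((marker & 0xF0) != 0x20) {
        return 0;
    }
    return (size_t)1 << (marker & 0x0F);
}

/* Hypothetical helper: decode a 4-byte big-endian IEEE 754 single. */
float bplist_decode_float32(const uint8_t bytes[4])
{
    assert(bytes != NULL);
    uint32_t bits = ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
                    ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];
    float value;
    memcpy(&value, &bits, sizeof value);  /* copy the bit pattern, no aliasing issues */
    return value;
}
```

Passing 0x22 to bplist_real_size gives 4, and the bytes 3E F7 AC EA decode to roughly 0.4837, so this attribute is stored as floats rather than doubles.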


Importing with mdimport -d2 indicates that Spotlight acknowledges my xattr'ed data (and it matches the plist as above)

mdimport -d2 /Users/vade/Documents/Repositories/Synopsis/Test\ Suite/IMG_0930.mov_transcoded_analyzed.mov
(Import.Info:845) com.apple.quicktime-movie
2016-07-30 20:39:07.339 mdimport[5965:223781] Imported '/Users/vade/Documents/Repositories/Synopsis/Test Suite/IMG_0930.mov_transcoded_analyzed.mov' of type 'com.apple.quicktime-movie' with plugIn /System/Library/Spotlight/CoreMedia.mdimporter.
2016-07-30 20:39:07.342 mdimport[5965:223781] Attributes: {
    ":EA:info_v002_synopsis_descriptors" =     (
        Gray,
        Orange,
        Gray,
        Gray,
        Black
    );
    ":EA:info_v002_synopsis_dominant_colors" =     (
        "0.4563954",
        "0.3744843",
        "0.2999708",
        "0.7910913",
        "0.5998821",
        "0.3693877",
        "0.5683019",
        "0.5030823",
        "0.4359812",
        "0.2698129",
        "0.221634",
        "0.1844318",
        "0.1419222",
        "0.1122651",
        "0.09307535"
    );


(clipped for brevity).


However, mdls does not show the entirety of the imported data for the CFNumber case:


mdls /Users/vade/Documents/Repositories/Synopsis/Test\ Suite/IMG_0930.mov_transcoded_analyzed.mov
_kMDItemOwnerUserID                = 501
info_v002_synopsis_descriptors     = (
    Gray,
    Orange,
    Gray,
    Gray,
    Black
)
info_v002_synopsis_dominant_colors = (
    "0.4563954"
)


Note that my dominant color array has been truncated.

What is odd is that I have a much larger array of CFNumbers which is imported by Spotlight correctly (it is 768 values long: 256 numbers for each RGB channel, as a histogram). This is imported correctly by mdimport and verified through mdls and through the NSMetadata API.


What is interesting is that I can isolate this behaviour to the *contents* of the CFArray I am serializing, and not to the name of the extended attribute or the matching key in my mdimporter. If I write the contents of my working histogram array to both the dominant color extended attribute and the histogram extended attribute, I am able to import via Spotlight and see correct values for both arrays.


If I use the dominant color array, I see truncated values for both my histogram and my dominant colors.


For those curious, I've checked my mdimporter schema multiple times. It is as follows:


<?xml version="1.0" encoding="UTF-8"?>

<schema version="1.0" xmlns="http://www.apple.com/metadata" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.apple.com/metadata file:///System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Resources/MetadataSchema.xsd">
<attributes>
        <attribute name="info_v002_synopsis_descriptors" multivalued="true" type="CFString" uniqued="false"/>
        <attribute name="info_v002_synopsis_perceptual_hash" multivalued="false" type="CFString" nosearch="true" uniqued="false"/>
        <attribute name="info_v002_synopsis_histogram" multivalued="true" type="CFNumber" nosearch="true" uniqued="false"/>
        <attribute name="info_v002_synopsis_dominant_colors" multivalued="true" type="CFNumber" nosearch="true" uniqued="false"/>
   </attributes>
    <types>
        <type name="info.v002.Synopsis">  <!-- add one <type>...</type> entry for each UTI that you need to define. -->
   
            <!-- 'allattrs' is a whitespace separated list of all of the attributes that this UTI type normally has.
                 It does not have to be exhaustive. -->
            <allattrs>
            info_v002_synopsis_descriptors
            info_v002_synopsis_perceptual_hash
            info_v002_synopsis_histogram
            info_v002_synopsis_dominant_colors
            info_v002_synopsis_motion_vector_values
            </allattrs>
            <!-- 'displayattrs' is a whitespace separated list of the attributes that should normally be displayed when previewing files of this UTI type. -->
            <displayattrs>
            info_v002_synopsis_descriptors
            </displayattrs>
      
        </type>
    </types>
</schema>


Further inspection into the content of my array has shown me some very interesting behaviour.

If the content of my array is all doubles, it is imported, but only if each value has enough decimal places for NSNumber to serialize it as a double rather than truncating it to a float. (This appears to be the case for my histogram data.)


If one of the NSNumbers in my array is, say, a short int of value 1, and the rest are floats with proper float values, the entire array is not imported; only the first short is. I can verify this by changing a value in my histogram array, which causes the metadata import to fail.


If the content of my array is all floats (as per the dominant color array), it is truncated to the first value.


Can anyone shed any light on this at all?


Thank you!


For those curious, you can find the pertinent code here:


Spotlight importer:


github.com/Synopsis/Synopsis/tree/master/Synopsis/SpotlightImporter


extended attribute writing:

github.com/Synopsis/Synopsis/blob/master/Synopsis/Synopsis/MetadataWriterTranscodeOperation.m#L782


Thanks again.

Replies

It's definitely something to do with doubles versus floats being imported to Spotlight.

Serializing the following array, built from FLT_MAX values, as a multivalued CFNumber attribute:

[self xattrsetPlist:@[ @(FLT_MAX), @(FLT_MAX - 1.0), @(FLT_MAX - 2.0), @(FLT_MAX - 3.0), @(FLT_MAX - 4.0), @(FLT_MAX - 5.0), @(FLT_MAX - 6.0)]
                                       forKey:@"info_v002_synopsis_dominant_colors"];


Produces the following mdimport output:


mdimport -d2  /Users/vade/Documents/Repositories/Synopsis/Test\ Suite/Test\ Colors.mov_transcoded_analyzed.mov
  ":EA:info_v002_synopsis_descriptors" =     (
        Yellow,
        Purple,
        Purple,
        Orange,
        White
    );
    ":EA:info_v002_synopsis_dominant_colors" =     (
        "3.402823e+38",
        "3.402823466385289e+38",
        "3.402823466385289e+38",
        "3.402823466385289e+38",
        "3.402823466385289e+38",
        "3.402823466385289e+38",
        "3.402823466385289e+38"
    );


Yet produces the following mdls:


mdls /Users/vade/Documents/Repositories/Synopsis/Test\ Suite/Test\ Colors.mov_transcoded_analyzed.mov
_kMDItemOwnerUserID                = 501
info_v002_synopsis_descriptors     = (
    Yellow,
    Purple,
    Purple,
    Orange,
    White
)
info_v002_synopsis_dominant_colors = (
    "3.402823e+38"
)
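
One detail worth keeping in mind when reading this test: in C, and therefore in Objective-C, the literal 1.0 is a double, so FLT_MAX - 1.0 is a double expression and @(FLT_MAX - 1.0) boxes a double rather than a float. Only the first element, @(FLT_MAX), is a float. That matches the mdimport output above, which shows one 7-digit value ("3.402823e+38") followed by six 16-digit values. The array is therefore a float followed by six doubles, not seven floats. The following C11 sketch (macro and function names are illustrative only) shows the types the compiler assigns:

```c
#include <assert.h>
#include <float.h>
#include <string.h>

/* Illustrative macro: name of the static type of an expression (C11). */
#define TYPE_NAME(x) _Generic((x), \
    int: "int",                    \
    float: "float",                \
    double: "double",              \
    default: "other")

/* Compile-time check: the subtraction is carried out in double. */
static_assert(sizeof(FLT_MAX - 1.0) == sizeof(double),
              "FLT_MAX - 1.0 has type double");

const char *type_of_flt_max(void)          { return TYPE_NAME(FLT_MAX); }
const char *type_of_flt_max_minus_1(void)  { return TYPE_NAME(FLT_MAX - 1.0); }
const char *type_of_flt_max_minus_1f(void) { return TYPE_NAME(FLT_MAX - 1.0f); }

/* Returns 1 if the first two elements of the test array share a type. */
int first_two_types_match(void)
{
    return strcmp(TYPE_NAME(FLT_MAX), TYPE_NAME(FLT_MAX - 1.0)) == 0;
}

/* At this magnitude adjacent doubles are far more than 1.0 apart, so the
   subtraction rounds back to the same value. Returns 1 if it changed. */
int minus_one_changes_value(void)
{
    return (FLT_MAX - 1.0) != (double)FLT_MAX;
}
```

Writing the literals with an f suffix (1.0f, 2.0f, and so on) would keep every element a float, which would make the test a cleaner comparison against the all-double case.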


Compare however to the same exact code path but with doubles:


[self xattrsetPlist:@[ @(DBL_MAX), @(DBL_MAX - 1.0), @(DBL_MAX - 2.0), @(DBL_MAX - 3.0), @(DBL_MAX - 4.0), @(DBL_MAX - 5.0), @(DBL_MAX - 6.0)]
                                       forKey:@"info_v002_synopsis_dominant_colors"];


mdimport:

Attributes: {
    ":EA:info_v002_synopsis_descriptors" =     (
        Yellow,
        Purple,
        Purple,
        Orange,
        White
    );
    ":EA:info_v002_synopsis_dominant_colors" =     (
        "1.797693134862316e+308",
        "1.797693134862316e+308",
        "1.797693134862316e+308",
        "1.797693134862316e+308",
        "1.797693134862316e+308",
        "1.797693134862316e+308",
        "1.797693134862316e+308"
    );

And finally a working mdls:



mdls /Users/vade/Documents/Repositories/Synopsis/Test\ Suite/Test\ Colors.mov_transcoded_analyzed.mov
_kMDItemOwnerUserID                = 501
info_v002_synopsis_descriptors     = (
    Yellow,
    Purple,
    Purple,
    Orange,
    White
)
info_v002_synopsis_dominant_colors = (
    "1.797693134862316e+308",
    "1.797693134862316e+308",
    "1.797693134862316e+308",
    "1.797693134862316e+308",
    "1.797693134862316e+308",
    "1.797693134862316e+308",
    "1.797693134862316e+308"
)


So as far as I can tell something with actual indexing is borked.


Can anyone comment on this?

Also, for what it's worth, ints seem to work fine as multivalued CFNumbers in Spotlight metadata:



mdls /Users/vade/Documents/Repositories/Synopsis/Test\ Suite/Test\ Colors.mov_transcoded_analyzed.mov
_kMDItemOwnerUserID                = 501
info_v002_synopsis_descriptors     = (
    Yellow,
    Purple,
    Purple,
    Orange,
    White
)
info_v002_synopsis_dominant_colors = (
    2147483647,
    2147483646,
    2147483645,
    2147483644,
    2147483643,
    2147483642,
    2147483641
)


But having 'mixed' value NSNumbers in the NSArray that is serialized messes things up:



                [self xattrsetPlist:@[ @(INT_MAX), @(INT_MAX - 1), @(FLT_MAX - 2.0), @(DBL_MAX - 3.0), @(INT_MAX - 4), @(FLT_MAX - 5.0), @(DBL_MAX - 6.0)]
                                       forKey:@"info_v002_synopsis_dominant_colors"];


mdls /Users/vade/Documents/Repositories/Synopsis/Test\ Suite/Test\ Colors.mov_transcoded_analyzed.mov
_kMDItemOwnerUserID                = 501
info_v002_synopsis_descriptors     = (
    Yellow,
    Purple,
    Purple,
    Orange,
    White
)
info_v002_synopsis_dominant_colors = (
    2147483647,
    2147483646,
    2147483643
)


Note that only the ints (the data type of the first element) were saved; the float and double entries were dropped.
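
Both hand-built test arrays in this thread fit one simple model, offered purely as a hypothesis about the observed behaviour and not as a description of how Spotlight is actually implemented: only the elements whose numeric type matches that of the first element survive indexing. The C sketch below (all names are made up) applies that rule to the type sequence of the mixed array, bearing in mind that expressions such as FLT_MAX - 2.0 are doubles in C. It keeps indices 0, 1 and 4, which correspond to the three values mdls reports (INT_MAX, INT_MAX - 1 and INT_MAX - 4).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the numeric type an NSNumber was created from. */
typedef enum { NUM_INT, NUM_FLOAT, NUM_DOUBLE } num_type;

/* Hypothetical model of the observed behaviour: record the indices of the
   elements whose type equals the type of element 0.
   Writes the kept indices to out and returns how many were kept. */
size_t keep_matching_first(const num_type *types, size_t n, size_t *out)
{
    assert(types != NULL && out != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (types[i] == types[0]) {
            out[kept++] = i;
        }
    }
    return kept;
}
```

This model does not account for the original dominant-colors array, where every element is a float and mdls still shows only the first value, so at best it covers part of the behaviour.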