When generating an MLWordEmbedding model, there seems to be some kind of compression applied to the original input vectors.
Just take the example from the documentation:
```swift
import CreateML

let vectors = [
    "Hello": [0.0, 1.2, 5.0, 0.0],
    "Goodbye": [0.0, 1.3, -6.2, 0.1]
]

let embedding = try! MLWordEmbedding(dictionary: vectors)

embedding.vector(for: "Hello") == vectors["Hello"]     // false
embedding.vector(for: "Goodbye") == vectors["Goodbye"] // false

// Unexpectedly compressed to the same vector
embedding.vector(for: "Hello") == embedding.vector(for: "Goodbye") // true
embedding.distance(between: "Hello", and: "Goodbye")               // 0
```
Larger datasets, like word2vec, seem to fare a bit better, but input vectors are still changed in unexpected ways, and additional vector collisions occur.
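For reference, here is the rough collision check I run on the larger sets. It's a minimal sketch: it assumes a `vectors` dictionary of type `[String: [Double]]` loaded from a word2vec export, and `collisionGroups` is just a name I made up for illustration.

```swift
import CreateML

// Sketch: bucket words by their post-compression vector and report every
// group of two or more words that collapsed onto the same vector.
// `collisionGroups` is an illustrative helper name, not a CreateML API.
func collisionGroups(in vectors: [String: [Double]]) throws -> [[String]] {
    let embedding = try MLWordEmbedding(dictionary: vectors)
    var buckets: [[Double]: [String]] = [:]
    for word in vectors.keys {
        // vector(for:) returns nil for out-of-vocabulary strings
        if let compressed = embedding.vector(for: word) {
            buckets[compressed, default: []].append(word)
        }
    }
    return buckets.values.filter { $0.count > 1 }
}
```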
I'm curious which spatial properties are expected to hold after compression. Is there some way to tune or disable it?
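To make that concrete, here's how I've been probing it. This is just a sketch: the `cosineDistance` helper is my own, not part of CreateML, and I'm assuming `distance(between:and:)` reports cosine distance by default.

```swift
import CreateML

// My own helper for cosine distance in the original space; not a CreateML API.
func cosineDistance(_ a: [Double], _ b: [Double]) -> Double {
    let dot = zip(a, b).map { $0 * $1 }.reduce(0, +)
    let normA = a.map { $0 * $0 }.reduce(0, +).squareRoot()
    let normB = b.map { $0 * $0 }.reduce(0, +).squareRoot()
    return 1 - dot / (normA * normB)
}

let vectors = [
    "Hello": [0.0, 1.2, 5.0, 0.0],
    "Goodbye": [0.0, 1.3, -6.2, 0.1]
]
let embedding = try! MLWordEmbedding(dictionary: vectors)

// In the original space these vectors point in very different directions,
// yet after compression their reported distance collapses to 0.
let before = cosineDistance(vectors["Hello"]!, vectors["Goodbye"]!)
let after = embedding.distance(between: "Hello", and: "Goodbye")
print("before: \(before), after: \(after)")
```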
Thanks!