Hi
How to check an array has same string elements
Do you mean you want to check whether a string appears twice in it?
For example, myArray = ["hello", "you", "hello"]
This is not the cleanest approach, but you can do this:
let myArray = ["hello", "you", "hello"]
for (i, item) in myArray.enumerated() {
    var newArray = myArray
    newArray.remove(at: i)
    if newArray.contains(item) {
        print(item, "duplicate at ", i)
    }
}
You get
hello duplicate at 0
hello duplicate at 2
If you just want to detect duplicates you can use this:
func hasDuplicateElements(_ a: [String]) -> Bool {
    return Set(a).count != a.count
}
print(hasDuplicateElements(["Hello", "Cruel", "World!"])) // -> false
print(hasDuplicateElements(["Hello", "Cruel", "Hello"])) // -> true
It’s not the most efficient, but it’s super easy.
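If the array is large and a duplicate is likely to show up early, one possible variation stops at the first repeat instead of building the whole set. This is only a sketch: the name hasDuplicateElementsEarlyExit is made up for illustration and does not come from the thread. It relies on Set.insert(_:), which reports whether the element was newly inserted.

```swift
// Hypothetical helper (name invented for this sketch).
func hasDuplicateElementsEarlyExit(_ a: [String]) -> Bool {
    var seen = Set<String>()
    for s in a {
        // insert(_:) returns a tuple whose `inserted` field is false
        // when the element was already in the set.
        if !seen.insert(s).inserted {
            return true
        }
    }
    return false
}

print(hasDuplicateElementsEarlyExit(["Hello", "Cruel", "World!"])) // -> false
print(hasDuplicateElementsEarlyExit(["Hello", "Cruel", "Hello"])) // -> true
```

The worst case still visits every element, so the gain only appears when a duplicate exists near the front.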
If you want to know the original and duplicate indexes, try this:
func indexesOfFirstDuplicate(_ a: [String]) -> (Int, Int)? {
    var indexByString: [String:Int] = [:]
    for (i, s) in zip(a.indices, a) {
        if let original = indexByString[s] {
            return (original, i)
        } else {
            indexByString[s] = i
        }
    }
    return nil
}
That can even be made generic:
func indexesOfFirstDuplicate<C>(_ c: C) -> (C.Index, C.Index)? where
    C : Collection,
    C.Element : Hashable
{
    var indexByElement: [C.Element:C.Index] = [:]
    for (i, e) in zip(c.indices, c) {
        if let original = indexByElement[e] {
            return (original, i)
        } else {
            indexByElement[e] = i
        }
    }
    return nil
}
This works on any collection where the elements are Hashable, that requirement being critical for performance.
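As a usage sketch, here is the generic version (repeated so the snippet stands alone) applied to an array slice and to a String. The sample values words and text are made up for illustration. The point is that the returned values are real indexes of the collection you passed in, not offsets from zero.

```swift
func indexesOfFirstDuplicate<C>(_ c: C) -> (C.Index, C.Index)? where
    C : Collection,
    C.Element : Hashable
{
    var indexByElement: [C.Element:C.Index] = [:]
    for (i, e) in zip(c.indices, c) {
        if let original = indexByElement[e] {
            return (original, i)
        } else {
            indexByElement[e] = i
        }
    }
    return nil
}

// An array slice keeps the indexes of its base array (here 1, 2, 3).
let words = ["Hello", "Cruel", "World!", "Cruel"]
let slice = words.dropFirst()
if let pair = indexesOfFirstDuplicate(slice) {
    print(pair.0, pair.1) // -> 1 3
}

// String indexes are not integers, so convert to offsets just for printing.
let text = "abca"
if let pair = indexesOfFirstDuplicate(text) {
    let first = text.distance(from: text.startIndex, to: pair.0)
    let second = text.distance(from: text.startIndex, to: pair.1)
    print(first, second) // -> 0 3
}
```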
Share and Enjoy
—
Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware
let myEmail = "eskimo" + "1" + "@apple.com"
You can make it a bit nicer
typealias Duplicate = (String, [Int])
let myArray = ["hello", "you", "hello", "are", "you", "here", "hello"]
var listOfDuplicates = [Duplicate]()
for (i, item) in myArray.enumerated() {
    var newArray = myArray
    newArray.remove(at: i)
    if newArray.contains(item) {
        let toAppend = (item, [i])
        listOfDuplicates.append(toAppend)
    }
}
let allValues = Dictionary(listOfDuplicates, uniquingKeysWith: { $0 + $1 })
if allValues.count == 0 {
    print("No Duplicate")
} else {
    let duplicatesCount = allValues.count
    print(duplicatesCount, "Duplicate\(duplicatesCount > 1 ? "s": ""):")
    for (key, value) in allValues {
        print(" ", key, "at positions", value)
    }
}
And get
2 Duplicates:
hello at positions [0, 2, 6]
you at positions [1, 4]
Of course, you may start counting from 1 and not zero, by changing:
if newArray.contains(item) {
    let toAppend = (item, [i+1]) // Positions now run from 1 to myArray.count
    listOfDuplicates.append(toAppend)
}
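For comparison, here is a sketch of a single-pass alternative that avoids copying the array on every iteration. It is not the code from the post above; the names positionsByString and duplicates are invented for the example. It collects every index under its string, then keeps the strings seen more than once.

```swift
let myArray = ["hello", "you", "hello", "are", "you", "here", "hello"]

// Record each index under the string found at that index.
var positionsByString: [String: [Int]] = [:]
for (i, item) in zip(myArray.indices, myArray) {
    positionsByString[item, default: []].append(i)
}

// Keep only the strings that occur more than once.
let duplicates = positionsByString.filter { $0.value.count > 1 }

// Dictionaries are unordered, so sort by key to get stable output.
for (key, positions) in duplicates.sorted(by: { $0.key < $1.key }) {
    print(key, "at positions", positions)
}
// -> hello at positions [0, 2, 6]
// -> you at positions [1, 4]
```

Each element is visited once, whereas copying the array and calling contains inside the loop does work proportional to the square of the array length.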
I haven’t looked at your code in detail but I did notice that you’ve fallen foul of one common pitfall. Specifically, this code:
for (i, item) in myArray.enumerated() {
    var newArray = myArray
    newArray.remove(at: i)
    …
}
assumes that enumerated returns array indexes, which is not the case. Rather, enumerated returns an offset from the start of the sequence, so the first value you get back will always be 0. That’s fine if you’re working with an array, but it causes problems if you deal with an array slice. This is particularly insidious when you deal with a type, like Data, which is its own slice type.
Consider this snippet:
let d = "Hello Cruel World!".data(using: .utf8)!
let dSub = d[6..<11]
for (o, e) in dSub.enumerated() {
    print(o)
    print(e)
    print(dSub[o])
}
which crashes on line 6 because dSub.startIndex is 6, not 0.
If you want to get a sequence of index and element pairs, use zip(a.indices, a), not a.enumerated().
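A sketch of the same Data snippet rewritten with zip, for illustration. The collected array is an addition for this example, used only to show that every subscript succeeds.

```swift
import Foundation

let d = "Hello Cruel World!".data(using: .utf8)!
let dSub = d[6..<11]

var collected: [UInt8] = []
for (i, e) in zip(dSub.indices, dSub) {
    // i is a real index into dSub (6 through 10), so subscripting is safe.
    assert(dSub[i] == e)
    collected.append(dSub[i])
}
print(String(decoding: collected, as: UTF8.self)) // -> Cruel
```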
Share and Enjoy
—
Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware
let myEmail = "eskimo" + "1" + "@apple.com"
In the following sequence
let myArray = ["hello", "you", "hello", "are", "you", "here", "hello"]
var listOfDuplicates = [Duplicate]()
for (i, item) in myArray.enumerated() {
myArray is [String], i is Int, and item is String.
In the dSub case, dSub is not a [String].
So, I agree, the code may not work with every collection, but that was not the purpose, was it?
So, I agree, the code may not work with every collection, but that was not the purpose, was it?
I’m not sure what your purpose is. My goal is to try to prevent individual folks from falling into this pitfall, in the hope of encouraging herd immunity, and thus I challenge this assumption every time I see it.
I want to stress that this problem shows up even when you’re not specifically writing generic code. Consider this snippet:
var a = ["Hello", "Cruel", "World!"]
for (i, e) in a.enumerated() {
    if e.hasPrefix("C") {
        a[i] = e.uppercased()
    }
}
print(a) // -> ["Hello", "CRUEL", "World!"]
Cool beans! Then, later on, you decide you don’t want to change the first word, so you add a dropFirst().
var a = ["Hello", "Cruel", "World!"]
for (i, e) in a.dropFirst().enumerated() {
    if e.hasPrefix("C") {
        a[i] = e.uppercased()
    }
}
print(a) // -> ["CRUEL", "Cruel", "World!"]
Whoops!
Personally I’d be in favour of removing enumerated() entirely, and thus making folks write zip(0..., a) or zip(a.indices, a), but I vaguely recall this being thrashed out on Swift Evolution and not getting any traction.
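For illustration, here is one way the dropFirst() example could be written with zip so that it keeps working. The tail constant is a name introduced for this sketch.

```swift
var a = ["Hello", "Cruel", "World!"]

// The slice keeps the base array's indexes (1 and 2 here).
let tail = a.dropFirst()
for (i, e) in zip(tail.indices, tail) {
    if e.hasPrefix("C") {
        a[i] = e.uppercased()
    }
}
print(a) // -> ["Hello", "CRUEL", "World!"]
```

Because tail is a separate value, mutating a inside the loop does not disturb the iteration.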
Share and Enjoy
—
Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware
let myEmail = "eskimo" + "1" + "@apple.com"
Thanks Quinn for the detailed explanation. I did not think of this situation.
Unfortunately, enumerated is widely described in existing documentation, despite the pitfalls.
But don't you think that it is the slice implementation itself which is extremely dangerous?
It is intended for internal optimization (so far so good), but the outside effect is often surprising, unnatural and dangerous.
Consider:
var a = ["Hello", "Cruel", "World!"]
var b = a.dropFirst()
for (i, e) in b.enumerated() {
    print(i, e)
    if e.hasPrefix("C") {
        print("Modify", a[i], i, e)
        b[i] = e.uppercased()
    }
}
I get
0 Cruel
Modify Hello 0 Cruel
Fatal error: Index out of bounds
and finally a crash
So, why doesn't enumerated provide the "right" index in this case? It should return 1 Cruel and not 0 Cruel. It is totally misleading. Should we consider this a Swift design flaw to be corrected?
Now, if I "just" make it an array (which is what I'd naturally think dropFirst is doing):
var a = ["Hello", "Cruel", "World!"]
var b = Array(a.dropFirst())
for (i, e) in b.enumerated() {
    print(i, e)
    if e.hasPrefix("C") {
        print("Modify", a[i], i, e)
        b[i] = e.uppercased()
    }
}
print(a, b)
I get the "expected" result
0 Cruel
Modify Hello 0 Cruel
1 World!
["Hello", "Cruel", "World!"] ["CRUEL", "World!"]
But don't you think that it is the slice implementation itself which is extremely dangerous?
I agree that it’s a potential pitfall, yes. However, it also has significant benefits. Specifically, it decreases the ‘weight’ of slice types, making them both easier to implement and faster. This is especially relevant for collections where the indexes are not simple integers.
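A small sketch of what sharing indexes with the base collection looks like in practice. The sample values are made up for illustration. The second half uses a String, where indexes are not integers, so an offset-based slice would have nothing cheap to offer.

```swift
let a = ["Hello", "Cruel", "World!"]
let b = a.dropFirst()

// The slice's indexes are the base array's indexes.
print(b.startIndex, b.endIndex) // -> 1 3
print(a[b.startIndex] == b[b.startIndex]) // -> true

// The same holds for String and Substring, whose indexes are opaque values.
let text = "Hello Cruel World!"
let rest = text.dropFirst(6)
let i = rest.startIndex
print(text[i] == rest[i]) // -> true
```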
So, why doesn't enumerated provide the "right" index in this case?
Because that’s not what it’s specified to do. The documentation for enumerated is really clear about what it actually does, the only problem being that this behaviour doesn’t align with your expectations )-: You’re not alone in that regard, which is why I waded into this thread in the first place!
Should we consider this a Swift design flaw to be corrected?
I can only speak for myself here, and I’ve already outlined my personal opinion about this in my last post. If you want to see a change then you need to drive that via Swift Evolution, and I’m only a spectator over there.
Share and Enjoy
—
Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware
let myEmail = "eskimo" + "1" + "@apple.com"
Thanks Quinn, that was a very instructive discussion for me.
Does all this answer your initial question? If so, please close the thread by marking the correct answer.
I posted on https://forums.swift.org/t/removing-enumerated/5050/42
That restarted a discussion that was stopped 2 years ago…