Array with same string

Hi

How do I check whether an array has the same string elements?

Replies

Do you mean you want to check whether a string appears in it twice?

Like myArray = ["hello", "you", "hello"]


This is not the cleanest approach, but you can do this:


let myArray = ["hello", "you", "hello"]
for (i, item) in myArray.enumerated() {
    var newArray = myArray
    newArray.remove(at: i)
    if newArray.contains(item) {
        print(item, "duplicate at", i)
    }
}

You get:

hello duplicate at 0
hello duplicate at 2

If you just want to detect duplicates you can use this:

func hasDuplicateElements(_ a: [String]) -> Bool {
    return Set(a).count != a.count
}

print(hasDuplicateElements(["Hello", "Cruel", "World!"]))   // -> false
print(hasDuplicateElements(["Hello", "Cruel", "Hello"]))    // -> true

It’s not the most efficient, but it’s super easy.
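If the efficiency matters, here is a minimal sketch of a variant that stops at the first repeat instead of building the whole set. The name containsDuplicate is just something I made up for illustration; the only library behaviour it relies on is that Set's insert(_:) reports whether the element was newly inserted.

```swift
// Hypothetical helper: returns true as soon as a repeated element is seen.
func containsDuplicate<S: Sequence>(_ elements: S) -> Bool where S.Element: Hashable {
    var seen = Set<S.Element>()
    for element in elements {
        // insert(_:) returns (inserted: Bool, memberAfterInsert: Element);
        // inserted is false when the element was already in the set.
        if !seen.insert(element).inserted {
            return true
        }
    }
    return false
}

print(containsDuplicate(["Hello", "Cruel", "World!"]))   // -> false
print(containsDuplicate(["Hello", "Cruel", "Hello"]))    // -> true
```

For short arrays the difference is unlikely to matter, so the one-liner is still a reasonable default.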

If you want to know the original and duplicate indexes, try this:

func indexesOfFirstDuplicate(_ a: [String]) -> (Int, Int)? {
    var indexByString: [String:Int] = [:]
    for (i, s) in zip(a.indices, a) {
        if let original = indexByString[s] {
            return (original, i)
        } else {
            indexByString[s] = i
        }
    }
    return nil
}

That can even be made generic:

func indexesOfFirstDuplicate<C>(_ c: C) -> (C.Index, C.Index)? where
    C : Collection,
    C.Element : Hashable
{
    var indexByElement: [C.Element:C.Index] = [:]
    for (i, e) in zip(c.indices, c) {
        if let original = indexByElement[e] {
            return (original, i)
        } else {
            indexByElement[e] = i
        }
    }
    return nil
}

This works on any collection where the elements are Hashable, that requirement being critical for performance.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

You can make it a bit nicer:


typealias Duplicate = (String, [Int])

let myArray = ["hello", "you", "hello", "are", "you", "here", "hello"]
var listOfDuplicates = [Duplicate]()
for (i, item) in myArray.enumerated() {
    var newArray = myArray
    newArray.remove(at: i)
    if newArray.contains(item) {
        let toAppend = (item, [i])
        listOfDuplicates.append(toAppend)
    }
}

let allValues = Dictionary(listOfDuplicates, uniquingKeysWith: { $0 + $1 })
if allValues.count == 0 {
    print("No Duplicate")
} else {
    let duplicatesCount = allValues.count
    print(duplicatesCount, "Duplicate\(duplicatesCount > 1 ? "s": ""):")
    for (key, value) in allValues {
        print("  ", key, "at positions", value)
    }
}



And you get:

2 Duplicates:
   hello at positions [0, 2, 6]
   you at positions [1, 4]


Of course, you may start counting from 1 and not zero, by changing:

    if newArray.contains(item) {
        let toAppend = (item, [i+1])     // Will start from 1 to count of array
        listOfDuplicates.append(toAppend)
    }
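For comparison, here is a sketch of the same grouping done in a single pass with Dictionary(grouping:by:), pairing indexes and elements with zip. The function name duplicatePositions is hypothetical, and this is only one possible way to write it.

```swift
// Hypothetical helper: maps each repeated string to all of its indexes.
func duplicatePositions(in a: [String]) -> [String: [Int]] {
    // Group the (index, element) pairs by element; each group keeps
    // its pairs in the order they appear in the array.
    let grouped = Dictionary(grouping: zip(a.indices, a), by: { $0.1 })
    return grouped
        .mapValues { pairs in pairs.map { $0.0 } }   // keep only the indexes
        .filter { $0.value.count > 1 }               // keep only real duplicates
}

let positions = duplicatePositions(in: ["hello", "you", "hello", "are", "you", "here", "hello"])
// Dictionaries are unordered, so sort by key to get a stable printout.
for (word, indexes) in positions.sorted(by: { $0.key < $1.key }) {
    print(word, "at positions", indexes)
}
// -> hello at positions [0, 2, 6]
// -> you at positions [1, 4]
```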

I haven’t looked at your code in detail but I did notice that you’ve fallen foul of one common pitfall. Specifically, this code:

for (i, item) in myArray.enumerated() {  
    var newArray = myArray  
    newArray.remove(at: i)  
    …
}

assumes that enumerated returns array indexes, which is not the case. Rather, enumerated returns an offset from the start of the sequence, so the first value you get back will always be 0. That’s fine if you’re working with an array, but it causes problems if you deal with an array slice. This is particularly insidious when you deal with a type, like Data, which is its own slice type.

Consider this snippet:

let d = "Hello Cruel World!".data(using: .utf8)!
let dSub = d[6..<11]
for (o, e) in dSub.enumerated() {
    print(o)
    print(e)
    print(dSub[o])
}

which crashes on line 6 because dSub.startIndex is 6, not 0.

If you want to get a sequence of index and element pairs, use zip(a.indices, a), not a.enumerated().
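As a rough sketch, here is the Data snippet rewritten with zip. The visited array is only there to show which indexes the loop sees; it is not part of the technique.

```swift
import Foundation

let d = Data("Hello Cruel World!".utf8)
let dSub = d[6..<11]            // the bytes of "Cruel"; indexes run from 6 to 10
var visited: [Int] = []         // illustration only: records the indexes seen
for (i, e) in zip(dSub.indices, dSub) {
    // i is a real index into dSub, so subscripting with it is safe.
    assert(dSub[i] == e)
    visited.append(i)
}
print(visited)                  // -> [6, 7, 8, 9, 10]
```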

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

In the following sequence:


let myArray = ["hello", "you", "hello", "are", "you", "here", "hello"]
var listOfDuplicates = [Duplicate]()
for (i, item) in myArray.enumerated() {

myArray is [String], i is Int and item is String.


In the dSub case, dSub is not a [String].


So, I agree, the code may not work with any collection, but that was not the purpose, was it?

So, I agree, the code may not work with any collection, but that was not the purpose, was it?

I’m not sure what your purpose is. My goal is to try to prevent individual folks from falling into this pitfall, on the hope of encouraging herd immunity, and thus I challenge this assumption every time I see it.

I want to stress that this problem shows up even when you’re not specifically writing generic code. Consider this snippet:

var a = ["Hello", "Cruel", "World!"]
for (i, e) in a.enumerated() {
    if e.hasPrefix("C") {
        a[i] = e.uppercased()
    }
}
print(a)    // -> ["Hello", "CRUEL", "World!"]

Cool beans! Then, later on, you decide you don’t want to change the first word, so you add a dropFirst().

var a = ["Hello", "Cruel", "World!"]
for (i, e) in a.dropFirst().enumerated() {
    if e.hasPrefix("C") {
        a[i] = e.uppercased()
    }
}
print(a)    // -> ["CRUEL", "Cruel", "World!"]

Whoops!
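A minimal sketch of how the same loop behaves when it pairs the slice's own indexes with its elements. The name rest is just a local I introduced for the slice.

```swift
var a = ["Hello", "Cruel", "World!"]
let rest = a.dropFirst()        // ArraySlice whose indexes are 1 and 2
for (i, e) in zip(rest.indices, rest) {
    if e.hasPrefix("C") {
        // i is an index shared with the original array, not an offset.
        a[i] = e.uppercased()
    }
}
print(a)    // -> ["Hello", "CRUEL", "World!"]
```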

Personally I’d be in favour of removing enumerated() entirely, and thus making folks write zip(0..., a) or zip(a.indices, a), but I vaguely recall this being thrashed out on Swift Evolution and not getting any traction.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

Thanks Quinn for the detailed explanation. I did not think of this situation.

Unfortunately, enumerated is widely described in existing documentation, despite the pitfalls.


But don't you think that it is the slice implementation itself which is extremely dangerous?

It is intended for internal optimization (so far so good), but the outside effect is often surprising, unnatural and dangerous.


Consider:

var a = ["Hello", "Cruel", "World!"]
var b = a.dropFirst()
for (i, e) in b.enumerated() {
    print(i, e)
    if e.hasPrefix("C") {
        print("Modify", a[i], i, e)
        b[i] = e.uppercased()
    }
}

I get:

0 Cruel
Modify Hello 0 Cruel
Fatal error: Index out of bounds

and finally a crash.

So, why doesn't enumerated provide the "right" index in this case? It should return 1 Cruel and not 0 Cruel. It is totally misleading. Should we consider this a Swift design flaw to be corrected?


Now, if I "just" make it an array (which is what I'd naturally think dropFirst is doing):

var a = ["Hello", "Cruel", "World!"]
var b = Array(a.dropFirst())
for (i, e) in b.enumerated() {
    print(i, e)
    if e.hasPrefix("C") {
        print("Modify", a[i], i, e)
        b[i] = e.uppercased()
    }
}
print(a, b)

I get the "expected" result:

0 Cruel
Modify Hello 0 Cruel
1 World!
["Hello", "Cruel", "World!"] ["CRUEL", "World!"]

But don't you think that it is the slice implementation itself which is extremely dangerous?

I agree that it’s a potential pitfall, yes. However, it also has significant benefits. Specifically, it decreases the ‘weight’ of slice types, making them both easier to implement and faster. This is especially relevant for collections where the indexes are not simple integers.
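To illustrate the point about shared indexes, here is a small sketch, using only standard library calls, showing that a slice's indexes can be used directly on the collection it came from. The variable names are mine.

```swift
// String indexes are not integers, yet a Substring's indexes
// are valid in the original String without any translation.
let s = "Hello Cruel World!"
let tail = s.dropFirst(6)               // Substring "Cruel World!"
let c = s[tail.startIndex]
print(c)                                // -> C

// The same holds for an array and its slice.
let a = ["Hello", "Cruel", "World!"]
let b = a.dropFirst()
print(Array(b.indices))                 // -> [1, 2]
print(b.indices.allSatisfy { a[$0] == b[$0] })   // -> true
```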

So, why doesn't enumerated provide the "right" index in this case?

Because that’s not what it’s specified to do. The documentation for enumerated is really clear about what it actually does, the only problem being that this behaviour doesn’t align with your expectations )-: You’re not alone in that regard, which is why I waded into this thread in the first place!

Should we consider this a Swift design flaw to be corrected?

I can only speak for myself here, and I’ve already outlined my personal opinion about this in my last post. If you want to see a change then you need to drive that via Swift Evolution, and I’m only a spectator over there.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

Thanks Quinn, that was a very instructive discussion for me.


@mksmurali

Does all this answer your initial question? If so, please close the thread by marking the correct answer.

I posted on https://forums.swift.org/t/removing-enumerated/5050/42


That restarted a discussion that was stopped 2 years ago…