Swift replace exact string (Regex Replace?)

Hello!


For the past two days I've been struggling to find out how to replace an exact word in a string, using either some of the string functions in Xcode or regular expressions. Is this even possible? What I mean is, for example, a word-censoring filter that prohibits the word 'H e l l' and replaces it with '****'. Unfortunately, everything I have tried ends up also replacing words that contain 'H e l l', such as 'Hello'.


Please keep in mind that I am fairly new to Xcode, as I started my journey last week.


Help would be greatly appreciated.

Accepted Reply

I would use Regular Expressions as you mentioned.

(By the way, string functions are part of the programming language or its libraries, which are included in the SDKs bundled with Xcode. Some functions are only available in the OS X SDK, others only in the iOS SDK... Since they come with Xcode, you should specify the Xcode version, the target SDK and its version, and the language you use. Of course, if you post here, you are expected to be using Swift.)


NSRegularExpression Class Reference

Assume 'the' is a bad word, and replace its occurrences in a string like "These are the bad words.":

var str = "These are the bad words."
let regex = try! NSRegularExpression(pattern: "\\bthe\\b", options: .CaseInsensitive)
str = regex.stringByReplacingMatchesInString(str, options: [], range: NSRange(0..<str.utf16.count), withTemplate: "***")
print(str) //->These are *** bad words.
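The snippet above uses Swift 2 spellings. In Swift 3 and later the same Foundation API is spelled `.caseInsensitive` and `stringByReplacingMatches(in:options:range:withTemplate:)`. A minimal sketch of the same replacement in Swift 5, assuming nothing beyond Foundation:

```swift
import Foundation

// Swift 5 spelling of the word-boundary replacement above.
var str = "These are the bad words."
// \b is a word boundary, so the "The" at the start of "These" is not matched
// even though the search ignores case.
let regex = try! NSRegularExpression(pattern: "\\bthe\\b", options: .caseInsensitive)
// NSRange(_:in:) converts a Swift String range into the UTF-16 based NSRange Foundation expects.
let fullRange = NSRange(str.startIndex..., in: str)
str = regex.stringByReplacingMatches(in: str, options: [], range: fullRange, withTemplate: "***")
print(str) // These are *** bad words.
```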

Replies

Your replace function needs to be able to identify and transform words in the text. This is not trivial, because what counts as a word is not trivial, depending on your requirements.


As an example, let's say you want to replace the word "foo" with "bar" in the following string:

let str = "Yeti says 'foo', and bigfoot says 'bar'. What is foo? Foo is a word."


I guess you want the resulting string to be:

let res = "Yeti says 'bar', and bigfoot says 'bar'. What is bar? Bar is a word."


As you already mentioned, the solution must not turn "bigfoot" into "bigbart".


Getting the above result also requires a replace function that deals correctly with lowercase, uppercase, punctuation etc., so it's not enough to, e.g., just split the string on whitespace and lowercase it.


You might be interested in reading the NSHipster article about NSLinguisticTagger. NSLinguisticTagger can be used to tokenize the input string, so you get a range for each word, regardless of whether it has punctuation or whitespace around it.


Regarding tokenizing with, e.g., regular expressions, remember that it's naïve to assume that words are made up of a-zA-Z, even when considering only English words such as naïve or café. The same goes for punctuation: does your tokenizer think that façade is one word, or two words, fa and ade, separated by ç?


EDIT: OOPer's solution (the accepted reply), using regex word boundaries (\b), handles the cases mentioned above and is probably the best/easiest solution if you just want to blank out offending words.
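To see how \b copes with the foo/bar example in this reply, here is a minimal Swift 5 sketch. `replaceWord(_:with:in:)` is a hypothetical helper, and the rule "capitalize the replacement when the match starts with a capital" is an assumption made to reproduce the "Bar is a word." result; a plain template replacement cannot change case per match, so the sketch walks the matches itself.

```swift
import Foundation

// Hypothetical helper: replaces every whole-word, case-insensitive match of `word`
// with `replacement`, capitalizing the replacement when the match starts with a capital.
func replaceWord(_ word: String, with replacement: String, in text: String) -> String {
    // Escape the word so characters like "." are matched literally.
    let pattern = "\\b" + NSRegularExpression.escapedPattern(for: word) + "\\b"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: .caseInsensitive) else {
        return text
    }
    var result = ""
    var cursor = text.startIndex
    for match in regex.matches(in: text, options: [], range: NSRange(text.startIndex..., in: text)) {
        guard let range = Range(match.range, in: text) else { continue }
        // Copy the untouched text between the previous match and this one.
        result += text[cursor..<range.lowerBound]
        let startsUppercase = text[range].first?.isUppercase ?? false
        result += startsUppercase ? replacement.capitalized : replacement
        cursor = range.upperBound
    }
    result += text[cursor...]
    return result
}

let str = "Yeti says 'foo', and bigfoot says 'bar'. What is foo? Foo is a word."
let res = replaceWord("foo", with: "bar", in: str)
print(res) // Yeti says 'bar', and bigfoot says 'bar'. What is bar? Bar is a word.
```

"bigfoot" is left alone because both "g" and "f" are word characters, so there is no boundary before "foo".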


Finding words is hard; I strongly recommend that you use Foundation APIs to find the words rather than try to do it with a regular expression.

The easiest option here is `-enumerateSubstringsInRange:options:usingBlock:`. For example, this:
import Foundation

let s: NSString = "Foundation API's are 'cool'!"
s.enumerateSubstringsInRange(NSMakeRange(0, s.length), options: .ByWords) { (substring, substringRange, enclosingRange, _) -> () in
    print(substring)
}

prints this:

Optional("Foundation")
Optional("API\'s")
Optional("are")
Optional("cool")

Note how it handles the apostrophe in “API's” and the quotes in “'cool'” correctly.

If you want to then use a regular expression within each word, that’s cool.
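A minimal Swift 5 sketch of the word-enumeration approach applied to censoring, assuming an Apple platform's Foundation. `censorWords(in:banned:)` is a hypothetical helper; it compares each word Foundation finds against a lowercased set and masks matches with one asterisk per character, leaving the surrounding punctuation untouched.

```swift
import Foundation

// Hypothetical helper: masks every word that Foundation's word enumeration finds
// in `banned` (compared after lowercasing), keeping punctuation and spacing intact.
func censorWords(in text: String, banned: Set<String>) -> String {
    var hits: [Range<String.Index>] = []
    // .byWords lets Foundation decide where each word starts and ends.
    text.enumerateSubstrings(in: text.startIndex..., options: .byWords) { word, wordRange, _, _ in
        if let word = word, banned.contains(word.lowercased()) {
            hits.append(wordRange)
        }
    }
    // Rebuild the string front to back, swapping each hit for asterisks.
    var result = ""
    var cursor = text.startIndex
    for range in hits {
        result += text[cursor..<range.lowerBound]
        result += String(repeating: "*", count: text[range].count)
        cursor = range.upperBound
    }
    result += text[cursor...]
    return result
}

print(censorWords(in: "Foundation API's are 'cool'!", banned: ["cool", "api's"]))
// Foundation ***** are '****'!
```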

Share and Enjoy

Quinn "The Eskimo!"
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

Thank you very much. I really appreciate your help! Saved me a whole lot of time!

Thank you for your response! I will definitely look into this later!

Hello again! I came across something: how do I make the regex case-sensitive? I tried 'options: nil', 'options: .caseSensitive' and '.nil'. Thanks in advance.

Let's see the reference:

-initWithPattern:options:error: in the NSRegularExpression Class Reference

The `options` parameter is an NSRegularExpressionOptions, which is imported into Swift as an OptionSetType.

As you guessed, there is no CaseSensitive in NSRegularExpressionOptions: matching is case-sensitive unless you pass CaseInsensitive, so you need an empty option set.

In Swift 2, an OptionSetType uses Set-like notation, so an empty option set is written [], the same notation as an empty Set.

(Unfortunately, it's also the same as an empty Array, which is sometimes confusing.)


Try using `options: []`.
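In Swift 3 and later the type is spelled `NSRegularExpression.Options` and conforms to `OptionSet`, but the empty-set idea is unchanged. A small Swift 5 sketch contrasting the two settings on a made-up sentence:

```swift
import Foundation

let sentence = "The cat saw the dog."
let range = NSRange(sentence.startIndex..., in: sentence)

// Empty option set: NSRegularExpression matches case-sensitively by default.
let caseSensitive = try! NSRegularExpression(pattern: "\\bthe\\b", options: [])
// Adding .caseInsensitive is what makes "The" match as well.
let caseInsensitive = try! NSRegularExpression(pattern: "\\bthe\\b", options: .caseInsensitive)

let a = caseSensitive.stringByReplacingMatches(in: sentence, options: [], range: range, withTemplate: "***")
let b = caseInsensitive.stringByReplacingMatches(in: sentence, options: [], range: range, withTemplate: "***")
print(a) // The cat saw *** dog.
print(b) // *** cat saw *** dog.
```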

Thank you very much! Unfortunately, I have stumbled upon a new problem! I have an array of banned words like so:


var bannedWords : [String] = ["bannedWord1", "bannedWord2", "bannedWord3"]


How can I replace multiple words in one go, instead of creating numerous regex variables etc.?


Thanks in advance.


I want to be able to refer to the banned words like bannedWords[1] and replace them with var Chars : String = "banned".

You can find many sites on the web explaining how to use regular expressions.

This pattern:

var pattern = "\\b(?:bannedWord1|bannedWord2|bannedWord3)\\b"

is an `OR` match of three patterns, "\\bbannedWord1\\b", "\\bbannedWord2\\b" and "\\bbannedWord3\\b". The (?:...) groups the alternatives without capturing them.


You can generate such patterns programmatically:

var bannedWords : [String] = ["bannedWord1", "bannedWord2", "bannedWord3"]
let bannedPattern = bannedWords.joinWithSeparator("|")
pattern = "\\b(?:\(bannedPattern))\\b"

(Assuming none of your bannedWords contain regex special characters.)
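If a banned word might contain special characters, Foundation's `NSRegularExpression.escapedPattern(for:)` can escape each one before joining. A minimal Swift 5 sketch; "banned.word" is a made-up entry chosen to show that the "." is matched literally rather than as "any character":

```swift
import Foundation

let bannedWords = ["bannedWord1", "bannedWord2", "banned.word"]
// escapedPattern(for:) backslash-escapes regex metacharacters such as ".",
// so each banned word is matched literally instead of as a pattern.
let alternatives = bannedWords.map { NSRegularExpression.escapedPattern(for: $0) }
let pattern = "\\b(?:" + alternatives.joined(separator: "|") + ")\\b"

let regex = try! NSRegularExpression(pattern: pattern, options: [])
let input = "bannedWord1 bannedXword banned.word"
let output = regex.stringByReplacingMatches(in: input, options: [],
                                            range: NSRange(input.startIndex..., in: input),
                                            withTemplate: "banned")
print(output) // banned bannedXword banned
```

Without the escaping, "banned.word" would also match "bannedXword".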


But if you want to replace them with as many asterisks as there are characters in the original word, it gets a little harder to do with NSRegularExpression.

(Finding `characters` is as hard as finding `words`, but that's not what I mean here.)
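One way around the template limitation is to walk the matches yourself. A minimal Swift 5 sketch; `maskMatches(of:in:)` is a hypothetical helper, and "heck"/"blast" are placeholder banned words. It counts Swift `Character`s (grapheme clusters), which sidesteps most of the "what is a character" problem.

```swift
import Foundation

// Hypothetical helper: replaces each match with one "*" per Character of the matched text.
// Counting Swift Characters (grapheme clusters) rather than UTF-16 units means a word
// such as "café" gets four asterisks however its "é" is encoded.
func maskMatches(of regex: NSRegularExpression, in text: String) -> String {
    var result = ""
    var cursor = text.startIndex
    for match in regex.matches(in: text, options: [], range: NSRange(text.startIndex..., in: text)) {
        guard let range = Range(match.range, in: text) else { continue }
        // Keep the text between matches, then append the mask for this match.
        result += text[cursor..<range.lowerBound]
        result += String(repeating: "*", count: text[range].count)
        cursor = range.upperBound
    }
    result += text[cursor...]
    return result
}

let regex = try! NSRegularExpression(pattern: "\\b(?:heck|blast)\\b", options: .caseInsensitive)
print(maskMatches(of: regex, in: "Heck, blast it! heckler")) // ****, ***** it! heckler
```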

Thank you for giving me some more knowledge. However, I want to replace bannedWord1 with replacedWord1 and bannedWord2 with replacedWord2, all case-sensitive. I'm not sure how to proceed, though.


This is what I believe it will look like, in some way:


       let regex = try! NSRegularExpression(pattern: bannedWord[1] && bannedWord[2] && bannedWord[3], options: [])
      
        str = regex.stringByReplacingMatchesInString(str, options: [], range: NSRange(0..<str.utf16.count), withTemplate: replacedWord[1] && replacedWord[2] && replacedWord[3])

This is another case that's harder to handle with NSRegularExpression, but since the title of this thread contains "Regex Replace", let's continue.


First, you need to prepare a data structure containing the information needed for replacement. In this case, a Dictionary works well.

var banningDictionary: [String : String] = [
    "bannedWord1" : "replacedWord1",
    "bannedWord2" : "replacedWord2",
    "bannedWord3" : "replacedWord3",
]


Second, you need a pattern string compiled into a regex object.

var bannedWords = banningDictionary.keys
var bannedPattern = bannedWords.joinWithSeparator("|")
var pattern = "\\b(?:\(bannedPattern))\\b"

(The same assumption as above.)


Third, and this is the hardest part, you need to subclass NSRegularExpression to customize the replacement result.

class MyRegularExpression: NSRegularExpression {
    override func replacementStringForResult(result: NSTextCheckingResult, inString string: String, offset: Int, template templ: String) -> String {
        let matchingRange = result.range
        let matchingWord = (string as NSString).substringWithRange(matchingRange)
        if let replacedWord = banningDictionary[matchingWord] {
            return replacedWord
        } else {
            //This may never be executed.
            return "*"
        }
    }
}


Ready, let's do it.

var badSentence = "bannedWord1 bannedWord2 bannedWord3 bannedWord10"
let regex = try! MyRegularExpression(pattern: pattern, options: [])
print(regex.stringByReplacingMatchesInString(badSentence, options: [], range: NSRange(0..<badSentence.utf16.count), withTemplate: ""))
//->replacedWord1 replacedWord2 replacedWord3 bannedWord10
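For anyone wondering where these pieces go: a sketch of the same three steps ported to Swift 5 as one file, which should work pasted into a playground or a command-line tool's main.swift. Assumptions that differ from the Swift 2 code: the dictionary lives in a static property of the subclass instead of a global, the keys are escaped, and the match location is adjusted by `offset` as the `replacementString(for:in:offset:template:)` documentation describes.

```swift
import Foundation

// Subclass that looks each match up in a replacement table.
final class MyRegularExpression: NSRegularExpression {
    // Assumption: the table is a static property here (the Swift 2 version used a global).
    static let banningDictionary: [String: String] = [
        "bannedWord1": "replacedWord1",
        "bannedWord2": "replacedWord2",
        "bannedWord3": "replacedWord3",
    ]

    // Foundation calls this once per match when building the replaced string.
    override func replacementString(for result: NSTextCheckingResult, in string: String,
                                    offset: Int, template templ: String) -> String {
        // `offset` is added to the match location to find it in `string`.
        let range = NSRange(location: result.range.location + offset, length: result.range.length)
        let matchingWord = (string as NSString).substring(with: range)
        return MyRegularExpression.banningDictionary[matchingWord] ?? "*"
    }
}

// Build one alternation pattern from the keys, escaping each key.
let bannedPattern = MyRegularExpression.banningDictionary.keys
    .map { NSRegularExpression.escapedPattern(for: $0) }
    .joined(separator: "|")
let pattern = "\\b(?:\(bannedPattern))\\b"

let badSentence = "bannedWord1 bannedWord2 bannedWord3 bannedWord10"
let regex = try! MyRegularExpression(pattern: pattern, options: [])
let cleaned = regex.stringByReplacingMatches(in: badSentence, options: [],
                                            range: NSRange(badSentence.startIndex..., in: badSentence),
                                            withTemplate: "")
print(cleaned) // replacedWord1 replacedWord2 replacedWord3 bannedWord10
```

Matching stays case-sensitive because `options` is `[]`, and the dictionary lookup is case-sensitive too, so the two agree.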

Thank you for a well-detailed answer. I guess I will be working with this snippet of code for a long while, trying to set it all up properly, as I am fairly new to Swift. However, if there is an easier way of doing this, please let me know!


Again, many thanks.

I am not sure you are still on the forum, but this looks like what I need to run the regex I am looking for. Do you have an example that shows where these code snippets are placed? I'm looking for a working example.