I wrote a log analyzer that currently shells out to standard command-line tools, and that works fine. I now want to replace the shell pipeline with native Swift code, but the result is really slow.
Here is an example with about 100 MB of real log data. The shell pipeline needs about a second, but the Swift code takes more than 80 seconds.
Is there a way to improve the Swift code?
In the shell I use:
% time (cd /regTest/logs;cat access.log access.log.1 | grep '26\/Apr' | egrep -v '(www.apple.com/go/applebot | www.bing.com/bingbot.htm | www.googlebot.com/bot.html | Xing Bot)' | awk '/GET/ {print $1}' | sort -n | uniq 1>/dev/null)
1,09s user 0,05s system 105% cpu 1,081 total
Here is the result of my Swift test:
% time ./regTest
Found 1813 lines.
82,54s user 0,26s system 99% cpu 1:22,83 total
My Swift sample code:
import Foundation
import RegexBuilder

guard let fullText = try? String(contentsOf: URL(filePath: "/regTest/logs/access.log")) + String(contentsOf: URL(filePath: "/regTest/logs/access.log.1")) else {
    print("Cannot read files!")
    exit(1)
}

let yesterdayRegEx = Regex {
    Capture {
        "26/Apr"
    }
}

let botListRegEx = Regex {
    Capture {
        ChoiceOf {
            "www.apple.com/go/applebot"
            "www.bing.com/bingbot.htm"
            "www.googlebot.com/bot.html"
            "Xing Bot"
        }
    }
}

let dateMatch = fullText.split(separator: "\n")
    .filter{ $0.firstMatch(of: yesterdayRegEx) != nil }
    .filter{ $0.firstMatch(of: botListRegEx) == nil }

print("Found \(dateMatch.count) lines.")
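Since both patterns are plain literals, one variant that may be worth timing replaces the Regex matches with substring checks. This is a minimal sketch, not a measured result: `sampleLog`, `botMarkers`, and `filterLines` are hypothetical names, and the sample lines are made up rather than taken from the real logs.

```swift
import Foundation

// Made-up sample lines standing in for the real access logs (assumption).
let sampleLog = """
203.0.113.7 - - [26/Apr/2023:10:00:00 +0200] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"
203.0.113.8 - - [26/Apr/2023:10:00:01 +0200] "GET /robots.txt HTTP/1.1" 200 64 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
203.0.113.9 - - [25/Apr/2023:23:59:59 +0200] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"
"""

// The same bot markers as in botListRegEx, kept as plain strings.
let botMarkers = [
    "www.apple.com/go/applebot",
    "www.bing.com/bingbot.htm",
    "www.googlebot.com/bot.html",
    "Xing Bot",
]

// Hypothetical helper: keeps lines that contain `day` and none of the `markers`.
func filterLines(_ text: String, day: String, excluding markers: [String]) -> [Substring] {
    text.split(separator: "\n").filter { line in
        line.contains(day) && !markers.contains(where: { line.contains($0) })
    }
}

let kept = filterLines(sampleLog, day: "26/Apr", excluding: botMarkers)
print("Found \(kept.count) lines.")  // prints "Found 1 lines."
```

Whether this is actually faster on the 100 MB input would have to be measured with `time`, as above.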
In another discussion at https://forums.swift.org/t/use-regexbuilder-als-grep-replacement/65782 I was able to get much faster code. The following version now runs in 3 seconds.
let paths = ["/regTest/logs/access.log", "/regTest/logs/access.log.1"]
let startDate = Date()
let matchedLines = try await withThrowingTaskGroup(of: [String].self, body: { group in
    for aPath in paths {
        group.addTask {
            let lines = FileHandle(forReadingAtPath: aPath)!.bytes.lines
                .filter { $0.contains("26/Apr") }
                .filter { $0.contains("\"GET ") }
                .filter{ !$0.contains(botListRegEx) }
            var matched : [String] = []
            for try await line in lines {
                if let m = line.firstMatch(of: ipAtStartRegEx) {
                    let (s, _) = m.output
                    matched.append(String(s))
                }
            }
            return matched
        }
    }
    var matchedLines : [String] = []
    for try await partialMatchedLines in group {
        matchedLines.append(contentsOf: partialMatchedLines)
    }
    return matchedLines
})
let uniqArray = Set(matchedLines)
print("Match time: \(abs(startDate.timeIntervalSinceNow)) s")
print("Found \(matchedLines.count) lines.")
print("Found \(uniqArray.count) IP addresses.")
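The code above uses `ipAtStartRegEx` without showing its definition. A possible definition, assuming the log lines start with an IPv4 address, could look like the sketch below. The pattern and the helper `leadingIP(in:)` are assumptions for illustration, not the original code.

```swift
import RegexBuilder

// Assumed definition: four dot-separated digit groups at the very start of the line.
// It does not validate octet ranges and does not handle IPv6.
let ipAtStartRegEx = Regex {
    Anchor.startOfSubject
    Capture {
        OneOrMore(.digit)
        "."
        OneOrMore(.digit)
        "."
        OneOrMore(.digit)
        "."
        OneOrMore(.digit)
    }
}

// Hypothetical helper: returns the leading address, or nil if the line does not start with one.
func leadingIP(in line: String) -> String? {
    guard let match = line.firstMatch(of: ipAtStartRegEx) else { return nil }
    let (_, ip) = match.output  // whole match first, then the single capture
    return String(ip)
}

// Made-up sample line in common access-log format.
let sampleLine = "203.0.113.7 - - [26/Apr/2023:10:00:00 +0200] \"GET / HTTP/1.1\" 200 512"
print(leadingIP(in: sampleLine) ?? "no match")  // prints "203.0.113.7"
```

With one `Capture`, the match output is a two-element tuple, which fits the `let (s, _) = m.output` destructuring used in the task group.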