How do I extract HTML table data from a string with regex builder

I am attempting to learn regex builder. I scape data from a web site, pull out just the table rows <tr> and table data <td> and place them into a string. I have attempted to extract table data with regex builder with no success. For testing I placed 3 scrapped table rows into a multi line string and apply my regex pattern in a for loop. It does not appear to find any matches. I am clearly overlooking something. Below is my code:

func GetHTMLTableData() {
    
    let stringData = """
<tr class=BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)><td class=Py(10px) Ta(start) Pend(10px)><span>Jun 30, 2023</span></td><td class=Py(10px) Pstart(10px)><span>405.40</span></td><td class=Py(10px) Pstart(10px)><span>408.22</span></td><td class=Py(10px) Pstart(10px)><span>405.29</span></td><td class=Py(10px) Pstart(10px)><span>407.28</span></td><td class=Py(10px) Pstart(10px)><span>407.28</span></td><td class=Py(10px) Pstart(10px)><span>5,160,100</span></td></tr>
<tr class=BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)><td class=Py(10px) Ta(start) Pend(10px)><span>Jun 29, 2023</span></td><td class=Py(10px) Pstart(10px)><span>400.60</span></td><td class=Py(10px) Pstart(10px)><span>402.67</span></td><td class=Py(10px) Pstart(10px)><span>400.19</span></td><td class=Py(10px) Pstart(10px)><span>402.51</span></td><td class=Py(10px) Pstart(10px)><span>402.51</span></td><td class=Py(10px) Pstart(10px)><span>3,914,800</span></td></tr>
<tr class=BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)><td class=Py(10px) Ta(start) Pend(10px)><span>Jun 28, 2023</span></td><td class=Py(10px) Pstart(10px)><span>401.35</span></td><td class=Py(10px) Pstart(10px)><span>403.49</span></td><td class=Py(10px) Pstart(10px)><span>400.71</span></td><td class=Py(10px) Pstart(10px)><span>402.55</span></td><td class=Py(10px) Pstart(10px)><span>400.97</span></td><td class=Py(10px) Pstart(10px)><span>4,320,700</span></td></tr>
"""

    let tradingDayPattern = Regex {
        "<tr class=BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)>"
        "<td class=Py(10px) Ta(start) Pend(10px)><span>"
        Capture(.date(format: "\(month: .abbreviated) \(day: .twoDigits), \(year: .extended(minimumLength: 4))", locale: Locale(identifier: "en_US_POSIX") , timeZone: .gmt))
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        TryCapture {
            OneOrMore(.digit)
            "."
            Repeat(.digit, count: 2)
          } transform: {
            Double($0)
          }
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        TryCapture {
            OneOrMore(.digit)
            "."
            Repeat(.digit, count: 2)
          } transform: {
            Double($0)
          }
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        TryCapture {
            OneOrMore(.digit)
            "."
            Repeat(.digit, count: 2)
          } transform: {
            Double($0)
          }
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        TryCapture {
            OneOrMore(.digit)
            "."
            Repeat(.digit, count: 2)
          } transform: {
            Double($0)
          }
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        TryCapture {
            OneOrMore(.digit)
            "."
            Repeat(.digit, count: 2)
          } transform: {
            Double($0)
          }
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        TryCapture {
            OneOrMore(.digit)
            ","
            Repeat(.digit, count: 3)
            ","
            Repeat(.digit, count: 3)
        } transform: {
            Int($0)
        }
        "</span></td></tr>"
    }
    for match in stringData.matches(of: tradingDayPattern) {
        let (line, date, open, high, low, close, adjClose, volume ) = match.output
        print("\(date) - \(close)")
    }
}
Answered by ChrisMH in 758012022

After some additional research and testing I discovered that the error in my RegEx Builder was related to attempting to capture the last number in the pattern (x,***,***). So I broke it out into 3 captures and then perform some math in the for loop. There is probably a better way but I have not yet figured it out. In addition, thanks to esilks input, I removed the line feeds from the multi line variable and converted it to a regular string. Clearly a better way. Below is the updated function.

func GetHTMLTableData3()  {
    
    let stringData = "<tr class=BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)><td class=Py(10px) Ta(start) Pend(10px)><span>Jun 30, 2023</span></td><td class=Py(10px) Pstart(10px)><span>405.40</span></td><td class=Py(10px) Pstart(10px)><span>408.22</span></td><td class=Py(10px) Pstart(10px)><span>405.29</span></td><td class=Py(10px) Pstart(10px)><span>407.28</span></td><td class=Py(10px) Pstart(10px)><span>407.28</span></td><td class=Py(10px) Pstart(10px)><span>5,160,100</span></td></tr><tr class=BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)><td class=Py(10px) Ta(start) Pend(10px)><span>Jun 29, 2023</span></td><td class=Py(10px) Pstart(10px)><span>400.60</span></td><td class=Py(10px) Pstart(10px)><span>402.67</span></td><td class=Py(10px) Pstart(10px)><span>400.19</span></td><td class=Py(10px) Pstart(10px)><span>402.51</span></td><td class=Py(10px) Pstart(10px)><span>402.51</span></td><td class=Py(10px) Pstart(10px)><span>3,914,800</span></td></tr><tr class=BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)><td class=Py(10px) Ta(start) Pend(10px)><span>Jun 28, 2023</span></td><td class=Py(10px) Pstart(10px)><span>401.35</span></td><td class=Py(10px) Pstart(10px)><span>403.49</span></td><td class=Py(10px) Pstart(10px)><span>400.71</span></td><td class=Py(10px) Pstart(10px)><span>402.55</span></td><td class=Py(10px) Pstart(10px)><span>400.97</span></td><td class=Py(10px) Pstart(10px)><span>4,320,700</span></td></tr>"
    
    let tradingDayPattern = Regex {
        
        "<td class=Py(10px) Ta(start) Pend(10px)><span>"
        Capture(.date(format: "\(month: .abbreviated) \(day: .twoDigits), \(year: .extended(minimumLength: 4))", locale: Locale(identifier: "en_US_POSIX") , timeZone: .gmt))
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        
        TryCapture {
            OneOrMore(.digit)
            "."
            Repeat(.digit, count: 2)
        } transform: {
            Double($0)
        }
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        TryCapture {
            OneOrMore(.digit)
            "."
            Repeat(.digit, count: 2)
        } transform: {
            Double($0)
        }
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        TryCapture {
            OneOrMore(.digit)
            "."
            Repeat(.digit, count: 2)
        } transform: {
            Double($0)
        }
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        TryCapture {
            OneOrMore(.digit)
            "."
            Repeat(.digit, count: 2)
        } transform: {
            Double($0)
        }
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        TryCapture {
            OneOrMore(.digit)
            "."
            Repeat(.digit, count: 2)
        } transform: {
            Double($0)
        }
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        Capture {
            Repeat(.digit, count: 1)
        }transform: {
            Int($0)
        }
        ","
        Capture {
            Repeat(.digit, count: 3)
        }transform: {
            Int($0)
        }
        ","
        Capture {
            Repeat(.digit, count: 3)
        } transform: {
            Int($0)
        }
        "</span>"
    }
    
    for match in stringData.matches(of: tradingDayPattern) {
        let (line, date, open, high, low, close, adjClose, v1, v2, v3) = match.output
        
        let vol1 = v1! * 1000000
        let vol2 = v2! * 1000
        let volume = vol1 + vol2 + v3!
        
        print("\(date) - \(close) - \(volume)")
    }
}

?????? ??? ? wh ???

i dont know

lmao

import Foundation

func getHTMLTableData() {
    
    let stringData = """
<tr class=BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)><td class=Py(10px) Ta(start) Pend(10px)><span>Jun 30, 2023</span></td><td class=Py(10px) Pstart(10px)><span>405.40</span></td><td class=Py(10px) Pstart(10px)><span>408.22</span></td><td class=Py(10px) Pstart(10px)><span>405.29</span></td><td class=Py(10px) Pstart(10px)><span>407.28</span></td><td class=Py(10px) Pstart(10px)><span>407.28</span></td><td class=Py(10px) Pstart(10px)><span>5,160,100</span></td></tr>
<tr class=BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)><td class=Py(10px) Ta(start) Pend(10px)><span>Jun 29, 2023</span></td><td class=Py(10px) Pstart(10px)><span>400.60</span></td><td class=Py(10px) Pstart(10px)><span>402.67</span></td><td class=Py(10px) Pstart(10px)><span>400.19</span></td><td class=Py(10px) Pstart(10px)><span>402.51</span></td><td class=Py(10px) Pstart(10px)><span>402.51</span></td><td class=Py(10px) Pstart(10px)><span>3,914,800</span></td></tr>
<tr class=BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)><td class=Py(10px) Ta(start) Pend(10px)><span>Jun 28, 2023</span></td><td class=Py(10px) Pstart(10px)><span>401.35</span></td><td class=Py(10px) Pstart(10px)><span>403.49</span></td><td class=Py(10px) Pstart(10px)><span>400.71</span></td><td class=Py(10px) Pstart(10px)><span>402.55</span></td><td class=Py(10px) Pstart(10px)><span>400.97</span></td><td class=Py(10px) Pstart(10px)><span>4,320,700</span></td></tr>
"""

    let pattern = "<td class=Py\\(10px\\) Ta\\(start\\) Pend\\(10px\\)><span>(.+?)<\\/span><\\/td><td class=Py\\(10px\\) Pstart\\(10px\\)><span>(.+?)<\\/span><\\/td><td class=Py\\(10px\\) Pstart\\(10px\\)><span>(.+?)<\\/span><\\/td><td class=Py\\(10px\\) Pstart\\(10px\\)><span>(.+?)<\\/span><\\/td><td class=Py\\(10px\\) Pstart\\(10px\\)><span>(.+?)<\\/span><\\/td><td class=Py\\(10px\\) Pstart\\(10px\\)><span>(.+?)<\\/span><\\/td><td class=Py\\(10px\\) Pstart\\(10px\\)><span>(.+?)<\\/span><\\/td>"

    let regex = try! NSRegularExpression(pattern: pattern, options: [])

    let matches = regex.matches(in: stringData, options: [], range: NSRange(location: 0, length: stringData.utf16.count))

    for match in matches {
        let date = (stringData as NSString).substring(with: match.range(at: 1))
        let close = (stringData as NSString).substring(with: match.range(at: 5))
        print("\(date) - \(close)")
    }
}

getHTMLTableData()
Accepted Answer

After some additional research and testing I discovered that the error in my RegEx Builder was related to attempting to capture the last number in the pattern (x,***,***). So I broke it out into 3 captures and then perform some math in the for loop. There is probably a better way but I have not yet figured it out. In addition, thanks to esilks input, I removed the line feeds from the multi line variable and converted it to a regular string. Clearly a better way. Below is the updated function.

func GetHTMLTableData3()  {
    
    let stringData = "<tr class=BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)><td class=Py(10px) Ta(start) Pend(10px)><span>Jun 30, 2023</span></td><td class=Py(10px) Pstart(10px)><span>405.40</span></td><td class=Py(10px) Pstart(10px)><span>408.22</span></td><td class=Py(10px) Pstart(10px)><span>405.29</span></td><td class=Py(10px) Pstart(10px)><span>407.28</span></td><td class=Py(10px) Pstart(10px)><span>407.28</span></td><td class=Py(10px) Pstart(10px)><span>5,160,100</span></td></tr><tr class=BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)><td class=Py(10px) Ta(start) Pend(10px)><span>Jun 29, 2023</span></td><td class=Py(10px) Pstart(10px)><span>400.60</span></td><td class=Py(10px) Pstart(10px)><span>402.67</span></td><td class=Py(10px) Pstart(10px)><span>400.19</span></td><td class=Py(10px) Pstart(10px)><span>402.51</span></td><td class=Py(10px) Pstart(10px)><span>402.51</span></td><td class=Py(10px) Pstart(10px)><span>3,914,800</span></td></tr><tr class=BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)><td class=Py(10px) Ta(start) Pend(10px)><span>Jun 28, 2023</span></td><td class=Py(10px) Pstart(10px)><span>401.35</span></td><td class=Py(10px) Pstart(10px)><span>403.49</span></td><td class=Py(10px) Pstart(10px)><span>400.71</span></td><td class=Py(10px) Pstart(10px)><span>402.55</span></td><td class=Py(10px) Pstart(10px)><span>400.97</span></td><td class=Py(10px) Pstart(10px)><span>4,320,700</span></td></tr>"
    
    let tradingDayPattern = Regex {
        
        "<td class=Py(10px) Ta(start) Pend(10px)><span>"
        Capture(.date(format: "\(month: .abbreviated) \(day: .twoDigits), \(year: .extended(minimumLength: 4))", locale: Locale(identifier: "en_US_POSIX") , timeZone: .gmt))
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        
        TryCapture {
            OneOrMore(.digit)
            "."
            Repeat(.digit, count: 2)
        } transform: {
            Double($0)
        }
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        TryCapture {
            OneOrMore(.digit)
            "."
            Repeat(.digit, count: 2)
        } transform: {
            Double($0)
        }
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        TryCapture {
            OneOrMore(.digit)
            "."
            Repeat(.digit, count: 2)
        } transform: {
            Double($0)
        }
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        TryCapture {
            OneOrMore(.digit)
            "."
            Repeat(.digit, count: 2)
        } transform: {
            Double($0)
        }
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        TryCapture {
            OneOrMore(.digit)
            "."
            Repeat(.digit, count: 2)
        } transform: {
            Double($0)
        }
        "</span></td><td class=Py(10px) Pstart(10px)><span>"
        Capture {
            Repeat(.digit, count: 1)
        }transform: {
            Int($0)
        }
        ","
        Capture {
            Repeat(.digit, count: 3)
        }transform: {
            Int($0)
        }
        ","
        Capture {
            Repeat(.digit, count: 3)
        } transform: {
            Int($0)
        }
        "</span>"
    }
    
    for match in stringData.matches(of: tradingDayPattern) {
        let (line, date, open, high, low, close, adjClose, v1, v2, v3) = match.output
        
        let vol1 = v1! * 1000000
        let vol2 = v2! * 1000
        let volume = vol1 + vol2 + v3!
        
        print("\(date) - \(close) - \(volume)")
    }
}
How do I extract HTML table data from a string with regex builder
 
 
Q