Hello All,
I'm developing a machine learning model for image classification, which requires managing an exceptionally large dataset comprising over 18,000 classes. I've encountered several hurdles while using Create ML, and I would appreciate any insights or advice from those who have faced similar challenges.
Current Issues:
Create ML Failures with Large Datasets:
When using Create ML, the process often fails with errors such as "Failed to create CVPixelBufferPool." This issue appears when handling particularly large volumes of data.
Custom Implementation Struggles:
To bypass some of Create ML's limitations, I've built a custom solution around MLImageClassifier from the CreateML framework in my own SwiftUI macOS app.
Initially I hit the same errors as in Create ML, but I found I could get past the "extracting features" stage without crashing by using a workaround: a timer that cancels and restarts the job every 30 seconds. This is the only way I've managed to finish the extraction phase on the large dataset, but it floods the console with errors if left running too long. A sketch of the workaround follows.
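Roughly, the workaround looks like this. It's a minimal sketch assuming the `job`, `trainingSession`, and `go()` members from the full code at the end of this post; `startRestartWorkaround`/`stopRestartWorkaround` are just illustrative names. The restart works because `go()` ends up calling `MLImageClassifier.resume(_:)`, which picks the session back up from its checkpoint directory.

private var restartTimer: Timer?

func startRestartWorkaround() {
    // Every 30 seconds: cancel the in-flight job, then resume the session.
    restartTimer = Timer.scheduledTimer(withTimeInterval: 30, repeats: true) { [weak self] _ in
        guard let self else { return }
        self.job.cancel()   // stop the current MLJob
        self.go()           // resume from the checkpoints in the session directory
    }
}

func stopRestartWorkaround() {
    restartTimer?.invalidate()
    restartTimer = nil
}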
Lack of Progress Reporting:
Using MLJob<MLImageClassifier>, I've noticed that progress reporting stalls after the feature extraction phase. Although system resources indicate activity, there is no programmatic feedback on what is occurring.
Things I've Tried:
Data Validation: Ensured that all images in the dataset are valid and non-corrupted, which prevents avoidable failures from faulty data (the check I use is the first sketch after this list).
Custom Implementation with CreateML Framework: Developed a custom solution using the MLImageClassifier within the CreateML framework to gain more control over the training process.
Timer-Based Workaround: Employed a workaround using a timer to cancel and restart the job every 30 seconds to move past the "extracting features" phase, allowing progress even with larger datasets.
Monitoring System Resources: Observed ongoing system resource usage when process feedback stalled, confirming background processing activity despite the lack of progress reporting.
Subset Testing: Successfully created and tested a model on a subset of the data, which validated the approach worked for smaller datasets and could produce a functioning model.
Router Model Concept: Considered training multiple models on different subsets of the data, with a "router" model that decides which specialized model to use based on input characteristics (the second sketch after this list).
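For the data validation step, the check I run is roughly the following; a minimal sketch using ImageIO, where `findCorruptImages` is just my own helper name. Note that fully decoding every file in a dataset this size takes a while.

import Foundation
import ImageIO

func findCorruptImages(under root: URL) -> [URL] {
    var corrupt: [URL] = []
    guard let enumerator = FileManager.default.enumerator(
        at: root,
        includingPropertiesForKeys: [.isRegularFileKey]
    ) else { return corrupt }

    for case let url as URL in enumerator {
        guard (try? url.resourceValues(forKeys: [.isRegularFileKey]))?.isRegularFile == true else {
            continue
        }
        // CGImageSource fails to produce an image for truncated or corrupt files.
        if let source = CGImageSourceCreateWithURL(url as CFURL, nil),
           CGImageSourceCreateImageAtIndex(source, 0, nil) != nil {
            continue
        }
        corrupt.append(url)
    }
    return corrupt
}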
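And for the router concept, inference-time dispatch would look something like this sketch using Vision. The router and the per-bucket specialist models here are hypothetical; nothing is trained yet.

import CoreML
import Vision

func classify(_ image: CGImage,
              router: VNCoreMLModel,
              specialists: [String: VNCoreMLModel],
              completion: @escaping (String?) -> Void) {
    // First pass: the coarse router picks a bucket.
    let routeRequest = VNCoreMLRequest(model: router) { request, _ in
        guard let bucket = (request.results as? [VNClassificationObservation])?.first?.identifier,
              let specialist = specialists[bucket] else {
            completion(nil)
            return
        }
        // Second pass: the specialized model the router chose does the real classification.
        let finalRequest = VNCoreMLRequest(model: specialist) { request, _ in
            let label = (request.results as? [VNClassificationObservation])?.first?.identifier
            completion(label)
        }
        try? VNImageRequestHandler(cgImage: image).perform([finalRequest])
    }
    try? VNImageRequestHandler(cgImage: image).perform([routeRequest])
}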
What I Need Help With:
Handling Large Datasets:
I'm seeking strategies or best practices for effectively utilizing Create ML with large datasets.
Any guidance on memory management or alternative methodologies would be immensely helpful.
Improving Progress Reporting:
I'm looking for ways to get more consistent, programmatic progress updates during the training and testing phases; one avenue I'm exploring is below.
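Since `MLJob.progress` is a plain Foundation `Progress`, one idea is observing its `fractionCompleted` via KVO instead of relying on checkpoints. I don't know yet whether the underlying `Progress` keeps updating after feature extraction, but the wiring would be:

// Observe the job's Progress directly via KVO, independent of the checkpoint stream.
job.progress
    .publisher(for: \.fractionCompleted)
    .receive(on: RunLoop.main)
    .sink { fraction in
        self.progress = fraction
    }
    .store(in: &cancellables)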
I'm working on an M1 Pro Mac with 32 GB of RAM, fully within the Apple ecosystem. I'm very grateful for any advice or experiences you can share to help overcome these challenges.
Thank you!
I've pasted the relevant code below:
func go() {
    // Create or restore the session, and record when training started.
    if self.trainingSession == nil {
        self.trainingSession = createTrainingSession()
    }
    if self.startTime == nil {
        self.startTime = Date()
    }

    job = try! MLImageClassifier.resume(self.trainingSession)

    // Publish the current training phase to the UI.
    job.phase
        .receive(on: RunLoop.main)
        .sink { phase in
            self.phase = phase
        }
        .store(in: &cancellables)

    // Update state, progress, and time estimates on every checkpoint.
    job.checkpoints
        .receive(on: RunLoop.main)
        .sink { checkpoint in
            self.state = "\(checkpoint)\n\(self.job.progress)"
            self.progress = self.job.progress.fractionCompleted + 0.2
            self.updateTimeEstimates()
        }
        .store(in: &cancellables)

    // Save the trained model on success; surface the error on failure.
    job.result
        .receive(on: DispatchQueue.main)
        .sink(receiveCompletion: { completion in
            switch completion {
            case .failure(let error):
                print("Training Failed: \(error.localizedDescription)")
            case .finished:
                print("TRAINING SESSION FINISHED!!!!")
                self.trainingFinished = true
            }
        }, receiveValue: { classifier in
            Task {
                await self.saveModel(classifier)
            }
        })
        .store(in: &cancellables)
}
private func createTrainingSession() -> MLTrainingSession<MLImageClassifier> {
    do {
        print("Initializing training data...")
        let trainingData: MLImageClassifier.DataSource = .labeledDirectories(at: trainingDataURL)

        // Transfer learning with the scenePrint feature extractor and a
        // logistic-regression classifier head.
        let modelParameters = MLImageClassifier.ModelParameters(
            validation: .split(strategy: .automatic),
            augmentation: self.augmentations,
            algorithm: .transferLearning(
                featureExtractor: .scenePrint(revision: 2),
                classifier: .logisticRegressor
            )
        )

        // Checkpoint to disk so the session can be cancelled and resumed.
        let sessionParameters = MLTrainingSessionParameters(
            sessionDirectory: self.sessionDirectoryURL,
            reportInterval: 1,
            checkpointInterval: 100,
            iterations: self.numberOfIterations
        )

        print("Initializing training session...")
        let trainingSession: MLTrainingSession<MLImageClassifier>
        if FileManager.default.fileExists(atPath: self.sessionDirectoryURL.path)
            && isSessionCreated(atPath: self.sessionDirectoryURL.path) {
            do {
                // Pick up where the previous (cancelled) session left off.
                trainingSession = try MLImageClassifier.restoreTrainingSession(sessionParameters: sessionParameters)
            } catch {
                print("Error restoring session, exiting... \(error.localizedDescription)")
                fatalError()
            }
        } else {
            trainingSession = try MLImageClassifier.makeTrainingSession(
                trainingData: trainingData,
                parameters: modelParameters,
                sessionParameters: sessionParameters
            )
        }
        return trainingSession
    } catch {
        print("Failed to initialize training session: \(error.localizedDescription)")
        fatalError()
    }
}
Since iOS 13, the colors assigned to UIControls don't seem to be the same colors that ultimately get displayed. Here is an example that can be copied and pasted into an Xcode playground:

import UIKit
import PlaygroundSupport
// https://www.hackingwithswift.com/example-code/uicolor/how-to-convert-a-hex-color-to-a-uicolor
extension UIColor {
    public convenience init?(hex hexUser: String) {
        let r, g, b, a: CGFloat
        let hex = "\(hexUser)FF" // append alpha component for ease of example
        if hex.hasPrefix("#") {
            let start = hex.index(hex.startIndex, offsetBy: 1)
            let hexColor = String(hex[start...])
            if hexColor.count == 8 {
                let scanner = Scanner(string: hexColor)
                var hexNumber: UInt64 = 0
                if scanner.scanHexInt64(&hexNumber) {
                    r = CGFloat((hexNumber & 0xff000000) >> 24) / 255
                    g = CGFloat((hexNumber & 0x00ff0000) >> 16) / 255
                    b = CGFloat((hexNumber & 0x0000ff00) >> 8) / 255
                    a = CGFloat(hexNumber & 0x000000ff) / 255
                    self.init(red: r, green: g, blue: b, alpha: a)
                    return
                }
            }
        }
        return nil
    }
}
let backgroundColorHex = "#303b46" // shows up as #38424c in iOS >= 13
let tintColorHex = "#ebdb34" // shows up as #eadb33 in iOS >= 13
let backgroundColor = UIColor(hex: backgroundColorHex)!
let tintColor = UIColor(hex: tintColorHex)!
class MyViewController: UIViewController {
    override func loadView() {
        let view = UIView()
        view.backgroundColor = .white

        let segmentedControl = UISegmentedControl(items: ["One", "Two"])
        segmentedControl.selectedSegmentTintColor = tintColor
        segmentedControl.backgroundColor = backgroundColor
        segmentedControl.setTitleTextAttributes([NSAttributedString.Key.foregroundColor: UIColor.white], for: .normal)
        segmentedControl.selectedSegmentIndex = 0

        view.addSubview(segmentedControl)
        segmentedControl.translatesAutoresizingMaskIntoConstraints = false
        segmentedControl.centerYAnchor.constraint(equalTo: view.centerYAnchor).isActive = true
        segmentedControl.centerXAnchor.constraint(equalTo: view.centerXAnchor).isActive = true
        segmentedControl.widthAnchor.constraint(equalToConstant: 200).isActive = true

        self.view = view
    }
}
PlaygroundPage.current.liveView = MyViewController()

For this UISegmentedControl I am setting a background color of "#303b46", but upon inspection I see iOS renders "#38424c". For the selectedSegmentTintColor I am setting "#ebdb34", but iOS renders "#eadb33". Does anyone know what mechanisms are in play here that cause my assigned color not to be displayed as assigned, and the best way to deal with these inconsistencies when working with design? Thank you.
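For reference, this is how I'm inspecting the rendered color: a small diagnostic helper of my own (not an API) that snapshots the view and samples one pixel, so the assigned and displayed values can be compared directly.

// Snapshot a view into a 1x1 bitmap and sample the pixel under `point`.
// (Components come back premultiplied by alpha, which is fine for opaque backgrounds.)
func renderedColor(of view: UIView, at point: CGPoint) -> UIColor? {
    var pixel = [UInt8](repeating: 0, count: 4)
    let rendered = pixel.withUnsafeMutableBytes { buffer -> Bool in
        guard let context = CGContext(data: buffer.baseAddress,
                                      width: 1, height: 1,
                                      bitsPerComponent: 8, bytesPerRow: 4,
                                      space: CGColorSpaceCreateDeviceRGB(),
                                      bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue) else {
            return false
        }
        context.translateBy(x: -point.x, y: -point.y)
        view.layer.render(in: context)
        return true
    }
    guard rendered else { return nil }
    return UIColor(red: CGFloat(pixel[0]) / 255,
                   green: CGFloat(pixel[1]) / 255,
                   blue: CGFloat(pixel[2]) / 255,
                   alpha: CGFloat(pixel[3]) / 255)
}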