Detecting malware through Machine Learning

Hello, I wanted to hear some opinions on a problem I want to tackle. At my job we have an Endpoint Security system extension app (written in Swift) deployed on 10k+ Macs, and we use a custom rule engine we developed to run rules against the events the app receives. The app downloads these rules.

This works great, but we want to dive into the world of ML and try to use it to detect more complex malware that is difficult to catch with rules.

We thought of two options to approach this:

  1. Periodically collect events from all Macs, send them to an API to be stored somewhere, and perform the training in the cloud.
  2. Somehow, maybe using the ML frameworks available in Swift, train the model on the device rather than in the cloud.

I know this is a very broad question, but I just wanted to hear some suggestions.

Thanks in advance.

That supposes you can tell the training process "this is malware" so that it can associate the label with the symptoms. Are you able to do this?

This is a hard problem in general, but I recommend partnering with a university that has a strong information security and computer science program. CrowdStrike has a very good system for detecting these "abnormal behavior" events at scale.

You will need to ingest a large amount of data into a central log server that collects all of the macOS logs.

With a machine learning approach, you will need a large list of anomalies to learn from.

You can also try something like an "artificial ignorance" approach, where you alert the first time a user runs a new program. But it will be hard to tell a "bad software install" apart from regular installs.
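
As a rough sketch of that first-seen idea (Python here just for brevity, running wherever the events are collected): the class name `FirstSeenDetector` and the event fields `user` and `executable_path` are made up for illustration and are not taken from any real API.

```python
from collections import defaultdict


class FirstSeenDetector:
    """Flags the first time a given user runs a given executable."""

    def __init__(self):
        # Maps each user to the set of executable paths already seen for them.
        self._seen = defaultdict(set)

    def observe(self, user: str, executable_path: str) -> bool:
        """Record an exec event; return True if this (user, path) pair is new."""
        known = self._seen[user]
        if executable_path in known:
            return False
        known.add(executable_path)
        return True


# Hypothetical exec events as (user, executable_path) pairs.
events = [
    ("alice", "/usr/bin/curl"),
    ("alice", "/usr/bin/curl"),   # repeat for the same user: no alert
    ("bob", "/usr/bin/curl"),     # same program, different user: alert
    ("alice", "/tmp/installer"),  # new program for alice: alert
]

detector = FirstSeenDetector()
alerts = [event for event in events if detector.observe(*event)]
```

A real deployment would probably key on something more stable than the path (a file hash, for example), and it would still need some way to separate ordinary installs from bad ones.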
