WHY THIS MATTERS IN BRIEF
By figuring out which data sets cybersecurity firms are training their AIs on, hackers can create malware that those AIs fail to detect.
Walking around the exhibition floor at this week’s massive Black Hat cybersecurity conference in Las Vegas, many people were struck by the number of companies boasting about how they are using artificial intelligence (AI) and machine learning to help make the world a safer place and to counter today’s, and tomorrow’s, cybersecurity threats.
But while everyone and their dog seems to be embracing AI, a growing number of experts worry that cybersecurity vendors aren’t paying enough attention to the risks of relying heavily on these technologies.
“What’s happening is a little concerning, and in some cases even dangerous,” warns Raffael Marty of security firm Forcepoint.
The security industry’s hunger for algorithms is understandable. It’s facing a tsunami of cyberattacks just as the number of devices being hooked up to the internet is exploding. At the same time, there’s a massive shortage of skilled cyber workers. Using machine learning and AI to help automate threat detection and response can ease the burden on employees, and potentially identify threats more efficiently than other software-driven approaches. But Marty and some others speaking at Black Hat say plenty of firms are now rolling out AI-based products because they feel they have to in order to get an audience with customers who have bought into the AI hype cycle. And there’s a danger that they will overlook ways in which these machine learning algorithms could lull customers into a false sense of security.
Many of the products being rolled out rely on something called “supervised learning,” which requires firms to choose and label the data sets that algorithms are trained on, for instance by tagging code as either malware or clean.
Marty says one risk is that, in rushing to get their products to market, companies use training data that hasn’t been thoroughly scrubbed of anomalous data points, which could lead the algorithm to miss some attacks. Another is that hackers who gain access to a security firm’s systems could corrupt the data by switching labels, so that some malware examples are tagged as clean code. And the bad guys don’t even need to tamper with the data; instead, they could work out which features of code a model uses to flag malware and then remove those features from their own malicious code so the algorithm doesn’t catch it.
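To make the label-switching and feature-removal attacks concrete, here is a minimal Python sketch. Everything in it is hypothetical: the feature names, the six labelled samples and the scoring rule (a crude frequency-difference scorer, not any vendor’s real model) are illustrative assumptions. It trains a toy supervised detector, then shows how flipped labels or stripped features each let a ransomware-like sample slip through.

```python
from collections import Counter

# Hypothetical labelled training set: observed features plus a human-assigned label.
DATA = [
    ({"writes_registry", "encrypts_files", "contacts_unknown_host"}, "malware"),
    ({"encrypts_files", "deletes_backups"}, "malware"),
    ({"contacts_unknown_host", "keylogging"}, "malware"),
    ({"reads_config", "writes_log"}, "clean"),
    ({"contacts_update_server", "writes_log"}, "clean"),
    ({"reads_config", "draws_window"}, "clean"),
]

def train(samples):
    """Toy supervised scorer (hypothetical): a feature's weight is the fraction of
    malware samples containing it minus the fraction of clean samples containing it."""
    mal = [feats for feats, label in samples if label == "malware"]
    clean = [feats for feats, label in samples if label == "clean"]
    mal_counts = Counter(f for feats in mal for f in feats)
    clean_counts = Counter(f for feats in clean for f in feats)
    vocab = set(mal_counts) | set(clean_counts)
    return {f: mal_counts[f] / len(mal) - clean_counts[f] / len(clean) for f in vocab}

def is_flagged(weights, features, threshold=0.5):
    """Flag a sample when the summed weights of its features exceed the threshold."""
    return sum(weights.get(f, 0.0) for f in features) > threshold

weights = train(DATA)
ransomware = {"encrypts_files", "deletes_backups", "contacts_unknown_host"}
print(is_flagged(weights, ransomware))  # True

# Attack 1, label switching: an intruder relabels every malware example that
# encrypts files as clean, so the retrained model treats that feature as benign.
poisoned = [(feats, "clean") if "encrypts_files" in feats else (feats, label)
            for feats, label in DATA]
print(is_flagged(train(poisoned), ransomware))  # False

# Attack 2, evasion without touching the data: the attacker works out the two
# features the model weighs most heavily and removes them from their own sample
# (in practice by hiding or obfuscating them rather than dropping the behaviour).
top_two = sorted(weights, key=weights.get, reverse=True)[:2]
evasive = ransomware - set(top_two)
print(is_flagged(weights, evasive))  # False
```

Real detectors use far richer features and models, but the weakness is the same in kind: the model only knows what its labels taught it, and it only looks at the features it was built to look at.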
In a session at the conference, Holly Stewart and Jugal Parikh of Microsoft also flagged the risk of overreliance on a single, master algorithm to drive a security system. The danger is that if that algorithm is compromised, there’s no other signal that would flag a problem with it.
To help guard against this, Microsoft’s Windows Defender threat protection service uses a diverse set of algorithms with different training data sets and features. So if one algorithm is hacked, the results from the others, assuming their integrity hasn’t been compromised too, will highlight the anomaly in the first model.
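The idea of cross-checking diverse models can be sketched in a few lines of Python. This is not how Windows Defender is actually built: the three detectors below are hypothetical hand-written rules standing in for models trained on different data and features, and the majority vote is an assumed, deliberately simple way of spotting the odd one out.

```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

Detector = Callable[[Set[str]], bool]

# Hypothetical stand-ins for models trained on different data sets and features.
def behaviour_model(sample: Set[str]) -> bool:
    return bool({"encrypts_files", "deletes_backups"} & sample)

def network_model(sample: Set[str]) -> bool:
    return "contacts_unknown_host" in sample

def static_model(sample: Set[str]) -> bool:
    return "packed_binary" in sample

def compromised_static_model(sample: Set[str]) -> bool:
    # Simulates a model whose training data was poisoned: it now passes everything.
    return False

def check_ensemble(detectors: Dict[str, Detector],
                   sample: Set[str]) -> Tuple[bool, List[str]]:
    """Return the majority verdict and the names of models that disagree with it.
    An odd number of detectors means a True/False vote cannot tie."""
    verdicts = {name: detect(sample) for name, detect in detectors.items()}
    majority = Counter(verdicts.values()).most_common(1)[0][0]
    dissenters = [name for name, verdict in verdicts.items() if verdict != majority]
    return majority, dissenters

sample = {"encrypts_files", "contacts_unknown_host", "packed_binary"}
healthy = {"behaviour": behaviour_model, "network": network_model, "static": static_model}
hacked = {**healthy, "static": compromised_static_model}

print(check_ensemble(healthy, sample))  # (True, [])
print(check_ensemble(hacked, sample))   # (True, ['static'])
```

A dissenting model is only a lead to investigate, not proof it has been hacked, and the check depends on the same assumption stated above: that the other models’ integrity is still intact.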
Beyond these issues, Marty notes that with some very complex algorithms it can be really difficult to work out why they spit out certain answers. This “explainability” issue, as he and many others call it, can make it hard to assess what’s driving any anomalies that crop up. That said, a number of organisations are working on ways to “read the mind” of these black-box AIs, with some notable successes from the likes of DeepMind, MIT, Nvidia and others.
Still, everyone agrees that none of this means AI and machine learning shouldn’t play an important role in a defensive arsenal. The message from Marty and others is that it’s really important for security companies and their customers to monitor and minimise the risks associated with algorithmic models, and that’s no small challenge given that people with the ideal combination of deep expertise in both cybersecurity and data science are still as rare as a cool day in a Las Vegas summer.