WHY THIS MATTERS IN BRIEF
The digital world is reliant on software riddled with bugs and security vulnerabilities that all need to be identified and patched, and this new AI automates the task.
I’ve written extensively before about a new breed of autonomous Artificial Intelligence (AI) powered Robo-Hackers that can scan millions of lines of software and discover bugs and vulnerabilities within minutes, not the years it takes human experts, and then either exploit or patch them. These are the same systems that the US Pentagon is now using to defend its mission-critical systems from attack, and that are helping Blackberry and IBM find bugs in self-driving car software.
Identifying bugs in software is not only a tiring task for companies, it’s also an increasingly lucrative business, as more and more companies pay out big sums of money to people who discover bugs in their software.
Now though, that might all start to change after Microsoft claimed it’s developed a system that correctly distinguishes between security and non-security software bugs 99 percent of the time, and that accurately identifies critical, high-priority security bugs 97 percent of the time on average. The company also announced that in the coming months it plans to open-source the methodology on GitHub, along with example models and other resources.
The work suggests that the system, which was trained on a data set of 13 million work items and bugs from 47,000 developers at Microsoft, stored across AzureDevOps and GitHub repositories, could be used to support human experts. Coralogix estimates that developers create 70 bugs per 1,000 lines of code and that fixing a bug takes 30 times longer than writing a line of code, which means that in the US alone over $113 billion is spent annually on identifying and fixing product defects.
In the course of architecting the model, Microsoft says that security experts approved the training data, and that statistical sampling was used to give those experts a manageable amount of data to review. The data was then encoded into representations called feature vectors, and Microsoft researchers set about designing the system using a two-step process.
First, the model learned to classify bugs as security or non-security, and then it learned to apply severity labels, such as critical, important, or low-impact, to the security bugs.
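That two-step flow can be sketched as a simple cascade: a bug only gets a severity label if the first model flags it as security related. This is an illustrative sketch only; the function names and the toy keyword-based stand-ins for the two trained models are invented here, not Microsoft's actual pipeline.

```python
def triage(title, is_security, severity_of):
    """Step 1: decide security vs non-security.
    Step 2: assign a severity label only to security bugs."""
    if not is_security(title):
        return "non-security"
    return severity_of(title)  # e.g. "critical", "important", or "low-impact"

# Toy stand-ins for the two trained classifiers, for demonstration only.
is_security = lambda t: "overflow" in t or "injection" in t
severity_of = lambda t: "critical" if "remote" in t else "important"

print(triage("Buffer overflow allows remote exploit", is_security, severity_of))  # critical
print(triage("Typo in settings dialog", is_security, severity_of))                # non-security
```

Keeping the two decisions separate means each model can be trained, evaluated, and retrained independently, which matches the two-step process the article describes.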
Microsoft’s model leverages two techniques to make its bug predictions. The first is Term Frequency-Inverse Document Frequency (TF-IDF), an information retrieval technique that assigns importance to a word based on the number of times it appears in a document, offset by how common the word is across the whole collection of bug titles. The second is a logistic regression model, which uses a logistic function to model the probability that a bug belongs to a certain class.
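The combination of the two techniques can be sketched in a few lines with scikit-learn. This is a minimal illustration, not Microsoft's production system; the bug titles and labels below are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data: bug titles labelled 1 (security) or 0 (non-security).
titles = [
    "Buffer overflow in parser allows remote code execution",
    "SQL injection in login form",
    "Typo in settings dialog label",
    "Button misaligned on the dashboard page",
]
labels = [1, 1, 0, 0]

# TF-IDF turns each title into a weighted feature vector.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(titles)

# Logistic regression models the probability of the "security" class.
clf = LogisticRegression()
clf.fit(X, labels)

new_title = ["Heap overflow when parsing crafted input"]
prob = clf.predict_proba(vectorizer.transform(new_title))[0][1]
print(f"P(security bug) = {prob:.2f}")
```

In practice the interesting engineering is in the labelled data, which is why Microsoft emphasises having security experts approve and sample the training set rather than the modelling itself.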
Microsoft says that the model has now been deployed into production internally, and that it’s continually retrained with data approved by security experts who monitor the number of bugs generated in software development.
“Every day, software developers stare down a long list of features and bugs that need to be addressed. Security professionals try to help by using automated tools to prioritize security bugs, but too often, engineers waste time on false positives or miss a critical security vulnerability that has been misclassified,” wrote Microsoft senior security program manager Scott Christiansen and Microsoft data and applied scientist Mayana Pereira in a blog post. “We discovered that by pairing machine learning models with security experts, we can significantly improve the identification and classification of security bugs.”
Microsoft isn’t the only tech giant using AI to weed out software bugs, though. Amazon’s CodeGuru service, which was partly trained on code reviews and apps developed internally at Amazon, spots issues including resource leaks and wasted CPU cycles. As for Facebook, it developed a tool called SapFix that generates fixes for bugs before sending them to human engineers for approval, and another tool called Zoncolan that maps the behaviour and functions of codebases and looks for potential problems both in individual branches and in the interactions of various paths through the program.