"Researchers Find Bugs Using Single-Codebase Inconsistencies"
A research team at Northeastern University finds code defects, including some vulnerabilities, by detecting inconsistent programming, in which programmers use different code snippets to implement the same function. The researchers used machine learning (ML) to identify code snippets that implement the same functionality and then compared those snippets to find inconsistencies. The project, titled "Functionally-similar yet Inconsistent Code Snippets" (FICS), discovered 22 new and unique bugs by analyzing QEMU, OpenSSL, and three other open-source projects. The research aims not to replace other forms of static analysis but to give developers another way to analyze their code and find potential bugs. Other static-analysis approaches must either have already encountered an issue or be given a rule before they can recognize a pattern. The ML techniques used in this research instead find functionally similar code that is implemented differently or inconsistently, rather than matching known vulnerability patterns. The team used two stages of unsupervised clustering, in which the ML system organizes data with similar features into groups. First, they transformed code into functional constructs and clustered parts of a program's code based on their functionality. Then they compared the code within each cluster and applied ML again to group it by implementation. If one implementation accounts for most of the snippets in a functional cluster, it is considered the correct coding method, and the snippets that deviate from it are the inconsistencies examined as potential bugs. The article goes on to discuss the aim, techniques, and capabilities of the FICS system, as well as the problem of false positives it faces.
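The two-stage idea described above can be illustrated with a small toy sketch. This is not the FICS implementation: it assumes each snippet has already been reduced to a hypothetical functional key and a hypothetical implementation key, and it replaces unsupervised clustering with exact-key grouping to keep the example short. The function name, the thresholds, and the sample data are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def find_inconsistencies(snippets, min_cluster_size=3, majority=0.5):
    """Flag snippets whose implementation deviates from the majority.

    snippets: iterable of (snippet_id, functional_key, implementation_key).
    The keys are hypothetical stand-ins for learned features; exact-key
    grouping stands in for the unsupervised clustering described in the text.
    """
    # Stage 1: group snippets by what they do (functional view).
    functional_clusters = defaultdict(list)
    for snippet_id, functional_key, implementation_key in snippets:
        functional_clusters[functional_key].append((snippet_id, implementation_key))

    findings = []
    for functional_key, members in functional_clusters.items():
        if len(members) < min_cluster_size:
            continue  # too few examples to call any implementation dominant

        # Stage 2: within a functional group, group by how it is implemented.
        counts = Counter(impl for _, impl in members)
        dominant, dominant_count = counts.most_common(1)[0]
        if dominant_count / len(members) <= majority:
            continue  # no clear majority, so nothing is treated as "correct"

        # Snippets that differ from the majority implementation are reported.
        for snippet_id, impl in members:
            if impl != dominant:
                findings.append((snippet_id, functional_key, dominant))
    return findings


# Made-up sample data: three allocation sites check for NULL, one does not.
SNIPPETS = [
    ("a.c:10", "malloc->memcpy", "checks_null"),
    ("b.c:42", "malloc->memcpy", "checks_null"),
    ("c.c:7", "malloc->memcpy", "checks_null"),
    ("d.c:99", "malloc->memcpy", "no_check"),
    ("e.c:5", "open->read", "checks_fd"),
    ("f.c:8", "open->read", "no_check"),
]

if __name__ == "__main__":
    for snippet_id, functional_key, expected in find_inconsistencies(SNIPPETS):
        print(f"{snippet_id}: deviates from majority '{expected}' in '{functional_key}'")
```

In this sketch the `open->read` group is skipped because it has too few members to establish a majority, which hints at why cluster size and thresholds matter for the false-positive problem the article mentions.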
Dark Reading reports "Researchers Find Bugs Using Single-Codebase Inconsistencies"