Intel engineers, along with academics from MIT and Georgia Tech, have built a neural network that predicts whether two snippets of code are intended to achieve the same aim, even if they’re written differently.
In other words, you show it two routines and it should be able to work out whether they were designed to do the same thing, regardless of how they are implemented. This AI system is a stepping stone to a recommendation engine that we imagine might work like this: the network is trained on a vast library of tried-and-tested, optimized algorithms. If it spots a function in your application's source code doing similar work to one of the routines it was trained to identify, it will point out that your function could be replaced by, or adapted from, the optimized routine in the library.
This could, in theory, reduce the number of bugs and improve performance as blocks of code are replaced with known-good tested ones. The AI could also be used for “automated construction of software tests and defect mitigation,” according to its makers.
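To make the recommendation idea concrete, here is a minimal Python sketch of how such an engine might be wired up. Everything in it is our assumption rather than anything Intel has described: the names `recommend`, `toy_score` and `library` are hypothetical, and the scoring function is a crude token-overlap measure standing in for a learned model. It is exactly the kind of surface-level text matching a system like this is meant to improve on.

```python
import re

def tokens(source):
    """Split source text into a set of identifier, number and punctuation tokens."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source))

def toy_score(a, b):
    """Jaccard overlap of token sets. A placeholder, NOT a learned similarity score."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def recommend(snippet, library, score=toy_score, threshold=0.5):
    """Return (name, score) pairs for library routines at or above threshold, best first."""
    scored = [(name, score(snippet, code)) for name, code in library.items()]
    hits = [(name, s) for name, s in scored if s >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Hypothetical library of known-good routines, stored as source text.
library = {
    "sum_list": (
        "def sum_list(xs):\n"
        "    total = 0\n"
        "    for x in xs:\n"
        "        total += x\n"
        "    return total"
    ),
    "reverse_string": "def reverse_string(s):\n    return s[::-1]",
}

# A developer's function that does the same job as sum_list.
user_code = (
    "def add_all(xs):\n"
    "    total = 0\n"
    "    for x in xs:\n"
    "        total = total + x\n"
    "    return total"
)

suggestions = recommend(user_code, library)
for name, s in suggestions:
    print(f"consider replacing with {name} (score {s:.2f})")
```

Swapping `toy_score` for a trained model is the hard part. The surrounding plumbing, as the sketch suggests, could stay fairly simple.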
For now, the system, dubbed Machine Inferred Code Similarity, or MISIM for short, is more of a research project than a usable tool. It was taught to recognize code similar to algorithms in a dataset of tens of thousands of C and C++ programs. These programs were written by students tackling 104 coding problems. Pairs of programs were labelled as similar in the dataset if they both solved the same problem.
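The labelling rule is simple enough to sketch in a few lines of Python. This is an illustration under our own assumptions, not the researchers' pipeline: `label_pairs` and the sample submissions are hypothetical, and we assume that pairs drawn from different problems count as not similar.

```python
from itertools import combinations

def label_pairs(submissions):
    """Label every pair of programs by whether both solve the same problem.

    submissions: list of (problem_id, source_text) tuples.
    Returns a list of (source_a, source_b, is_similar) tuples.
    """
    pairs = []
    for (pid_a, src_a), (pid_b, src_b) in combinations(submissions, 2):
        # Assumption: same problem means similar, different problem means not.
        pairs.append((src_a, src_b, pid_a == pid_b))
    return pairs

# Hypothetical student submissions: two for problem 1, one for problem 2.
submissions = [
    (1, "int main() { /* loop-based solution */ }"),
    (1, "int main() { /* recursive solution */ }"),
    (2, "int main() { /* unrelated problem */ }"),
]

pairs = label_pairs(submissions)
similar_count = sum(1 for _, _, is_similar in pairs if is_similar)
print(len(pairs), "pairs,", similar_count, "labelled similar")
```

Enumerating every pair grows quadratically with the number of programs, so a dataset of tens of thousands of programs would presumably need sampling rather than exhaustive pairing.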
After training, the neural network’s makers tested their system’s ability to figure out whether two functions were similar in terms of outcome. It’s not a simple case of matching up the code line by line; two routines can be written differently but still carry out the same task. MISIM, essentially, spits out a percentage describing how alike two blocks of code are, operations-wise.
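A small Python example shows why line-by-line matching falls short. The two functions below are our own, and the comparison uses the standard library's `difflib` purely as a stand-in for naive text matching. Running both functions on a few inputs is only a spot check of behavior, not a proof of equivalence, and it is not a description of how MISIM works.

```python
import difflib

# Two implementations of the same task, written differently.
SRC_LOOP = '''def total(xs):
    result = 0
    for x in xs:
        result += x
    return result
'''

SRC_BUILTIN = '''def total(xs):
    return sum(xs)
'''

def text_similarity(a, b):
    """Character-level similarity ratio between two strings, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def same_outputs(src_a, src_b, func_name, inputs):
    """Run both definitions on each input and report whether all results agree."""
    ns_a, ns_b = {}, {}
    exec(src_a, ns_a)  # acceptable here only because the sources are our own literals
    exec(src_b, ns_b)
    return all(ns_a[func_name](arg) == ns_b[func_name](arg) for arg in inputs)

textual = text_similarity(SRC_LOOP, SRC_BUILTIN)
behaves_alike = same_outputs(SRC_LOOP, SRC_BUILTIN, "total", [[], [1, 2, 3], [-5, 5]])

print(f"textual similarity: {textual:.2f}")
print(f"same outputs on sample inputs: {behaves_alike}")
```

The text comparison reports only a partial match even though both functions return the same results on every sample input. That gap between how code looks and what it does is what a code-similarity model has to bridge.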
“MISIM provides a neural-based code similarity scoring algorithm, which can be implemented with various neural-based network architectures with learned parameters,” Justin Gottschlich, principal scientist and director of Machine Programming Research at Intel, told The Register.
“For MISIM’s scoring algorithm, researchers investigated three neural network approaches: a graph neural network (GNN), a recurrent neural network (RNN), and a bag of manual features (BoF) neural network. MISIM performed best overall using the GNN.”
The GNN was able to accurately identify how similar two programs were more than 75 per cent of the time, according to a paper written by the team and shared via arXiv. Now, Intel is working to sculpt the MISIM system into a code recommendation engine.
Chipzilla said in a statement: “This type of system would be able to recognize the intent behind a simple algorithm input by a developer and offer candidate codes that are semantically similar but with improved performance.” That is pretty much Intel through and through: most of its engineers are software developers working on tools to speed up code on its silicon.
“In order to develop future automated coding tools, such as a code recommendation engine, the system needs to understand which pieces of code are similar to another,” Gottschlich told us. “Without this foundation, it cannot make accurate recommendations. MISIM serves as the first step to building these bigger, more complex systems.”
He hopes such an engine will eventually help developers by, for instance, automatically detecting common programming mistakes and design mistakes in their code. “I imagine most developers would happily let the machine find and fix bugs for them, if it could – I know I would,” he added. ®