Machine learning: Intel and MIT search for similarities in code

Intel and the Massachusetts Institute of Technology (MIT), together with the Georgia Institute of Technology (Georgia Tech), have unveiled Machine Inferred Code Similarity (MISIM). The engine automatically captures the structure of code by examining the passages presented to it for syntactic similarities to, and differences from, other code with similar behavior.

Detecting the intent of algorithms by comparing structures

The tool is part of a machine programming (MP) system under development at Intel. MP can apparently use it to determine what purpose an analyzed piece of code serves and what an algorithm's intent is. According to Intel, the goal is for the tool to provide automated code suggestions to developers working in complex environments, especially for cross-system and cross-architecture software development and for troubleshooting existing code.

Machine programming is a term coined by Intel and MIT. Code similarity, that is, finding similarities between pieces of code, is apparently a key technology on the way to automating software development processes. Building accurate code similarity systems, however, is still considered a hurdle. Under the hood, systems such as MISIM compare two code snippets at a time to check whether they share characteristics that let them achieve similar goals. According to the vendor, MISIM can determine whether two passages of code trigger a similar computational process, even when they use different algorithms and data structures.
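
To make "similar computational process, different algorithms and data structures" concrete, here is a hypothetical pair of Python functions (names invented for illustration) that look structurally different but compute the same result: one uses a list with a linear membership test, the other a hash set. A code similarity system in the sense described above would ideally rate them as similar. The output comparison at the end is only a brute-force illustration of "same behavior" on a few sample inputs; it is not how MISIM works.

```python
def count_unique_loop(items):
    """Count distinct values using a list and a linear membership test."""
    seen = []
    for x in items:
        if x not in seen:
            seen.append(x)
    return len(seen)


def count_unique_set(items):
    """Count distinct values using a hash set."""
    return len(set(items))


# Compare outputs on a handful of sample inputs (illustration only).
samples = [[], [1, 1, 2], [3, 1, 3, 2, 1], list("mississippi")]
same_behavior = all(count_unique_loop(s) == count_unique_set(s) for s in samples)
print(same_behavior)  # True
```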

Under the hood: context-aware semantic structure (CASS)

MISIM is neither the first nor the only code similarity system. One difference from existing systems is apparently its context-aware semantic structure (CASS), which users can configure for specific contexts. This lets the system extract meta-level information that gives more specific insight into the nature of the code being analyzed. MISIM does not need to convert human-readable source code into machine-executable code, so no compiler is required. Apparently, the system can even process incomplete code snippets and suggest additions, which can be useful when fixing problems.
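
CASS itself is defined in the MISIM research paper; the sketch below is not CASS. It is a much simpler, hypothetical stand-in that turns Python source into a structural summary (counts of syntax-tree node types) with the standard `ast` module, which parses code without executing it or producing bytecode. The function name `structural_summary` and the `keep_names` flag are invented for illustration; the flag only loosely mirrors the idea of configuring a representation for a context. With it off, identifiers are ignored, so two snippets that differ only in variable names map to the same summary. Unlike what the article reports for MISIM, `ast.parse` rejects incomplete snippets with a `SyntaxError`.

```python
import ast
from collections import Counter


def structural_summary(source: str, keep_names: bool = False) -> Counter:
    """Hypothetical structural summary: count AST node types in `source`.

    The code is only parsed, never executed. With keep_names=True,
    identifiers referenced via ast.Name nodes are added as extra features;
    otherwise they are ignored.
    """
    tree = ast.parse(source)
    features = Counter()
    for node in ast.walk(tree):
        features[type(node).__name__] += 1
        if keep_names and isinstance(node, ast.Name):
            features["name:" + node.id] += 1
    return features


a = "def f(xs):\n    return len(set(xs))\n"
b = "def g(items):\n    return len(set(items))\n"
print(structural_summary(a) == structural_summary(b))  # True
print(structural_summary(a, keep_names=True) == structural_summary(b, keep_names=True))  # False
```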

After the code has been transformed into CASS, neural networks rate the code passages for similarity based on the tasks they are meant to perform. Two code snippets can therefore look quite different yet receive a similar rating if they serve the same purpose. MISIM is said to detect code similarities about 40 times faster than other models.
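
MISIM uses neural networks to produce its similarity ratings. The sketch below swaps in plain cosine similarity over feature counts, a much simpler and non-learned technique, only to show what "rating two passages for similarity" can mean numerically. The feature counts are hypothetical inputs, not values produced by MISIM. A purely syntactic score like this one cannot by itself recognize that differently structured code serves the same purpose; that gap is what a learned model is meant to close.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse feature-count vectors.

    For non-negative counts the result lies between 0.0 and 1.0.
    Returns 0.0 if either vector is empty.
    """
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical feature counts for two snippets with the same purpose.
loop_version = Counter({"For": 1, "If": 1, "Call": 2, "Name": 5})
set_version = Counter({"Call": 2, "Name": 3, "Return": 1})
print(round(cosine_similarity(loop_version, loop_version), 3))  # 1.0
print(round(cosine_similarity(loop_version, set_version), 3))   # 0.912
```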

Test run for Intel's own architectures

MISIM is now leaving the research stage and entering a preliminary phase toward a demo version. The long-term goal, as announced in a company blog post, is for MISIM to become a recommendation engine that helps software developers program across architectures. Development initially focuses on the architectures Intel itself offers.

More information can be found in the announcement on the Intel blog. A research paper on MISIM by Intel Labs, MIT, and Georgia Tech is also freely available.
