The present application claims the benefit under 35 U.S.C. § 119 of European Patent Application No. EP 23 20 0608.0 filed on Sep. 28, 2023, which is expressly incorporated herein by reference in its entirety.
Methods for automatically translating program code from a source language to a target language exist.
Translation is often carried out manually, i.e., by a programmer. This procedure is not efficient.
There are also dedicated code translators. However, these are only available for a few source-target language pairs. One problem is that such translations do not guarantee that the code is functional in the target language. Moreover, because the translation is mechanistic, unreadable code with poor maintainability may be generated. Lastly, there is a risk that the dedicated code translator adopts the idioms of the source language without adapting them to the target language.
Recently, attempts have been made to use large language models (LLMs) or code assistants for code translation. Problems arise, however, because LLMs and code assistants do not provide guarantees for the correctness of the translated code.
A first general aspect of the present invention relates to a method for automatically translating program code from a source language to a target language.
According to an example embodiment of the present invention, the method includes: translating a source program code in the source language into a target program code in the target language by means of a language model; repeating the translation with changed conditions; comparing the source program code and the target program code by means of a test harness, the test harness being generated automatically; and evaluating the target program code based on code quality metrics and/or test quality metrics.
A second general aspect of the present invention relates to a method for training a language model configured to automatically translate program code from a source program code in a source language to a target program code in a target language.
According to an example embodiment of the present invention, the method includes: inputting the source program code into a language model and generating a predictive target program code; automatically checking the predictive target program code; generating a reward for the language model, a negative reward being generated if a test fails and a positive reward being generated if all of the tests pass; and updating the weights of the language model with the value of the reward.
A third general aspect of the present invention relates to a computer system configured to carry out the method according to the first and/or the second general aspect (or an embodiment thereof).
A fourth general aspect of the present invention relates to a computer program configured to carry out the method according to the first general aspect (or an embodiment thereof).
A fifth general aspect of the present invention relates to a computer-readable medium or signal, which stores and/or contains the computer program according to the fourth general aspect (or an embodiment thereof).
The techniques of the first, second, third, fourth and fifth general aspects of the present invention may have one or more of the following advantages in some situations.
Proposed is a symbiosis of a language model for generating code (using the creativity of the language model) with verification and validation (V&V) techniques for checking the results of the language model. This strong or complete automation increases efficiency. The V&V techniques are used to ensure that the result is (probably) correct, to provide the user with feedback about the correctness, confidence and/or quality of the translation, and to make the code translation more reliable.
The present disclosure enables guarantees to be provided for the automatically translated code. The use of language models such as LLMs to translate the code makes the translated code understandable and exploits the characteristics of the target language. The advantage is that the resulting code fulfills properties such as functional equivalence as well as other properties, in particular security requirements. The quality of the resulting, i.e., translated, code is also measured and evaluated.
The present invention enables code translation that fulfills security requirements. The generated code can therefore run on security-critical systems, such as a control unit, in particular in the automotive sector.
Some terms are used in the present disclosure in the following manner:
A “language model” can in particular be a large language model (LLM), a neural network, a recurrent neural network (RNN), a transformer model, a code model, i.e., a language model specialized for code, or also a very general language model that also covers code. The language of the model can include not only natural languages, but also artificial languages such as programming languages and program codes of a computing device such as a computer.
“Testing,” “checking” or “comparing the source program code and the target program code” can include: a formal check that the source program code and the target program code exhibit the same behavior, for example by means of bounded model checking; tests in the source language; tests for contracts in the source language; and/or syntactic and stylistic tests. The tests can be obtained by fuzzing, by mutation of the inputs of the test harness, by derivation from contracts of the source language and/or the target language, and/or by derivation from a language model.
A “test harness” or test framework includes a collection of software and test data that is used to systematically and automatically test a program under different ambient conditions. A test harness typically includes a test execution engine that is responsible for executing the test logic, and a test data repository or database that contains the test scripts, test programs and other test resources. The test harness in this case is generated automatically, for example by adding differentiating tests to the database. The test can be started with specific or ready-made tests from the test database. The system can also generate tests automatically.
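Purely by way of illustration, such a differential test harness could be sketched in Python as follows; the helper name `run`, the binary names `./source_c` and `./target_rust`, and the exchange of test data via standard input/output are assumptions of this sketch, not part of the present disclosure.

```python
import subprocess

def run(binary: str, test_input: str) -> str:
    """Execute a compiled program on one test input and return its stdout."""
    result = subprocess.run([binary], input=test_input,
                            capture_output=True, text=True, timeout=5)
    return result.stdout

def differential_test(source_bin: str, target_bin: str,
                      test_db: list[str]) -> list[str]:
    """Return all inputs on which source and target behave differently."""
    return [t for t in test_db if run(source_bin, t) != run(target_bin, t)]

# Example usage, assuming compiled binaries ./source_c and ./target_rust:
# differential_test("./source_c", "./target_rust", ["1 2\n", "0 0\n"])
```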
Data here can be software code, including test cases and harnesses, plus additional (natural-language) descriptions of the functionality or validity ranges. As an example, C is described here as the source language and Rust as the target language; other combinations are possible. The translation from C to Rust is of particular interest because Rust provides features for security-critical systems, whereas a lot of legacy code exists in other languages, especially C.
“Contracts” are a component of contract-based programming or design by contract. This is a concept of software development that has the objective of optimizing the interaction of individual program modules by defining formal contracts for the use of interfaces that go beyond their static definition.
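A minimal sketch of such a contract, using plain Python assertions as a stand-in for a contract mechanism; the function `integer_sqrt` and its pre- and postconditions are invented for illustration. A translated function would have to fulfill the same contract as the original.

```python
def integer_sqrt(x: int) -> int:
    """Floor square root with an explicit contract."""
    assert x >= 0, "precondition: x must be non-negative"
    r = 0
    while (r + 1) * (r + 1) <= x:
        r += 1
    # postcondition: r is the largest integer whose square does not exceed x
    assert r * r <= x < (r + 1) * (r + 1), "postcondition violated"
    return r
```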
“Fuzzing” or “fuzz testing” is the automated process of sending randomly generated inputs from a fuzzer to a target or target program and observing the response of the target.
A fuzzer or fuzzing engine is a program that automatically generates inputs. It is therefore not necessarily connected to the software being tested, and no instrumentation is necessarily carried out. Fuzzers can, however, have the ability to instrument code, generate test cases and execute the programs being tested. Well-known examples are afl and libFuzzer.
A “test case” is a specific input and a specific test run from a test harness or fuzz test. To ensure reproducibility, interesting runs (finding new code paths or crashes) are stored.
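By way of illustration, a very small random fuzzer that drives the differential comparison and stores interesting (here: differentiating) runs could look as follows; unlike afl or libFuzzer it performs no instrumentation, and all names and the input format are assumptions of this sketch.

```python
import random
import string
import subprocess

def run(binary: str, test_input: str) -> str:
    """One execution of the program under test (as in the harness sketch)."""
    return subprocess.run([binary], input=test_input, capture_output=True,
                          text=True, timeout=5).stdout

def fuzz(source_bin: str, target_bin: str, iterations: int = 1000) -> list[str]:
    """Random fuzzing; interesting (differentiating) runs are stored."""
    stored_test_cases = []
    for _ in range(iterations):
        t = "".join(random.choice(string.printable)
                    for _ in range(random.randrange(32)))
        if run(source_bin, t) != run(target_bin, t):
            stored_test_cases.append(t)   # stored to ensure reproducibility
    return stored_test_cases
```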
In a first step, a source program code in a source language is translated 11 into a target program code in a target language by means of a language model. The language model is, for example, a large language model (LLM) into which data, in this case a program code, is entered via an input (prompt) together with a task, in this case a translation request.
The translation is carried out repeatedly 12 with changed conditions, for example by changing one or more hyperparameters of the language model, such as a temperature parameter, by transformations of the source program code, and/or by changes in the input to the language model, such as changed tasks or prompts. Variables in the code can also be renamed.
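A sketch of this repeated translation under changed conditions; `query_llm` is a hypothetical stand-in for the interface to the language model, and the prompt wordings and temperature values are illustrative assumptions only.

```python
def query_llm(prompt: str, temperature: float) -> str:
    raise NotImplementedError("connect the language model of choice here")

def translate_variants(source_code: str) -> list[str]:
    """Generate several candidate translations under changed conditions."""
    prompts = [
        "Translate the following C code into idiomatic Rust:\n",
        "Rewrite this C program in safe, readable Rust:\n",
    ]
    temperatures = [0.2, 0.7, 1.0]  # varied hyperparameter (temperature)
    return [query_llm(p + source_code, t)
            for p in prompts for t in temperatures]
```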
These measures create variance. This variance makes it possible to verify the generated translations, to improve their quality evaluation, and also to train the language model with feedback. Part of the improved quality evaluation is being able to determine which of the generated translations are more or less suitable.
The source program code and the target program code or codes are then compared 13 by means of a test harness, the test harness being generated automatically. Alternatively, an already existing test harness can be used. Tests of the test harness, or the test harness itself, can be generated automatically, for example by fuzzing, by mutation of the inputs of the test harness, by derivation from contracts of the source language and/or the target language, and/or by derivation from a language model.
The comparison of the target program code to the source program code by means of the test harness can begin with one or more specified or ready-made tests, for example. These initial tests can belong to the codes or be provided with them. The tests can be stored in a test database. Differentiating tests can then be added to the test database.
An automatic configuration of a test harness that compares a source program code and a target program code can include the following steps or specifications: starting with one or more specified or ready-made tests; storing the tests in a test database; generating further tests automatically, for example by fuzzing, by mutation of the inputs of the test harness, by derivation from contracts of the source language and/or the target language, and/or by derivation from a language model; and adding differentiating tests to the test database. These steps are illustrated in the sketch below.
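The following Python sketch illustrates these steps under stated assumptions; `differs` stands in for one execution of the test harness on a single input and is deliberately left abstract.

```python
def differs(test: str) -> bool:
    """One execution of the test harness on a single input (left abstract)."""
    raise NotImplementedError("run source and target on `test` and compare")

def configure_test_harness(ready_made_tests: list[str],
                           generated_tests: list[str]) -> list[str]:
    """Build the test database from ready-made and generated tests."""
    test_db = list(ready_made_tests)           # start with ready-made tests
    for t in generated_tests:                  # e.g. from fuzzing, mutation,
        if t not in test_db and differs(t):    # contracts or a language model
            test_db.append(t)                  # keep only differentiating tests
    return test_db
```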
Static checks can optionally be carried out in parallel with the tests using the test harness.
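As an illustration of such a static check, the sketch below invokes the Rust linter cargo clippy on the translated crate; the crate layout and the choice of linter are assumptions of this sketch.

```python
import subprocess

def static_check(crate_dir: str) -> bool:
    """Run the Rust linter on the translated crate; True if it passes."""
    result = subprocess.run(["cargo", "clippy", "--quiet"],
                            cwd=crate_dir, capture_output=True)
    return result.returncode == 0
```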
The target program code is evaluated 14 based on code quality metrics, such as the length of the source code, the number of loops and/or the branch depth, and/or test quality metrics, such as branch coverage and/or the number of available or executed tests.
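A simplified illustration of how such code quality metrics could be computed; the counting rules (keyword counts, brace depth as a proxy for branch depth) are deliberately crude assumptions, and a real implementation would use a proper parser.

```python
def code_quality_metrics(code: str) -> dict:
    """Crude metrics: source length, loop count, maximum brace nesting."""
    lines = code.splitlines()
    loops = sum(line.count("for") + line.count("while") for line in lines)
    depth = max_depth = 0
    for ch in code:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return {"length": len(lines), "loops": loops, "branch_depth": max_depth}
```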
According to an embodiment, the method further includes carrying out steps 11 to 14 several times so that several evaluations of target program codes are generated and the several evaluations are each provided with a quality value.
According to one embodiment, the method also includes generating the quality values based on code and test metrics, the number of available tests, the number of additionally carried out tests and/or the type and/or number of formal checks of contracts of the target language.
The evaluations or quality values enable comparability and prioritization of the generated target program codes. The method returns only correct solutions, i.e., code translations, and also provides information relating to quality and confidence in the correctness.
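A sketch of how quality values could be combined and used to prioritize the generated target program codes; the weighting constants and the brevity measure are invented for illustration.

```python
def quality_value(metrics: dict, tests_passed: int, tests_total: int) -> float:
    """Combine test quality and code quality into one value (weights invented)."""
    coverage = tests_passed / max(tests_total, 1)   # test quality component
    brevity = 1.0 / (1 + metrics["length"])         # code quality component
    return 0.8 * coverage + 0.2 * brevity

def rank_translations(candidates):
    """candidates: list of (code, metrics, tests_passed, tests_total) tuples.
    Returns the target program codes ordered from highest to lowest quality."""
    ranked = sorted(candidates,
                    key=lambda c: quality_value(c[1], c[2], c[3]),
                    reverse=True)
    return [code for code, _, _, _ in ranked]
```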
A source program code 21 in a source language such as C is provided to a language model 22 such as a large language model (LLM) for translation into a target language such as Rust. The language model 22 generates a target program code 23 as a translation of the source program code 21. This region of the computer system 20 can be referred to as the generation region.
Tests and optionally a test harness in the source language form another input 24 to the system 20. Alternatively or additionally, the tests and/or the test harness can be in the target language. These inputs are fed to a test harness 25. The test harness 25 records functions or tests in the target language.
Contracts 26 can optionally be fed to a contracts unit 27 that operates in the target language such as Rust. The contracts are managed there for later checks of the target program code 23.
Inputs of a first checking unit 28 are connected to the language model 22 for inputting the target program code 23 and to the test harness 25 for inputting test routines. In the first checking unit 28, the target program code 23 is compared to the source program code 21 by means of the test harness 25.
Inputs of a second checking unit 29 are connected to the language model 22 for inputting the target program code 23 and to the contracts unit 27 for inputting contracts. In the second checking unit 29, the target program code 23 is checked using the contracts 26.
If the checks in the first checking unit 28 and in the second checking unit 29 are completed successfully, a status message 30 indicating that the target program code 23 is okay is output. This region of the computer system 20 can be referred to as the checking region.
The target program code 31 is evaluated in terms of its quality using metrics 32. The metrics 32 can include code quality metrics, test quality metrics, and/or the number of tests. If the evaluation is successful, the target program code and its quality are output as the output 33. This region of the computer system 20 can be referred to as the quality evaluation region.
A quality can be calculated on the basis of the evaluation. If several target program codes have been generated, the solutions can be provided to the user in order of quality.
If errors remain after the checks, they are collected in an error module 35. From the error module 35, the best target program code to date, together with the errors that still exist, is fed back to the language model 22 as information 36, to be used to generate a better, ideally error-free, target program code. Such feedback reduces the determined reliability and is taken into account in the quality determination.
This can optionally refer not only to errors in the checking region, but analogously also to errors in the quality evaluation region.
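A sketch of this feedback path; `query_llm` again stands in for a hypothetical interface to the language model 22, and the prompt wording is an assumption of this sketch.

```python
def query_llm(prompt: str, temperature: float = 0.2) -> str:
    raise NotImplementedError("call into the language model 22 here")

def feedback_iteration(best_code: str, remaining_errors: list[str]) -> str:
    """Feed the best translation so far plus its errors back for repair."""
    prompt = ("The following Rust translation still fails these checks:\n"
              + "\n".join(remaining_errors)
              + "\n\nRevise the code so that all checks pass:\n"
              + best_code)
    return query_llm(prompt)
```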
In a first step, the source program code is input 41 into a language model and a predictive target program code is generated. The language model can already be pretrained and possibly also fine-tuned. Alternatively, it is also possible to start with a new, untrained language model. The training here is based on reinforcement learning and takes place in a training environment, for example with PPO (proximal policy optimization).
The predictive target program code is furthermore checked automatically 42 based on a formal check that the source program code and the target program code exhibit the same behavior, on tests in the source language, on tests for contracts in the source language, and/or on syntactic and stylistic tests.
This is followed by the generation 43 of a reward for the language model, wherein a negative reward is generated if a test fails. The value of the negative reward becomes more negative, the more incorrect the check or its result is. A positive reward is generated if all of the tests pass; the magnitude of the value of the reward is based on the number of tests in the source language and the contracts in the source language.
According to one embodiment, the method further includes that the value of the reward is offset against a code quality metric in the case of a positive reward.
Lastly, the weights of the language model are updated 44 with the value of the reward. The result of the method is a language model that is better trained on new unlabeled data (here, for example, C code from an engine control system), i.e., one that provides more reliable translations.
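A sketch of the reward described above as it could be computed inside a PPO training loop; the scaling constants and the form of the quality offset are assumptions of this sketch.

```python
def reward(tests_passed: int, tests_total: int,
           contracts_passed: int, contracts_total: int,
           quality: float) -> float:
    """Reward for the reinforcement-learning step (constants are assumptions)."""
    failures = (tests_total - tests_passed) + (contracts_total - contracts_passed)
    if failures > 0:
        # negative reward, more negative the more incorrect the result is
        return -1.0 * failures
    # all tests and contracts pass: positive reward whose magnitude is based
    # on the number of tests and contracts in the source language
    base = float(tests_total + contracts_total)
    # optionally offset against a code quality metric (cf. the embodiment above)
    return base * (1.0 + quality)
```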
According to one example embodiment of the present invention, the method also includes approximating the reward by carrying out only one of the automatic checking tests. This makes it possible to accelerate the training.