How To Compile TypeScript to LLVM Intermediate Representation (IR) – Part 1
By: Lung (Backend Software Engineer)
25 January 2022
As interest in writing smart contracts grows, you may be concerned about what happens to a smart contract once it has been committed to the blockchain network. Because a programming mistake that surfaces during a transaction can be costly, you may wonder whether a smart contract can be verified at compile time, so that mistakes are found before the contract is ever sent to the network and the risk of financial or other loss is reduced. In this article, we explore one possible approach: compiling TypeScript to an intermediate language that can then be analyzed by the smart contract verifier of your choice.
Writing a compiler can be a daunting prospect for a software engineer. Most of the time, you write applications in an existing programming language rather than create a new one. Fortunately, the TypeScript Compiler API is a good entry point. It handles parsing and syntax analysis for us and produces an Abstract Syntax Tree (AST), from which we can extract statement and expression information and generate an intermediate representation with the LLVM IR builder. At the time of writing, only limited resources are available on building such a compiler frontend, one that compiles TypeScript to LLVM IR for later optimization and analysis.
The diagram shown below is our idea of generating the IR. Both the Compiler API and the LLVM bindings are required to convert TypeScript constructs. Whenever the compiler frontend (not to be confused with a UI frontend) visits an AST node of interest, it builds the corresponding IR from the information extracted from that node.
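The visit-and-emit idea can be sketched without either dependency. The snippet below is a minimal illustration and not the implementation discussed in this article: the `AstNode` type, the `visit` function, and the emitted strings are hypothetical stand-ins. In a real frontend, the nodes would come from the Compiler API (for example via `ts.createSourceFile` and `ts.forEachChild`), and each handler would call an LLVM IR builder instead of collecting text.

```typescript
// Hypothetical, simplified AST. The real Compiler API exposes many more
// node kinds and richer node objects than this.
type AstNode =
  | { kind: "FunctionDeclaration"; name: string; body: AstNode[] }
  | { kind: "VariableStatement"; name: string; value: number }
  | { kind: "ReturnStatement"; name: string };

// Visit one node and record a textual note for each node of interest.
// A real handler would build IR here instead of pushing strings.
function visit(node: AstNode, out: string[]): void {
  switch (node.kind) {
    case "FunctionDeclaration":
      out.push(`function ${node.name}`);
      for (const child of node.body) {
        visit(child, out);
      }
      break;
    case "VariableStatement":
      out.push(`declare ${node.name} = ${node.value}`);
      break;
    case "ReturnStatement":
      out.push(`return ${node.name}`);
      break;
  }
}

// Models: function compare() { let a = 1; return a; }
const program: AstNode = {
  kind: "FunctionDeclaration",
  name: "compare",
  body: [
    { kind: "VariableStatement", name: "a", value: 1 },
    { kind: "ReturnStatement", name: "a" },
  ],
};

const visited: string[] = [];
visit(program, visited);
console.log(visited.join("\n"));
```

Dispatching on the node kind keeps each construct's translation in one place, which is what makes it practical to add support for new statements one at a time.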
Let us look at the following example to understand how our approach actually works.
The first thing you may notice is that variables are declared and assigned on the first few lines inside the function. To represent variable declaration and assignment in LLVM IR, a stack slot is allocated for every variable, such as %1 and %a. Every time the value of a variable is needed, a load instruction (producing, e.g., %2) reads the value from the slot previously allocated on the stack so that it can be used later. We can also write a new value to an allocated slot with the store instruction shown in the IR output.
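The allocate, store, and load pattern can be sketched as follows. This is an illustration under stated assumptions, not the article's frontend: `FunctionEmitter` is a hypothetical text emitter standing in for the LLVM bindings, the `double` type is an assumed mapping for TypeScript numbers, and the `double*` typed-pointer spelling is the one printed by LLVM versions that predate opaque pointers (newer versions print `ptr`).

```typescript
// Hypothetical text emitter; a real frontend would call LLVM bindings
// instead of formatting instruction strings by hand.
class FunctionEmitter {
  private readonly lines: string[] = [];
  private nextTemp = 1;

  // Reserve a stack slot for a named variable (LLVM's `alloca`).
  alloca(variable: string, type: string): string {
    const slot = `%${variable}`;
    this.lines.push(`${slot} = alloca ${type}`);
    return slot;
  }

  // Write a value into a previously allocated slot (LLVM's `store`).
  store(value: string, slot: string, type: string): void {
    this.lines.push(`store ${type} ${value}, ${type}* ${slot}`);
  }

  // Read a slot's current value into a fresh numbered temporary (LLVM's `load`).
  load(slot: string, type: string): string {
    const temp = `%${this.nextTemp++}`;
    this.lines.push(`${temp} = load ${type}, ${type}* ${slot}`);
    return temp;
  }

  text(): string {
    return this.lines.join("\n");
  }
}

// Models the two statements: let a = 1; let b = a;
const emitter = new FunctionEmitter();
const slotA = emitter.alloca("a", "double");
emitter.store("1.000000e+00", slotA, "double");
const slotB = emitter.alloca("b", "double");
const currentA = emitter.load(slotA, "double"); // reading `a` needs a load
emitter.store(currentA, slotB, "double");

console.log(emitter.text());
```

Routing every variable through a stack slot keeps the frontend simple, since it never has to track which temporary currently holds a variable's value.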
With this basic knowledge of variable initialization, we can now look at how we visit an if-statement and build its corresponding IR.
In the if-statement above, the first condition a >= b produces a branch into one of two so-called basic blocks, labeled if.then and if.elseif in the IR output below. Depending on the comparison result %5, execution flows into one block or the other. Inside the if.elseif basic block, a second branch decides the flow based on the second condition b < a, represented by %14. However, when a block of the if-statement contains a return statement, its basic block (e.g., if.else) does not branch onward; it returns a value directly. If a block has no return statement, execution continues through to the end of the function. In our example, only if.then1 eventually reaches the final basic block if.end.
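To make the block names concrete, here is a hypothetical TypeScript function with the same shape as the one described: two conditions, a >= b and b < a, and a return in every branch except the else-if body. It is an illustrative reconstruction, not the article's original listing. The function name `pick`, the variable `result`, and the returned values are all assumptions. The comments map each branch to the basic block it would plausibly become.

```typescript
// Hypothetical input with the control-flow shape described in the text.
function pick(a: number, b: number): number {
  let result = 0;
  if (a >= b) {
    // if.then: contains a return, so the block ends by returning
    // and never branches to if.end.
    return a;
  } else if (b < a) {
    // if.then1: no return here, so the block ends with a branch to if.end.
    result = b;
  } else {
    // if.else: contains a return, so it also ends by returning.
    return b;
  }
  // if.end: reached only from if.then1.
  return result;
}

console.log(pick(3, 2), pick(1, 2), pick(2, 2));
```

With these particular conditions, b < a cannot hold once a >= b has failed, so the else-if body is never taken at run time. A frontend that translates the statement structure without reasoning about values would presumably still emit a block for it.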
Converting multiple else-if statements does seem complicated at first glance. So how do we accomplish it in our implementation? We follow the visitor pattern, implementing a visitor function for each expression or statement of interest. In our if-statement example, visitIfStatement constructs the IR output. It uses a while loop to repeatedly check whether nextStatement is another else-if statement to visit; if it is not, execution goes directly to the final basic block.
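One way such a loop could look is sketched below. This is not the article's actual visitIfStatement: the `SketchIf` and `SketchBlock` types are hypothetical simplifications whose field names mirror those of `ts.IfStatement` (`expression`, `thenStatement`, `elseStatement`), and the output is pseudo-IR text in which conditions are kept as source text and return values are left as placeholders. The label numbering (if.then, if.then1, ...) is an assumption modeled on the names quoted above.

```typescript
// Hypothetical, simplified statement types for illustration only.
interface SketchBlock { kind: "Block"; returns: boolean }
interface SketchIf {
  kind: "IfStatement";
  expression: string; // condition kept as source text, not an i1 value
  thenStatement: SketchBlock;
  elseStatement?: SketchIf | SketchBlock;
}

function visitIfStatement(node: SketchIf): string[] {
  const ir: string[] = [];
  const used = new Map<string, number>();
  // Hand out unique labels: if.then, if.then1, if.then2, ...
  const label = (base: string): string => {
    const n = used.get(base) ?? 0;
    used.set(base, n + 1);
    return n === 0 ? base : `${base}${n}`;
  };
  // A body either returns or falls through to the final block.
  const emitBody = (blockName: string, body: SketchBlock): void => {
    ir.push(`${blockName}:`);
    ir.push(body.returns ? "  ret <value>" : "  br label %if.end");
  };

  let current: SketchIf = node;
  let thenLabel = label("if.then");
  let nextStatement: SketchIf | SketchBlock | undefined = current.elseStatement;

  // Keep going while the else branch is itself another if (an else-if).
  while (nextStatement !== undefined && nextStatement.kind === "IfStatement") {
    const elseIfLabel = label("if.elseif");
    ir.push(`  br i1 [${current.expression}], label %${thenLabel}, label %${elseIfLabel}`);
    emitBody(thenLabel, current.thenStatement);
    ir.push(`${elseIfLabel}:`);
    current = nextStatement;
    thenLabel = label("if.then");
    nextStatement = current.elseStatement;
  }

  // Last condition: its false edge goes to if.else if present, else to if.end.
  const elseBlock =
    nextStatement !== undefined && nextStatement.kind === "Block" ? nextStatement : undefined;
  const falseLabel = elseBlock ? "if.else" : "if.end";
  ir.push(`  br i1 [${current.expression}], label %${thenLabel}, label %${falseLabel}`);
  emitBody(thenLabel, current.thenStatement);
  if (elseBlock) {
    emitBody("if.else", elseBlock);
  }
  ir.push("if.end:");
  return ir;
}

// Models: if (a >= b) { return } else if (b < a) { ... } else { return }
const chain: SketchIf = {
  kind: "IfStatement",
  expression: "a >= b",
  thenStatement: { kind: "Block", returns: true },
  elseStatement: {
    kind: "IfStatement",
    expression: "b < a",
    thenStatement: { kind: "Block", returns: false },
    elseStatement: { kind: "Block", returns: true },
  },
};

const listing = visitIfStatement(chain);
console.log(listing.join("\n"));
```

Because an else-if is just an if-statement nested in the else slot, walking the chain iteratively keeps every branch pointed at a single shared if.end block rather than creating one per nesting level.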
In summary, we have described why we are exploring a way to translate TypeScript to LLVM IR. Using an example source file and its IR output, we have also explained, though not exhaustively, how the code is converted, and outlined a possible implementation for visiting if-statements.