I built a TypeScript to C transpiler in 9 days

Aug 13 2024

#programming #languages #transpilers #compilers #c #typescript

TLDR: I built a transpiler that converts TypeScript code into C. I made some interesting design choices. The project doesn’t implement JavaScript’s standard library or object-oriented programming and instead focuses on writing C-like code using TypeScript syntax. In this blog post I explain the implementation process, challenges faced, and lessons learned during the making of this project.

The code is available on GitHub.

Over the past few days I’ve been working on a little program which transpiles JavaScript (well, TypeScript) code into C. This is no ordinary transpiler, as the user has to adapt their codebase to use C practices, and this is by design. I have made a lot of important design choices in this project which I am proud of, because they shape the project to be very unique. Some of the important design choices I made are:

Not using JavaScript’s standard library — so not implementing the global console object, the forEach method on collections, etc.
Not implementing OOP or objects.

These design choices made the project’s goal clearer: to be able to write C with the syntax of TypeScript. Obviously, there’s no particular reason that anyone would want to do this, but it’s a fun experiment. I will talk about the specifics of how I implemented the transpiler later on. For now, I should just explain how to write C-compatible TypeScript code.

I mentioned the fact that I don’t expose a console object, so how does one print to the screen if we can’t use console.log? the answer is printf!

import { printf } from "stdio.h";

printf("Hello, C!\n");

But that’s not completely right. A C program doesn’t just guess where to start running code — it runs the main function.

import { printf } from "stdio.h";

export function main(argc: number, argv: string[]): number {
  printf("Hello, C!\n");
  return 0;
}

This obviously compiles to this C program.

#include <stdio.h>

int main(int argc, char** argv) {
  printf("Hello, C!\n");
  return 0;
}

You may notice the import — it imports the printf function from the stdio.h module. When this is transpiled, the items which are imported aren’t really taken into consideration. I still, however, highly suggest that you specify which items you’re using from each “module” (in reality header file) and I think this feature should exist in C as well. The same applies with booleans:

import { true, false } from "stdbool.h";

You may also notice that we only really have one number type: number, which is just converted to int in C. That is because JS/TS uses a single type for all kinds of numbers (it’s a f64), and that’s one downside of transpiling between a high level language and a low level language.

If you’re wondering why I don’t just convert number to a double (which make more sense), that’s because we have to return an int from the main function.

One thing that was fun to implement was pointers. TypeScript obviously won’t allow us to use the & and * characters like we do in C, so I opted for a solution that is more native to TypeScript.

let foo: number = 8;
let bar: Pointer<number> = ptr(foo);

This transpiles to:

int foo = 8;
int *bar = &foo;

ptr is currently the only builtin function. Obviously, it has no TypeScript definition. You might think that its type signature would be something like this:

declare function ptr<T>(value: T): Pointer<T>;

This makes sense in terms of usage, but it’s not very logical since JavaScript passes such simple arguments by value and not by reference, so if we passed a variable through this function multiple times, it would return different a pointer each time.

Since pointers work fine, I also implemented arrays. Well, sort of. I didn’t really know how to implement fixed-size arrays like this:

int arr[4];

But I did get this syntax working:

let primes: number[] = [2, 3, 5, 7, 11, 13];

which transpiles into:

int primes[] = {2, 3, 5, 7, 11, 13};

You can even use sizeof like so:

import { printf } from "stdio.h";

export function main(): number {
  let primes: number[] = [2, 3, 5, 7, 11, 13];
  let primes_len: number = sizeof(primes) / sizeof(primes[0]);

  printf("primes_len = %d\n", primes_len);

  return 0;
}

In C I’d usually opt in for dividing by sizeof(int) instead of sizeof(primes[0]), but it doesn’t really make a lot of sense here since we don’t really have the type int in TypeScript. It does work however:

sizeof(int) // => 4

It would make more sense for this to work:

sizeof(number)

But it doesn’t. I really don’t know how it works in C, but on my transpiler, types are not values, so they can’t be passed as arguments into functions.

Well, that’s partially false. The above code does work, but the transpiler interprets it as using sizeof on a variable called number instead of the type number — so it doesn’t convert the type into int.

Other than that, I’ve implemented the rest of the basic language features: while loops, for loops (only the c-like ones though — I couldn’t figure out how to do for..of and for..in)

let i: number = 0;

while(i < 99) {
  printf("i = %d\n", i);
  i++;
}

for(let i: number = 0; i < 99; i++) {
  printf("i = %d\n", i);
}

is transpiled into:

int i = 0;
while (i < 99) {
  printf("i = %d\n", i);
  i = i + 1;
}
for (int i = 0; i < 99; i = i + 1) {
  printf("i = %d\n", i);
}

I did have some trouble implementing the for loops, primarily because of the semicolons. I’ll elaborate on that.

Here is my for loop struct:

pub struct ForStatement {
    pub init: Box<Statement>,
    pub test: Expression,
    pub update: Box<Statement>,

    pub body: Box<Statement>,
}

You can see it primarily contains three fields (we won’t really talk about the body, that’s not very relevant to my issue.)

Just to make sure we’re on the same page:

init is the first part of the for loop where you can initialize or declare a variable. It’s a statement.
test is the second part, after the first semicolon, which tests whether we should continue with the loop. It’s an expression.
update is the last part after the second semicolon which updates the counter, usually the same variable declared by init. It’s a statement.

In the following case:

for(int i = 0; i < 5; i++) {
    ...
}

int i = 0 is init
i < 5 is test
i++ is update

In my transpiler, I’d always write a semicolon after statements, because before I implemented for loops, they only appeared in places where you’d need a semicolon — so, in their own lines. When implementing for loops I realised that statements don’t only appear in their own lines.

I think it would be perfectly logical to terminate all statements with a semicolon. This looks normal to me:

for(int i = 0; i < 99; i++;) {
    ...
}

Yet C doesn’t allow it.

Since I believe this is an inconsistency in C’s design, I opted in for an inconsistency on my transpiler:

statement.update.trim_end_matches(";");

(I really couldn’t figure out a better way.)

Before I dive into the implementation details, here’s bubble sort:

export function bubbleSort(arr: number[], n: number): void {
  let i: number;
  let j: number;
  let temp: number;

  for (i = 0; i < n - 1; i++) {
    for (j = 0; j < n - i - 1; j++) {
      if (arr[j] > arr[j + 1]) {
        temp = arr[j];
        arr[j] = arr[j + 1];
        arr[j + 1] = temp;
      }
    }
  }
}

transpiles into:

void bubbleSort(int arr[], int n) {
  int i;
  int j;
  int temp;
  for (i = 0; i < n - 1; i = i + 1) {
    for (j = 0; j < n - i - 1; j = j + 1) {
      if (arr[j] > arr[j + 1]) {
        temp = arr[j];
        arr[j] = arr[j + 1];
        arr[j + 1] = temp;
      }
    }
  }
}

Implementation

My first implementation was in TypeScript itself, since I thought I’d have better support. I used swc’s parser’s bindings for TypeScript but I realized I just didn’t like the workflow, so I opted in for the language that I’ve been using the most recently, which is also the language that has the best support for SWC’s APIs, which is Rust.

I should first mention that I created my own IR, which is basically a dumbed down AST which is derived from SWC’s AST. It would probably be faster if I only had one step (converting SWC’s AST into C) instead of two steps (1) converting SWC’s AST into my IR, 2) converting IR to C) but it’s still too fast to complain.

This is what my IR looks like:

pub struct Program {
    pub imports: Vec<Import>,
    pub methods: Vec<Method>,
}

pub struct Method {
    pub name: String,
    pub return_type: Type,
    pub parameters: Vec<MethodParameter>,
    pub body: Vec<Statement>,
}

pub enum Statement {
    VariableDeclaration(VariableDeclaration),
    If(IfStatement),
    While(WhileStatement),
    For(ForStatement),
    Return(ReturnStatement),
    Expression(ExpressionStatement),
    Block(BlockStatement),
}

Maybe not the cleanest, but anyway.

I used SWC’s Visit API. I don’t fully understand how it works, or how the Visitor Pattern works, but I got around to using it.

I implemented a Visitor struct implementing its Visit trait. The struct contains some state:

pub struct Visitor {
    pub program: Program,

    pub current_function_name: Option<String>,
    pub current_function_params: Option<Vec<MethodParameter>>,
    pub current_function_return_type: Option<Type>,
    pub current_function_body: Option<Vec<Statement>>,
}

It implements these functions:

fn visit_import_decl(&mut self, node: &swc_ecma_ast::ImportDecl) {
    ...
}

fn visit_fn_decl(&mut self, node: &swc_ecma_ast::FnDecl) {
    ...
}

fn visit_function(&mut self, node: &swc_ecma_ast::Function) {
    ...
}

fn visit_param(&mut self, node: &swc_ecma_ast::Param) {
    ...
}

fn visit_stmt(&mut self, node: &swc_ecma_ast::Stmt) {
    ...
}

I won’t go into detail as to what they do specifically, but basically, they recursively visit the AST nodes, and convert them into my IR. I call this step “parsing”. Not sure if it’s the right word but that’s what I came up with.

I came up with a trait called ToIR

pub trait ToIR<T> {
    fn to_ir(&self) -> Result<T>;
}

Obviously I could’ve just used From but I think this organizes the code better.

I went through a few phases of code quality, but the one I ended up with is this structure:

├── expr
│   ├── array.rs
│   ├── call.rs
│   ├── mod.rs
│   └── ...
├── statement
│   ├── return_s.rs
│   ├── while_s.rs
│   ├── mod.rs
│   └── ...
├── types
│   ├── array.rs
│   ├── mod.rs
│   ├── primitive.rs
│   └── ...
└── ...

A “parser” (an implementation of ToIR) looks like this:

def_parser!(CallExpr, Expression, |expr| {
    let callee = *expr.callee.as_expr().unwrap().clone();

    let name = match callee {
        Expr::Ident(ident) => ident.sym.to_string(),

        other => bail!("non-supported callee kind: {:?}", other),
    };

    let arguments: Vec<Expression> = expr
        .args
        .iter()
        .map(|expr| expr.expr.to_ir())
        .map(Result::unwrap)
        .collect();

    Ok(Expression::MethodCall(MethodCall {
        name, arguments
    }))
});

Yes, I know, the error handling is nasty. I am not proud of the code quality. I will slowly start to refactor it now that the project is for the most part functional.

The important part is that I implemented a macro for implementing parsers, and it greatly helped remove boilerplate.

I did the same with codegens! I have almost the same structure for code generation: a ToC struct and a def_codegen! macro.

def_codegen!(Import, |import| {
    let mut buffer = CodeBuffer::default();

    buffer.write("#include <");
    buffer.write(import.module.as_str());
    buffer.write(">");

    Ok(buffer)
});

You will notice the CodeBuffer struct. This is my poor attempt at making some sort of StringBuilder equivalent for Rust. It’s not fully needed but I think it’s a pretty clean way to write codegens. This is roughly what it looks like:

pub struct CodeBuffer {
    lines: Vec<String>,
}

impl CodeBuffer {
    pub fn write<S: Into<String>>(&mut self, content: S) {
        let content = content.into();

        if let Some(last) = self.lines.last_mut() {
            last.push_str(&content);
        } else {
            self.write_line(content);
        }
    }

    pub fn write_line<S: Into<String>>(&mut self, content: S) {
        let content = content.into();

        self.lines.push(content);
    }

    pub fn concat(&mut self, other: &CodeBuffer) {
        self.lines.extend(other.lines.clone());
    }
}

I found the concat method useful for combining CodeBuffers. I look at them as some sort of combinators, because I implement ToC for tiny units of code, and then concatenate them like this:

def_codegen!(Expression, |expr| {
    let mut buffer = CodeBuffer::default();

    match &expr {
        Expression::Literal(literal) => return literal.to_c(),
        Expression::Variable(variable) => buffer.write(variable),
        Expression::Paren(expr) => {
            buffer.write("(");
            buffer.write(expr.0.to_c()?);
            buffer.write(")");
        }
        Self::MemberAccess(access) => {
            buffer.write(access.object.to_c()?);
            buffer.write("[");
            buffer.write(access.index.to_c()?);
            buffer.write("]");
        }
        ...
    }

    Ok(buffer);
});

Recap

To recap, the code goes through this process:

parsed by SWC, converting it to an AST
AST is walked, creating IR AST
IR AST is walked, converting it into C

PS: The smallest part of the codebase is the codegen, so It’d be super easy to change the language it’s transpiled to.

Benchmarks

I really didn’t focus on performance while making this, but here’s the benchmarks of me running it on the following program:

import { printf } from "stdio.h";

export function bubbleSort(arr: number[], n: number): void {
  let i: number;
  let j: number;
  let temp: number;

  for (i = 0; i < n - 1; i++) {
    for (j = 0; j < n - i - 1; j++) {
      if (arr[j] > arr[j + 1]) {
        temp = arr[j];
        arr[j] = arr[j + 1];
        arr[j + 1] = temp;
      }
    }
  }
}

export function printArray(arr: number[], n: number): void {
  let i: number;
  for (i = 0; i < n; i++) {
    printf("%d ", arr[i]);
  }
  printf("\n");
}

export function main(): number {
  const arr: number[] = [64, 34, 25, 12, 22, 11, 90];
  const n: number = 7;

  printf("Original array: ");
  printArray(arr, n);

  bubbleSort(arr, n);

  printf("Sorted array: ");
  printArray(arr, n);

  return 0;
}

(release build, running on M1 Mac)

> hyperfine -N 'target/release/type-c-rs examples/bubble_sort.ts' --warmup 1000
Benchmark 1: target/release/type-c-rs examples/bubble_sort.ts
  Time (mean ± σ):       1.3 ms ±   0.1 ms    [User: 0.5 ms, System: 0.4 ms]
  Range (min … max):     1.2 ms …   2.1 ms    2258 runs