2020-05-18

"Code Generation" in C (or: The Preprocessor is Actually Pretty Great)

A few days ago, I wanted two different versions a resizing array in C. One for int, one for char. Of course, you can't do this is C. C doesn't have templates, doesn't have generics, doesn't have anything like that. The only thing C does have is macros. So, I set to work.

Something "Close" to Generics in C

First, let's write a simple version of a resizing array for a single type, T (I haven't bothered making this implementation super efficient, since this is just for illustrative purposes). The following goes in vect.c:


#define VECT_INIT_CAP 8
#define T int


typedef struct Vect {
        size_t count, cap;
        T *items;
} Vect;


Vect v_init(void) {
        return {
                0, VECT_INIT_CAP, malloc(VECT_INIT_CAP * sizeof(T))
        };
};


void v_cleanup(Vect *v) {
        free(v->items);
}


void v_push(Vect *v, T t) {
        if (v->count == v->cap) {
                v->cap *= 2;
                v->items = realloc(v->items, v->cap * sizeof(T));
        }
        v->items[v->count++] = t;
}


T v_pop(Vect *v) {
        return v->items[--v->count];
}


T v_get(Vect *v, size_t idx) {
        return v->items[idx];
}

Looks good so far. This will work for a single type T (in this case a typedef of int).

There's a few problems we need to solve in order to get this working for whatever types we need. First, we need a unique name for function, since C doesn't allow overloading. (So, we can't have both v_push(vect(char), char); and v_push(vect(int), int);, since both have the same name (v_push), and C only allows one function with each name.

So, the first step to making this properly generic is to generate a unique name for the struct and functions, based on T.

We can do this with a little bit of preprocessor magic:


#define Vect(T) v_##T
#define CAT(A,B) A##B
#define CAT_EXPAND(A, B) CAT(A,B)
#define VECT_FUN_NAME(T, F) CAT_EXPAND(Vect(T), F)
#define v_init(T) VECT_FUN_NAME(T, _init)
// ... and so on for the rest of the functions.

We also replace Vect and the names of the functions with Vect(T) and the appropriate macros for each function.

So the signature for our push function is now void v_push(T)(Vect(T) *, T);, for example.

Let's break down what's happening with v_push(T), since it involves a lot of indirection with macros.

The preprocessor will do the following replacements (supposing we substitute have T defined as int, as above):

v_push(T) -> VECT_FUN_NAME(int, _push)
VECT_FUN_NAME(int, _push) -> CAT_EXPAND(Vect(int), _push)
CAT_EXPAND(Vect(int), _push) -> CAT(v_int, _push)
CAT(v_int, _push) -> v_int_push

The reason we need so many levels of indirection here is because of how the preprocessor expands things out. ## will paste the macro arguments literally, so without the extra indirection, CAT would try to paste Vect(int) and _push literally, without expanding them, which is an error.

Long story short, we've not got unique names for our vectors and related functions, based on T. (There are some obvious limitations: T has to be a single identifier for this to work. We can get around this by typedefing more complex types, if we need T to be anything more complicated.)

The last step to this is generating a unique version of our vector for each type we want.

Fortunately, this is pretty easy. We wrap our implementation in #define VECT_INSTANTIATE(T), rather than having T explicitly defined. Then, in order to get vectors of type int, we can just add VECT_INSTANTIATE(int) to the bottom of vect.c.

The last piece is the header. We add a similar macro to VECT_INSTANTIATE (which I'll call VECT_DECL), which just declares the methods we want.

A full implementation can be found here, on my GitHub.

How Efficient is This?

Since this directly generates a specific version of our vector for each type we need, there's 0 runtime overhead to doing this. There's also no code-bloat, since we get exactly one definition of each version of the vector for each type we need. (This is possibly better than C++, which could instantiate a template multiple times for the same type, but in practice I haven't found this to be a problem with C++.)

In fact, the only "overhead" we have compared with C++ is having to write VECT_DECL(T) and VECT_INSTANTIATE(T) for each type we need, and having to write the type of the functions along with each invocation. (This second one could be avoided if you want to add a few extra macros, but then it becomes a bit more of a pain.)

Would You Actually Use This?

In its current form, probably not. It's mostly a POC to show what kind of stuff is possible in C using the preprocessor.

"But I Can Do This in Language X using Code-Generator Y"

Yup, I'm sure you can. And now you've got an extra dependency for your project that you have to deal with, while I've used the built-in code-generator that every C implementation comes with. I'm happy for you though.