2020-02-18

Data-structures That Are Actually Useful

Not exhaustive, but should cover all the important ones. I'm gonna start simple. I'll use C/C++ examples mostly, but you should find the majority of these in most languages. Of course, in scripting languages sometimes they're you're missing some or whatever, but if you cared about performance you wouldn't be using a scripting language so it balances out in the end I guess.

I honestly cannot remember the last time I actually needed anything that wasn't either one of the below, or a thin layer on top of one of them...

Unless you're actually benchmarking your code, please don't use anything more complicated than these and tell me about "asymptotic run-time".

Static Array

By "static" I mean "the number of elements is known statically (at compile time that is)" and by "array" I mean "the elements are contiguous in memory". This gives it great cache locality, something that it shares with its dynamic cousin below.

The fact that its size is known statically is a bit of a restriction, but lets the compiler safely put it on the stack without risking a stack-overflow. This is its one and only advantage over a dynamic array, so if you're using a language where stack v. heap doesn't really matter, don't even bother with this one.

Operations on static arrays are pretty great. Indexing (both set/get) are both constant time.

In C, this looks like: int arr[10];. In C++, you can also do: std::array<int, 10>arr;.

Dynamic Array

The hip, cool version of the static array. Its size isn't known at compile time, but its elements are still stored contiguously in memory. The elements are on the heap, in order to allow it to grow or shrink at runtime. (There's a reason VLAs are considered a dangerous feature by C programmers; and if C programmers say something is dangerous you know you should probably listen.)

This should probably be your go-to for storing elements of the same type in some order.

Operations are the same as for its stack-allocated cousin the static array. Also should support appending in amortized constant time. Note that this means dynamic arrays can double as a stack (the push/pop kind) quite easily. Inserting and deleting are both linear, but quite fast cause of the cache locality.

In C++: std::vector<int> v;.

Hashtable

Sometimes, you want to be able to index in constant time, but your keys just aren't integers. In those cases, let me introduce you to the hashtable.

You can index based on keys of arbitrary types, not just integers. Lookups, inserts, and deletes are amortized constant time.

I hear Google has a nice version of a hashtable in C++...

Hashset

Just like a hashtable, but with no value for each key (hence taking half the space, but at the cost of not being able to do more than check for containment, rather than lookup based on a key. Runtime complexity should be the same as for a hashtable.

Tuple

Basically, an anonymous C struct (in a statically typed language). In a dynamically typed language this is just the same as a static array. Fixed number of elements, decided at compile time. Each can have a different type. Elements should be stored contiguously (or as close as possible when considering alignment) in memory.

Getting and setting are both constant time of course. There's no concept of deleting or inserting extra elements.

C++ gives you one: std::tuple<int, double, std::string>t(1, 3.14, "hello world");. This one contains an int, a double, and a C++ style string.

What About Other Data-structures?

Yes, they obviously have their uses otherwise they wouldn't exist, but they're just not useful enough to (a) deserve special syntax or (b) get used all that often. Obviously in certain special cases you just must have some specialized data-structure, so you should be aware that other data-structures exist, but that's about it...

P.S. If you're reading this and thinking of hiring me, don't be a dick and take this as a challenge to quiz me on the most obscure data-structure you can find.