In my previous article, I talked about how I like Odin, a systems programming language that ships with neither garbage collection nor a borrow checker. Like C, it gives you great control over memory, and it does not prevent you from dealing with memory however you want. 1
This, for me, is a good thing. I want to get into systems programming because there are some things that cannot be done well except in a systems language.
However, the most prevalent opinion on the internet seems to be something along the following lines: “This is dangerous! Handling memory is fraught with security problems! Even the most experienced programmers leave security holes behind! Stay away from languages that are not memory-safe!”.
I fundamentally disagree with this stance.
It’s true that a mistake at the low level can bring entire systems down. But this is not a reason to shy away from the endeavour.
For example, if you build a data persistence and retrieval system (aka a database) with a bad design or a naive architecture, then an error in writing a few bytes to a file could corrupt the entire database.
Note that such errors have nothing to do with managing memory. They can happen in a garbage-collected language such as Go, yet they would be no less severe.
So the risk is high: if you don’t know what you are doing, costly mistakes are easy to make.
But the reward is also high: if you know what you are doing, and you do it right, you can create something orders of magnitude better than everything else that existed before.
Of course, you can’t get there using the same techniques and mentality you use when you program a website with React and TypeScript. That’s part of the challenge, or the thrill: to learn new ways of programming that work for low level systems.
I can’t claim that I know what I’m doing: the whole point is I want to learn, and the only way to learn properly is to learn by doing.
Safety in “memory safe” languages
There is a huge class of security problems that have nothing to do with memory safety, so they can and do happen often in all the “memory safe” languages.
A web application server, if not programmed properly, can have bugs where guests gain admin-level access.
Programs like “VS Code” and “npm” have had security problems related to automatically executing shell commands from external repositories without the knowledge of the user. I believe git has had similar issues in the past. Again, none of this is related to memory safety.
Many websites (especially those implemented in PHP) used to suffer from SQL injection attacks (some still do).
For some reason, the people who insist that it’s unethical to write production code in C, continue to write web applications in interpreted languages, and continue to use SQL database engines for data persistence and retrieval.2
Recently it was revealed that a very popular logging library for Java (log4j) had a vulnerability that allowed arbitrary code execution, and it went largely unnoticed for years.
Out of all things, a logging library is the last place one would expect to find a security vulnerability; and yet there it was! In a popular open source library, written in a memory safe language.
Pause for a second to notice how bad this is. If you had made the decision to use Java because it was “memory safe”, and made the decision to use log4j because it’s “popular” and “open source”, and you gleefully believed that “with enough eyeballs, all bugs are shallow”, you would have been in for a huge surprise around the end of 2021 when the vulnerability was announced.
The most popular “programming” paradigm today, in 2022, is to depend on hundreds (if not thousands) of packages written by other people. Packages that you have not looked much into, nor vetted for security concerns! All without having the least bit of worry about what security problems are lying in there waiting to be exploited.3 The vast majority of these libraries are licensed in such a way that their authors have zero responsibility for any problem their code might cause, whether it’s to you or to the users of your product.
So when people bring up “memory safety” issues in the context of systems programming, I don’t believe it. I think they are scared of memory. They have never learned how to do it, and/or were taught to be scared of it, either by their college professors or by other people on the internet.
Now, I myself haven’t learned how to do it properly, but that’s exactly why I want to get into systems programming: to learn how to do it.
C’s biggest mistake
More than 10 years ago, the creator of the D programming language, Walter Bright, published an article titled “C's Biggest Mistake”. In it, he argued that the vast majority of security issues in C stem from one fundamental mistake: the conflation of arrays and pointers. In other words, the lack of a builtin type that stores both the address and the length together.
Buffer overrun
A “buffer overrun” happens when a program writes data to a buffer that it doesn’t know the size of, and can’t properly check that it’s writing the data within the space allowed for the buffer.
One of the “standard” functions in C is strcpy. This function copies one string to another by taking only the pointers to the beginnings of the input and output strings. Obviously, it has no way to tell where either of them actually ends, so it keeps copying from source to target until it hits a zero byte in the source buffer. It’s the caller’s responsibility to ensure that enough space is allocated for the target buffer.
The standard also provides a bounded alternative: strncpy. This one takes an additional argument to specify how many bytes to copy. You are supposed to pass the size of the target buffer. This function will not “overrun” the target buffer, but it can leave it in a state where it is not properly null-terminated. If you then pass it around to other functions as if it were a null-terminated string, problems will follow.
The problem is not limited to string functions. The “standard” memcpy suffers from a similar problem. It takes two pointers and one size. You need to manually specify the size such that:
It is not larger than the target buffer
It is not larger than the source buffer
(optional) Fits some condition in your program
It is fairly easy to make a mistake here if you are not careful.
Slices solve buffer overrun
When you have slices, you almost never need to copy memory using raw pointers and memcpy.4
Odin has a builtin slice type and a builtin copy function that takes two slices as arguments: target and source. Since the lengths of both buffers are known, there’s no risk of buffer overrun.
buffer: [1024]u8
copy(buffer[:], source)
Arrays and dynamic arrays can be converted to a slice with the slicing [:] syntax.
The slicing syntax can also be used to selectively copy only a portion of the source buffer (or to a portion of the dest buffer).
copy(buffer[start:], source[:limit])
If you’re worried about confusing argument order, you can specify parameter names:
copy(dst=buffer, src=source)
Use After Free
This one is difficult to solve at the language level without a huge mental overhead to the programmer (aka “ownership semantics”), which I’m not a fan of, because it effectively makes the language not a systems language.5
However, this is where the techniques and mindset come into play.
If your model of programming revolves around creating lots of tiny objects that reference each other but have potentially unrelated lifetimes, then you can run into this problem, and it may be very difficult to solve.
The first angle of attack is to think of objects in terms of groups that have the same lifetime. Objects within this group can safely point to each other. As long as no external object points to any of them, then there’s nothing to worry about.
For example, if you are programming a web server, then each request will need to allocate memory while it’s being processed, but once the response is sent, all the memory can go away at once, and nothing outside the request would have any pointer to any of this memory. Arena allocators are perfect here.
If you do have objects that reference each other but have unrelated lifetimes, the consensus among experienced programmers seems to be that you would use opaque handles instead of pointers, where the handle is an index into an array, with a generation number embedded into it. When you need to dereference the handle, you call a special function to convert it to a pointer (that you would not keep around). If it returns nil, then you know the object is no longer valid. If you forget to check for nil, you crash the program instead of exposing a vulnerability through use-after-free.
Here is an article6 on the subject7 written by someone with some experience:
Handles are the better pointers
Architecture, not carefulness
The prevailing internet stance on manual memory management is that to do it properly you have to be very careful about matching every “malloc” with a “free”. (With the underlying assumption being that you would have a lot of them).
This is the wrong approach. You want instead to architect things so the problem mostly takes care of itself. Just like we saw with arenas and handles.
This is not only for memory management; it applies to everything.
How do you build a database system where you don’t lose everything due to a few erroneous bytes? Architecture.
This is not to say that you don’t have to be careful; you still have to be. But no amount of carefulness will be enough to avoid problems caused by bad architecture.
For me, the endeavour into systems programming is about architecture. How do you architect a web server to handle as many concurrent requests as possible with few resources? How do you architect a database for high reliability, high throughput, and low latency? How do you architect a UI system with a high frame rate, no jank, and low power consumption? These are all interesting problems in the systems space that I believe are still open for innovative solutions, and I want in on some of the fun.
Addendum (March 2023)
One point I didn’t directly address in this article was summed up by this comment someone had left on the orange website:
[You are] ignoring the nice security equation,
Σ exploits = Σ memory_corruption + Σ logic_errors
Having Σ memory_corruption ==> 0 is of course much welcomed outcome, even if Σ logic_errors > 0.
This is meant as a rebuttal to my points above about (1) security errors still occurring in memory-safe languages, and (2) logic errors mattering a great deal in systems programming.
Now, the problem with the reasoning (and sentiment) behind this comment is that it assumes memory errors and logic errors are orthogonal.
My assertion is that they are not: if your code is likely to contain logic errors, it’s also likely to contain memory errors, because both classes of error have the same roots: bad architecture, improper technique, and lack of a comprehensive testing framework.
1. It does require you to be explicit about your intentions. For example, it will not implicitly demote an array to a raw pointer, but if you want to get the raw pointer to an array, it will not stop you from doing so.
2. If the data persistence and retrieval system had a regular API, then SQL would not be needed. You would just write regular code to persist and query data. In terms of security, having SQL is really bad.
3. To be completely honest, I’m also guilty of this to some extent, although I do try to minimize - as much as possible - the number (and size) of dependencies.
4. Slices also eliminate the need for pointer arithmetic.
5. If the programmer can’t deal with memory as he wants, it’s not a systems language. Rust has an ‘unsafe’ mode that lets you deal with memory without restrictions.
6. It seems to be a well-known technique going back as far as the early 2000s or even the 1990s, but I didn’t know where to look for resources about it, so that article is the only thing I have at the moment.
7. It seems like this technique is gaining some popularity among Rust programmers! If you have an hour or so to spare, this can be fun to listen to: Rant on entity systems and the borrow checker.