HCR EP 003: Data persistence API

2024.12.09

Dec 10, 2024

This is Episode 3 of the HandCraftedForum series, where I document the creation of a forum system using a mini web framework.

In the previous episode I walked you through setting up an empty project and creating a simple RPC on the server that the client can call, as well as basic UI interaction and state management.

In this episode I will show you how we do data persistence by continuing the basic auth implementation.

The purpose of this exercise is to demonstrate the basic structure of applications written in this framework. Once we are done with this demonstration, we will begin building out the project at a faster pace, and I will not necessarily show every step in full detail, as writing about everything would take considerable effort and slow down the project work itself.

So in these first few episodes, I want to give you the tools to understand the basic way we do things, so that you can follow along as we proceed further.

Native data persistence, indexing, and querying

Unlike most web frameworks, we are not going to use an external database server that we connect to; instead, our database engine is a library embedded into the program itself. Sort of like SQLite, but not.

Also unlike most web frameworks, our data model is not relational, and our querying model is not SQL. Instead, we just call regular functions to persist data, persist indexing information, and load/query it.

In the SQL model, you construct commands in a textual query language and send them to the storage engine. Even with SQLite, that's what you do. The SQL engine then parses the query, turns it into an execution plan, and executes it. SQLite, for example, compiles each statement into bytecode that runs on its own internal virtual machine.

Our data persistence model is completely different. We use a B-Tree based storage engine (BoltDB) to persist data in buckets, indexes, and collections.

You can think of a bucket as a persisted map: you put in the object Id, you get back a copy of the object.
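To make the "persisted map" idea concrete, here is a minimal in-memory sketch. It is not vbolt: `MemBucket`, `Store`, `Load`, and `Delete` are hypothetical names, and `encoding/json` stands in for VPack. Because values are kept only in serialized form, every load decodes a fresh copy, so mutating a loaded object does not change what is stored.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MemBucket is a hypothetical in-memory stand-in for a persisted bucket.
// Values are kept only in serialized form, the way a disk-backed store keeps them.
type MemBucket[K comparable, V any] struct {
	data map[K][]byte
}

func NewMemBucket[K comparable, V any]() *MemBucket[K, V] {
	return &MemBucket[K, V]{data: make(map[K][]byte)}
}

// Store serializes the value and saves the bytes under the key.
func (b *MemBucket[K, V]) Store(key K, value V) {
	encoded, err := json.Marshal(value)
	if err != nil {
		panic(err) // panic rather than return an error
	}
	b.data[key] = encoded
}

// Load decodes a fresh copy of the stored value; ok is false if the key is absent.
func (b *MemBucket[K, V]) Load(key K) (value V, ok bool) {
	encoded, found := b.data[key]
	if !found {
		return value, false
	}
	if err := json.Unmarshal(encoded, &value); err != nil {
		panic(err)
	}
	return value, true
}

// Delete removes the key, if present.
func (b *MemBucket[K, V]) Delete(key K) {
	delete(b.data, key)
}

type User struct {
	Id       int
	Username string
}

func main() {
	users := NewMemBucket[int, User]()
	users.Store(1, User{Id: 1, Username: "alice"})

	u, _ := users.Load(1)
	u.Username = "mallory" // modifies only the local copy

	again, _ := users.Load(1)
	fmt.Println(again.Username) // prints "alice"
}
```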

An index is like a persisted bidirectional multi-map, but with a sorting key, and the ability to set all the search terms (keys) for a particular target.
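One way to picture an index is as two maps kept in sync: target → terms, and term → targets with a sort key. The sketch below is hypothetical (`MemIndex`, `SetTargetTerms`, and `TargetsFor` are illustrative names, not vbolt's API), keeps everything in memory, and simplifies to a single sort key per target. Setting the terms for a target replaces whatever terms it had before, which is what keeps both directions consistent.

```go
package main

import (
	"fmt"
	"sort"
)

// entry pairs a target with the sort key it was indexed under.
type entry[T comparable] struct {
	target   T
	priority int
}

// MemIndex is a hypothetical in-memory index from search terms to targets.
type MemIndex[T comparable, K comparable] struct {
	termsOf   map[T][]K       // target -> its current terms
	targetsOf map[K]map[T]int // term -> (target -> sort key)
}

func NewMemIndex[T comparable, K comparable]() *MemIndex[T, K] {
	return &MemIndex[T, K]{termsOf: map[T][]K{}, targetsOf: map[K]map[T]int{}}
}

// SetTargetTerms replaces all terms for target, recording them in both directions.
func (idx *MemIndex[T, K]) SetTargetTerms(target T, terms []K, priority int) {
	// Remove the target from every term it was previously listed under.
	for _, old := range idx.termsOf[target] {
		delete(idx.targetsOf[old], target)
		if len(idx.targetsOf[old]) == 0 {
			delete(idx.targetsOf, old)
		}
	}
	if len(terms) == 0 {
		delete(idx.termsOf, target)
		return
	}
	idx.termsOf[target] = append([]K(nil), terms...)
	for _, term := range terms {
		if idx.targetsOf[term] == nil {
			idx.targetsOf[term] = map[T]int{}
		}
		idx.targetsOf[term][target] = priority
	}
}

// TargetsFor returns the targets matching term, ordered by sort key
// (targets with equal keys come out in unspecified order).
func (idx *MemIndex[T, K]) TargetsFor(term K) []T {
	var entries []entry[T]
	for target, priority := range idx.targetsOf[term] {
		entries = append(entries, entry[T]{target, priority})
	}
	sort.Slice(entries, func(i, j int) bool { return entries[i].priority < entries[j].priority })
	result := make([]T, 0, len(entries))
	for _, e := range entries {
		result = append(result, e.target)
	}
	return result
}

func main() {
	// Post ids (targets) indexed by tag (terms), sorted by a timestamp-like key.
	byTag := NewMemIndex[int, string]()
	byTag.SetTargetTerms(101, []string{"go", "db"}, 3)
	byTag.SetTargetTerms(102, []string{"go"}, 1)
	fmt.Println(byTag.TargetsFor("go")) // [102 101]

	byTag.SetTargetTerms(101, []string{"db"}, 3) // 101 is no longer tagged "go"
	fmt.Println(byTag.TargetsFor("go")) // [102]
}
```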

A collection is a way to group a list of keys under a parent key, with an ordering key.
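A collection can be pictured as a parent key that owns a list of child keys, each with an order value. In this hypothetical in-memory sketch (`MemCollection`, `Add`, `Remove`, and `List` are illustrative names, not vbolt's API), re-adding an existing key simply updates its order, and listing returns the keys sorted by that order.

```go
package main

import (
	"fmt"
	"sort"
)

// MemCollection is a hypothetical in-memory collection: parent -> (key -> order).
type MemCollection[P comparable, K comparable] struct {
	items map[P]map[K]int
}

func NewMemCollection[P comparable, K comparable]() *MemCollection[P, K] {
	return &MemCollection[P, K]{items: map[P]map[K]int{}}
}

// Add puts key under parent with the given order, updating it if already present.
func (c *MemCollection[P, K]) Add(parent P, key K, order int) {
	if c.items[parent] == nil {
		c.items[parent] = map[K]int{}
	}
	c.items[parent][key] = order
}

// Remove drops key from parent's list (a no-op if it is not there).
func (c *MemCollection[P, K]) Remove(parent P, key K) {
	delete(c.items[parent], key)
}

// List returns parent's keys sorted by order (ties come out in unspecified order).
func (c *MemCollection[P, K]) List(parent P) []K {
	members := c.items[parent]
	keys := make([]K, 0, len(members))
	for k := range members {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return members[keys[i]] < members[keys[j]] })
	return keys
}

func main() {
	// Post ids (keys) grouped under a thread id (parent), ordered by a timestamp.
	threadPosts := NewMemCollection[int, int]()
	threadPosts.Add(1, 500, 30)
	threadPosts.Add(1, 501, 10)
	threadPosts.Add(1, 502, 20)
	threadPosts.Remove(1, 502)
	fmt.Println(threadPosts.List(1)) // [501 500]
}
```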

We have regular functions to interact with these storage blocks:

Buckets: Store/Load/Delete items by id
Collections: Add/Remove (key, order) by id
Indexes: Set target terms, and iterate matching targets for a search term.

A few other utility functions exist, but the above constitutes the core of our persistence API, and it suffices for almost everything you want to do in a web application.

To allow data to be persisted, it needs to be serialized. We define a serialization function using VPack.

This is what a typical serialization function would look like:

type User struct {
    Id       int
    Username string
    Email    string
    IsAdmin  bool
}

func PackUser(self *User, buf *vpack.Buffer) {
    vpack.Version(1, buf)
    vpack.Int(&self.Id, buf)
    vpack.String(&self.Username, buf)
    vpack.String(&self.Email, buf)
    vpack.Bool(&self.IsAdmin, buf)
}

The PackUser function is used for both serialization and deserialization. You do not need to define these two separately.
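How can one function go in both directions? A common technique is for the buffer to carry a flag saying which way it is going, and for each field helper to take a pointer: when writing, it reads from the pointer; when reading, it writes into it. The sketch below illustrates that idea with hypothetical `Buffer`, `Int`, `String`, and `Bool` helpers built on `encoding/binary`. It is an assumption about the general technique, not VPack's actual implementation or wire format, and it omits the version marker.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Buffer is a hypothetical serialization buffer that knows its direction.
type Buffer struct {
	Writing bool
	Data    []byte
	Pos     int
}

// Int writes *n when serializing, or fills *n when deserializing (zig-zag varint).
func Int(n *int, buf *Buffer) {
	if buf.Writing {
		buf.Data = binary.AppendVarint(buf.Data, int64(*n))
		return
	}
	v, size := binary.Varint(buf.Data[buf.Pos:])
	if size <= 0 {
		panic("bad varint")
	}
	*n = int(v)
	buf.Pos += size
}

// String stores a length prefix followed by the raw bytes.
func String(s *string, buf *Buffer) {
	length := len(*s)
	Int(&length, buf)
	if buf.Writing {
		buf.Data = append(buf.Data, *s...)
		return
	}
	*s = string(buf.Data[buf.Pos : buf.Pos+length])
	buf.Pos += length
}

// Bool stores a single byte, 0 or 1.
func Bool(b *bool, buf *Buffer) {
	if buf.Writing {
		var v byte
		if *b {
			v = 1
		}
		buf.Data = append(buf.Data, v)
		return
	}
	*b = buf.Data[buf.Pos] == 1
	buf.Pos++
}

type User struct {
	Id       int
	Username string
	Email    string
	IsAdmin  bool
}

// PackUser is the single function used for both directions.
func PackUser(self *User, buf *Buffer) {
	Int(&self.Id, buf)
	String(&self.Username, buf)
	String(&self.Email, buf)
	Bool(&self.IsAdmin, buf)
}

func main() {
	original := User{Id: 42, Username: "alice", Email: "alice@example.com", IsAdmin: true}

	out := &Buffer{Writing: true}
	PackUser(&original, out) // serialize

	var decoded User
	in := &Buffer{Writing: false, Data: out.Data}
	PackUser(&decoded, in) // deserialize with the same function

	fmt.Println(decoded == original) // true
}
```

The payoff of this design is that the field order is written down exactly once, so the reader and the writer cannot drift apart.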

Now that we have a type and a serialization function, we can define a bucket:

var dbInfo vbolt.Info

var UsersBkt = vbolt.Bucket(&dbInfo, "users", vpack.FInt, PackUser)

The first argument is an object that collects information about the database file: the list of buckets, indexes, and collections. This information will be used to initialize the database. We will see how in a bit.

The second argument is a name for the bucket. You will almost never use it directly, but we still give it a short, meaningful name. The only actual requirement is that it be unique.

The third and fourth arguments define the types of the "key" and "value" objects, along with their serialization functions. vpack.FInt is the serialization function for integers that uses a fixed width (8 bytes), as opposed to the regular vpack.Int, which uses a variable-length scheme so that small values take fewer bytes.
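The size difference is easy to see with Go's standard `encoding/binary` package. This is only an analogy: `binary.PutVarint` is Go's zig-zag varint, not necessarily the scheme `vpack.Int` uses, and big-endian layout for the fixed-width form is an assumption here. The example also shows one plausible reason fixed width suits keys: BoltDB keeps keys in byte-sorted order, and fixed-width big-endian bytes sort the same way as the (non-negative) numbers, whereas varint bytes do not.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// fixedWidth encodes n in 8 bytes, big-endian (an assumed layout for illustration).
func fixedWidth(n int64) []byte {
	out := make([]byte, 8)
	binary.BigEndian.PutUint64(out, uint64(n))
	return out
}

// variableWidth encodes n as a zig-zag varint: small values take fewer bytes.
func variableWidth(n int64) []byte {
	out := make([]byte, binary.MaxVarintLen64)
	size := binary.PutVarint(out, n)
	return out[:size]
}

func main() {
	for _, n := range []int64{5, 100, 128, 1_000_000} {
		fmt.Printf("%d: fixed=%d bytes, varint=%d bytes\n",
			n, len(fixedWidth(n)), len(variableWidth(n)))
	}

	// Byte-wise comparison, the way a B-tree store orders raw keys.
	fmt.Println(bytes.Compare(fixedWidth(100), fixedWidth(128)))       // -1: matches numeric order
	fmt.Println(bytes.Compare(variableWidth(100), variableWidth(128))) // 1: 100 sorts after 128
}
```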

The Bucket function takes packing functions, not types, as its parameters. The key and value types are derived from the packing functions.
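Go's generic type inference is what makes this work: if a function's parameters are typed as "packing function for K" and "packing function for V", the compiler infers K and V from the functions you pass in. Here is a self-contained sketch of such a signature; `Buffer`, `BucketInfo`, `Bucket`, `packInt`, and `packUser` are hypothetical stand-ins, not vbolt's or vpack's real definitions.

```go
package main

import "fmt"

// Buffer is a placeholder for a serialization buffer.
type Buffer struct{ Data []byte }

// BucketInfo remembers a bucket's name and how to pack its keys and values.
type BucketInfo[K, V any] struct {
	Name      string
	PackKey   func(*K, *Buffer)
	PackValue func(*V, *Buffer)
}

// Bucket is generic over K and V, but callers never write them out:
// Go infers both from the types of the packing functions passed in.
func Bucket[K, V any](name string, packKey func(*K, *Buffer), packValue func(*V, *Buffer)) *BucketInfo[K, V] {
	return &BucketInfo[K, V]{Name: name, PackKey: packKey, PackValue: packValue}
}

type User struct {
	Id       int
	Username string
}

// packInt and packUser are stubs; a real version would encode into buf.
func packInt(n *int, buf *Buffer)   {}
func packUser(u *User, buf *Buffer) {}

func main() {
	users := Bucket("users", packInt, packUser) // K = int, V = User

	// This assignment only type-checks if users is *BucketInfo[int, User].
	var typed *BucketInfo[int, User] = users
	fmt.Println(typed.Name) // users
}
```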

Now, let's say we want to define an RPC to create a user with a username, email, and password.

We need a few more things in addition to the above:

We need a bucket to store the password hash. Note: we do not put the hash in the User struct, because we don't want that data sent to the browser when we read from UsersBkt.

var PasswdBkt = vbolt.Bucket(&dbInfo, "passwd", vpack.FInt, vpack.ByteSlice)

We also need to keep track of usernames that are taken so we can prevent duplicate usernames, and we also want to be able to retrieve the user id for a given user name.

var UsernameBkt = vbolt.Bucket(&dbInfo, "username", vpack.StringZ, vpack.Int)

Now there's an important step once you define a set of buckets: on program startup, we must create them if they don't already exist.

Without this bit of initialization code, any attempt to read or write to buckets would panic at runtime.
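Conceptually, the pattern looks like this: every bucket definition registers its name in the shared info object, and at startup the program walks that list and creates any bucket that is missing. The sketch below models this in memory with hypothetical names (`Info`, `Register`, `DB`, `EnsureBuckets`, `Store`); it is not vbolt's code, and the real step operates on the BoltDB file rather than on Go maps.

```go
package main

import "fmt"

// Info collects the names of every bucket the program defines (hypothetical).
type Info struct {
	BucketNames []string
}

// Register is called once per bucket definition, typically from a package-level var.
func (info *Info) Register(name string) string {
	info.BucketNames = append(info.BucketNames, name)
	return name
}

// DB stands in for the database file: bucket name -> key/value pairs.
type DB struct {
	buckets map[string]map[string][]byte
}

// EnsureBuckets creates every registered bucket that does not exist yet.
func EnsureBuckets(db *DB, info *Info) {
	for _, name := range info.BucketNames {
		if _, exists := db.buckets[name]; !exists {
			db.buckets[name] = map[string][]byte{}
		}
	}
}

// Store panics if the bucket was never created, much like a bad pointer dereference.
func Store(db *DB, bucket, key string, value []byte) {
	b, exists := db.buckets[bucket]
	if !exists {
		panic("bucket does not exist: " + bucket)
	}
	b[key] = value
}

var dbInfo Info
var usersBkt = dbInfo.Register("users")
var passwdBkt = dbInfo.Register("passwd")

func main() {
	db := &DB{buckets: map[string]map[string][]byte{}}
	EnsureBuckets(db, &dbInfo) // run once at startup; safe to repeat
	Store(db, usersBkt, "1", []byte("alice"))
	fmt.Println(len(db.buckets)) // 2
}
```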

Note: we don't return errors for reads or writes to the database; we just panic. We treat a failed read or write like reading or writing through a null or invalid pointer, or outside array bounds.

When such a panic happens inside an RPC, the response handler returns a special "Server Error" response.
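Turning a panic into an error response is typically done with a deferred `recover` in the code that wraps each handler. The sketch below shows the idea with the standard `net/http` package; `safeHandler`, `runRequest`, the demo handlers, the `/rpc/AddUser` path, and the exact response body are assumptions for illustration, not the framework's actual RPC machinery.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// safeHandler recovers from panics in next and answers with a generic server error.
func safeHandler(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if err := recover(); err != nil {
				log.Printf("panic in %s: %v", r.URL.Path, err)
				http.Error(w, "Server Error", http.StatusInternalServerError)
			}
		}()
		next(w, r)
	}
}

// failingHandler simulates an RPC whose storage access panics.
func failingHandler(w http.ResponseWriter, r *http.Request) {
	panic("bucket does not exist")
}

// okHandler simulates an RPC that succeeds.
func okHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "ok")
}

// runRequest sends a test request through h and returns the status and trimmed body.
func runRequest(h http.HandlerFunc) (int, string) {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest("POST", "/rpc/AddUser", nil))
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	fmt.Println(runRequest(safeHandler(failingHandler))) // 500 Server Error
	fmt.Println(runRequest(safeHandler(okHandler)))      // 200 ok
}
```

One caveat of this approach: if a handler has already started writing its response before panicking, the status code can no longer be changed.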

Now we can implement AddUser and ListUsers in terms of the buckets.

Here's a utility function to fetch all the users from UsersBkt (without pagination):

func fetchUsers(tx *vbolt.Tx) (users []User) {
    vbolt.IterateAll(tx, UsersBkt, func(key int, value User) bool {
        generic.Append(&users, value)
        return true
    })
    return
}

We'll use this at the end of both AddUser and ListUsers.

Now, let's take a look at how I implement adding a user. It's a bit difficult to explain everything in a blog format, so I took a screenshot of the code and annotated it.
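As a rough textual companion to that screenshot, here is a self-contained sketch of the flow the buckets above suggest: reject a taken username, allocate an id, store the user, the password hash, and the username-to-id mapping, then return the full user list. Plain Go maps stand in for UsersBkt, PasswdBkt, and UsernameBkt. `memDB`, this `AddUser` signature, the sequential id counter, and the injected `hashPassword` function are all assumptions, not the author's code. Real code should use a slow password-hashing function such as bcrypt, and the writes would presumably happen inside a single write transaction.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

type User struct {
	Id       int
	Username string
	Email    string
	IsAdmin  bool
}

// memDB is a hypothetical in-memory stand-in for the three buckets.
type memDB struct {
	users     map[int]User   // like UsersBkt: id -> user
	passwd    map[int][]byte // like PasswdBkt: id -> password hash
	usernames map[string]int // like UsernameBkt: username -> id
	lastId    int            // assumed sequential id allocation
}

func newMemDB() *memDB {
	return &memDB{users: map[int]User{}, passwd: map[int][]byte{}, usernames: map[string]int{}}
}

var ErrUsernameTaken = errors.New("username already taken")

// AddUser rejects duplicate usernames, writes all three entries, and returns every user.
func (db *memDB) AddUser(username, email, password string, hashPassword func(string) []byte) ([]User, error) {
	if _, taken := db.usernames[username]; taken {
		return nil, ErrUsernameTaken
	}
	db.lastId++
	user := User{Id: db.lastId, Username: username, Email: email}
	db.users[user.Id] = user
	db.passwd[user.Id] = hashPassword(password) // kept out of User on purpose
	db.usernames[username] = user.Id
	return db.fetchUsers(), nil
}

// fetchUsers returns all users, sorted by id so the output is stable.
func (db *memDB) fetchUsers() []User {
	users := make([]User, 0, len(db.users))
	for _, u := range db.users {
		users = append(users, u)
	}
	sort.Slice(users, func(i, j int) bool { return users[i].Id < users[j].Id })
	return users
}

func main() {
	// Demo-only placeholder; real code needs a slow password hash such as bcrypt.
	fakeHash := func(pw string) []byte { return []byte("hashed:" + pw) }

	db := newMemDB()
	db.AddUser("alice", "alice@example.com", "secret", fakeHash)
	users, _ := db.AddUser("bob", "bob@example.com", "hunter2", fakeHash)
	fmt.Println(len(users)) // 2

	_, err := db.AddUser("alice", "other@example.com", "pw", fakeHash)
	fmt.Println(err) // username already taken
}
```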

With a few changes to the frontend code (they should be easy to guess), we now have an interface for adding users.

I hope this serves as a good introduction to my style of programming with a database: it's just a set of building blocks and functions you call to manipulate data in buckets, indexes, and collections. We haven't seen how to use indexes yet, but when the time comes and you see it, hopefully you will not be surprised.

There's a lot to talk about in terms of what the storage layer can do and how to use it. I'll introduce more aspects of it gradually as we move along.

Download the code: EP003.zip

View the code online: HandCraftedForum/tree/EP003

Hasen Judi
