HCF EP 007: Prototyping with imported data

2025.01.16

Jan 17, 2025

Happy New Year, everyone. I hope you had a nice holiday, and may 2025 bring you great achievements in your life.

This is Episode 7 of HandCraftedForum.

In the past few episodes we talked about the project's basic structure and discussed some technical points.

In this episode I want to start prototyping the conversation view.

I would like to try out some ideas that allow us to have threaded conversations while still showing a linear view of all the replies to a post.

Something like Twitter/X, where when you open a post, you see its parents, and its descendants (replies).

X is not very good at showing you all the replies, especially when they are threaded: you have to keep drilling down. Instead, I think we could show all the replies to a post, flattened.

I don't like threaded views like Reddit and Hacker News, because replies to top comments get shown before replies that would otherwise rank higher, whether you are sorting by date or by some kind of score.

However, I do like it when a comment that replies to another comment shows a link to the comment it's replying to. I think Discourse does that. It's a good idea, and we will steal it.

Data Model

To keep track of parent-child relationships between posts and their replies, we'll add a ParentPostId to the Post struct, and we'll use an index to keep track of replies.

Since we have no real data yet, we don't need to migrate anything; we just add a migration process that resets the posts bucket, which lets us change the packing function for Posts without incrementing the version.

To keep track of parent-child relationships, we declare an index from int to int.

This index can be used to get both all the ancestors of a post and all of its descendants, iteratively instead of recursively.

Remember that the index is a two-way multi-map between terms and targets. If each post's terms were just its direct parent id, then looking up a term would give the ids of that post's direct children. Since we store all of a post's ancestors as its terms, looking up a term gives all of its descendants.

To explain a bit further (feel free to skip this paragraph if you already understand):
- The parent id is always stored on the post itself.
- When we first post a comment A without a parent, there's nothing for it in the index.
- When we then post a reply B to A, we store [A] as the terms for B.
- When we post C as a reply to B, we get the list of ancestors of B, which is [A], add B to it, and store [A, B] as the list of ancestors for C.
- When we post D as a reply to C, we get the list of ancestors of C, which is [A, B], add C to it, and store [A, B, C] as the list of ancestors for D.

Storing a list of terms for a target basically adds an entry of the form (term, priority, target) for each of the terms. If we ignore the priority and the ordering, you can think of the index as a table of (term, target) rows: (A, B), (A, C), (B, C), (A, D), (B, D), (C, D).

Think of it as a table that we iterate and filter. Given B, the rows where B is the target give its ancestors ([A]), and the rows where B is the term give its descendants ([C, D]).
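The iterate-and-filter idea can be made concrete with a minimal Go sketch that models the index as a plain slice of (term, target) pairs and ignores priority. The IndexEntry type and the Ancestors/Descendants helpers are hypothetical illustrations, not the vbolt API.

```go
package main

// IndexEntry is one row of the conceptual index table: a (term, target) pair.
// Priority and ordering are ignored in this sketch.
type IndexEntry struct {
	Term   int // an ancestor post id
	Target int // a post that has Term among its ancestors
}

// Ancestors returns every term stored for the given target post,
// in table order. A single pass is enough; no recursion is needed.
func Ancestors(table []IndexEntry, postId int) []int {
	var result []int
	for _, e := range table {
		if e.Target == postId {
			result = append(result, e.Term)
		}
	}
	return result
}

// Descendants returns every target stored under the given term,
// in table order. Again, one pass over the rows is enough.
func Descendants(table []IndexEntry, postId int) []int {
	var result []int
	for _, e := range table {
		if e.Term == postId {
			result = append(result, e.Target)
		}
	}
	return result
}
```

If A, B, C and D are numbered 1 to 4, the table holds (1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4). Ancestors(table, 2) then yields [1], and Descendants(table, 2) yields [3 4].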

So to implement writing to this index, we look at the parent id. If it's set, we retrieve the parent's list of ancestors from the index, add the direct parent id to that list, and write the result as the set of terms for the post id.

populating the post replies index
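The write path described above can be sketched in Go against a hypothetical in-memory stand-in for the index. RepliesIndex, SetTerms, Terms and UpdateReplies are names invented for this sketch, since the real code writes through vbolt. Treating a parent id of 0 as "no parent" is also an assumption.

```go
package main

// RepliesIndex is a hypothetical in-memory stand-in for the post replies
// index: for each target post id it keeps the ordered list of its terms
// (ancestor ids, root first).
type RepliesIndex struct {
	termsByTarget map[int][]int
}

func NewRepliesIndex() *RepliesIndex {
	return &RepliesIndex{termsByTarget: make(map[int][]int)}
}

// SetTerms replaces the list of terms stored for a target.
func (idx *RepliesIndex) SetTerms(target int, terms []int) {
	idx.termsByTarget[target] = terms
}

// Terms returns the terms stored for a target (nil if there are none).
func (idx *RepliesIndex) Terms(target int) []int {
	return idx.termsByTarget[target]
}

// UpdateReplies records a newly saved post in the index.
// Assumption: parentId == 0 means the post has no parent, so nothing is written.
func UpdateReplies(idx *RepliesIndex, postId int, parentId int) {
	if parentId == 0 {
		return
	}
	parentAncestors := idx.Terms(parentId)
	// Copy before appending so the parent's stored slice is never mutated.
	ancestors := make([]int, 0, len(parentAncestors)+1)
	ancestors = append(ancestors, parentAncestors...)
	ancestors = append(ancestors, parentId)
	idx.SetTerms(postId, ancestors)
}
```

Posting A(1), then B(2) replying to A, C(3) replying to B and D(4) replying to C leaves D with the terms [1 2 3], as in the walkthrough.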

We will see below how to query this index for the ancestors and replies.

Importing Data

Now, in order to start prototyping the UI, we need to have some conversation data. I don't want to create the conversations manually, nor do I want to generate them.

I did a bit of research to see if we could find a dump of conversation data from HN or some mailing list. It turns out HN has an API for getting conversation data:

https://github.com/HackerNews/API

I wrote some code to "import" a thread recursively, given the id of the root post. The import runs in two stages: the first stage downloads all the posts from the API, and the second stage populates our DB with the downloaded posts.
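Here is a rough sketch of the first (download) stage. The public HN API serves each item as JSON at https://hacker-news.firebaseio.com/v0/item/<id>.json, and the ids of an item's direct replies are listed in its kids field. The helper names are hypothetical and only a subset of the item fields is decoded. The tree walk uses an explicit stack instead of recursion, which is not necessarily how the episode's importer does it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// HNItem holds the subset of HN API item fields used in this sketch.
type HNItem struct {
	Id     int    `json:"id"`
	By     string `json:"by"`
	Text   string `json:"text"`
	Parent int    `json:"parent"`
	Kids   []int  `json:"kids"`
	Time   int64  `json:"time"`
}

// FetchHNItem downloads a single item from the public HN API.
func FetchHNItem(id int) (item HNItem, err error) {
	url := fmt.Sprintf("https://hacker-news.firebaseio.com/v0/item/%d.json", id)
	resp, err := http.Get(url)
	if err != nil {
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		err = fmt.Errorf("item %d: unexpected status %s", id, resp.Status)
		return
	}
	err = json.NewDecoder(resp.Body).Decode(&item)
	return
}

// DownloadThread collects the root item and all of its descendants by
// following each item's Kids. fetch is a parameter so a cached or fake
// fetcher can be swapped in.
func DownloadThread(rootId int, fetch func(int) (HNItem, error)) ([]HNItem, error) {
	var items []HNItem
	stack := []int{rootId}
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		item, err := fetch(id)
		if err != nil {
			return nil, err
		}
		items = append(items, item)
		stack = append(stack, item.Kids...)
	}
	return items, nil
}
```

Calling DownloadThread(24649786, FetchHNItem) would walk the same thread as the command shown below. The second stage would then convert each HNItem into a Post, keeping the HN id and using Parent as the parent post id.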

We keep the post ids from HN. Since this is temporary data, we don't care about maintaining an auto-increment id for the posts. This also allows us to import multiple times: for example, if we fix our import code or our index-updating code, we don't have to do anything to the posts themselves.

The only other noteworthy thing here is that I also save the downloaded posts locally, so that we don't hit the API more than needed. If we already downloaded a post, we don't have to download it again.

The importer is an executable package, forum/hn_importer. The command takes one argument: the id of the root post to import.

go run forum/hn_importer 24649786

API / UI to view a single post

Having imported some posts, let's verify the import by implementing a simple view page.

On the server side, we'll define a function that retrieves a single post by id:


type PostQuery struct {
    PostId int
}

type PostResponse struct {
    Post Post
}

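// PostNotFound is returned when no post with the requested id exists.
var PostNotFound = errors.New("PostNotFound")

// GetPost reads a single post from PostsBkt using the request's transaction.
func GetPost(ctx *vbeam.Context, req PostQuery) (resp PostResponse, err error) {
    if !vbolt.Read(ctx.Tx, PostsBkt, req.PostId, &resp.Post) {
        err = PostNotFound
    }
    return
}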

On the client side we add a router entry so that navigating to “/item/:id” shows a page with the post's content.

We'll define two functions: one to fetch the post, one to view it.

export const itemHandler = {
    fetch: fetchPostItem,
    view: viewPostItem
}

async function fetchPostItem(route: string, prefix: string) {
    const postId = vlens.intUrlArg(route, prefix)
    return server.GetPost({PostId: postId})
}

const clsPostPlain = vlens.cssClass("post-plain", {
    // .. snipped
})

function viewPostItem(route: string, prefix: string, data: server.PostResponse) {
    return <div class={clsPostPlain}>
        <p>
            {data.Post.Content}
        </p>
        <a class="permalink" href={"/item/" + data.Post.Id}>Permalink: {data.Post.Id}</a>
    </div>
}

Now we can navigate to “/item/<id>” for some id to check what the content looks like.

Unfortunately, I did not take a screenshot of what that looks like, so I can't really show it. I could show you the API response, but you can already imagine what it would look like.

Originally, I had intended to stream myself programming this. The idea was that I'd later cut clips from the stream and share them here. However, the stream was not set up properly: half the screen was out of view, and it got cut off abruptly halfway through.

At any rate, the above is just the basic skeleton. We want to expand it so that the page shows not only the post itself, but all of its ancestors and all of its descendants.

API/UI to view all parents and replies

One of the major points I want to impart to you in this article is how we design the API response.

Take a look at the very first sketch in this article and consider what kind of data we need to implement it.

What we have is a list of posts. Depending on what we decide to add to the UI, we'll need some additional metadata about the post. For now, I want to display the username and the number of replies so the user can decide whether they want to "zoom in" on the specific comment or not.

Here's my API design:

Notice the response is just a container for basic types we already have.
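A response in this spirit might look like the following Go sketch. It holds an ordered list of ids plus maps keyed by id, and every value is a type we already have. The type and field names are hypothetical, and Post is a trimmed stand-in for the project's real Post struct. The Lines method only shows how a consumer walks the list and looks things up in the maps.

```go
package main

import "fmt"

// Post is a trimmed stand-in for the project's Post struct.
type Post struct {
	Id      int
	UserId  int
	Content string
}

// PostConversationResponse (hypothetical name) holds the display order
// plus plain lookup maps; there is no UI-specific post model.
type PostConversationResponse struct {
	PostIds     []int          // display order: ancestors, the post itself, replies
	Posts       map[int]Post   // post id -> post
	Usernames   map[int]string // user id -> username
	ReplyCounts map[int]int    // post id -> number of replies
}

// Lines renders one summary line per post, in display order.
func (r PostConversationResponse) Lines() []string {
	lines := make([]string, 0, len(r.PostIds))
	for _, id := range r.PostIds {
		post := r.Posts[id]
		lines = append(lines, fmt.Sprintf("%s: %s (%d replies)",
			r.Usernames[post.UserId], post.Content, r.ReplyCounts[id]))
	}
	return lines
}
```

Adding a new piece of metadata later means adding another map. The Post type stays untouched.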

This is in contrast to what most people would do, which is design a post object modeled around the UI.

They will think: the UI to view the post needs to know the username and the number of replies, so let's create the response model this way:

Notice the comment in scary uppercase telling you that this is a bad idea.

Why is it a bad idea? Because it creates additional work for yourself, with no return on investment. You have to write code on the server side to transform a regular Post object into a UIPostModel. This is not a useful data transformation: the data is already available in the Post object. Creating a parallel object that is very similar but subtly different serves no purpose. The data provided is the same; it's just arranged differently.

What's the purpose of this rearrangement? There is none.

So many people hold in their minds a bunch of bogus notions about the problem domain and how their object model should reflect the business rules. Pure nonsense.

The UI needs to display a list of posts in a specific order, so just provide the order in a list, and then provide the id -> post mapping.

The post has a user id and you need to get the username? Just slap another map. You need the number of replies to each post? Just slap another map.

Now the implementation is pretty straightforward. The list of posts consists of:

- Parent posts
- Self (the requested post id)
- Replies

So we start by reading these post ids from the post replies index, in that order.
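Here is a minimal sketch of assembling that order. It uses a plain map from post id to its ancestor ids (root first) as a hypothetical stand-in for the post replies index. Sorting replies by id is an assumption made only because Go map iteration order is random. The real index has its own priority ordering.

```go
package main

import "sort"

// ConversationIds returns the display order for postId: its ancestors
// (root first), the post itself, then all of its descendants.
// termsByTarget maps each post id to its ancestor ids, root first.
func ConversationIds(termsByTarget map[int][]int, postId int) []int {
	var ids []int
	ids = append(ids, termsByTarget[postId]...) // ancestors
	ids = append(ids, postId)                   // self

	// Descendants are the posts that list postId among their terms.
	var replies []int
	for target, terms := range termsByTarget {
		for _, term := range terms {
			if term == postId {
				replies = append(replies, target)
				break
			}
		}
	}
	// Assumption: order replies by id, since map iteration is random.
	sort.Ints(replies)
	return append(ids, replies...)
}
```

With A(1) as the root, B(2) and E(5) replying to A, C(3) replying to B, and D(4) replying to C, asking for B yields [1 2 3 4].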

Now resp.PostIds is the list of post ids we need to load. So we just load them! We already saw in a previous episode that we have a helper function that does just that.

Now we use the counting feature from the index to get the number of replies.
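Counting replies amounts to counting how many targets are stored under each term. The sketch below uses the same hypothetical map representation, and ReplyCounts is an invented name. Every post stores all of its ancestors as terms, so a term's count is the size of its whole reply subtree, not only its direct replies.

```go
package main

// ReplyCounts returns, for each requested post id, how many posts list it
// as a term, i.e. the number of descendants of that post.
func ReplyCounts(termsByTarget map[int][]int, postIds []int) map[int]int {
	wanted := make(map[int]bool, len(postIds))
	counts := make(map[int]int, len(postIds))
	for _, id := range postIds {
		wanted[id] = true
		counts[id] = 0 // posts without replies still get an entry
	}
	for _, terms := range termsByTarget {
		for _, term := range terms {
			if wanted[term] {
				counts[term]++
			}
		}
	}
	return counts
}
```

In the same example conversation, A counts 4 replies, B counts 2 (C and D), and D counts 0.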

Now we load the users. This is the "trickiest" part of the code because it tries to avoid loading a user that's already been loaded.
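The "don't load a user twice" logic can be sketched with the result map doubling as a visited set. readUser is a hypothetical stand-in for the database read, and the Post and User types are trimmed for the sketch.

```go
package main

type Post struct {
	Id     int
	UserId int
}

type User struct {
	Id   int
	Name string
}

// LoadUsernames maps each post author's user id to a username, reading
// each distinct user at most once.
func LoadUsernames(posts map[int]Post, readUser func(id int) (User, bool)) map[int]string {
	usernames := make(map[int]string)
	for _, post := range posts {
		if _, loaded := usernames[post.UserId]; loaded {
			continue // this author was already read
		}
		name := ""
		if user, ok := readUser(post.UserId); ok {
			name = user.Name
		}
		// Record even a missing user so it is not looked up again.
		usernames[post.UserId] = name
	}
	return usernames
}
```

Three posts by two authors result in two reads.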

This gives us all the data we need for the UI as we are prototyping.

Here's what the prototype UI looks like:

Conversation UI prototype (data imported from HN)

The UI code is pretty straightforward, and I don't think there's anything new or remarkable enough in it to be worth showing here.

Refer to the code attached to this episode.

Here's a demo of importing posts from HN and viewing them in the UI:

The perils of Clean Code-adjacent paradigms

What do you notice about our implementation?

The same "Post" type is used for the storage layer, the application code, the API, and the UI.

This should not come across as remarkable; it's just straightforward common sense.

However, a lot of "programming education" material out there teaches people a very different thing.

They teach people to completely isolate the different components of the system, and to separate the data types across the different components, so each component has its own representation of the data. They also teach you to use a "Data Transfer Object" to move data between layers.

So you would have a "Repository" layer that has its own representation of "Post" that it stores and loads from the database, and a "PostModel" object that it exposes to the outside world.

You would also have a "Domain" layer with its own representation of "Post". It would use the "repository.PostModel" to communicate with the Repository layer, and use another "api.PostModel" to communicate with the outside world through the API.

The UI will also have its own representation of the post, perhaps even several, depending on context: PostViewModel, PostEditForm. It will have a converter for each such type, to and from the API PostModel.

If you follow these ideas, you will have to write tons of code, all of it noisy boilerplate that happens to be a great minefield for bugs to hide in.

We're having none of it. This alone means we can be several times more productive than someone who programs using such paradigms. At least 3x more productive, as a conservative estimate.

Download the code: EP007.zip

View the code online: HandCraftedForum/tree/EP007

Hasen Judi
