I'm trying a new series where I take some hypothetical feature and explain what it takes to implement it, in the context of web programming.
There will not be much in the way of actual code, but there should be enough details for most programmers to be able to implement the feature on their own as an exercise, even if they are not experienced enough to come up with the design on their own.
If you are experienced, everything here probably seems trivial. You might even wonder what's the point.
There's a phenomenon where experienced people don't realize the knowledge and experience they take for granted is actually not available to most people. I've noticed it when working with other people: things that seem obvious to me are puzzling to others. Likewise, some things are puzzling to me, but appear obvious to those proficient at them.
Authentication is not a particularly difficult problem. If anything, it's among the easiest problems you'll ever encounter. But it feels like a good starting point for the series, because almost any problem you want to solve will require you to have users and accounts first.
Background:
What does it mean for a user to register and login? It means they reserve a handle (e.g. a username, email address) to identify themselves, and they can use it to interact with the service as themselves.
This document will be limited to handling the following operations:
Registration
Sessions
Login
Logout
Email Verification
Password Reset
Almost all operations require the use of "tokens".
Outline of Operations:
Registration
The user reserves an identifier and specifies a password, potentially providing other information as well, such as an email address and a phone number.
Email Verification
The user sends back a secret token or short code that we sent to his email (or phone number).
Login
The user provides his username and password to prove that he is the one sitting at the machine, and as a result, a session is created.
Session
The website remembers that the user has logged in, without requiring him to re-enter his password all the time to verify himself.
The server issues a secret token to associate with the user id, the client keeps this token and sends it along with every request.
Password reset
If the user forgets his password, a fallback mechanism is provided via email.
Usually done by producing a password-reset token and sending an email with a special URL that, when clicked, opens a page with a form for resetting the password.
Logout
The client forgets the session token, and optionally requests the server to expire the token.
Important Considerations
The Session
The session token counts as a form of authentication, but it's weaker than the password. It's possible that some time after the user has logged in, someone else is now sitting at the machine.
Some operations should require the user to re-enter his password.
Viewing sensitive information
Attempting to change the password
Making a payment, especially if it is a large sum
Tokens need to expire after a while to minimize risks.
Building Blocks
Passwords
Passwords must never be stored in plain text. They must be hashed using a one-way password hashing scheme (strictly speaking this is hashing, not encryption, because it cannot be reversed), such that:
It's not practically possible to deduce the password given the hash
It's very easy to validate a specific password against a specific hash
The consensus seems to be on the "bcrypt" algorithm.
Here's the Go implementation: https://pkg.go.dev/golang.org/x/crypto/bcrypt
Username and User Id:
The system should be designed to allow the user to change their username without breaking anything else in the system.
Now, whether you want to allow the user to change his identifier is a separate question.
The implication is that the actual user identification used throughout the system is a numerical id, and the user-facing handle is merely a way to look up the actual user id.
Generating Tokens
Tokens are easy to generate: fill a 16-byte buffer with random data from the OS-provided high-entropy random data source (/dev/random on Linux).
Then encode the buffer to base64 (or hex, or any other text encoding).
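The two steps above can be sketched in Go as follows. This is a minimal sketch, not a definitive implementation: the function name NewToken is made up for illustration, and the choice of unpadded URL-safe base64 is an assumption, made so the token can be placed in a link without escaping.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
)

// NewToken returns a random token as text.
// The name and the encoding choice are illustrative assumptions.
func NewToken() (string, error) {
	buf := make([]byte, 16) // 16 bytes, as suggested in the text
	// crypto/rand reads from the operating system's secure random source.
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	// URL-safe base64 without padding: 16 bytes become 22 characters.
	return base64.RawURLEncoding.EncodeToString(buf), nil
}
```

Calling NewToken twice should give two different 22-character strings. Hex encoding would work just as well, at the cost of a longer string.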
Verifying and using tokens
Some operations require sending the user a URL with a token.
When the page is first loaded, the server verifies the token: that it exists, has not expired, and is useable for the given purpose (password reset, etc).
When the form on the page is submitted, the token is submitted with it as well, and the server again verifies the token.
If the token is one-time use, the server expires the token after performing the requested operation.
Token expiration and deletion
Tokens are only useful for specific operations. There's no point in keeping old tokens around forever. Once a token is expired, it can either be deleted immediately, or left alone for some other process to delete old tokens.
For example, you can run a daily process to delete any token that has expired more than 10 days ago.
Sessions
A session is merely a token that identifies a user.
The client and the server agree on a special identifier name to use in request headers (or cookies) to identify the session token.
Data Design
To support all the processes needed for authentication, we will define types, buckets, and indices.
Refer to the Data Design Spec for a brief explanation of the notation used.
type Account struct {
Id int // auto generated
Username string
Email string
EmailVerified bool
PasswordHash string
Created time.Time
}
type AccountToken struct {
Token string // unique (id)
Type string // session, email_verification, etc
AccountId int
Created time.Time
Expires time.Time // or can be defined as a duration
}
bucket Accounts(Id int, Account)
bucket Usernames(Username string, AccountId int)
bucket Emails(Email string, AccountId int)
bucket Tokens(Token string, AccountToken) // string to object
index AccountTokens(AccountId int, Token string) // iterates tokens associated with account
// Valid token types
const TokenTypeSession = "session"
const TokenTypePasswordReset = "password_reset"
const TokenTypeEmailVerification = "email_verification"
Operations
Here's a basic description of operations in plain language.
New Account Registration
User provides desired username, email, and password
Verify that no existing account uses the username
By checking that the Usernames bucket does not have an entry for the given username (and the same for the email in the Emails bucket, if you want)
Hash the password using bcrypt
Generate a new Id on the Accounts bucket
Each bucket has an auto incrementing sequence
Generate a verification token for the provided email, and send an email (as a background task) containing a verification link
Store the Account in the Accounts bucket.
Create a session token and send it to the client.
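The core of these steps might look like the Go sketch below. It is only a sketch under stated assumptions: the Store type uses in-memory maps in place of the Accounts and Usernames buckets, the hash parameter is a placeholder for whatever password hashing function you use (for example a wrapper around bcrypt), and token creation and email sending are left out. It is also not safe for concurrent use.

```go
package main

import (
	"errors"
	"time"
)

type Account struct {
	Id            int
	Username      string
	Email         string
	EmailVerified bool
	PasswordHash  string
	Created       time.Time
}

// Store is a hypothetical in-memory stand-in for the buckets.
type Store struct {
	nextId    int             // auto incrementing sequence
	Accounts  map[int]*Account // Accounts bucket
	Usernames map[string]int   // Usernames bucket
}

func NewStore() *Store {
	return &Store{Accounts: map[int]*Account{}, Usernames: map[string]int{}}
}

var ErrUsernameTaken = errors.New("username already taken")

// Register checks the username, hashes the password, and stores the account.
func (s *Store) Register(username, email, password string, hash func(string) (string, error)) (*Account, error) {
	if _, taken := s.Usernames[username]; taken {
		return nil, ErrUsernameTaken
	}
	pwHash, err := hash(password)
	if err != nil {
		return nil, err
	}
	s.nextId++
	acc := &Account{
		Id:           s.nextId,
		Username:     username,
		Email:        email,
		PasswordHash: pwHash, // EmailVerified stays false until verified
		Created:      time.Now(),
	}
	s.Accounts[acc.Id] = acc
	s.Usernames[username] = acc.Id
	return acc, nil
}
```

After Register succeeds, the caller would create the email verification token and the session token.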
Verify Email:
A verification URL with a token is already sent to the email address
When the user opens the URL, the token will be sent to the server
The server looks up the token, makes sure it has not expired, and that it is in fact an email verification token
The email associated with the token is now considered verified. Update the EmailVerified flag on the account.
This token is one time use, so expire it and schedule it for deletion
Change Email:
When the user changes their email address, the following steps must be taken:
Set the EmailVerified flag on the account to false
Use the AccountTokens index to find any "pending" email verification token associated with the account and expire it (or delete it). This is crucial because the token is only associated with the account id rather than the email.
Create a new verification token and send it (as a URL) to the provided email address (same as what we did during registration)
Login:
User enters his username and password
The password is sent to the server as-is, without hashing it on the client (the connection itself should be encrypted with HTTPS)
Use the Usernames bucket to find the account id
Load the account and compare the password against the PasswordHash on the account, using the bcrypt algorithm.
Note: an empty password hash does not match any password!
If account exists and password matches, create a session token and return it to the client along with basic user information
If the client is SPA based, store the session token in localStorage and add it to the headers in every request using a special header name that both the client and the server agree on.
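The server-side check can be sketched in Go like this. The maps stand in for the Usernames and Accounts buckets, and the check parameter is a placeholder for the real comparison: with the bcrypt package linked earlier, it would wrap bcrypt.CompareHashAndPassword. The function name Login and the single error value are assumptions. Returning the same error for "no such user" and "wrong password" is a design choice, so the response does not reveal which usernames exist.

```go
package main

import "errors"

type Account struct {
	Id           int
	Username     string
	PasswordHash string
}

var ErrBadCredentials = errors.New("invalid username or password")

// Login finds the account by username and checks the password against
// the stored hash. Creating the session token is left to the caller.
func Login(usernames map[string]int, accounts map[int]*Account, username, password string,
	check func(hash, password string) bool) (*Account, error) {
	id, ok := usernames[username]
	if !ok {
		return nil, ErrBadCredentials
	}
	acc, ok := accounts[id]
	// An empty password hash never matches any password.
	if !ok || acc.PasswordHash == "" {
		return nil, ErrBadCredentials
	}
	if !check(acc.PasswordHash, password) {
		return nil, ErrBadCredentials
	}
	return acc, nil
}
```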
Password Reset:
If the user has forgotten their password, they are allowed to request a password reset.
The user enters their username or their email, whichever they happen to remember
The server finds the account id associated with that username or email, creates a password reset token, and sends it (as a URL) to the email associated with the account.
When the user clicks the URL, he is taken to a form for entering a new password
The new password and the token are sent to the server
The server verifies the token: that it has not expired, and that it is of the appropriate type
The new password is hashed and its hash is stored on the account. The reset token is one time use, so it is expired. A session token is generated and returned to the client.
Password Change
If the user wants to change his password while logged in, we require him to enter his original password, because as mentioned above, the session is a weaker form of authentication than the password.
User enters existing password and new password
Existing password is validated against hashed password stored on the account
New password is hashed, and its hash replaces the old password hash on the account
Session Invalidation
This is an admin function: if there's suspicion that a user account has been hacked, it should be possible for an admin to invalidate all current sessions associated with the user account.
Use the AccountTokens index to find all tokens associated with the account
If the token is not expired, expire it immediately
No need to check for token type. We actually should expire all tokens of all types, because each one can be used to gain illegitimate access to the account.
Session Recognition
You want a certain class of request handlers on the server side to require a valid session before proceeding with handling the request.
The exact requirements vary from one handler to another, but at a minimum, we want to return a "401 unauthorized" if the request does not have a session.
Grab the session token from the agreed location: either a special HTTP header, a special cookie, or both
Find the token object, make sure its type is "session", and that it is not expired.
Find the account associated with it and make sure it exists.
If any of the above is not fulfilled, return a 401 response and stop the handler
If all conditions are satisfied, keep around the session and account information so the rest of the request handler has access to them.
Conclusion
I have mixed feelings about writing this document. On one hand, I think this kind of data modelling and informal explanation of basic operations is useful in general. On the other hand, I'm not sure of:
How useful is it for this particular "feature":
To me it seems very basic. A few months ago I would have assumed that every programmer just has enough skills to come up with this on their own.
How detailed should I go:
I tried to keep it high level and abstract. I made almost no assumptions about programming language, framework, or database.
I kept it focused on the server side data model
How actionable this information is:
Would a team of beginners, having no prior experience implementing this feature, be able to use this document to actually create a fully working implementation?
Noise to signal ratio:
How much of the document is actually useful? How much of it is just blabber that makes it difficult to access the useful information hidden in it? (assuming there's any at all)