Carving a little island on the internet - Part 2
This is the second part of the little self-hosting adventure I embarked on when creating this blog.
We've already covered how to create a website with Hugo, set up an instance at a cloud provider, then spin up nginx to serve that website. It's now time to bring it to The Web™.
Oh, and we'll also make it a breeze to push content out.
Open the (HTTP) floodgates
The network configuration of our instance is spread over multiple components. One of them is the Virtual Cloud Network (or VCN). The subnet we've configured and the route table that is set up by default will remain as is, but we'll have to work on the ingress rules. These define what incoming traffic is authorized. Out of the box, the main thing to note is that SSH traffic is allowed on port 22.
We will start by allowing HTTP and HTTPS traffic. For that we simply add two rules that allow TCP traffic from IP `0.0.0.0/0`, which is the CIDR notation for "any IP", on the standard HTTP and HTTPS ports, respectively 80 and 443. Ideally we would only accept HTTPS traffic, but allowing HTTP will let us test our setup for now, until we grab a certificate.
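For reference, the two ingress rules boil down to the following (written out informally; the field names in the Oracle Cloud console differ slightly):

```
Source CIDR   Protocol   Destination port
0.0.0.0/0     TCP        80    (HTTP)
0.0.0.0/0     TCP        443   (HTTPS)
```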
The VCN is not the only place we need to configure; there's also the instance's own firewall. We'll have to use `iptables` in order to add rules allowing traffic on certain ports[^1]. As a disclaimer, I'm in no way an expert in this area, but what we'll do is relatively basic.
We can run `sudo iptables -L` first to check which rules are in place. On our instance, some rules have already been configured, among which a couple of noteworthy ones:
```
Chain INPUT (policy ACCEPT)
target     prot opt source     destination
ACCEPT     all  --  anywhere   anywhere     state RELATED,ESTABLISHED
...
ACCEPT     tcp  --  anywhere   anywhere     state NEW tcp dpt:ssh
REJECT     all  --  anywhere   anywhere     reject-with icmp-host-prohibited
```
First, we have to note that these rules are grouped by chains, and applied in the order they're listed. In this case, we can see that the first rule allows all traffic under the condition that a connection has already been established, or that it's a new connection associated with an existing connection (as per the `RELATED` state). Extra rules might follow, but we'll stop at the next one I've listed. This allows TCP traffic, for all sources and destinations, permitting new connections on the destination port associated with ssh[^2].
Then we have a catch-all `REJECT` rule which rejects everything. More specifically, and that's where the concept of rule chains kicks in, "everything which hasn't been explicitly `ACCEPT`ed until now".
Let's add a rule to allow HTTP traffic coming in. We have to take one thing into consideration, following the chain behaviour I've described: that rule must come in before the catch-all `REJECT` rule. Been there, done that, wondered why there was no traffic coming in despite adding the rule. This is a potential reason.
Here we go:

```sh
sudo iptables -I INPUT 3 -m state --state NEW -p tcp --dport 80 -j ACCEPT
```
There's a bit to unpack here. First we specify `-I INPUT` to insert a rule into the INPUT chain. We then specify a number (here 3), which is the index at which we want to insert the rule. Remember, this has to be before the rejection rule[^3]. Then we specify a parameter that the request has to match with `-m`; here we match on the connection state, namely `NEW` connections. Next come the protocol we want to allow, with `-p tcp`, and the destination port, with `--dport 80`.
Finally we specify where to "jump" with the `-j` option. We have the option of jumping to another chain, or to specific built-in targets which decide immediately what to do with the packet, such as the `ACCEPT` target, which simply allows it.
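Putting this together for HTTPS: it helps to first check where the catch-all rule currently sits, then insert the new rule just above it. The index 4 below is only an assumption (the `REJECT` rule would end up there once the HTTP rule is inserted at 3 in the ruleset shown earlier); adjust it to whatever `--line-numbers` shows on your machine:

```sh
# Show the INPUT chain with rule indices, to locate the catch-all REJECT
sudo iptables -L INPUT --line-numbers

# Insert the HTTPS rule right before the REJECT rule (index 4 here, adjust as needed)
sudo iptables -I INPUT 4 -m state --state NEW -p tcp --dport 443 -j ACCEPT
```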
We can do the same with port 443 to allow HTTPS traffic (as sketched above), and that should be sufficient for the firewall. In the case where the default `OUTPUT` policy is not set to `ACCEPT`, or if our `OUTPUT` chain contains a rule akin to the catch-all `REJECT` rule we saw in `INPUT`, we also need to allow the traffic for established connections to go out. We can do so with the following command:

```sh
sudo iptables -A OUTPUT -p tcp --sport 80 -m state --state ESTABLISHED -j ACCEPT
```
Here we specify that we want to append (`-A`) the rule to the `OUTPUT` chain, and the other notable change is that we are now configuring it for the source port with `--sport 80`.
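The HTTPS counterpart is the same command with port 443. And since plain iptables rules don't survive a reboot on their own, it's worth persisting them once everything works; on Ubuntu, the `iptables-persistent` package is one common way to do that (an assumption on my part — check how your image already manages its rules before layering this on top):

```sh
# Allow outgoing replies for HTTPS as well
sudo iptables -A OUTPUT -p tcp --sport 443 -m state --state ESTABLISHED -j ACCEPT

# Persist the current rules across reboots (provided by iptables-persistent)
sudo apt-get install -y iptables-persistent
sudo netfilter-persistent save
```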
Once the firewall is set up, and the ingress rules have been configured on the VCN, nginx should be able to receive incoming HTTP requests and serve the responses. Once again, we can test that from another computer with

```sh
curl -H 'Host: example.com' <instance-ip>:80
```

and we should get the HTML response in return[^4].
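If we'd rather not dump a wall of HTML into the terminal, the same check can be reduced to just the status code with standard curl options:

```sh
# Print only the HTTP status code and discard the body; we expect a 200
curl -s -o /dev/null -w '%{http_code}\n' -H 'Host: example.com' <instance-ip>:80
```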
Cool but I don’t want that lousy HTTP, I want s.e.c.u.r.i.t.y.
Setting up HTTP is pretty straightforward, but HTTPS is the standard today: it ensures we are actually communicating with the website we think we're accessing, and it encrypts the traffic during the exchanges. For a personal blog this might be a bit overkill, but it's still better to serve over HTTPS.
This is a pretty complicated topic, but we’ll go through the basics of setting it up for this project.
In order to enable HTTPS, we need two things: a domain name, and a certificate from a CA (certificate authority). The domain name essentially provides us with an alias for our public IP, and the certificate is a cryptographic document that (very roughly) lets clients validate that the server they are communicating with is actually the one they want to access, and establish an encrypted connection with it.
I went with a simple domain name over at OVH, and then relied on Let's Encrypt for the certificate. They're a non-profit that provides Domain Validation certificates for free, with a very easy process, relying on the Electronic Frontier Foundation's Certbot tool to automate certificate issuance and server configuration.
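On the registrar's side, the only thing needed is to point the domain at the instance's public IP with an A record; in zone-file notation it boils down to something like this (domain and IP are placeholders, and the exact interface varies by registrar):

```
; A record pointing the domain at the instance's public IP
example.com.   IN   A   <instance-public-ip>
```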
On Ubuntu, we simply need to install Certbot through snapd[^5] with the `--classic` flag, then symlink it to a location where it can be picked up via the `PATH` variable, e.g. `/usr/bin/certbot`.
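Concretely, that boils down to the two commands below, which follow Certbot's own installation instructions (double-check them for your distribution and setup):

```sh
# Install Certbot via snap and expose it on the PATH
sudo snap install --classic certbot
sudo ln -s /snap/bin/certbot /usr/bin/certbot
```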
Finally, we get a certificate and install it with the following command:

```sh
sudo certbot --nginx
```

and test that automatic renewal[^6] will work with

```sh
sudo certbot renew --dry-run
```
If all went well, certbot will have modified our nginx configuration, and we can now access the website through the HTTPS protocol.
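For the curious, the resulting server block looks roughly like the sketch below. The certificate paths are the usual Let's Encrypt ones, while the domain and the web root (pointing at the `~/www` directory our pipeline will deploy to, assuming a user named `ubuntu`) are placeholders; certbot also typically adds a second server block redirecting HTTP to HTTPS:

```nginx
server {
    server_name example.com;
    root /home/ubuntu/www;
    index index.html;

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
}
```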
Removing one manual step of the chain
We've got pretty much everything we need, and we could make do with what we have now. However, we're developers, and by nature we're lazy to some degree. Lazy not as in "I don't want to do this manually every time" but rather "I don't want to risk screwing up each time I do that same manual task".
Let's add some automation. I'm using GitLab, so we'll set up a GitLab CI/CD pipeline in order to deploy our website any time we push a commit onto the main branch of our repository. The concepts are fairly simple, and should be very easy to transpose to CircleCI or GitHub Actions, whatever suits your environment.
Before we write our `.gitlab-ci.yml`, we need to configure a few variables in our project's settings. This is important to avoid committing any sensitive parameters such as secrets, user logins, etc. directly in the repository, which would seriously undermine the security of our project. To do so, we go to the project, then Settings > CI/CD, and expand the Variables section. Here we'll create four variables:
- `DEPLOYMENT_TARGET_ADDRESS`: the IP of our machine; since it might change in the future, it's handy to only have to update it in the settings and avoid a commit
- `DEPLOYMENT_TARGET_USER_LOGIN`: the user with which we'll connect to our remote host
- `SSH_PRIVATE_KEY`: the private key we'll use to `ssh` into the host, whose corresponding public key should be installed on the server
- `SSH_KNOWN_HOSTS`: the known hosts information that we will store to ensure that we are connecting to the right host
The first two are set up as standard variables, meaning they will be replaced by the string they contain. For the latter two, we can use the special "File" type of variable, which means each one will be replaced by the path to a temporary file containing the variable's content.
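If you don't already have a dedicated key pair for deployments, generating one and collecting the host entry for `SSH_KNOWN_HOSTS` can be done locally; the file names and key comment below are just examples:

```sh
# Generate a dedicated deploy key: the public half goes into ~/.ssh/authorized_keys
# on the instance, the private half into the SSH_PRIVATE_KEY variable
ssh-keygen -t ed25519 -f deploy_key -C "gitlab-deploy"

# Grab the host's public keys to paste into the SSH_KNOWN_HOSTS variable
ssh-keyscan <instance-ip> > known_hosts_entry
```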
Once we've set up these variables, we can write our configuration file:
```yaml
image: golang:latest

variables:
  DEPLOYMENT_TARGET_ADDRESS:
    description: "The remote target address to deploy to"
  DEPLOYMENT_TARGET_USER_LOGIN:
    description: "The username with which to log into the deployment target"
  SSH_PRIVATE_KEY:
    description: "The private key to access the remote"
  SSH_KNOWN_HOSTS:
    description: "The known host entry for the remote"

before_script:
  - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )'
  - 'which rsync || ( apt-get update -y && apt-get install rsync -y )'
  - eval $(ssh-agent -s)
  - chmod 400 "$SSH_PRIVATE_KEY"
  - ssh-add "$SSH_PRIVATE_KEY"
  - mkdir -p ~/.ssh
  - chmod 700 ~/.ssh
  - cp "$SSH_KNOWN_HOSTS" ~/.ssh/known_hosts
  - chmod 644 ~/.ssh/known_hosts
  - wget https://github.com/gohugoio/hugo/releases/download/v0.124.1/hugo_0.124.1_Linux-64bit.tar.gz
  - tar -xzf hugo_0.124.1_Linux-64bit.tar.gz
  - cp hugo $GOPATH/bin/hugo
  - hugo version

workflow:
  rules:
    - if: $CI_COMMIT_TAG
      when: never
    - if: $CI_COMMIT_BRANCH == 'main'

build_and_deploy:
  script:
    - hugo
    - rsync -avz --delete public/ "$DEPLOYMENT_TARGET_USER_LOGIN@$DEPLOYMENT_TARGET_ADDRESS:~/www"
```
Let’s break it down.
After listing our variables, we set up the environment in a `before_script` to be able to run `hugo` and to access our remote host. To do so we ensure that the ssh agent is installed, as well as `rsync`.
```yaml
before_script:
  - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )'
  - 'which rsync || ( apt-get update -y && apt-get install rsync -y )'
```
We then set up our private key to be able to access our remote, and populate the `known_hosts` file with the information of our remote, to make sure we connect to the right host.
```yaml
  - eval $(ssh-agent -s)
  - chmod 400 "$SSH_PRIVATE_KEY"
  - ssh-add "$SSH_PRIVATE_KEY"
  - mkdir -p ~/.ssh
  - chmod 700 ~/.ssh
  - cp "$SSH_KNOWN_HOSTS" ~/.ssh/known_hosts
  - chmod 644 ~/.ssh/known_hosts
```
Finally, we install Hugo from the project's public GitHub releases.
```yaml
  - wget https://github.com/gohugoio/hugo/releases/download/v0.124.1/hugo_0.124.1_Linux-64bit.tar.gz
  - tar -xzf hugo_0.124.1_Linux-64bit.tar.gz
  - cp hugo $GOPATH/bin/hugo
  - hugo version
```
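Since we're pulling a binary straight from a release page, it doesn't hurt to verify it. Assuming the release also ships a checksums file (it appears to be named `hugo_0.124.1_checksums.txt` for this version, but double-check on the release page), two extra `before_script` lines would do the job:

```sh
# Fetch the checksums published with the release and verify the downloaded archive
wget https://github.com/gohugoio/hugo/releases/download/v0.124.1/hugo_0.124.1_checksums.txt
sha256sum --check --ignore-missing hugo_0.124.1_checksums.txt
```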
We then configure the workflow rules so that tag pipelines never run, and the pipeline only triggers when there is a commit on the `main` branch.
```yaml
workflow:
  rules:
    - if: $CI_COMMIT_TAG
      when: never
    - if: $CI_COMMIT_BRANCH == 'main'
```
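As a side note, if you also want to be able to launch the pipeline manually from the GitLab UI (handy for re-deploying without pushing a new commit), an extra rule matching the `web` pipeline source does the trick; a sketch, to be adapted to your own rules:

```yaml
workflow:
  rules:
    - if: $CI_COMMIT_TAG
      when: never
    - if: $CI_COMMIT_BRANCH == 'main'
    - if: $CI_PIPELINE_SOURCE == 'web'
```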
All that remains is to define a simple job, `build_and_deploy`, which renders our website with `hugo`, and uses `rsync` to copy the website content over SSH to our remote host.
```yaml
build_and_deploy:
  script:
    - hugo
    - rsync -avz --delete public/ "$DEPLOYMENT_TARGET_USER_LOGIN@$DEPLOYMENT_TARGET_ADDRESS:~/www"
```
We notably use the `--delete` option of rsync to remove files that are present on the server but not in the `public` folder created by Hugo, to keep things clean.
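One rsync detail worth knowing: the trailing slash on the source path changes what gets copied. A quick illustration of the semantics, with the same options as in the job:

```sh
# With the trailing slash, the *contents* of public/ land directly in ~/www
rsync -avz --delete public/ user@host:~/www

# Without it, a public/ directory would be created inside ~/www (i.e. ~/www/public/...)
rsync -avz --delete public user@host:~/www
```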
And voilà! Now, every time we push to the `main` branch, the pipeline will run and update the website's content on the remote, which can then be served immediately.
Wrapping it up
There's a lot that I've only skimmed over, or haven't even started to explain, in the steps we went through, but this gives a good picture of the basics needed to create a simple self-hosted setup.
I also haven't mentioned some extra steps I took, notably related to security. For example, I've changed the port I use for ssh to a custom one, in order to avoid 98% of the low-effort malicious traffic that's out there. As is customary, I suppose, when attempting that for the first time, I locked myself out of my instance despite trying to make sure everything was correctly configured. I had misconfigured the firewall by adding the rule accepting traffic on my custom port after the catch-all `REJECT` rule, and I didn't first make sure I could connect on the new port before changing the sshd config. I also didn't realize at first that you can specify multiple ports to listen on in the sshd configuration.
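For the record, listening on several ports just means repeating the `Port` directive in `/etc/ssh/sshd_config` (the custom port below is only an example), which makes it possible to confirm the new port works before dropping the old one — and, as I learned the hard way, the new port also has to be allowed both in the VCN ingress rules and in iptables before the catch-all `REJECT` rule:

```
# /etc/ssh/sshd_config
# Keep 22 active until the custom port is confirmed to work
Port 22
Port 2222
```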
There are also a lot of extra steps I could take to harden security and reduce the fragility of the setup, such as:
- having a fixed public IP on the Oracle Cloud side which the VCN then routes to the instance, as I currently use the instance's public IP, which is subject to change if I need to kill the instance, or if Oracle reclaims it and forces me to get a new one
- bundling all the instance setup steps into a nice script, which would greatly simplify setting up a new instance should the current one get bricked (which already happened once, when I changed the ssh port and missed a step)
Finally, I have a couple of things in mind that I would like to add to this setup, such as analytics. I've toyed with self-hosted Plausible and Umami so far, but haven't reached anything conclusive yet. The easy setups for both rely on Docker, and the resource consumption I've observed seems too high for the small instance I have for now. I'll probably need to look into a lighter setup if I want to use these.
It still has many flaws, but for now, I have my little internet island, from where I can easily toss little messages in a bottle into the sea.
[^1]: I know there are friendlier ways of configuring a firewall on Ubuntu, but this was an opportunity to get to know `iptables` a bit, and on top of that, some of them, such as `ufw`, are known to be somewhat tricky to use with Oracle Cloud instances.

[^2]: We can always specify raw port numbers, and iptables will display an alias when it's a standard port number, i.e. here port 22 is listed as "the ssh port".

[^3]: To keep it simple, it can just be the index of the rejection rule, pushing it down and inserting the new rule right before it. One can check rule indices with `sudo iptables -L --line-numbers` to make sure.

[^4]: I insist: `iptables` is an extremely powerful and complex tool. While this made my setup work, and I believe it's rather sane, I'm in no way an expert and there's an entire rabbit hole of complexity to dive into, so please do not underestimate it.

[^5]: On the topic of Ubuntu not being great in low-resource environments: the fact that it comes with snap, which is sometimes the only package manager that has certain packages, is frustrating. `snapd` sometimes goes a bit wild in terms of resource consumption, which is far from ideal on our modest instance.

[^6]: The certbot package comes with a cron job or systemd timer which ensures that the certificates we were issued are renewed before they expire. This spares us from suddenly ending up with a website that isn't accessible via HTTPS.