From cf7850ecde0a8963a58ff58763a116adf8f4109c Mon Sep 17 00:00:00 2001
From: missytake
Date: Wed, 4 Dec 2024 14:41:40 +0100
Subject: [PATCH 1/3] doc: added social practices & common tools

---
 README.md | 207 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 207 insertions(+)

diff --git a/README.md b/README.md
index 43e7669..ec17d9f 100644
--- a/README.md
+++ b/README.md
@@ -14,6 +14,213 @@ or
 - run `pyinfra --dry inventory.py deploy.py`
   and check that you are on the same state that is already deployed
 
+# social practices
+
+maintainers: people who know (next to) everything and would be able to learn the rest
+adepts: people who are still learning about the infrastructure, but don't need to keep everything in mind
+associates: others, who just need to maintain a certain service
+
+Discussions can happen:
+- in person (gathering), should happen at least every 3-4 months, to discuss the big picture
+- in person (coworking), while working on new services
+- in issues and PRs for concrete proposals
+- in online calls to fix emergencies
+- in chat groups for exploring ideas and everything else
+
+
+## structure of this repository
+
+this repository documents the current state
+of the infrastructure.
+
+For each server/VM,
+it contains a directory with
+
+- a README.md file which gives an overview of the server
+- a pyinfra inventory.py file
+- a pyinfra deploy.py file which documents what's installed
+- the configuration files pyinfra deploys
+- optional: a deploy-restore.py file which can restore data from backup
+- optional: other pyinfra deploy files which only manage certain services or tasks, like upgrades
+
+The repository also contains a lib/ directory
+with pyinfra packages we reuse across servers.
+
+With pull requests we can propose changes
+to the current infrastructure.
+PRs need to be approved by at least one maintainer.
+The pyinfra code in PRs can already be deployed,
+if it is not destructive - decide responsibly.
+
+
+## create a VM
+
+To add a new VM for a service you want to manage,
+
+0. Check out a new branch with `git checkout -b your-server-name`
+1. Add your VM to inventory.py
+2. Create a directory for the VM
+3. Add your VM to ararat/deploy.py
+4. Ask the core team to run `pyinfra ararat.0x90.space ararat/deploy.py`
+   to create your VM
+5. Write your pyinfra deployment script in your-server-name/deploy.py
+6. Deploy it; if it doesn't work, change it and repeat until the service works
+7. Copy TEMPLATE.md to your-server-name/README.md and fill it out.
+   You can leave out parts which are obvious from your deploy.py file.
+8. Commit your changes, push them to your branch,
+   open a pull request from your branch to the development branch,
+   and ask a maintainer to review and merge it
+
+
+## tools we use
+
+The hope is that you don't need to know all of these tools
+to already do useful things,
+but can systematically dive deeper into the infrastructure.
+
+### pass
+
+password manager to store passphrases and secrets,
+the repository with our secrets
+is at for now.
+
+### ssh
+
+to connect to servers and VMs with root@,
+no sudo,
+root should have a password set,
+but via SSH, password access should be forbidden.
+
+There should be no shared SSH keys,
+one SSH key per person.
+SSH private keys should be password-protected
+and only stored on laptops
+with hard disk encryption.
+
+### systemctl & journalctl
+
+to look at status and log output of services.
+systemd is a good way of keeping services running,
+at least on Linux machines.
+On OpenBSD we will use /etc/rc.d/ scripts.
+
+### git
+
+for updating the documentation,
+pushing and pulling secrets,
+and opening PRs to doku/pyinfra repos.
+
+to be discussed:
+- Keep in mind that PRs can and will be deployed to servers. OR
+- The main branch should always reflect the state of the machine.
+
+### markdown + sembr
+
+for documenting the infrastructure.
+[Semantic line breaks](https://sembr.org/) are great
+for formatting text files
+which are managed in git.
+
+### kvm + virsh
+
+as a hypervisor
+which we can use to create VMs
+for specific services.
+
+The hypervisor is a minimal alpine linux,
+with "boot to RAM",
+the data-partition for the VM images is encrypted.
+
+### pyinfra
+
+as a nice declarative config tool for deployment.
+we can also maintain some of the things we need
+in extra python modules.
+
+pyinfra vs. ansible? ~> need to investigate. currently ansible setup on golem, pyinfra used in deltachat and 1 ezra service.
+
+### podman
+
+to isolate services in root-less containers.
+a podman container should run as a systemd service.
+it takes some practice to understand
+how to run commands inside a container
+or where the files are mounted.
+But it goes well with pyinfra
+if it's managed in systemd.
+
+### nftables
+
+as a declarative firewall
+which can be managed in pyinfra.
+
+### nginx
+
+as an HTTPS reverse proxy,
+passing traffic on to the podman containers.
+
+### acmetool
+
+as a tool to manage Let's Encrypt certificates,
+which goes well with pyinfra
+because of its declarative nature.
+
+It also ships acmetool-redirector
+which redirects HTTP traffic on port 80
+to nginx on port 443.
+
+There is a pyinfra package for it at
+https://github.com/deltachat/pyinfra-acmetool/
+
+https://man.openbsd.org/acme-client + https://man.openbsd.org/relayd on OpenBSD
+
+### cron
+
+to schedule recurring tasks,
+like acmetool's certificate renewals
+or the nightly borgbackup runs.
+
+on OpenBSD there is already a daily cronjob that executes /etc/daily.local
+
+### borgbackup
+
+can be used to back up application data
+in a nightly cron job.
+
+Backups need to be stored on a separate backup server.
+
+There is a pyinfra package for it at
+https://github.com/deltachat/pyinfra-borgbackup/
+
+might also look at restic ~> append-only backup better restricted
+
+### wireguard
+
+as a VPN to connect the backup server,
+which can be at some private house,
+with the production servers.
+
+### prometheus
+
+as a tool to measure service uptime
+and measure typical errors
+from journalctl output.
+It can expose metrics via HTTPS
+behind basic auth.
+
+### grafana
+
+as a visual dashboard to show service uptime
+and whether services throw errors.
+It can also send out email alerts.
+
+### team-bot
+
+a deltachat bot to receive support requests
+and email alerts from grafana.
+
+
+
 # Set up alpine on hetzner
 
 This was only tested with a cloud VPS so far.
-- 
2.43.5

From 6f8c765f007aabcd61227e955dc6c398d9aebd5e Mon Sep 17 00:00:00 2001
From: missytake
Date: Wed, 4 Dec 2024 14:44:20 +0100
Subject: [PATCH 2/3] doc: added server documentation template

---
 TEMPLATE.md | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)
 create mode 100644 TEMPLATE.md

diff --git a/TEMPLATE.md b/TEMPLATE.md
new file mode 100644
index 0000000..4753ca9
--- /dev/null
+++ b/TEMPLATE.md
@@ -0,0 +1,104 @@
+# Server: Server name
+
+## Usage
+
+Who is using this server?
+Who needs the server and will be affected if the server is not working?
+
+## Maintainers
+
+Who to ask about this server?
+
+## Domain Settings
+
+Where are the DNS settings? E.g. with Hetzner or in a DNS zone file.
+How to change DNS settings?
+Which domains and subdomains exist?
+
+## Hosting
+
+Where is the server hosted?
+Add a link to the hosting admin interface, e.g. .
+
+## Services
+
+Which services are running there?
+E.g. there are `www.example.org` and `ci.example.org` services.
+
+### Service: ci.example.org
+
+Each service has a greppable heading starting with `### Service: `.
+
+Which software is the service running? E.g. nginx.
+How was it deployed? E.g. manually or with pyinfra.
+How can the software be managed?
+Where are the admin credentials stored if you need to fix something (e.g. for mailcow)?
+Is there an admin chat group (e.g. for mailadm), and how do you join it?
+
+#### Monitoring
+
+How to read the logs of the service?
+How are admins notified when the service is down?
+
+#### Deployment
+
+How was the service deployed?
+How to reinstall it?
+
+#### Upgrade Strategy
+
+How is the service upgraded?
+Which commands upgrade it, e.g. where is the upgrade script located and how is it run?
+If there is official documentation, put a link to it in this section.
+
+#### Maintainers
+
+Who to ask about the service?
+
+#### Integration
+
+How is the service related to other services running on this or other servers?
+E.g. service `ci.example.org` uses the secret storage `secrets.example.net` and runner `runner.example.com` hosted elsewhere.
+
+### Service: www.example.org
+
+Description similar to the other service.
+
+## Users
+
+Who has access to this server?
+
+Which admin accounts are there?
+Which service accounts are there?
+Which user accounts are there?
+
+## Monitoring
+
+How do we notice if something fails?
+
+Where do the errors show up?
+Where are the logs for the services located, e.g. Postfix logs go to `/var/log/mail.log`.
+
+## Upgrade Strategy
+
+How do we keep the services up to date?
+
+## Backup and Restore
+
+How is the server backed up, and how is the backup restored?
+
+## Deployment
+
+How to reinstall the server?
+Which settings were selected to create the server? E.g. the operating system image.
+Are there deployment scripts, and if so, where are they located and how are they run?
+
+# Changelog
+
+## 2023-05-30 - Created the server
+
+Document the steps taken here.
+
+## 2023-06-10 - Installed nginx
+
+...
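Reviewer note on the template: because every service gets a greppable `### Service: ` heading, the documented services can be listed mechanically. Below is a minimal sketch, assuming each server directory holds a README.md that follows TEMPLATE.md; the function names `list_services` and `services_per_server` are hypothetical and not part of this patch.

```python
import re
from pathlib import Path

# Heading prefix taken from TEMPLATE.md; everything else here is illustrative.
SERVICE_HEADING = re.compile(r"^### Service: (.+)$", re.MULTILINE)


def list_services(readme_text: str) -> list[str]:
    """Return the service names found in `### Service: ` headings."""
    return [name.strip() for name in SERVICE_HEADING.findall(readme_text)]


def services_per_server(repo_root: Path) -> dict[str, list[str]]:
    """Map each server directory (assumed layout: <server>/README.md) to its services."""
    result = {}
    for readme in sorted(repo_root.glob("*/README.md")):
        text = readme.read_text(encoding="utf-8")
        result[readme.parent.name] = list_services(text)
    return result
```

Calling `services_per_server(Path("."))` from the repository root would return one entry per server directory. Deeper headings such as `#### Monitoring` are not matched, since the pattern requires exactly three `#` characters followed by a space.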
-- 
2.43.5

From cb6676ab655ce73a92c485ebbb3bce6ba1b36dcb Mon Sep 17 00:00:00 2001
From: missytake
Date: Wed, 4 Dec 2024 15:40:32 +0100
Subject: [PATCH 3/3] doc: documented ararat test VPS with template

---
 README.md        |  84 -------------------------
 ararat/README.md | 158 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 158 insertions(+), 84 deletions(-)
 create mode 100644 ararat/README.md

diff --git a/README.md b/README.md
index ec17d9f..6df4471 100644
--- a/README.md
+++ b/README.md
@@ -220,87 +220,3 @@ a deltachat bot to receive support requests
 and email alerts from grafana.
 
 
-
-# Set up alpine on hetzner
-
-This was only tested with a cloud VPS so far.
-Source:
-(but it's less of a hassle than described there)
-
-To create an alpine server on hetzner,
-you need to first create a Debian VPS or something similar.
-
-Then you boot into the rescue system.
-
-Get the download link of the latest VIRTUAL x86_64 alpine iso
-from .
-
-Login to the rescue system via console or SSH,
-and write the ISO to the disk:
-
-```
-ssh root@xxxx:xxxx:xxxx:xxxx::1
-wipefs -a /dev/sda
-wget https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-virt-3.20.3-x86_64.iso # or whatever link you got from alpine
-dd if=alpine-virt-3.20.3-x86_64.iso of=/dev/sda
-reboot
-```
-
-Then open the server console (SSH doesn't work),
-login to root (no password required),
-and proceed with:
-
-```
-cp -r /.modloop /root
-cp -r /media/sda /root
-umount /.modloop /media/sda
-rm /lib/modules
-mv /root/.modloop/modules /lib
-mv /root/sda /media
-setup-alpine
-```
-
-Then select what you wish,
-contrary to the guide above,
-DHCP is actually fine.
-The drive should be sda,
-the installation type can be sys
-(why go through the hassle).
-
-Voilà! reboot and login.
-Probably the first SSH login will be via root password,
-as copy-pasting your public SSH key into the console doesn't work really.
-Make sure the SSH config allows this
-(and turn passwort root access off afterwards).
-
-
-## Encrypting /var/lib/libvirt partition
-
-**Status: tested with Hetzner VPS, not deployed in production yet**
-
-Messing with file systems and partitions
-should not be done by automation scripts,
-so I created the LUKS-encrypted /dev/sdb partition manually.
-
-(So far, /dev/sdb was added via a Hetzner volume,
-but it can be any partition actually)
-
-To create a partition in the VPS volume
-(which was formatted to ext4 originally),
-- I ran `fdisk /dev/sdb`,
-- entered `o` to create a DOS partition table,
-- added `n` to add a new primary partition, using all available space,
-- and `w` to save to disk and exit.
-
-Then I ran `cryptsetup luksFormat /dev/sdb1`
-and entered the passphrase from `pass 0x90/ararat/sdb-crypt`
-to create a LUKS volume.
-
-Now I could decrypt the new volume with
-`cryptsetup luksOpen /dev/sdb1 sdb_crypt`
-and entering the passphrase from `pass 0x90/ararat/sdb-crypt`.
-
-Finally, I ran `mkfs.ext4`
-to create an ext4 file system
-in the encrypted partition.
-
diff --git a/ararat/README.md b/ararat/README.md
new file mode 100644
index 0000000..9a262c1
--- /dev/null
+++ b/ararat/README.md
@@ -0,0 +1,158 @@
+# Server: ararat test VPS
+
+## Usage
+
+For now this server doesn't host any production services.
+
+## Maintainers
+
+- missytake@systemli.org
+
+## Domain Settings
+
+It doesn't have a domain pointing to it yet.
+
+## Hosting
+
+For now, the VPS is hosted in missytake's personal hetzner account.
+Ask them if you need something.
+
+## Deployment
+
+To deploy the server, run
+
+```
+pyinfra --yes inventory.py ararat/deploy.py --limit 95.217.163.200
+```
+
+You also need to run this after every reboot,
+to decrypt the encrypted volume
+and start the libvirt VMs.
+
+## Services
+
+### Service: kvm / libvirt
+
+This is a KVM hypervisor,
+which allows managing VMs with libvirt.
+
+You can use libvirt through the `virsh` command line tool.
+E.g. you can log in via SSH as root
+and run `virsh list` to see running VMs.
+
+#### Monitoring
+
+It doesn't really need monitoring for now.
+
+#### Deployment
+
+The service is part of the pyinfra deploy.py file;
+you can deploy it with
+`pyinfra --yes inventory.py ararat/deploy.py --limit 95.217.163.200`.
+
+#### Upgrade Strategy
+
+As long as it is a test deployment,
+we don't need to upgrade it regularly.
+
+## Users
+
+There is only the root user;
+the SSH keys of missytake, hagi, and vmann are deployed via pyinfra.
+
+## Upgrade Strategy
+
+To upgrade the packages,
+you need to log in via SSH and run `apk update && apk upgrade`.
+
+## Backup and Restore
+
+As long as it is a test deployment,
+we don't need backups.
+
+
+# Changelog
+
+## 2024-12-02 Set up alpine VPS on hetzner
+
+This was only tested with a cloud VPS so far.
+Source:
+(but it's less of a hassle than described there)
+
+To create an alpine server on hetzner,
+you need to first create a Debian VPS or something similar.
+
+Then you boot into the rescue system.
+
+Get the download link of the latest VIRTUAL x86_64 alpine iso
+from .
+
+Log in to the rescue system via console or SSH,
+and write the ISO to the disk:
+
+```
+ssh root@xxxx:xxxx:xxxx:xxxx::1
+wipefs -a /dev/sda
+wget https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-virt-3.20.3-x86_64.iso # or whatever link you got from alpine
+dd if=alpine-virt-3.20.3-x86_64.iso of=/dev/sda
+reboot
+```
+
+Then open the server console (SSH doesn't work),
+log in as root (no password required),
+and proceed with:
+
+```
+cp -r /.modloop /root
+cp -r /media/sda /root
+umount /.modloop /media/sda
+rm /lib/modules
+mv /root/.modloop/modules /lib
+mv /root/sda /media
+setup-alpine
+```
+
+Then select what you wish,
+contrary to the guide above,
+DHCP is actually fine.
+The drive should be sda,
+the installation type can be sys
+(why go through the hassle).
+
+Voilà! Reboot and log in.
+Probably the first SSH login will be via root password,
+as copy-pasting your public SSH key into the console doesn't really work.
+Make sure the SSH config allows this
+(and turn password root access off afterwards).
+
+
+## 2024-12-02 Encrypting /var/lib/libvirt partition
+
+**Status: tested with Hetzner VPS, not deployed in production yet**
+
+Messing with file systems and partitions
+should not be done by automation scripts,
+so I created the LUKS-encrypted /dev/sdb partition manually.
+
+(So far, /dev/sdb was added via a Hetzner volume,
+but it can be any partition actually)
+
+To create a partition in the VPS volume
+(which was formatted to ext4 originally),
+- I ran `fdisk /dev/sdb`,
+- entered `o` to create a DOS partition table,
+- added `n` to add a new primary partition, using all available space,
+- and `w` to save to disk and exit.
+
+Then I ran `cryptsetup luksFormat /dev/sdb1`
+and entered the passphrase from `pass 0x90/ararat/sdb-crypt`
+to create a LUKS volume.
+
+Now I could decrypt the new volume with
+`cryptsetup luksOpen /dev/sdb1 sdb_crypt`
+and entered the passphrase from `pass 0x90/ararat/sdb-crypt`.
+
+Finally, I ran `mkfs.ext4`
+to create an ext4 file system
+in the encrypted partition.
+
-- 
2.43.5
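Reviewer note on the encryption changelog: since the changelog argues that partitioning should stay manual, the sketch below only prints the cryptsetup sequence for review and executes nothing. The helper name `luks_setup_commands` is hypothetical. The `mkfs.ext4` target is an assumption: the changelog does not name the device, and `/dev/mapper/sdb_crypt` is used here because `cryptsetup luksOpen` maps the opened volume under `/dev/mapper/<name>`. The interactive `fdisk` step is left out.

```python
import shlex


def luks_setup_commands(partition="/dev/sdb1", mapper_name="sdb_crypt"):
    """Return the manual LUKS steps as argv lists; nothing is executed here."""
    # Assumption: the file system is created on the opened mapping, not on the raw partition.
    mapped_device = f"/dev/mapper/{mapper_name}"
    return [
        ["cryptsetup", "luksFormat", partition],            # asks for the passphrase
        ["cryptsetup", "luksOpen", partition, mapper_name],  # asks for it again
        ["mkfs.ext4", mapped_device],
    ]


# Print the commands so they can be checked before anyone types them on the server.
for argv in luks_setup_commands():
    print(shlex.join(argv))
```

Keeping this as a printed checklist rather than a `subprocess` call matches the stated preference that automation should not touch file systems and partitions.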