From cf7850ecde0a8963a58ff58763a116adf8f4109c Mon Sep 17 00:00:00 2001
From: missytake
Date: Wed, 4 Dec 2024 14:41:40 +0100
Subject: [PATCH] doc: added social practices & common tools

---
 README.md | 207 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 207 insertions(+)

diff --git a/README.md b/README.md
index 43e7669..ec17d9f 100644
--- a/README.md
+++ b/README.md
@@ -14,6 +14,213 @@ or
 - run `pyinfra --dry inventory.py deploy.py`
   and check that you are on the same state that is already deployed
 
+# social practices
+
+- maintainers: people who know (next to) everything and would be able to learn the rest
+- adepts: people who are still learning about the infrastructure, but don't need to keep everything in mind
+- associates: others, who just need to maintain a certain service
+
+Discussions can happen:
+- in person (gathering), which should happen at least every 3-4 months, to discuss the big picture
+- in person (coworking), while working on new services
+- in issues and PRs for concrete proposals
+- in online calls to fix emergencies
+- in chat groups for exploring ideas and everything else
+
+
+## structure of this repository
+
+this repository documents the current state
+of the infrastructure.
+
+For each server/VM,
+it contains a directory with
+
+- a README.md file which gives an overview of the server
+- a pyinfra inventory.py file
+- a pyinfra deploy.py file which documents what's installed
+- the configuration files pyinfra deploys
+- optional: a deploy-restore.py file which can restore data from backup
+- optional: other pyinfra deploy files which only manage certain services or tasks, like upgrades
+
+The repository also contains a lib/ directory
+with pyinfra packages we reuse across servers.
+
+With pull requests we can propose changes
+to the current infrastructure.
+PRs need to be approved by at least one maintainer.
+The pyinfra code in PRs can already be deployed,
+if it is not destructive - decide responsibly.
+
+
+## create a VM
+
+To add a new VM for a service you want to manage,
+
+0. Check out a new branch with `git checkout -b your-server-name`
+1. Add your VM to inventory.py
+2. Create a directory for the VM
+3. Add your VM to ararat/deploy.py
+4. Ask the core team to run `pyinfra ararat.0x90.space ararat/deploy.py`
+   to create your VM
+5. Write your pyinfra deployment script in your-server-name/deploy.py
+6. Deploy it; if it doesn't work, change it and repeat until the service works
+7. Copy TEMPLATE.md to your-server-name/README.md and fill it out.
+   You can leave out parts which are obvious from your deploy.py file.
+8. Commit your changes, push them to your branch,
+   open a pull request from your branch to the development branch,
+   and ask a maintainer to review and merge it
+
+
+## tools we use
+
+The hope is that you don't need to know all of these tools
+to already do useful things,
+but can systematically dive deeper into the infrastructure.
+
+### pass
+
+password manager to store passphrases and secrets,
+the repository with our secrets
+is at for now.
+
+### ssh
+
+to connect to servers and VMs with root@,
+no sudo,
+root should have a password set,
+but via SSH, password access should be forbidden.
+
+There should be no shared SSH keys,
+one SSH key per person.
+SSH private keys should be password-protected
+and only stored on laptops
+with hard disk encryption.
+
+### systemctl & journalctl
+
+to look at status and log output of services.
+systemd is a good way of keeping services running,
+at least on Linux machines.
+On OpenBSD we will use /etc/rc.d/ scripts.
+
+### git
+
+for updating the documentation,
+pushing and pulling secrets,
+and opening PRs to doku/pyinfra repos.
+
+to be discussed:
+- Keep in mind that PRs can and will be deployed to servers. OR
+- The main branch should always reflect the state of the machine.
+
+### markdown + sembr
+
+for documenting the infrastructure.
+[Semantic line breaks](https://sembr.org/) are great
+for formatting text files
+which are managed in git.
+
+### kvm + virsh
+
+as a hypervisor
+which we can use to create VMs
+for specific services.
+
+The hypervisor is a minimal Alpine Linux,
+with "boot to RAM",
+the data-partition for the VM images is encrypted.
+
+### pyinfra
+
+as a nice declarative config tool for deployment.
+we can also maintain some of the things we need
+in extra python modules.
+
+pyinfra vs. ansible? ~> need to investigate. currently ansible setup on golem, pyinfra used in deltachat and 1 ezra service.
+
+### podman
+
+to isolate services in root-less containers.
+a podman container should run in a systemd process.
+it takes some practice to understand
+how to run commands inside a container
+or where the files are mounted.
+But it goes well with pyinfra
+if it's managed in systemd.
+
+### nftables
+
+as a declarative firewall
+which can be managed in pyinfra.
+
+### nginx
+
+as an HTTPS reverse proxy,
+passing traffic on to the podman containers.
+
+### acmetool
+
+as a tool to manage Let's Encrypt certificates,
+which goes well with pyinfra
+because of its declarative nature.
+
+It also ships acmetool-redirector
+which redirects HTTP traffic on port 80
+to nginx on port 443.
+
+There is a pyinfra package for it at
+https://github.com/deltachat/pyinfra-acmetool/
+
+https://man.openbsd.org/acme-client + https://man.openbsd.org/relayd on OpenBSD
+
+### cron
+
+to schedule recurring tasks,
+like acmetool's certificate renewals
+or the nightly borgbackup runs.
+
+on OpenBSD there is already a daily cron job that executes /etc/daily.local
+
+### borgbackup
+
+can be used to back up application data
+in a nightly cron job.
+
+Backups need to be stored at an extra backup server.
+
+There is a pyinfra package for it at
+https://github.com/deltachat/pyinfra-borgbackup/
+
+might also look at restic ~> append-only backup better restricted
+
+### wireguard
+
+as a VPN to connect the backup server,
+which can be at some private house,
+with the production servers.
+
+### prometheus
+
+as a tool to measure service uptime
+and measure typical errors
+from journalctl output.
+It can expose metrics via HTTPS
+behind basic auth.
+
+### grafana
+
+as a visual dashboard to show service uptime
+and whether services throw errors.
+It can also send out email alerts.
+
+### team-bot
+
+a deltachat bot to receive support requests
+and email alerts from grafana.
+
+
+
 # Set up alpine on hetzner
 
 This was only tested with a cloud VPS so far.