doc: documented ararat test VPS with template

doc: added server documentation template
doc: added social practices & common tools
2024-12-04 15:40:32 +01:00 · 2024-12-04 15:04:31 +01:00 · 2024-12-04 15:04:28 +01:00 · 2024-12-04 14:54:29 +01:00 · 2024-12-04 14:53:32 +01:00 · 2024-12-04 14:53:28 +01:00
16 changed files with 685 additions and 4 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,2 @@
+pyinfra-debug.log
+.venv/
--- a/README.md
+++ b/README.md
@@ -12,3 +12,211 @@ or
 - run `git pull` to fetch the newest version
 - run `pyinfra @local deploy.py` to install/update `0x90.ssh_config` trustmebro
 - run `pyinfra --dry inventory.py deploy.py` and check that you are on the same state that is already deployed
+
+
+# social practices
+
+- maintainers: people who know (next to) everything and would be able to learn the rest
+- adepts: people who are still learning about the infrastructure, but don't need to keep everything in mind
+- associates: others who just need to maintain a certain service
+
+Discussions can happen:
+- in person (gatherings), at least every 3-4 months, to discuss the big picture
+- in person (coworking), while working on new services
+- in issues and PRs for concrete proposals
+- in online calls to fix emergencies
+- in chat groups for exploring ideas and everything else
+
+
+## structure of this repository
+
+This repository documents the current state
+of the infrastructure.
+
+For each server/VM,
+it contains a directory with
+
+- a README.md file which gives an overview of the server
+- a pyinfra inventory.py file
+- a pyinfra deploy.py file which documents what's installed
+- the configuration files pyinfra deploys
+- optional: a deploy-restore.py file which can restore data from backup
+- optional: other pyinfra deploy files which only manage certain services or tasks, like upgrades
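A minimal sketch of a consistency check for this layout, assuming every top-level directory except `lib/` and hidden directories is a server directory. `missing_server_files`, `REQUIRED_FILES`, and `NON_SERVER_DIRS` are hypothetical names, not part of the repository:

```python
from pathlib import Path

# Files each server/VM directory should contain, per the list above.
REQUIRED_FILES = ("README.md", "inventory.py", "deploy.py")
# Top-level directories that are not servers (assumption; extend as needed).
NON_SERVER_DIRS = {"lib"}


def missing_server_files(repo_root: Path) -> dict[str, list[str]]:
    """Map each server directory name to the required files it lacks."""
    missing: dict[str, list[str]] = {}
    for directory in sorted(p for p in repo_root.iterdir() if p.is_dir()):
        # skip shared code and hidden directories such as .git or .venv
        if directory.name in NON_SERVER_DIRS or directory.name.startswith("."):
            continue
        absent = [name for name in REQUIRED_FILES if not (directory / name).is_file()]
        if absent:
            missing[directory.name] = absent
    return missing


if __name__ == "__main__":
    for server, files in missing_server_files(Path(".")).items():
        print(f"{server}: missing {', '.join(files)}")
```

Run from the repository root. In the current tree the inventory lives at the top level, so this check would report `ararat/` as missing its own inventory.py.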
+
+The repository also contains a lib/ directory
+with pyinfra packages we reuse across servers.
+
+With pull requests we can propose changes
+to the current infrastructure.
+PRs need to be approved by at least one maintainer.
+The pyinfra code in PRs can already be deployed
+if it is not destructive;
+decide responsibly.
+
+
+## create a VM
+
+To add a new VM for a service you want to manage,
+
+0. Check out a new branch with `git checkout -b your-server-name`
+1. Add your VM to inventory.py
+2. Create a directory for the VM
+3. Add your VM to ararat/deploy.py
+4. Ask a maintainer to run `pyinfra --yes inventory.py ararat/deploy.py --limit 95.217.163.200`
+   to create your VM
+5. Write your pyinfra deployment script in your-server-name/deploy.py
+6. Deploy it; if it doesn't work, change it and repeat until the service works
+7. Copy TEMPLATE.md to your-server-name/README.md and fill it out.
+   You can leave out parts which are obvious from your deploy.py file.
+8. Commit your changes, push them to your branch,
+   open a pull request from your branch to the development branch,
+   and ask a maintainer to review and merge it
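For steps 1 and 2, a sketch of what a new entry in inventory.py could look like, modeled on the existing `playground` VM. `your-server-name` and `your-pass-name` are placeholders. ararat/deploy.py loops over the `debian_vms` group and reads each listed public key from pass at `0x90/ssh_keys/<name>.pub`:

```python
# in inventory.py: each VM is a (name, host_data) tuple in the debian_vms group
debian_vms = [
    (
        "playground",
        {"authorized_keys": ["missytake", "hagi", "vmann"]},
    ),
    (
        "your-server-name",  # placeholder: same name as your directory
        {
            # placeholder: names of keys stored in pass under 0x90/ssh_keys/<name>.pub
            "authorized_keys": ["your-pass-name"],
        },
    ),
]
```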
+
+
+## tools we use
+
+The hope is that you don't need to know all of these tools
+to already do useful things,
+but can systematically dive deeper into the infrastructure.
+
+### pass
+
+password manager to store passphrases and secrets.
+The repository with our secrets
+is at <https://git.0x90.space/links-tech/pass> for now.
+
+### ssh
+
+to connect to servers and VMs as root@
+(no sudo).
+root should have a password set,
+but password login via SSH should be forbidden.
+
+There should be no shared SSH keys,
+one SSH key per person.
+SSH private keys should be password-protected
+and only stored on laptops
+with hard disk encryption.
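A sketch of how the sshd rules above could be checked against a config file's text, assuming OpenSSH defaults (`PasswordAuthentication yes`, `PermitRootLogin prohibit-password`). `ssh_policy_problems` is a hypothetical helper; it deliberately ignores `Include` files and stops at the first `Match` block:

```python
def ssh_policy_problems(sshd_config: str) -> list[str]:
    """List deviations from "root logs in with keys only" in sshd_config text.

    Simplified on purpose: Include files are not followed, and parsing stops
    at the first Match block, because everything after it is conditional.
    """
    settings: dict[str, str] = {}
    for raw_line in sshd_config.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        value = parts[1].strip().lower() if len(parts) > 1 else ""
        if keyword == "match":
            break
        # like sshd, keep the first value seen for each keyword
        settings.setdefault(keyword, value)

    problems = []
    # OpenSSH defaults: PasswordAuthentication yes, PermitRootLogin prohibit-password
    if settings.get("passwordauthentication", "yes") != "no":
        problems.append("PasswordAuthentication should be 'no'")
    if settings.get("permitrootlogin", "prohibit-password") not in ("prohibit-password", "without-password"):
        problems.append("PermitRootLogin should be 'prohibit-password'")
    return problems
```

For an authoritative answer on a live host, `sshd -T` prints the effective configuration, which could be fed into a check like this one.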
+
+### systemctl & journalctl
+
+to look at status and log output of services.
+systemd is a good way of keeping services running,
+at least on Linux machines.
+On OpenBSD, we will use /etc/rc.d/ scripts.
+
+### git
+
+for updating the documentation,
+pushing and pulling secrets,
+and opening PRs to documentation/pyinfra repos.
+
+to be discussed:
+- Keep in mind that PRs can and will be deployed to servers. OR
+- The main branch should always reflect the state of the machine.
+
+### markdown + sembr
+
+for documenting the infrastructure.
+[Semantic line breaks](https://sembr.org/) are great
+for formatting text files
+which are managed in git.
+
+### kvm + virsh
+
+as a hypervisor
+which we can use to create VMs
+for specific services.
+
+The hypervisor runs a minimal Alpine Linux
+with "boot to RAM";
+the data partition for the VM images is encrypted.
+
+### pyinfra
+
+as a nice declarative config tool for deployment.
+We can also maintain some of the things we need
+in extra Python modules.
+
+pyinfra vs. Ansible? This needs investigation: currently there is an Ansible setup on golem, while pyinfra is used for deltachat and one ezra service.
+
+### podman
+
+to isolate services in rootless containers.
+Each podman container should run as a systemd service.
+It takes some practice to understand
+how to run commands inside a container
+or where the files are mounted.
+But it goes well with pyinfra
+if it's managed in systemd.
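A sketch of a systemd *user* unit for one rootless container, assuming podman lives at `/usr/bin/podman` and the container needs a single published port. `podman_user_unit` and the example image are hypothetical:

```python
import shlex


def podman_user_unit(name: str, image: str, host_port: int, container_port: int) -> str:
    """Render a systemd user unit that runs one rootless podman container.

    The port is published on 127.0.0.1 only, so nginx on the same host
    stays the only way in from outside.
    """
    run = shlex.join([
        "/usr/bin/podman", "run", "--rm", "--name", name,
        "--publish", f"127.0.0.1:{host_port}:{container_port}",
        image,
    ])
    stop = shlex.join(["/usr/bin/podman", "stop", name])
    return "\n".join([
        "[Unit]",
        f"Description=podman container {name}",
        "",
        "[Service]",
        f"ExecStart={run}",
        f"ExecStop={stop}",
        "Restart=on-failure",
        "",
        "[Install]",
        "WantedBy=default.target",
        "",
    ])
```

The rendered text would go to `~/.config/systemd/user/<name>.service` of the unprivileged service user and be enabled with `systemctl --user enable --now <name>`; `loginctl enable-linger <user>` lets it start at boot without a login session.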
+
+### nftables
+
+as a declarative firewall
+which can be managed in pyinfra.
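A sketch of a declarative ruleset that pyinfra could upload with `files.put` or `files.template`. The port list and `render_nftables` are assumptions, not our actual firewall:

```python
def render_nftables(tcp_ports: list[int]) -> str:
    """Render a minimal inet ruleset: drop inbound by default, allow listed TCP ports."""
    ports = ", ".join(str(p) for p in sorted(set(tcp_ports)))
    # "flush ruleset" makes the file the complete firewall state
    return f"""flush ruleset

table inet filter {{
    chain input {{
        type filter hook input priority 0; policy drop;
        ct state established,related accept
        iif "lo" accept
        meta l4proto {{ icmp, ipv6-icmp }} accept
        tcp dport {{ {ports} }} accept
    }}
}}
"""
```

`nft -c -f <file>` checks such a file without applying it. ICMP and ICMPv6 stay allowed because IPv6 depends on them for neighbor discovery.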
+
+### nginx
+
+as an HTTPS reverse proxy,
+passing traffic on to the podman containers.
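A sketch of one HTTPS server block proxying to a container port on localhost. The certificate paths assume acmetool's default `/var/lib/acme/live/<domain>/` layout, and `render_nginx_proxy` is hypothetical. Plain HTTP on port 80 is left to acmetool-redirector, as described in the acmetool section:

```python
def render_nginx_proxy(domain: str, upstream_port: int) -> str:
    """Render an nginx server block that terminates HTTPS and proxies to a local port."""
    # assumption: acmetool keeps live certificates under /var/lib/acme/live/<domain>/
    live = f"/var/lib/acme/live/{domain}"
    return f"""server {{
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name {domain};

    ssl_certificate {live}/fullchain;
    ssl_certificate_key {live}/privkey;

    location / {{
        proxy_pass http://127.0.0.1:{upstream_port};
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }}
}}
"""
```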
+
+### acmetool
+
+as a tool to manage Let's Encrypt certificates,
+which goes well with pyinfra
+because of its declarative nature.
+
+It also ships acmetool-redirector
+which redirects HTTP traffic on port 80
+to nginx on port 443.
+
+There is a pyinfra package for it at
+https://github.com/deltachat/pyinfra-acmetool/
+
+On OpenBSD, we can use <https://man.openbsd.org/acme-client> and <https://man.openbsd.org/relayd> instead.
+
+### cron
+
+to schedule recurring tasks,
+like acmetool's certificate renewals
+or the nightly borgbackup runs.
+
+On OpenBSD, there is already a daily cron job that executes /etc/daily.local.
+
+### borgbackup
+
+can be used to back up application data
+in a nightly cron job.
+
+Backups need to be stored on a separate backup server.
+
+There is a pyinfra package for it at
+https://github.com/deltachat/pyinfra-borgbackup/
+
+We might also look at restic, which might offer better-restricted append-only backups.
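A sketch of the nightly backup commands in borg 1.x syntax, plus a crontab line for a wrapper script. The repository URL, the data path, the retention numbers, and `/usr/local/bin/backup.sh` are all assumptions; the passphrase could come from pass via the `BORG_PASSPHRASE` environment variable:

```python
import shlex


def borg_create(repo: str, paths: list[str]) -> list[str]:
    """Create one archive per run; borg itself expands {hostname} and {now}."""
    return [
        "borg", "create", "--stats", "--compression", "zstd",
        f"{repo}::{{hostname}}-{{now}}",
        *paths,
    ]


def borg_prune(repo: str) -> list[str]:
    """Thin out old archives (example policy: 7 daily, 4 weekly, 6 monthly)."""
    return [
        "borg", "prune", "--list",
        "--keep-daily", "7", "--keep-weekly", "4", "--keep-monthly", "6",
        repo,
    ]


def nightly_cron_line(script: str, hour: int = 3, minute: int = 30) -> str:
    """A crontab line (no user field, as in root's crontab) that runs `script` nightly."""
    return f"{minute} {hour} * * * {shlex.quote(script)}"


if __name__ == "__main__":
    repo = "ssh://borg@10.90.0.1/srv/borg/web"  # hypothetical repository
    print(shlex.join(borg_create(repo, ["/srv/data"])))  # hypothetical data path
    print(shlex.join(borg_prune(repo)))
    print(nightly_cron_line("/usr/local/bin/backup.sh"))
```

`borg prune` without an archive filter considers every archive in the repository, so this sketch assumes one repository per server.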
+
+### wireguard
+
+as a VPN to connect the backup server,
+which can be at some private house,
+with the production servers.
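A sketch of the two wg-quick configs, assuming the backup server at home sits behind NAT, so it dials out to the production server (`Endpoint` plus `PersistentKeepalive`) while the production server only listens. Addresses, port 51820, the hostname, and the key placeholders are assumptions; the private keys would live in pass:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peer:
    public_key: str
    allowed_ips: str
    endpoint: Optional[str] = None   # set only on the side that dials out
    keepalive: Optional[int] = None  # keeps the NAT mapping of a home router open


def render_wg_quick(address: str, private_key: str, peers: list[Peer],
                    listen_port: Optional[int] = None) -> str:
    """Render a wg-quick style config with one [Peer] section per peer."""
    lines = ["[Interface]", f"Address = {address}", f"PrivateKey = {private_key}"]
    if listen_port is not None:
        lines.append(f"ListenPort = {listen_port}")
    for peer in peers:
        lines += ["", "[Peer]", f"PublicKey = {peer.public_key}", f"AllowedIPs = {peer.allowed_ips}"]
        if peer.endpoint is not None:
            lines.append(f"Endpoint = {peer.endpoint}")
        if peer.keepalive is not None:
            lines.append(f"PersistentKeepalive = {peer.keepalive}")
    return "\n".join(lines) + "\n"


# Backup server at home: behind NAT, so it connects out and keeps the tunnel alive.
home = render_wg_quick(
    "10.90.0.1/24", "<home private key from pass>",
    [Peer("<production public key>", "10.90.0.2/32", endpoint="prod.example.org:51820", keepalive=25)],
)
# Production server: public IP, only listens.
prod = render_wg_quick(
    "10.90.0.2/24", "<production private key from pass>",
    [Peer("<home public key>", "10.90.0.1/32")],
    listen_port=51820,
)
```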
+
+### prometheus
+
+as a tool to measure service uptime
+and measure typical errors
+from journalctl output.
+It can expose metrics via HTTPS
+behind basic auth.
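One way to turn journalctl output into a metric, sketched under assumptions: count entries with priority `err` (3) or worse per unit from `journalctl -o json` and emit the Prometheus text format. `journal_error_metrics` and the metric name are hypothetical:

```python
import json
from collections import Counter


def journal_error_metrics(journal_json: str) -> str:
    """Count journal entries with priority err (3) or worse, per systemd unit.

    `journal_json` is the output of `journalctl -o json`: one JSON object per line.
    The result is in the Prometheus text exposition format.
    """
    counts = Counter()
    for line in journal_json.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        # PRIORITY is a string such as "3"; 6 (info) is assumed when missing
        if int(entry.get("PRIORITY", 6)) <= 3:
            counts[entry.get("_SYSTEMD_UNIT", "unknown")] += 1
    lines = [
        "# HELP journal_error_entries Journal entries with priority err or worse.",
        "# TYPE journal_error_entries gauge",
    ]
    for unit, count in sorted(counts.items()):
        lines.append(f'journal_error_entries{{unit="{unit}"}} {count}')
    return "\n".join(lines) + "\n"
```

The output could be written to a `.prom` file for node_exporter's textfile collector, e.g. from a cron job that limits journalctl to a time window with `--since`.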
+
+### grafana
+
+as a visual dashboard to show service uptime
+and whether services throw errors.
+It can also send out email alerts.
+
+### team-bot
+
+a deltachat bot to receive support requests
+and email alerts from grafana.
+
+
--- a/TEMPLATE.md
+++ b/TEMPLATE.md
@@ -0,0 +1,104 @@
+# Server: Server name
+
+## Usage
+
+Who is using this server?
+Who needs the server and will be affected if the server is not working?
+
+## Maintainers
+
+Who to ask about this server?
+
+## Domain Settings
+
+Where are the DNS settings? E.g. with Hetzner or in a DNS zone file.
+How to change DNS settings?
+Which domains and subdomains exist?
+
+## Hosting
+
+Where is the server hosted?
+Add a link to the hosting admin interface, e.g. <https://console.hetzner.cloud/>.
+
+## Services
+
+Which services are running there?
+E.g. there are `www.example.org` and `ci.example.org` services.
+
+### Service: ci.example.org
+
+Each service has a greppable heading starting with `### Service: `.
+
+Which software is the service running? E.g. nginx.
+How was it deployed? E.g. manually or with pyinfra.
+How can the software be managed?
+Where are the admin credentials stored if you need to fix something (e.g. for mailcow)?
+Is there an admin chatgroup (e.g. for mailadm) and how to join it?
+
+#### Monitoring
+
+How to read the logs of the service?
+How are admins notified when the service is down?
+
+#### Deployment
+
+How was the service deployed?
+How to reinstall it?
+
+#### Upgrade Strategy
+
+How is the service upgraded?
+Which commands upgrade it? E.g. where is the upgrade script located, and how is it run?
+If there is an official documentation, put a link to it in this section.
+
+#### Maintainers
+
+Who to ask about the service?
+
+#### Integration
+
+How is the service related to other services running on this or other servers?
+E.g. service `ci.example.org` uses the secret storage `secrets.example.net` and runner `runner.example.com` hosted elsewhere.
+
+### Service: www.example.org
+
+Description similar to the other service.
+
+## Users
+
+Who has access to this server?
+
+Which admin accounts are there?
+Which service accounts are there?
+Which user accounts are there?
+
+## Monitoring
+
+How do we notice if something fails?
+
+Where do the errors show up?
+Where are the logs for the services located? E.g. Postfix logs go to `/var/log/mail.log`.
+
+## Upgrade Strategy
+
+How do we keep the services up to date?
+
+## Backup and Restore
+
+How is the server backed up, and how can the backup be restored?
+
+## Deployment
+
+How to reinstall the server?
+Which settings were selected to create the server? E.g. the operating system image.
+Are there deployment scripts? If so, where are they located and how are they run?
+
+# Changelog
+
+## 2023-05-30 - Created the server
+
+Document the steps taken here.
+
+## 2023-06-10 - Installed nginx
+
+...
--- a/ararat/README.md
+++ b/ararat/README.md
@@ -0,0 +1,158 @@
+# Server: ararat test VPS
+
+## Usage
+
+For now this server doesn't host any production services.
+
+## Maintainers
+
+- missytake@systemli.org
+
+## Domain Settings
+
+It doesn't have a domain pointing to it yet.
+
+## Hosting
+
+For now, the VPS is hosted in missytake's personal Hetzner account.
+Ask them if you need something.
+
+## Deployment
+
+To deploy the server, run
+
+```
+pyinfra --yes inventory.py ararat/deploy.py --limit 95.217.163.200
+```
+
+You also need to run this after every reboot,
+to decrypt the encrypted volume
+and start the libvirt VMs.
+
+## Services
+
+### Service: kvm / libvirt
+
+This is a KVM hypervisor,
+which allows managing VMs with libvirt.
+
+You can use libvirt through the `virsh` command line tool.
+For example, you can log in via SSH as root
+and run `virsh list` to see running VMs.
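If you want to script around `virsh`, a sketch of parsing `virsh list --all` output. The sample table in the usage below is illustrative (column widths vary), and `parse_virsh_list` is hypothetical; for just the names, `virsh list --name` is simpler:

```python
def parse_virsh_list(output: str) -> list[tuple[str, str, str]]:
    """Parse `virsh list --all` table output into (id, name, state) tuples.

    Inactive domains show "-" as id, and a state may contain a space ("shut off").
    """
    domains = []
    for line in output.splitlines():
        row = line.strip()
        # skip blank lines, the header, and the dashed separator
        if not row or row.startswith("Id ") or set(row) == {"-"}:
            continue
        dom_id, name, state = row.split(None, 2)
        domains.append((dom_id, name, state))
    return domains


sample = (
    " Id   Name         State\n"
    "-----------------------------\n"
    " 1    playground   running\n"
    " -    cloud        shut off\n"
)
print(parse_virsh_list(sample))
```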
+
+#### Monitoring
+
+It doesn't really need monitoring for now.
+
+#### Deployment
+
+The service is part of the pyinfra deploy.py file;
+you can deploy it with
+`pyinfra --yes inventory.py ararat/deploy.py --limit 95.217.163.200`.
+
+#### Upgrade Strategy
+
+As long as it is a test deployment,
+we don't need to upgrade it regularly.
+
+## Users
+
+There is only the root user;
+the SSH keys of missytake, hagi, and vmann are deployed via pyinfra.
+
+## Upgrade Strategy
+
+To upgrade the packages,
+you need to log in via SSH and run `apk update && apk upgrade`.
+
+## Backup and Restore
+
+As long as it is a test deployment,
+we don't need backups.
+
+
+# Changelog
+
+## 2024-12-02 Set up Alpine VPS on Hetzner
+
+This was only tested with a cloud VPS so far.
+Source: <https://gist.github.com/c0m4r/e38d41d0e31f6adda4b4c5a88ba0a453>
+(but it's less of a hassle than described there)
+
+To create an Alpine server on Hetzner,
+you first need to create a Debian VPS or something similar.
+
+Then you boot into the rescue system.
+
+Get the download link of the latest VIRTUAL x86_64 Alpine ISO
+from <https://alpinelinux.org/downloads/>.
+
+Log in to the rescue system via console or SSH,
+and write the ISO to the disk:
+
+```
+ssh root@xxxx:xxxx:xxxx:xxxx::1
+wipefs -a /dev/sda
+wget https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-virt-3.20.3-x86_64.iso  # or whatever link you got from alpine
+dd if=alpine-virt-3.20.3-x86_64.iso of=/dev/sda
+reboot
+```
+
+Then open the server console (SSH doesn't work),
+log in as root (no password required),
+and proceed with:
+
+```
+cp -r /.modloop /root
+cp -r /media/sda /root
+umount /.modloop /media/sda
+rm /lib/modules
+mv /root/.modloop/modules /lib
+mv /root/sda /media
+setup-alpine
+```
+
+Then select the options you want.
+Contrary to the guide above,
+DHCP is actually fine.
+The drive should be sda,
+and the installation type can be sys
+(why go through the hassle).
+
+Voilà! Reboot and log in.
+The first SSH login will probably be via root password,
+as copy-pasting your public SSH key into the console doesn't really work.
+Make sure the SSH config allows this
+(and turn password login for root off afterwards).
+
+
+## 2024-12-02 Encrypting /var/lib/libvirt partition
+
+**Status: tested with Hetzner VPS, not deployed in production yet**
+
+Messing with file systems and partitions
+should not be done by automation scripts,
+so I created the LUKS-encrypted /dev/sdb partition manually.
+
+(So far, /dev/sdb was added via a Hetzner volume,
+but it can be any partition actually)
+
+To create a partition in the VPS volume
+(which was formatted to ext4 originally),
+- I ran `fdisk /dev/sdb`,
+- entered `o` to create a DOS partition table,
+- entered `n` to add a new primary partition, using all available space,
+- and `w` to save to disk and exit.
+
+Then I ran `cryptsetup luksFormat /dev/sdb1`
+and entered the passphrase from `pass 0x90/ararat/sdb-crypt`
+to create a LUKS volume.
+
+Now I could decrypt the new volume with
+`cryptsetup luksOpen /dev/sdb1 sdb_crypt`
+and entering the passphrase from `pass 0x90/ararat/sdb-crypt`.
+
+Finally, I ran `mkfs.ext4 /dev/mapper/sdb_crypt`
+to create an ext4 file system
+in the encrypted partition.
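ararat/deploy.py repeats the `luksOpen` and `mount` steps after every reboot. A sketch of building those two shell commands so that passphrases containing quotes or spaces survive the shell. The helper name is hypothetical; the defaults are the device, mapper name, and mount point used in this setup:

```python
import shlex


def luks_open_and_mount(passphrase: str, device: str = "/dev/sdb1",
                        mapper_name: str = "sdb_crypt",
                        mountpoint: str = "/var/lib/libvirt") -> list[str]:
    """Shell commands that unlock a LUKS partition and mount its file system."""
    return [
        # printf '%s' prints the passphrase without a trailing newline;
        # --key-file - makes cryptsetup read it from stdin
        f"printf '%s' {shlex.quote(passphrase)} | cryptsetup luksOpen --key-file - "
        f"{shlex.quote(device)} {shlex.quote(mapper_name)}",
        f"mount {shlex.quote('/dev/mapper/' + mapper_name)} {shlex.quote(mountpoint)}",
    ]
```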
+
--- a/ararat/deploy.py
+++ b/ararat/deploy.py
@@ -0,0 +1,100 @@
+import os
+import shlex
+
+from pyinfra import host, inventory
+from pyinfra.operations import server, apk, files, openrc
+from pyinfra.facts.server import Mounts
+
+from pyinfra_util import get_pass
+
+
+files.replace(
+    name="Enable TCP forwarding via SSH server",
+    path="/etc/ssh/sshd_config",
+    text="AllowTcpForwarding no",
+    replace="AllowTcpForwarding yes",
+)
+openrc.service(
+    name="Restart sshd",
+    service="sshd",
+    restarted=True,
+)
+
+files.replace(
+    name="Enable community repository",
+    path="/etc/apk/repositories",
+    text="#http://dl-cdn.alpinelinux.org/alpine/v3.20/community",
+    replace="http://dl-cdn.alpinelinux.org/alpine/v3.20/community",
+)
+apk.update()
+apk.packages(
+    packages=["cryptsetup", "vim"]
+)
+
+mounts = host.get_fact(Mounts)
+if "/var/lib/libvirt" not in mounts:
+    decryption_password = get_pass('0x90/ararat/sdb-crypt').strip()
+    if decryption_password:
+        server.shell(
+            name="Decrypt and mount /var/lib/libvirt",
+            commands=[
+                # shlex.quote keeps passphrases with quotes or spaces intact;
+                # printf avoids echo interpreting a leading "-" as an option
+                f"printf '%s' {shlex.quote(decryption_password)} | cryptsetup luksOpen --key-file - /dev/sdb1 sdb_crypt || true",
+                "mount /dev/mapper/sdb_crypt /var/lib/libvirt",
+            ]
+        )
+
+apk.packages(
+    packages=["libvirt-daemon", "qemu-img", "qemu-system-x86_64", "virt-install"]
+)
+openrc.service(
+    name="Start libvirtd",
+    service="libvirtd",
+    running=True,
+    # not enabled at boot: /var/lib/libvirt is only decrypted and mounted by a deploy run
+    enabled=False,
+)
+
+# add networking: https://wiki.alpinelinux.org/wiki/KVM#Networking
+# modprobe tun
+# echo "tun" >> /etc/modules-load.d/tun.conf
+# cat /etc/modules | grep tun || echo tun >> /etc/modules
+
+# if it doesn't exist, create debian base image (later: and other base images): https://mop.koeln/blog/creating-a-local-debian-vm-using-cloud-init-and-libvirt/#download-the-image
+# for every active VM, if no image exists, run virt-install with the chosen base image and their cloud-init.yml file: https://mop.koeln/blog/creating-a-local-debian-vm-using-cloud-init-and-libvirt/#preparing-a-cloud-init-file
+debian_image_path = "/var/lib/libvirt/images/debian-12-generic-amd64.qcow2"
+files.download(
+    name="Download Debian 12 base image",
+    src="https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-generic-amd64.qcow2",
+    dest=debian_image_path,
+)
+for vm in inventory.groups.get("debian_vms"):
+    if os.path.isfile(f"{vm}/files/cloud-init.yml"):
+        files.put(
+            name=f"Upload {vm}-cloud-init.yml",
+            src=f"{vm}/files/cloud-init.yml",
+            dest=f"/root/{vm}-cloud-init.yml",
+        )
+        # TODO: run virt-install for VMs that ship their own cloud-init.yml
+    else:
+        if vm.data.get("authorized_keys"):
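+            keys = [get_pass(f"0x90/ssh_keys/{admin}.pub").strip() for admin in vm.data.get("authorized_keys")]
+            # one "- <key>" item per line, indented like the template's users entries
+            authorized_keys = "ssh_authorized_keys:\n" + "\n".join(f"    - {key}" for key in keys)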
+        else:
+            authorized_keys = ""
+        files.template(
+            name=f"Upload {vm}-cloud-init.yml",
+            src="ararat/files/cloud-init.yml.j2",
+            dest=f"/root/{vm}-cloud-init.yml",
+            ssh_authorized_keys=authorized_keys,
+        )
+        memory = 1024
+        vcpus = 1
+        disk_size = 4
+        server.shell(
+            name=f"virt-install {vm}",
+            commands=[
+                f"virt-install --name {vm} --disk=size={disk_size},backing_store={debian_image_path} "
+                f"--memory {memory} --vcpus {vcpus} --cloud-init user-data=/root/{vm}-cloud-init.yml,disable=on "
+                "--network bridge=virbr0 --osinfo=debian12 || true",
+            ]
+        )
+    # for every active VM, make sure an IP is assigned and traffic is passed to it
--- a/ararat/files/cloud-init.yml.j2
+++ b/ararat/files/cloud-init.yml.j2
@@ -0,0 +1,25 @@
+#cloud-config
+
+keyboard:
+  layout: de
+  variant: nodeadkeys
+
+locale: en_US
+
+timezone: UTC
+
+disable_root: false
+
+users:
+  - name: root
+    shell: /bin/bash
+    {{ ssh_authorized_keys }}
+  - name: mop
+    # so our user can just sudo without any password
+    sudo: ALL=(ALL) NOPASSWD:ALL
+    shell: /bin/bash
+    # public key from your host system, e.g. $HOME/.ssh/id_ed25519.pub
+    ssh_authorized_keys:
+    - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKZYJ91RLXRCQ4ZmdW6ucIltzukQ/k+lDOqlRIYwxNRv missytake@systemli.org
+
+# Examples: https://cloudinit.readthedocs.io/en/latest/reference/examples_library.html#examples-library
--- a/inventory.py
+++ b/inventory.py
@@ -1,5 +1,13 @@
-targets = [
-    "@local",
-    ("ararat.0x90.space", dict(ssh_port=42022)),
-    ("baixun.0x90.space", dict(ssh_port=42023)),
+localhost = "@local"
+
+hypervisor = [("95.217.163.200", dict(ssh_user="root"))]
+
+debian_vms = [
+#    "cloud",
+    (
+        "playground",
+        {
+            "authorized_keys": ["missytake", "hagi", "vmann"],
+        }
+    ),
 ]
--- a/lib/pyinfra-util/pyinfra_util.egg-info/PKG-INFO
+++ b/lib/pyinfra-util/pyinfra_util.egg-info/PKG-INFO
@@ -0,0 +1,3 @@
+Metadata-Version: 2.1
+Name: pyinfra-util
+Version: 0.1
--- a/lib/pyinfra-util/pyinfra_util.egg-info/SOURCES.txt
+++ b/lib/pyinfra-util/pyinfra_util.egg-info/SOURCES.txt
@@ -0,0 +1,7 @@
+pyproject.toml
+pyinfra_util/__init__.py
+pyinfra_util/util.py
+pyinfra_util.egg-info/PKG-INFO
+pyinfra_util.egg-info/SOURCES.txt
+pyinfra_util.egg-info/dependency_links.txt
+pyinfra_util.egg-info/top_level.txt
--- a/lib/pyinfra-util/pyinfra_util.egg-info/dependency_links.txt
+++ b/lib/pyinfra-util/pyinfra_util.egg-info/dependency_links.txt
@@ -0,0 +1 @@
+
--- a/lib/pyinfra-util/pyinfra_util.egg-info/top_level.txt
+++ b/lib/pyinfra-util/pyinfra_util.egg-info/top_level.txt
@@ -0,0 +1 @@
+pyinfra_util
--- a/lib/pyinfra-util/pyinfra_util/__init__.py
+++ b/lib/pyinfra-util/pyinfra_util/__init__.py
@@ -0,0 +1 @@
+from .util import get_pass, deploy_tmux
--- a/lib/pyinfra-util/pyinfra_util/__pycache__/__init__.cpython-310.pyc
+++ b/lib/pyinfra-util/pyinfra_util/__pycache__/__init__.cpython-310.pyc
--- a/lib/pyinfra-util/pyinfra_util/__pycache__/util.cpython-310.pyc
+++ b/lib/pyinfra-util/pyinfra_util/__pycache__/util.cpython-310.pyc
--- a/lib/pyinfra-util/pyinfra_util/util.py
+++ b/lib/pyinfra-util/pyinfra_util/util.py
@@ -0,0 +1,56 @@
+"""
+Shared pyinfra helpers: reading secrets from pass and deploying tmux.
+"""
+import subprocess
+import sys
+
+from pyinfra.operations import files, apt
+
+
+def get_pass(filename: str) -> str:
+    """Get the data from the password manager."""
+    try:
+        r = subprocess.run(["pass", "show", filename], capture_output=True)
+    except FileNotFoundError:
+        readme_url = "https://git.0x90.space/links-tech/pass"
+        print(f"Please install pass and pull the latest version of our pass secrets from {readme_url}")
+        sys.exit(1)
+    return r.stdout.decode('utf-8')
+
+
+def deploy_tmux(home_dir="/root", escape_key="C-b", additional_config=()):
+    apt.packages(
+        name="apt install tmux",
+        packages=["tmux"],
+    )
+
+    config = [
+        f"set-option -g prefix {escape_key}",
+        "set-option -g aggressive-resize on",
+        "set-option -g mouse on",
+        "set-option -g set-titles on",
+        "set-option -g set-titles-string '#I:#W - \"#H\"'",
+        "unbind-key C-b",
+        "bind-key ` send-prefix",
+        "bind-key a last-window",
+        "bind-key k kill-session",
+    ]
+    for item in additional_config:
+        config.append(item)
+    for line in config:
+        files.line(
+            path=f"{home_dir}/.tmux.conf",
+            line=line,
+        )
+
+    dot_profile_add = """
+# autostart tmux
+if [ -t 0 -a -z "$TMUX" ]
+then
+  test -z "$(tmux list-sessions)" && exec tmux new -s "$USER" || exec tmux new -A -s $(tty | tail -c +6) -t "$USER"
+fi
+"""
+    files.block(
+        name="connect to tmux session on login",
+        path=f"{home_dir}/.profile",
+        content=dot_profile_add,
+        try_prevent_shell_expansion=True,
+    )
--- a/lib/pyinfra-util/pyproject.toml
+++ b/lib/pyinfra-util/pyproject.toml
@@ -0,0 +1,7 @@
+[build-system]
+requires = ["setuptools>=45"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "pyinfra-util"
+version = "0.1"
Author	SHA1	Message	Date
missytake	cb6676ab65	doc: documented ararat test VPS with template	2024-12-04 15:40:32 +01:00
missytake	6f8c765f00	doc: added server documentation template	2024-12-04 15:04:31 +01:00
missytake	cf7850ecde	doc: added social practices & common tools	2024-12-04 15:04:28 +01:00
missytake	cc666a832b	added gitignore	2024-12-04 14:54:29 +01:00
missytake	67f06ce24d	ararat: Set up kvm, initiate debian VM	2024-12-04 14:53:32 +01:00
missytake	50e0547e23	util: added get_pass function	2024-12-04 14:53:28 +01:00
missytake	22870b7fd2	Created an alpine hypervisor VPS and playground VM	2024-12-04 14:51:50 +01:00
missytake	c6dc45b724	ararat: rough structure of deploy.py	2024-12-04 14:51:18 +01:00
missytake	aebff70524	doc: documented encrypting /var/lib/libvirt on a VPS	2024-12-04 14:50:49 +01:00
missytake	43ac0f3ac2	doc: documented alpine installation	2024-12-04 14:50:27 +01:00