A proposal for social practices, preferred tools, and documentation #2

Open
missytake wants to merge 3 commits from social-practices into kvm-base
3 changed files with 456 additions and 71 deletions

265
README.md

@ -14,86 +14,209 @@ or
- run `pyinfra --dry inventory.py deploy.py` and check that you are on the same state that is already deployed
# social practices
- maintainers: people who know (next to) everything and would be able to learn the rest
- adepts: people who are still learning about the infrastructure, but don't need to keep everything in mind
- associates: others, who just need to maintain a certain service
Discussions can happen:
- in person (gathering), at least every 3-4 months, to discuss the big picture
- in person (coworking), while working on new services
- in issues and PRs for concrete proposals
- in online calls to fix emergencies
- in chat groups for exploring ideas and everything else
## structure of this repository
This repository documents the current state
of the infrastructure.
For each server/VM,
it contains a directory with
- a README.md file which gives an overview on the server
- a pyinfra inventory.py file
- a pyinfra deploy.py file which documents what's installed
- the configuration files pyinfra deploys
- optional: a deploy-restore.py file which can restore data from backup
- optional: other pyinfra deploy files which only manage certain services or tasks, like upgrades
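The layout above can be checked mechanically.
The following is a minimal sketch, not something that exists in the repository:
the function name `check_server_dir` is hypothetical,
and it follows the list as written,
testing only for the three non-optional files.

```shell
#!/bin/sh
# Hypothetical helper: report which of the required files are missing
# from a server directory. Optional files (deploy-restore.py etc.)
# are not checked.
check_server_dir() {
    dir=$1
    status=0
    for f in README.md inventory.py deploy.py; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            status=1
        fi
    done
    return $status
}
```

For example, `check_server_dir ararat` prints one line per missing file
and returns a non-zero status if anything is missing.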
The repository also contains a lib/ directory
with pyinfra packages we reuse across servers.
With pull requests we can propose changes
to the current infrastructure.
PRs need to be approved by at least one maintainer.
The pyinfra code in PRs can already be deployed,
if it is not destructive - decide responsibly.
## create a VM
To add a new VM for a service you want to manage,
0. Check out a new branch with `git checkout -b your-server-name`
1. Add your VM to inventory.py
2. Create a directory for the VM
3. Add your VM to ararat/deploy.py
4. Ask the core team to run `pyinfra ararat.0x90.space ararat/deploy.py`
to create your VM
5. Write your pyinfra deployment script in your-server-name/deploy.py
6. Deploy it, if it doesn't work change it, repeat until the service works
7. Copy TEMPLATE.md to your-server-name/README.md and fill it out.
You can leave out parts which are obvious from your deploy.py file.
8. Commit your changes, push them to your branch,
open a pull request from your branch to the development branch,
and ask a maintainer to review and merge it
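Steps 2, 5, and 7 mostly create files,
so they can be sketched as a small helper.
This is only an illustration under stated assumptions:
the function name `scaffold_vm_dir` is hypothetical,
it has to be run from the repository root,
and it creates an empty deploy.py for you to fill in.

```shell
#!/bin/sh
# Hypothetical helper covering steps 2, 5 and 7: create the VM directory,
# an empty deploy.py, and a README.md copied from TEMPLATE.md.
# Existing files are left untouched.
scaffold_vm_dir() {
    name=$1
    if [ ! -f TEMPLATE.md ]; then
        echo "TEMPLATE.md not found; run this from the repository root" >&2
        return 1
    fi
    mkdir -p "$name"
    [ -f "$name/deploy.py" ] || : > "$name/deploy.py"
    [ -f "$name/README.md" ] || cp TEMPLATE.md "$name/README.md"
}
```

A typical use would be `git checkout -b your-server-name`
followed by `scaffold_vm_dir your-server-name`.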
## tools we use
The hope is that you don't need to know all of these tools
to already do useful things,
but can systematically dive deeper into the infrastructure.
### pass
password manager to store passphrases and secrets,
the repository with our secrets
is at <https://git.0x90.space/links-tech/pass> for now.
### ssh
to connect to servers and VMs as root@,
without sudo.
root should have a password set,
but password login via SSH should be forbidden.
There should be no shared SSH keys,
one SSH key per person.
SSH private keys should be password-protected
and only stored on laptops
with hard disk encryption.
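Whether password login is actually forbidden can be checked roughly from the config file.
The sketch below is an assumption about how one might do that, not an existing tool:
the function name `sshd_forbids_passwords` is hypothetical,
and it ignores `Match` blocks and `Include` files.

```shell
#!/bin/sh
# Rough check: does the given sshd_config set "PasswordAuthentication no"?
# sshd uses the first value it finds for a keyword, so only the first
# matching, uncommented line is inspected.
sshd_forbids_passwords() {
    config=${1:-/etc/ssh/sshd_config}
    value=$(grep -Ei '^[[:space:]]*PasswordAuthentication[[:space:]]' "$config" \
        | head -n 1 | awk '{ print tolower($2) }')
    [ "$value" = "no" ]
}
```

On the server itself, `sshd -T` (run as root) prints the effective configuration
and is the more reliable source.
A personal, passphrase-protected key can be created with `ssh-keygen -t ed25519`,
which asks for the passphrase interactively.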
### systemctl & journalctl
to look at status and log output of services.
systemd is a good way of keeping services running,
at least on Linux machines.
On OpenBSD we will use /etc/rc.d/ scripts.
### git
for updating the documentation,
pushing and pulling secrets,
and opening PRs to docs/pyinfra repos.
to be discussed:
- Keep in mind that PRs can and will be deployed to servers. OR
- The main branch should always reflect the state of the machine.
### markdown + sembr
for documenting the infrastructure.
[Semantic line breaks](https://sembr.org/) are great
for formatting text files
which are managed in git.
### kvm + virsh
as a hypervisor
which we can use to create VMs
for specific services.
The hypervisor is a minimal alpine linux,
with "boot to RAM",
the data-partition for the VM images is encrypted.
### pyinfra
as a nice declarative config tool for deployment.
we can also maintain some of the things we need
in extra python modules.
pyinfra vs. ansible? ~> needs investigation. Currently, ansible is set up on golem; pyinfra is used in deltachat and one ezra service.
### podman
to isolate services in root-less containers.
a podman container should run in a systemd process.
it takes some practice to understand
how to run commands inside a container
or where the files are mounted.
But it goes well with pyinfra
if it's managed in systemd.
### nftables
as a declarative firewall
which can be managed in pyinfra.
### nginx
as an HTTPS reverse proxy,
passing traffic on to the podman containers.
### acmetool
as a tool to manage Let's Encrypt certificates,
which goes well with pyinfra
because of its declarative nature.
It also ships acmetool-redirector
which redirects HTTP traffic on port 80
to nginx on port 443.
There is a pyinfra package for it at
https://github.com/deltachat/pyinfra-acmetool/
On OpenBSD, see https://man.openbsd.org/acme-client + https://man.openbsd.org/relayd instead.
### cron
to schedule recurring tasks,
like acmetool's certificate renewals
or the nightly borgbackup runs.
On OpenBSD, there is already a daily cron job that executes /etc/daily.local.
### borgbackup
can be used to back up application data
in a nightly cron job.
Backups need to be stored at an extra backup server.
There is a pyinfra package for it at
https://github.com/deltachat/pyinfra-borgbackup/
We might also look at restic ~> append-only backups can be restricted better.
### wireguard
as a VPN to connect the backup server,
which can be at some private house,
with the production servers.
### prometheus
as a tool to measure service uptime
and measure typical errors
from journalctl output.
It can expose metrics via HTTPS
behind basic auth.
### grafana
as a visual dashboard to show service uptime
and whether services throw errors.
It can also send out email alerts.
### team-bot
a deltachat bot to receive support requests
and email alerts from grafana.
Review

I would also mention nix, simply because some people in the space prefer it over pyinfra. But then one of the NixOS people can write that part :)


104
TEMPLATE.md Normal file

@ -0,0 +1,104 @@
# Server: Server name
## Usage
Who is using this server?
Who needs the server and will be affected if the server is not working?
## Maintainers
Who to ask about this server?
## Domain Settings
Where are the DNS settings? E.g. with Hetzner or in a DNS zone file.
How to change DNS settings?
Which domains and subdomains exist?
## Hosting
Where is the server hosted?
Add a link to the hosting admin interface, e.g. <https://console.hetzner.cloud/>.
## Services
Which services are running there?
E.g. there are `www.example.org` and `ci.example.org` services.
### Service: ci.example.org
Each service has a greppable heading starting with `### Service: `.
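To illustrate what the greppable heading buys us,
here is a minimal sketch that lists all documented services.
The function name `list_services` is hypothetical;
it relies only on the `### Service: ` convention described above.

```shell
#!/bin/sh
# List every service documented in the given README files, relying on
# the "### Service: " heading convention. The prefix is stripped so
# only the service names are printed, one per line.
list_services() {
    grep -h '^### Service: ' "$@" | sed 's/^### Service: //'
}
```

From the repository root, `list_services */README.md` would print the services of all servers.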
Which software is the service running? E.g. nginx.
How was it deployed? E.g. manually or with pyinfra.
How can the software be managed?
Where are the admin credentials stored if you need to fix something (e.g. for mailcow)?
Is there an admin chatgroup (e.g. for mailadm) and how to join it?
#### Monitoring
How to read the logs of the service?
How are admins notified when the service is down?
#### Deployment
How was the service deployed?
How to reinstall it?
#### Upgrade Strategy
How is the service upgraded?
Which commands to run to upgrade it, e.g. where is the upgrade script located and how to run it?
If there is official documentation, put a link to it in this section.
#### Maintainers
Who to ask about the service?
#### Integration
How is the service related to other services running on this or other servers?
E.g. service `ci.example.org` uses the secret storage `secrets.example.net` and runner `runner.example.com` hosted elsewhere.
### Service: www.example.org
Description similar to the other service.
## Users
Who has access to this server?
Which admin accounts are there?
Which service accounts are there?
Which user accounts are there?
## Monitoring
How do we notice if something fails?
Where do the errors show up?
Where are the logs for the services located? E.g. Postfix logs go to `/var/log/mail.log`.
## Upgrade Strategy
How do we keep the services up to date?
## Backup and Restore
How is the server backed up and how to restore the backup?
## Deployment
How to reinstall the server?
Which settings were selected to create the server? E.g. the operating system image.
Are there deployment scripts, and if so, where are they located and how to run them?
# Changelog
## 2023-05-30 - Created the server
Document the steps taken here.
## 2023-06-10 - Installed nginx
...

158
ararat/README.md Normal file

@ -0,0 +1,158 @@
# Server: ararat test VPS
## Usage
For now this server doesn't host any production services.
## Maintainers
- missytake@systemli.org
## Domain Settings
It doesn't have a domain pointing to it yet.
## Hosting
For now, the VPS is hosted in missytake's personal hetzner account.
Ask them if you need something.
## Deployment
To deploy the server, run
```
pyinfra --yes inventory.py ararat/deploy.py --limit 95.217.163.200
```
You also need to run this after every reboot,
to decrypt the encrypted volume
and start the libvirt VMs.
## Services
### Service: kvm / libvirt
This is a KVM hypervisor,
which allows managing VMs with libvirt.
You can use libvirt through the `virsh` command line tool.
E.g. you can log in via SSH as root
and run `virsh list` to see running VMs.
#### Monitoring
It doesn't really need monitoring for now.
#### Deployment
The service is part of the pyinfra deploy.py file;
you can deploy it with
`pyinfra --yes inventory.py ararat/deploy.py --limit 95.217.163.200`.
#### Upgrade Strategy
As long as it is a test deployment,
we don't need to upgrade it regularly.
## Users
There is only the root user,
the SSH keys of missytake, hagi, and vmann are deployed via pyinfra.
## Upgrade Strategy
To upgrade the packages,
you need to log in via SSH and run `apk update && apk upgrade`.
## Backup and Restore
As long as it is a test deployment,
we don't need backups.
# Changelog
## 2024-12-02 Set up alpine VPS on hetzner
This was only tested with a cloud VPS so far.
Source: <https://gist.github.com/c0m4r/e38d41d0e31f6adda4b4c5a88ba0a453>
(but it's less of a hassle than described there)
To create an alpine server on hetzner,
you need to first create a Debian VPS or something similar.
Then you boot into the rescue system.
Get the download link of the latest VIRTUAL x86_64 alpine iso
from <https://alpinelinux.org/downloads/>.
Log in to the rescue system via console or SSH,
and write the ISO to the disk:
```
ssh root@xxxx:xxxx:xxxx:xxxx::1
wipefs -a /dev/sda
wget https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-virt-3.20.3-x86_64.iso # or whatever link you got from alpine
dd if=alpine-virt-3.20.3-x86_64.iso of=/dev/sda
reboot
```
Then open the server console (SSH doesn't work),
log in as root (no password required),
and proceed with:
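```
# the boot medium and the modloop live on /dev/sda; copy them into RAM first
cp -r /.modloop /root
cp -r /media/sda /root
# unmount both to free /dev/sda for the installer
umount /.modloop /media/sda
# /lib/modules pointed into the modloop; replace it with the copy
rm /lib/modules
mv /root/.modloop/modules /lib
# put the copied boot medium back so its contents stay available
mv /root/sda /media
setup-alpine
```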
Then select what you wish;
contrary to the guide above,
DHCP is actually fine.
The drive should be sda,
the installation type can be sys
(why go through the hassle).
Voilà! Reboot and log in.
The first SSH login will probably be via root password,
as copy-pasting your public SSH key into the console doesn't really work.
Make sure the SSH config allows this
(and turn password root access off afterwards).
## 2024-12-02 Encrypting /var/lib/libvirt partition
**Status: tested with Hetzner VPS, not deployed in production yet**
Messing with file systems and partitions
should not be done by automation scripts,
so I created the LUKS-encrypted /dev/sdb partition manually.
(So far, /dev/sdb was added via a Hetzner volume,
but it can be any partition actually)
To create a partition in the VPS volume
(which was formatted to ext4 originally),
- I ran `fdisk /dev/sdb`,
- entered `o` to create a DOS partition table,
- entered `n` to add a new primary partition, using all available space,
- and entered `w` to save to disk and exit.
Then I ran `cryptsetup luksFormat /dev/sdb1`
and entered the passphrase from `pass 0x90/ararat/sdb-crypt`
to create a LUKS volume.
Now I could decrypt the new volume
by running `cryptsetup luksOpen /dev/sdb1 sdb_crypt`
and entering the passphrase from `pass 0x90/ararat/sdb-crypt`.
Finally, I ran `mkfs.ext4`
on the opened volume (presumably `/dev/mapper/sdb_crypt`)
to create an ext4 file system
in the encrypted partition.
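For reference, the steps of this entry can be summarized as a command list.
This is a sketch, not a script that was used:
the function name `luks_setup_commands` is hypothetical,
it only prints the commands instead of running them
(in line with not automating partitioning),
and the `mkfs.ext4` target assumes the mapping shows up under `/dev/mapper/`.

```shell
#!/bin/sh
# Print the commands from this changelog entry for a given disk and
# mapping name. Nothing is executed: the steps are destructive and
# meant to be typed by hand. Assumes sdX-style naming, where the first
# partition of /dev/sdb is /dev/sdb1.
luks_setup_commands() {
    disk=$1      # e.g. /dev/sdb
    mapper=$2    # e.g. sdb_crypt
    echo "fdisk $disk    # enter o, then n (accept defaults), then w"
    echo "cryptsetup luksFormat ${disk}1"
    echo "cryptsetup luksOpen ${disk}1 $mapper"
    echo "mkfs.ext4 /dev/mapper/$mapper"
}
```

Calling `luks_setup_commands /dev/sdb sdb_crypt` prints the four commands in the order they were run here.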