A proposal for social practices, preferred tools, and documentation #2
265
README.md
|
@ -14,86 +14,209 @@ or
|
||||||
- run `pyinfra --dry inventory.py deploy.py` and check that you are on the same state that is already deployed
|
|
||||||
|
|
||||||
|
|
||||||
# Set up alpine on hetzner
|
# social practices
|
||||||
|
|
||||||
This was only tested with a cloud VPS so far.
|
maintainers: people who know (next to) everything and would be able to learn the rest
|
||||||
Source: <https://gist.github.com/c0m4r/e38d41d0e31f6adda4b4c5a88ba0a453>
|
adepts: people who are still learning about the infrastructure, but don't need to keep everything in mind
|
||||||
(but it's less of a hassle than described there)
|
associates: others, who just need to maintain a certain service
|
||||||
|
|
||||||
To create an alpine server on hetzner,
|
Discussions can happen:
|
||||||
you need to first create a Debian VPS or something similar.
|
- in person (gathering); this should happen at least every 3-4 months, to discuss the big picture
|
||||||
|
- in person (coworking), while working on new services
|
||||||
Then you boot into the rescue system.
|
- in issues and PRs for concrete proposals
|
||||||
|
- in online calls to fix emergencies
|
||||||
Get the download link of the latest VIRTUAL x86_64 alpine iso
|
- in chat groups for exploring ideas and everything else
|
||||||
from <https://alpinelinux.org/downloads/>.
|
|
||||||
|
|
||||||
Log in to the rescue system via console or SSH,
|
|
||||||
and write the ISO to the disk:
|
|
||||||
|
|
||||||
```
ssh root@xxxx:xxxx:xxxx:xxxx::1
wipefs -a /dev/sda
wget https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-virt-3.20.3-x86_64.iso # or whatever link you got from alpine
dd if=alpine-virt-3.20.3-x86_64.iso of=/dev/sda
reboot
```
|
|
||||||
|
|
||||||
Then open the server console (SSH doesn't work),
|
|
||||||
log in as root (no password required),
|
|
||||||
and proceed with:
|
|
||||||
|
|
||||||
```
cp -r /.modloop /root
cp -r /media/sda /root
umount /.modloop /media/sda
rm /lib/modules
mv /root/.modloop/modules /lib
mv /root/sda /media
setup-alpine
```
|
|
||||||
|
|
||||||
Then select what you wish,
|
|
||||||
contrary to the guide above,
|
|
||||||
DHCP is actually fine.
|
|
||||||
The drive should be sda,
|
|
||||||
the installation type can be sys
|
|
||||||
(why go through the hassle).
|
|
||||||
|
|
||||||
Voilà! Reboot and log in.
|
|
||||||
Probably the first SSH login will be via root password,
|
|
||||||
as copy-pasting your public SSH key into the console doesn't really work.
|
|
||||||
Make sure the SSH config allows this
|
|
||||||
(and turn password root access off afterwards).
|
|
||||||
|
|
||||||
|
|
||||||
## Encrypting /var/lib/libvirt partition
|
## structure of this repository
|
||||||
|
|
||||||
**Status: tested with Hetzner VPS, not deployed in production yet**
|
this repository documents the current state
|
||||||
|
of the infrastructure.
|
||||||
|
|
||||||
Messing with file systems and partitions
|
For each server/VM,
|
||||||
should not be done by automation scripts,
|
it contains a directory with
|
||||||
so I created the LUKS-encrypted /dev/sdb partition manually.
|
|
||||||
|
|
||||||
(So far, /dev/sdb was added via a Hetzner volume,
|
- a README.md file which gives an overview on the server
|
||||||
but it can be any partition actually)
|
- a pyinfra inventory.py file
|
||||||
|
- a pyinfra deploy.py file which documents what's installed
|
||||||
|
- the configuration files pyinfra deploys
|
||||||
|
- optional: a deploy-restore.py file which can restore data from backup
|
||||||
|
- optional: other pyinfra deploy files which only manage certain services or tasks, like upgrades
|
||||||
|
|
||||||
To create a partition in the VPS volume
|
The repository also contains a lib/ directory
|
||||||
(which was formatted to ext4 originally),
|
with pyinfra packages we reuse across servers.
|
||||||
- I ran `fdisk /dev/sdb`,
|
|
||||||
- entered `o` to create a DOS partition table,
|
|
||||||
- entered `n` to add a new primary partition, using all available space,
|
|
||||||
- and `w` to save to disk and exit.
|
|
||||||
|
|
||||||
Then I ran `cryptsetup luksFormat /dev/sdb1`
|
With pull requests we can propose changes
|
||||||
and entered the passphrase from `pass 0x90/ararat/sdb-crypt`
|
to the current infrastructure.
|
||||||
to create a LUKS volume.
|
PRs need to be approved by at least one maintainer.
|
||||||
|
The pyinfra code in PRs can already be deployed,
|
||||||
|
if it is not destructive - decide responsibly.
|
||||||
|
|
||||||
Now I could decrypt the new volume with
|
|
||||||
`cryptsetup luksOpen /dev/sdb1 sdb_crypt`
|
|
||||||
and entering the passphrase from `pass 0x90/ararat/sdb-crypt`.
|
|
||||||
|
|
||||||
Finally, I ran `mkfs.ext4`
|
## create a VM
|
||||||
to create an ext4 file system
|
|
||||||
in the encrypted partition.
|
To add a new VM for a service you want to manage,
|
||||||
|
|
||||||
|
0. Check out a new branch with `git checkout -b your-server-name`
1. Add your VM to inventory.py
2. Create a directory for the VM
3. Add your VM to ararat/deploy.py
4. Ask the core team to run `pyinfra ararat.0x90.space ararat/deploy.py`
   to create your VM
5. Write your pyinfra deployment script in your-server-name/deploy.py
6. Deploy it, if it doesn't work change it, repeat until the service works
7. Copy TEMPLATE.md to your-server-name/README.md and fill it out.
   You can leave out parts which are obvious from your deploy.py file.
8. Commit your changes, push them to your branch,
   open a pull request from your branch to the development branch,
   and ask a maintainer to review and merge it
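
Before opening the pull request in step 8,
it can help to check that the new directory contains the files the steps above ask for.
The following is a minimal sketch, not something that exists in this repository:
the function name `missing_files` and the default list of required files
are assumptions derived from steps 5 and 7.

```python
from pathlib import Path

# Files that steps 5 and 7 ask for in each server directory (assumed list).
REQUIRED_FILES = ("deploy.py", "README.md")


def missing_files(server_dir, required=REQUIRED_FILES):
    """Return the required file names that do not exist in server_dir."""
    base = Path(server_dir)
    return [name for name in required if not (base / name).is_file()]
```

Calling `missing_files("your-server-name")` returns an empty list
once both files are in place.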
|
||||||
|
|
||||||
|
|
||||||
|
## tools we use
|
||||||
|
|
||||||
|
The hope is that you don't need to know all of these tools
|
||||||
|
to already do useful things,
|
||||||
|
but can systematically dive deeper into the infrastructure.
|
||||||
|
|
||||||
|
### pass
|
||||||
|
|
||||||
|
password manager to store passphrases and secrets,
|
||||||
|
the repository with our secrets
|
||||||
|
is at <https://git.0x90.space/links-tech/pass> for now.
|
||||||
|
|
||||||
|
### ssh
|
||||||
|
|
||||||
|
to connect to servers and VMs with root@,
|
||||||
|
no sudo,
|
||||||
|
root should have set a password,
|
||||||
|
but via SSH, password access should be forbidden.
|
||||||
|
|
||||||
|
There should be no shared SSH keys,
|
||||||
|
one SSH key per person.
|
||||||
|
SSH private keys should be password-protected
|
||||||
|
and only stored on laptops
|
||||||
|
with hard disk encryption.
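
The rule that password access is forbidden via SSH can be checked mechanically.
This is a minimal sketch, assuming an OpenSSH-style `sshd_config`;
the function name `password_auth_disabled` is hypothetical,
and `Match` blocks and `Include` files are deliberately ignored.

```python
def password_auth_disabled(config_text):
    """Return True if the first PasswordAuthentication setting is 'no'.

    OpenSSH uses the first value it finds for a keyword and treats
    keywords case-insensitively; when the keyword is absent, password
    authentication defaults to 'yes'.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        keyword = parts[0].lower()
        if keyword == "match":
            break  # conditional blocks are out of scope for this sketch
        if keyword == "passwordauthentication" and len(parts) > 1:
            return parts[1] == "no"
    return False  # keyword absent: the default allows passwords
```

A check like this could run against the deployed `sshd_config` text;
it only reports the state and does not change anything.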
|
||||||
|
|
||||||
|
### systemctl & journalctl
|
||||||
|
|
||||||
|
to look at status and log output of services.
|
||||||
|
systemd is a good way of keeping services running,
|
||||||
|
at least on Linux machines.
|
||||||
|
On OpenBSD we will use /etc/rc.d/ scripts.
|
||||||
|
|
||||||
|
### git
|
||||||
|
|
||||||
|
for updating the documentation,
|
||||||
|
pushing and pulling secrets,
|
||||||
|
and opening PRs to doku/pyinfra repos.
|
||||||
|
|
||||||
|
to be discussed:
|
||||||
|
- Keep in mind that PRs can and will be deployed to servers. OR
|
||||||
|
- The main branch should always reflect the state of the machine.
|
||||||
|
|
||||||
|
### markdown + sembr
|
||||||
|
|
||||||
|
for documenting the infrastructure.
|
||||||
|
[Semantic line breaks](https://sembr.org/) are great
|
||||||
|
for formatting text files
|
||||||
|
which are managed in git.
|
||||||
|
|
||||||
|
### kvm + virsh
|
||||||
|
|
||||||
|
as a hypervisor
|
||||||
|
which we can use to create VMs
|
||||||
|
for specific services.
|
||||||
|
|
||||||
|
The hypervisor is a minimal alpine linux,
|
||||||
|
with "boot to RAM",
|
||||||
|
the data-partition for the VM images is encrypted.
|
||||||
|
|
||||||
|
### pyinfra
|
||||||
|
|
||||||
|
as a nice declarative config tool for deployment.
|
||||||
|
we can also maintain some of the things we need
|
||||||
|
in extra python modules.
|
||||||
|
|
||||||
|
pyinfra vs. ansible? We still need to investigate. Currently there is an ansible setup on golem, while pyinfra is used in deltachat and one ezra service.
|
||||||
|
|
||||||
|
### podman
|
||||||
|
|
||||||
|
to isolate services in root-less containers.
|
||||||
|
a podman container should run in a systemd process.
|
||||||
|
it takes some practice to understand
|
||||||
|
how to run commands inside a container
|
||||||
|
or where the files are mounted.
|
||||||
|
But it goes well with pyinfra
|
||||||
|
if it's managed in systemd.
|
||||||
|
|
||||||
|
### nftables
|
||||||
|
|
||||||
|
as a declarative firewall
|
||||||
|
which can be managed in pyinfra.
|
||||||
|
|
||||||
|
### nginx
|
||||||
|
|
||||||
|
as an HTTPS reverse proxy,
|
||||||
|
passing traffic on to the podman containers.
|
||||||
|
|
||||||
|
### acmetool
|
||||||
|
|
||||||
|
as a tool to manage Let's Encrypt certificates,
|
||||||
|
which goes well with pyinfra
|
||||||
|
because of its declarative nature.
|
||||||
|
|
||||||
|
It also ships acmetool-redirector
|
||||||
|
which redirects HTTP traffic on port 80
|
||||||
|
to nginx on port 443.
|
||||||
|
|
||||||
|
There is a pyinfra package for it at
|
||||||
|
https://github.com/deltachat/pyinfra-acmetool/
|
||||||
|
|
||||||
|
On OpenBSD, the counterparts are <https://man.openbsd.org/acme-client> and <https://man.openbsd.org/relayd>.
|
||||||
|
|
||||||
|
### cron
|
||||||
|
|
||||||
|
to schedule recurring tasks,
|
||||||
|
like acmetool's certificate renewals
|
||||||
|
or the nightly borgbackup runs.
|
||||||
|
|
||||||
|
On OpenBSD there is already a daily cron job that executes /etc/daily.local.
|
||||||
|
|
||||||
|
### borgbackup
|
||||||
|
|
||||||
|
can be used to back up application data
|
||||||
|
in a nightly cron job.
|
||||||
|
|
||||||
|
Backups need to be stored at an extra backup server.
|
||||||
|
|
||||||
|
There is a pyinfra package for it at
|
||||||
|
https://github.com/deltachat/pyinfra-borgbackup/
|
||||||
|
|
||||||
|
We might also look at restic, where append-only backups can be restricted better.
|
||||||
|
|
||||||
|
### wireguard
|
||||||
|
|
||||||
|
as a VPN to connect the backup server,
|
||||||
|
which can be at some private house,
|
||||||
|
with the production servers.
|
||||||
|
|
||||||
|
### prometheus
|
||||||
|
|
||||||
|
as a tool to measure service uptime
|
||||||
|
and measure typical errors
|
||||||
|
from journalctl output.
|
||||||
|
It can expose metrics via HTTPS
|
||||||
|
behind basic auth.
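
As an illustration of measuring errors from journalctl output,
the sketch below counts log lines that contain the word "error"
and renders the result as one sample in the Prometheus text exposition format.
The metric name `service_log_error_lines`, the `service` label,
and the matching rule are assumptions for illustration, not an existing setup.

```python
def count_error_lines(log_lines):
    """Count lines containing 'error', case-insensitively."""
    return sum(1 for line in log_lines if "error" in line.lower())


def render_metric(service, error_count):
    """Render one sample in the Prometheus text exposition format."""
    return f'service_log_error_lines{{service="{service}"}} {error_count}'
```

The input lines could come from the journal of a single service;
a real setup would more likely use an existing exporter than a hand-written script.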
|
||||||
|
|
||||||
|
### grafana
|
||||||
|
|
||||||
|
as a visual dashboard to show service uptime
|
||||||
|
and whether services throw errors.
|
||||||
|
It can also send out email alerts.
|
||||||
|
|
||||||
|
### team-bot
|
||||||
|
|
||||||
|
a deltachat bot to receive support requests
|
||||||
|
and email alerts from grafana.
|
||||||
|
|
||||||
|
|||||||
|
|
||||||
|
|
104
TEMPLATE.md
Normal file
|
@ -0,0 +1,104 @@
|
||||||
|
# Server: Server name
|
||||||
|
|
||||||
|
## Usage
|
||||||
|
|
||||||
|
Who is using this server?
|
||||||
|
Who needs the server and will be affected if the server is not working?
|
||||||
|
|
||||||
|
## Maintainers
|
||||||
|
|
||||||
|
Who to ask about this server?
|
||||||
|
|
||||||
|
## Domain Settings
|
||||||
|
|
||||||
|
Where are the DNS settings? E.g. with Hetzner or in a DNS zone file.
|
||||||
|
How to change DNS settings?
|
||||||
|
Which domains and subdomains exist?
|
||||||
|
|
||||||
|
## Hosting
|
||||||
|
|
||||||
|
Where is the server hosted?
|
||||||
|
Add a link to the hosting admin interface, e.g. <https://console.hetzner.cloud/>.
|
||||||
|
|
||||||
|
## Services
|
||||||
|
|
||||||
|
Which services are running there?
|
||||||
|
E.g. there are `www.example.org` and `ci.example.org` services.
|
||||||
|
|
||||||
|
### Service: ci.example.org
|
||||||
|
|
||||||
|
Each service has a greppable heading starting with `### Service: `.
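
The greppable heading makes it easy to list all documented services of a server.
A minimal sketch under the assumption that the README follows this template;
`list_services` is a hypothetical helper, not an existing tool.

```python
SERVICE_PREFIX = "### Service: "


def list_services(markdown_text):
    """Return the service names from headings starting with '### Service: '."""
    return [
        line[len(SERVICE_PREFIX):].strip()
        for line in markdown_text.splitlines()
        if line.startswith(SERVICE_PREFIX)
    ]
```

On the command line, `grep '^### Service: ' README.md` gives the same overview.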
|
||||||
|
|
||||||
|
Which software is the service running? E.g. nginx.
|
||||||
|
How was it deployed? E.g. manually or with pyinfra.
|
||||||
|
How can the software be managed,
|
||||||
|
and where are the admin credentials stored if you need to fix something (e.g. for mailcow)?
|
||||||
|
Is there an admin chat group (e.g. for mailadm), and how do you join it?
|
||||||
|
|
||||||
|
#### Monitoring
|
||||||
|
|
||||||
|
How to read the logs of the service?
|
||||||
|
How are admins notified when the service is down?
|
||||||
|
|
||||||
|
#### Deployment
|
||||||
|
|
||||||
|
How was the service deployed?
|
||||||
|
How to reinstall it?
|
||||||
|
|
||||||
|
#### Upgrade Strategy
|
||||||
|
|
||||||
|
How is the service upgraded?
|
||||||
|
Which commands need to be run to upgrade it, e.g. where is the upgrade script located and how is it run?
|
||||||
|
If there is official documentation, put a link to it in this section.
|
||||||
|
|
||||||
|
#### Maintainers
|
||||||
|
|
||||||
|
Who to ask about the service?
|
||||||
|
|
||||||
|
#### Integration
|
||||||
|
|
||||||
|
How is the service related to other services running on this or other servers?
|
||||||
|
E.g. service `ci.example.org` uses the secret storage `secrets.example.net` and runner `runner.example.com` hosted elsewhere.
|
||||||
|
|
||||||
|
### Service: www.example.org
|
||||||
|
|
||||||
|
Description similar to the other service.
|
||||||
|
|
||||||
|
## Users
|
||||||
|
|
||||||
|
Who has access to this server?
|
||||||
|
|
||||||
|
Which admin accounts are there?
|
||||||
|
Which service accounts are there?
|
||||||
|
Which user accounts are there?
|
||||||
|
|
||||||
|
## Monitoring
|
||||||
|
|
||||||
|
How do we notice if something fails?
|
||||||
|
|
||||||
|
Where do the errors show up?
|
||||||
|
Where are the logs for the services located? E.g. Postfix logs go to `/var/log/mail.log`.
|
||||||
|
|
||||||
|
## Upgrade Strategy
|
||||||
|
|
||||||
|
How do we keep the services up to date?
|
||||||
|
|
||||||
|
## Backup and Restore
|
||||||
|
|
||||||
|
How is the server backed up, and how do you restore the backup?
|
||||||
|
|
||||||
|
## Deployment
|
||||||
|
|
||||||
|
How to reinstall the server?
|
||||||
|
Which settings were selected to create the server? E.g. the operating system image.
|
||||||
|
Are there deployment scripts, and if so, where are they located and how do you run them?
|
||||||
|
|
||||||
|
# Changelog
|
||||||
|
|
||||||
|
## 2023-05-30 - Created the server
|
||||||
|
|
||||||
|
Document the steps taken here.
|
||||||
|
|
||||||
|
## 2023-06-10 - Installed nginx
|
||||||
|
|
||||||
|
...
|
158
ararat/README.md
Normal file
|
@ -0,0 +1,158 @@
|
||||||
|
# Server: ararat test VPS
|
||||||
|
|
||||||
|
## Usage
|
||||||
|
|
||||||
|
For now this server doesn't host any production services.
|
||||||
|
|
||||||
|
## Maintainers
|
||||||
|
|
||||||
|
- missytake@systemli.org
|
||||||
|
|
||||||
|
## Domain Settings
|
||||||
|
|
||||||
|
It doesn't have a domain pointing to it yet.
|
||||||
|
|
||||||
|
## Hosting
|
||||||
|
|
||||||
|
For now, the VPS is hosted in missytake's personal hetzner account.
|
||||||
|
Ask them if you need something.
|
||||||
|
|
||||||
|
## Deployment
|
||||||
|
|
||||||
|
To deploy the server, run
|
||||||
|
|
||||||
|
```
pyinfra --yes inventory.py ararat/deploy.py --limit 95.217.163.200
```
|
||||||
|
|
||||||
|
You also need to run this after every reboot,
|
||||||
|
to decrypt the encrypted volume
|
||||||
|
and start the libvirt VMs.
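
Since the same command is needed after every reboot,
a small wrapper can help avoid typos.
This is only a sketch: `pyinfra_command` is a hypothetical helper,
and it uses only the `--dry`, `--yes`, and `--limit` flags
that already appear in this repository's documentation.

```python
def pyinfra_command(deploy_file, host, inventory="inventory.py", dry_run=False):
    """Build the argument list for a pyinfra run limited to one host."""
    mode = "--dry" if dry_run else "--yes"
    return ["pyinfra", mode, inventory, deploy_file, "--limit", host]
```

The returned list can be passed to `subprocess.run`;
with `dry_run=True` it gives a command for checking the state first
without confirming any changes.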
|
||||||
|
|
||||||
|
## Services
|
||||||
|
|
||||||
|
### Service: kvm / libvirt
|
||||||
|
|
||||||
|
This is a KVM hypervisor,
|
||||||
|
which allows managing VMs with libvirt.
|
||||||
|
|
||||||
|
You can use libvirt through the `virsh` command line tool.
|
||||||
|
E.g. you can log in via SSH as root
|
||||||
|
and run `virsh list` to see running VMs.
|
||||||
|
|
||||||
|
#### Monitoring
|
||||||
|
|
||||||
|
It doesn't really need monitoring for now.
|
||||||
|
|
||||||
|
#### Deployment
|
||||||
|
|
||||||
|
The service is part of the pyinfra deploy.py file;
|
||||||
|
you can deploy it with
|
||||||
|
`pyinfra --yes inventory.py ararat/deploy.py --limit 95.217.163.200`.
|
||||||
|
|
||||||
|
#### Upgrade Strategy
|
||||||
|
|
||||||
|
As long as it is a test deployment,
|
||||||
|
we don't need to upgrade it regularly.
|
||||||
|
|
||||||
|
## Users
|
||||||
|
|
||||||
|
There is only the root user,
|
||||||
|
the SSH keys of missytake, hagi, and vmann are deployed via pyinfra.
|
||||||
|
|
||||||
|
## Upgrade Strategy
|
||||||
|
|
||||||
|
To upgrade the packages,
|
||||||
|
you need to log in via SSH and run `apk update && apk upgrade`.
|
||||||
|
|
||||||
|
## Backup and Restore
|
||||||
|
|
||||||
|
As long as it is a test deployment,
|
||||||
|
we don't need backups.
|
||||||
|
|
||||||
|
|
||||||
|
# Changelog
|
||||||
|
|
||||||
|
## 2024-12-02 Set up alpine VPS on hetzner
|
||||||
|
|
||||||
|
This was only tested with a cloud VPS so far.
|
||||||
|
Source: <https://gist.github.com/c0m4r/e38d41d0e31f6adda4b4c5a88ba0a453>
|
||||||
|
(but it's less of a hassle than described there)
|
||||||
|
|
||||||
|
To create an alpine server on hetzner,
|
||||||
|
you need to first create a Debian VPS or something similar.
|
||||||
|
|
||||||
|
Then you boot into the rescue system.
|
||||||
|
|
||||||
|
Get the download link of the latest VIRTUAL x86_64 alpine iso
|
||||||
|
from <https://alpinelinux.org/downloads/>.
|
||||||
|
|
||||||
|
Log in to the rescue system via console or SSH,
|
||||||
|
and write the ISO to the disk:
|
||||||
|
|
||||||
|
```
ssh root@xxxx:xxxx:xxxx:xxxx::1
wipefs -a /dev/sda
wget https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-virt-3.20.3-x86_64.iso # or whatever link you got from alpine
dd if=alpine-virt-3.20.3-x86_64.iso of=/dev/sda
reboot
```
|
||||||
|
|
||||||
|
Then open the server console (SSH doesn't work),
|
||||||
|
log in as root (no password required),
|
||||||
|
and proceed with:
|
||||||
|
|
||||||
|
```
cp -r /.modloop /root
cp -r /media/sda /root
umount /.modloop /media/sda
rm /lib/modules
mv /root/.modloop/modules /lib
mv /root/sda /media
setup-alpine
```
|
||||||
|
|
||||||
|
Then select what you wish,
|
||||||
|
contrary to the guide above,
|
||||||
|
DHCP is actually fine.
|
||||||
|
The drive should be sda,
|
||||||
|
the installation type can be sys
|
||||||
|
(why go through the hassle).
|
||||||
|
|
||||||
|
Voilà! Reboot and log in.
|
||||||
|
Probably the first SSH login will be via root password,
|
||||||
|
as copy-pasting your public SSH key into the console doesn't really work.
|
||||||
|
Make sure the SSH config allows this
|
||||||
|
(and turn password root access off afterwards).
|
||||||
|
|
||||||
|
|
||||||
|
## 2024-12-02 Encrypting /var/lib/libvirt partition
|
||||||
|
|
||||||
|
**Status: tested with Hetzner VPS, not deployed in production yet**
|
||||||
|
|
||||||
|
Messing with file systems and partitions
|
||||||
|
should not be done by automation scripts,
|
||||||
|
so I created the LUKS-encrypted /dev/sdb partition manually.
|
||||||
|
|
||||||
|
(So far, /dev/sdb was added via a Hetzner volume,
|
||||||
|
but it can be any partition actually)
|
||||||
|
|
||||||
|
To create a partition in the VPS volume
|
||||||
|
(which was formatted to ext4 originally),
|
||||||
|
- I ran `fdisk /dev/sdb`,
|
||||||
|
- entered `o` to create a DOS partition table,
|
||||||
|
- entered `n` to add a new primary partition, using all available space,
|
||||||
|
- and `w` to save to disk and exit.
|
||||||
|
|
||||||
|
Then I ran `cryptsetup luksFormat /dev/sdb1`
|
||||||
|
and entered the passphrase from `pass 0x90/ararat/sdb-crypt`
|
||||||
|
to create a LUKS volume.
|
||||||
|
|
||||||
|
Now I could decrypt the new volume with
|
||||||
|
`cryptsetup luksOpen /dev/sdb1 sdb_crypt`
|
||||||
|
and entering the passphrase from `pass 0x90/ararat/sdb-crypt`.
|
||||||
|
|
||||||
|
Finally, I ran `mkfs.ext4`
|
||||||
|
to create an ext4 file system
|
||||||
|
in the encrypted partition.
|
||||||
|
|
I would also mention Nix, simply because some people in the space prefer using it over pyinfra. But one of the NixOS people can write that part :)