A proposal for social practices, preferred tools, and documentation #2
265
README.md
|
@ -14,86 +14,209 @@ or
|
|||
- run `pyinfra --dry inventory.py deploy.py` and check that your code matches the state that is already deployed
|
||||
|
||||
|
||||
# Set up alpine on hetzner
|
||||
# social practices
|
||||
|
||||
This was only tested with a cloud VPS so far.
|
||||
Source: <https://gist.github.com/c0m4r/e38d41d0e31f6adda4b4c5a88ba0a453>
|
||||
(but it's less of a hassle than described there)
|
||||
maintainers: people who know (next to) everything and would be able to learn the rest
|
||||
adepts: people who are still learning about the infrastructure, but don't need to keep everything in mind
|
||||
associates: others, who just need to maintain a certain service
|
||||
|
||||
To create an alpine server on hetzner,
|
||||
you need to first create a Debian VPS or something similar.
|
||||
|
||||
Then you boot into the rescue system.
|
||||
|
||||
Get the download link of the latest VIRTUAL x86_64 alpine iso
|
||||
from <https://alpinelinux.org/downloads/>.
|
||||
|
||||
Log in to the rescue system via console or SSH,
|
||||
and write the ISO to the disk:
|
||||
|
||||
```
|
||||
ssh root@xxxx:xxxx:xxxx:xxxx::1
|
||||
wipefs -a /dev/sda
|
||||
wget https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-virt-3.20.3-x86_64.iso # or whatever link you got from alpine
|
||||
dd if=alpine-virt-3.20.3-x86_64.iso of=/dev/sda
|
||||
reboot
|
||||
```
|
||||
|
||||
Then open the server console (SSH doesn't work),
|
||||
log in as root (no password required),
|
||||
and proceed with:
|
||||
|
||||
```
|
||||
cp -r /.modloop /root
|
||||
cp -r /media/sda /root
|
||||
umount /.modloop /media/sda
|
||||
rm /lib/modules
|
||||
mv /root/.modloop/modules /lib
|
||||
mv /root/sda /media
|
||||
setup-alpine
|
||||
```
|
||||
|
||||
Then select what you wish,
|
||||
contrary to the guide above,
|
||||
DHCP is actually fine.
|
||||
The drive should be sda,
|
||||
the installation type can be sys
|
||||
(why go through the hassle).
|
||||
|
||||
Voilà! Reboot and log in.
The first SSH login will probably use the root password,
as copy-pasting your public SSH key into the console doesn't really work.
Make sure the SSH config allows this
(and turn password root access off afterwards).
|
||||
Discussions can happen:
|
||||
- in person (gathering), at least every 3-4 months, to discuss the big picture
- in person (coworking), while working on new services
|
||||
- in issues and PRs for concrete proposals
|
||||
- in online calls to fix emergencies
|
||||
- in chat groups for exploring ideas and everything else
|
||||
|
||||
|
||||
## Encrypting /var/lib/libvirt partition
|
||||
## structure of this repository
|
||||
|
||||
**Status: tested with Hetzner VPS, not deployed in production yet**
|
||||
This repository documents the current state
|
||||
of the infrastructure.
|
||||
|
||||
Messing with file systems and partitions
|
||||
should not be done by automation scripts,
|
||||
so I created the LUKS-encrypted /dev/sdb partition manually.
|
||||
For each server/VM,
|
||||
it contains a directory with
|
||||
|
||||
(So far, /dev/sdb was added via a Hetzner volume,
|
||||
but it can be any partition actually)
|
||||
- a README.md file which gives an overview on the server
|
||||
- a pyinfra inventory.py file
|
||||
- a pyinfra deploy.py file which documents what's installed
|
||||
- the configuration files pyinfra deploys
|
||||
- optional: a deploy-restore.py file which can restore data from backup
|
||||
- optional: other pyinfra deploy files which only manage certain services or tasks, like upgrades
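The layout described in this list can be checked mechanically.
The following is only a rough sketch, not part of the repository:
the function names are made up,
the set of required files is taken from the list above,
and it assumes that every top-level directory except `lib/` and hidden ones is a server directory.

```python
from pathlib import Path

# Files the list above expects in every server/VM directory.
REQUIRED_FILES = ("README.md", "inventory.py", "deploy.py")


def missing_files(server_dir: Path) -> list[str]:
    """Return the required files that are absent from server_dir."""
    return [name for name in REQUIRED_FILES if not (server_dir / name).is_file()]


def check_repository(repo_root: Path) -> dict[str, list[str]]:
    """Map each server directory to its missing files (empty list = complete).

    Assumption: every top-level directory except lib/ and hidden ones is a server.
    """
    report = {}
    for entry in sorted(repo_root.iterdir()):
        if entry.is_dir() and entry.name != "lib" and not entry.name.startswith("."):
            report[entry.name] = missing_files(entry)
    return report


if __name__ == "__main__":
    for server, missing in check_repository(Path(".")).items():
        print(server, "ok" if not missing else f"missing: {', '.join(missing)}")
```

Adjust `REQUIRED_FILES` if the inventory is kept at the repository root instead.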
|
||||
|
||||
To create a partition in the VPS volume
|
||||
(which was formatted to ext4 originally),
|
||||
- I ran `fdisk /dev/sdb`,
|
||||
- entered `o` to create a DOS partition table,
|
||||
- entered `n` to add a new primary partition, using all available space,
|
||||
- and `w` to save to disk and exit.
|
||||
The repository also contains a lib/ directory
|
||||
with pyinfra packages we reuse across servers.
|
||||
|
||||
Then I ran `cryptsetup luksFormat /dev/sdb1`
|
||||
and entered the passphrase from `pass 0x90/ararat/sdb-crypt`
|
||||
to create a LUKS volume.
|
||||
With pull requests we can propose changes
|
||||
to the current infrastructure.
|
||||
PRs need to be approved by at least one maintainer.
|
||||
The pyinfra code in a PR can be deployed before it is merged,
as long as it is not destructive; decide responsibly.
|
||||
|
||||
Now I could decrypt the new volume with
|
||||
`cryptsetup luksOpen /dev/sdb1 sdb_crypt`
|
||||
and entering the passphrase from `pass 0x90/ararat/sdb-crypt`.
|
||||
|
||||
Finally, I ran `mkfs.ext4` on the opened mapping (presumably `/dev/mapper/sdb_crypt`)
|
||||
to create an ext4 file system
|
||||
in the encrypted partition.
|
||||
## create a VM
|
||||
|
||||
To add a new VM for a service you want to manage,
|
||||
|
||||
0. Check out a new branch with `git checkout -b your-server-name`
|
||||
1. Add your VM to inventory.py
|
||||
2. Create a directory for the VM
|
||||
3. Add your VM to ararat/deploy.py
|
||||
4. Ask the core team to run `pyinfra ararat.0x90.space ararat/deploy.py`
|
||||
to create your VM
|
||||
5. Write your pyinfra deployment script in your-server-name/deploy.py
|
||||
6. Deploy it; if it doesn't work, change it and repeat until the service works
|
||||
7. Copy TEMPLATE.md to your-server-name/README.md and fill it out.
|
||||
You can leave out parts which are obvious from your deploy.py file.
|
||||
8. Commit your changes, push them to your branch,
|
||||
open a pull request from your branch to the development branch,
|
||||
and ask a maintainer to review and merge it
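Steps 2, 5, and 7 above (create the directory, start a deploy.py, copy TEMPLATE.md) could be scripted.
This is a minimal sketch under assumptions:
`scaffold_server` is a hypothetical helper that does not exist in the repository,
and it assumes TEMPLATE.md lives at the repository root.

```python
import shutil
from pathlib import Path


def scaffold_server(repo_root: Path, name: str) -> Path:
    """Create <name>/ with an empty deploy.py and a README.md copied from TEMPLATE.md."""
    server_dir = repo_root / name
    server_dir.mkdir()  # raises FileExistsError if the directory already exists
    (server_dir / "deploy.py").touch()  # placeholder, to be filled in step 5
    shutil.copyfile(repo_root / "TEMPLATE.md", server_dir / "README.md")
    return server_dir


if __name__ == "__main__":
    print(scaffold_server(Path("."), "your-server-name"))
```

It deliberately refuses to touch an existing directory,
so it can't overwrite a README.md that somebody already filled out.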
|
||||
|
||||
|
||||
## tools we use
|
||||
|
||||
The hope is that you don't need to know all of these tools
|
||||
to already do useful things,
|
||||
but can systematically dive deeper into the infrastructure.
|
||||
|
||||
### pass
|
||||
|
||||
password manager to store passphrases and secrets.
The repository with our secrets
is at <https://git.0x90.space/links-tech/pass> for now.
|
||||
|
||||
### ssh
|
||||
|
||||
to connect to servers and VMs with root@,
|
||||
no sudo,
|
||||
root should have a password set,
but password access via SSH should be forbidden.
|
||||
|
||||
There should be no shared SSH keys,
|
||||
one SSH key per person.
|
||||
SSH private keys should be password-protected
|
||||
and only stored on laptops
|
||||
with hard disk encryption.
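Whether a server follows the "root via key only" rule can be read off its sshd_config.
The sketch below is an illustration, not an audit tool:
the function names are invented,
it ignores `Match` blocks and `Include` files,
and it relies on the OpenSSH defaults (`PasswordAuthentication yes`, `PermitRootLogin prohibit-password`)
and on sshd using the first value it finds for each keyword.

```python
KEY_ONLY_ROOT_VALUES = {"prohibit-password", "without-password"}


def effective_sshd_options(config_text: str) -> dict[str, str]:
    """Collect keyword/value pairs; the first occurrence of a keyword wins."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts
        options.setdefault(keyword.lower(), value.strip())
    return options


def root_key_only_login(config_text: str) -> bool:
    """True if password logins are off and root may only log in with a key."""
    options = effective_sshd_options(config_text)
    password_auth = options.get("passwordauthentication", "yes").lower()
    root_login = options.get("permitrootlogin", "prohibit-password").lower()
    return password_auth == "no" and root_login in KEY_ONLY_ROOT_VALUES


if __name__ == "__main__":
    with open("/etc/ssh/sshd_config") as config_file:
        print(root_key_only_login(config_file.read()))
```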
|
||||
|
||||
### systemctl & journalctl
|
||||
|
||||
to look at status and log output of services.
|
||||
systemd is a good way of keeping services running,
|
||||
at least on Linux machines.
|
||||
On OpenBSD we will use /etc/rc.d/ scripts.
|
||||
|
||||
### git
|
||||
|
||||
for updating the documentation,
|
||||
pushing and pulling secrets,
|
||||
and opening PRs to the documentation/pyinfra repos.
|
||||
|
||||
to be discussed:
|
||||
- Keep in mind that PRs can and will be deployed to servers. OR
|
||||
- The main branch should always reflect the state of the machine.
|
||||
|
||||
### markdown + sembr
|
||||
|
||||
for documenting the infrastructure.
|
||||
[Semantic line breaks](https://sembr.org/) are great
|
||||
for formatting text files
|
||||
which are managed in git.
|
||||
|
||||
### kvm + virsh
|
||||
|
||||
as a hypervisor
|
||||
which we can use to create VMs
|
||||
for specific services.
|
||||
|
||||
The hypervisor is a minimal Alpine Linux
with "boot to RAM";
the data partition for the VM images is encrypted.
|
||||
|
||||
### pyinfra
|
||||
|
||||
as a nice declarative config tool for deployment.
|
||||
We can also maintain some of the things we need
|
||||
in extra python modules.
|
||||
|
||||
pyinfra vs. ansible? We need to investigate. Currently there is an ansible setup on golem, while pyinfra is used in deltachat and one ezra service.
|
||||
|
||||
### podman
|
||||
|
||||
to isolate services in root-less containers.
|
||||
A podman container should run as a systemd service.
It takes some practice to understand
|
||||
how to run commands inside a container
|
||||
or where the files are mounted.
|
||||
But it goes well with pyinfra
|
||||
if it's managed in systemd.
|
||||
|
||||
### nftables
|
||||
|
||||
as a declarative firewall
|
||||
which can be managed in pyinfra.
|
||||
|
||||
### nginx
|
||||
|
||||
as an HTTPS reverse proxy,
|
||||
passing traffic on to the podman containers.
|
||||
|
||||
### acmetool
|
||||
|
||||
as a tool to manage Let's Encrypt certificates,
|
||||
which goes well with pyinfra
|
||||
because of its declarative nature.
|
||||
|
||||
It also ships acmetool-redirector
|
||||
which redirects HTTP traffic on port 80
|
||||
to nginx on port 443.
|
||||
|
||||
There is a pyinfra package for it at
|
||||
https://github.com/deltachat/pyinfra-acmetool/
|
||||
|
||||
https://man.openbsd.org/acme-client + https://man.openbsd.org/relayd on OpenBSD
|
||||
|
||||
### cron
|
||||
|
||||
to schedule recurring tasks,
|
||||
like acmetool's certificate renewals
|
||||
or the nightly borgbackup runs.
|
||||
|
||||
On OpenBSD there is already a daily cron job that executes /etc/daily.local.
|
||||
|
||||
### borgbackup
|
||||
|
||||
can be used to back up application data
|
||||
in a nightly cron job.
|
||||
|
||||
Backups need to be stored at an extra backup server.
|
||||
|
||||
There is a pyinfra package for it at
|
||||
https://github.com/deltachat/pyinfra-borgbackup/
|
||||
|
||||
We might also look at restic: its append-only backups can be restricted better.
|
||||
|
||||
### wireguard
|
||||
|
||||
as a VPN to connect the backup server,
|
||||
which can be at some private house,
|
||||
with the production servers.
|
||||
|
||||
### prometheus
|
||||
|
||||
as a tool to measure service uptime
|
||||
and measure typical errors
|
||||
from journalctl output.
|
||||
It can expose metrics via HTTPS
|
||||
behind basic auth.
|
||||
|
||||
### grafana
|
||||
|
||||
as a visual dashboard to show service uptime
|
||||
and whether services throw errors.
|
||||
It can also send out email alerts.
|
||||
|
||||
### team-bot
|
||||
|
||||
a deltachat bot to receive support requests
|
||||
and email alerts from grafana.
|
||||
|
||||
|
||||
|
||||
|
|
104
TEMPLATE.md
Normal file
|
@ -0,0 +1,104 @@
|
|||
# Server: Server name
|
||||
|
||||
## Usage
|
||||
|
||||
Who is using this server?
|
||||
Who needs the server and will be affected if the server is not working?
|
||||
|
||||
## Maintainers
|
||||
|
||||
Who to ask about this server?
|
||||
|
||||
## Domain Settings
|
||||
|
||||
Where are the DNS settings? E.g. with Hetzner or in a DNS zone file.
|
||||
How to change DNS settings?
|
||||
Which domains and subdomains exist?
|
||||
|
||||
## Hosting
|
||||
|
||||
Where is the server hosted?
|
||||
Add a link to the hosting admin interface, e.g. <https://console.hetzner.cloud/>.
|
||||
|
||||
## Services
|
||||
|
||||
Which services are running there?
|
||||
E.g. there are `www.example.org` and `ci.example.org` services.
|
||||
|
||||
### Service: ci.example.org
|
||||
|
||||
Each service has a greppable heading starting with `### Service: `.
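Because the prefix is fixed, a few lines of Python are enough to list the services of a server.
This is only a sketch; `list_services` is a made-up name,
and `grep '^### Service: ' README.md` gives the same information on the command line.

```python
from pathlib import Path

SERVICE_PREFIX = "### Service: "


def list_services(readme_text: str) -> list[str]:
    """Return the names from all '### Service: ' headings, in document order."""
    return [
        line[len(SERVICE_PREFIX):].strip()
        for line in readme_text.splitlines()
        if line.startswith(SERVICE_PREFIX)
    ]


if __name__ == "__main__":
    print(list_services(Path("README.md").read_text()))
```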
|
||||
|
||||
Which software is the service running? E.g. nginx.
|
||||
How was it deployed? E.g. manually or with pyinfra.
|
||||
How can the software be managed?
Where are the admin credentials stored if you need to fix something (e.g. for mailcow)?
|
||||
Is there an admin chatgroup (e.g. for mailadm) and how to join it?
|
||||
|
||||
#### Monitoring
|
||||
|
||||
How to read the logs of the service?
|
||||
How are admins notified when the service is down?
|
||||
|
||||
#### Deployment
|
||||
|
||||
How was the service deployed?
|
||||
How to reinstall it?
|
||||
|
||||
#### Upgrade Strategy
|
||||
|
||||
How is the service upgraded?
|
||||
Which commands to run to upgrade it, e.g. where the upgrade script is located and how to run it?
|
||||
If there is official documentation, put a link to it in this section.
|
||||
|
||||
#### Maintainers
|
||||
|
||||
Who to ask about the service?
|
||||
|
||||
#### Integration
|
||||
|
||||
How is the service related to other services running on this or other servers?
|
||||
E.g. service `ci.example.org` uses the secret storage `secrets.example.net` and runner `runner.example.com` hosted elsewhere.
|
||||
|
||||
### Service: www.example.org
|
||||
|
||||
Description similar to the other service.
|
||||
|
||||
## Users
|
||||
|
||||
Who has access to this server?
|
||||
|
||||
Which admin accounts are there?
|
||||
Which service accounts are there?
|
||||
Which user accounts are there?
|
||||
|
||||
## Monitoring
|
||||
|
||||
How do we notice if something fails?
|
||||
|
||||
Where do the errors show up?
|
||||
Where are the logs for the services located? E.g. Postfix logs go to `/var/log/mail.log`.
|
||||
|
||||
## Upgrade Strategy
|
||||
|
||||
How do we keep the services up to date?
|
||||
|
||||
## Backup and Restore
|
||||
|
||||
How is the server backed up, and how can the backup be restored?
|
||||
|
||||
## Deployment
|
||||
|
||||
How to reinstall the server?
|
||||
Which settings were selected to create the server? E.g. the operating system image.
|
||||
Are there deployment scripts? If so, where are they located and how do you run them?
|
||||
|
||||
# Changelog
|
||||
|
||||
## 2023-05-30 - Created the server
|
||||
|
||||
Document the steps taken here.
|
||||
|
||||
## 2023-06-10 - Installed nginx
|
||||
|
||||
...
|
158
ararat/README.md
Normal file
|
@ -0,0 +1,158 @@
|
|||
# Server: ararat test VPS
|
||||
|
||||
## Usage
|
||||
|
||||
For now this server doesn't host any production services.
|
||||
|
||||
## Maintainers
|
||||
|
||||
- missytake@systemli.org
|
||||
|
||||
## Domain Settings
|
||||
|
||||
It doesn't have a domain pointing to it yet.
|
||||
|
||||
## Hosting
|
||||
|
||||
For now, the VPS is hosted in missytake's personal Hetzner account.
|
||||
Ask them if you need something.
|
||||
|
||||
## Deployment
|
||||
|
||||
To deploy the server, run
|
||||
|
||||
```
|
||||
pyinfra --yes inventory.py ararat/deploy.py --limit 95.217.163.200
|
||||
```
|
||||
|
||||
You also need to run this after every reboot,
|
||||
to decrypt the encrypted volume
|
||||
and start the libvirt VMs.
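Since the same command is needed after every reboot,
it can help to build it in one place and do a dry run first.
The helper below is a sketch: `pyinfra_command` is a hypothetical name,
and it only uses the `--dry`, `--yes`, and `--limit` flags that already appear in this repository's documentation.

```python
import shlex
import subprocess


def pyinfra_command(deploy_file, limit=None, dry_run=False, inventory="inventory.py"):
    """Build the pyinfra argument list for a dry run or a real deployment."""
    args = ["pyinfra", "--dry" if dry_run else "--yes", inventory, deploy_file]
    if limit:
        args += ["--limit", limit]
    return args


if __name__ == "__main__":
    dry = pyinfra_command("ararat/deploy.py", limit="95.217.163.200", dry_run=True)
    print(shlex.join(dry))
    # prints: pyinfra --dry inventory.py ararat/deploy.py --limit 95.217.163.200
    subprocess.run(dry, check=True)  # only shows what would change
    subprocess.run(pyinfra_command("ararat/deploy.py", limit="95.217.163.200"), check=True)
```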
|
||||
|
||||
## Services
|
||||
|
||||
### Service: kvm / libvirt
|
||||
|
||||
This is a KVM hypervisor,
|
||||
which allows managing VMs with libvirt.
|
||||
|
||||
You can use libvirt through the `virsh` command line tool.
|
||||
E.g. you can log in via SSH as root
|
||||
and run `virsh list` to see running VMs.
|
||||
|
||||
#### Monitoring
|
||||
|
||||
It doesn't really need monitoring for now.
|
||||
|
||||
#### Deployment
|
||||
|
||||
The service is part of the pyinfra deploy.py file;
|
||||
you can deploy it with
|
||||
`pyinfra --yes inventory.py ararat/deploy.py --limit 95.217.163.200`.
|
||||
|
||||
#### Upgrade Strategy
|
||||
|
||||
As long as it is a test deployment,
|
||||
we don't need to upgrade it regularly.
|
||||
|
||||
## Users
|
||||
|
||||
There is only the root user;
the SSH keys of missytake, hagi, and vmann are deployed via pyinfra.
|
||||
|
||||
## Upgrade Strategy
|
||||
|
||||
To upgrade the packages,
|
||||
you need to log in via SSH and run `apk update && apk upgrade`.
|
||||
|
||||
## Backup and Restore
|
||||
|
||||
As long as it is a test deployment,
|
||||
we don't need backups.
|
||||
|
||||
|
||||
# Changelog
|
||||
|
||||
## 2024-12-02 Set up alpine VPS on hetzner
|
||||
|
||||
This was only tested with a cloud VPS so far.
|
||||
Source: <https://gist.github.com/c0m4r/e38d41d0e31f6adda4b4c5a88ba0a453>
|
||||
(but it's less of a hassle than described there)
|
||||
|
||||
To create an alpine server on hetzner,
|
||||
you need to first create a Debian VPS or something similar.
|
||||
|
||||
Then you boot into the rescue system.
|
||||
|
||||
Get the download link of the latest VIRTUAL x86_64 alpine iso
|
||||
from <https://alpinelinux.org/downloads/>.
|
||||
|
||||
Log in to the rescue system via console or SSH,
|
||||
and write the ISO to the disk:
|
||||
|
||||
```
|
||||
ssh root@xxxx:xxxx:xxxx:xxxx::1
|
||||
wipefs -a /dev/sda
|
||||
wget https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-virt-3.20.3-x86_64.iso # or whatever link you got from alpine
|
||||
dd if=alpine-virt-3.20.3-x86_64.iso of=/dev/sda
|
||||
reboot
|
||||
```
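The download link follows a regular pattern, so it can be derived from the release number.
This is a guess based on the single URL used above, not an official API;
`alpine_virt_iso_url` is a made-up helper,
and the result should still be compared with the link on the Alpine downloads page.

```python
def alpine_virt_iso_url(version: str, arch: str = "x86_64") -> str:
    """Build the ISO URL for a release such as '3.20.3' (pattern inferred, not guaranteed)."""
    major, minor, _patch = version.split(".")
    branch = f"v{major}.{minor}"  # e.g. v3.20
    return (
        f"https://dl-cdn.alpinelinux.org/alpine/{branch}/releases/{arch}/"
        f"alpine-virt-{version}-{arch}.iso"
    )


if __name__ == "__main__":
    print(alpine_virt_iso_url("3.20.3"))
```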
|
||||
|
||||
Then open the server console (SSH doesn't work),
|
||||
log in as root (no password required),
|
||||
and proceed with:
|
||||
|
||||
```
|
||||
cp -r /.modloop /root
|
||||
cp -r /media/sda /root
|
||||
umount /.modloop /media/sda
|
||||
rm /lib/modules
|
||||
mv /root/.modloop/modules /lib
|
||||
mv /root/sda /media
|
||||
setup-alpine
|
||||
```
|
||||
|
||||
Then select what you wish,
|
||||
contrary to the guide above,
|
||||
DHCP is actually fine.
|
||||
The drive should be sda,
|
||||
the installation type can be sys
|
||||
(why go through the hassle).
|
||||
|
||||
Voilà! Reboot and log in.
The first SSH login will probably use the root password,
as copy-pasting your public SSH key into the console doesn't really work.
Make sure the SSH config allows this
(and turn password root access off afterwards).
|
||||
|
||||
|
||||
## 2024-12-02 Encrypting /var/lib/libvirt partition
|
||||
|
||||
**Status: tested with Hetzner VPS, not deployed in production yet**
|
||||
|
||||
Messing with file systems and partitions
|
||||
should not be done by automation scripts,
|
||||
so I created the LUKS-encrypted /dev/sdb partition manually.
|
||||
|
||||
(So far, /dev/sdb was added via a Hetzner volume,
|
||||
but it can be any partition actually)
|
||||
|
||||
To create a partition in the VPS volume
|
||||
(which was formatted to ext4 originally),
|
||||
- I ran `fdisk /dev/sdb`,
|
||||
- entered `o` to create a DOS partition table,
|
||||
- entered `n` to add a new primary partition, using all available space,
|
||||
- and `w` to save to disk and exit.
|
||||
|
||||
Then I ran `cryptsetup luksFormat /dev/sdb1`
|
||||
and entered the passphrase from `pass 0x90/ararat/sdb-crypt`
|
||||
to create a LUKS volume.
|
||||
|
||||
Now I could decrypt the new volume with
|
||||
`cryptsetup luksOpen /dev/sdb1 sdb_crypt`
|
||||
and entering the passphrase from `pass 0x90/ararat/sdb-crypt`.
|
||||
|
||||
Finally, I ran `mkfs.ext4` on the opened mapping (presumably `/dev/mapper/sdb_crypt`)
|
||||
to create an ext4 file system
|
||||
in the encrypted partition.
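For reference, the manual steps after partitioning can be written down as an ordered command list.
This sketch only prints the commands and never runs them,
in line with the rule that partitioning should stay manual.
`luks_setup_commands` is a hypothetical helper,
and the `/dev/mapper/<name>` target for `mkfs.ext4` is an assumption
based on how `cryptsetup luksOpen` names the opened device.

```python
import shlex


def luks_setup_commands(partition: str = "/dev/sdb1", mapping: str = "sdb_crypt"):
    """Return the commands from the changelog entry above, in order."""
    return [
        ["cryptsetup", "luksFormat", partition],         # asks for the passphrase
        ["cryptsetup", "luksOpen", partition, mapping],  # asks for it again
        ["mkfs.ext4", f"/dev/mapper/{mapping}"],         # assumed target device
    ]


if __name__ == "__main__":
    for command in luks_setup_commands():
        print(shlex.join(command))
```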
|
||||
|
I would also mention Nix, simply because some people in the space prefer it to pyinfra. But one of the NixOS people can write that part then :)