Architecture Rework #79

Open

opened 2019-06-10 09:49:40 +00:00 by b3yond · 1 comment

Author: @b3yond Posted at: 12.01.2019 10:11

Yesterday we discussed what the next big architecture rework could look like. I'll try to summarize the points (as I understand them), so we can start working on it.

The working branch for now is stable3; feature branches should be merged into it. As soon as Ticketfrei 3.0.0 is release-ready, we can merge it into master. Then master will become our development branch, and stable3 the release branch.

Pain points of the current architecture

The current architecture is very monolithic: in principle, all database requests live in one file. In practice there are many exceptions to this rule, and it is impossible to tell why a given call is handled differently.

If you want to write a new plugin, i.e. integrate a new bot, you have to make changes in 3 or 4 different places in the code. Where exactly is not documented anywhere, and in general it is not very transparent. We want to lower the threshold to contribute: in the future, everything you need to write to integrate a new bot is supposed to fit in one subdirectory.

Currently we have two applications. The frontend, a single Bottle application, is mostly fine. The backend, however, is a loop over all cities and bots: when one of those bots has a hiccup, all the other bots have to wait. When there are too many cities, a report can be delayed for quite a long time, because the loop iterates over all the cities one after another.

Both processes currently run in different docker containers. This can also be unified once there is a main process which starts the bots and the frontend as forks.

What do we want it to look like?

Each bot gets its own process

There will be one module, main.py (or similar), that watches over the processes and manages the general control flow. It starts the bot processes for each city.
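A minimal sketch of what such a supervisor could look like, assuming Python's standard multiprocessing module. The names bot_specs, start_bot, supervise_once and main are hypothetical, and restarting crashed bots is an assumption about what "watch over the processes" might mean, not something we decided:

```python
import multiprocessing as mp
import time


def bot_specs(cities, bot_names):
    """List every (city, bot) pair that needs its own process."""
    return [(city, bot) for city in cities for bot in bot_names]


def start_bot(city, bot, target):
    """Start one daemon process that runs target(city, bot)."""
    proc = mp.Process(target=target, args=(city, bot),
                      name=f"{bot}@{city}", daemon=True)
    proc.start()
    return proc


def supervise_once(procs, target):
    """Restart every bot process that has exited; return how many."""
    restarted = 0
    for spec, proc in list(procs.items()):
        if not proc.is_alive():
            procs[spec] = start_bot(*spec, target)
            restarted += 1
    return restarted


def main(cities, bot_names, target, poll_seconds=5.0):
    """Start all bots, then keep checking on them forever."""
    procs = {spec: start_bot(*spec, target)
             for spec in bot_specs(cities, bot_names)}
    while True:
        supervise_once(procs, target)
        time.sleep(poll_seconds)
```

main() would be called from under an `if __name__ == "__main__":` guard, with target being a module-level function, so that it also works on platforms where multiprocessing does not fork.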

All bots which belong to one city communicate with each other through a message queue. Each bot has three main methods: run(), which loops over the other two; recv_messages():[report], which crawls new messages from a social network and puts them into the message queue; and send_messages([report]), which receives messages from the message queue and posts them to the social network. The latter two methods are comparable to the old crawl():[report] and post([report]).

When a report reaches the message queue, it is handled by the censor.py process, which checks whether the report contains at least one trigger word and none of the blocklist words. If it passes, the censor hands the report on to the send_messages functions of the bots.
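The bot interface could be sketched roughly like this. The Report fields, the step() helper and the split into two queues (to_censor, from_censor) are assumptions made for illustration; queue.Queue is used here so the sketch runs in one process, and multiprocessing.Queue offers the same put() and get_nowait() calls:

```python
import queue
import time
from dataclasses import dataclass


@dataclass
class Report:
    author: str
    text: str
    source: str  # name of the bot that crawled the report


class Bot:
    """Base class; a real bot overrides recv_messages and send_messages."""

    def __init__(self, to_censor, from_censor):
        self.to_censor = to_censor      # reports this bot has crawled
        self.from_censor = from_censor  # approved reports to post

    def recv_messages(self):
        """Crawl the social network and return a list of new reports."""
        raise NotImplementedError

    def send_messages(self, reports):
        """Post the given reports to the social network."""
        raise NotImplementedError

    def step(self):
        """One iteration: push crawled reports, post approved ones."""
        for report in self.recv_messages():
            self.to_censor.put(report)
        approved = []
        while True:
            try:
                approved.append(self.from_censor.get_nowait())
            except queue.Empty:
                break
        if approved:
            self.send_messages(approved)

    def run(self, interval=10.0):
        """Loop over the other two methods forever."""
        while True:
            self.step()
            time.sleep(interval)


class EchoBot(Bot):
    """Toy bot for testing: 'crawls' a fixed list, 'posts' into a list."""

    def __init__(self, to_censor, from_censor, incoming):
        super().__init__(to_censor, from_censor)
        self.incoming = list(incoming)
        self.posted = []

    def recv_messages(self):
        new, self.incoming = self.incoming, []
        return new

    def send_messages(self, reports):
        self.posted.extend(reports)
```

Keeping one iteration in step() makes a bot testable without starting the endless run() loop.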
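The censor check itself is small. A sketch, assuming trigger and blocklist entries are plain words matched case-insensitively against whole words (the function name is_allowed is hypothetical, and real entries may well be patterns instead of plain words):

```python
import re


def is_allowed(text, triggers, blocklist):
    """True if text has at least one trigger word and no blocklist word."""
    words = set(re.findall(r"\w+", text.lower()))
    has_trigger = any(t.lower() in words for t in triggers)
    has_blocked = any(b.lower() in words for b in blocklist)
    return has_trigger and not has_blocked
```

Only reports for which is_allowed() returns True would be passed on to the bots' send_messages functions.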

Each bot gets its own directory

A bot plugin is supposed to have 4 files, which are under /bots/$bot (e.g. /bots/twitter):

bot.py contains the methods run, recv_messages, send_messages, and additional functions, e.g. to handle the connection to the social network
db.py contains all db calls the bot needs
webui.py contains a Bottle application to handle the specific calls to register the bot at the network
settings.tpl contains the Bottle template to configure the bot, and is imported by the settings page in the frontend

The frontend

The frontend becomes a subprocess of main.py.

Instead of being a single Bottle application as it is now, frontend.py mounts the webui.py Bottle applications of all the bots as child applications. The run() function of frontend.py either starts a web server that is used directly by a docker setup, or is callable as a WSGI application.
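To illustrate the idea of child applications, here is a sketch in plain WSGI using only the standard library. It is a stand-in, not the planned implementation: Bottle ships its own mount() method for this, which frontend.py would presumably use instead. The names make_frontend and text_app and the /twitter prefix are made up for the example:

```python
def make_frontend(children, fallback):
    """Return a WSGI app that routes /<bot>/... to the matching child app."""
    def frontend(environ, start_response):
        path = environ.get("PATH_INFO", "")
        for name, app in children.items():
            prefix = "/" + name
            if path == prefix or path.startswith(prefix + "/"):
                # Shift the prefix from PATH_INFO to SCRIPT_NAME, so the
                # child app sees paths relative to its own mount point.
                child_env = dict(environ)
                child_env["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                child_env["PATH_INFO"] = path[len(prefix):]
                return app(child_env, start_response)
        return fallback(environ, start_response)
    return frontend


def text_app(body):
    """Tiny WSGI app that always answers with the given text."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [body.encode("utf-8")]
    return app
```

Because the result is an ordinary WSGI callable, it can be handed to a WSGI server or served directly, which matches the two ways of running the frontend described above.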

The templates would stay largely the same, except that /templates/settings.tpl would have to import /bots/$bot/settings.tpl. As the bot templates live in directories that are not accessible to the web server, they need to be imported at bot startup or similar.

The database

The frontend keeps user.py as a means to handle user management and city creation, and db.py to handle the database calls and define the database schema.

The database schema is not supposed to change, to save the migration effort from Ticketfrei 2 to Ticketfrei 3.

All future bots should use $bot_ (e.g. "twitter_") as a table prefix for their tables.
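A small sketch of how a bot's db.py could enforce the prefix, assuming SQLite via the standard sqlite3 module. The helper names and the example table last_mention are hypothetical:

```python
import re
import sqlite3


def table_name(bot, table):
    """Build '<bot>_<table>' and reject anything that is not a plain identifier."""
    for part in (bot, table):
        if not re.fullmatch(r"[a-z][a-z0-9_]*", part):
            raise ValueError(f"invalid identifier: {part!r}")
    return f"{bot}_{table}"


def create_bot_table(conn, bot, table, columns_sql):
    """Create the prefixed table if it does not exist; return its name."""
    name = table_name(bot, table)
    # Table names cannot be bound as SQL parameters, hence the check above.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {name} ({columns_sql})")
    return name
```

For example, create_bot_table(conn, "twitter", "last_mention", "city_id INTEGER, mention_id INTEGER") would create a table named twitter_last_mention.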

We will evaluate whether SQLAlchemy is a good database layer for Ticketfrei.

E-Mail setup

We have not yet decided on an approach for the E-Mail bot; none of the options fits our requirements perfectly. The options are:

The status quo: an exim4 mail server writes incoming mails to an mbox file. Ad-hoc E-Mail address generation at Ticketfrei account creation happens through an ugly hack, by appending a line to /etc/aliases. This does not work with docker yet, as we didn't find a proper exim4 container; but it would be possible by running exim4 on the docker host and mounting the mbox file into the backend container and /etc/aliases into the frontend container.
We could use an MDA to receive the mails and put them into the message queue. This would make ad-hoc E-Mail address generation at Ticketfrei account creation possible, but it requires a proper SMTP server and is most likely not compatible with a docker setup.
We could connect to established mail addresses, as with all the other networks. This would be possible with e.g. IMAP plus smtplib. But the user would have to register the mail account and configure it in Ticketfrei, and ad-hoc E-Mail address generation at Ticketfrei account creation would be impossible.
We run a python-smtpd server as a mailbot. This is a bit nasty because such servers are most likely not designed for running in production, but it would make ad-hoc E-Mail address generation at Ticketfrei account creation easy and would work easily with docker.
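For the first option (the status quo), the receiving side of a mail bot could be sketched with the standard mailbox module. This assumes the bot may delete mails from the mbox file once it has read them, and it ignores file locking against the mail server; the names drain_mbox and body_text are hypothetical:

```python
import mailbox


def body_text(msg):
    """Return the first text/plain part of a message, or '' if there is none."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def drain_mbox(path):
    """Read all mails from the mbox file, remove them, return (sender, text) pairs."""
    box = mailbox.mbox(path)
    reports = []
    try:
        for key in list(box.keys()):
            msg = box[key]
            reports.append((msg["From"], body_text(msg).strip()))
            box.remove(key)  # avoid posting the same mail twice
        box.flush()          # write the now smaller mbox back to disk
    finally:
        box.close()
    return reports
```

A recv_messages() implementation could wrap the returned pairs into reports and put them into the message queue.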

Whoever implements their favorite approach first wins. Whoever comes later can, of course, still convince everyone else that their approach is better.

What is to be done?

To implement this, we want to take the following major steps:

Move the frontend functions into the bot directories
Restructure backend.py to run the bots as forks
Create the message queue to enable communication between bots
Move the censor functions out of user.py into their own class (censor.py)
Implement SQLAlchemy for db calls
Split up templates (done)

Feedback welcome! Did I get something wrong?
