OpenBSD version: 7.1
Arch:            Any
NSFP:            Ups...

A bit of new features

So, i decided to drop postfix for (most) of my boxes and just go with OpenSMTPd. This, in general, was a fun experience, especially after i figured out to also restrict senders appropriately. Following my rather stupid and convoluted set of mysql queries, this was a simple as (watch the added senders <sndrs> before filter):

table sndrs mysql:/etc/mail/mysql-sndr.conf
...
listen on vio0 port 465 pki mail.aperture-labs.org smtps auth <credentials> senders <sndrs> filter "rspamd-sign" tag "DKIM"
listen on vio0 port 587 pki mail.aperture-labs.org tls-require auth <credentials> senders <sndrs> filter "rspamd-sign" tag "DKIM"

To support this, /etc/mail/mysql-sndr.conf contains a slightly adjusted query from the one for users:

host 127.0.0.1
username opensmtpd
password ZGlkIHlvdSByZWFseSB0aGluayBpIHdvdWxkLi4uPw==
database smtpd

query_mailaddrmap with t as (SELECT REGEXP_REPLACE( ? , '[+]([^@])+\@', '\@' )  as addr) select valias.addr as mail from valias join t on REGEXP_REPLACE(valias.alias, '[+]([^@])+\@', '\@' ) = t.addr JOIN vdomains AS vd ON valias.addr LIKE CONCAT('%',vd.domain,'%') UNION select mail from vusers join t on vusers.mail = t.addr;

The only requirement of this is that in the valias table everything on the destination site is an actual user that can be found in vusers. Which, well… for the size of my setup does not reaaaaally need technical enforcement, especially as the worst thing happening is that a user cannot send from an alias.

A wild problem occurs

The problem, with my fun use of mysql, is that for some reason, i ocassionally ended up with my main mailserver not being reachable. Like, kind’a dead: Nagios message saying that a connection to port tcp/25 on mail.aperture-labs.org has been refused.

This is, of course, rather not so fun. Debugging this, i saw that this down-ness goes along with the smtpd’s user table lookup process maxing out a CPU, while the mysqld is also preeetty busy:

Graph of CPU load on mail.aperture-labs.org flatlining on ~100% on system and additional 30-40 in user. Graph of mysql requests on mail.aperture-labs.org hovering around 360 requests per second for hours.

Again; Not funny. Luckily the secondary MX happily accepted mail in the meantime. Still, this is annoying and looks buggy.

kdump my heart er… process

So, of we go taking a look; First step is getting a trace of the wild process; Of course, we could also have started smtpd with debug logging… buuuut you usually don’t do that for prod processes, and this issue only happened sporadically:

ktrace -p `ps aux|grep _smtpd | grep mysql-user.conf | awk '{print $2}'`

This gives us a ktrace.out file in our cwd, which happily keeps growing while the process runs. We can then inspect that file using kdump (mind you; ktrace.out is the default file name, so you could skip the -f part):

kdump -f ./ktrace.out | cat -n | less

So, what do we see? Well, we see the table-mysql process making happy little connections to the mysqld:

    88   54126 table-mysql GIO   fd 3 read 92 bytes
    89         "X\0\0\0
    90          5.5.5-10.6.7-MariaDB\0%\M- [...] mysql_native_password\0"
    91   54126 table-mysql RET   recvfrom 92/0x5c
    92   54126 table-mysql CALL  sendto(3,0xf2204a82000,0xcc,0x480<MSG_DONTWAIT|MSG_NOSIGNAL>,0,0)
    93   54126 table-mysql GIO   fd 3 wrote 204 bytes
    94         "\M-H\0\0\^A\M^L\M-"\M->\0\0\0\0@-\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\^]\0\0\0opensmtpd\0 [...] smtpd\0mysql_native_password\0l\^C_os\aOpenBSD\f_client_name
    95          libmariadb\^D_pid\^E54126\^O_client_version\^E3.3.2     _platform\^Eamd64\f_server_host 127.0.0.1"
    96   54126 table-mysql RET   sendto 204/0xcc

However, these connections are rather short lived, because mysqld quickly terminates them:

   147         "t\0\0\^A\M^?\M-s\^D#HY000Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation '='"
...
   153         "credentials[54126]: warn: trying to reconnect after error: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation '='

After which OpenSMTPd thinks to itself: Oh, wow, that didn’t work, why don’t i try again? This then gives us a really pretty endless look of mysql being hit with requests as fast as the CPU can.

Solving things

So, what went wrong here? When i setup the whole thing, i left mysql to its own defaulty devices. This means that–following the table creation instructions in mysql-table–i ended up with a lot of latin1 tables for my user data. My best guest is that this usually works rather fine, because hey; mail’s just ASCII anyway. No auto-mapping issues. However, during some fun bruteforce attempts, occasionally a character snuck into a username that was not that easily castable, making mysql somewhat sad (and terminating the connection).

So, what to do? Well, we do what every not-so-smart person like me does, and turn to the Internet (which is surprisingly helpful on this). So, drawing from some earlier work, we identify the tables that are causing the issue:

SELECT table_schema, table_name, column_name, character_set_name, collation_name 
FROM information_schema.columns 
WHERE collation_name = 'latin1_swedish_ci' 
ORDER BY table_schema, table_name,ordinal_position 
LIMIT 10;

Which gives us (note, that the Swedish things is somewhat still tied to mysql’s origins in… well… Sweden.):

+--------------+------------+-------------+--------------------+--------------------+
| table_schema | table_name | column_name | character_set_name | collation_name     |
+--------------+------------+-------------+--------------------+--------------------+
| smtpd        | valias     | addr        | latin1             | latin1_swedish_ci  |
| smtpd        | valias     | alias       | latin1             | latin1_swedish_ci  |
| smtpd        | vdomains   | domain      | latin1             | latin1_swedish_ci  |
| smtpd        | vusers     | c_uid       | latin1             | latin1_swedish_ci  |
| smtpd        | vusers     | c_name      | latin1             | latin1_swedish_ci  |
| smtpd        | vusers     | c_password  | latin1             | latin1_swedish_ci  |
| smtpd        | vusers     | c_cn        | latin1             | latin1_swedish_ci  |
+--------------+------------+-------------+--------------------+--------------------+

To now get rid of this issue (hopefully… let’s see if it happens again after i applied these fixes), we now first transform these tables to UTF8:

USE smtpd;
ALTER TABLE valias CONVERT TO CHARACTER SET utf8mb4 COLLATE 'utf8mb4_general_ci';
ALTER TABLE vdomains CONVERT TO CHARACTER SET utf8mb4 COLLATE 'utf8mb4_general_ci';
ALTER TABLE vusers CONVERT TO CHARACTER SET utf8mb4 COLLATE 'utf8mb4_general_ci';

In addition, we also tell mysqld to be more UTF-8-y by adding the following to /etc/my.cnf:

[mysqld]
...
character_set_server = utf8mb4
collation_server = utf8mb4_general_ci
...

And of course, we follow this up with a rcctl restart mysqld. Thereafter–as i said–things should keep working. At least i hope so. If it doesn’t, it will most likely be kdump time again. -.-‘

Something for OpenSMTPd

While, in general, this is kind-of my issue in setting up tables as latin1 (or, rather, me blindly following defaults), … I am also a bit curious on whether this may not be partially an issue of mysql-table. Instead of getting the message that a query did not work, mysql-table retried the query ad infinum. So, as a patch for mysql-table, it might make sense to return a temp-error/error if a lookup does not succeed after N (configurable with sane default?) retries.

I guess I will head to the mailinglist and see if what people think.