<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.2.2">Jekyll</generator><link href="https://doing-stupid-things.as59645.net/feed.xml" rel="self" type="application/atom+xml" /><link href="https://doing-stupid-things.as59645.net/" rel="alternate" type="text/html" /><updated>2024-07-22T18:14:05+02:00</updated><id>https://doing-stupid-things.as59645.net/feed.xml</id><title type="html">Doing stupid things (with packets and OpenBSD)</title><subtitle>Sometimes, it is fun to do stupid things. This blog documents things done on AS59645 to run a (mostly) OpenBSD only, self-hosted AS &quot;the old way&quot;. Most  certainly NSFP (Not Safe for Production) and never reasonable.</subtitle><entry><title type="html">You route me round-round like a packet…: Why routes should not loop</title><link href="https://doing-stupid-things.as59645.net/ipv6/routing/loops/abuse/2024/07/20/you-route-me-round-round.html" rel="alternate" type="text/html" title="You route me round-round like a packet…: Why routes should not loop" /><published>2024-07-20T15:14:21+02:00</published><updated>2024-07-20T15:14:21+02:00</updated><id>https://doing-stupid-things.as59645.net/ipv6/routing/loops/abuse/2024/07/20/you-route-me-round-round</id><content type="html" xml:base="https://doing-stupid-things.as59645.net/ipv6/routing/loops/abuse/2024/07/20/you-route-me-round-round.html"><![CDATA[<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OpenBSD version: might have helped... or not.
Arch:            Any
NSFP:            ffs... -.-'
</code></pre></div></div>

<p>I just had a rather not-so-fun encounter with the joys of routing loops.
Already yesterday, I saw a few too many packets running in circles in <a href="https://measurement.network/services/v4less-as/">V4LESS-AS</a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>...
22:38:37.659748 IP6 240e:c2:1800:84:0:1:1:2 &gt; 2a06:d1c3:8ed9:c1fc:448a:114:7fb7:4a79: ICMP6, echo request, id 50599, seq 21853, length 8
22:38:37.659748 IP6 240e:c2:1800:84:0:1:1:2 &gt; 2a06:d1c3:cd62:c1fc:3fe1:a9b:2ef0:f541: ICMP6, echo request, id 27682, seq 53451, length 8
22:38:37.659748 IP6 240e:c2:1800:84:0:1:1:2 &gt; 2a06:d1c3:ab46:c1fc:27d7:6863:2686:949: ICMP6, echo request, id 37253, seq 46574, length 8
22:38:37.659786 IP6 240e:c2:1800:84:0:1:1:2 &gt; 2a06:d1c3:41dd:c4fc:57b5:6e40:3f7:50e8: ICMP6, echo request, id 35591, seq 37275, length 8
22:38:37.659800 IP6 240e:c2:1800:84:0:1:1:2 &gt; 2a06:d1c3:612:c4fc:5019:c1f0:10e4:cf2c: ICMP6, echo request, id 48464, seq 3565, length 8
22:38:37.659810 IP6 240e:c2:1800:84:0:1:1:2 &gt; 2a06:d1c3:8ed9:c1fc:448a:114:7fb7:4a79: ICMP6, echo request, id 50599, seq 21853, length 8
22:38:37.659814 IP6 240e:c2:1800:84:0:1:1:2 &gt; 2a06:d1c3:cd62:c1fc:3fe1:a9b:2ef0:f541: ICMP6, echo request, id 27682, seq 53451, length 8
...
</code></pre></div></div>

<p>The rather obvious root cause here is a combination of a high TTL (255) and
‘somebody’ (read: me) doing stupid things. In this case: having a routing loop.</p>

<h1 id="making-your-own-routing-loop">Making your own routing loop</h1>

<p>Routing loops are <a href="https://www.sciencedirect.com/science/article/pii/S1389128622005345">surprisingly common</a>, and even easier to make yourself.
All you need is two routers that each think the other one is the next
hop for a packet they want to forward. A common way to make that happen is,
for example, installing the covering prefixes you announce without a
<code class="language-plaintext highlighter-rouge">discard</code>- or <code class="language-plaintext highlighter-rouge">blackhole</code>-like statement.</p>
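<p>For illustration, such a discard can also live directly in the kernel routing table; a minimal sketch using OpenBSD’s route(8), with 2001:db8::/32 standing in for an announced covering prefix (on Junos-style platforms, the equivalent would be a static route with a discard next hop):</p>

```shell
# Null-route the covering prefix: packets without a more specific
# route get dropped locally instead of being handed to the next hop.
route add -inet6 2001:db8::/32 ::1 -blackhole
```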

<p>If your border routers draw on your IGP to find prefixes to route to, they will
happily take any packet destined for a prefix <em>not</em> in the IGP on a scenic
tour through your AS.</p>

<h1 id="why-this-is-annoying">Why this is annoying</h1>

<p>Now, as you see up there, there seems to be an address sending <em>a lot</em> of v6
echo requests; My assumption is that this is some form of research work
testing a new target generation algorithm for v6.</p>

<p>Well, whatever it is, even with just 8-byte ICMPv6 payloads, this can quickly
amount to 20mbit on an interface. If your routers route a bit more, it can be a
bit more… happily stacking up over time.</p>

<p>After having experienced a bit of looping yesterday (and getting rid of it by
making loops go away), I was playing around with what this can do.</p>

<p>Turns out, a single Linux box behind a standard end-user access connection can
easily ship 5gbit+ on a looping link:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for i in {1..1024}; do sudo ping6 -f -t 255 -s 1452  2001:db8:: &amp; done;
</code></pre></div></div>

<p>Even a single <code class="language-plaintext highlighter-rouge">ping6 -f -t 255 -s 1452  2001:db8::</code> put ~100mbit on a
single link.</p>
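<p>To get a feeling for the amplification at play: each looping packet crosses the loop roughly once per remaining hop, so traffic on the looping link is about packet size × remaining hops × send rate. A back-of-the-envelope sketch (the packet size matches the flood ping above; the hop count and send rate are assumptions):</p>

```shell
# Rough loop amplification estimate (inputs are assumptions):
# 40B IPv6 header + 8B ICMPv6 header + 1452B payload = 1500B on the wire.
pkt_bytes=1500
hops=254    # remaining hop limit once the packet enters the loop
pps=100     # assumed sender rate; ping -f sends at least 100 packets/s
echo "$(( pkt_bytes * 8 * hops * pps / 1000000 )) Mbit/s"   # prints "304 Mbit/s"
```

<p>The gap to the observed ~100mbit is easily explained by a shorter loop or a lower effective send rate; the order of magnitude lines up either way: a small trickle in becomes a torrent on the loop.</p>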

<p>This, of course, is long since known, and has the simple solution of ‘do not
have loops in your network’.</p>

<p>While I was playing around, I got a visit from 240e:c2:1800:84:0:1:1:2 again;
However, this time in another network, with a bit more ‘loop-capability’:</p>

<p><img src="/static/img/2024-07-20-loop.jpeg" alt="Screenshot of a traffic graph quickly growing from nearly no traffic to ~250mbit peak." /></p>

<p>This was, at least, ‘mildly annoying’. Luckily, our friends at <code class="language-plaintext highlighter-rouge">240e:c2:1800:84:0:1:1:2</code> are just using an 8-byte payload; Otherwise, I would have had ‘a few’ gbit more on that link.</p>

<h1 id="what-to-do">What to do</h1>

<p>Well, obviously, the solution is ‘do not have loops in your network’; In
practice, though, this is often easier said than done (entropy, bodies,
basement… you know).  What I ultimately ended up doing for AS59645 was
installing a central iBGP peer running bird that just injects blackhole routes
for each ‘announced prefix -1 bit’ I have (assuming those do not collide with
any other prefix). That way, traffic always finds a way… if only into
<code class="language-plaintext highlighter-rouge">/dev/null</code>.</p>
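<p>A minimal sketch of what such a blackhole-injecting bird instance could look like, assuming bird 2.x, with 2001:db8::/32 standing in for one of the announced prefixes (so its ‘-1 bit’ cover is 2001:db8::/31); the protocol name and prefix are placeholders:</p>

```
# bird 2.x sketch: inject blackhole covers for announced prefixes,
# to be redistributed to the iBGP mesh by a bgp protocol (not shown).
protocol static blackhole_covers {
    ipv6;
    route 2001:db8::/31 blackhole;
}
```

<p>With such routes in the iBGP mesh, traffic for unrouted space below the covers hits a blackhole at the first router instead of orbiting between borders.</p>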

<p>Still, annoying, and probably something to watch out for. -.-‘</p>]]></content><author><name></name></author><category term="ipv6" /><category term="routing" /><category term="loops" /><category term="abuse" /><summary type="html"><![CDATA[OpenBSD version: might have helped... or not. Arch: Any NSFP: ffs... -.-']]></summary></entry><entry><title type="html">Putting the MAU into meowmeow: On personal ASNs</title><link href="https://doing-stupid-things.as59645.net/ripe/policy/personal/asn/2024/07/19/putting-the-mau-into-meowmeow.html" rel="alternate" type="text/html" title="Putting the MAU into meowmeow: On personal ASNs" /><published>2024-07-19T17:44:21+02:00</published><updated>2024-07-19T17:44:21+02:00</updated><id>https://doing-stupid-things.as59645.net/ripe/policy/personal/asn/2024/07/19/putting-the-mau-into-meowmeow</id><content type="html" xml:base="https://doing-stupid-things.as59645.net/ripe/policy/personal/asn/2024/07/19/putting-the-mau-into-meowmeow.html"><![CDATA[<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OpenBSD version: might have helped
Arch:            Any
NSFP:            That seems to be debatable...
</code></pre></div></div>

<p><strong>Disclaimer:</strong> I am writing this article as an individual member of the RIPE
Community, and do not represent my employer. Furthermore, I am a person prone
to ‘doing stupid things’, and a repeat offender when it comes to personal
ASes: I have three. Finally, I am on friendly terms with several LIRs mentioned
in the original article.</p>

<p>Today, an article about personal ASNs was <a href="https://labs.ripe.net/author/eu/driving-the-asn-truck-without-a-licence/">published on RIPE Labs</a> that
has been ‘critically acclaimed’ in the community. It follows <a href="https://ripe88.ripe.net/archives/video/1296">two</a>
<a href="https://ripe88.ripe.net/archives/video/1329">talks</a> at RIPE88, during which there was already an ‘engaged’ discussion
around the propositions in the corresponding talks.</p>

<p>As I had already been rather vocal during the talks at RIPE88, it is not overly
surprising that I have ‘some thoughts’ regarding the article. Well, keeping
thoughts to yourself may earn you fewer non-friends, but it is also less fun,
so here we go.</p>

<h1 id="the-article-itself">The article itself</h1>

<p>The article titled <a href="https://labs.ripe.net/author/eu/driving-the-asn-truck-without-a-licence/">“Driving the ASN Truck Without a Licence”</a>
essentially revisits the arguments of the presentations held at the last
meeting given the <a href="https://labs.ripe.net/author/james-kennedy/reflections-on-six-months-as-chief-registry-officer/">increasing number of personal ASNs</a>.  To summarize
them:</p>

<ul>
  <li>Personal ASes are not run ‘better’ than ‘traditional’ networks, but likely worse, causing issues; If they seem to be run better, it is because it is easier to do so at smaller scale, and also, a tunnel-based ASN is simply not relevant, even if it does RPKI correctly.</li>
  <li>Hobby networks do not help with IPv6 deployment, but instead hinder it, as individuals no longer need to pressure their ISP into rolling out v6.</li>
  <li>Virtual Internet Exchanges are useless and–along with all the tunnel ASes–degrade the MTU below 1500, <a href="https://doing-stupid-things.as59645.net/networking/debugging/mtu/2022/11/14/time-to-be-real.html">leading to further issues</a>.</li>
  <li>You do not need a ‘real’ personal ASN to get experience; Read a book or join <a href="https://dn42.net">dn42</a>.</li>
  <li>LIRs misrepresent the policy to implement ‘shady business practices’ to push end-users to request resources</li>
  <li>IPv6 PA is being abused as a PI-ish resource, especially by those ‘shady business practice’ LIRs</li>
  <li>Personal ASNs are regularly leveraged for policy abuse, and to pollute public databases and protocols</li>
</ul>

<p>In conclusion, the article calls for stronger restrictions on handing out personal ASNs, curbing “the business practices of some LIRs” promoting “irresponsible behaviour”, and making resources less accessible, ideally by making them more expensive.</p>

<h1 id="the-tone-of-the-article">The tone of the article</h1>

<p>The article, in several points, tends to use authoritative language. The
comments that have already come in also suggest that the article creates an
impression of trying to gate-keep the Internet against those, mostly defined by
their nature as natural persons wanting to hold resources, deemed possibly unworthy
but certainly unqualified for actually doing so.</p>

<p>I personally know the author, and <em>know</em> that there are some good intentions
(and partially even very sensible points) behind the article. Hence, while I
personally agree with the comments on the phrasing, I will focus on engaging
with the underlying arguments in this post.</p>

<h1 id="but-first-a-message-from">But first, a message from…</h1>

<p>… the reason we are talking about this. Kind of: Me. I personally hold
<a href="https://bgp.tools/as/59645">three</a> <a href="https://bgp.tools/as/211286">different</a> <a href="https://bgp.tools/as/215250">ASes</a>, all of them ‘personal
ASes’. I <em>do</em> acknowledge that my little personal AS might be ‘a bit bigger’
than the usual personal AS, including a few more gbit of 9000MTU L2
connectivity between PoPs, and a couple more big-name-brand routers than
usually seen in personal ASes. Also, following the article’s definition, these
are <em>not really</em> personal ASes, as I am also an <a href="https://www.ripe.net/membership/member-support/list-of-members/de/wybt/">LIR</a>. However, I <em>also</em> 
sponsor some end-users’ PI/ASN resources… so, I think I can confidently say
that I am ‘part of the problem’.</p>

<p>Still… the question remains… why would I need three ASes?</p>

<p><strong>Note:</strong> AS211286 and AS215250 are funded via the RIPE Community fund and
supported by contributions of various other RIPE Community members, including
LWLcom, OpenFactory, DE-CIX, VirtuaCloud, and WIIT AG, see
<a href="https://measurement.network/">measurement.network</a>.</p>

<h1 id="as59645-where-it-all-begins">AS59645: Where it all begins</h1>

<p>The ‘root cause’ of me being an LIR is a combination of ‘mild frustration’ with
<a href="https://ripe85.ripe.net/archives/video/877">the state of the Internet</a>, <a href="https://ripe88.ripe.net/archives/video/1294">academia</a>, and the general realization
that I kind of picked a <a href="https://www.mpi-inf.mpg.de/departments/inet/people/tobias-fiebig">day job</a> that is more on the management and less on
the ‘doing things with my hands’ side than is generally good for my well-being.</p>

<p>So, AS59645 is there to make it ping; Learn and automate things, and get the
practice necessary to, say, work on <a href="https://datatracker.ietf.org/person/tobias@fiebig.nl">some documents</a> about the Internet.</p>

<h1 id="as211286-crisis-ethics-reliability--a-measurementnetwork">AS211286: Crisis, Ethics, Reliability &amp; a measurement.network</h1>

<p>The next issue I stumbled upon was the aforementioned <a href="https://ripe88.ripe.net/archives/video/1294">academia</a> thing;
Researchers doing researcher things (speaking as a researcher myself…) can kind of be… difficult… So, I set out and started building
<a href="https://measurement.network/">something</a>, in an attempt to get ahead of the bike-shed; And
well… for <a href="https://pure.mpg.de/pubman/faces/ViewItemOverviewPage.jsp?itemId=item_3517635">reasons</a>.</p>

<h1 id="as215250-v4less-as">AS215250: V4LESS-AS</h1>

<p>As we already went about me having a thing for doing <em>stupid</em> things… it should
be no surprise that I can have <em>very</em> weird ideas. Like… why not build an AS
that <a href="https://ripe88.ripe.net/archives/video/1358">does not have IPv4 on any of its routers</a>.</p>

<p>Obviously, this is a rather great idea, and <a href="https://datatracker.ietf.org/doc/rfc8950/">RFC8950</a> is simply the
future. Surprisingly, even some of the serious not-so-personal ASes seem to
<a href="https://github.com/euro-ix/rfc8950-ixp">think so</a>.</p>

<p>Naturally, the only thing to do then is to make something that allows people to
try out RFC8950; Without actually having to break production infrastructure.
Well, this is what <a href="https://measurement.network/services/v4less-as/">V4LESS-AS does</a>.</p>

<h1 id="on-the-arguments-for-an-asn-driving-license">On the arguments for an ASN-Driving-License</h1>

<p>So, with the reasons for which <em>I</em> have some ASes out of the way, let’s delve
into the arguments around personal ASes made in the <a href="https://labs.ripe.net/author/eu/driving-the-asn-truck-without-a-licence/">original blog
article</a>.</p>

<h1 id="make-the-internet-worse-by-being-badly-run">Make the Internet worse by being badly run</h1>

<p>This argument centers on the (implicit) idea that ‘companies know better’:
They are more likely to run a better service, employees have to answer
to management, and the company to customers, if something goes wrong.</p>

<p>For personal ASes, though, this is not the case. Instead, they can do <em>whatever</em>,
and if the Internet breaks… well. They do not need to care.</p>

<p>Or, to put it into a <a href="https://youtu.be/NMdDolXCP6s?t=596">nice quote from a related talk</a>:</p>

<p>“OSPF is amazing, because it allows you to break your own network in ways you do not understand.
BGP on the other hand allows you to break <strong>everyone’s</strong> network in ways <strong>nobody</strong> understands.”</p>

<p>Good thing that Facebook never had to deal with a global BGP outage due to a misconfiguration,
and there is not a single actual company that does anything remotely shady with their AS.</p>

<p>I am skipping on examples for the latter to protect the guilty; But you know who you are.</p>

<p>Also, running your ISP from a couple of re-flashed 100G switches tends to be a thing as well…
And the number of ISPs without <em>any</em> form of filtering I have seen…</p>

<p>Well, I guess the point is clear. And I did not even get into the point that
all this tunnel stuff… Well, MPLS is not exactly not tunneled, is it? (Keeps
staring at the BGP-free core…)</p>

<h1 id="does-not-help-but-hinder-ipv6-deployment">Does not help but hinder IPv6 Deployment</h1>

<p>The argument around IPv6 adoption is a bit… weird. While, yes, you <a href="https://youtu.be/NMdDolXCP6s?t=596">should
not just get an AS because you need v6</a>, you could use the same argument
around HE’s <a href="https://tunnelbroker.net/">tunnel broker service</a>.</p>

<p>In turn, you could <em>also</em> argue that HE’s <a href="https://tunnelbroker.net/">tunnel broker service</a> is a better
option anyway.</p>

<p>Ultimately, this creates an impression of a bit of straw burning…</p>

<h1 id="virtual-internet-exchanges-are-useless">Virtual Internet Exchanges are useless</h1>

<p>Well, besides the whole ‘tunnels are bad’ issue–given that basically
everything these days goes through a lasagna of tunnels–there is the point of
these things being practically pointless. This is something I very much agree
with (even though I <em>do</em> connect to some of these Toaster-IXes); Still, there
is a lot of self-inflicted foot-shooting going on on these… and I mean…
better shoot yourself in the foot with a toaster than with a truck. Or something
like that.</p>

<p>Besides that, there is of course the MTU argument; However… to be frank…
it is not like actual big players aren’t <a href="https://doing-stupid-things.as59645.net/networking/debugging/mtu/2022/11/14/time-to-be-real.html">doing things that may er… restrict
the MTU to say… 1492b</a>, no?</p>

<h1 id="just-go-for-dn42">Just go for DN42</h1>

<p>Again, the recommendation to, instead, go learn in a very much enclosed
environment pretty much holds true, and is one I <a href="https://youtu.be/NMdDolXCP6s?t=596">gave out myself</a> (and
will happily give out again). However, here, there is not much of an argument
left beyond: you also <em>could</em> do similar stuff elsewhere. And honestly, part of
the appeal of the personal AS is that you <em>can</em> make yourself eat your own
dogfood; Which usually works wonders on service quality and learning outcomes.</p>

<h1 id="there-are-lirs-with-shady-business-practices-pushing-personal-ases">There are LIRs with ‘shady business practices’ pushing Personal ASes</h1>

<p>This point is, in my opinion, <em>‘somewhat difficult and maybe not necessarily
ideally phrased’</em>, up to a point of <em>‘maybe attributing a bit too much malice to
third parties’</em>. Note, though, I am on rather friendly terms with mentioned
entities, and they also support <a href="https://measurement.network/">measurement.network</a>.</p>

<p>The issue here is that a lot of things are going into the same bucket. To
grab the reference to <a href="https://web.archive.org/web/20240626063422/https:/freetransit.ch/">freetransit.ch</a> as an example,
the article suggests that freetransit.ch is <em>“not respecting and enforcing current
policies”</em> of the RIPE region. Instead they execute <em>“shady practices”</em> like
<em>“offering ASNs and IPs to children (“</em><a href="https://web.archive.org/web/20240626063422/https:/freetransit.ch/"><em>Minors can still request resources!</em></a><em>”)”</em>,
all in a bid to <em>“help themselves”</em>.</p>

<p>I would argue that the argumentation taken here is, at least, severely worrisome.
First, the insinuation that ASNs and IPs are marketed to <strong>children</strong> is an
arguable stretch for a checkbox in the contact form inquiring whether the sender
is legally able to sign documents, i.e., not a <strong>minor</strong>:</p>

<p><img src="/static/img/2024-07-19-ftm.png" alt="Screenshot of the freetransit.ch request form." /></p>

<p>The statement thereunder is also not something <em>“not respecting and enforcing current policies”</em>.
In fact, it is pretty much exactly what <a href="https://www.ripe.net/publications/docs/ripe-637/">RIPE-637</a> is saying about contractual relationships:
<em>There has to be one.</em></p>

<p><a href="https://www.ripe.net/publications/docs/ripe-812/">RIPE-812</a> also clarifies that a member can be <em>“A natural person or
a legal entity that has entered into the RIPE NCC Standard Service Agreement
with the RIPE NCC.”</em> There is no need for a member to be (at least depending on
jurisdiction, but generally for the EU, IANAL) 18. A 17 year old can very much
become a member, if they enter into a legal relationship with the RIPE NCC.
Usually, this will require consent and signature from their legal guardian. At
least that is my rather naive non-lawyer reading of the absence of any age
restrictions in <a href="https://www.ripe.net/publications/docs/ripe-812/">RIPE-812</a>. (Also: How would minors otherwise
join the local football club of a small town in northern Friesland; It, after
all, has the same legal structure as RIPE…) Hence, I see no reason why this
could not be the case for an end user’s relationship to a sponsoring LIR.</p>

<p>That still leaves the whole issue of reframing a statement about <em>minors</em>
as one about <em>children</em>; I am not making a value statement regarding the contents,
but do note that I find <em>using</em> this framing to be a discussion style I personally
do not necessarily associate with content-directed argumentation.</p>

<p>Finally, there is the argument that this is being done for ‘own gains’ by these
LIRs; (<strong>Note:</strong> I am not contesting that there <em>also</em> are some actors trying to
leverage end-users to create a (quick) profit; In fact, I am pretty sure there
are <em>some</em>; Just not necessarily the specific examples selected here.)</p>

<p>However, again going back to the explicitly chosen example: using an LIR that
invoices customers a yearly recurring cost very close to the actual cost
billed by the NCC (EUR145 YRC for ASN+PI, i.e., a ‘profit’ of EUR20… which still
includes VAT and covering costs) as an example is ‘somewhat of a stretch’. Considering how much
time usually goes into a registration process (KYC, contract, interaction with
the NCC, etc.), I would argue that this is, best case, covering the actual
costs, even when considering the higher fee billed during the first year.</p>

<p>Hence, overall… especially this argument feels… difficult.</p>

<h1 id="ipv6-pa-is-being-abused-for-pi">IPv6 PA is being abused for PI</h1>

<p>Subsequently, the argument is made that, indeed, some LIRs are also offering
end-users the use of PA space instead of applying for PI. In general, the argument
vaguely alludes to this being <a href="https://labs.ripe.net/author/marco_schmidt/ipv6-stockpiling-a-trojan-horse-in-our-midst/">misuse of resources</a>.</p>

<p>My point here, first and foremost, would be that there are indeed <em>some</em> issues
with the current IPv6 assignment policy that lead to less-than-ideal effects.</p>

<p>However, I would also argue that it might make sense to, instead, <a href="https://www.ripe.net/publications/docs/ripe-781/">participate in the
policy-development-process</a> in order to <a href="https://ripe88.ripe.net/archives/video/1324">actually do something</a>
about the underlying operational issues.</p>

<h1 id="personal-asns-are-used-for-policy-abuse-and-to-pollute-databasesthe-grt">Personal ASNs are used for Policy Abuse and to pollute databases/the GRT</h1>

<p>The final argument revolves around an argument of PI/personal ASN resources being used
for abusing the policy; It leads with the argument of a person requesting multiple
ASNs for multiple projects, noting that it is rather unlikely that the person
<em>actually</em> needs that many ASNs; I feel somewhat called-out. <strong>;-)</strong></p>

<p>However, argumentatively, this section feels a bit like a lot of straw burning
again. After all, for any personal AS used for nefarious stuff, I can likely
name an AS registered to a ‘real’ company or even LIR that is no less engaged
in annoying or directly malicious activity.</p>

<p>Similarly, I think that the author of the initial article is <em>insanely</em> lucky
that no <em>Secret-WG</em> does not even not not exist, that may or may not even have
the audacity to <em>pollute</em> the database with <a href="https://ftp.ripe.net/ripe/dbase/split/ripe.db.poem.gz">limericks</a>. Otherwise, rather
black helicopters would already be circling overhead, given this straight out
attack on such a non-existing organization.</p>

<p>And knowing the (PI and personal ASN holding) person who put in the XSS
referenced (and who thereby found an issue with the web edits tooling
in the process, reported it, and got it fixed)… I am not sure whether
this is not, in fact, more of an argument <em>for</em> personal ASNs.</p>

<h1 id="conclusion">Conclusion</h1>

<p>In conclusion, the arguments presented in <a href="https://labs.ripe.net/author/eu/driving-the-asn-truck-without-a-licence/">the RIPE Labs post</a> are not
able to convince this revie<code class="language-plaintext highlighter-rouge">^H^H^H^H^H</code>member. Despite acknowledging that there
<em>are</em> serious issues around some of these developments, and that we certainly <em>do</em>
need more <a href="https://ripe85.ripe.net/archives/video/877"><em>care</em></a> in the operation of the Internet, we <em>also</em> need a wide
availability of operational experience that is currently often far too lacking <em>even
for professional companies</em>. Picking on some of the LIRs actually rather
engaged in poking the end users they sponsor to do (and learn to do) the right
thing may also be… <em>‘not necessarily the ideal approach’</em> toward improving the
situation. And–based on my engagement with the arguments–I, indeed, find myself seeing
how commenters reached the conclusion that the article creates an impression
of <em>‘gate keeping’</em>.</p>

<p>So, as a herder of cats, reading the conclusion, I kind of do not feel too bad
about being the one to put the ‘Mau’ into (<em>“silly smöl meow meow”</em><code class="language-plaintext highlighter-rouge">[sic]</code>) networks. And
I think I will continue doing that; Responsibly, for the good of the
<strong>I</strong>nternet. For it to become again the open distributed end-to-end
infrastructure ultimately owned by no one, enabling equitable participation,
expression and learning which it was once envisioned to be; Making sure that those I
take responsibility for do not break the Internet. Well, at least I will keep
trying that (as well as not breaking it myself).</p>

<p>(<strong>Final Note:</strong> Yes, I know, AS59645 <em>also</em> sometimes does stupid things, like
leaking routes because of an algorithmic error in community handling for
exported prefixes leading to an overloaded router, which collided with FRR’s
non-atomic config application; But hey, at least that motivated me ultimately
to take a shot at updating <a href="https://datatracker.ietf.org/doc/draft-ietf-grow-bgpopsecupd/">BCP194</a>. Thee who is without any weird
configuration body in their basement, throweth the first depeering; Or something
like that.)</p>]]></content><author><name></name></author><category term="ripe" /><category term="policy" /><category term="personal" /><category term="asn" /><summary type="html"><![CDATA[OpenBSD version: might have helped Arch: Any NSFP: That seems to be debatable...]]></summary></entry><entry><title type="html">SERVFAIL me one more time: Reflections on TU Delft’s Downtime</title><link href="https://doing-stupid-things.as59645.net/dns/dos/downtime/operations/2024/06/01/servfail-me-one-more-time.html" rel="alternate" type="text/html" title="SERVFAIL me one more time: Reflections on TU Delft’s Downtime" /><published>2024-06-01T22:40:21+02:00</published><updated>2024-06-01T22:40:21+02:00</updated><id>https://doing-stupid-things.as59645.net/dns/dos/downtime/operations/2024/06/01/servfail-me-one-more-time</id><content type="html" xml:base="https://doing-stupid-things.as59645.net/dns/dos/downtime/operations/2024/06/01/servfail-me-one-more-time.html"><![CDATA[<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OpenBSD version: might have helped
Arch:            Any
NSFP:            Well... obviously not.
</code></pre></div></div>

<p>“It has been 0 days since it was DNS.”</p>

<p>Last Monday I noticed some of my nagios checks going critical. Specifically, it
was the checks for the anycasted DNS recursors testing <a href="https://datatracker.ietf.org/doc/draft-momoka-v6ops-ipv6-only-resolver/02/">Momoka’s draft on v4
resolution for v6 resolvers</a> that would try to resolve <code class="language-plaintext highlighter-rouge">tudelft.nl</code>.
Why <code class="language-plaintext highlighter-rouge">tudelft.nl</code>? Well, this zone is reliably free of any IPv6 support.
At the same time, various projects I operate were unable to deliver mail to
users <code class="language-plaintext highlighter-rouge">@tudelft.nl</code>. The reason here being that the domain was not resolvable.</p>

<p>I initially did not really want to write about this incident. However, the
number of people who asked me about it, given my background in a) operations
and b) research about “exactly this”, has slowly reached “too many”, so I
figured writing it down might be easier.</p>

<h1 id="disclaimer">Disclaimer</h1>

<p>I am writing this text as a system and network engineer and scientist working
on the subject of digital infrastructure operations. Statements, suggestions,
and conclusions presented in this document are based on my <a href="https://pure.mpg.de/rest/items/item_3532055_4/component/file_3532150/content">practical</a>
experience and <a href="https://pure.mpg.de/rest/items/item_3480830_6/component/file_3561680/content">scientific</a> findings around running digital
infrastructure in a higher-education/research network context. My statements do
not necessarily represent the official position of my employer or any other
affiliation I hold, and are, as such, my own.</p>

<p>I note that I <em>was</em> employed by TU Delft until 2022-03-31 and held a
hospitation agreement with TU Delft up until 2024-03-31. However, I was never
part of the operational staff involved with the digital infrastructure of TU
Delft.  At this point in time, I am in no way affiliated with TU Delft.  As
such, I do not have any privileged insight or information on the operational
efforts in general or the incident at hand specifically.</p>

<p>All information provided and statements made in this document are either the
result of analysis informed by public information, or conjecture based on
common operational practices. This document does not contain confidential or
proprietary information of TU Delft.</p>

<p>The purpose of this document is the open discussion of challenges in the
operation of digital infrastructure for higher education institutions. Observed
deviations from operational best practices are only stated when independently
verifiable. Conjecture is marked as such.</p>

<h1 id="timeline-of-events">Timeline of Events</h1>

<p>At the moment, I see the following timeline in relation to the events
at hand, all times UTC:</p>

<ul>
  <li><em>2024-05-28T02:19:24:</em> monitoring for <code class="language-plaintext highlighter-rouge">n64v6res01.dus01.as59645.net</code> notified
that <code class="language-plaintext highlighter-rouge">?IN A tudelft.nl</code> cannot be resolved. <code class="language-plaintext highlighter-rouge">n64v6res01.ber01.as59645.net</code>
first notified at 2024-05-28T02:28:32, and finally
<code class="language-plaintext highlighter-rouge">n64v6res01.ams01.as59645.net</code> notified at 2024-05-28T03:18:50.</li>
  <li><em>2024-05-28T11:41:10:</em> Monitoring notes that <code class="language-plaintext highlighter-rouge">tudelft.nl</code> is resolvable again
on <code class="language-plaintext highlighter-rouge">n64v6res01.dus01.as59645.net</code>, <code class="language-plaintext highlighter-rouge">n64v6res01.ber01.as59645.net</code> follows at
2024-05-28T11:44:06. <code class="language-plaintext highlighter-rouge">n64v6res01.ams01.as59645.net</code> was already able to
resolve <code class="language-plaintext highlighter-rouge">tudelft.nl</code> at 2024-05-28T06:12:59.</li>
  <li><em>2024-05-28:</em> TU Delft ICT published a notification on
https://meldingen-ict.tudelft.nl/en/, noting ongoing issues with DNS which
are being investigated. This message has since been removed.</li>
  <li><em>2024-05-30:</em> Issues with resolving <code class="language-plaintext highlighter-rouge">tudelft.nl</code> persist intermittently. It
appears  that the DNSSEC configuration of <code class="language-plaintext highlighter-rouge">tudelft.nl</code> <a href="https://dnsviz.net/d/tudelft.nl/Zlg8AA/dnssec/">has been
misconfigured</a>, i.e., new keys were added on the
authoritative servers, but the <code class="language-plaintext highlighter-rouge">DS</code> records in <code class="language-plaintext highlighter-rouge">.nl</code> were not updated.</li>
  <li><em>2024-05-31:</em> Intermittent reachability issues continue. Meanwhile, TU
Delta reported that the attack from 2024-05-28 <a href="https://delta.tudelft.nl/en/article/heavy-ddos-attack-on-tu-delft-had-minor-consequences">reached 2.8 trillion requests
per half hour</a>. A former colleague also informally noted that,
apparently, during the attack external connectivity was briefly interrupted
by ICT to be able to work on further mitigating the attack.</li>
  <li><em>2024-06-01:</em> <code class="language-plaintext highlighter-rouge">tudelft.nl</code> seems to have <a href="https://dnsviz.net/d/tudelft.nl/Zlrjng/dnssec/">partially
recovered</a>.</li>
  <li><em>2024-06-01:</em> Around 13:00, DNS resolution seems to return to normal for most
clients. However, DNSViz still <a href="https://dnsviz.net/d/tudelft.nl/ZlsxCQ/dnssec/">reports the zone to be
bogus</a>.  Based on that data, it appears that
<code class="language-plaintext highlighter-rouge">tudelft.nl</code> currently responds with <code class="language-plaintext highlighter-rouge">RRSIG</code>s signed with keyid 135, while
<code class="language-plaintext highlighter-rouge">.nl</code> only holds <code class="language-plaintext highlighter-rouge">DS</code> records for keyid 47965 and keyid 28945, with neither
signing 135. Furthermore, responses from the <code class="language-plaintext highlighter-rouge">tudelft.nl</code> nameservers still
have a high RTT, sometimes timing out.</li>
</ul>

<h1 id="what-might-be-going-on">What might be going on…</h1>

<p>The question now is: What really happened, and why is it not yet fixed?
Let’s delve into that, shall we?</p>

<h2 id="the-nature-of-the-dos">The nature of the DoS</h2>

<p>The root cause of these events seems to be the DoS attack from Monday. This has
been claimed to have caused 2.8 trillion requests per 30 minutes. Now, this
may mean 1,000,000,000,000 (<code class="language-plaintext highlighter-rouge">10^12</code>) or 1,000,000,000,000,000,000 (<code class="language-plaintext highlighter-rouge">10^18</code>)
requests depending on <a href="https://en.wikipedia.org/wiki/Trillion">the definition of trillion</a>.</p>

<p>This allows us to gauge the bandwidth that must have come in per second, on average.
A request for <code class="language-plaintext highlighter-rouge">?IN A tudelft.nl</code> is, as measured with <code class="language-plaintext highlighter-rouge">dig A tudelft.nl @192.0.2.1</code>, 79 bytes long.
For <code class="language-plaintext highlighter-rouge">10^12</code> requests, this makes ca. 351.11 Gbit/s on average; for <code class="language-plaintext highlighter-rouge">10^18</code>, ca.
351.11 Pbit/s. Mildly unlikely.</p>
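<p>As a sanity check, the back-of-the-envelope numbers can be reproduced quickly. The sketch below assumes the 79-byte message size measured above and, for the on-wire estimate, plain IPv4/UDP over Ethernet; the framing sizes are my assumption, not from any of the reports:</p>

```python
# Back-of-the-envelope bandwidth for 10^12 DNS requests in 30 minutes.
REQUESTS = 10**12          # one (short-scale) trillion requests
WINDOW_S = 30 * 60         # per half hour
DNS_MSG_BYTES = 79         # size of '?IN A tudelft.nl' as reported by dig

# Payload-only average rate, as used in the text above.
payload_gbps = REQUESTS * DNS_MSG_BYTES * 8 / WINDOW_S / 1e9

# On-wire rate: add UDP (8) + IPv4 (20) + Ethernet header/FCS (18)
# + preamble/inter-frame gap (20). These framing numbers are an
# assumption for a plain IPv4/UDP-over-Ethernet attack.
WIRE_BYTES = DNS_MSG_BYTES + 8 + 20 + 18 + 20
wire_gbps = REQUESTS * WIRE_BYTES * 8 / WINDOW_S / 1e9

print(f"payload: {payload_gbps:.2f} Gbit/s, on-wire: {wire_gbps:.2f} Gbit/s")
```

<p>The on-wire figure illustrates why small packets need much more link capacity than the payload rate alone suggests.</p>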

<p>Even the avg. 351.11 Gbit/s is somewhat unlikely to have reached the servers,
given that it would require a campus uplink of at least 800 Gbit/s (considering
that such small packets <a href="https://old.fmad.io/blog-what-is-10g-line-rate.html">come with a lot of overhead</a>) if the statement
that the servers had to process this amount of requests also should hold. Then
again, these stats <em>might</em> also have been collected upstream, i.e., in the SURF
backbone.</p>

<p>In any case, this is obviously a lot of data, and nothing overly easy to handle.</p>

<h2 id="the-authoritative-dns-servers-of-tudelftnl">The Authoritative DNS Servers of <code class="language-plaintext highlighter-rouge">tudelft.nl</code></h2>

<p><a href="https://www.rfc-editor.org/rfc/rfc2182">RFC2182</a> has some opinions on running authoritative nameservers; or rather,
the <em>second</em> one for a zone, and says in 3.1:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   When selecting secondary servers, attention should be given to the
   various likely failure modes.  Servers should be placed so that it is
   likely that at least one server will be available to all significant
   parts of the Internet, for any likely failure.
</code></pre></div></div>

<p>Now, how does this look for <code class="language-plaintext highlighter-rouge">tudelft.nl</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% dig NS tudelft.nl @ns1.dns.nl

; &lt;&lt;&gt;&gt; DiG 9.16.48 &lt;&lt;&gt;&gt; NS tudelft.nl @ns1.dns.nl
;; global options: +cmd
;; Got answer:
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 17759
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 3
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;tudelft.nl.			IN	NS

;; AUTHORITY SECTION:
tudelft.nl.		3600	IN	NS	ns1.tudelft.nl.
tudelft.nl.		3600	IN	NS	ns2.tudelft.nl.

;; ADDITIONAL SECTION:
ns1.tudelft.nl.		3600	IN	A	130.161.180.1
ns2.tudelft.nl.		3600	IN	A	130.161.180.65

;; Query time: 30 msec
;; SERVER: 2001:678:2c:0:194:0:28:53#53(2001:678:2c:0:194:0:28:53)
;; WHEN: Sat Jun 01 17:31:06 CEST 2024
;; MSG SIZE  rcvd: 107
</code></pre></div></div>

<p>Well. Those two are <em>certainly</em> in the same /24; in fact, they even
share a /25. No matter how geographically distributed the setup is, this does
<em>not</em> fulfill the requirements of 
<a href="https://www.rfc-editor.org/rfc/rfc2182">RFC2182</a> or <a href="https://www.rfc-editor.org/rfc/rfc3258">RFC3258</a>. And this has been like that for… 
some… time. And, incidentally, still <em>is</em> like that.</p>
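<p>That both glue addresses share a single small prefix is easy to verify mechanically; a small sketch using Python’s <code class="language-plaintext highlighter-rouge">ipaddress</code> module, with the addresses from the dig output above:</p>

```python
import ipaddress

# Glue records for tudelft.nl, as returned by ns1.dns.nl (see dig output).
nameservers = {
    "ns1.tudelft.nl": "130.161.180.1",
    "ns2.tudelft.nl": "130.161.180.65",
}

# Both addresses fall into the lower half of 130.161.180.0/24,
# i.e., a single /25: one routing and failure domain.
lower_half = ipaddress.ip_network("130.161.180.0/25")
same_slash25 = all(
    ipaddress.ip_address(addr) in lower_half for addr in nameservers.values()
)
print(same_slash25)  # True
```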

<h2 id="how-it-all-comes-together">How it all Comes Together</h2>

<p>Based on the observations above, I make the following conjectures concerning
the events that transpired in response to the DoS attack.</p>

<ul>
  <li>I assume that the authoritative DNS servers of <code class="language-plaintext highlighter-rouge">tudelft.nl</code> were the primary
target of the attack, given that TU Delft ICT first reported to be investigating
an issue related to their DNS servers.</li>
  <li>As the nameservers of <code class="language-plaintext highlighter-rouge">tudelft.nl</code> are single-homed behind its campus network,
disconnecting or otherwise severing the campus network from the rest of the
Internet, e.g., via the DoS, makes <code class="language-plaintext highlighter-rouge">tudelft.nl</code> unreachable, also making, e.g.,
emails undeliverable.</li>
  <li>In response to the observed attack, TU Delft ICT hopefully asked SURF to ACL,
e.g., via Flowspec, certain routes into their network.</li>
  <li>As there was still a notable volume of inbound requests hitting the authoritative NS, ICT
decided to either replace or provision an additional authoritative DNS worker.
Alternatively, possibly a middle box or DNS filtering solution was brought in.</li>
</ul>

<p>Together, the above steps restored operations, especially after the DoS attack
subsided a bit. Issues then resurfaced after roughly 48 hours when the TTL
of 172800 seconds for the DNSKEY records of <code class="language-plaintext highlighter-rouge">tudelft.nl</code> expired on more and
more recursive resolvers.</p>
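<p>The arithmetic behind that 48-hour window is straightforward; a quick sketch, where the assumption that a resolver cached the DNSKEY RRset right around the first monitoring alert is mine:</p>

```python
from datetime import datetime, timedelta

DNSKEY_TTL = 172800  # TTL of the tudelft.nl DNSKEY RRset, in seconds
print(DNSKEY_TTL / 3600)  # 48.0 hours

# Assumption: a resolver last refreshed the DNSKEY RRset around the time
# monitoring first fired on 2024-05-28 (timestamp from the timeline above).
cached_at = datetime(2024, 5, 28, 2, 19, 24)
expires_at = cached_at + timedelta(seconds=DNSKEY_TTL)
print(expires_at)  # 2024-05-30 02:19:24
```

<p>Which lines up with the issues resurfacing on 2024-05-30, as caches holding the pre-incident DNSKEY RRset progressively aged out.</p>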

<p>Possibly, during the change (adding or upgrading a worker/traffic filter) a new
ZSK DNSSEC key (keyid=135) was introduced. Based on the DNSViz resolutions, it
seems like this DNSKEY is not always delivered with an RRSIG signed by one of
the keys for which a DS record is installed in <code class="language-plaintext highlighter-rouge">.nl</code>, even though it seems like
<a href="https://dnsviz.net/d/tudelft.nl/Zlrjng/dnssec/">it <em>should</em> be signed by keyid=47965</a>. Nevertheless,
it <a href="https://dnsviz.net/d/tudelft.nl/ZlsxCQ/dnssec/">is not always</a>, and the keyid <a href="https://dnsviz.net/d/tudelft.nl/Zlg8AA/dnssec/">was not present
earlier this year</a>, which might have benign reasons, e.g.,
a ZSK rollover.</p>
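<p>As an aside, the keyids mentioned here (135, 47965, 28945) are DNSKEY key tags, computed over the record’s RDATA as described in RFC 4034, Appendix B. A minimal sketch of that algorithm; the tiny two-byte “key” below is a made-up example to exercise the arithmetic, not one of the tudelft.nl keys:</p>

```python
def key_tag(flags: int, protocol: int, algorithm: int, key: bytes) -> int:
    """Compute the DNSKEY key tag per RFC 4034, Appendix B."""
    rdata = flags.to_bytes(2, "big") + bytes([protocol, algorithm]) + key
    acc = 0
    for i, byte in enumerate(rdata):
        # Even offsets contribute the high byte, odd offsets the low byte.
        acc += byte << 8 if i % 2 == 0 else byte
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF

# flags=256 (ZSK), protocol=3, algorithm=8, plus a made-up key blob.
print(key_tag(256, 3, 8, b"\x01\x02"))  # 1290
```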

<p>Similarly, it might be that responses carrying those specific RRSIGs/DNSKEY
records simply get eaten somewhere on-path or enjoy just being dropped.  This
may be the result of a misconfiguration, the result of ongoing DoS, or caused
by the introduction of specific defense mechanisms, e.g., heavy per-client rate
limits.</p>

<p>In any case, something is still broken, and this heavily impacts reachability
for <code class="language-plaintext highlighter-rouge">tudelft.nl</code> in the DNS.</p>

<p><em>Update 2024-06-02:</em> The more I think about this, the more this feels like the
result of rate limiting, which will likely also impact large resolvers
(Quad1/Quad8/Quad9) less, because they will re-resolve from different IPs. It
also makes sense given that keyid 135 is a ZSK, while the other two are KSKs.</p>

<h2 id="absence-of-monitoring">Absence of Monitoring</h2>

<p>Given the prolonged nature of the DNSSEC issues, I would argue that TU Delft
ICT might not be fully aware of the issue, may not yet have identified the root
cause, or may face challenges in fixing the specific DNS server implementation
they are running to present properly signed zones.</p>

<h1 id="what-could-have-been-done">What could have been done</h1>

<p>Now, it is always easy (and mildly fun) to shout at the TV when a football game
is going on. So, being a person that likes easy things, I will go for that.
What would I have done seeing a ton of packets coming to my authoritative NS
all of a sudden?</p>

<h2 id="do-not-run-ns-in-a-single-25">Do not run NS in a single /25</h2>

<p>The obvious first step is not having all NS in a single /25 (yes,
‘twenty-five’), and most certainly not all of them on campus. Ideally, also
more than just two.  Instead, I would have tried to find multiple secondaries,
likely just ingesting AXFR, across different networks. Ideally also one
anycasted.</p>

<p>This would already have sufficed to ensure that email delivery is not impacted,
and that all cloud-hosted services remain accessible.</p>

<h2 id="dos-defense">DoS Defense</h2>

<p>Assuming that NS are also off-campus in <em>other</em> netblocks with <em>unique routing
policies</em> would have made it a lot easier to very blanket-ly block, e.g., inbound
packets with dport udp/53 and tcp/53; Flowspec comes in handy here, and SURF
would likely have been happy to assist.</p>

<p>There are also some options with the current setup of ‘one’ DNS server network.
Depending on what else lives in that /25 (ideally not much, but well… ), one
could have drawn an AXFR from the authoritative and started serving the zone from
the /24, now anycasted.</p>

<p>That <em>would</em> have required SURF to actually set a ROUTE object, and ideally
configure a ROA in RPKI; Furthermore, one would have had to convince SURF to
borrow one of their (spare) ASNs. Hence, in general, this is more of a fun thing
to do with a bit more time than ‘everything burns’.</p>

<h2 id="monitor-stuff">Monitor Stuff</h2>

<p>I noted above that there is some monitoring likely… missing, given that the
issue still persists. Which is ‘not really ideal’. So getting that in place would
also be a top priority.</p>

<p>Now, something one could rather easily do would be setting up some RIPE Atlas
measurements; Like these ones for <code class="language-plaintext highlighter-rouge">mpg.de</code>:</p>

<ul>
  <li>IPv4 UDP: https://atlas.ripe.net/measurementdetail/72293568/</li>
  <li>IPv4 TCP: https://atlas.ripe.net/measurementdetail/72293620/</li>
  <li>IPv6 UDP: https://atlas.ripe.net/measurementdetail/72293583/</li>
  <li>IPv6 TCP: https://atlas.ripe.net/measurementdetail/72293641/</li>
</ul>
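<p>Polling such measurements programmatically is also straightforward; a sketch against the RIPE Atlas v2 results API. The endpoint layout and the <code class="language-plaintext highlighter-rouge">start</code> parameter reflect my understanding of that API, and the hard-coded ID is the IPv4/UDP measurement from the list above:</p>

```python
import json
from typing import Optional
from urllib.request import urlopen

def atlas_results_url(msm_id: int, start: Optional[int] = None) -> str:
    """Build the RIPE Atlas v2 API URL for a measurement's results."""
    url = f"https://atlas.ripe.net/api/v2/measurements/{msm_id}/results/"
    if start is not None:
        # Only fetch results newer than this Unix timestamp.
        url += f"?start={start}"
    return url

def fetch_results(msm_id: int, start: Optional[int] = None):
    """Fetch and decode the JSON result list (network access required)."""
    with urlopen(atlas_results_url(msm_id, start)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # E.g., the IPv4/UDP DNS measurement referenced above.
    print(atlas_results_url(72293568))
```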

<p>That can then be easily polled by some off-site monitoring setup. Such a
monitor can then also check a lot of other services, especially end-to-end
things like “Can a user send an email, does it arrive somewhere else, and can
its DKIM signature be verified then?”</p>

<h1 id="conclusion">Conclusion</h1>

<p>There can be broad speculation as to why this is now happening to TU Delft, and
why DNS(SEC) does not really want to play nice at the moment. In my <a href="https://pure.mpg.de/rest/items/item_3480830_6/component/file_3561680/content">scientific
work</a>, there are some conjectures that an increasing introduction
of a cloud-first strategy can lead to a capability erosion, especially for
basic services, due to an organization realizing cost savings by not
maintaining or actively reducing in-house capacity for, especially, basic
services. Still, even though TU Delft <em>is</em> following a very cloud focused
approach, finding a conclusive answer there would require perspectives from
inside the organization, which I do not have.</p>

<p>Hence, what remains is me being hopeful that ICT is aware of the underlying
issue, and being convinced that the team members do their best to resolve the
incident.</p>

<p>From a technical perspective, the issues should get <em>a lot</em> better by adding
new authoritative servers, i.e., simple AXFR’ing secondaries. I am not sure why
this is not being done (not only now, but in the years before). It might be,
for example, the complexity of the current setup, or, simply that
organizational knowledge on the setup was lost over time.</p>

<p>However, as a person who had to emergency migrate a production setup to a
PowerDNS authoritative in a few hours… I am pretty sure that it is not too
difficult to:</p>

<ul>
  <li>Get a few additional servers in different providers</li>
  <li>Make sure that there exists <em>one</em> complete copy of the zone, i.e., an AXFR
that is properly signed and contains all necessary DNSKEY records</li>
  <li>Optional: Add a new DS to tudelft.nl via the registrar</li>
  <li>Throw NSD, Knot, Bind9, or PowerDNS on the servers with a configuration that
can <a href="https://medium.com/nlnetlabs/tuning-nsd-for-even-better-performance-a43fbbe61b5c">handle ~7k qps</a></li>
  <li>Make the registrar add those new NS (+glue) to .nl</li>
  <li>Bonus: Get additional NS under different TLDs for extra redundancy</li>
</ul>

<p>But I guess things are just looking a bit too simple from the seat of a
professional commentator: With the remote to the right, and a bowl of chips to
the left, giving good advice from afar. And with that, I grab another beverage,
and murmur at the TV: “Well, maybe you should not have tried to drive a golf cart
through a race track!”; Not realizing that I might be watching Golf instead.</p>]]></content><author><name></name></author><category term="dns" /><category term="dos" /><category term="downtime" /><category term="operations" /><summary type="html"><![CDATA[OpenBSD version: might have helped Arch: Any NSFP: Well... obviously not.]]></summary></entry><entry><title type="html">(UPDATE) Howto: Send an email to an (OpenBSD) mailinglist…</title><link href="https://doing-stupid-things.as59645.net/email/mailinglists/dmarc/2024/03/14/sending-an-email-that-arrives.html" rel="alternate" type="text/html" title="(UPDATE) Howto: Send an email to an (OpenBSD) mailinglist…" /><published>2024-03-14T09:21:21+01:00</published><updated>2024-03-14T09:21:21+01:00</updated><id>https://doing-stupid-things.as59645.net/email/mailinglists/dmarc/2024/03/14/sending-an-email-that-arrives</id><content type="html" xml:base="https://doing-stupid-things.as59645.net/email/mailinglists/dmarc/2024/03/14/sending-an-email-that-arrives.html"><![CDATA[<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OpenBSD version: doesn't really matter
Arch:            Any
NSFP:            More of it, please
</code></pre></div></div>

<p>Somewhen in 2022, I <a href="https://doing-stupid-things.as59645.net/email/mailinglists/dmarc/2022/05/19/sending-an-email.html">complained a bit</a> about the impact DMARC can
have on mailinglists. Specifically, I was struggling a bit with OpenBSD’s
mailinglists stripping <a href="https://datatracker.ietf.org/doc/html/rfc6376">DKIM</a> while–obviously–breaking
<a href="https://datatracker.ietf.org/doc/html/rfc7208">SPF</a>. With my <code class="language-plaintext highlighter-rouge">p=reject</code> <a href="https://datatracker.ietf.org/doc/html/rfc7489">DMARC</a> policy, this meant that
sending to OpenBSD mailinglists would ‘reduce’ the number of people who
would actually get my mails.</p>

<p>Back then, I solved that by registering <a href="https://reads-this-mailinglist.com/">reads-this-mailinglist.com</a>,
which let me have a domain with a significantly relaxed policy (and the
OpenBSD list hosts included in SPF), to make my mails deliver again when
traversing through the OpenBSD mailinglists.</p>

<p>However, over the past two years, the number of mails <em>I</em> (or rather <em>my
mailserver</em>) kept rejecting from OpenBSD kept constantly growing, also
because of–in most cases–DMARC.</p>

<h1 id="the-list-changing">The list changing</h1>

<p>I hence decided to <a href="https://marc.info/?l=openbsd-misc&amp;m=171015367409290&amp;w=2">complain again (well, you gotta do what you’re good at,
no?) on misc@</a>. However, complaining actually had an effect this time
around, as Todd C. Miller was <a href="https://marc.info/?l=openbsd-misc&amp;m=171035246721970&amp;w=2">very quick to take action</a>,
and implemented from-rewriting for addresses that have a DMARC policy for the
OpenBSD majordomo.</p>

<p>And, what shall I say, <a href="https://marc.info/?l=openbsd-misc&amp;m=171036098332158&amp;w=2">it works!</a> I of course (accidentally) tested that
right away, because I slipped and sent my reply from the wrong (strict
<code class="language-plaintext highlighter-rouge">p=reject</code>) address. However, with the change implemented, this now worked
flawlessly! :-)</p>

<p>So, thanks Todd!</p>]]></content><author><name></name></author><category term="email" /><category term="mailinglists" /><category term="dmarc" /><summary type="html"><![CDATA[OpenBSD version: doesn't really matter Arch: Any NSFP: More of it, please]]></summary></entry><entry><title type="html">Configuring Postfix as a proxy in front of Exim MTAs</title><link href="https://doing-stupid-things.as59645.net/mail/2023/09/30/postfix-proxy-setup.html" rel="alternate" type="text/html" title="Configuring Postfix as a proxy in front of Exim MTAs" /><published>2023-09-30T14:40:12+02:00</published><updated>2023-09-30T14:40:12+02:00</updated><id>https://doing-stupid-things.as59645.net/mail/2023/09/30/postfix-proxy-setup</id><content type="html" xml:base="https://doing-stupid-things.as59645.net/mail/2023/09/30/postfix-proxy-setup.html"><![CDATA[<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OpenBSD Version: -
Arch:            All
NSFP:            Uff...
</code></pre></div></div>

<p>With the recent announcement of <a href="https://seclists.org/oss-sec/2023/q3/254">RCE in Exim</a>, and patches still being pending, it might be opportune to move existing Exim setups behind another frontend.
Postfix, for example.</p>

<p>In this article, I will describe methods to a) add an <strong>additional inbound MX using postfix</strong>, and b) enabling <strong>authenticated email sending with postfix</strong> for existing Exim environments.</p>

<h1 id="transparent-additional-mx--inbound-email">Transparent additional MX / Inbound Email</h1>

<h2 id="requirements">Requirements</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>- Clean VM with Debian 12
- IPv4/IPv6 Address for that box with inbound port tcp/25
- Existing Exim MXes
</code></pre></div></div>

<h2 id="initial-situation">Initial Situation</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DNS:
example.com. IN MX 50 mx01.example.com.
example.com. IN MX 75 mx02.example.com.
mx01.example.com. IN A 203.0.113.5
mx02.example.com. IN A 192.0.2.23
</code></pre></div></div>
<p>Furthermore, both mx01.example.com and mx02.example.com run a vulnerable Exim.</p>

<h2 id="configuring-mx03examplecom">Configuring mx03.example.com</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DNS:
example.com. IN MX 50 mx01.example.com.
example.com. IN MX 75 mx02.example.com.
example.com. IN MX 99 mx03.example.com. ; Add this entry
mx01.example.com. IN A 203.0.113.5
mx02.example.com. IN A 192.0.2.23
mx03.example.com. IN A 198.51.100.12 ; Add this entry
</code></pre></div></div>

<p>The system should be called <code class="language-plaintext highlighter-rouge">mx03.example.com</code>!</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># hostname -f
mx03.example.com
</code></pre></div></div>

<h2 id="install-required-software">Install required software</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># apt-get install python3-certbot certbot postfix
</code></pre></div></div>

<h2 id="obtain-tls-certificates">Obtain TLS certificates</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># certbot certonly --standalone --preferred-challenges http --agree-tos --email contact@example.com -d `hostname -f`
</code></pre></div></div>

<h2 id="configure-postfix-edit-etcpostfixmaincf">Configure postfix (edit /etc/postfix/main.cf)</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># See /usr/share/postfix/main.cf.dist for a commented, more complete version


# Debian specific:  Specifying a file name will cause the first
# line of that file to be used as the name.  The Debian default
# is /etc/mailname.
#myorigin = /etc/mailname

smtpd_banner = $myhostname ESMTP $mail_name (Debian/GNU)
biff = no

# appending .domain is the MUA's job.
append_dot_mydomain = no

# Uncomment the next line to generate "delayed mail" warnings
#delay_warning_time = 4h

readme_directory = no

# See http://www.postfix.org/COMPATIBILITY_README.html -- default to 3.6 on
# fresh installs.
compatibility_level = 3.6



## TLS parameters
# IMPORTANT: Update the path to fit your environment/names!
smtpd_tls_cert_file=/etc/letsencrypt/live/mx03.example.com/fullchain.pem
smtpd_tls_key_file=/etc/letsencrypt/live/mx03.example.com/privkey.pem
smtpd_tls_security_level=may
# This could be better; But well.
smtpd_tls_protocols = !SSLv2, !SSLv3, !TLSv1
smtpd_tls_loglevel = 1

## Enforce TLS to primary; We should have TLS setup, no? If not, set:
# smtp_tls_security_level=may
smtp_tls_CApath=/etc/ssl/certs
smtp_tls_security_level=verify
smtp_tls_verify_cert_match = hostname, nexthop, dot-nexthop
smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache
smtp_tls_loglevel = 1

## Use reject_unverified_recipient to enforce validation of deliverability to prevent backscatter
# messages are denied with 450, i.e., non perm-error
smtpd_relay_restrictions = permit_mynetworks reject_unverified_recipient defer_unauth_destination
myhostname = mx03.example.com
alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases
myorigin = /etc/mailname
mydestination = $myhostname, mx03.example.com, localhost.example.com, localhost

## Relay configuration; Add your domains under relay domains
relay_domains = example.com, example.net
# You can optionally hard-code the relay host you want to use; Usually this should
# not be necessary, though.
# relayhost = [mx01.example.com]:25
# Enable soft bouncing, i.e., do not permanently reject mail to ensure no messages
# are lost.
soft_bounce = yes

mynetworks = 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128
mailbox_size_limit = 0
recipient_delimiter = +
inet_interfaces = all
inet_protocols = all
</code></pre></div></div>

<h2 id="restart-postfix">Restart postfix</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># service postfix restart
</code></pre></div></div>

<h2 id="test-mail-sending">Test mail sending</h2>
<p>In one terminal, run:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># journalctl -f|grep postfix
</code></pre></div></div>

<p>Then, from your host, use the following script; Please update sender/recipient addresses as appropriate:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/usr/bin/env python3
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def send_mail(msg):
        srvr = 'mx03.example.com'
        port = 25
        sndr = 'My Sender &lt;test@example.com&gt;'

        s = smtplib.SMTP(host=srvr, port=port)
        # Comment this out if the server does not offer STARTTLS
        s.starttls()
        s.send_message(msg)
        s.quit()

def create_message(dst='dst@example.com'):
        msg = MIMEMultipart()
        msg['To']='&lt;'+dst+'&gt;'
        msg['From']='My Sender &lt;test@example.com&gt;'
        msg['Subject']="Test mail"
        msg_text = "Test mail"
        msg.attach(MIMEText(msg_text, 'plain'))

        return msg


msg = create_message()
send_mail(msg)
</code></pre></div></div>

<p>If mail sending works, you should see the mail accepted and relayed in the log, and ultimately find it in the right inbox wherever these are stored.</p>

<h2 id="firewalling-and-filtering">Firewalling and Filtering</h2>

<p>Next, you must limit access to port tcp/25 on mx01.example.com and mx02.example.com, while still allowing access from mx03.example.com.
How you can do this depends on your setup.
The easiest way will likely be using iptables, or rules in your local firewall, see the picture below.</p>

<p><img src="/static/img/2023-09-30-postfix-mx.png" alt="Overview of the setup for adding a postfix MX to an existing Exim setup." /></p>

<p>Furthermore, you should make sure that mx01.example.com and mx02.example.com accept emails forwarded by mx03.example.com regardless of their SPF status/spam filtering etc.;
This is, of course, not ideal, but the purpose of this activity is mainly to survive until exim patches are out.
“Readers are advised to conduct personal risk assessment at their own discretion.”</p>

<h2 id="caveats">Caveats</h2>
<p>There are some things to keep in mind:</p>
<ul>
  <li>If you use DANE/TLSA, you will have to add a record for mx03.example.com</li>
  <li>If you use MTA-STS, you need to update your MTA-STS policy</li>
</ul>
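<p>For the DANE case, the data for a typical <em>3 1 1</em> TLSA record is just the SHA-256 digest of the certificate’s DER-encoded public key (SPKI). A sketch of the digest step; extracting the SPKI from the certificate (e.g., via <code class="language-plaintext highlighter-rouge">openssl x509 -pubkey</code>) is left out, and the input bytes below are a placeholder, not a real key:</p>

```python
import hashlib

def tlsa_3_1_1(spki_der: bytes) -> str:
    """Digest for a TLSA record with usage 3 (DANE-EE),
    selector 1 (SPKI), matching type 1 (SHA-256)."""
    return hashlib.sha256(spki_der).hexdigest()

# Placeholder bytes standing in for the DER-encoded SPKI of mx03's cert.
record_data = tlsa_3_1_1(b"\x30\x82placeholder")
print(f"_25._tcp.mx03.example.com. IN TLSA 3 1 1 {record_data}")
```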

<h1 id="transparent-authenticated-mail-relaying">Transparent Authenticated Mail Relaying</h1>

<p>To get authenticated mail relaying on a temporary postfix proxy, we will use the IMAP plugin for saslauthd to seamlessly integrate authentication for our proxy.</p>

<p><img src="/static/img/2023-09-30-postfix-relay.png" alt="Overview of the setup for adding a postfix authenticated relay to an existing Exim setup." /></p>

<h2 id="requirements-1">Requirements</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>- Clean VM with Debian 12
- IPv4/IPv6 Address for that box with inbound port tcp/465 and tcp/587 in the network segment in which the authenticated relay resides!
- Existing Exim relays
- An IMAP server with the same(!) user/pass syntax as your mail servers!
</code></pre></div></div>

<h2 id="initial-situation-1">Initial Situation</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DNS:
smtp-auth01.example.com. IN A 198.51.100.10
imap.example.com. IN A 192.0.2.23
</code></pre></div></div>
<p>Here, smtp-auth01.example.com runs a vulnerable Exim.</p>

<h2 id="configuring-smtp-auth02examplecom">Configuring smtp-auth02.example.com</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DNS:
smtp-auth01.example.com. IN A 198.51.100.10
smtp-auth02.example.com. IN A 198.51.100.12
imap.example.com. IN A 192.0.2.23
</code></pre></div></div>

<p>The new system should be called <code class="language-plaintext highlighter-rouge">smtp-auth02.example.com</code>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># hostname -f
smtp-auth02.example.com
</code></pre></div></div>

<h2 id="install-required-software-1">Install required software</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># apt-get install postfix libsasl2-modules sasl2-bin stunnel net-tools
</code></pre></div></div>

<h2 id="obtain-tls-certificates-1">Obtain TLS certificates</h2>
<p>Copy the TLS certificates of smtp-auth01.example.com over to smtp-auth02.example.com.</p>

<h2 id="setup-stunnel-for-imap">Setup stunnel for IMAP</h2>
<p>Sadly, the <code class="language-plaintext highlighter-rouge">rimap</code> feature of saslauthd does not support TLS.
Hence, to use it, we must use stunnel to connect to the remote IMAP server and bind a non-TLS port to localhost.</p>

<p>Add the following to <code class="language-plaintext highlighter-rouge">/etc/stunnel/imap_example_com.conf</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>setuid = stunnel4
setgid = stunnel4

#pid = /var/run/stunnel.pid

[exmpl-imap]
client = yes
accept = 127.0.0.1:143
connect = imap.example.com:993
verifyChain = yes
CApath = /etc/ssl/certs
checkHost = imap.example.com
OCSPaia = yes
</code></pre></div></div>

<p>After adding the config file, restart stunnel:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>service stunnel4 restart
</code></pre></div></div>

<p>Thereafter, stunnel should be listening on <code class="language-plaintext highlighter-rouge">lo</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># netstat -an | grep 143
tcp        0      0 127.0.0.1:143           0.0.0.0:*               LISTEN
</code></pre></div></div>

<h2 id="setup-sasl-authentication">Setup SASL authentication</h2>
<p>Next, we have to configure <code class="language-plaintext highlighter-rouge">saslauthd</code>.
There is a good summary of the tool in the <a href="https://wiki.debian.org/PostfixAndSASL">Debian wiki</a>.</p>

<p>To make it quick:</p>

<p>Create <code class="language-plaintext highlighter-rouge">/etc/postfix/sasl/smtpd.conf</code> (we are assuming you only support login and plain):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pwcheck_method: saslauthd
mech_list: LOGIN PLAIN
</code></pre></div></div>

<p>Create <code class="language-plaintext highlighter-rouge">/etc/default/saslauthd-postfix</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DESC="Postfix reverse IMAP SASL Authentication Daemon"
NAME="saslauthd-postfix"
START=yes
MECHANISMS="rimap"
MECH_OPTIONS="127.0.0.1"
THREADS=5
OPTIONS="-c -m /var/spool/postfix/var/run/saslauthd"
</code></pre></div></div>

<h2 id="setup-sasl-requirements">Setup SASL requirements</h2>
<p>Create a missing directory and add postfix to the sasl group:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># dpkg-statoverride --add root sasl 710 /var/spool/postfix/var/run/saslauthd
# adduser postfix sasl
</code></pre></div></div>

<h2 id="restart-and-test-sasl-authd">Restart and test SASL authd</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># service saslauthd restart
</code></pre></div></div>

<p>After starting saslauthd, you should not only see it running in <code class="language-plaintext highlighter-rouge">ps aux</code>, but should also be able to authenticate.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># testsaslauthd -u validmailuser -p validpassword -f /var/spool/postfix/var/run/saslauthd/mux
0: OK "Success."
</code></pre></div></div>

<p>While an invalid password should of course not work:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># testsaslauthd -u validmailuser -p invalidpassword -f /var/spool/postfix/var/run/saslauthd/mux
0: NO "authentication failed"
</code></pre></div></div>

<h2 id="configure-postfix-edit-etcpostfixmaincf-1">Configure postfix (edit /etc/postfix/main.cf)</h2>
<p>Be aware that we are making this host look like it is <code class="language-plaintext highlighter-rouge">smtp-auth01</code>!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># See /usr/share/postfix/main.cf.dist for a commented, more complete version


# Debian specific:  Specifying a file name will cause the first
# line of that file to be used as the name.  The Debian default
# is /etc/mailname.
#myorigin = /etc/mailname

smtpd_banner = $myhostname ESMTP $mail_name (Debian/GNU)
biff = no

# appending .domain is the MUA's job.
append_dot_mydomain = no

# Uncomment the next line to generate "delayed mail" warnings
#delay_warning_time = 4h

readme_directory = no

# See http://www.postfix.org/COMPATIBILITY_README.html -- default to 3.6 on
# fresh installs.
compatibility_level = 3.6



## TLS parameters
smtpd_tls_cert_file=/etc/letsencrypt/live/smtp-auth01.example.com/fullchain.pem
smtpd_tls_key_file=/etc/letsencrypt/live/smtp-auth01.example.com/privkey.pem
smtpd_tls_security_level=may
smtpd_tls_protocols = !SSLv2, !SSLv3, !TLSv1
smtpd_tls_loglevel = 1

## Enforce TLS to primary
smtp_tls_CApath=/etc/ssl/certs
smtp_tls_security_level=verify
smtp_tls_verify_cert_match = hostname, nexthop, dot-nexthop
smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache
smtp_tls_loglevel = 1

## Use reject_unverified_recipient to enforce validation of deliverability to prevent backscatter
# messages are denied with 450, i.e., non perm-error
smtpd_relay_restrictions = permit_mynetworks reject_unverified_recipient defer_unauth_destination
myhostname = smtp-auth01.example.com
alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases
myorigin = /etc/mailname
mydestination = $myhostname, smtp-auth01.example.com, localhost.example.com, localhost

## SASL configuration
cyrus_sasl_config_path = /etc/postfix/sasl
smtpd_sasl_local_domain = $myhostname
smtpd_sasl_auth_enable = no
broken_sasl_auth_clients = yes
smtpd_sasl_security_options = noanonymous

## Relay configuration
relayhost = smtp-auth01.example.com
# Enable soft bouncing, i.e., do not permanently reject mail to ensure no messages
# are lost.
soft_bounce = yes

## Restriction classes for submission/smtps
smtpd_restriction_classes = mua_sender_restrictions, mua_client_restrictions, mua_helo_restrictions
mua_client_restrictions = permit_sasl_authenticated, reject
mua_sender_restrictions = permit_sasl_authenticated, reject
mua_helo_restrictions = permit

mynetworks = 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128
mailbox_size_limit = 0
recipient_delimiter = +
inet_interfaces = all
inet_protocols = all
</code></pre></div></div>

<p>We have multiple options for handling the re-injection of authenticated emails.
What we are using here is relaying to smtp-auth01.example.com on port tcp/25,
assuming that this port is reachable from <code class="language-plaintext highlighter-rouge">smtp-auth02.example.com</code>. However,
we could also use a dedicated system account for that.</p>

<p>For configuring postfix with authenticated relaying, please see <a href="https://www.howtoforge.com/tutorial/configure-postfix-to-use-gmail-as-a-mail-relay/">other resources</a>.</p>

<h2 id="configure-postfix-edit-etcpostfixmastercf">Configure postfix (edit /etc/postfix/master.cf)</h2>
<p>In addition to configuring the basic postfix features outlined above, we also
need to enable tcp/465 (smtps) and tcp/587 (submission). We can do that in
<code class="language-plaintext highlighter-rouge">/etc/postfix/master.cf</code>, by adding the following lines:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>submission     inet  n       -       y       -       -       smtpd
  -o syslog_name=postfix/submission
  -o smtpd_tls_security_level=encrypt
  -o smtpd_sasl_auth_enable=yes
  -o smtpd_tls_auth_only=yes
  -o smtpd_reject_unlisted_recipient=no
  -o smtpd_client_restrictions=$mua_client_restrictions
  -o smtpd_helo_restrictions=$mua_helo_restrictions
  -o smtpd_sender_restrictions=$mua_sender_restrictions
  -o smtpd_recipient_restrictions=
  -o smtpd_relay_restrictions=permit_sasl_authenticated,reject
smtps     inet  n       -       y       -       -       smtpd
  -o syslog_name=postfix/smtps
  -o smtpd_tls_wrappermode=yes
  -o smtpd_sasl_auth_enable=yes
  -o smtpd_reject_unlisted_recipient=no
  -o smtpd_client_restrictions=$mua_client_restrictions
  -o smtpd_helo_restrictions=$mua_helo_restrictions
  -o smtpd_sender_restrictions=$mua_sender_restrictions
  -o smtpd_recipient_restrictions=
  -o smtpd_relay_restrictions=permit_sasl_authenticated,reject
</code></pre></div></div>
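<p>Each <code class="language-plaintext highlighter-rouge">-o key=value</code> line overrides a <code class="language-plaintext highlighter-rouge">main.cf</code> parameter for that one service only. To illustrate how these per-service overrides accumulate, here is a toy sketch (a hypothetical helper, not how Postfix actually parses <code class="language-plaintext highlighter-rouge">master.cf</code>):</p>

```python
# Toy illustration: collect per-service "-o key=value" overrides
# from master.cf-style lines (hypothetical helper, not Postfix code).
def parse_overrides(lines):
    services, current = {}, None
    for line in lines:
        if not line.startswith(' '):        # a service definition line
            current = line.split()[0]
            services[current] = {}
        else:                               # indented continuation: "-o key=value"
            opt = line.strip()
            if opt.startswith('-o '):
                key, _, value = opt[3:].partition('=')
                services[current][key] = value
    return services

cfg = parse_overrides([
    "submission     inet  n       -       y       -       -       smtpd",
    "  -o smtpd_sasl_auth_enable=yes",
    "  -o smtpd_tls_security_level=encrypt",
])
print(cfg['submission']['smtpd_sasl_auth_enable'])  # prints: yes
```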

<h2 id="restart-and-test-postfix">Restart and Test postfix</h2>

<p>With everything in place, we can restart postfix:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># service postfix restart
</code></pre></div></div>

<p>If everything comes up clean, we can redirect the mail submission ports to this host (see the figure in the beginning).
Technically, you might want to test whether auth+relay actually works before doing so (with the script below);
but we are just very trusting people around here, so we assume it works.
Also, testing it now would be difficult, as the TLS certificate currently does not match (remember: we took the one from smtp-auth01.example.com).
Of course, you should also make sure that there are no other open Exim ports.</p>

<p>When the redirect has been put in place, and we are sure mails are successfully pushed to the ‘original’ mail relay (so our hopefully existing toolchain involving DKIM etc. does not get bypassed), we can test the setup with the following script. Again, please double-check the mail addresses/credentials to make them fit your infra:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/usr/bin/env python3
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def send_mail(msg):
    srvr = 'smtp-auth01.example.com'
    port = 465
    user = 'mailuser'
    pswd = 'securepassword'

    if port == 465:
        # smtps: TLS from the start
        s = smtplib.SMTP_SSL(host=srvr, port=port)
    else:
        # submission: plain connection, upgraded via STARTTLS
        s = smtplib.SMTP(host=srvr, port=port)
        s.starttls()
    s.login(user, pswd)
    s.send_message(msg)

def create_message(dst='dst@example.com'):
    msg = MIMEMultipart()
    msg['To'] = '&lt;' + dst + '&gt;'
    msg['From'] = 'Mail Sender &lt;external-recipient@example.net&gt;'
    msg['Subject'] = 'Test mail'
    msg.attach(MIMEText('Test mail', 'plain'))
    return msg

msg = create_message()
send_mail(msg)
</code></pre></div></div>

<p>If everything was setup correctly, you should see your mail arriving at the correct destination, and find it having traversed <code class="language-plaintext highlighter-rouge">smtp-auth02.example.com</code> and then <code class="language-plaintext highlighter-rouge">smtp-auth01.example.com</code>.</p>

<p>It of course always makes sense to test whether mail sending works with <a href="https://email-security-scans.org/">https://email-security-scans.org/</a>.</p>

<h1 id="summary">Summary</h1>
<p>So, that’s it; Two ways of hiding exim behind postfix.</p>]]></content><author><name></name></author><category term="mail" /><summary type="html"><![CDATA[OpenBSD Version: - Arch: All NSFP: Uff...]]></summary></entry><entry><title type="html">[DE] Zeit fuer Real(tek|talk): Wenn die PMTUD nicht reicht</title><link href="https://doing-stupid-things.as59645.net/networking/debugging/mtu/2022/11/14/time-to-be-real-de.html" rel="alternate" type="text/html" title="[DE] Zeit fuer Real(tek|talk): Wenn die PMTUD nicht reicht" /><published>2022-11-14T02:16:12+01:00</published><updated>2022-11-14T02:16:12+01:00</updated><id>https://doing-stupid-things.as59645.net/networking/debugging/mtu/2022/11/14/time-to-be-real-de</id><content type="html" xml:base="https://doing-stupid-things.as59645.net/networking/debugging/mtu/2022/11/14/time-to-be-real-de.html"><![CDATA[<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OpenBSD Version: Erm... keine. o.O
Arch:            Alle
NSFP:            Ja, ne.
</code></pre></div></div>

<p><strong>Disclaimer:</strong> Der Text ist mehr oder minder auto-uebersetzt, mit einigen manuellen aenderungen damit es nicht gaaaaanz so schlimm ist. Ich empfehle den <a href="https://doing-stupid-things.as59645.net/networking/debugging/mtu/2022/11/14/time-to-be-real.html">englischsprachigen Artikel</a>, falls du auch Englisch sprichst (oder zumindest liest).</p>

<p>Die Mastodon-Instanz von <a href="https://digitalcourage.de/">Digitalcourage e.V.</a> hatte in den letzten paar Monaten ein paar <a href="https://digitalcourage.social/@freiheit/109240695486058178">Netzwerkprobleme</a>.
User konnten Dateien nur mit ca. 100KB/s von der Instanz downloaden, und alles war irgendwie langsam.
Das war natürlich eher ‘nicht so gut’.</p>

<p>Ueber die Zeit zeichnete sich in den Rueckmeldungen von Usern ein klares Muster ab.
Konkret schienen hauptsächlich User der Deutschen Telekom betroffen zu sein, während andere in der Lage waren, Dateien mit der erwarteten Geschwindigkeit abzurufen.
Die Admins von Digitalcourage hatten sogar versucht, die IP-Adresse des Systems zu ändern: Vergeblich.
Interessanterweise schnitt ein anderes System direkt neben der Mastodon-Instanz bei digitalcourage.social ziemlich gut ab, selbst für ansonsten betroffene Deutsche Telekom-User.
Letztendlich führte dies zu einer ziemlich ausführlichen Diskussion darüber, ob die Deutsche Telekom den Datenverkehr zu den Servern von Digitalcourage möglicherweise etwas ungleicher behandelt, als sie sollte.</p>

<p>Ich bin letzten Mittwoch (9. November 2022) auf die Diskussion aufmerksam geworden, und das Problem hat es mir irgendwie angetan.
Ueber einen Bekannten bei der Deutschen Telekom, der bereits an der Fehlersuche beteiligt war, kam ich mit den Digitalcourage-Leuten in Kontakt, und schlug vor zu schauen, ob ich eventuell helfen kann.
Diese fanden die Idee nicht schlecht, und somit konnte der Debug-Spasz beginnen.</p>

<p>Da es sicher einige interessiert, was nun <em>wirklich</em> passiert ist (und das <em>wirklich</em> habe ich auch erst heute Nacht herausgefunden), dachte ich, ich schreib mal einen Blogartikel.</p>

<h1 id="debug-start">DEBUG: Start</h1>
<p>Der erste Schritt beim Debuggen eines solchen Problems, ist natürlich das Problem zu reproduzieren.
Für den vorliegenden Fall war dies überraschend einfach.
Als ich versuchte, eine Testdatei an meinem Rechner herunterzuladen, sah das nicht so gut aus:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% wget -O /dev/null https://digitalcourage.social/1GB.bin
--2022-11-09 18:53:17--  https://digitalcourage.social/1GB.bin
Resolving digitalcourage.social (digitalcourage.social)... 217.197.90.87
Connecting to digitalcourage.social (digitalcourage.social)|217.197.90.87|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1048576000 (1000M) [application/octet-stream]
Saving to: ‘/dev/null’

/dev/null             0%[                    ] 615.70K  98.8KB/s    eta 2h 53m 
</code></pre></div></div>
<p>Hier ist das ausnahmsweise gut.</p>
<ul>
  <li>a) Es gibt mir ein System, auf dem ich Debuggen kann</li>
  <li>b) Mein Netz geht durch einen Tunnel, der letztendlich durch <a href="https://doing-stupid-things.as59645.net/colocation/networking/openbsd/infrastructure/2022/07/18/go-big-or-go-rack.html">AS59645</a> zum Internet kommt; Und <a href="https://in-berlin.de/">IN-BERLIN</a> (der Hoster von Digitalcourage) ist einer meiner Upstreams.</li>
  <li>c) Es bedeutet, dass die Deutsche Telekom hier unschuldig ist. Daher sollte sich, was auch immer kaputt ist, auf den Systemen von Digitalcourage fixen lassen.</li>
</ul>

<p>Also, die Telekom ist raus, ich kann auf den kompletten Pfad bis zur Uebergabe an IN-BERLIN schauen… los geht’s.</p>

<p>Der nächste Schritt bestand also darin, zu sehen, ‘wo’ Dinge kaputt gehen.</p>
<ul>
  <li>Das Abrufen der Datei von meiner Workstation war langsam.</li>
  <li>Das Abrufen der Datei vom ersten Router war langsam.</li>
  <li>Die Datei von meinem Router drüben bei IN-BERLIN abzurufen war schnell.
Da die Verbindung zwischen meinem Router bei IN-BERLIN und dem vor meiner Workstation über eine weniger-als-1500b-MTU-Verbindung verläuft, beschlich mich recht schnell der Verdacht, dass wir hier ein MTU-Problem haben.</li>
</ul>

<h1 id="mtu">MTU?</h1>
<p>Die MTU oder Maximum Transmission Unit ist ein Wert in Bytes, der Rechnern im Internet mitteilt, wie groß Netzwerkpakete sein dürfen.
Die MTU haengt mit der MSS (Maximum Segment Size) für TCP-Verbindungen zusammen.
Üblicherweise ist die MSS ‘MTU - 40b’ (für IPv4) und ‘MTU - 60b’ (für IPv6), da IP- und TCP-Header mit in das Paket passen müssen; d.h. wenn die MTU 1500 ist (wie es für die meisten Ethernet-Verbindungen und im Wesentlichen die meisten Verbindungen im Internet ist), sollte die MSS 1460 für IPv4 und 1440 für IPv6 sein.</p>

<p>Leider haben nicht alle Links im Internet eine MTU von 1500.
Meine (wireguard-basierte) Tunnelverbindung hat beispielsweise eine MTU von 1420.
Daher ist die MSS für IPv4 hier 1380.
Ebenso haben DSL-User der Deutschen Telekom eine MTU von 1492, da für PPPoE 8b benötigt werden.
Kabelnetzbetreiber verwenden normalerweise kein PPPoE, und daher haben ihre User normalerweise eine MTU von 1500.</p>

<p>Da Links unter 1500b üblich sind, gibt es natürlich Mechanismen für Hosts, um die maximale MTU auf einem Pfad zu einem Server/Client zu bestimmen, damit sie sicherstellen können, dass keine größeren Pakete gesendet werden.
Dies wird allgemein als Path MTU Discovery (PMTUD) bezeichnet, und ich hatte bereits meinen <a href="https://doing-stupid-things.as59645.net/networking/bgp/nsfp/2022/08/07/making-it-ping-part-6.html">fairen Anteil an Kopfschmerzen</a> damit.
Die Idee hinter PMTUD ist im Wesentlichen, dass jeder Host auf dem Pfad, sobald er ein Paket nicht weiterleiten kann weil es zu groß ist, eine ICMP-Fehlermeldung vom Typ 3 Code 4 (Fragmentation Needed and Don’t Fragment was set) sendet.
Wenn der sendende Host dieses Paket erhält, weiß er, dass er kleinere Pakete senden muss, wenn er möchte, dass sie ankommen.</p>

<p>Witzigerweise würde die MTU-Problematik auch erklären, warum Telekom-User betroffen sind, und andere nicht: Die 1492b-MTU wegen PPPoE.</p>

<h1 id="mtu-bist-dus-wirklich">MTU? Bist du’s wirklich?</h1>

<p>Es war vergleichsweise einfach zu überprüfen, ob dieses Problem <em>wirklich</em> mit der MTU zusammenhängt.
Ich habe <code class="language-plaintext highlighter-rouge">tcpdump</code> auf dem externen Interface meines Routers bei IN-BERLIN gestartet (wo Pakete auf einer 1500-MTU-Verbindung eingehen und über eine 1420-MTU-Verbindung hinausgehen) und einen Filter für ICMP-Pakete gesetzt, die für digitalcourage.social bestimmt sind; Dann habe ich versucht, eine Datei von meiner Workstation zu <code class="language-plaintext highlighter-rouge">wget</code>en. Und tatsächlich:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% tcpdump -i vio0 -n host 217.197.90.87 and icmp
tcpdump: listening on vio0, link-type EN10MB
20:16:38.979978 IP 217.197.83.197 &gt; 217.197.90.87: ICMP 195.191.197.217 unreachable - need to frag (mtu 1420), length 36
20:16:39.984627 IP 217.197.83.197 &gt; 217.197.90.87: ICMP 195.191.197.217 unreachable - need to frag (mtu 1420), length 36
20:16:39.985268 IP 217.197.83.197 &gt; 217.197.90.87: ICMP 195.191.197.217 unreachable - need to frag (mtu 1420), length 36
20:16:40.984956 IP 217.197.83.197 &gt; 217.197.90.87: ICMP 195.191.197.217 unreachable - need to frag (mtu 1420), length 36
...
</code></pre></div></div>
<p>Das ist natuerlich etwas seltsam. Eigentlich sollte die Box, die da mit zu groszen Paketen um sich wirft (217.197.90.87), recht schnell merken, dass die MTU zu grosz ist.
So nach dem ersten oder zweiten ‘Packet too large’.
Aber das ist der Box wohl egal…</p>

<h1 id="wo-es-bei-der-pmtud-harkt">Wo es bei der PMTUD hakt</h1>
<p>Zusammen mit Christian von Digitalcourage (der mir beim Debuggen geholfen hat und meine <code class="language-plaintext highlighter-rouge">/bin/instantmessangersh</code> zur Digitalcourage-Infrastruktur war, da ich auf deren Servern natürlich keinen Login habe) habe ich nun angefangen zu graben.
Irgendwo muessen die Paeckchen ja verloren gehen.
Das Graben war relativ schnell erfolgreich, da sowohl der Router von Digitalcourage als auch digitalcourage.social nur ICMP-Typ 8 (Echo-Anfragen, also das, was gemeint ist, wenn Menschen von ‘ping’ reden) zuließ, aber nicht Typ 3 (Code 4).
Interessanterweise legte die Mastodon-Dokumentation genau das nahe; dankenswerterweise wurde sie über einen Pull-Request <a href="https://github.com/mastodon/documentation/pull/998#pullrequestreview-1177831791">bereits aktualisiert</a>.</p>

<p>Zwei (in die Firewall-Frameworks der Systeme integrierte) <code class="language-plaintext highlighter-rouge">iptables -A INPUT -p icmp -m icmp --icmp-type 3 -j ACCEPT</code> später kamen die ICMP-Pakete endlich auch dort an, wo sie hingehören.
Also, <code class="language-plaintext highlighter-rouge">wget</code> an, schneller Download, oder? Nope. <code class="language-plaintext highlighter-rouge">-.-'</code></p>

<h1 id="irgendwas-is-komisch">Irgendwas is komisch…</h1>

<p>Irgendwas ist also komisch. Genaugenommen: Eine ganze Menge.
Die MSS wird einem entfernten Host mitgeteilt, wenn eine Verbindung hergestellt wird.
<em>Eigentlich</em> nageln meine Router die MSS fuer Verbindungen ueber Links mit einer MTU unter 1500 auf 1320 fest.
Also, PMTUD beiseite, die hätte eigentlich nichtmal gebraucht werden duerfen.
Und da PMTUD nun <em>eigentlich</em> funktioniert… sollte wirklich <em>nichts</em> diese Hosts davon abhalten, so zu kommunizieren, wie sie sollten.</p>

<p>Wir haben also angefangen, den funktionierenden Host (von dem User schnell downloaden koennen) mit dem anderen zu vergleichen.
Insbesondere haben wir uns <code class="language-plaintext highlighter-rouge">sysctl -a</code> (nicht auffaellig), <code class="language-plaintext highlighter-rouge">iptables -L -v -n</code> (wieder nichts) und <code class="language-plaintext highlighter-rouge">route get $myip</code>/<code class="language-plaintext highlighter-rouge">route show $myip</code> (nope) angesehen.
Wir haben auch nochmal ein <code class="language-plaintext highlighter-rouge">tcpdump</code> an die Leitung gehalten um zu verifizieren, dass mein Client die richtige MSS übermittelte (1320, und ja, das kam beim Starten der TCP-Session an).
Obwohl nichts an digitalcourage.social speziell zu sein schien, ignorierte die Box die Existenz von MTUs unter 1500.
Jedes Paket wurde erst mit einer Groesze von 1500b gesendet… und dann bei den Retransmissions schrittweise um 40b verkleinert, bis es irgendwann durchkam.
Das gibt natuerlich viel overhead und wenig Bandbreite.</p>
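<p>Zur Veranschaulichung das beobachtete Muster als kleine Skizze (rein illustrativ; so sah das Verhalten von außen aus, das ist keine Nachbildung des eigentlichen Codes):</p>

```python
# Illustration of the observed retransmission pattern: the host starts
# at 1500b and shrinks each retransmission by 40b until the packet
# finally fits through the 1420b path (purely illustrative).
def retransmit_sizes(start=1500, path_mtu=1420, step=40):
    sizes = []
    size = start
    while size > path_mtu:
        sizes.append(size)  # this one is too large and gets dropped
        size -= step
    sizes.append(size)      # this one finally fits
    return sizes

print(retransmit_sizes())  # -> [1500, 1460, 1420]
```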

<h1 id="dinge-zum-laufen-bringen-vorerst">Dinge zum Laufen bringen (vorerst)</h1>

<p>Etwas frustriert (und mit etwas Input von anwlx) entschieden wir uns, die MTU fuer meine IP einfach mal mit einer statischen Route (Vorsicht, Fachbegriff) festzudengeln.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ip route add 195.191.197.206 via 217.197.90.81 mtu lock 1320
</code></pre></div></div>
<p>Und siehe da… es funktioniert. 100Mbit+ beim Downloaden auf meiner Workstation.</p>

<p>Trotz ausgiebigen Rumprobierens konnten wir leider nicht herausfinden, <em>warum</em> der Host MTU und MSS ignorierte.
An diesem Punkt, kurz nach 00:00 Uhr, schlug ich vor, einfach einen Hotfix auf das Problem zu werfen, der für die meisten User funktioniert:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ip r a 0.0.0.0/1 via 217.197.90.81 mtu lock 1320
ip r a 128.0.0.0/1 via 217.197.90.81 mtu lock 1320
</code></pre></div></div>
<p>Damit hatten wir die MTU fuer zwei Routen, die spezifischer sind als die Default-Route (0.0.0.0/0), auf 1320 festgedengelt; und da sie spezifischer sind, werden diese beiden dann auch der Default-Route vorgezogen.
Im Zweifel ist das einfach etwas entspannter, als die Default-Route auf einer Prod-Box zu aendern.
Christian war ziemlich glücklich über diesen Vorschlag und warf ihn gleich auf die Box.
Für den Moment war das Problem also erstmal weg, und <a href="https://digitalcourage.social/@mro/109320388012887815">Antworten</a> auf die <a href="https://digitalcourage.social/@freiheit/109319503901785225">Ankündigung</a>, dass Dinge jetzt funktionieren <em>sollten</em>, suggerieren, dass es tatsächlich <em>funktionierte</em>.</p>
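<p>Dass die beiden /1-Routen zusammen den gesamten IPv4-Adressraum abdecken und per Longest-Prefix-Match vor der Default-Route gewinnen, lässt sich z.B. kurz mit Pythons <code class="language-plaintext highlighter-rouge">ipaddress</code>-Modul nachrechnen:</p>

```python
# Sketch: the two /1 routes together cover all of IPv4 and, being
# more specific than 0.0.0.0/0, win by longest-prefix match.
import ipaddress

default = ipaddress.ip_network('0.0.0.0/0')
halves = [ipaddress.ip_network('0.0.0.0/1'),
          ipaddress.ip_network('128.0.0.0/1')]

# Every address falls into exactly one of the two halves...
assert sum(n.num_addresses for n in halves) == default.num_addresses
# ...and both halves are more specific (longer prefix) than the default.
assert all(n.prefixlen > default.prefixlen and n.subnet_of(default)
           for n in halves)
print('beide /1-Routen decken 0.0.0.0/0 ab')
```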

<h1 id="muss-das-so">Muss das so?</h1>

<p>Mir hat das natuerlich alles keine Ruhe gelassen (und ich hatte am Anfang des Artikels irgendwie eine <em>richtige</em> Erklaerung versprochen…):
Also habe ich versucht, das Ganze nachzubauen:</p>
<ul>
  <li>Erstmal eine VM mit der gleichen Software (Debian 10) aufsetzen (Debuggen in Prod ist immer etwas unlustig)</li>
  <li>Mastodon ohne Docker installieren (<em>vielleicht</em> macht das <em>irgendwas</em> kaputt?!)</li>
  <li>Die sysctl-Settings und geladenen Kernel-Module an die Einstellungen der Prod-Box anpassen</li>
</ul>

<p>Ich habe das am Samstag eingerichtet und war eigentlich ziemlich hoffnungsvoll, dass das <em>nicht</em> funktioniert.
Also er… der Fehler auftritt.
Tat er aber nicht. Kein Problem fuer mich.</p>

<p>Womit wir bei heute wären. Mich beschlich die Frage, ob vielleicht <em>irgendwas</em> an der Virtualisierungsumgebung komisch ist.
Ich habe mich wieder bei Christian gemeldet und gefragt, was fuer eine Virtualisierungsloesung laeuft (libvirt+kvm auf Debian).
Mein Bauchgefuehl liesz mich dann auch nach einem <code class="language-plaintext highlighter-rouge">lspci</code> von der Mastodon Box fragen. Vielleicht ist da was dabei?</p>

<p>Und da fiel mir in der Tat gleich was auf:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00:03.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8100/8101L/8139 PCI Fast Ethernet Adapter (rev 20)
</code></pre></div></div>
<p>KVM-VMs machen eigentlich nur mit virtio-Netzwerkkarten so richtig Spasz.</p>

<p>Also, schneller Wechsel auf meiner Test-VM, die NIC wird eine ‘rtl8139’ und nicht mehr ‘virtio’, uuuuuuuuuuuund… kaputt. Endlich.
<code class="language-plaintext highlighter-rouge">wget</code> von meiner Workstation benimmt sich nun auch so, wie es (nicht) soll, mit nur etwa 100 KB/s im Download.
Die zweite Digitalcourage-Box, von der alle User immer mit der zu erwartenden Geschwindigkeit downloaden konnten, nutzt erwartungsgemaesz eine virtio-NIC:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
</code></pre></div></div>

<p>Mit einer Idee, wonach ich nun suchen koennte (ok, ‘rtl8139, kaputt’ ist nicht wirklich ein einzigartiger Suchbegriff), fand ich schließlich <a href="https://lore.kernel.org/all/20161114162505.GD26664@stefanha-x1.localdomain/">diesen</a> Thread auf den qemu-dev/netdev-Mailinglisten.
Es scheint, dass andere tatsächlich schonmal das gleiche Problem hatten. Vor sechs Jahren. Der Thread stellt ziemlich schnell fest, dass die MTU für das TCP-Segment-Offloading im Qemu-rtl8139-Code auf 1500 festgedengelt zu sein scheint.
Der Thread schlägt sogar einen Patch vor, der das Problem beheben <em>sollte</em> (aber danach wirds still im thread). 
Letztendlich kommt der Code mit diesem Fehler vermutlich aus dem Jahr 2006, ist also rund 16 Jahre alt (oder auch: war schon zehn Jahre alt, als das Thema auf der Mailingliste aufkam).</p>

<p>Wenn ich mir den <a href="https://gitlab.com/qemu-project/qemu/-/blame/master/hw/net/rtl8139.c">‘Modifications:’ Header in <code class="language-plaintext highlighter-rouge">rtl8139.c</code></a> ansehe, habe ich den starken Verdacht, dass dies letztendlich auf diese Änderung hinausläuft:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> *  2006-Jul-04                 :   Implemented TCP segmentation offloading
 *                                  Fixed MTU=1500 for produced ethernet frames
</code></pre></div></div>
<p>Dazu passt auch, dass das Abschalten von TSO/GSO den gleichen positiven Effekt wie eine <code class="language-plaintext highlighter-rouge">lock mtu 1320</code>-Route hat:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ethtool -K ens18 tx off sg off tso off
</code></pre></div></div>
<p>Aber da ich nicht wirklich eine C-artige Person und auch nicht übermäßig qualifiziert bin, was Systemprogrammierung angeht, dachte ich, es wäre vielleicht besser, einfach <a href="https://gitlab.com/qemu-project/qemu/-/issues/1312">ein Ticket zu erstellen</a> und Feierabend zu machen.</p>

<p>Damit gibt das Problem mir Ruhe, und die Leute von Digitalcourage haben einen Grund, den Typ ihrer virtualisierten NIC zu ändern. <code class="language-plaintext highlighter-rouge">^^</code></p>

<h1 id="woran-ich-denken-sollte">Woran ich denken sollte</h1>
<p>Zum Abschluss als ‘lessons learned’:</p>

<p>Beim Vergleichen zweier virtueller Maschinen auf demselben Hypervisor sollte man nicht davon ausgehen, dass diese die gleiche virtuelle ‘Hardware’ haben.</p>

<p>Oder davon auszugehen, dass VMs keine Hardwareprobleme haben…</p>]]></content><author><name></name></author><category term="networking" /><category term="debugging" /><category term="mtu" /><summary type="html"><![CDATA[OpenBSD Version: Erm... keine. o.O Arch: Alle NSFP: Ja, ne.]]></summary></entry><entry><title type="html">Time to be Real(tek): When PMTUD is not enough</title><link href="https://doing-stupid-things.as59645.net/networking/debugging/mtu/2022/11/14/time-to-be-real.html" rel="alternate" type="text/html" title="Time to be Real(tek): When PMTUD is not enough" /><published>2022-11-14T02:16:12+01:00</published><updated>2022-11-14T02:16:12+01:00</updated><id>https://doing-stupid-things.as59645.net/networking/debugging/mtu/2022/11/14/time-to-be-real</id><content type="html" xml:base="https://doing-stupid-things.as59645.net/networking/debugging/mtu/2022/11/14/time-to-be-real.html"><![CDATA[<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OpenBSD version: Actually none
Arch:            Any
NSFP:            Please no...
</code></pre></div></div>

<p>So, for the last couple of months, the mastodon instance of <a href="https://digitalcourage.de/">Digitalcourage e.V.</a> has been <a href="https://digitalcourage.social/@freiheit/109240695486058178">suffering from network issues</a>.
These manifested themselves in users having a very low (around 100KB/s) download speed from the instance.
This, of course, was rather ‘not so good’.</p>

<p>Over these months, and with lots of user feedback, some clear patterns started to emerge.
Specifically, it seemed like mostly customers of Deutsche Telekom were affected, while others were able to retrieve files at the speeds they were expecting.
The admins over at Digitalcourage had even tried to change the IP address of the system: to no avail.
Interestingly, though, another system right next to the mastodon instance at digitalcourage.social performed pretty much fine, even for otherwise affected Deutsche Telekom users.
Ultimately, this led to a rather extensive discussion on whether Deutsche Telekom might be treating traffic to Digitalcourage’s servers a bit less equally than it should.</p>

<p>Last Wednesday (9 Nov 2022), i saw the ongoing thread and was kind of intrigued by the somewhat strange problem.
I chatted up a contact over at Deutsche Telekom who was already involved in debugging the issue; they were friendly enough to introduce me to the Digitalcourage people, suggesting i might be able to help.
The Digitalcourage folks agreed and got into contact with me, so we could take a shot at debugging this.</p>

<p>As people might find it fun to read up on what <em>really</em> happened there (which i only figured out tonight), i thought i’d drop this into a blog article.</p>

<h1 id="starting-to-debug">Starting to debug</h1>
<p>The first thing one needs when debugging such an issue is, naturally, a way to really look at what is happening, i.e., you have to be able to reproduce the issue.
For the case at hand, this was surprisingly easy.
When i tried to download the test-file from my home connection, things did not look good:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% wget -O /dev/null https://digitalcourage.social/1GB.bin
--2022-11-09 18:53:17--  https://digitalcourage.social/1GB.bin
Resolving digitalcourage.social (digitalcourage.social)... 217.197.90.87
Connecting to digitalcourage.social (digitalcourage.social)|217.197.90.87|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1048576000 (1000M) [application/octet-stream]
Saving to: ‘/dev/null’

/dev/null             0%[                    ] 615.70K  98.8KB/s    eta 2h 53m 
</code></pre></div></div>
<p>This is good for multiple reasons.</p>
<ul>
  <li>a) It gives me a system to look at while debugging.</li>
  <li>b) My home connection is through a tunnel ultimately leaving through <a href="https://doing-stupid-things.as59645.net/colocation/networking/openbsd/infrastructure/2022/07/18/go-big-or-go-rack.html">AS59645</a>, which (partially) draws upstream from <a href="https://in-berlin.de/">IN-BERLIN</a>, the hoster Digitalcourage uses as well.</li>
  <li>c) It essentially rules out Deutsche Telekom as a culprit, which means that whatever is broken can most likely be fixed over at the systems of Digitalcourage.</li>
</ul>

<p>Hence, knowing DT is most likely out of the picture, and equipped with full control over the (at least overlay) path from my machine up to the point where packets are handed to IN-BERLIN, i went to the next step.</p>

<p>So, the next step was seeing ‘where’ things broke.</p>
<ul>
  <li>Getting the file from my workstation was slow.</li>
  <li>Getting the file from the first router was slow.</li>
  <li>Getting the file from my router over at IN-BERLIN was fast.
As the connection between my router at IN-BERLIN and the one in front of my workstation runs via a less-than-1500 MTU link, this led to a really strong suspicion that this is an MTU related issue.</li>
</ul>

<h1 id="mtu">MTU?</h1>
<p>The MTU, or Maximum Transmission Unit, is a byte value which tells networked systems how large network packets are allowed to be.
It is strongly tied to the MSS (Maximum Segment Size) for TCP connections.
Commonly, the MSS is MTU - 40b for IPv4 (20b IP header + 20b TCP header) and MTU - 60b for IPv6 (40b IP header + 20b TCP header), i.e., if the MTU is 1500 (as it is for most ethernet links, and essentially most links on the Internet), the MSS should be 1460 for IPv4 and 1440 for IPv6.</p>
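<p>As a quick sanity check, this arithmetic can be written down in a few lines (a minimal sketch, assuming the standard header sizes: 20b IPv4 / 40b IPv6 IP header plus a 20b option-less TCP header):</p>

```python
# Sketch: derive the TCP MSS from a link MTU.
# Header sizes assume no IP options and no TCP options.
IPV4_HEADER = 20  # bytes
IPV6_HEADER = 40  # bytes
TCP_HEADER = 20   # bytes

def mss(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP segment that fits into one packet of `mtu` bytes."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER

print(mss(1500))             # 1460 (plain ethernet, IPv4)
print(mss(1500, ipv6=True))  # 1440 (plain ethernet, IPv6)
print(mss(1492))             # 1452 (DSL behind PPPoE, IPv4)
print(mss(1420))             # 1380 (a wireguard tunnel, IPv4)
```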

<p>Sadly, not all links on the Internet have an MTU of 1500.
My (wireguard based) tunnel connection, for example, has an MTU of 1420.
Hence, the MSS for IPv4 is 1380 there.
Similarly, DSL customers of Deutsche Telekom have an MTU of 1492, because 8b are needed for PPPoE.
Cable network providers usually do not use PPPoE, and hence their customers commonly have an MTU of 1500.</p>

<p>Of course, with links below 1500b being common, there are mechanisms for hosts to determine the maximum MTU on a path to a server/client, so they can make sure to not send packets larger than that.
This is commonly called Path MTU Discovery (PMTUD), and i had my <a href="https://doing-stupid-things.as59645.net/networking/bgp/nsfp/2022/08/07/making-it-ping-part-6.html">fair share of headaches</a> around that already.
The idea behind PMTUD is, essentially, that each host on the path, as soon as it cannot forward a packet because it is too large, sends back an ICMP error message of ‘Type 3 Code 4’ (Fragmentation Needed and Don’t Fragment was Set).
When the sending host gets that error message, it knows that it has to send smaller packets if it wants them to arrive.</p>
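<p>The mechanism can be illustrated with a toy model (a sketch, not a real network stack): the sender keeps lowering its path-MTU estimate every time a hop reports ‘fragmentation needed’, until the packet fits through every link.</p>

```python
# Toy model of PMTUD: link_mtus are the MTUs of the links along the path,
# in order; the sender starts out assuming its own link MTU.

def discover_path_mtu(link_mtus, initial_mtu=1500):
    """Return the path MTU the sender converges on, plus the ICMP errors seen."""
    pmtu = initial_mtu
    errors = []
    while True:
        # Find the first hop whose link is too small for the current packet size.
        bottleneck = next((m for m in link_mtus if m < pmtu), None)
        if bottleneck is None:
            return pmtu, errors  # the packet fits everywhere, done
        # That hop drops the packet and reports its own MTU back
        # (ICMP Type 3 Code 4, "fragmentation needed").
        errors.append(bottleneck)
        pmtu = bottleneck

path = [1500, 1492, 1420, 1500]  # e.g. ethernet, PPPoE, wireguard, ethernet
print(discover_path_mtu(path))   # (1420, [1492, 1420])
```

<p>Note that if those ICMP errors never reach the sender (say, a firewall eats them), the loop above never terminates in practice; the sender just keeps retrying, which is exactly the failure mode debugged here.</p>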

<p>Funnily enough, the issue being related to the MTU would also explain why Deutsche Telekom customers are affected, while others are not: The 1492b MTU due to PPPoE.</p>

<h1 id="making-sure-its-the-mtu">Making sure it’s the MTU</h1>

<p>Verifying that this issue is MTU related was comparatively easy.
I fired up <code class="language-plaintext highlighter-rouge">tcpdump</code> on the outbound interface of my router at IN-BERLIN (where packets come in on a 1500 MTU interface and go out via a 1420 MTU interface), and set a filter for ICMP packets destined to digitalcourage.social; Then, i tried to <code class="language-plaintext highlighter-rouge">wget</code> a file from my workstation. And sure enough i saw:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% tcpdump -i vio0 -n host 217.197.90.87 and icmp
tcpdump: listening on vio0, link-type EN10MB
20:16:38.979978 IP 217.197.83.197 &gt; 217.197.90.87: ICMP 195.191.197.217 unreachable - need to frag (mtu 1420), length 36
20:16:39.984627 IP 217.197.83.197 &gt; 217.197.90.87: ICMP 195.191.197.217 unreachable - need to frag (mtu 1420), length 36
20:16:39.985268 IP 217.197.83.197 &gt; 217.197.90.87: ICMP 195.191.197.217 unreachable - need to frag (mtu 1420), length 36
20:16:40.984956 IP 217.197.83.197 &gt; 217.197.90.87: ICMP 195.191.197.217 unreachable - need to frag (mtu 1420), length 36
...
</code></pre></div></div>
<p>You might notice that this is odd: There should not be that many of these; the sender of the large packets (217.197.90.87) should rather quickly get the message that its packets are too large…
But the box seems to be somewhat oblivious…</p>

<h1 id="where-pmtud-is-lost">Where PMTUD is lost</h1>
<p>Together with Christian from Digitalcourage (who helped me debug this and was my <code class="language-plaintext highlighter-rouge">/bin/instantmessangersh</code> for commands on the Digitalcourage infrastructure, as i of course did not have a login on those machines) i now started to dig into where those packets might get lost.
Digging was completed relatively quickly, as both Digitalcourage’s router and the digitalcourage.social host only allowed ICMP type 8 (echo request, what you usually know as <code class="language-plaintext highlighter-rouge">ping</code>), but not Type 3 (Code 4).
Interestingly, the mastodon documentation suggested this setup; Based on our findings, this was <a href="https://github.com/mastodon/documentation/pull/998#pullrequestreview-1177831791">already updated</a>, though.</p>

<p>Two <code class="language-plaintext highlighter-rouge">iptables -A INPUT -p icmp -m icmp --icmp-type 3 -j ACCEPT</code> (integrated into the systems’ firewall frameworks) later, the ICMP packets finally arrived where they should.
Eagerly, i fired up a wget, looking forward to seeing the file rush in at unseen speed, and saw… well, nothing having changed, really.</p>

<h1 id="thinking-about-oddities">Thinking about oddities</h1>

<p>Now, with things still not working, there had to be a bit more going wrong.
First of all, <em>technically</em> my systems do MSS clamping.
The MSS is communicated to a remote host when a connection is established, and my routers <em>should</em> set it to 1320 for packets traversing the tunnel.
So, the remote server <em>should</em> not have needed PMTUD to begin with.
With PMTUD now <em>technically</em> being able to work, there should really be <em>nothing</em> keeping these hosts from communicating as they should.</p>

<p>Hence, we started digging into what might be <em>different</em> between the working and non-working host.
Specifically, we looked at <code class="language-plaintext highlighter-rouge">sysctl -a</code> (nothing special), <code class="language-plaintext highlighter-rouge">iptables -L -v -n</code> (again, nothing), and <code class="language-plaintext highlighter-rouge">route get $myip</code>/<code class="language-plaintext highlighter-rouge">route show $myip</code> (nothing out of the ordinary).
We also held a tcpdump to the wire and verified that my client communicated the correct MSS (1320, and yes, it indicated that when starting the TCP session).
There seemed to be nothing special about digitalcourage.social, yet it was still blissfully ignorant of the existence of any MTUs &lt; 1500.
This meant that packets would be resent, each time with a segment size 40b lower, until they would finally arrive. This, of course, causes a bit of overhead, making things… slow.</p>
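<p>To put a number on that ‘bit of overhead’, here is a toy model of the behaviour described above (an illustration of what we observed, not verified kernel behaviour): each oversized segment gets retransmitted 40b smaller until it finally fits the path.</p>

```python
# Toy model: how many sends does one full-size segment need before it
# shrinks down to something the path actually carries?

def sends_until_fit(initial_segment, path_mss, step=40):
    """Number of transmissions until the segment size drops to the path MSS."""
    sends = 1
    seg = initial_segment
    while seg > path_mss:
        seg -= step  # retransmit 40b smaller each round
        sends += 1
    return sends, seg

# Server starts at an MSS of 1460 (1500 MTU) while the path only takes 1320.
print(sends_until_fit(1460, 1320))  # (5, 1300)
```

<p>Five transmissions (and the retransmission timeouts in between) per full-size segment quickly add up, which matches the roughly 100KB/s we saw.</p>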

<h1 id="making-things-work-for-now">Making things work (for now)</h1>

<p>Somewhat frustrated with how things were going (and thanks to some input from anwlx), we decided to install a route for my IP with a locked MTU on the mastodon host:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ip route add 195.191.197.206 via 217.197.90.81 mtu lock 1320
</code></pre></div></div>
<p>Lo and behold… things suddenly worked. I got well in excess of 100 Mbit/s from the host.</p>

<p>We continued to try around for a bit, but were unable to figure out <em>why</em> the host was ignoring MTU and MSS, except when explicitly locked.
At this point–slightly after 12AM–I suggested to apply a hotfix, which would work for the majority of users, and call it a night:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ip r a 0.0.0.0/1 via 217.197.90.81 mtu lock 1320
ip r a 128.0.0.0/1 via 217.197.90.81 mtu lock 1320
</code></pre></div></div>
<p>This locked the MTU for two more-specific (i.e., preferred) routes covering ‘everything’ to 1320; Essentially the same thing as for my single IP, but for, well, everything.
And, in case of doubt, this is a more resilient approach than trying to update the default route on a production box.
Christian was actually quite happy about that suggestion, and moved it in place.
For now, this fixed the issue, and <a href="https://digitalcourage.social/@mro/109320388012887815">replies</a> to the <a href="https://digitalcourage.social/@freiheit/109319503901785225">announcement</a> that things should work now suggested that it actually <em>worked</em>.</p>
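<p>If you want to convince yourself that the two /1 routes really behave like a default route (only preferred over it), a few lines with the stdlib <code class="language-plaintext highlighter-rouge">ipaddress</code> module suffice:</p>

```python
# Check that 0.0.0.0/1 and 128.0.0.0/1 together cover the whole IPv4 space,
# and that longest-prefix matching prefers them over 0.0.0.0/0.
import ipaddress

low = ipaddress.ip_network("0.0.0.0/1")
high = ipaddress.ip_network("128.0.0.0/1")
default = ipaddress.ip_network("0.0.0.0/0")

# Together they contain exactly as many addresses as the default route...
assert low.num_addresses + high.num_addresses == default.num_addresses
# ...and a /1 is more specific than a /0, so it wins route selection.
assert low.prefixlen > default.prefixlen

print("covered:", low, high)
```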

<h1 id="but-why">But WHY?!</h1>

<p>The hotfix was not really satisfactory for me (and i promised you some input on the root cause).
Hence, i tried to replicate the issue:</p>
<ul>
  <li>Set up a system with the same software (Debian 10) under my control (debugging in prod is always a bit unfunny)</li>
  <li>Install Mastodon without docker (<em>maybe</em> this breaks <em>something</em>?!)</li>
  <li>Set all sysctl values and loaded kernel modules the same as on the live host</li>
</ul>

<p>I set that up on Saturday, and was actually quite hopeful that things would <em>not</em> work.
Well, except they did. I could not replicate the issue.</p>

<p>Which brings us to today. I started to wonder whether maybe there was <em>something</em> about the virtualization environment going on.
I checked in with Christian again, and asked them what they were using (plain libvirt+kvm on Debian).
Then, on a whim, i asked for an <code class="language-plaintext highlighter-rouge">lspci</code> from the box. Maybe there is something there?
Looking at the <code class="language-plaintext highlighter-rouge">lspci</code> output, one line immediately caught my eye:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00:03.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8100/8101L/8139 PCI Fast Ethernet Adapter (rev 20)
</code></pre></div></div>
<p>Usually, when running kvm VMs, you want to have a virtio network card.</p>

<p>So, quick change over on my test-VM, making the NIC ‘rtl8139’ and not ‘virtio’ aaaaaand… things broke. Finally.
Doing a <code class="language-plaintext highlighter-rouge">wget</code> behind a low MTU link finally gave me only around 100KB/s.
Similarly, checking the <em>second</em> box, from which things had worked all along, Christian found that it uses a virtio NIC:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
</code></pre></div></div>

<p>Equipped with <em>something</em> to search for (to be fair, ‘rtl8139, broken’ is not the most unique thing to look for), i ultimately found <a href="https://lore.kernel.org/all/20161114162505.GD26664@stefanha-x1.localdomain/">this</a> thread on the qemu-dev/netdev mailing lists. It appears that others actually had the same problems before. The thread rather quickly notes that the MTU for TCP Segmentation Offload seems to be fixed at 1500 in qemu’s rtl8139 code. The thread even suggests a patch which <em>should</em> fix the issue (but runs dry afterwards). Ultimately, the code with this bug must have been written in 2006, so round about 16 years ago (or rather: it was already ten years old when the issue was discussed on the mailing list).</p>

<p>Looking at the <a href="https://gitlab.com/qemu-project/qemu/-/blame/master/hw/net/rtl8139.c">‘Modifications:’ header in <code class="language-plaintext highlighter-rouge">rtl8139.c</code></a>, i have a strong suspicion that this ultimately boils down to this change:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> *  2006-Jul-04                 :   Implemented TCP segmentation offloading
 *                                  Fixed MTU=1500 for produced ethernet frames
</code></pre></div></div>
<p>This also fits well with the fact that disabling TSO/GSO for the rtl8139 fixes the issue as well:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ethtool -K ens18 tx off sg off tso off
</code></pre></div></div>
<p>But, as i am not really a <code class="language-plaintext highlighter-rouge">C</code>-ish person, or an overly qualified coder… i figured it might be better to just <a href="https://gitlab.com/qemu-project/qemu/-/issues/1312">file a bug ticket</a> and call it a day.</p>

<p>At least I have closure now.</p>

<p>And the people at Digitalcourage now have a reason to change the type of their virtualized NIC. <code class="language-plaintext highlighter-rouge">^^</code></p>

<h1 id="lesson-learned">Lesson Learned</h1>
<p>So, as a rather final final thought: There is one big lesson learned.</p>

<p>If you compare two virtual machines on the same hypervisor, do not assume that the hardware is the same.</p>

<p>Or that virtual machines cannot have hardware bugs.</p>]]></content><author><name></name></author><category term="networking" /><category term="debugging" /><category term="mtu" /><summary type="html"><![CDATA[OpenBSD version: Actually none Arch: Any NSFP: Please no...]]></summary></entry><entry><title type="html">Heads in the cloud, and opinions on the ground: University IT in the cloud</title><link href="https://doing-stupid-things.as59645.net/research/clouds/measurement/2022/10/20/heads-in-the-cloud.html" rel="alternate" type="text/html" title="Heads in the cloud, and opinions on the ground: University IT in the cloud" /><published>2022-10-20T16:37:12+02:00</published><updated>2022-10-20T16:37:12+02:00</updated><id>https://doing-stupid-things.as59645.net/research/clouds/measurement/2022/10/20/heads-in-the-cloud</id><content type="html" xml:base="https://doing-stupid-things.as59645.net/research/clouds/measurement/2022/10/20/heads-in-the-cloud.html"><![CDATA[<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OpenBSD version: 7.1 (Yeah, no, probably not...)
Arch:            Any
NSFP:            Opinions are never... 
</code></pre></div></div>

<p>So, on Monday the 18th of October, the <a href="https://fd.nl/samenleving/1454238/studentgegevens-ondanks-kritiek-massaal-in-de-amerikaanse-cloud-gezet">‘Financieele Dagblad’</a> published an article based on an interview with <a href="https://martina.lindorfer.in/">Martina Lindorfer</a> and me, on our joint work (<a href="https://arxiv.org/abs/2104.09462">arXiv preprint from 2021</a>) together with Seda Gürses and further colleagues from TU Delft.
This was then quickly picked up by <a href="https://nos.nl/artikel/2448717-driekwart-nederlandse-studentendata-opgeslagen-bij-amerikaanse-techbedrijven">NOS</a>, the <a href="https://nltimes.nl/2022/10/17/three-quarters-dutch-student-data-stored-american-tech-giants-cloud">NL Times</a>, and <a href="https://tweakers.net/nieuws/202370/nederlandse-onderwijsinstellingen-slaan-veel-data-op-op-amerikaanse-cloudservers.html">Tweakers.net</a>, ultimately also finding its way to <a href="https://www.reddit.com/r/thenetherlands/comments/y6zgm0/veel_nederlandse_onderwijsdata_in_amerikaanse/">Reddit</a> (and other politically relevant venues like the <a href="https://www.tweedekamer.nl/kamerstukken/kamervragen/detail?id=2022Z19631&amp;did=2022D42125">Dutch Parliament</a>).
Naturally, this led to a lot of questions and comments raining into the forum sections of those publications.
As there are sometimes some interesting questions and opinions in these comments–and a brief newspaper article tends to be <em>too</em> brief for some technical depth–i figured it might be nice to have a brief FAQ (or FCC? Frequently Commented Comments?) on some points raised frequently.
So, here we go, with a list in no particular order.</p>

<h1 id="but-those-servers-are-actually-in-the-eu">But those servers are <em>actually</em> in the EU</h1>

<p>What might be <em>the</em> top comment among all of the hundreds is the point that the instances running on Amazon’s cloud are <em>physically</em> located in the EU, or that the universities may have forgotten to select the right (EU) availability zone.
Thing is, what we <em>measured</em> and <em>claimed</em> is whether specific infrastructure–and in this case the statement refers to <a href="https://en.wikipedia.org/wiki/Learning_management_system">Learning Management Systems</a>–is hosted on systems that are part of the Amazon cloud, independent of the specific location of those systems.
And, for Dutch Universities, by now, most have their LMS hosted with Amazon (note that Blackboard.com moved from Azure to AWS relatively recently, and the article references a perspective from before then).
Of course–for functional reasons like latency alone already–these systems are either in Dublin (Amazon EU West) or Frankfurt (Amazon EU Central).
Some of the IP addresses for these instances are even held by a very non Amazon-ish sounding <a href="https://www.ripe.net/membership/indices/data/us.a100row.html">A100 ROW Inc/GmbH</a>.</p>

<p>Well, A100 ROW GmbH is, of course, a 100% Amazon subsidiary. (Note, the <a href="https://www.ripe.net/membership/indices/data/us.a100row.html">RIPE NCC registry id of ‘us.a100row’</a> for this entity; For comparison: When my main residence was in the Netherlands, my <a href="https://doing-stupid-things.as59645.net/networking/bgp/nsfp/2022/04/04/making-it-ping-part-2.html">own LIR</a> was ‘nl.tobias’; After relocating that to Germany it became ‘de.wybt’… )
And that is basically the point.
Amazon’s cloud is not like Champagne.
It doesn’t become ‘sparkling automated infrastructure with an API’, just because it is no longer from the ‘Silicon Valley region of tech’.</p>

<p>First of all, the <a href="https://en.wikipedia.org/wiki/CLOUD_Act">Cloud Act</a> applies.
What this law basically says is ‘US authorities can subpoena <em>US companies</em> for data stored on their systems and <em>their foreign subsidiaries</em> regardless of where the data is physically located.’
This is well known, and hence has also been one of the <a href="https://curia.europa.eu/juris/liste.jsf?num=C-311/18">major points</a> in the Schrems rulings.
See also <a href="https://gdprhub.eu/images/b/b4/VK_Baden_W%C3%BCrttemberg_-_1_VK_23-22.pdf">this ruling</a> of a German state court on whether subsidiaries of US cloud companies are even viable in public tenders. The court says no, because the necessary guarantees, especially with the end of Standard Contractual Clauses in Schrems II, cannot be provided.</p>

<p>So, in summary, as long as things are in infrastructure belonging to a US company, it does not matter whether the servers are physically located in the EU; The bad parts of US law still apply.</p>

<p>And, besides this, the main point we’re making is not that much about the US government, but about the power individual cloud companies can wield over universities (and society as a whole).</p>

<h1 id="can-you-share-your-data">Can you share your data?</h1>

<p>Right up next are requests for our raw data.
For the long-term survey, also looking at things over time, we used the Farsight SIE dataset of historic DNS requests seen by sensors all around the world.
This, of course, is not really a dataset to share publicly.</p>

<p>However, what we measure can be quickly gathered from the public DNS by oneself.
I hence wrote a small script that checks in on Dutch universities’ mail setups and learning management systems, and provides a detailed overview of the current status, as well as a brief interpretation, i.e., what is hosted where.
You can find that script here: <a href="https://git.aperture-labs.org/Cloudheads/cloudheads_nl_scraper">https://git.aperture-labs.org/Cloudheads/cloudheads_nl_scraper</a> 
Please feel free to run it yourself to gather the data, or to add any institutions, like HBOs, which we did not cover.</p>

<p>Anyway, if you want to take a look at the data, get it from the repository, or–if you don’t trust me–grab the script there and run it for yourself.</p>

<h1 id="are-lms-really-all-of-students-data">Are LMS really <em>all</em> of students’ data?</h1>

<p>Well, the statement made in the FD article is–for the majority of Dutch universities–about their <a href="https://en.wikipedia.org/wiki/Learning_management_system">Learning Management Systems</a> being in the cloud.
Those systems usually hold data on which courses a student registered for, depending on the setup (partially) also reported final grades, and a bunch of interaction between students and with teachers.
This is, of course, not <em>all</em> data universities hold on students.</p>

<p>Of course, the logs of what students do via the local Wifi–if logged by some network security solution, that is–are most likely stored locally (or sent off to an offsite SOC for threat analysis).
Similarly, final bookkeeping on grades often is not in the LMS, but in a dedicated application.
Still, some universities are already evaluating cloud-based replacements.
Finance will run another system handling fee payments for students etc.
Email will be handled by yet another bunch of systems.
Here, though, we also find that Dutch universities regularly use the email offerings of major cloud providers, with Microsoft being the leading vendor there.</p>

<p>So, the LMS in the cloud is not <em>all</em> of students’ data.</p>

<p>But an important chunk.</p>

<h1 id="how-else-should-you-run-a-service">How else should you run a service?</h1>

<p>A common issue pointed out by commentators is that it is <em>really</em> hard to run a service without relying on cloud infrastructure.
That, is very true, and i <a href="https://doing-stupid-things.as59645.net/burning/world/resillience/2022/06/29/propositions-part-3.html">also wrote about this</a> on a more general level; Running stuff well is hard. Like, really hard.
And we are not even talking about things like the cancerous nature of <a href="https://dl.acm.org/doi/pdf/10.1145/3319535.3354212">Google’s font hosting</a>.
(I am still amazed how often i struggle to cut these out of self-hosted tools)
And this while the caching advantages that led to the rise of Google Fonts are by now gone for <a href="https://ieeexplore.ieee.org/document/5958027">security reasons</a>.</p>

<p>Besides–and coming back to the previous point–one of our main arguments is that the continuous use of cloud infrastructure leads to a steadily diminishing ability to run stuff yourself.
This comes from saving money on those expensive engineers… which leads to dependence, because you no longer have the teams in-house to move <em>out</em> of the cloud.
Hence, with the point being that many organizations <em>already</em> can’t run infrastructure without the help of clouds–<a href="https://www.usenix.org/system/files/atc22-holzbauer.pdf">‘see also my work on the complexity of email’</a>–this essentially just highlights the point we are making regarding progressing dependence.</p>

<p>Hence, even though it is a difficult task, we have to think about how to retain our ability to host (research and teaching) infrastructure ourselves.
In the Netherlands, there is apparently an <a href="https://eerlijkdigitaalonderwijs.petities.nl/?locale=en">ongoing petition</a> for this.
I don’t doubt that this is a hard task, but it doesn’t get easier the longer we wait; Actually, on the contrary.</p>

<h1 id="of-course-universities-might-be-dependent-but-so-are-they-on-energy">Of course universities might be dependent; But so are they on energy…</h1>

<p>Continuing down the issue of dependence.
In our work, we are claiming that universities being dependent on some entities makes it really easy for tech companies to–essentially–blackmail universities.
Back at TU Delft, when listening to a presentation on this measurement work, a colleague brought this argument to the point by claiming: <em>‘Well, then universities are also dependent on their gas suppliers. Should we now produce our own energy?’</em> 
(You might notice that this argument was made before the 24th of February, and <em>really</em> didn’t age well… )
Thing is, the most accurate rebuttal of this point came from another colleague, a full professor on energy systems, essentially saying: <em>“Well, of course that is an issue in the energy domain. That is why that market is *heavily* regulated.”</em>
I can hardly add to that; Hits the nail on the head.</p>

<p>Companies are rational actors, which will do what is best for them under a set of given rules.
If the rules permit something, and it can make them profit, <a href="https://doing-stupid-things.as59645.net/burning/world/resillience/2022/06/30/propositions-part-4.html">they will do it</a>.
This is how rational actors work.</p>

<p>Also, in general, the approach of companies like Google to get a foothold in new markets is kind’a known:
Barge in with a <em>cheap</em> (or free) offer, start charging once you are ubiquitous.
This is rather <a href="https://cgi.br/publicacao/educacao-em-um-cenario-de-plataformizacao-e-de-economia-dos-dados-problemas-e-conceitos/">well analyzed (PT; p. 27ff.)</a>, and currently also kind of in the <a href="https://tech.slashdot.org/story/22/10/03/2327248/universities-adapt-to-googles-new-storage-fees-or-migrate-away-entirely">find-out phase</a> of ‘sign up for the cloud, and find out’ for <a href="https://library.medschl.cam.ac.uk/blog/2022/01/changes-to-university-google-drive-storage-limits/">several</a> <a href="https://blink.ucsd.edu/technology/file-sharing/google/workspace.html">different</a> <a href="https://support.csuchico.edu/TDClient/1984/Portal/Projects/Details/?TID=408400">universities</a>.</p>

<p>Assuming tech companies would pass on this (legal) lever to increase their profits just due to their good heart is, to be honest, just a bit naive.</p>

<h1 id="but-why-would-they-even-care-for-makinginterfering-with-curricula">But why would they even <em>care</em> for making/interfering with curricula?!</h1>

<p>So, one of our big points is that the tech companies might use their aforementioned blackmail ability to also influence what is taught and researched, leading to our point that clouds may threaten academic integrity.
This regularly triggers the question of why tech companies would even <em>care</em> to meddle with curricula and research.</p>

<p>As for curricula, the answer is quite obviously because <a href="https://cloud.google.com/edu/curriculum">they already <em>have</em> curricula</a> for teaching ‘the cloud’ in universities, as well as <a href="https://csfirst.withgoogle.com/c/cs-first/en/curriculum.html">K-12 curricula</a> on computer science.
Similarly, there have already been instances where large corporations used their market positions to influence research.
Facebook (now called Meta), for example, pulled a rather interesting move by demonstrating that <a href="https://www.npr.org/2021/08/04/1024791053/facebook-boots-nyu-disinformation-researchers-off-its-platform-and-critics-cry-f">‘canceling the <em>private</em> facebook accounts of people doing research they don’t like’</a> is <em>very</em> much in their toolbox.
Given that most people have stashed away quite a big part of their social circles and memories in their Facebook and Instagram accounts, this is a serious threat.
Similarly, we have the story of <a href="https://en.wikipedia.org/wiki/Timnit_Gebru">Timnit Gebru</a>, hired by Google to critically reflect on the dangers of AI, only to be fired when that work became <em>too</em> critical.
And, while all this is going on, we increasingly see tech companies putting faculty (indirectly) <a href="https://research.google/outreach/research-scholar-program/">on their payroll</a>.</p>

<h1 id="summary">Summary</h1>

<p>So, i hope to have contextualized some of the most pressing comments and questions i saw coming up.
If you think i missed covering some important parts, please drop me a line, and i will work on another blog article.
Also, please feel free to share any non-question comments you may have.
For both, please feel free to use my contact mail address for this blog: <a href="mailto:contact@as59645.net">contact@as59645.net</a>.</p>

<p>Until then, enjoy your day, and always remember:</p>

<p>Just because you hope it doesn’t happen, doesn’t mean it won’t.</p>]]></content><author><name></name></author><category term="research" /><category term="clouds" /><category term="measurement" /><summary type="html"><![CDATA[OpenBSD version: 7.1 (Yeah, no, probably not...) Arch: Any NSFP: Opinions are never...]]></summary></entry><entry><title type="html">[NL] Hoofd in de wolken, en meningen op de grond: Universitaire ICT in de cloud</title><link href="https://doing-stupid-things.as59645.net/onderzoek/clouds/meting/2022/10/20/hoofden-in-de-cloud.html" rel="alternate" type="text/html" title="[NL] Hoofd in de wolken, en meningen op de grond: Universitaire ICT in de cloud" /><published>2022-10-20T16:37:12+02:00</published><updated>2022-10-20T16:37:12+02:00</updated><id>https://doing-stupid-things.as59645.net/onderzoek/clouds/meting/2022/10/20/hoofden-in-de-cloud</id><content type="html" xml:base="https://doing-stupid-things.as59645.net/onderzoek/clouds/meting/2022/10/20/hoofden-in-de-cloud.html"><![CDATA[<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OpenBSD versie: 7.1 (Ja, nee, waarschijnlijk niet...)
Arch:           Eender
NVVP:           Nooit voor meningen... (TM)
Vertaling:      Stijn Pletinckx
</code></pre></div></div>

<p>Dus, op maandag 18 oktober verscheen er een artikel in het <a href="https://fd.nl/samenleving/1454238/studentgegevens-ondanks-kritiek-massaal-in-de-amerikaanse-cloud-gezet">‘Financieele Dagblad’</a> over een interview met <a href="https://martina.lindorfer.in/">Martina Lindorfer</a> en mij over ons gezamenlijk werk met <a href="https://www.tudelft.nl/tbm/onze-faculteit/afdelingen/multi-actor-systems/people/associate-professors/dr-fs-seda-gurses">Seda Gürses</a> en andere collega’s van TU Delft.
Dit werd al snel opgepikt door <a href="https://nos.nl/artikel/2448717-driekwart-nederlandse-studentendata-opgeslagen-bij-amerikaanse-techbedrijven">NOS</a>, <a href="https://nltimes.nl/2022/10/17/three-quarters-dutch-student-data-stored-american-tech-giants-cloud">NL Times</a>, <a href="https://tweakers.net/nieuws/202370/nederlandse-onderwijsinstellingen-slaan-veel-data-op-op-amerikaanse-cloudservers.html">Tweakers.net</a>, en uiteindelijk ook door <a href="https://www.reddit.com/r/thenetherlands/comments/y6zgm0/veel_nederlandse_onderwijsdata_in_amerikaanse/">Reddit</a> (en andere politiek relevante kanalen zoals de <a href="https://www.tweedekamer.nl/kamerstukken/kamervragen/detail?id=2022Z19631&amp;did=2022D42125">Tweede Kamer</a>).
Zoals vaak gebeurt in de media ontstonden er al snel reacties met discussies en vragen over ons artikel (<a href="https://arxiv.org/abs/2104.09462">arXiv preprint</a>).
Daar er soms interessante vragen en meningen voortvloeien uit desbetreffende discussies - en een kort krantenartikel de neiging heeft vaak <em>te</em> kort te zijn voor technische diepgang - besloot ik om een samengevatte FAQ (of FCC? “Frequently Commented Comments”) op te stellen die ingaat op de meest voorkomende vragen in verband met mijn artikel.
Dus, hier gaan we, met een lijst in willekeurige volgorde.</p>

<h1 id="maar-eigenlijk-bevinden-die-servers-zich-in-de-eu">Maar eigenlijk <em>bevinden</em> die servers zich in de EU</h1>

<p>Perhaps the most common remark of all the hundreds is the argument that the servers running on Amazon’s cloud are <em>physically</em> located in the EU, or that the universities forgot to select the correct (EU) availability zone.
The point is, our <em>measurements</em> and <em>claims</em> are about whether specific infrastructure - which in this case refers to <a href="https://nl.wikipedia.org/wiki/Learning_management_system">Learning Management Systems</a> - is hosted on systems that are part of the Amazon cloud, independent of the specific location of those systems.
And, as far as the Dutch universities are concerned, most of them now run their LMS on Amazon (note that Blackboard.com relatively recently moved from Azure to AWS, and the article refers to a perspective from before that).
Of course - for functional reasons such as latency - these systems are located in Dublin (Amazon EU West) or Frankfurt (Amazon EU Central).
Some of the IP addresses for these servers are even held by a not-very-Amazon-sounding <a href="https://www.ripe.net/membership/indices/data/us.a100row.html">A100 ROW Inc/GmbH</a>.</p>

<p>Well, A100 ROW GmbH is, of course, a 100% Amazon subsidiary. (See also the <a href="https://www.ripe.net/membership/indices/data/us.a100row.html">RIPE NCC registry id of ‘us.a100row’</a> for this entity; For comparison: when my primary residence was in the Netherlands, my <a href="https://doing-stupid-things.as59645.net/networking/bgp/nsfp/2022/04/04/making-it-ping-part-2.html">own LIR</a> was ‘nl.tobias’; When i moved to Germany it became ‘de.wybt’… )
And that is essentially the point.
Amazon’s “cloud” is not like champagne.
It does not turn into ‘sparkling automated infrastructure with an API’ just because it no longer comes from the ‘Silicon Valley region of tech’.</p>

<p>First of all, the <a href="https://en.wikipedia.org/wiki/CLOUD_Act">CLOUD Act</a> applies.
What this law essentially says is that ‘US authorities can subpoena US companies for data stored on their systems and those of their foreign subsidiaries, regardless of where the data physically resides.’
This is well known, and has therefore also been one of the <a href="https://curia.europa.eu/juris/liste.jsf?num=C-311/18">important points</a> in the Schrems rulings.
See also <a href="https://gdprhub.eu/images/b/b4/VK_Baden_W%C3%BCrttemberg_-_1_VK_23-22.pdf">this ruling</a> by a German state court on the question of whether subsidiaries of US cloud companies are themselves eligible in public procurement. The court holds that they are not, because the necessary guarantees, especially with the end of the standard contractual clauses in Schrems II, cannot be provided.</p>

<p>So, in short: as long as servers sit in the infrastructure of a US company, it does not matter whether the servers are physically located in the EU; The bad parts of US legislation still apply.</p>

<p>And besides, what we are mainly raising is not necessarily about the US government, but more about the point that individual “cloud” companies gain too much power and can consequently exert influence on universities (and society at large).</p>

<h1 id="kun-je-de-data-delen">Can you share the data?</h1>

<p>Next up, the requests to share our raw data.
For the long-term study, which puts the results into a temporal perspective, we use the “Farsight SIE” historical DNS data, based on global sensors.
This is, naturally, data that is best not released publicly.</p>

<p>Nevertheless, what we measure can be collected rather quickly by yourself, based on public DNS data.
To demonstrate this, i wrote a small piece of code that gathers information on the mail server setups and LMS of Dutch universities, and reports details on the current status, as well as a short interpretation of what is hosted where. The code can be found here: <a href="https://git.aperture-labs.org/Cloudheads/cloudheads_nl_scraper">https://git.aperture-labs.org/Cloudheads/cloudheads_nl_scraper</a>
Feel free to run it yourself to collect the data, or to add other institutions, such as universities of applied sciences (HBOs), which we did not include.</p>

<p>In any case, if you want to look at the data, you can find it in the repository, along with the code - in case you do not trust me - to run the experiments yourself.</p>
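<p>To give a flavor of what such a check looks like, here is a minimal sketch that tests whether an address falls within Amazon’s address space, based on the prefixes Amazon publishes in its ip-ranges.json. The two prefixes and the helper names are illustrative assumptions of mine (a snapshot, not a complete or current list) and not the actual scraper code:</p>

```python
# Minimal sketch: does an IP fall inside Amazon's published address space?
# The prefixes below are an illustrative snapshot from Amazon's
# ip-ranges.json, NOT a complete or current list.
import ipaddress
import socket

AWS_PREFIXES = [
    ipaddress.ip_network("52.48.0.0/14"),  # eu-west-1 (Dublin) snapshot
    ipaddress.ip_network("3.120.0.0/14"),  # eu-central-1 (Frankfurt) snapshot
]

def in_aws(ip: str, prefixes=AWS_PREFIXES) -> bool:
    """Return True if the address falls within one of the given prefixes."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in prefixes)

def host_in_aws(hostname: str) -> bool:
    """Resolve a hostname and test its address (needs network access)."""
    return in_aws(socket.gethostbyname(hostname))

print(in_aws("52.48.1.2"))  # True: inside the Dublin snapshot prefix
print(in_aws("192.0.2.1"))  # False: TEST-NET-1, certainly not AWS
```

<p>For real measurements one would of course pull the full, current ip-ranges.json (and the equivalent lists of other providers) instead of a hardcoded snapshot.</p>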

<h1 id="bevat-lms-echt-alle-gegevens-van-studenten">Does an LMS really contain <em>all</em> student data?</h1>

<p>Well, the point we make in FD is that - for the majority of Dutch universities - their <a href="https://nl.wikipedia.org/wiki/Learning_management_system">“Learning Management Systems”</a> are in the cloud.
Those systems usually contain data on which courses a student has enrolled in, depending on the setup (final) grades, and a whole lot of interaction between students and teachers.
This is, of course, not <em>all</em> the data universities hold on students.</p>

<p>Naturally, data on what students do over the local WiFi connection - if monitored at all - is usually stored locally (or forwarded to an external SOC for “threat analysis”).
Likewise, official grade records are often not kept in an LMS, but rather in dedicated software.
Nevertheless, some universities are considering a move to cloud applications to replace those systems as well.
Financial transactions also run through other systems, as does e-mail.
Yet here, too, we see that Dutch universities regularly use the e-mail offerings of large cloud providers, with Microsoft being the leading supplier.</p>

<p>So the LMS in the cloud is not <em>all</em> student data.</p>

<p>But it is an important chunk.</p>

<h1 id="hoe-moet-je-anders-een-dienst-runnen">How else are you supposed to run a service?</h1>

<p>A point frequently raised in the comments is that it is <em>particularly</em> difficult to run infrastructure without relying on a cloud solution.
That is true, and i <a href="https://doing-stupid-things.as59645.net/burning/world/resillience/2022/06/29/propositions-part-3.html">also wrote about this</a> on a more general level; Keeping a system running well is hard. Like, really hard.
And that is before we even get to things like the rotting nature of <a href="https://dl.acm.org/doi/pdf/10.1145/3319535.3354212">Google’s font hosting</a>.
Still, i am amazed at how remarkably difficult it is to remove these from self-hosted tools.
And this while the caching benefits that led to the popularity of Google Fonts have meanwhile disappeared for <a href="https://ieeexplore.ieee.org/document/5958027">security reasons</a>.</p>

<p>Moreover - and coming back to the previous point - one of our main arguments is that the continued use of cloud infrastructure leads to an ever-shrinking ability to run things yourself.
This happens because we want to save money on those expensive engineers… which leads to dependency, because you no longer have the right staff in-house to move <em>out</em> of the cloud.
Hence, arguing that organizations <em>already today</em> cannot run infrastructure without the help of clouds–<a href="https://www.usenix.org/system/files/atc22-holzbauer.pdf">‘see also my work on the complexity of e-mail’</a>–essentially just underlines the argument of gradual dependency even more.</p>

<p>That is exactly why, even though it is a difficult task, we must think about how to preserve our ability to host (research and education) infrastructure ourselves.
In the Netherlands there apparently is a <a href="https://eerlijkdigitaalonderwijs.petities.nl/?locale=nl">pending petition</a> on this.
And i do not deny that this is a difficult task, but it does not get any easier the longer we wait; Quite the contrary.</p>

<h1 id="natuurlijk-kunnen-universiteiten-afhankelijk-zijn-maar-dat-zijn-ze-ook-op-zaken-zoals-energie">Of course universities can be dependent; But they also depend on things like energy…</h1>

<p>Continuing with the matter of dependency.
In our work, we argue that universities that depend on certain entities make it very easy for tech companies to - essentially - blackmail universities.</p>

<p>Back at TU Delft, listening to a presentation on this study, a colleague raised the following argument: <em>‘Well, then universities are also dependent on their gas suppliers. Should we now generate our own energy?’</em>
(You may have noticed that this argument was made before February 24, and has <em>really</em> not gotten better with time… )
The most adequate rebuttal of this point came from another colleague, a professor of energy systems, who essentially said: <em>“Well, that is of course an issue in the energy domain. That is why that market is <strong>heavily</strong> regulated.”</em>
There is hardly anything i can add to that; It hits the nail on the head.</p>

<p>Companies are rational actors that, given the current circumstances, will always do whatever benefits them the most.
If the rules allow something, and it can make them a profit, <a href="https://doing-stupid-things.as59645.net/burning/world/resillience/2022/06/30/propositions-part-4.html">they will do it</a>.
That is how rational actors work.</p>

<p>Likewise, Google’s strategy for gaining a foothold in new markets is widely known.
Enter with a <em>cheap</em> (or free) offering, and start adding costs once you are no longer replaceable.
This is <a href="https://cgi.br/publicacao/educacao-em-um-cenario-de-plataformizacao-e-de-economia-dos-dados-problemas-e-conceitos/">well studied (PT; p. 27ff.)</a>, and is currently also somewhat in the <a href="https://tech.slashdot.org/story/22/10/03/2327248/universities-adapt-to-googles-new-storage-fees-or-migrate-away-entirely">figure-it-out phase</a> of ‘sign up for the cloud, and figure it out’ for <a href="https://library.medschl.cam.ac.uk/blog/2022/01/changes-to-university-google-drive-storage-limits/">several</a>, <a href="https://blink.ucsd.edu/technology/file-sharing/google/workspace.html">different</a>, <a href="https://support.csuchico.edu/TDClient/1984/Portal/Projects/Details/?TID=408400">universities</a>.</p>

<p>Assuming that tech companies would pass up this (legal) leverage to increase their profits out of the goodness of their hearts is, to be honest, a bit too naïve.</p>

<h1 id="maar-waarom-zouden-ze-zich-zelfs-bekommeren-over-het-maken-vaningrijpen-in-curricula">But why would they even care about making/intervening in curricula?!</h1>

<p>One of our big points is that tech companies can use their aforementioned blackmail capability to also influence what is taught and researched, which means that clouds can compromise academic integrity.
This regularly raises the question of <em>why</em> tech companies would want to meddle with curricula and research in the first place.</p>

<p>As far as curricula are concerned, the answer is simply <a href="https://cloud.google.com/edu/curriculum">because they already have curricula</a> for teaching ‘the cloud’ at universities, as well as <a href="https://csfirst.withgoogle.com/c/cs-first/en/curriculum.html">K-12 curricula</a> on computer science.
Likewise, there have already been cases of large companies using their market positions to influence research.
Facebook (now better known as Meta), for example, made a rather interesting move by demonstrating that <a href="https://www.npr.org/2021/08/04/1024791053/facebook-boots-nyu-disinformation-researchers-off-its-platform-and-critics-cry-f">‘cancelling the <em>private</em> Facebook accounts of people doing research they do not like’</a> is well within their capabilities.
Given that many people have a rather large part of their social circles and memories stored in their Facebook and Instagram accounts, this is a serious threat.
Likewise, we have the story of <a href="https://en.wikipedia.org/wiki/Timnit_Gebru">Timnit Gebru</a>, hired by Google to think critically about the dangers of AI, yet fired later when that work became <em>too</em> critical.
And while all of this is going on, we increasingly see tech companies (indirectly) <a href="https://research.google/outreach/research-scholar-program/">putting university teachers on their payroll</a>.</p>

<h1 id="tot-slot">Wrapping up</h1>

<p>So, i hope to have put some of the most pressing comments and questions i saw come up into their proper context.
If i missed some important parts, send me a message, and i will take care of a follow-up blog post.
Also feel free to share any other comments with me.
For both, you can reach me at the following e-mail address for this blog: <a href="mailto:contact@as59645.net">contact@as59645.net</a>.</p>

<p>Until then, i wish you a pleasant day, and always remember:</p>

<p>Hoping that something does not happen is no guarantee that it will never happen.</p>]]></content><author><name></name></author><category term="onderzoek" /><category term="clouds" /><category term="meting" /><summary type="html"><![CDATA[OpenBSD version: 7.1 (Yes, no, probably not...) Arch: Any NSFP: Never for opinions... (TM) Translation: Stijn Pletinckx]]></summary></entry><entry><title type="html">The window, the ping, and the packetloss of this thing^Wnl-ams02a-rc2</title><link href="https://doing-stupid-things.as59645.net/networking/debugging/routing/2022/10/11/the-window-the-ping.html" rel="alternate" type="text/html" title="The window, the ping, and the packetloss of this thing^Wnl-ams02a-rc2" /><published>2022-10-11T02:04:12+02:00</published><updated>2022-10-11T02:04:12+02:00</updated><id>https://doing-stupid-things.as59645.net/networking/debugging/routing/2022/10/11/the-window-the-ping</id><content type="html" xml:base="https://doing-stupid-things.as59645.net/networking/debugging/routing/2022/10/11/the-window-the-ping.html"><![CDATA[<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OpenBSD version: 7.1 (and whatever nl-ams02a-rc2 runs on)
Arch:            Any
NSFP:            Well... found this in production... so... probably ok?
</code></pre></div></div>

<p>So, i had a problem; Had, because since yesterday, i actually know <em>why</em> i had this problem, and much more importantly, <em>how</em> to fix it.
See, for <a href="https://www.reddit.com/r/openbsd/comments/tgbktg/help_in_debugging_low_transfer_speeds_over_ssh/">quite some time</a>, i had been having issues with low transfer speeds of my OpenBSD boxes across a rather long link.
While, technically, i should be getting a clean downstream of 1gbit from my DOCSIS link from Ziggo at home, i would be stuck with anything between 10 and 50mbit, sustained, in the downlink.
Well, not everywhere. Windows boxes were fine. Linux systems, too. Well, kind of fine. They also did not <em>really</em> excel.
But i attributed the 300-400mbit i saw to the shady setup of different tunnels i am running to put my home router into the dfz.</p>

<p>What made this thing even more odd was that it–at first at least–seemed like only SSH/scp/rsync via SSH was affected.
Just ripping open a socket and pouring in packets was fine, i would see my 1gbit down.
Also, the boxes among each other locally would be fine.
Furthermore, it would not matter if this whole thing went down via TCP directly, or via TCP over one of my UDP based tunnels.
So, for some time i figured that it was <em>some</em>thing in the combination of SSH and the long(er) latency on my path to–back then–Hetzner.
Furthermore, this whole situation also did not really improve when <a href="https://doing-stupid-things.as59645.net/colocation/networking/openbsd/infrastructure/2022/07/18/go-big-or-go-rack.html">i moved</a> from my tunnel based shady-AS setup to a rack over at <a href="https://twitter.com/as24961">AS24961</a>.</p>

<p>Hit by a bit of frustration (and <a href="https://twitter.com/awlnx">awlnx’</a> suggestion to look at <a href="https://www.rfc-editor.org/rfc/rfc1323">TCP windows</a>), i gave debugging this issue that tracked me over two different DCs another shot.</p>

<h1 id="the-case-descriptiontm">The Case Description(TM)</h1>
<p>So, let’s first look at the case. What we have looks roughly like this (we will expand this graph a bit over the course of this article):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>      +-------+               +-------+
Ziggo |AS33915|&lt;--------+-----+AS24961| MyLoc
      +-------+         |     +-------+
                        |
                        |     +-------+
                        +-----+AS59645| Me :-)
                        |     +-------+
                        |
                        |     +-------+
                        +-----+AS29670| IN-Berlin
                        |     +-------+
                        |
                        |     +-------+
                        +-----+AS24940| Hetzner
                              +-------+
</code></pre></div></div>
<p>I have endpoints at MyLoc, Hetzner, IN-Berlin, and my own AS (which, among others, gets upstream from AS24961).
For yesterday’s debugging session, i also started testing using the OpenBSD base tool <code class="language-plaintext highlighter-rouge">tcpbench</code>; This helped me realize that this was <em>not</em> an SSH issue, but much more a <em>TCP issue</em>.
Now, from an endpoint at MyLoc and my own AS, i would see the reduced SSH bandwidth.</p>

<h1 id="a-look-out-of-the-windows">A look out of the windows</h1>
<p>So, first things first, i started a tcpbench server (<code class="language-plaintext highlighter-rouge">tcpbench -s</code>) on my node at AS33915, and another one at one of my machines which is solely routed through MyLoc (and not my own AS; Well, tunnel backbone on another routing table… ;-) )
As expected, transfer speeds were ‘meh’; The same for trying the same thing inside the tunnel.
However, this time around, i ran a <code class="language-plaintext highlighter-rouge">tcpdump</code> on both sides.</p>

<p>Equipped with the file from my node at home (in AS33915), i fired up wireshark.
Using the TCP window size scaling plot (Statistics -&gt; TCP Stream Graphs -&gt; Window Size Scaling), selecting the correct TCP stream, and looking at the correct direction, we end up with the following plot:</p>

<p><img src="/static/img/2022-10-11-gw02.png" alt="Graph of TCP window scaling issue: Window increases in steps, regularly interrupted by high drops every 2-4 seconds." /></p>

<p>We see the window step-wise scaling up… and some weird drops relatively evenly spaced every 2-4 seconds.
That… is odd. So, what happens then?</p>

<p><img src="/static/img/2022-10-11-loss.png" alt="Wireshark excerpt showing packet loss from the AS24961 host to the AS33915 host." /></p>

<p>Well, a packet got confused, and decided to get lost.
And now, all of a sudden, i knew why bandwidth was so horrendous. Packetloss. Occasional. With implications for the TCP window.
And, of course, that packet loss does not mind TCP packets being in a tunnel.
Drop is drop.</p>
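<p>To put some rough numbers on why occasional loss is so devastating here: a TCP flow can move at most one window per round trip, and every drop knocks the window back down. A small sketch, with purely illustrative numbers (the ~21ms RTT roughly matches the traceroutes further down; the loss rates are made up), using the well-known Mathis et al. approximation for loss-limited TCP throughput:</p>

```python
import math

# A TCP flow can transfer at most one window per round trip:
def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Throughput ceiling (bits/s) for a fixed window over a given RTT."""
    return window_bytes * 8 / rtt_s

# Mathis et al. approximation for steady-state throughput under random loss:
#   BW ~= (MSS / RTT) * (C / sqrt(p)), with C = sqrt(3/2)
def mathis_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate loss-limited TCP throughput in bits/s."""
    return (mss_bytes * 8 / rtt_s) * (math.sqrt(3 / 2) / math.sqrt(loss_rate))

rtt = 0.021  # ~21ms, roughly the RTT to the Ziggo host in the traces
print(window_limited_bps(64 * 1024, rtt) / 1e6)  # collapsed 64 KiB window: ~25 Mbit/s
print(window_limited_bps(2_600_000, rtt) / 1e6)  # ~2.6 MB of window needed for 1gbit
for p in (1e-4, 1e-3):
    print(f"loss {p:.2%}: {mathis_bps(1460, rtt, p) / 1e6:.0f} Mbit/s")
```

<p>With a 1460-byte MSS on a ~21ms path, even a loss rate between 0.01% and 0.1% already caps you in the tens-of-Mbit range; Which is suspiciously close to the sustained rates i was seeing.</p>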

<p>This leaves a lot of nice candidates for where things could go amiss:</p>
<ul>
  <li>My “high quality CPE” could be at fault</li>
  <li>My pf config could be funny</li>
  <li>My little VM box behind the Ziggo link (or a switch on the way there) might be an issue</li>
  <li>My routing table toying could lead to issues</li>
  <li>DOCSIS could be DOCSIS</li>
  <li>The switch in my rack could be ‘funny’</li>
  <li>One of the cables to my upstream could be ‘funny’</li>
  <li>Something else could be off</li>
</ul>

<p>The first things I ruled out were my pf config (which, to be fair, is not soooooo sophisticated), by simply disabling it, and my routing table stuff, by simply setting up a box that would route directly via AS24961. No tables involved.
Neither changed the result.</p>

<h1 id="something-else-changed">Something else changed</h1>
<p>Interestingly, though, when i tested a bit further yesterday, i noticed something else that was odd: Now, both from Hetzner and IN-Berlin, traffic looked a lot closer to what i’d expect.
While this adds a lot more oddity to this issue, it also <em>rules out</em> some possible causes:</p>
<ul>
  <li>This can’t be DOCSIS being funny</li>
  <li>This can’t be my CPE being funny</li>
  <li>This actually <em>can’t</em> be <em>anything</em> around the systems behind the end-user connection in AS33915</li>
</ul>

<p>Hence, our map now looks somewhat like this (With intermediate steps obviously skipped), as two ASes are certainly different from the other two:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>      +-------+               +-------+
Ziggo |AS33915|&lt;-------+------+AS24961| MyLoc
      +-------+        |      +-------+
          ^            |
          |            |      +-------+
          |            +------+AS59645| Me :-)
          |                   +-------+
          |
          |                   +-------+
          +-------------------+AS29670| IN-Berlin
          |                   +-------+
          |
          |                   +-------+
          +-------------------+AS24940| Hetzner
                              +-------+
</code></pre></div></div>

<h1 id="routing-around">Routing around</h1>
<p>Still, this leaves <em>‘things’</em> around my rack as possible issues.
So… how to exclude that?
Well, technically, i <em>should</em> have multiple routes to the box in AS33915 via my various upstreams.
This is where two of my other upstreams come in: AS50629 (LWLcom) and AS34927 (iFog).
Playing around with my upstreams, i quickly noticed that, for iFog, things pretty much looked the same as when i was going via AS24961.</p>

<p>Going through AS50629, however, things looked a bit different. My TCP session steadily climbed to 1gbit throughput.
The TCP window scaling plot also looked significantly more promising:</p>

<p><img src="/static/img/2022-10-11-lwlcom.png" alt="Graph of TCP window scaling issue: Window increases in steps, without being interrupted, leveling out eventually." /></p>

<p>This combination of things (not) working has a beautiful side-effect: Both upstreams come in as tagged VLANs via <em>the same cable</em>.
Effectively, this rules out the following causes:</p>
<ul>
  <li>The switch in my rack could be ‘funny’</li>
  <li>One of the cables to my upstream could be ‘funny’</li>
</ul>

<p>Effectively only leaving me with:</p>
<ul>
  <li>Something else could be off</li>
</ul>

<p>Great. Back to start.</p>

<h1 id="cant-you-have-issues-elsewhere">Can’t you have issues elsewhere?</h1>
<p>Having ruled out all of <em>my</em> stuff as the root cause, i set out to find <em>other</em> places where this might happen.
Using the <a href="https://ring.nlnog.net/">NLNOG Ring Node</a> at MyLoc and with some help from AS34936, i could quickly establish that it really was <em>not just me</em>.
Now the hunt was on for what was different between AS29670, AS24940, and AS50629 on the one hand, and all the others on the other.
The tool for that is, of course, traceroutes; or rather, MTR.</p>

<p>The issue here, though, is that the packet drops do not <em>really</em> seem to be all <em>that</em> frequent, except in an established TCP session.
Which makes it <em>really</em> hard to debug.
Only when looking at smokeping over 10 days do i see a <em>very</em> modest drop rate of around 0.01%, which i do <em>not</em> see towards my node at IN-Berlin.
So, let’s play the exclusion game: What looks different (or the same?) between working and non-working traces?</p>

<p>Looking at traceroutes–here via LWLcom and MyLoc as an example–quickly showed that <em>all</em> traffic to AS33915 would go through the Liberty Global B.V. AS (AS6830) before passing through AS9143 (Vodafone/Ziggo) to the Ziggo AS.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nw-test (37.157.249.226) -&gt; &lt;AS33915&gt; 2022-10-08T23:20:59+0000
   Host										Loss%	Snt	Last	 Avg	Best	Wrst	StDev
1. (waiting for reply)
2. lag19.core3-dus1.bb.as24961.net			 0.0%	7	 0.4	 0.3	 0.2	 0.4	0.1
3. lag8.core1-dus-ix.bb.as24961.net			 0.0%	7	 0.6	 0.6	 0.5	 0.7	0.1
4. 76.74.9.169								 0.0%	7	 0.8	 0.9	 0.6	 1.6	0.3
5. ae22.cr2-fra6.ip4.gtt.net				 0.0%	7	 4.8	 4.5	 3.9	 4.9	0.4
6. ip4.gtt.net								 0.0%	7	12.7	 8.8	 4.0	13.0	3.5
7. de-fra02a-rc1-ae-49-0.aorta.net			 0.0%	7	 7.5	 7.6	 7.5	 8.0	0.2
8. nl-ams02a-rc2-lag-11-0.aorta.net			83.3%	7	 7.4	 7.4	 7.4	 7.4	0.0
9. asd-tr0021-cr101-be60-2.core.as33915.net	 0.0%	7	 7.8	 8.0	 7.8	 8.5	0.3
10. gv-rc0052-cr102-et2-2.core.as33915.net	 0.0%	7	 9.6	 9.7	 9.5	10.0	0.2
11. (waiting for reply)
12. HOST.cable.dynamic.v4.ziggo.nl			 0.0%	7	21.3	20.6	16.4	23.1	2.2
</code></pre></div></div>
<p>And, for comparison, the working one via LWLcom:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gw01.dus01.as59645.net (195.191.197.254) -&gt; &lt;AS33915&gt; 2022-10-08T23:18:43+0000
   Host										Loss%	Snt	Last	 Avg	Best	Wrst	StDev
1. gw02dus01.lwlcom.dus01.as59645.net		 0.0%	8	 0.7	 0.7	0.5		 1.2	0.2
2. lag-108.ear1.Dusseldorf1.Level3.net		33.3%	7	 0.6	 0.7	0.6		 0.8	0.1
3. ae1.3104.edge7.Amsterdam1.level3.net		 0.0%	7	 4.1	 4.4	3.8		 6.8	1.1
4. nl-srk03a-ri1-ae-8-0.aorta.net			 0.0%	7	 7.6	 7.6	7.5		 7.8	0.1
5. asd-tr0021-cr101-be65.core.as9143.net	14.3%	7	 8.6	 8.7	8.6		 8.8	0.1
6. gv-rc0052-cr102-et2-2.core.as33915.net	 0.0%	7	10.5	10.3	10.1	10.5	0.1
7. (waiting for reply)
8. HOST.cable.dynamic.v4.ziggo.nl			 0.0%	7	21.8	21.6	18.3	22.9	1.5
</code></pre></div></div>
<p>Looking again a bit closer at the traceroutes, we finally find <em>one</em> host that is the same for all traceroutes that show the occasional drops ruining our throughput, and that is absent in all other traces: <code class="language-plaintext highlighter-rouge">nl-ams02a-rc2</code>
Sometimes it is <code class="language-plaintext highlighter-rouge">nl-ams02a-rc2-lag-11-0.aorta.net. (84.116.130.150)</code>, sometimes it is <code class="language-plaintext highlighter-rouge">nl-ams02a-rc2-lag-12-0.aorta.net (84.116.139.125)</code>. However, when packets drop it is <em>always</em> <code class="language-plaintext highlighter-rouge">nl-ams02a-rc2</code>, and if they don’t there is no <code class="language-plaintext highlighter-rouge">nl-ams02a-rc2</code>.
Also, note that this box seems to be rather busy dropping control plane packets, rocking an 83.3% packet loss over 7 packets (yeahyeah, router not pinger; but let’s call this ‘circumstantial evidence’; Still, over more packets that box usually gets to ~93% loss).
It must be busy with <em>something</em>.</p>
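<p>This ‘exclusion game’ can also be played mechanically: intersect the hops of all lossy traces and subtract every hop seen on a clean one. A sketch; the hop lists below are abridged to router names from the traces above, and the iFog list is an assumption on my part (the post only notes that its path looked pretty much the same as via MyLoc):</p>

```python
# Hops abridged to their router-name part, since nl-ams02a-rc2 shows up
# under several interface names (-lag-11-0 / -lag-12-0), and
# asd-tr0021-cr101 under both as33915.net and as9143.net.
lossy_traces = [
    # via MyLoc (AS24961), abridged from the mtr output above
    ["ae22.cr2-fra6.ip4.gtt.net", "de-fra02a-rc1", "nl-ams02a-rc2",
     "asd-tr0021-cr101", "gv-rc0052-cr102"],
    # via iFog (AS34936): assumed hop list, described as looking the same
    ["ip4.gtt.net", "nl-ams02a-rc2", "asd-tr0021-cr101"],
]
clean_traces = [
    # via LWLcom (AS50629), abridged from the mtr output above
    ["ae1.3104.edge7.Amsterdam1.level3.net", "nl-srk03a-ri1",
     "asd-tr0021-cr101", "gv-rc0052-cr102"],
]

# Hops present on *every* lossy path but on *no* clean one:
suspects = set.intersection(*map(set, lossy_traces)) \
    - set().union(*map(set, clean_traces))
print(suspects)  # {'nl-ams02a-rc2'}
```

<p>With only a handful of paths this is, of course, still ‘circumstantial evidence’, just like eyeballing the traces; But with more vantage points it narrows things down quickly.</p>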

<p>Taking <code class="language-plaintext highlighter-rouge">nl-ams02a-rc2</code> into account, this leaves us with this version of our graph:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  Ziggo                                   iFog
+-------+                               +-------+        +-------+
|AS33915|           +-------------------+AS34936|&lt;---+---+AS59645| Me :-)
+-------+           |                   +-------+    |   +---+---+
    ^               |                                |       |
    |               |                   +-------+    |       |
    |               +-------------------+AS24961|&lt;---+       |
    |               |                   +-------+            |
    |               |                     MyLoc              |
    |               | &lt; nl-ams02a-rc2                        |
    |               v                                        v
+---+---+       +-------+                                +-------+
| AS9143|&lt;------+ AS6830| Liberty Global                 +AS50629| LWLcom
+-------+       +-------+                                +-------+
Vodafone/           ^                                        |
  Ziggo             | &lt; no nl-ams02a-rc2                     |
                    |                                        |
                    +----------------------------------------+
                    |
                    |
                    |                                    +-------+
                    +------------------------------------+AS29670| IN-Berlin
                    |                                    +-------+
                    |
                    |                                    +-------+
                    +------------------------------------+AS24940| Hetzner
                                                         +-------+
</code></pre></div></div>

<h1 id="making-it-pingwnot-drop-again">Making it ping^Wnot drop again</h1>

<p>To test our hypothesis that nl-ams02a-rc2 has <em>something</em> to do with our bandwidth issues, the most straight-forward way is changing the path for one of the currently affected ASes to one no longer passing it.
After i filed a ticket with MyLoc containing a write-up of the issue, they changed the path for their route to AS33915.
Lo and behold: The loss is gone, my windows behave properly, and i can finally saturate my link.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  Ziggo                                   iFog
+-------+                               +-------+        +-------+
|AS33915|           +-------------------+AS34936|&lt;---+---+AS59645| Me :-)
+-------+           |                   +-------+    |   +---+---+
    ^               |                                |       |
    |               |                   +-------+    |       |
    |               |             MyLoc +AS24961|&lt;---+       |
    |               |                   +-------+            |
    |               |                       |                |
    |               | &lt; nl-ams02a-rc2       |                |
    |               v                       |                v
+---+---+       +-------+                   |            +-------+
| AS9143|&lt;------+ AS6830| Liberty Global    |            +AS50629| LWLcom
+-------+       +-------+                   |            +-------+
Vodafone/           ^                       |                |
  Ziggo             | &lt; no nl-ams02a-rc2    |                |
                    |                       |                |
                    +-----------------------+                |
                    |                                        |
                    +----------------------------------------+
                    |
                    |
                    |                                    +-------+
                    +------------------------------------+AS29670| IN-Berlin
                    |                                    +-------+
                    |
                    |                                    +-------+
                    +------------------------------------+AS24940| Hetzner
                                                         +-------+
</code></pre></div></div>

<h1 id="conclusion-and-todo">Conclusion and TODO</h1>
<p>So, in conclusion: Sometimes it is really worth it to dig into something very strange that is happening to you.
And, methodical debugging can be fun (especially the feeling when you <em>FINALLY</em> figure it out).</p>

<p>Also, there is still the oddity left of why seemingly only OpenBSD was affected, and not Windows/Linux.
My personal guess is that the specific implementation of TCP congestion control/window resizing in OpenBSD is just a tad more sensitive.</p>

<p>At the same time, this gives me the interesting TODO item of figuring out how to get AS6830 to fix this issue; 
Even though i <em>can</em> fix this myself for ASes where i can either change routes myself <em>or</em> motivate others to do so… there are a few more ASes out there than those i can change routes for (which, er… is one)… and i kind of like to fetch stuff via TCP.
Let’s be honest, if i call the Ziggo customer support, they’ll probably have me reset my router a couple of times.
Then again… probably worth trying. :-|</p>]]></content><author><name></name></author><category term="networking" /><category term="debugging" /><category term="routing" /><summary type="html"><![CDATA[OpenBSD version: 7.1 (and whatever nl-ams02a-rc2 runs on) Arch: Any NSFP: Well... found this in production... so... probably ok?]]></summary></entry></feed>