OpenBSD version: 7.1 (Yeah, no, probably not...) Arch: Any NSFP: Opinions are never...
So, on Monday the 18th of October, the ‘Financieele Dagblad’ published an article based on an interview with Martina Lindorfer and me, on our joint work (arXiv preprint from 2021) together with Seda Gürses and further colleagues from TU Delft. This was then quickly picked up by NOS, the NL Times, and Tweakers.net, ultimately also finding its way to Reddit (and other politically relevant venues like the Dutch Parliament). Naturally, this led to a lot of questions and comments raining into the forum sections of those publications. As these comments sometimes contain interesting questions and opinions, and a newspaper article tends to be too brief for much technical depth, i figured it might be nice to have a brief FAQ (or FCC? Frequently Commented Comments?) on some frequently raised points. So, here we go, with a list in no particular order.
But those servers are actually in the EU
What might be the top comment among all of the hundreds is the point that the instances running on Amazon’s cloud are physically located in the EU, or that the universities may have forgotten to select the right (EU) region. Thing is, what we measured and claimed is whether specific infrastructure–and in this case the statement refers to Learning Management Systems–is hosted on systems that are part of the Amazon cloud, independent of the specific location of those systems. And, by now, most Dutch universities have their LMS hosted with Amazon (note that Blackboard.com moved from Azure to AWS relatively recently, and the article references a perspective from before then). Of course–if only for functional reasons like latency–these systems are either in Dublin (Amazon EU West) or Frankfurt (Amazon EU Central). Some of the IP addresses for these instances are even held by a very non-Amazon-ish sounding A100 ROW Inc/GmbH.
Well, A100 ROW GmbH is, of course, a 100% Amazon subsidiary. (Note the RIPE NCC registry ID ‘us.a100row’ for this entity. For comparison: when my main residence was in the Netherlands, my own LIR was ‘nl.tobias’; after relocating to Germany, it became ‘de.wybt’… ) And that is basically the point. Amazon’s cloud is not like Champagne. It doesn’t become ‘sparkling automated infrastructure with an API’ just because it is no longer from the ‘Silicon Valley region of tech’.
First of all, the CLOUD Act applies. What this law basically says is ‘US authorities can subpoena US companies for data stored on their systems and those of their foreign subsidiaries, regardless of where the data is physically located.’ This is well known, and US authorities’ access to data held by US companies has hence also been one of the major points in the Schrems rulings. See also this ruling of a German state court on whether subsidiaries of US cloud companies are even viable in public tenders. The court held that they are not, because the necessary guarantees cannot be provided, especially since Schrems II invalidated the Privacy Shield and put Standard Contractual Clauses under much stricter scrutiny.
So, in summary, as long as things run on infrastructure belonging to a US company, it does not matter whether the servers are physically located in the EU: the bad parts of US law still apply.
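To make the distinction between location and ownership a bit more tangible: AWS publishes its address space as a JSON document (https://ip-ranges.amazonaws.com/ip-ranges.json), in which every prefix carries a region field. The following is a minimal sketch, assuming only the prefixes/ipv6_prefixes structure of that file, that looks up which region an address belongs to. A result of eu-west-1 or eu-central-1 tells you *where* a server is; the mere fact that there is a result tells you *whose* it is. The prefixes in SAMPLE_RANGES are documentation ranges used as stand-ins, not real AWS ranges, and the function name is made up for illustration.

```python
import ipaddress
import json
from typing import Optional

# Stand-in for the document AWS publishes at
# https://ip-ranges.amazonaws.com/ip-ranges.json, reduced to the fields used here.
# These prefixes are documentation ranges (RFC 5737 / RFC 3849), NOT real AWS ranges.
SAMPLE_RANGES = json.loads("""
{
  "prefixes": [
    {"ip_prefix": "192.0.2.0/24", "region": "eu-west-1", "service": "EC2"},
    {"ip_prefix": "198.51.100.0/24", "region": "eu-central-1", "service": "EC2"}
  ],
  "ipv6_prefixes": [
    {"ipv6_prefix": "2001:db8::/32", "region": "eu-west-1", "service": "EC2"}
  ]
}
""")


def aws_region_for(ip: str, ranges: dict) -> Optional[str]:
    """Return the region of the first AWS prefix containing ip, or None.

    A non-None result means the address is in Amazon's published address
    space, whatever region (EU or not) the string happens to name.
    """
    addr = ipaddress.ip_address(ip)
    entries = [(e["ip_prefix"], e["region"]) for e in ranges.get("prefixes", [])]
    entries += [(e["ipv6_prefix"], e["region"]) for e in ranges.get("ipv6_prefixes", [])]
    for prefix, region in entries:
        # Testing an IPv4 address against an IPv6 network (or vice versa) returns False.
        if addr in ipaddress.ip_network(prefix):
            return region
    return None


if __name__ == "__main__":
    for ip in ["192.0.2.10", "198.51.100.7", "2001:db8::1", "203.0.113.5"]:
        region = aws_region_for(ip, SAMPLE_RANGES)
        print(ip, "->", f"AWS ({region})" if region else "not in AWS ranges")
```

With the real file, one would load it via json.load(urllib.request.urlopen(...)) instead. The real list contains overlapping entries (a generic AMAZON prefix and a service-specific one covering the same space), so returning the first match is good enough for an ‘is this Amazon?’ question, but not for attributing a specific service.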
And, besides this, the main point we’re making is not so much about the US government, but about the power individual cloud companies can exert over universities (and society as a whole).
Can you share your data?
Right up next are requests for our raw data. For the long-term survey, which also looks at developments over time, we used the Farsight SIE dataset of historic DNS requests seen by sensors all around the world. This, of course, is not really a dataset to share publicly.
However, what we measure can quickly be gathered from the public DNS by anyone. I hence wrote a small script that checks Dutch universities’ mail setups and learning management systems, and provides a detailed overview of the current status, as well as a brief interpretation, i.e., what is hosted where. You can find that script here: https://git.aperture-labs.org/Cloudheads/cloudheads_nl_scraper Please feel free to run it yourself to gather the data, or to add institutions we did not include, like HBOs.
Anyway, if you want to take a look at the data, get it from the repository, or–if you don’t trust me–grab the script there and run it for yourself.
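For those who want to poke at the mail side by hand, a rough first check is to look at where a domain’s MX records point. The following is not the script linked above, just a minimal sketch assuming a hand-maintained and deliberately incomplete suffix table: Microsoft 365 (Exchange Online) inbound mail typically uses hosts ending in .mail.protection.outlook.com, and Google Workspace uses hosts such as aspmx.l.google.com. Python’s standard library cannot query MX records, so the host names have to come from elsewhere, e.g. the host part of dig +short MX <domain>, or dnspython’s dns.resolver.resolve(domain, "MX"). The table, function names, and example host names are hypothetical.

```python
from typing import Dict, Iterable, Optional

# Hypothetical, deliberately incomplete table: MX host suffix -> provider.
MX_SUFFIXES = {
    ".mail.protection.outlook.com": "Microsoft",
    ".google.com": "Google",
    ".googlemail.com": "Google",
}


def classify_mx(mx_host: str) -> Optional[str]:
    """Map one MX host name to a cloud provider, or None if unknown."""
    # DNS names are case-insensitive and may carry a trailing root dot.
    host = mx_host.lower().rstrip(".")
    for suffix, provider in MX_SUFFIXES.items():
        if host.endswith(suffix):
            return provider
    return None


def summarize(mx_hosts: Iterable[str]) -> Dict[str, str]:
    """Label every MX host; unknown hosts are flagged, not declared self-hosted."""
    return {h: classify_mx(h) or "unknown (possibly self-hosted)" for h in mx_hosts}


if __name__ == "__main__":
    # Hypothetical MX answers for a made-up university domain.
    example = ["uni-example-nl.mail.protection.outlook.com.", "mx1.uni-example.nl."]
    for host, label in summarize(example).items():
        print(f"{host:50} {label}")
```

Note that an MX pointing to an on-premises host does not prove that mailboxes are local: a filtering gateway may still relay mail on to a cloud tenant.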
Are LMS really all of students’ data?
Well, the statement made in the FD article is–for the majority of Dutch universities–about their Learning Management Systems being in the cloud. Depending on the setup, those systems usually hold data on which courses a student registered for, (partial) as well as final reported grades, and a bunch of interaction between students and with teachers. This is, of course, not all the data universities hold on students.
Of course, the logs of what students do via the local WiFi–if logged by some network security solution, that is–are most likely stored locally (or sent off to an offsite SOC for threat analysis). Similarly, final bookkeeping on grades is often not done in the LMS, but in a dedicated application; still, some universities are already evaluating cloud-based replacements for these. Finance runs yet another system, handling students’ fee payments etc. Email is handled by a different set of systems, too. Here, though, we also find that Dutch universities regularly use the email offerings of major cloud providers, with Microsoft being the leading vendor.
So, the LMS in the cloud is not all of students’ data.
But an important chunk.
How else should you run a service?
A common point raised by commentators is that it is really hard to run a service without relying on cloud infrastructure. That is very true, and i have also written about this on a more general level: running stuff well is hard. Like, really hard. And we are not even talking about things like the cancerous nature of Google’s font hosting. (I am still amazed how often i struggle to cut these out of self-hosted tools.) And this even though the caching advantages that led to the rise of Google Fonts are gone by now, as browsers partition their caches per site for security reasons.
Besides–and coming back to the previous point–one of our main arguments is that the continued use of cloud infrastructure steadily erodes the ability to run stuff yourself. This comes from saving money on those expensive engineers… which leads to dependence, because you no longer have the teams in-house to move out of the cloud. Hence, if many organizations already cannot run infrastructure without the help of clouds (see also my work on the complexity of email), this essentially just underlines the point we are making regarding progressing dependence.
Hence, even though it is a difficult task, we have to think about how to retain our ability to host (research and teaching) infrastructure ourselves. In the Netherlands, there is apparently an ongoing petition for this. I don’t doubt that this is a hard task, but it doesn’t get easier the longer we wait; quite the contrary.
Of course universities might be dependent; but they are also dependent on energy…
Continuing on the issue of dependence: in our work, we claim that universities being dependent on some entities makes it really easy for tech companies to–essentially–blackmail universities. Back at TU Delft, when listening to a presentation on this measurement work, a colleague brought this argument to the point by asking: ‘Well, then universities are also dependent on their gas suppliers. Should we now produce our own energy?’ (You might notice that this argument was made before the 24th of February, and really didn’t age well… ) Thing is, the most accurate rebuttal of this point came from another colleague, a full professor of energy systems, essentially saying: “Well, of course that is an issue in the energy domain. That is why that market is *heavily* regulated.” I can hardly add to that; it hits the nail on the head.
Companies are rational actors that will do what is best for them under a given set of rules. If the rules permit something, and it makes them a profit, they will do it. This is how rational actors work.
Also, in general, the approach of companies like Google to get a foothold in new markets is kind’a known: barge in with a cheap (or free) offer, start charging once you are ubiquitous. This is rather well analyzed (PT; p. 27ff.), and several universities currently seem to be in the find-out phase of ‘sign up for the cloud, and find out’.
Assuming tech companies would forgo this (legal) lever to increase their profits just out of the goodness of their hearts is, to be honest, just a bit naive.
But why would they even care about making/interfering with curricula?!
So, one of our big points is that tech companies might use their aforementioned ability to blackmail universities to also influence what is taught and researched, leading to our point that clouds may threaten academic integrity. This regularly triggers the question of why tech companies would even care to meddle with curricula and research.
As for curricula, the answer is quite obvious: they already have curricula for teaching ‘the cloud’ at universities, as well as K-12 curricula on computer science. Similarly, there already have been instances where large corporations used their market positions to influence research. Facebook (now called Meta), for example, pulled a rather interesting move by demonstrating that ‘canceling the private Facebook accounts of people doing research they don’t like’ is very much in their toolbox. Given that most people have stashed away quite a big part of their social circles and memories in their Facebook and Instagram accounts, this is a serious threat. Similarly, there is the story of Timnit Gebru, hired by Google to critically reflect on the dangers of AI, only to be fired when that work became too critical. And, while all this is going on, we increasingly see tech companies putting faculty (indirectly) on their payroll.
So, i hope to have contextualized some of the most pressing comments and questions i saw coming up. If you think i failed to cover some important points, please drop me a line, and i will work on another blog article. Also, please feel free to share any non-question comments you may have. For both, please feel free to use my contact mail address for this blog: firstname.lastname@example.org.
Until then, enjoy your day, and always remember:
Just because you hope it doesn’t happen, doesn’t mean it won’t.