Troels Henriksen is an assistant professor at the University of Copenhagen, where he teaches students about things that are beautiful (functional programming) and tries to make things that are beautiful also go fast (by writing compilers for functional programming languages).
It has often been said that pure functional programming is a great fit for parallel programming, because it is free of side effects. So how come GPUs, perhaps the most parallel mainstream devices, are not dominated by functional programming languages? It turns out that GPUs and similar high-performance processors are heavily restricted in what they can do efficiently, and the way we normally compile functional languages tends to run face-first into most of these restrictions. How do you compile a functional language when even something as simple as allocating memory becomes a struggle?
The trick is to carefully design a functional language that can be transformed into the kind of heavily restricted code expected by the hardware. The programmer may think they're using all the lovely higher-order functions we know and love, but really they're writing the kind of code you'd expect from a C compiler, full of loops, in-place updates, and up-front memory allocations! We have designed one such language, Futhark, which really feels like you're programming in a pretty decent subset of a language such as OCaml or Haskell, but with strategically placed limitations (and a very aggressive compiler) that let it generate GPU code that is competitive with hand-written code.
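To make the contrast concrete, here is a small illustrative sketch in Python (not Futhark, and not actual compiler output). The function names are hypothetical and chosen only for this example. The first version of each function is written with higher-order functions; the second shows the kind of loop-based code, with a buffer allocated up front and updated in place, that a compiler might plausibly lower it to.

```python
from functools import reduce


def scale_functional(xs, k):
    # What the programmer writes: a map over the input.
    return list(map(lambda x: k * x, xs))


def scale_lowered(xs, k):
    # Roughly what the hardware wants: allocate the result once,
    # then fill it with a plain loop using in-place updates.
    out = [0] * len(xs)
    for i in range(len(xs)):
        out[i] = k * xs[i]
    return out


def sum_of_squares_functional(xs):
    # A map followed by a reduce; naively this builds an intermediate sequence.
    return reduce(lambda acc, x: acc + x, map(lambda x: x * x, xs), 0)


def sum_of_squares_lowered(xs):
    # The map and reduce fused into a single loop with one accumulator,
    # so no intermediate array is ever allocated.
    acc = 0
    for i in range(len(xs)):
        acc += xs[i] * xs[i]
    return acc
```

Both versions of each function compute the same result; the point is only that the second form needs no allocation inside the loop and has a fixed, predictable memory footprint. On a GPU the loop iterations would additionally run in parallel, which this sequential sketch does not attempt to show.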
In my talk, I'll cover some of the crucial transformations the compiler applies to the program, and how it lays out data in memory in ways that allow efficient access, without the programmer having to worry about anything but writing nice, parallel, purely functional code.