Dense arithmetic operations, such as those in linear algebra, are common building blocks of high-performance computations in scientific simulations and machine learning. Automatically optimizing such operations is essential for achieving the best performance on modern hardware such as GPUs in complex real-world use cases. In this talk we introduce simple extensions of common high-level functional primitives that can represent dense operations over multidimensional arrays. We also construct rewrite rules for these primitives that improve data locality, and thereby performance, formulated so that automatic tools can recognize the parts of an expression tree that are suitable for optimization.
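As a rough illustration of the idea (not the primitives or rules presented in the talk), the sketch below writes a matrix-vector product using only `map`, `zip`-style pairing, and `reduce`, and then applies the textbook map-fusion rewrite, `map f (map g xs)` to `map (f . g) xs`, which removes an intermediate array and so touches memory only once. All function names here are hypothetical and chosen for this example only.

```python
from functools import reduce
from operator import add, mul


def dot(xs, ys):
    # Pair the elements, multiply each pair, then sum: reduce(+) over zipWith(*).
    return reduce(add, map(mul, xs, ys), 0)


def matvec(A, x):
    # A dense matrix-vector product is a map of `dot` over the rows of A.
    return list(map(lambda row: dot(row, x), A))


def scale_then_shift_unfused(xs, a, b):
    # Two separate maps: the first one materializes an intermediate list.
    tmp = list(map(lambda v: a * v, xs))
    return list(map(lambda v: v + b, tmp))


def scale_then_shift_fused(xs, a, b):
    # After the rewrite: one traversal, no intermediate list.
    return list(map(lambda v: a * v + b, xs))


A = [[1, 2], [3, 4]]
x = [5, 6]
y = matvec(A, x)

# Both forms compute the same values; only the memory traffic differs.
unfused = scale_then_shift_unfused(y, 2, 1)
fused = scale_then_shift_fused(y, 2, 1)
print(y, unfused, fused)
```

Because both sides of the rule are ordinary expression trees, a tool can match the left-hand pattern syntactically and substitute the right-hand side, which is the kind of automatic recognition the abstract refers to.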