Dr Evelina Gabasova is a machine learning and data science researcher. She works as a Principal Research Data Scientist at The Alan Turing Institute, the UK's national institute for data science and artificial intelligence. As a member of the research engineering team, she connects academic research with real-world applications. Her passion is making data science understandable and accessible to everyone. When not wrangling data or training machine learning models, she is an active member of the F# community, a Microsoft MVP and a technical speaker.
Data science is emerging as a hot topic across many areas, both in industry and in academia. In my research, I'm using machine learning methods to build mathematical models of cancer cell behaviour. But using today's data science tools is hard: we waste a lot of time figuring out which format different CSV files use or what the structure of a JSON or XML file is. Often, we need to switch between Python, MATLAB, R and other tools just to call functions that are missing elsewhere. And why do many programming languages used in data science lack tools that are standard in modern software engineering?
In this talk I'll look at data science tools in F# and how they simplify the lives of modern scientists who rely heavily on data analytics. F# provides a unique way of integrating external data sources and tools into a single environment. This means that you can seamlessly access not only data, but also R statistical and visualization packages, without leaving F#. Compile-time static checking and rich interactive tooling give you many of the standard tools known from software engineering, while keeping the exploratory nature of simple scripting languages.
Using examples from my own research in bioinformatics, I'll show how to analyse data in F# with various type providers and other tools from the F# ecosystem.