Some of you might know that for the last 2 years I have been studying for a Master’s degree in data science at the University of Dundee. This was a two-year part-time course delivered by Andy Cobley and Mark Whitehorn. The course was fantastic and I recommend it. If you want to know more about it, please give me a shout. The course comprises multiple modules. The final module is a research project, which you need to start thinking about towards the end of the first year of study. I selected my topic very early on; however, being indecisive, I changed my idea 3 times (each time having written a good chunk of the project).
Why did I do this? I simply was not passionate about the subject of those projects. They were good ideas, but I just was not researching or building anything new. The outcome of my dissertation might have been a working project, but it would have felt hollow to me. I needed a topic I was passionate about. I have a core ethos that I take to every project I work on: “Never do anything more than once”. Because of that, I have spent much of my career either working with or developing automation tools to accelerate and simplify my development processes. Having attended a lot of conferences, I became familiar with DevOps and how it has accelerated the software industry. DevOps allows software developers to ship code faster. I have been applying the core principles of DevOps to all my recent projects, with great success.
The course covered a lot of the techniques required for data science; however, it did not cover how to deploy a model into production. I started researching deployment techniques. I read a lot of books which described the development process, each stopping at deployment. None of the books detailed how you actually deploy a model. I quickly discovered why. It is hard! I had my dissertation topic: “Applying DevOps to production machine learning”.
Having spent 6 months writing about the topic, I wanted to share it with you in blog format. If you want to read my dissertation in full, let me know. As there is a lot of content to get through, we will take it one step at a time.
The culmination of this blog series is a design pattern which allows a developer to commit their machine learning models into Git and have a deployed model in production. To do this we will highlight various tools, languages and techniques along the way. All code examples are hosted in my GitHub account. I am trying to talk more and more about this subject with customers and at conferences. If you would like to discuss this in more detail, please get in touch.
Research blog series contents
1. Setting the scene (this page)
2. An introduction to DevOps
3. DevOps in detail
4. The data science process, and what is wrong with most of the current literature?
5. How DevOps is currently being implemented in the ML industry.
6. [Book Review] Machine Learning Logistics (Ted Dunning and Ellen Friedman)
7. How to apply DevOps to Data Science.
8. A design pattern for a one-click deployment of a data science model
9. Research outcomes and what is next
Technical blog series contents
1. An Introduction to Visual Studio Team Services (VSTS)
2. An Introduction to Git
3. An introduction to Docker and Azure Container Registry
4. An introduction to Kubernetes
My aim is to publish a new section every 2 weeks.
All code will be copied into the relevant blog post; however, you can also obtain a copy from my GitHub.
Thanks for reading.