Some of you might know that for the last two years I have been studying for a Master’s degree in data science at the University of Dundee. This was a two-year part-time course delivered by Andy Cobley and Mark Whitehorn. The course was fantastic and I recommend it. If you want to know more about it, please give me a shout. The course comprises multiple modules. The final module is a research project, which you need to start thinking about towards the end of the first year of study. I selected my topic very early on; however, being indecisive, I changed my idea three times (each time having written a good chunk of the project).
Why did I do this? I simply was not passionate about the subject of those projects. They were good ideas, but I was not researching or building anything new. The outcome of my dissertation might have been a working project, but it would have felt hollow to me. I needed a topic I was passionate about. I have a core ethos that I take to every project I work on: “Never do anything more than once”. Because of that, I have spent much of my career either working with or developing automation tools to accelerate and simplify my development processes. Having attended a lot of conferences, I became familiar with DevOps and how it has accelerated the software industry. DevOps allows software developers to ship code faster. I have been applying the core principles of DevOps to all my recent projects, with great success.
The course covered a lot of the techniques required for data science; however, it did not cover how to deploy a model into production. I started researching deployment techniques. I read a lot of books which described the development process, each stopping at deployment. None of the books detailed how you actually deploy a model. I quickly discovered why. It is hard! I had my dissertation topic: “Applying DevOps to production machine learning”.
Having spent six months writing about the topic, I wanted to share it with you in blog format. If you want to read my dissertation in full, let me know. As there is a lot of content to get through, we will take it one step at a time.
The culmination of this blog series is a design pattern which allows a developer to commit their machine learning models into Git and have a deployed model in production. To do this we will highlight various tools, languages and techniques along the way. All code examples are hosted in my GitHub account. I am trying to talk more and more about this subject with customers and at conferences. If you would like to discuss this in more detail, please get in touch.
Research blog series contents
- Setting the scene (this page)
- An introduction to DevOps
- DevOps in detail
- Data science: the process, problems and productionisation
- How DevOps is currently being implemented in the ML industry.
- [Book Review] Machine Learning Logistics (Ted Dunning and Ellen Friedman)
- How to apply DevOps to Data Science.
- A design pattern for a one-click deployment of a data science model
- Research outcomes and what is next
Technical blog series contents
- An Introduction to Visual Studio Team Services (VSTS)
- An Introduction to Git
- An introduction to Docker and Docker-compose
- An introduction to Azure Container Registry
- An introduction to Kubernetes
- An Introduction to Helm
- A design pattern for Rendezvous
My aim is to publish a new section every 2 weeks.
All code will be included in the relevant blog post; however, you can also obtain a copy from my GitHub.
Thanks for reading.