Applying DevOps to Data Science.

Hi everyone.

Some of you might know that for the last 2 years I was studying a Master’s degree in data science from the University of Dundee. This was a 2 years part-time course delivered by Andy Cobley and Mark Whitehorn. This course was fantastic and I recommend it – If you want to know more about the course, please give me a shout. The course is comprised of multiple modules. The final module is a research project, which you need to start thinking about towards the end of the first year of study. I selected my topic very early on, however being indecisive, I changed my idea 3 times (each time having written a good chunk of the project).

Why did I do this? I simply was not passionate about the subject of those projects. They we good ideas, but I just was not researching or building anything new. The outcome of my dissertation might have been a working project, however it would have felt hollow to me. I needed a topic I was passionate about. I have a core ethos that I take to every project I work on. “Never do anything more than once”. It is because of that, that I have spent much of career working either with or developing automation tools to accelerate and simplify my development processes. Having attended a lot of conferences, I became familiar with DevOps and how it accelerated the software industry. DevOps allows software developer to ship code faster. I have been applying the core principles of DevOps to all my recent projects, with great success.

The course covered a lot of the techniques required for data science, however it did not cover how to deploy a model in to production. I started researching deployment techniques. I read a lot of books which described the development process, each stopping at deployment. Although none of the book detailed how you actually deploy a model. I quickly discovered why. It is hard! I had my dissertation topic! “Applying DevOps to production machine learning”

Having spent 6 months writing about the topic, I wanted to share that with you in blog format. If you want to read my dissertation in full let me know. As there is a lot of content to get through, we will take it one step at a time.

The culmination of this blog series is a design pattern which allows a developer to commit their machine learning models in to Git and have a deployed model in production. To do this we will highlight various tools, languages and techniques along the way. All code examples are hosted in my GitHub account. I am trying to talk more and more about this subject at customers and conferences. If you would like to discuss this in more detail, please get in touch.

Research Blog series contents.
1. Setting the scene (this page)
2. An introduction to DevOps
3. DevOps in detail
4. Data Science process. And what is wrong most the current literature?
5. How DevOps is currently being implemented in the ML industry.
6. [Book Review] Machine Learning Logistics (Ted Dunning and Ellen Friendman)
7. How to apply DevOps to Data Science.
8. A design pattern for a one-click deployment of a data science model
9. Research outcomes and what is next
10. References

Technical blog series contents
1. An Introduction to Visual studio Team Services (VSTS)
2. An Introduction to Git
3. An introduction to Docker and Azure Container Registry
4. An introduction to Kubernetes

My aim is to publish a new section every 2 weeks.

All code will be copied in to the relevant blog, however you can obtain a copy on my GitHub.

Thanks for reading.

Calculating distance with SSIS and the Google distance matrix API

Recently I set up an interactive map in R & Shiny to show the distance between where a customer lives and one of the sites where a customer attends. What I wanted to do was to find out how long it took each person to get to the site using public transport, which proved difficult in R, so i settled for a “as the crow flies” approach using pythagoras. This worked, but since then I have had in the back of my mind a burning question “what if I could used the Google API to give me that distance”. This is what I have now done, and this is how you do it.

Google have many APIs that can be used at little to no cost. The Google distance matrix API allows you to enter a starting position and one or more end positions, and the API will return the distance based on a mode of transport of your choice (walking, driving, transit – public transport). All you need to do is register and get an API key –

Once you have an API key you will need to keep hold of this. This key allows you to call the API, no key, no result. If you read the documentation here you can get a few examples of how to use the key. If you change the URL below to include your API key where YOUR_API_KEY is currently you will retrieve the JSON for that journey. By default the GDM is set to driving distance, but this can be changed. Also this is currently JSON, if you would prefer to have the result as XML, change json? for xml?.,-73.89188969999998&destinations=40.6905615%2C-73.9976592%7C40.6905615%2C-73.9976592%7C40.6905615%2C-73.9976592%7C40.6905615%2C-73.9976592%7C40.6905615%2C-73.9976592%7C40.6905615%2C-73.9976592%7C40.659569%2C-73.933783%7C40.729029%2C-73.851524%7C40.6860072%2C-73.6334271%7C40.598566%2C-73.7527626%7C40.659569%2C-73.933783%7C40.729029%2C-73.851524%7C40.6860072%2C-73.6334271%7C40.598566%2C-73.7527626&key=YOUR_API_KEY

The most basic use is to calculate the distance from A to B. This is what we will look at using SSIS. The API takes in an origin and a destination. These can either be a lat & lng combo or an address which we be translated into Lat & lng when executed. To demonstrate the process I have created a database and loaded it with some sample customer and store data. We have 2 customers and 4 stores. These customers are linked to different stores. You can download the example files at the bottom of the page.

--CREATE DATABASE WebService_Example;
-- ###################################################################################
-- Create tables
-- ###################################################################################
USE WebService_Example;
DROP TABLE dbo.CustomerAddresses;
CREATE TABLE dbo.CustomerAddresses 
, CustomerName Varchar(150)
, CustomerAddress VARCHAR(500)
DROP TABLE dbo.StoreAddresses;
CREATE TABLE dbo.StoreAddresses
, StoreName VARCHAR(150)
, StoreType VARCHAR(20)
, StoreAddress VARCHAR(150)
DROP TABLE dbo.CustomerStoreAddress;
CREATE TABLE CustomerStoreAddress
CustomerAddressID INT
, StoreAddressID INT
DROP TABLE dbo.CustomerAddressDistance;
CREATE TABLE dbo.CustomerAddressDistance
CustomerName Varchar(150)
, CustomerAddress VARCHAR(500)
, StoreName VARCHAR(150)
, StoreAddress VARCHAR(150)
, Distance VARCHAR(50)
, EstimatedTime VARCHAR(50)
, ModeOfTransport VARCHAR(50)
-- ###################################################################################
-- Load tables
-- ###################################################################################
INSERT INTO dbo.CustomerAddresses
        ( CustomerName
        , CustomerAddress )
('A Customer', '46 Church Road, Alphington, Exeter, EX2 8SZ')
,('B Customer', 'Flat 9, Landsdown Terrace, Exeter, EX2 4JJ')
INSERT INTO dbo.StoreAddresses
        ( StoreName
        , StoreType
        , StoreAddress )
('Tesco Extra', 'Vale', 'Exeter Vale Shopping Centre, Russell Way, Exeter EX2 7EZ')
, ('Tesco Express', 'Heavitree', '85 Fore St, Heavitree, Exeter, Devon, EX1 2RN')
, ('Sainsburys Superstore', 'Alphington', 'Alphington Rd, Exeter EX2 8NH')
, ('Sainsburys Local', 'Sidwell', '12 Sidwell St, Exeter EX4 6NN');
INSERT INTO dbo.CustomerStoreAddress
        ( CustomerAddressID
        , StoreAddressID )
, (1,2)
, (1,3)
, (1,4)
, (2,3)
, (2,4)
-- ###################################################################################
-- Query
-- ###################################################################################
	, CA.CustomerAddress AS 'Org'
	, SA.StoreName
	, SA.StoreAddress AS 'Des'
FROM dbo.CustomerAddresses CA
INNER JOIN dbo.CustomerStoreAddress CSA ON CA.CustomerAddressID = CSA.CustomerAddressID
INNER JOIN dbo.StoreAddresses SA ON CSA.StoreAddressID = SA.StoreAddressID

This will build the tables and populate the tables that are used in the SSIS project.

The SSIS project consists of a data flow task (DFT), which contains and OLEDB source, a Derived column transformation (to add the columns which are to be loaded), a script task (C#) and the an OLEDB destination to load the results. There are 2 variables, the mode of transport (walking, driving, transit) and also your API key. Below is the C# required to call the API and shred the XML.

#region Namespaces
using System;
using System.Data;
using System.Web;
using System.Windows.Forms;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using Microsoft.SqlServer.Dts.Runtime;
using System.Net;
using System.Xml;
// Make sure you add any missing references
public class ScriptMain : UserComponent
    public override void PreExecute()
    public override void PostExecute()
    public override void Input0_ProcessInputRow(Input0Buffer Row)
        // Create a new web client 
        WebClient wc = new WebClient();
        // Select your mode of transport - Edit the var in SSIS
        String TransportMode = Variables.ModeOfTransport;
        // Add reference to your API
        String APIKey = Variables.DistanceAPIKey; 
        // Build the URL that is to be requested
        String uri = "" + Row.Org + "&destinations=" + Row.Des + "&mode=" + TransportMode + "&key=" + APIKey;
        // Get the XML
        string geoCodeInfo = wc.DownloadString(uri);
        XmlDocument xmlDoc = new XmlDocument();
        // Shred the XML
        String Duration = Convert.ToString(xmlDoc.SelectSingleNode("DistanceMatrixResponse/row/element/duration/text").InnerText);
        String Distance = Convert.ToString(xmlDoc.SelectSingleNode("DistanceMatrixResponse/row/element/distance/text").InnerText);
        //String Fare = Convert.ToString(xmlDoc.SelectSingleNode("DistanceMatrixResponse/row/element/fare/text").InnerText);
        // Update the outgoing rows. 
        Row.Time = Duration;
        Row.Distance = Distance;

The script component looks at the rows coming in and passes them into a URL which is executed by a web client. The results are then updated into the Time and Distance Rows.

Once the data has been loaded we can analyse the results and see what is the closest store by multiple methods of transport.

-- ###################################################################################
-- Check results
-- ###################################################################################
-- transit
-- driving
-- walking
  , CustomerAddress
  , StoreName
  , StoreAddress
  , Distance
  , EstimatedTime
  , ModeOfTransport
  , CASE WHEN EstimatedTime LIKE '%hour%' then CONVERT(TIME,REPLACE(REPLACE(EstimatedTime,' hour ', ':'), ' mins', '')) ELSE '00:' + REPLACE(EstimatedTime, ' Mins', '') END TimeInMinutes
  , ROW_NUMBER() OVER (PARTITION BY CustomerName ORDER BY CASE WHEN EstimatedTime LIKE '%hour%' then CONVERT(TIME,REPLACE(REPLACE(EstimatedTime,' hour ', ':'), ' mins', '')) ELSE '00:' + REPLACE(EstimatedTime, ' Mins', '') END) AS 'Rank'


Download scripts and solution: GoogleAPI SSIS.

Sun Modelling for Data Warehouses. Visio template


Sun modelling is a technique for requirement gathering for a multidimensional data warehouse modelling. I have used or read about many different DW modelling techniques, however none is quite as simple and elegant as Sun Modelling. Sun Modelling was designed by Prof Mark Whitehorn (Dundee University). I won’t go into the details of how you execute a sun model today, however I will write about this in the future.

For the moment I have tried to simplify the process of building a sun model in visio. You can download a visio stencil which contains the main elements required to make a Sun Model. Also included is a sample visio document.

Download it here.


SQL Server string manipulation functions


Hi everyone,

As part of an ongoing training session I am doing for some of my colleagues on the basics of TSQL, I wanted to store this content somewhere. Where better than on my blog.

So today we will be starting with looking at some basic string manipulation functions, how they work and why you might want to use them. Then we will look at how we can use them together to perform a common dynamic name separation technique. All of these examples are using a table which is constructed using the Microsoft AdventureWorks database (this version is using 2012). To build this table you will need to download a version of AdventureWorks and attach the database (alternatively you could use the RedGate Azure AdventureWorks – You can find that here).

Before we continue please run the following TSQL to build the table. There are many intentional errors in the data – I have hoped to simulate normal user activity and errors.

USE AdventureWorks2012;
DROP TABLE StringMinipulationExample;
	PRINT 'Table didnt exist'
CREATE TABLE StringMinipulationExample
FullName Varchar(250)
, Address VARCHAR(1000)
INSERT INTO StringMinipulationExample
	' ' + COALESCE(Title, '') + ' ' + FirstName + ' ' + LastName + '   ' AS 'FullName'
	, COALESCE(AddressLine1, '') + ' ' + COALESCE(AddressLine2, '') + ' ' + COALESCE(City, '') + ' ' + COALESCE(PostalCode, '') AS 'Address'
FROM Person.BusinessEntityAddress 
INNER JOIN Person.Person ON	Person.BusinessEntityID = BusinessEntityAddress.BusinessEntityID
INNER JOIN Person.Address ON Address.AddressID = BusinessEntityAddress.AddressID
SELECT FullName, Address FROM dbo.StringMinipulationExample


When you’re working with strings you will no doubt want to join one sting with another, this is known as concatenation. The process is a simple one. Take one string and combine it with another.

SELECT 'Hello' + ' ' + 'world!'

We are using a + to concatenate. You will see in the middle the ‘ ‘ this is to add a space between the two words. Sounds simple and it is, however it can quickly become confusing when you start to concatenate with NULLS or different data types. NULLS in SQL Server can cause some grief but they have a purpose. When you concatenate two strings you get a new string – Hello World! However when you concatenate a string and a NULL you get a NULL. to avoid this we need to handle those NULLS and replace them with something else. Remember ” <> NULL. An empty space will still concatenate as it is not NULL. To handle those NULLS we can use conditional logic (CASE WHEN ELSE END) or we can use ISNULL() or COALESCE(). Concatenation needs to be to the same data type. For example when you write 1+1 do you want 2 or 11? 1 + 1 = 2, ‘1’ + ‘1’ = ’11’. INT + INT = INT, CHAR + CHAR = CHARCHAR, INT + CHAR = ERROR. To avoid this error we need to first convert the INT to a CHAR using either CAST or CONVERT.


ISNULL ( check_expression , replacement_value )

COALESCE ( check_expression , replacement_value )

Both of these functions are NULL replacement functions. There are differences in how the engine processes and also on ISNULL is TSQL only and not ANSI standard.



CAST ( expression AS data_type [ ( length ) ] )

CONVERT ( data_type [ ( length ) ] , expression [ , style ] )

Both of these functions help to change the data type in your query. A lot comes down to personal preference.


Both will return a char. You cannot convert all datatypes from one type to another. Refer to this image for reference.


LEFT ( character_expression , integer_expression )

RIGHT ( character_expression , integer_expression )

When you want to take the left most character from a string. Character_expression is your string and integer_expression is how many characters do you want. This include blank spaces.

SELECT FullName, LEFT(FullName, 5) AS 'Left5_FullName' FROM dbo.StringMinipulationExample
SELECT FullName, RIGHT(FullName, 5) AS 'Left5_FullName' FROM dbo.StringMinipulationExample

You can see that the left or right 5 characters have been selected. It might look odd as there are less then 5 characters visible on most rows and this is because there is preceeding and trailing spaces – keep reading to learn how to handle those.


SUBSTRING ( expression ,start , length )

When you want the middle x amount of characters from a string. In the example below I have used the programmers favourite Hello World!. Hello world! Has 12 characters including the space and the exclamation mark. If we want to take just the world we need to position our start on the w and take 5 spaces. The grey squares in the table below indicate which characters we are selecting. For those of you who are familiar with Excel this function is similar to the MID function.

SELECT 'Hello World!', SUBSTRING('Hello World', 7, 5)


1 2 3 4 5 6 7 8 9 10 11 12
H e l l o W o r l d !



UPPER ( character_expression )

LOWER ( character_expression )

When you want to cast a character as either uppercase or lower case. This function is similar to Ucase and Lcase in Excel and VB.

SELECT FullName, UPPER(FullName) AS 'Upper_FullName', LOWER(FullName) AS 'Lower_FullName' FROM dbo.StringMinipulationExample


CHARINDEX ( expressionToFind ,expressionToSearch [ , start_location ] )

When you want to know the position of a character in a string of characters. This is a really helpful function. If you think most names are FirstName SPACE LastName if you want to split the first and last name you can locate the space and take the LEFT and the SUBSTRING. The way this function is structured is a little backwards. We are looking at what we are trying to find, where we are trying to find it and how many character should be skipped before we search. The reason the latter is there is in the event that there are multiple spaces, if you know the first 2 characters are always a space you might want to add ,2 to start from the 3rd character.

SELECT FullName, CHARINDEX(' ', FullName, 3) FROM dbo.StringMinipulationExample

I have included a ,3 to the end of the function as I know I have 2 spaces at the start. I want to skip these. We will use a function to remove these in a moment. This function is similar to FIND() in Excel.


REPLACE ( string_expression , string_pattern , string_replacement )

When you want to replace a character or set of characters with a different character or set of characters. Useful when you need to find a character and replace it.

SELECT FullName, REPLACE(FullName, ' ', '_') FROM dbo.StringMinipulationExample

In this example I am replacing all the spaces with underscores.


REVERSE ( string_expression )

Does what is says on the tin! Reverses a string.
You might be asking well why would I want to do that? Because it is easy to find the first special character in a string, it is not as easy to find the last. How do you make the last the first? REVERSE it.

REVERSE('Hello World')

REVERSE(‘Hello World’) = ‘dlroW olleH’


RTRIM ( character_expression )

LTRIM ( character_expression )

When you want to remove the trailing spaces from the left, right or both sides of a string.

	, LTRIM(FullName) AS 'LeftTrim'
	, RTRIM(FullName) AS 'RightTrim'
	, RTRIM(LTRIM(FullName)) AS 'BothTrim'
FROM dbo.StringMinipulationExample

Combining the RTRIM and LTRIM allows you to remove chars from both sides.

Bringing it all together

So we have looked at a lot of the string manipulation function in SQL Server but to get the most out of these functions we need to using them together.
Lets look at how we can separate the Title, FirstName, LastName.

; WITH CleanedStrings AS 
	RTRIM(LTRIM(FullName)) AS 'FullName'
	, FullName AS 'UncleanFullName'
	, Address
FROM dbo.StringMinipulationExample
	, CleanedStrings.FullName 
	, REPLACE(LEFT(FullName, CHARINDEX('.', FullName)), '.', '') AS 'Title'
	, LEFT(LTRIM(REPLACE(CleanedStrings.FullName, LEFT(FullName, CHARINDEX('.', FullName)), '')), CHARINDEX(' ', LTRIM(REPLACE(CleanedStrings.FullName, LEFT(FullName, CHARINDEX('.', FullName)), '')))) AS 'FirstName'
	, REVERSE(LEFT(REVERSE(CleanedStrings.FullName), CHARINDEX(' ', REVERSE(CleanedStrings.FullName)))) AS 'LastName'
FROM CleanedStrings;

First we have a CTE to remove the left and right padded strings. Then the title can be found as it always is followed by a ‘.’.
First name was a little tricky. I had to remove the title then find the space and separate. I could have found the first space and the last and use SUBSTRING.

For the LastName I have reversed the string, found where the first space is in the reversed string, taken a left of that and the reversed the whole statement back to how it should have been.

Thank you for reading

Power BI & ODI Node Devon

2015-06-26 10.18.47

Hi everyone,

Over the last 2 days I have been partaking in lots of data related fun! On Friday 26th June I attended the ODI Devon’s first conference “Beyond the smart city” #btsc15. It was a great event. As part of this Myself and the SQL South West team were asked to do some technical sessions on Saturday (27th). I love getting out there and meeting new people so jumped at the opportunity. The session was aimed at individuals who might not know the MS BI stack very well, or someone who knew what there was but possibly not how to use it – No deep dives, etc.

My session was about exploring “Open data”. To do this I used a variety of Microsoft tools (Excel, PowerPivot, Azure, Azure SQL Database, SSIS and Power BI). I wanted to use this blog post the walk you through how I went about preparing for this session and as a reference for those who attended the session. I had one goal, to demonstrate how anyone can quickly “Ask questions about their data”.

So what did we do?

  1. We looked at where we can get open data
  2. We analysed that data in Excel using pivot tables, answering basic questions
  3. We wanted more data and somewhere to store it so we created an Azure SQL database
  4. We loaded the data into Azure using SSIS
  5. We analysed the data further in PowerPivot and built a model, but it wasn’t good enough
  6. We imported out PowerPivot model in to Power BI and started asking questions of our data

Let’s look at that in a bit more detail.

To start off I looked at where to find open data. For my examples I used the open data provided by Exeter council. On Exeter council’s website there is a lot of interesting rich datasets. This included property listings owned by ECC and also their spend over the last 2 years. I used both of these datasets as source data. Much like most open data, you get what you pay for. There was a small issue with the data, it in separate files – there were 15 files for the last 15 months.

Sure you can use Excel, and copy and paste each file in to one long list, but what if you want that stored somewhere you can quickly get to, or somewhere you can share that data with colleagues? You guessed it, we need a database to store this data. This session was designed for anyone to attend and the replicate at home. I didn’t want attendees to have to download SQL Express, and a lot of what I am demonstrating is cloud based, so Azure was the obvious choice.

So we needed a database in Azure (for this demo I am using the newer portal V2). Head over to and create an account. Once there complete the following steps to spin up a brand new SQL database.

  1. Open the market place and search for SQL Database
  2. Hit create
  3. Give your database a name and configure the settings
    • Server – Create a new server as required.
    • Choose you pricing model
    • Configure further as required
    • Hit create

Within 40 seconds you have a working database in the cloud. If you have trouble logging in to the database you may need to alter the firewall settings to allow the IP address you’re connecting from.

Now we have our functioning database we can begin to load it with data. To do this we first need to create a few tables to store the data.

First we create the spend data:

CREATE TABLE [dbo].[SpendData](
[Body Name] [varchar](50) NULL,
[Trans Date] [datetime2](7) NULL,
[Supplier Name] [varchar](50) NULL,
[Trans Ref] [int] NULL,
[Net Amount] [decimal](15, 2) NULL,
[Expense Description] [varchar](50) NULL,
[Expense Code] [int] NULL,
[Service Code] [varchar](50) NULL,
[BVACOP] [varchar](50) NULL,
[Category] [int] NULL

Then the building location data

CREATE TABLE [dbo].[BuildingLocations](
[Uprn] [varchar](50) NULL,
[AssetRef] [varchar](50) NULL,
[AssetName] [varchar](1000) NULL,
[Postcode] [varchar](50) NULL,
[Easting] [decimal](18, 0) NULL,
[Northing] [decimal](18, 0) NULL,
[InterestType] [varchar](50) NULL,
[AssetType] [varchar](150) NULL,
[AssetUse] [varchar](150) NULL

I also used a script from codeproject to dynamically create a date tally table (These are so important. If you don’t have a numbers table and a dates tally then go and make one now) Now we have 3 tables, 2 empty and 1 with 20 years of dates. Let’s load the missing data.


To do this I have used SQL Server Integration services – SSIS. SSIS is Microsoft’s ETL tool for moving and transforming data. The package is designed to be run several times to enable this the first step is to TRUNCATE the existing tables. After that we want to loop over all the files (I have simply pulled all the data from the ECC website) then a final transfer for the building data. The more data you have the more that will be imported – just keep adding the spend data to the folder. This SSIS package is available to download at the bottom of the page.


Once the data was loaded we headed over to Excel again a loaded the data in to a PowerPivot model. I am using Excel 2013 so PowerPivot is already installed but not enabled. For those using 2010 go and grab the add-in from here. Once installed enable the add-in and open PowerPivot. On the Home ribbon we looked at how to import the data from Azure.

Click on From database
Select SQL Server
Fill in credentials
Select tables.


Now we have our data we build our basic model by using the diagram view to connect the 2 tables.


Now we can ask more complicated questions of our data such as which quarter is the most money spent, we can analyse by month, year and any other column in our date tally. But really this isn’t good enough! Our end user needs to know a lot about Excel to really get what they need. We are creating something for someone who really has no idea about BI tools, maybe you managing director. So to really begin to ask questions of our data we need Power BI, in particular the Power BI Q & A.

At the time of writing, Power BI is still in preview. There is no cost structure in place and you need a corporate account to use it (You can’t use a gmail or live account). Head over to and register yourself.


What we want to do is save our PowerPivot model and suck that up in to Power BI.

  1. Click on the big yellow “Get Data” button in the bottom left to begin
  2. From there we want to get data from files
  3. Select “Local File”
  4. Navigate to the PowerPivot model
  5. Hit open

You should see an “Importing data” window


Once that is complete we can start literally asking questions of our data. Hit the pin icon when you’re happy to pin the item to your dashboard. You can start creating interactive reports that link from your dashboard and offer a more detailed view of your data.


That is a very high level introduction to working with data in Power BI. I will do some more blogs on Power BI in the coming weeks, once I get my head around the Power BI API :S

Thanks for reading

(Download files here)

SQL Saturday Oslo #414


At home my wife and I have a world map with all the places we would like to visit, slowly we are crossing these off (there is a lot of places we really want to visit). Oslo has been on there since last year’s SQL Saturday #317, there was a lot of buzz on Twitter up and on the day. I heard from those who attended that it was a great event! I knew I wanted to attend the next one as an attendee, however I have recently been attempting to improve my presenting skills and also to give back some of the time I have benefited from to the SQL community. Long story short I submitted and was selected  – YAY! I was really pleased to have been 1 of the staggering 151 sessions submitted – The SQL community is truly one-of-a-kind!

So, I am hoping you will join me and the other great speakers in Oslo, Norway on August 29th for my session SSRS Inception: Reporting on Reporting Services. You can read more about this project here.

Don’t forget that there is so much more on offer than just the Saturday. Head over to the SQL Saturday website for more information or follow the #SQLSatOslo hashtag for all the updates! One last thing. If you’re heading to Oslo why not extend your visit and see one of the great training days the Oslo team have put on. They both sound like fantastic sessions!

Friday, August 28th A full day of paid pre-conference workshops will be held on Friday, August 28th:

Margarita NaumovaA Practical Approach for Troubleshooting Performance Problems (Work Smarter not Harder) by Margarita Naumova
Level 300 – Advanced
Finding the source of a performance problem is one of the most difficult tasks for every DBA. Where to start, what to check, how to find what’s going on, and of course how to fix, those are usually the most common and important questions every DBA asks. And those questions usually come when the time is the most critical resource.
Eventbrite - A Practical Approach for Troubleshooting Performance Problems (Work Smarter not Harder) by Margarita Naumova


David Peter HansenUnderstanding SSIS Internals and Performance Tuning by David Peter Hansen
Level 300 – Advanced
You have worked with SQL Server Integration Services (SSIS) for a few years, but more often than not, you see performance problems with your packages. Some packages are running slow, while others are taking up more memory and CPU than before.

Eventbrite - Understanding SSIS Internals and Performance Tuning by David Peter Hansen


Head over to the #SQLSatOslo and register now.

SSRS Inception – Part 1


Hi everyone,

This is the first part of a series of blogs looking at a session I am delivering at SQL Saturday Exeter on the 25th of April 2015. The session is titled “SSRS Inception: Reporting on Reporting Services”. In this session I hope to give you the necessary tools and information you need to begin monitoring SSRS.

Why should we monitor SSRS? 

In SSRS there is no obvious way to see which reports are being used and how often. If you have a large SSRS environment then this can have an impact how your users consume your reports. How do you know if no one is looking at a report because it takes too long to run. With this in mind I went about developing a way to see which reports are being used, how often and by whom.

How to monitor SSRS

Before we begin Microsoft has a warning about playing with the SSRS database:

The table structure for the databases is optimized for server operations and should not be modified or tuned. Microsoft might change the table structure from one release to the next. If you modify or extend the database, you might limit or prevent the capability to perform future upgrades or apply service packs. You might also introduce changes that impair report server operations… All access to a report server database must be handled through the report server.. [1]

Ok, so it is clear Microsoft do not want you to mess with your production instance of SSRS and if you do you your instance might no longer be supported by Microsoft. You have been warned. But… that doesn’t mean we can’t play with a replicated version and leave the primary relatively untouched. To do this you have a few options. You could use transactional replication, log shipping, restoring a nightly backup or something different- I leave this part up to you. It really depends on how often you want your data to refresh. For my purposes a nightly run is fine.

But before we get in to how we can do this, let first have a look around the tables which comprise the SSRS DB.

SSRS database architecture 

I am using SQL Server Reporting Services with SQL Server 2012. When you configure SSRS you’re prompted to create a database to hold the definition of your reports. When you connect to SQL Server you’re able to see the database listed. Your database will be listed along with a TempDB.





If you expand this database you can see 34 tables, these tables are used to manage report items, subscriptions, schedules, report snapshots, system properties, data sources, datasets, symmetric keys and the execution log details – where most of this session is focused.

To support monitoring our executions we are going to need data from the following tables:

Catalog – The catalog holds all the juicy metadata about all the objects stored in SSRS.
Datasets – Info about published shared datasets.
DataSources – Info about published shared data sources.
ExecutionLogStorage – This is where the magic happens, the who, how, when why. All your questions are answered here!
ModelItemPolicy – Bridge between Catalog and Policies.
Policies – Used as part of the SSRS security
PolicyUserRole – Used as part of the SSRS security
ReportSchedule – Bridges Catalog, Schedule & subscriptions together.
Schedule – Details about when and how often a report is set to auto generate.
Subscriptions – Who is subscribed to receive a report.
SecData – All the detail about who has access.
Users – Who has access.

Before we go on, there are some caveats to consider before you start working with the SSRS DB.

1. Firstly a lot of the content is stored in XML. To get all the best bits of data you need to be able to write basic XML queries (XPATH).

2. A lot of what you find in the SSRS DB has very little documentation . For example there are tonnes of integer values for attributes like type, but there is no reference/enum/lookup table for type, this is common for almost all the tables. Where this happens I have done my best to outline which each value relates to.

3. SSRS DB will only store data for as long as you need it, be default it only stores data for 60 days – So when you start this process it is unlikely you will be able to mine anything more than 60 days of executions. However this can be altered by logging on to the SSRS DB and updating the ConfigurationInfo table to set the ExecutionLogDaysKept to something greater than 60 – depending on how many users and reports you have something like 120 is fine.

UPDATE  ConfigurationInfo
SET     Value = '365'
WHERE   Name = 'ExecutionLogDaysKept'

As SSRS DB only keeps executions for a short amount of time (unless it has been changed), I have modelled this project as a data warehouse – in the Kimball style. We essentially then have a lot of dimension tables such as Dim_Catalog, Dim_User and all the associated dimensions Dim_Date & Dim_Time. This follows the basic Kimball star schema. This is loaded as often as you need it to be, using SSIS. As part of the scripts section there is a script to create a job that will execute the SSIS package. The SSIS package isn’t very complicated and I will go in to more about it in the next blog.

At the end of this series of blogs I will make the scripts & packages available to download. Included will be everything you need to set up a clone of your SSRS DB, extract data from that, build where the data is to be loaded (the SSRSDB) and all the views for interrogating the data warehouse. Also included are a set of reports that will allow you to manage your reporting server. There is a lot more built in to this that I will also blog about and make available.

Within the SSRSDB, there are additional tables that allow you to create a metadata driven theme layer on top of your reports. This requires you to load information about each report in to the ReportServerDW. To help you to do this an Excel file is also included in the download. Once this has been populated, a job can be created (again in the download) that will notify users (via email) who asked for a report to be created, that their report has not been used for X amount of days and give them the option to save their report. If the users fails to save their report it was either removed or hidden. This is configured per report in the Excel document.

What this project can also do which is pretty cool, is act as a source control for your reports. Ok it isn’t really a source control because it doesn’t allow all the nice thing like branching, however it will store a version (in XML) of each iteration of a report uploaded. I will also include an SSIS package that will export all the historic versions on a report.

Think this is neat? Me to. In the next instalment we will look more in to the data being stored and the SSIS package which is used to load the data.

Thanks for reading.


Presenting at SQL Saturday Exeter 2015 (YAY!)


This year I have 3 new year’s resolutions.

  1. Start a blog – Done! You’re reading it 😛
  2. Gain the MCSE in both BI and the data platform – Working on it.
  3. Start presenting at user groups and conferences – Read on

I have been attending conferences for about the last 3 years. SQL Saturday Exeter in 2013 was the first database conference I attended. I had no idea what to expect, I remember thinking that everyone there would be a database guru and I would be so out of my depth. However this was not the case. SQL Saturday Exeter is run by the SQL South West UG, which at the time I had only been attending for about 4 months. The event was a great success! I learnt tonnes  and it sparked an interest that just keeps getting bigger – Presenting. To date, I have delivered lightning talks at SSW but nothing longer than 15 minutes. Since then I have attended about 10 conferences and always learn loads!

Now in its third year SQL Saturday Exeter is running again on the 24th and 25th of April. There is a paid for training day on the Friday and a free day on the Saturday (Click here to see the brilliant selection of sessions).  I am somewhat bias when it comes to SQL Saturday Exeter as I am co-organising the event, this meant I was unsure if I would have enough time to deliver a session. However I bit the bullet and submitted 3 sessions (on PowerPivot, TSQL and the last was on reporting on SQL Server Reporting Services). The last one was selected ! YAY!

So on the 24th of April at 15:00, in Exeter, you can come and see me present for the first time at a conference on “SSRS inception: Reporting on Reporting services” – Abstract below.

Still using Reporting Services? Yeah me too! Does this scenario sound familiar? You already have a large SSRS environment but the users still want more reports. You create a new, all singing, all dancing dashboard that removes the need for a bunch of reports, awesome. You launch it and explain to the business  that you are deprecating the old reports. “WAIT! We still need that for Jill in HR to do her end-of-month”, or “We might still need that” – these types of comments are what leads to having a load of reports that are not being used.

Wouldn’t it be great if you could see who is viewing a report, in what format and how often, all inside your current SSRS environment? With this data wouldn’t it then be even better if we could automate a process to email the owner of that report to say “HA! I told you no one looked at this report!”. In this session we will look at how we can do all of this and more by mining the SSRS DB.

This session came about from when I started working in my current role. I joined a company with a lot of SSRS reports (>150 – A mixture of dashboards and operational reporting). I was sure that there was no possible way that all these reports could be actually being used. So I set about making something that would prove me right. This has now been in production for about 5 months.

  • So what will you learn from this session?
  • You will learn about the SSRS DB and what it holds.
  • How to mine the SSRS DB for gold.
  • How to do this in a way that does not upset Microsoft.
  • A little bit about data warehousing
  • And a little bit of SSIS.

As there is a lot that has gone in to this project and I want it to be easy to use and understand I will write a blog in the next few weeks, that goes in to how each part is created before the session. All scripts, SSIS packages and the slides will be available to download before the session.

I am really looking forward to delivering my first conference session. I hope you like it and I look forward to seeing you in April.

If you cannot make it in April, I was also be presenting this session at SQL South West UG on the 16th of February, and at the Cardiff UG on the 31st of March.