In this blog I want to explore how DevOps is being applied in the industry today, before we really dig into applying DevOps to Machine Learning in the next blog. If this is the first blog you’re reading, then you might want to start at the beginning.

In recent years there has been a subtle shift in the industry: it has begun to take the principles of DevOps and apply them to data analytics. In 2015 Andy Palmer coined the term DataOps; he describes DataOps as the intersection of data engineering, data integration, data quality and data security (Palmer, 2015). I first came across the term from Steph Locke’s blog “DataOps, it’s a thing. Honest”.


Hi everyone,

In the last blog, An Introduction to DevOps, we looked at the basics of what DevOps is. We only really skimmed the surface. I want to dig into a bit more detail, which will make the discussion about Data Science and DevOps a little easier. I want to start by recommending two great books. You will see references to pages and quotations throughout this series. All the references are listed here: DevOps for Data Science. The two books I recommend are The DevOps Handbook and The Phoenix Project. Both books are fantastic and approach the subject from different angles.


Hi everyone.

Some of you might know that for the last two years I was studying for a Master’s degree in data science at the University of Dundee. This was a two-year part-time course delivered by Andy Cobley and Mark Whitehorn. The course was fantastic and I recommend it – if you want to know more about the course, please give me a shout. The course comprises multiple modules. The final module is a research project, which you need to start thinking about towards the end of the first year of study. I selected my topic very early on; however, being indecisive, I changed my idea three times (each time having written a good chunk of the project).

Why did I do this? I simply was not passionate about the subject of those projects. They were good ideas, but I just was not researching or building anything new. The outcome of my dissertation might have been a working project; however, it would have felt hollow to me. I needed a topic I was passionate about. I have a core ethos that I take to every project I work on: “Never do anything more than once”. It is because of that that I have spent much of my career working either with or developing automation tools to accelerate and simplify my development processes. Having attended a lot of conferences, I became familiar with DevOps and how it accelerated the software industry. DevOps allows software developers to ship code faster. I have been applying the core principles of DevOps to all my recent projects, with great success.


Recently I set up an interactive map in R & Shiny to show the distance between where a customer lives and one of the sites the customer attends. What I wanted to do was find out how long it took each person to get to the site using public transport, which proved difficult in R, so I settled for an “as the crow flies” approach using Pythagoras. This worked, but since then I have had a burning question in the back of my mind: “what if I could use the Google API to give me that distance?”. This is what I have now done, and this is how you do it.

Google have many APIs that can be used at little to no cost. The Google Distance Matrix API allows you to enter a starting position and one or more end positions, and the API will return the distance based on a mode of transport of your choice (walking, driving, transit – public transport). All you need to do is register and get an API key.
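To give a flavour of what calling the API looks like, here is a minimal Python sketch (Python rather than R, purely for brevity; the request shape is the same from any language). The API key is a placeholder, and the sample response below is trimmed down from the full JSON the API returns:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_KEY = "YOUR_API_KEY"  # placeholder - substitute your own key

def build_request_url(origin, destination, mode="transit"):
    """Build a Distance Matrix API request URL."""
    base = "https://maps.googleapis.com/maps/api/distancematrix/json"
    params = urlencode({
        "origins": origin,
        "destinations": destination,
        "mode": mode,
        "key": API_KEY,
    })
    return f"{base}?{params}"

def parse_duration(response):
    """Pull the travel time (in seconds) out of a Distance Matrix response."""
    element = response["rows"][0]["elements"][0]
    if element["status"] != "OK":
        return None
    return element["duration"]["value"]

# A trimmed-down example of the JSON the API returns:
sample = {"rows": [{"elements": [{"status": "OK",
                                  "distance": {"value": 5230},
                                  "duration": {"value": 1260}}]}]}
print(parse_duration(sample))  # 1260 seconds, i.e. 21 minutes
```

To make a live call you would pass the result of build_request_url() to urlopen() and feed the body through json.load() before parsing.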



Sun modelling is a technique for gathering requirements for a multidimensional data warehouse. I have used or read about many different DW modelling techniques; however, none is quite as simple and elegant as Sun Modelling. Sun Modelling was designed by Prof Mark Whitehorn (Dundee University). I won’t go into the details of how you execute a sun model today; however, I will write about this in the future.

For the moment, I have tried to simplify the process of building a sun model in Visio. You can download a Visio stencil which contains the main elements required to make a Sun Model. A sample Visio document is also included.

Download it here.



Hi everyone,

As part of an ongoing training session I am running for some of my colleagues on the basics of T-SQL, I wanted to store this content somewhere. Where better than on my blog?

So today we will start by looking at some basic string manipulation functions, how they work and why you might want to use them. Then we will look at how we can use them together to perform a common dynamic name separation technique. All of these examples use a table which is constructed from the Microsoft AdventureWorks database (this version uses 2012). To build this table you will need to download a version of AdventureWorks and attach the database (alternatively you could use the RedGate Azure AdventureWorks – you can find that here).

Before we continue, please run the following T-SQL to build the table. There are many intentional errors in the data – I have tried to simulate normal user activity and errors.

USE AdventureWorks2012;

IF OBJECT_ID('dbo.StringMinipulationExample') IS NOT NULL
	DROP TABLE dbo.StringMinipulationExample;
ELSE
	PRINT 'Table didn''t exist';

CREATE TABLE StringMinipulationExample
(
	FullName VARCHAR(250)
	, Address VARCHAR(1000)
);

INSERT INTO StringMinipulationExample
SELECT
	' ' + COALESCE(Title, '') + ' ' + FirstName + ' ' + LastName + '   ' AS 'FullName'
	, COALESCE(AddressLine1, '') + ' ' + COALESCE(AddressLine2, '') + ' ' + COALESCE(City, '') + ' ' + COALESCE(PostalCode, '') AS 'Address'
FROM Person.BusinessEntityAddress
INNER JOIN Person.Person ON Person.BusinessEntityID = BusinessEntityAddress.BusinessEntityID
INNER JOIN Person.Address ON Address.AddressID = BusinessEntityAddress.AddressID;

SELECT FullName, Address FROM dbo.StringMinipulationExample;


When you’re working with strings you will no doubt want to join one string to another; this is known as concatenation. The process is a simple one: take one string and combine it with another.

SELECT 'Hello' + ' ' + 'world!'

We are using a + to concatenate. You will see the ‘ ‘ in the middle; this adds a space between the two words. Sounds simple, and it is; however, it can quickly become confusing when you start to concatenate with NULLs or different data types.

NULLs in SQL Server can cause some grief, but they have a purpose. When you concatenate two strings you get a new string – Hello World! However, when you concatenate a string and a NULL you get a NULL. To avoid this we need to handle those NULLs and replace them with something else. Remember ” <> NULL: an empty string will still concatenate, as it is not NULL. To handle those NULLs we can use conditional logic (CASE WHEN ELSE END) or we can use ISNULL() or COALESCE().

Concatenation also needs both sides to be the same data type. For example, when you write 1 + 1 do you want 2 or 11? 1 + 1 = 2, ‘1’ + ‘1’ = ’11’. INT + INT = INT, CHAR + CHAR = CHARCHAR, INT + CHAR = ERROR. To avoid this error we need to first convert the INT to a CHAR using either CAST or CONVERT.
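You can see this NULL-propagation behaviour in almost any SQL engine. Here is a quick, runnable sketch using SQLite from Python (SQLite’s || concatenation operator propagates NULL just like SQL Server’s + does, and its COALESCE behaves the same way):

```python
import sqlite3

# An in-memory SQLite database is enough to demonstrate the behaviour
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Concatenating with NULL yields NULL...
cur.execute("SELECT 'Hello' || ' ' || NULL")
print(cur.fetchone()[0])  # None

# ...unless the NULL is replaced first with COALESCE
cur.execute("SELECT 'Hello' || ' ' || COALESCE(NULL, 'World!')")
print(cur.fetchone()[0])  # Hello World!
```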


ISNULL ( check_expression , replacement_value )

COALESCE ( expression [ , ...n ] )

Both of these functions are NULL replacement functions. There are differences in how the engine processes them; also, ISNULL is T-SQL only and not ANSI standard, whereas COALESCE is, and COALESCE accepts more than two arguments.



CAST ( expression AS data_type [ ( length ) ] )

CONVERT ( data_type [ ( length ) ] , expression [ , style ] )

Both of these functions help to change the data type of an expression in your query. Which one you use often comes down to personal preference, although CONVERT’s style argument is handy for formatting dates.


Both will return a char. You cannot convert every data type to every other type; refer to the data type conversion chart in the SQL Server documentation for reference.
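As a quick illustration of the 1 + 1 versus ‘1’ + ‘1’ point above, here is a runnable sketch using SQLite from Python. Note this is only an approximation of SQL Server’s behaviour: SQLite uses || for string concatenation, and unlike SQL Server it silently coerces mixed types rather than raising an error, which is why the explicit CAST matters:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

# Numeric addition vs string concatenation
print(cur.execute("SELECT 1 + 1").fetchone()[0])       # 2
print(cur.execute("SELECT '1' || '1'").fetchone()[0])  # 11

# CAST the INT to a string so it concatenates rather than adds
print(cur.execute("SELECT CAST(1 AS VARCHAR) || '1'").fetchone()[0])  # 11
```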


LEFT ( character_expression , integer_expression )

RIGHT ( character_expression , integer_expression )

When you want to take the left-most (or right-most) characters from a string. character_expression is your string and integer_expression is how many characters you want. This includes blank spaces.

SELECT FullName, LEFT(FullName, 5) AS 'Left5_FullName' FROM dbo.StringMinipulationExample
SELECT FullName, RIGHT(FullName, 5) AS 'Right5_FullName' FROM dbo.StringMinipulationExample

You can see that the left or right 5 characters have been selected. It might look odd, as there are fewer than 5 characters visible on most rows; this is because there are preceding and trailing spaces – keep reading to learn how to handle those.


SUBSTRING ( expression ,start , length )

When you want the middle x characters from a string. In the example below I have used the programmer’s favourite, Hello World!. Hello World! has 12 characters including the space and the exclamation mark. If we want to take just the word World we need to position our start on the W and take 5 characters. The grid below indicates which characters we are selecting. For those of you who are familiar with Excel, this function is similar to the MID function.

SELECT 'Hello World!', SUBSTRING('Hello World!', 7, 5)


Pos:  1  2  3  4  5  6  7  8  9 10 11 12
Char: H  e  l  l  o     W  o  r  l  d  !



UPPER ( character_expression )

LOWER ( character_expression )

When you want to convert a string to either upper case or lower case. These functions are similar to UPPER and LOWER in Excel, and UCase and LCase in VB.

SELECT FullName, UPPER(FullName) AS 'Upper_FullName', LOWER(FullName) AS 'Lower_FullName' FROM dbo.StringMinipulationExample


CHARINDEX ( expressionToFind ,expressionToSearch [ , start_location ] )

When you want to know the position of a character within a string. This is a really helpful function. Most names follow the pattern FirstName SPACE LastName, so if you want to split the first and last name you can locate the space and take the LEFT and the SUBSTRING. The way this function is structured is a little backwards: we specify what we are trying to find, where we are trying to find it, and how many characters should be skipped before we search. That last argument is there in case there are multiple occurrences; if you know the first 2 characters are always spaces, you might want to add ,3 to start from the 3rd character.

SELECT FullName, CHARINDEX(' ', FullName, 3) FROM dbo.StringMinipulationExample

I have included a ,3 at the end of the function as I know I have 2 spaces at the start, and I want to skip them. We will use a function to remove these in a moment. This function is similar to FIND() in Excel.


REPLACE ( string_expression , string_pattern , string_replacement )

When you want to replace a character or set of characters with a different character or set of characters. Useful when you need to find a character and replace it.

SELECT FullName, REPLACE(FullName, ' ', '_') FROM dbo.StringMinipulationExample

In this example I am replacing all the spaces with underscores.


REVERSE ( string_expression )

Does what is says on the tin! Reverses a string.
You might be asking: why would I want to do that? Because while it is easy to find the first special character in a string, it is not as easy to find the last. How do you make the last the first? REVERSE it.

SELECT REVERSE('Hello World') -- returns 'dlroW olleH'


RTRIM ( character_expression )

LTRIM ( character_expression )

When you want to remove leading spaces (LTRIM), trailing spaces (RTRIM), or both from a string.

SELECT FullName
	, LTRIM(FullName) AS 'LeftTrim'
	, RTRIM(FullName) AS 'RightTrim'
	, RTRIM(LTRIM(FullName)) AS 'BothTrim'
FROM dbo.StringMinipulationExample

Combining RTRIM and LTRIM allows you to remove spaces from both sides.

Bringing it all together

So we have looked at a lot of the string manipulation functions in SQL Server, but to get the most out of these functions we need to use them together.
Let’s look at how we can separate the Title, FirstName and LastName.

;WITH CleanedStrings AS
(
SELECT
	RTRIM(LTRIM(FullName)) AS 'FullName'
	, FullName AS 'UncleanFullName'
	, Address
FROM dbo.StringMinipulationExample
)
SELECT
	CleanedStrings.FullName
	, REPLACE(LEFT(FullName, CHARINDEX('.', FullName)), '.', '') AS 'Title'
	, LEFT(LTRIM(REPLACE(CleanedStrings.FullName, LEFT(FullName, CHARINDEX('.', FullName)), '')), CHARINDEX(' ', LTRIM(REPLACE(CleanedStrings.FullName, LEFT(FullName, CHARINDEX('.', FullName)), '')))) AS 'FirstName'
	, REVERSE(LEFT(REVERSE(CleanedStrings.FullName), CHARINDEX(' ', REVERSE(CleanedStrings.FullName)))) AS 'LastName'
FROM CleanedStrings;

First we have a CTE to remove the left and right padding. Then the title can be found, as it is always followed by a ‘.’.
First name was a little tricky: I had to remove the title, then find the space and separate. Alternatively, I could have found the first and last spaces and used SUBSTRING.

For the LastName I have reversed the string, found where the first space is in the reversed string, taken a LEFT of that, and then reversed the whole thing back to how it should be.
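If it helps to see the same trim / find / reverse logic outside of T-SQL, here is an equivalent sketch in Python (an illustration only; the sample name is made up):

```python
def split_full_name(raw):
    """Mirror the T-SQL logic: trim, peel off the title at the '.',
    take up to the first space for the first name, and use the
    reverse trick for the last name."""
    cleaned = raw.strip()                       # RTRIM(LTRIM(...))
    dot = cleaned.find(".") + 1                 # CHARINDEX('.', ...)
    title = cleaned[:dot].replace(".", "")      # REPLACE(LEFT(...), '.', '')
    rest = cleaned[dot:].lstrip()               # strip title, then LTRIM
    first = rest[:rest.find(" ")]               # LEFT up to the first space
    rev = cleaned[::-1]                         # REVERSE(...)
    last = rev[:rev.find(" ")][::-1]            # LEFT of reversed, reversed back
    return title, first, last

print(split_full_name("  Mr. Joe Bloggs   "))  # ('Mr', 'Joe', 'Bloggs')
```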

Thank you for reading


Hi everyone,

Over the last 2 days I have been partaking in lots of data-related fun! On Friday 26th June I attended ODI Devon’s first conference, “Beyond the smart city” (#btsc15). It was a great event. As part of this, myself and the SQL South West team were asked to do some technical sessions on the Saturday (27th). I love getting out there and meeting new people, so I jumped at the opportunity. The session was aimed at individuals who might not know the MS BI stack very well, or someone who knew what there was but possibly not how to use it – no deep dives, etc.

My session was about exploring “open data”. To do this I used a variety of Microsoft tools (Excel, PowerPivot, Azure, Azure SQL Database, SSIS and Power BI). I want to use this blog post to walk you through how I went about preparing for this session, and as a reference for those who attended. I had one goal: to demonstrate how anyone can quickly “ask questions about their data”.

So what did we do?

  1. We looked at where we can get open data
  2. We analysed that data in Excel using pivot tables, answering basic questions
  3. We wanted more data and somewhere to store it so we created an Azure SQL database
  4. We loaded the data into Azure using SSIS
  5. We analysed the data further in PowerPivot and built a model, but it wasn’t good enough
  6. We imported our PowerPivot model into Power BI and started asking questions of our data

Let’s look at that in a bit more detail.

To start off I looked at where to find open data. For my examples I used the open data provided by Exeter council. On Exeter council’s website there are a lot of interesting, rich datasets. These included property listings owned by ECC and also their spend over the last 2 years. I used both of these datasets as source data. Much like most open data, you get what you pay for. There was a small issue with the data: it was in separate files – there were 15 files for the last 15 months.

Sure, you can use Excel and copy and paste each file into one long list, but what if you want the data stored somewhere you can quickly get to, or somewhere you can share it with colleagues? You guessed it: we need a database. This session was designed for anyone to attend and then replicate at home. I didn’t want attendees to have to download SQL Express, and a lot of what I am demonstrating is cloud based, so Azure was the obvious choice.

So we needed a database in Azure (for this demo I am using the newer portal V2). Head over to the Azure portal and create an account. Once there, complete the following steps to spin up a brand new SQL database.

  1. Open the market place and search for SQL Database
  2. Hit create
  3. Give your database a name and configure the settings
    • Server – Create a new server as required.
    • Choose your pricing model
    • Configure further as required
    • Hit create

Within 40 seconds you will have a working database in the cloud. If you have trouble logging in to the database, you may need to alter the firewall settings to allow the IP address you’re connecting from.

Now we have our functioning database we can begin to load it with data. To do this we first need to create a few tables to store the data.

First we create the spend data:

CREATE TABLE [dbo].[SpendData](
[Body Name] [varchar](50) NULL,
[Trans Date] [datetime2](7) NULL,
[Supplier Name] [varchar](50) NULL,
[Trans Ref] [int] NULL,
[Net Amount] [decimal](15, 2) NULL,
[Expense Description] [varchar](50) NULL,
[Expense Code] [int] NULL,
[Service Code] [varchar](50) NULL,
[BVACOP] [varchar](50) NULL,
[Category] [int] NULL
);

Then the building location data

CREATE TABLE [dbo].[BuildingLocations](
[Uprn] [varchar](50) NULL,
[AssetRef] [varchar](50) NULL,
[AssetName] [varchar](1000) NULL,
[Postcode] [varchar](50) NULL,
[Easting] [decimal](18, 0) NULL,
[Northing] [decimal](18, 0) NULL,
[InterestType] [varchar](50) NULL,
[AssetType] [varchar](150) NULL,
[AssetUse] [varchar](150) NULL
);

I also used a script from CodeProject to dynamically create a date tally table. (These are so important – if you don’t have a numbers table and a date tally then go and make one now.) Now we have 3 tables: 2 empty and 1 with 20 years of dates. Let’s load the missing data.
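If you are curious what a date tally actually contains, here is a minimal sketch in Python that generates one row per day with a few useful attributes (the column choice here is illustrative, not the CodeProject script’s exact output):

```python
from datetime import date, timedelta

def date_tally(start, years=20):
    """Generate one row per day: (iso_date, year, month, day, quarter)."""
    rows = []
    d = start
    end = date(start.year + years, start.month, start.day)
    while d < end:
        rows.append((d.isoformat(), d.year, d.month, d.day, (d.month - 1) // 3 + 1))
        d += timedelta(days=1)
    return rows

tally = date_tally(date(2000, 1, 1))
print(len(tally), tally[0])  # 7305 ('2000-01-01', 2000, 1, 1, 1)
```

Each row maps directly onto a date dimension table row, which you could bulk insert into the database.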


To do this I used SQL Server Integration Services – SSIS. SSIS is Microsoft’s ETL tool for moving and transforming data. The package is designed to be run several times; to enable this, the first step is to TRUNCATE the existing tables. After that we want to loop over all the spend files (I simply pulled all the data from the ECC website), then do a final transfer for the building data. The more data you have, the more will be imported – just keep adding spend data files to the folder. This SSIS package is available to download at the bottom of the page.


Once the data was loaded we headed over to Excel again and loaded the data into a PowerPivot model. I am using Excel 2013, so PowerPivot is already installed but not enabled. For those using 2010, go and grab the add-in from here. Once installed, enable the add-in and open PowerPivot. From the Home ribbon we looked at how to import the data from Azure.

  1. Click on From Database
  2. Select SQL Server
  3. Fill in credentials
  4. Select the tables


Now we have our data we build our basic model by using the diagram view to connect the 2 tables.


Now we can ask more complicated questions of our data, such as in which quarter the most money was spent; we can analyse by month, year and any other column in our date tally. But really this isn’t good enough! Our end user needs to know a lot about Excel to really get what they need. We are creating something for someone who has no idea about BI tools – maybe your managing director. So to really begin to ask questions of our data we need Power BI, in particular Power BI Q&A.

At the time of writing, Power BI is still in preview. There is no cost structure in place and you need a corporate account to use it (you can’t use a Gmail or Live account). Head over to the Power BI site and register yourself.


What we want to do is save our PowerPivot model and suck that up into Power BI.

  1. Click on the big yellow “Get Data” button in the bottom left to begin
  2. From there we want to get data from files
  3. Select “Local File”
  4. Navigate to the PowerPivot model
  5. Hit open

You should see an “Importing data” window


Once that is complete we can start literally asking questions of our data. Hit the pin icon when you’re happy, to pin the item to your dashboard. You can then start creating interactive reports that link from your dashboard and offer a more detailed view of your data.


That is a very high level introduction to working with data in Power BI. I will do some more blogs on Power BI in the coming weeks, once I get my head around the Power BI API :S

Thanks for reading

(Download files here)


At home my wife and I have a world map with all the places we would like to visit; slowly we are crossing these off (there are a lot of places we really want to visit). Oslo has been on there since last year’s SQL Saturday #317 – there was a lot of buzz on Twitter in the run-up and on the day, and I heard from those who attended that it was a great event! I knew I wanted to attend the next one as an attendee; however, I have recently been attempting to improve my presenting skills and also to give back some of the time I have benefited from to the SQL community. Long story short, I submitted and was selected – YAY! I was really pleased to have been selected from a staggering 151 submitted sessions – the SQL community is truly one-of-a-kind!

So, I am hoping you will join me and the other great speakers in Oslo, Norway on August 29th for my session SSRS Inception: Reporting on Reporting Services. You can read more about this project here.

Don’t forget that there is so much more on offer than just the Saturday. Head over to the SQL Saturday website for more information, or follow the #SQLSatOslo hashtag for all the updates! One last thing: if you’re heading to Oslo, why not extend your visit and attend one of the great training days the Oslo team have put on? They both sound like fantastic sessions!

A full day of paid pre-conference workshops will be held on Friday, August 28th:

A Practical Approach for Troubleshooting Performance Problems (Work Smarter not Harder) by Margarita Naumova
Level 300 – Advanced
Finding the source of a performance problem is one of the most difficult tasks for every DBA. Where to start, what to check, how to find what’s going on, and of course how to fix, those are usually the most common and important questions every DBA asks. And those questions usually come when the time is the most critical resource.


Understanding SSIS Internals and Performance Tuning by David Peter Hansen
Level 300 – Advanced
You have worked with SQL Server Integration Services (SSIS) for a few years, but more often than not, you see performance problems with your packages. Some packages are running slow, while others are taking up more memory and CPU than before.



Head over to the #SQLSatOslo site and register now.


Hi everyone,

This is the first part of a series of blogs looking at a session I am delivering at SQL Saturday Exeter on the 25th of April 2015. The session is titled “SSRS Inception: Reporting on Reporting Services”. In this session I hope to give you the necessary tools and information you need to begin monitoring SSRS.

Why should we monitor SSRS? 

In SSRS there is no obvious way to see which reports are being used and how often. If you have a large SSRS environment, this can have an impact on how your users consume your reports. How do you know if no one is looking at a report because it takes too long to run? With this in mind, I set about developing a way to see which reports are being used, how often, and by whom.

How to monitor SSRS

Before we begin Microsoft has a warning about playing with the SSRS database:

The table structure for the databases is optimized for server operations and should not be modified or tuned. Microsoft might change the table structure from one release to the next. If you modify or extend the database, you might limit or prevent the capability to perform future upgrades or apply service packs. You might also introduce changes that impair report server operations… All access to a report server database must be handled through the report server… [1]

OK, so it is clear Microsoft do not want you to mess with your production instance of SSRS, and if you do, your instance might no longer be supported by Microsoft. You have been warned. But… that doesn’t mean we can’t play with a replicated version and leave the primary relatively untouched. To do this you have a few options: you could use transactional replication, log shipping, restoring a nightly backup, or something else – I leave this part up to you. It really depends on how often you want your data to refresh. For my purposes a nightly run is fine.

But before we get into how we can do this, let’s first have a look around the tables which comprise the SSRS DB.

SSRS database architecture 

I am using SQL Server Reporting Services with SQL Server 2012. When you configure SSRS you’re prompted to create a database to hold the definition of your reports. When you connect to SQL Server you’re able to see that database listed, along with a TempDB.





If you expand this database you can see 34 tables. These tables are used to manage report items, subscriptions, schedules, report snapshots, system properties, data sources, datasets, symmetric keys and the execution log details – where most of this session is focused.

To support monitoring our executions we are going to need data from the following tables:

Catalog – The catalog holds all the juicy metadata about all the objects stored in SSRS.
Datasets – Info about published shared datasets.
DataSources – Info about published shared data sources.
ExecutionLogStorage – This is where the magic happens: the who, how, when and why. All your questions are answered here!
ModelItemPolicy – Bridge between Catalog and Policies.
Policies – Used as part of the SSRS security
PolicyUserRole – Used as part of the SSRS security
ReportSchedule – Bridges Catalog, Schedule & subscriptions together.
Schedule – Details about when and how often a report is set to auto generate.
Subscriptions – Who is subscribed to receive a report.
SecData – All the detail about who has access.
Users – Who has access.

Before we go on, there are some caveats to consider before you start working with the SSRS DB.

1. Firstly a lot of the content is stored in XML. To get all the best bits of data you need to be able to write basic XML queries (XPATH).

2. A lot of what you find in the SSRS DB has very little documentation. For example, there are tonnes of integer values for attributes like type, but there is no reference/enum/lookup table for them; this is common across almost all the tables. Where this happens I have done my best to outline what each value relates to.

3. The SSRS DB only stores execution data for a limited time; by default it only stores data for 60 days – so when you start this process it is unlikely you will be able to mine anything more than 60 days of executions. However, this can be altered by logging on to the SSRS DB and updating the ConfigurationInfo table to set ExecutionLogDaysKept to something greater than 60 – depending on how many users and reports you have, something like 120 is fine.

UPDATE  ConfigurationInfo
SET     Value = '365'
WHERE   Name = 'ExecutionLogDaysKept'

As the SSRS DB only keeps executions for a short amount of time (unless that has been changed), I have modelled this project as a data warehouse – in the Kimball style. We essentially have a set of dimension tables such as Dim_Catalog and Dim_User, plus the associated Dim_Date & Dim_Time. This follows the basic Kimball star schema. It is loaded as often as you need it to be, using SSIS. In the scripts section there is a script to create a job that will execute the SSIS package. The SSIS package isn’t very complicated, and I will go into more detail about it in the next blog.

At the end of this series of blogs I will make the scripts & packages available to download. Included will be everything you need to set up a clone of your SSRS DB, extract data from that, build where the data is to be loaded (the SSRSDB) and all the views for interrogating the data warehouse. Also included are a set of reports that will allow you to manage your reporting server. There is a lot more built in to this that I will also blog about and make available.

Within the SSRSDB there are additional tables that allow you to create a metadata-driven theme layer on top of your reports. This requires you to load information about each report into the ReportServerDW. To help you do this, an Excel file is also included in the download. Once this has been populated, a job can be created (again in the download) that will notify users (via email) who asked for a report to be created that their report has not been used for X days, and give them the option to save their report. If the user fails to save their report, it will be either removed or hidden. This is configured per report in the Excel document.

What this project can also do, which is pretty cool, is act as a source control for your reports. OK, it isn’t really source control because it doesn’t allow all the nice things like branching; however, it will store a version (in XML) of each iteration of a report uploaded. I will also include an SSIS package that will export all the historic versions of a report.

Think this is neat? Me too. In the next instalment we will look more into the data being stored and the SSIS package which is used to load the data.

Thanks for reading.