Hi all.

For those who have wondered if I have fallen off the grid, I have not.

I have been blogging at http://blogs.adatis.co.uk/terrymccann/

Thanks

Terry

## Calculating distance with SSIS and the Google distance matrix API

Recently I set up an interactive map in R & Shiny to show the distance between where a customer lives and one of the sites where a customer attends. What I wanted to do was to find out how long it took each person to get to the site using public transport, which proved difficult in R, so i settled for a “as the crow flies” approach using pythagoras. This worked, but since then I have had in the back of my mind a burning question “what if I could used the Google API to give me that distance”. This is what I have now done, and this is how you do it.

Google have many APIs that can be used at little to no cost. The Google distance matrix API allows you to enter a starting position and one or more end positions, and the API will return the distance based on a mode of transport of your choice (walking, driving, transit – public transport). All you need to do is register and get an API key – https://developers.google.com/maps/documentation/distance-matrix/

Once you have an API key you will need to keep hold of this. This key allows you to call the API, no key, no result. If you read the documentation here you can get a few examples of how to use the key. If you change the URL below to include your API key where YOUR_API_KEY is currently you will retrieve the JSON for that journey. By default the GDM is set to driving distance, but this can be changed. Also this is currently JSON, if you would prefer to have the result as XML, change json? for xml?.

`https://maps.googleapis.com/maps/api/distancematrix/json?units=imperial&origins=40.6655101,-73.89188969999998&destinations=40.6905615%2C-73.9976592%7C40.6905615%2C-73.9976592%7C40.6905615%2C-73.9976592%7C40.6905615%2C-73.9976592%7C40.6905615%2C-73.9976592%7C40.6905615%2C-73.9976592%7C40.659569%2C-73.933783%7C40.729029%2C-73.851524%7C40.6860072%2C-73.6334271%7C40.598566%2C-73.7527626%7C40.659569%2C-73.933783%7C40.729029%2C-73.851524%7C40.6860072%2C-73.6334271%7C40.598566%2C-73.7527626&key=YOUR_API_KEY`

The most basic use is to calculate the distance from A to B. This is what we will look at using SSIS. The API takes in an origin and a destination. These can either be a lat & lng combo or an address which we be translated into Lat & lng when executed. To demonstrate the process I have created a database and loaded it with some sample customer and store data. We have 2 customers and 4 stores. These customers are linked to different stores. You can download the example files at the bottom of the page.

```--CREATE DATABASE WebService_Example; GO   -- ################################################################################### -- Create tables -- ################################################################################### USE WebService_Example; GO   DROP TABLE dbo.CustomerAddresses; CREATE TABLE dbo.CustomerAddresses ( CustomerAddressID INT IDENTITY PRIMARY KEY , CustomerName Varchar(150) , CustomerAddress VARCHAR(500) ); GO   DROP TABLE dbo.StoreAddresses; CREATE TABLE dbo.StoreAddresses ( StoreAddressID INT IDENTITY PRIMARY KEY , StoreName VARCHAR(150) , StoreType VARCHAR(20) , StoreAddress VARCHAR(150) ); GO   DROP TABLE dbo.CustomerStoreAddress; CREATE TABLE CustomerStoreAddress ( CustomerAddressID INT , StoreAddressID INT )   DROP TABLE dbo.CustomerAddressDistance; CREATE TABLE dbo.CustomerAddressDistance ( CustomerName Varchar(150) , CustomerAddress VARCHAR(500) , StoreName VARCHAR(150) , StoreAddress VARCHAR(150) , Distance VARCHAR(50) , EstimatedTime VARCHAR(50) , ModeOfTransport VARCHAR(50) )     -- ################################################################################### -- Load tables -- ###################################################################################   INSERT INTO dbo.CustomerAddresses ( CustomerName , CustomerAddress ) VALUES ('A Customer', '46 Church Road, Alphington, Exeter, EX2 8SZ') ,('B Customer', 'Flat 9, Landsdown Terrace, Exeter, EX2 4JJ')   INSERT INTO dbo.StoreAddresses ( StoreName , StoreType , StoreAddress ) VALUES ('Tesco Extra', 'Vale', 'Exeter Vale Shopping Centre, Russell Way, Exeter EX2 7EZ') , ('Tesco Express', 'Heavitree', '85 Fore St, Heavitree, Exeter, Devon, EX1 2RN') , ('Sainsburys Superstore', 'Alphington', 'Alphington Rd, Exeter EX2 8NH') , ('Sainsburys Local', 'Sidwell', '12 Sidwell St, Exeter EX4 6NN');   INSERT INTO dbo.CustomerStoreAddress ( CustomerAddressID , StoreAddressID ) VALUES (1,1) , (1,2) , (1,3) , (1,4) , (2,3) , (2,4)   -- ################################################################################### -- Query -- ###################################################################################   SELECT CA.CustomerName , CA.CustomerAddress AS 'Org' , SA.StoreName , SA.StoreAddress AS 'Des' FROM dbo.CustomerAddresses CA INNER JOIN dbo.CustomerStoreAddress CSA ON CA.CustomerAddressID = CSA.CustomerAddressID INNER JOIN dbo.StoreAddresses SA ON CSA.StoreAddressID = SA.StoreAddressID```

This will build the tables and populate the tables that are used in the SSIS project.

The SSIS project consists of a data flow task (DFT), which contains and OLEDB source, a Derived column transformation (to add the columns which are to be loaded), a script task (C#) and the an OLEDB destination to load the results. There are 2 variables, the mode of transport (walking, driving, transit) and also your API key. Below is the C# required to call the API and shred the XML.

```  #region Namespaces using System; using System.Data; using System.Web; using System.Windows.Forms; using Microsoft.SqlServer.Dts.Pipeline.Wrapper; using Microsoft.SqlServer.Dts.Runtime.Wrapper; using Microsoft.SqlServer.Dts.Runtime; using System.Net; using System.Xml; // Make sure you add any missing references #endregion     [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute] public class ScriptMain : UserComponent {   public override void PreExecute() { base.PreExecute(); }     public override void PostExecute() { base.PostExecute(); }   public override void Input0_ProcessInputRow(Input0Buffer Row) {   // Create a new web client WebClient wc = new WebClient();   // Select your mode of transport - Edit the var in SSIS String TransportMode = Variables.ModeOfTransport;   // Add reference to your API String APIKey = Variables.DistanceAPIKey;   // Build the URL that is to be requested String uri = "https://maps.googleapis.com/maps/api/distancematrix/xml?units=imperial&origins=" + Row.Org + "&destinations=" + Row.Des + "&mode=" + TransportMode + "&key=" + APIKey;   // Get the XML string geoCodeInfo = wc.DownloadString(uri); XmlDocument xmlDoc = new XmlDocument(); xmlDoc.LoadXml(geoCodeInfo);   // Shred the XML String Duration = Convert.ToString(xmlDoc.SelectSingleNode("DistanceMatrixResponse/row/element/duration/text").InnerText); String Distance = Convert.ToString(xmlDoc.SelectSingleNode("DistanceMatrixResponse/row/element/distance/text").InnerText); //String Fare = Convert.ToString(xmlDoc.SelectSingleNode("DistanceMatrixResponse/row/element/fare/text").InnerText);   // Update the outgoing rows. Row.Time = Duration; Row.Distance = Distance;   }   }```

The script component looks at the rows coming in and passes them into a URL which is executed by a web client. The results are then updated into the Time and Distance Rows.

Once the data has been loaded we can analyse the results and see what is the closest store by multiple methods of transport.

```-- ################################################################################### -- Check results -- ###################################################################################   -- transit -- driving -- walking SELECT CustomerName , CustomerAddress , StoreName , StoreAddress , Distance , EstimatedTime , ModeOfTransport , CASE WHEN EstimatedTime LIKE '%hour%' then CONVERT(TIME,REPLACE(REPLACE(EstimatedTime,' hour ', ':'), ' mins', '')) ELSE '00:' + REPLACE(EstimatedTime, ' Mins', '') END TimeInMinutes , ROW_NUMBER() OVER (PARTITION BY CustomerName ORDER BY CASE WHEN EstimatedTime LIKE '%hour%' then CONVERT(TIME,REPLACE(REPLACE(EstimatedTime,' hour ', ':'), ' mins', '')) ELSE '00:' + REPLACE(EstimatedTime, ' Mins', '') END) AS 'Rank' FROM CustomerAddressDistance;```

## Sun Modelling for Data Warehouses. Visio template

Sun modelling is a technique for requirement gathering for a multidimensional data warehouse modelling. I have used or read about many different DW modelling techniques, however none is quite as simple and elegant as Sun Modelling. Sun Modelling was designed by Prof Mark Whitehorn (Dundee University). I won’t go into the details of how you execute a sun model today, however I will write about this in the future.

For the moment I have tried to simplify the process of building a sun model in visio. You can download a visio stencil which contains the main elements required to make a Sun Model. Also included is a sample visio document.

## Exeter’s SQL South West UG – Scripts and slides

Thanks for the massive turnout last night at SQL South West, you guys rule!

I presented a session called TSQL(L): Tricky SQL, Quandaries and Lessons Learnt.

Thanks again for attending. I had a lot of fun with this session. In the session we talked about a few websites:

Gaps & Islands: https://www.simple-talk.com/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/

See you all next month.

Slides:

## Tally/Number tables in SQL Server #TSQL2sday

As part of #TSQL2sday Mickey Stuewe (b|t) has proposed a question looking at best practices when someone has made some bad design decisions, generally when I am facing bad design problems I am the one who put them there (:P). This tip is not necessarily a “how to fix an issue” but more a tip on how to work with dates.

As a BI developer I am always working with dates and numbers. A user will come and ask for a query relating to what happened last month – What they typically mean is what happened per day? How much did we sell on each day? How many customer visited our website each day? How many members of my team were absent in October?

When you’re working with HR data you commonly want to see where you have absences over a period of time. This might be a chart per day or per month. So this is what we want to do. We want to see each day for October how many people were absent on each day – and here we start to see a common issue. This is also an issue that might be neglected where you commonly have sales per day, when you have no sales on an edge-case day it might get missed, and you really don’t want to miss that!

Here is some sample data to illustrate the point. The sample data is my team and their absences. We have Timmy and Bobby both have had a bad month and have been ill on multiple occasions (man-flu), they have a few consecutive periods of illness and the odd one day. What I want to see is 31 rows (one for every day in October) returned showing any zeros where there were no absences. However our HR system only logs out a row for data logged and not a row for those where there has been nothing logged. So a query like this (which is what the end-user asked for) looks like this:

```DECLARE @Example TABLE (ID INT IDENTITY, StaffMember VARCHAR(50), AbsenceDate DATE, AbsenceReason VARCHAR(50)) INSERT INTO @Example ( StaffMember, AbsenceDate, AbsenceReason ) VALUES ('Bobby', '20151001', 'Sickness') ,('Bobby', '20151002', 'Sickness') ,('Bobby', '20151003', 'Sickness') ,('Bobby', '20151009', 'Sickness') ,('Bobby', '20151022', 'Sickness') ,('Bobby', '20151021', 'Sickness') ,('Bobby', '20151023', 'Sickness') ,('Bobby', '20151024', 'Sickness') ,('Sammy', '20151001', 'Sickness') ,('Sammy', '20151003', 'Sickness') ,('Sammy', '20151022', 'Sickness') ,('Sammy', '20151025', 'Sickness')```

When you run the following:

`SELECT AbsenceDate, COUNT(*) AS 'AbsenceCount' FROM @Example WHERE AbsenceDate >= '20151001' AND AbsenceDate < '20151101' GROUP BY AbsenceDate`

But we are missing most of the days. If this data is to be used for a visualisation in Excel or SSRS then we want to see this per day, however we will only have pointers for those where the data has been logged, and that is an issue. But that isn’t at the fault of the developers, why would you write out more than you need? It doesn’t make sense. Ok we have an issue, but it doesn’t stop there! Our user now wants to know “what days were there no absences?”. How can we look for data that does not exist?

We obviously need something to help. We need a tally table. A tally table or commonly known as a numbers table is a simple table which holds numbers or dates. In a data warehouse you might have a date dimension – this is similar. A tally is a table that has a load of numbers in it. 1 to 100 million or however many you need, it is up to you. For this example we want one row in a table for every date. The script below will build you a table and insert as many rows as you want. For this example I have used 365 days from 01-01-2015 this will gives a row for every day in 2015. Alter the @j variable to extend this and alter the @startDate to move the start date.

After running this script you will have a full table and now we have something that we can use to start answering the questions we have been asked. So lets look at our questions again.

Question number one. Create a list of all absences in October (including where we have no absences)

```; WITH TallyDates AS (SELECT * FROM dbo.TallyTbl WHERE TallyDate &gt;= '20151001' AND TallyDate &lt; '20151101') , AbsenceData AS (SELECT AbsenceDate, COUNT(*) AS 'AbsenceCount' FROM @Example GROUP BY AbsenceDate) SELECT TallyDates.TallyDate, COALESCE(AbsenceData.AbsenceCount,0) AS 'AbsenceCount' FROM TallyDates LEFT JOIN AbsenceData ON TallyDates.TallyDate = AbsenceData.AbsenceDate```

Now we have a full list of all the days including those where we have had no absences. Ideal!

Question number two: In October what dates were no absences? An to make it a little harder, which days were working days – wondering why there was an TallyWeekend column? It was for this.

```; WITH TallyDates AS (SELECT TallyDate, TallyDateWeekend FROM dbo.TallyTbl WHERE TallyDate &gt;= '20151001' AND TallyDate &lt; '20151101') SELECT TallyDates.TallyDate AS 'DateWithNoAbsence' FROM TallyDates WHERE TallyDates.TallyDate NOT IN (SELECT AbsenceDate FROM @Example) AND TallyDates.TallyDateWeekend = 0```

So something that seemed really difficult before is now really simple. So not a bad design solution but a way to manage dates when you need to look for missing dates.

If you do not want to make a tally table or you cannot alter the database then you can do this all in a different way. Itzik Ben-Gan developed a little script that can be used to generate millions or rows in a zero IO. You can read about this here: http://sqlmag.com/sql-server/virtual-auxiliary-table-numbers

``` WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B)```

Thanks

## SQL Server string manipulation functions

Hi everyone,

As part of an ongoing training session I am doing for some of my colleagues on the basics of TSQL, I wanted to store this content somewhere. Where better than on my blog.

So today we will be starting with looking at some basic string manipulation functions, how they work and why you might want to use them. Then we will look at how we can use them together to perform a common dynamic name separation technique. All of these examples are using a table which is constructed using the Microsoft AdventureWorks database (this version is using 2012). To build this table you will need to download a version of AdventureWorks and attach the database (alternatively you could use the RedGate Azure AdventureWorks – You can find that here).

Before we continue please run the following TSQL to build the table. There are many intentional errors in the data – I have hoped to simulate normal user activity and errors.

```USE AdventureWorks2012; GO   BEGIN TRY DROP TABLE StringMinipulationExample; END TRY BEGIN CATCH PRINT 'Table didnt exist' END CATCH GO   CREATE TABLE StringMinipulationExample ( FullName Varchar(250) , Address VARCHAR(1000) );   INSERT INTO StringMinipulationExample SELECT ' ' + COALESCE(Title, '') + ' ' + FirstName + ' ' + LastName + ' ' AS 'FullName' , COALESCE(AddressLine1, '') + ' ' + COALESCE(AddressLine2, '') + ' ' + COALESCE(City, '') + ' ' + COALESCE(PostalCode, '') AS 'Address' FROM Person.BusinessEntityAddress INNER JOIN Person.Person ON Person.BusinessEntityID = BusinessEntityAddress.BusinessEntityID INNER JOIN Person.Address ON Address.AddressID = BusinessEntityAddress.AddressID   SELECT FullName, Address FROM dbo.StringMinipulationExample```

# CONCATENATION

When you’re working with strings you will no doubt want to join one sting with another, this is known as concatenation. The process is a simple one. Take one string and combine it with another.

`SELECT 'Hello' + ' ' + 'world!'`

We are using a + to concatenate. You will see in the middle the ‘ ‘ this is to add a space between the two words. Sounds simple and it is, however it can quickly become confusing when you start to concatenate with NULLS or different data types. NULLS in SQL Server can cause some grief but they have a purpose. When you concatenate two strings you get a new string – Hello World! However when you concatenate a string and a NULL you get a NULL. to avoid this we need to handle those NULLS and replace them with something else. Remember ” <> NULL. An empty space will still concatenate as it is not NULL. To handle those NULLS we can use conditional logic (CASE WHEN ELSE END) or we can use ISNULL() or COALESCE(). Concatenation needs to be to the same data type. For example when you write 1+1 do you want 2 or 11? 1 + 1 = 2, ‘1’ + ‘1’ = ’11’. INT + INT = INT, CHAR + CHAR = CHARCHAR, INT + CHAR = ERROR. To avoid this error we need to first convert the INT to a CHAR using either CAST or CONVERT.

# ISNULL & COALESCE

#### COALESCE ( check_expression , replacement_value )

Both of these functions are NULL replacement functions. There are differences in how the engine processes and also on ISNULL is TSQL only and not ANSI standard.

```SELECT ISNULL(NULL, '') SELECT COALESCE(NULL, '')```

# CAST& CONVERT

#### CONVERT ( data_type [ ( length ) ] , expression [ , style ] )

Both of these functions help to change the data type in your query. A lot comes down to personal preference.

```SELECT CAST( 1 AS CHAR(1)) SELECT CONVERT(CHAR(1), 1)```

Both will return a char. You cannot convert all datatypes from one type to another. Refer to this image for reference.

# LEFT & RIGHT

#### RIGHT ( character_expression , integer_expression )

When you want to take the left most character from a string. Character_expression is your string and integer_expression is how many characters do you want. This include blank spaces.
Example:

```SELECT FullName, LEFT(FullName, 5) AS 'Left5_FullName' FROM dbo.StringMinipulationExample SELECT FullName, RIGHT(FullName, 5) AS 'Left5_FullName' FROM dbo.StringMinipulationExample```

You can see that the left or right 5 characters have been selected. It might look odd as there are less then 5 characters visible on most rows and this is because there is preceeding and trailing spaces – keep reading to learn how to handle those.

# SUBSTRING

#### SUBSTRING ( expression ,start , length )

When you want the middle x amount of characters from a string. In the example below I have used the programmers favourite Hello World!. Hello world! Has 12 characters including the space and the exclamation mark. If we want to take just the world we need to position our start on the w and take 5 spaces. The grey squares in the table below indicate which characters we are selecting. For those of you who are familiar with Excel this function is similar to the MID function.
Example:

`SELECT 'Hello World!', SUBSTRING('Hello World', 7, 5)`

 1 2 3 4 5 6 7 8 9 10 11 12 H e l l o W o r l d !

# UPPER & LOWER

#### LOWER ( character_expression )

When you want to cast a character as either uppercase or lower case. This function is similar to Ucase and Lcase in Excel and VB.
Example

`SELECT FullName, UPPER(FullName) AS 'Upper_FullName', LOWER(FullName) AS 'Lower_FullName' FROM dbo.StringMinipulationExample`

# CHARINDEX

#### CHARINDEX ( expressionToFind ,expressionToSearch [ , start_location ] )

When you want to know the position of a character in a string of characters. This is a really helpful function. If you think most names are FirstName SPACE LastName if you want to split the first and last name you can locate the space and take the LEFT and the SUBSTRING. The way this function is structured is a little backwards. We are looking at what we are trying to find, where we are trying to find it and how many character should be skipped before we search. The reason the latter is there is in the event that there are multiple spaces, if you know the first 2 characters are always a space you might want to add ,2 to start from the 3rd character.

`SELECT FullName, CHARINDEX(' ', FullName, 3) FROM dbo.StringMinipulationExample`

I have included a ,3 to the end of the function as I know I have 2 spaces at the start. I want to skip these. We will use a function to remove these in a moment. This function is similar to FIND() in Excel.

# REPLACE

#### REPLACE ( string_expression , string_pattern , string_replacement )

When you want to replace a character or set of characters with a different character or set of characters. Useful when you need to find a character and replace it.

`SELECT FullName, REPLACE(FullName, ' ', '_') FROM dbo.StringMinipulationExample`

In this example I am replacing all the spaces with underscores.

# REVERSE

#### REVERSE ( string_expression )

Does what is says on the tin! Reverses a string.
You might be asking well why would I want to do that? Because it is easy to find the first special character in a string, it is not as easy to find the last. How do you make the last the first? REVERSE it.
Example:

`REVERSE('Hello World')`

REVERSE(‘Hello World’) = ‘dlroW olleH’

# RTIM & LTRIM

#### LTRIM ( character_expression )

When you want to remove the trailing spaces from the left, right or both sides of a string.
Example:

```SELECT FullName , LTRIM(FullName) AS 'LeftTrim' , RTRIM(FullName) AS 'RightTrim' , RTRIM(LTRIM(FullName)) AS 'BothTrim' FROM dbo.StringMinipulationExample```

Combining the RTRIM and LTRIM allows you to remove chars from both sides.

# Bringing it all together

So we have looked at a lot of the string manipulation function in SQL Server but to get the most out of these functions we need to using them together.
Lets look at how we can separate the Title, FirstName, LastName.

```; WITH CleanedStrings AS ( SELECT RTRIM(LTRIM(FullName)) AS 'FullName' , FullName AS 'UncleanFullName' , Address   FROM dbo.StringMinipulationExample ) SELECT CleanedStrings.UncleanFullName , CleanedStrings.FullName , REPLACE(LEFT(FullName, CHARINDEX('.', FullName)), '.', '') AS 'Title' , LEFT(LTRIM(REPLACE(CleanedStrings.FullName, LEFT(FullName, CHARINDEX('.', FullName)), '')), CHARINDEX(' ', LTRIM(REPLACE(CleanedStrings.FullName, LEFT(FullName, CHARINDEX('.', FullName)), '')))) AS 'FirstName' , REVERSE(LEFT(REVERSE(CleanedStrings.FullName), CHARINDEX(' ', REVERSE(CleanedStrings.FullName)))) AS 'LastName' FROM CleanedStrings;```

First we have a CTE to remove the left and right padded strings. Then the title can be found as it always is followed by a ‘.’.
First name was a little tricky. I had to remove the title then find the space and separate. I could have found the first space and the last and use SUBSTRING.

For the LastName I have reversed the string, found where the first space is in the reversed string, taken a left of that and the reversed the whole statement back to how it should have been.

## SSRSDB – Automated contact with users when reports are not used

Hi everyone,

This is the 4th part of the SSRSDB blog series. In this part we will be looking at how we can use data that we are collecting to automate contact to our users. Have you ever been asked to create a new dashboard that replace current reports, then asked if you can remove those replaced reports and been told “no, we still need those”. This happens a lot. The problem it causes is repetition of reporting, when there is repetition it can lead to multiple reports with slightly different business rules, this leads to confusion and misleading information.

How do we avoid this happening? One method is to analyse how frequency of the report use and compare it to a predefined limit of usage. In this section I want to look at this approach.

In the last section we looked at the additional metadata which holds additional data to support monitoring the report, this includes who owns the report and their email address. The metadata also contains the following additional columns:

Inactive Period – The amount of days that is allowed to elapse without usage. This needs to be set per report, the reason for this is that you will likely have some reports that are run daily and other that are only ran one every quarter.

Inactive Action – There are 2 outcomes at the stage:

NOTHING – Do Nothing
EMAIL – Send a friendly email to the user.

Final Period – The amount of days that can elapse for the final action.

Final Action – This is the action that takes place once the inactive period has elapsed. At current you’re limited to the following actions:

NOTHING – Do nothing
HIDE – Hide the report*
DELETE – Trigger the ReportServer to delete the report*

*Microsoft do not want you to do this, this is making a change to the ReportServer without using the ReportServer, this is a big no-no. However the process to hide and the process to delete is triggering the same stored procedures that the ReportServer GUI uses.

A typical set-up might be to set a report to email after 7 days, the report is then still available to users if they want to run the report, after a further 7 days the report is set to either hide or it is deleted. If you’re working in an environment that does not source control your reports then you can still get a version of the report back from dim_catalog in the event it is deleted in error – On a side note you should be source controlling your reports, use an SVN and source control solution.

With this data we can start looking at a process to automate contact with our users. Ok let’s look at the SQL to do this. What we have is a query to check if a user needs to be contacted and then we are going to loop over these with a cursor and send them a friendly email to say “Hey! You are not using the report I worked on!”.

To allow this to work you need to have configured database mail on your server. You can find a short guide on configuring database mail using this link – http://blog.sqlauthority.com/2008/08/23/sql-server-2008-configure-database-mail-send-email-from-sql-database/. Once you have this configured then you can apply the following code to contact your users*.

*If you have not updated your metadata then this will not make contact.

Notify the user

```    DECLARE DeprecatedReportsCursor CURSOR FOR WITH LastExecution AS ( SELECT [Item_Skey] , MAX([ExecutionStartDate]) maxdate FROM [dbo].[Fact_Transaction_Executions] GROUP BY [Item_Skey]   ) SELECT c.Item_Name , COALESCE(CONVERT(NVARCHAR(50),MAXDATE),'**NEVER RUN**') AS 'Execution_Date' , M.Item_Owner , CASE WHEN maxdate IS NULL THEN 999 ELSE DATEDIFF(DAY,MAXDATE,GETDATE()) END AS 'Days_Since_Last_Execution' ,C.item_URL ,m.Item_In_Active_Period ,CASE WHEN (m.Item_Final_Period - DATEDIFF(DAY,MAXDATE,GETDATE())) < 0 THEN 0 ELSE m.Item_Final_Period - DATEDIFF(DAY,MAXDATE,GETDATE()) END AS 'Days_To_Remove' ,M.Item_Owner_Email AS 'Email_Address' FROM dbo.Dim_Catalog C LEFT JOIN LastExecution LE ON LE.[item_skey] = C.item_Skey LEFT JOIN Support.Item_Metadata M ON C.Item_BusinessKey = M.Item_BusinessKey WHERE c.SCD_Is_Current = 1 AND c.Item_Type = 2 AND c.Item_Is_Hidden = 0 AND M.Item_Type = 'Main' AND M.Item_In_Active_Action = 'EMAIL' AND DATEDIFF(DAY,MAXDATE,GETDATE()) >= CONVERT(INT,m.Item_In_Active_Period) ORDER BY c.item_Name ;   DECLARE @Date varchar(100) DECLARE @MainBody varchar(MAX) DECLARE @Link varchar(500) DECLARE @ReportName VARCHAR(500) DECLARE @ExecutionDate VARCHAR(500) DECLARE @ReportRequestedBy VARCHAR(500) DECLARE @DSE INT DECLARE @InActivePeriod VARCHAR(5) DECLARE @DaysToRemove VARCHAR(5) DECLARE @EmailSubject VARCHAR(500) DECLARE @OwnerEmailAddress VARCHAR(100)     OPEN DeprecatedReportsCursor FETCH NEXT FROM DeprecatedReportsCursor INTO @ReportName, @ExecutionDate, @ReportRequestedBy, @DSE, @Link,@InActivePeriod,@DaysToRemove, @OwnerEmailAddress   WHILE @@FETCH_STATUS = 0 BEGIN   SET @MainBody = ' <html> <head> <title>Notification Email</title> </head> <body> <table width="800" border="0" cellpadding="0" cellspacing="0" bgcolor="#ffffff" style="font-size: 13"> <tr bgcolor="#008a9f" width="734px"><td><font color="#008a9f">h</font></td></tr> <tr bgcolor="#00c3e1" width="734px"> <td><font color="#00c3e1">h</font></td> </tr> <tr bgcolor="#FFFFFF" width="734px"> <td><font color="#FFFFFF">h</font></td> </tr> <tr valign="top" align="center"> <td style="border-width : 0px;"><img src="http://www.hyperbi.co.uk/wp-content/uploads/2015/01/HyperBi.png" alt="" title="" border="0" height="150" /> </td> </tr> <tr bgcolor="#FFFFFF" width="734px"> <td><font color="#FFFFFF">h</font></td> </tr> <tr valign="top"> <td style="border-width : 0px; font-family:Arial;"> <h1 style="color:#E33030; text-align: center; ">Report due for depreciation. <u>Act now!</u></h1> <p>Hello,</p> <p><b>' + @ReportName + '</b> - Has not been used for more than ' + @InActivePeriod + ' days. </p> <p>This report is now scheduled for depreciation and will be removed <strong>' + CASE WHEN @DaysToRemove = 0 THEN 'Today' ELSE 'in ' + @DaysToRemove + ' days' END + '</strong>.</p> <p> Please click the link below to avoid this action. <br /> </p> <p> <a href="' + @Link + '">' + @ReportName + ' </a> <br /> </p> </BR> <p>Regards,</p> <p><i>Management Information Team.</i></p> </br></br>   </td>   </tr> <tr bgcolor="#ffffff" width="734px"> <td><font color="#ffffff">h</font></td> </tr> <tr bgcolor="#00c3e1" width="734px"> <td><font color="#00c3e1">h</font></td> </tr> <tr bgcolor="#008a9f" width="734px"><td><font color="#008a9f">h</font></td></tr> <tr bgcolor="#000000"><td style="padding: 10px; font-family:Arial"><font size="1" color="#FFFFFF"> <p style="font-size: 12; font-family:Arial"></br> Please send any feedback to <a style="text-decoration: none; color: #FFFFFF" href="mailto:reportqueries@hyperbi.co.uk">reportqueries@hyperbi.co.uk</a> </p> <p style="font-size: 12; font-family:Arial"> This email was auto generated. Please do not reply directly. </p> <p></p></br>   </font></td></tr>   </table>     </body> </html> '   SET @EmailSubject = 'Important: Report due for depreciation - ' + @ReportName; SET @ReportRequestedBy = @OwnerEmailAddress   EXEC msdb.dbo.sp_send_dbmail @profile_name = 'hyperbi', @recipients = @OwnerEmailAddress, @subject = @EmailSubject, @body = @MainBody, @body_format = 'HTML'   PRINT @MainBody FETCH NEXT FROM DeprecatedReportsCursor INTO @ReportName, @ExecutionDate, @ReportRequestedBy, @DSE, @Link,@InActivePeriod,@DaysToRemove, @OwnerEmailAddress   END CLOSE DeprecatedReportsCursor DEALLOCATE DeprecatedReportsCursor```

Working our way from the top we have a common table expressions which is calculating when a report was last accessed (successfully – non-successful runs are not stored in the ExectionStorageLog table). This is fed in to a query which is comparing the report with the metadata to calculate if the user should be contacted by email – this must be expressed in the metadata. This is all loaded in to the a CURSOR statement (typically I would say that CURSORs are evil and should be avoided at all costs, however much like all questions related to database administration it depends, looping to send emails is a great reason to use a CURSOR).

From here we load a set of variables from the cursor, these variables are loaded in to an HTML block which is used for the emails. You will notice that I have used quite an old style of HTML, this isnt because I am old school, but because these emails are typically going to Outlook, which has very basic HTML processing. Once the HMTL is built it is being passed over to database mail, EXEC msdb.sp_send_dbmail handles this for us. Once executed this will fire an email for every report which has not been accessed beyond the in active period. The user will receive an email that looks like this:

This email is completely customisable with a little knowledge of HTML. It features your company’s logo (1), a “call to action” prompting the user that they need to decide what to do with their report, all the details about their report are listed (3), a link to the report is included (4)(this link allows the user to click on the link and run the report, running the report will reset the counter) and lastly it contains all the contact information for your report writing team.

Personally I have an agent job scheduled to run once a week on a Monday morning.

Automatic expiration.

Sending an email is all well and good however what happens if no one responds, or the only person who used that report left the company months ago. We need an automated solution to remove/hide these based on the final action. To do that we have another agent job.

```    DECLARE DeprecatedReportsCursor CURSOR FOR WITH LastExecution AS ( SELECT [Item_Skey] , MAX([ExecutionStartDate]) maxdate FROM [dbo].[Fact_Transaction_Executions] GROUP BY [Item_Skey]   ) SELECT c.Item_BusinessKey ,m.Item_Final_action ,c.Item_Path FROM dbo.Dim_Catalog C LEFT JOIN LastExecution LE ON LE.[item_skey] = C.item_Skey LEFT JOIN Support.Item_Metadata M ON C.Item_BusinessKey = M.Item_BusinessKey WHERE c.SCD_Is_Current = 1 AND c.Item_Type = 2 AND M.Item_Type = 'Main' AND DATEDIFF(DAY,MAXDATE,GETDATE()) >= CONVERT(INT,m.Item_Final_Period) ORDER BY c.item_Name ;   DECLARE @Item_ID varchar(100) DECLARE @SQL varchar(MAX) DECLARE @FinalAction VARCHAR(50) DECLARE @Item_Path VARCHAR(250)   OPEN DeprecatedReportsCursor FETCH NEXT FROM DeprecatedReportsCursor INTO @Item_ID, @FinalAction, @Item_Path   WHILE @@FETCH_STATUS = 0 BEGIN   IF @FinalAction = 'HIDE' BEGIN SET @SQL = ' UPDATE c SET Hidden = 1 FROM reportserver.dbo.CATALOG c WHERE ItemID = ''' + @Item_ID + '''' PRINT 'Report Hidden' END   IF @FinalAction = 'DELETE' BEGIN SET @sql = 'EXEC ReportServer.dbo.DeleteObject @Path=N''' + @Item_Path + ''',@AuthType=1,@Prefix=N''' + @Item_Path + '%''' PRINT 'Report deleted' END   IF @FinalAction = 'NOTHING' BEGIN PRINT 'nothing' END   PRINT @SQL   EXEC (@sql)   FETCH NEXT FROM DeprecatedReportsCursor INTO @Item_ID, @FinalAction, @Item_Path   END CLOSE DeprecatedReportsCursor DEALLOCATE DeprecatedReportsCursor```

This process is very similar to the process above, we have a CURSOR that loops over all the reports that are past the final notice period and either does nothing, hides or deletes the report.

As stated above this is all done in the same way as the report server to avoid causing any internal issues (use this at your own risk).

I have this as a separate job which runs the evening of the first job.

With that you have an automated report expiration process for SSRS.

Hi everyone,

This is the 3rd instalment to my series looking at reporting on SQL Server Reporting Services. In the last section we looked at the data warehouse. In this section we will look at loading the DW with all the data we need to effectivly monitor our SSRS environment.

The ETL process is formed from 2 steps.

Step 1 – Back up the existing ReportServer and restore it to a new database “ReportServer_Clone”. There are several reasons why you might want to do this, the main one is to remove any strain from your production server. Microsoft does not advise that you make alterations to the ReportServer database so we are goon on that front too. The final stage is to add a view and persist that view to a table. This is a very expensive view which is why it is persisted. This view is used for most of the load process so persisting to a table enables the DW to load faster.

Step 2 – The ETL phase. The ETL has 5 phase.
Phase 1 – Load variables. The SSIS packages is designed to be as dynamic as possible and fit to your environment. When you create the SSRSDB there is a table called “Configuration” under the “Support” schema. This holds all the connection and dynamic variables.

Phase 2 – Load the dimensions.
You will notice that there are less dimensions being loaded than were created in the build process. This is because some are enumerator tables and will not change over the life of your versions of SSRS, for that reason there is no need to load them daily.

For all other dimensions these are loaded using TSQL MERGE, to enable this process we need a staging table to make the comparison. Essentially what a TSQL MERGE does is an UPSERT, an UPDATE and INSERT. The dimensions are all type 2 slowly changing dimensions. So when a change is registered we are closing off the old row and adding a new one.

Each sequence container contains a truncate staging EXECUTE SQL task followed by a DATA TRANSFER task to move the data from the ReportServer_Clone database to the staging tables. There is a degree of transformation that also happens, mainly around converting INT tables to expanded results. Example, IsHidden is a boolean value BIT value (0 OR 1) when filtering this in a report it is not obvious what 0 and 1 relate to, so this is transformed to be “Hidden” and “Not Hidden”.

The dimensions being loaded here are Dim_Catalog – report details, Dim_Format – Format of the report, Dim_User – who has been using the report, Dim_Role – what role does the use have and finally Dim_Dataset.

Once the data has been loaded in to staging it passed in to the TSQL MERGE.

Phase 3 – Fact loading. The facts are loaded using the LOOKUP design pattern. When you set up the SSRSDB you will have a set of historic data, when this is first loaded you want all that historic data. After that has been loaded there is no point reloading all the data as this is just a waste of IO. Instead as part of the configuration once the ETL has been ran once it will set the “full load” option to “Incremental”. Once this has happened only the data loaded that day will be removed.

Three fact tables are being loaded Fact_transaction_Execution, Fact_Factless_PUR and Fact_Factless_Dataset_fields.

The first step is to either remove today’s data or all the data (dependant on whether this is the first time you’re loading the SRSDB). Once the data is removed we move the data from the ReportSerer_Clone database to the SSRSDB.

(1) A query to extract the data and transform it, (2) look up the current Item_skey, (3) look up the current Formay_skey, (4) look up the current User_skey, (5) look up the current Action_skey, (6) look up the current Source_skey and finally load to SSRSDB.

The same process happens for the other 3 fact tables.

Phase 4 – Loading and updating the metadata. Part of the power of SSRSDB is the extended metadata. This is achieved using a simple Excel spreadsheet. When SSRSDB is created a table is also created which hold the metadata, Support.Item_Metadata. The columns in this table mirror that of the Excel Spread sheet.

The metadata allows you to store additional data about the report which will be used to automate communicating changes/usage of reports. The metadata stores the style (theme management blog coming soon), date it was created, who requested it, what type of report is it (Main or drilldown), the report owners email (again for automation) In active period (how many days can a report go without being used, what happens after that time, final period and what happens.

When you first use the SSRSDB this will be blank, rather than manually populating this there is a process to read the file and write out the existing contents of the ReportServer, any changes after this will be updated.

The final phase is to read the ReportServer logs. This is a for each loop to loop over each log and load it in to the SSRSDB.

In the download attached you will also find a set of jobs that you can create. Once these have been created and scheduled you have a fully automated ReportServer data warehouse. I personally have this scheduled to run every 30 minutes.

## Power BI & ODI Node Devon

Hi everyone,

Over the last 2 days I have been partaking in lots of data related fun! On Friday 26th June I attended the ODI Devon’s first conference “Beyond the smart city” #btsc15. It was a great event. As part of this Myself and the SQL South West team were asked to do some technical sessions on Saturday (27th). I love getting out there and meeting new people so jumped at the opportunity. The session was aimed at individuals who might not know the MS BI stack very well, or someone who knew what there was but possibly not how to use it – No deep dives, etc.

My session was about exploring “Open data”. To do this I used a variety of Microsoft tools (Excel, PowerPivot, Azure, Azure SQL Database, SSIS and Power BI). I wanted to use this blog post the walk you through how I went about preparing for this session and as a reference for those who attended the session. I had one goal, to demonstrate how anyone can quickly “Ask questions about their data”.

So what did we do?

1. We looked at where we can get open data
2. We analysed that data in Excel using pivot tables, answering basic questions
3. We wanted more data and somewhere to store it so we created an Azure SQL database
4. We loaded the data into Azure using SSIS
5. We analysed the data further in PowerPivot and built a model, but it wasn’t good enough
6. We imported out PowerPivot model in to Power BI and started asking questions of our data

Let’s look at that in a bit more detail.

To start off I looked at where to find open data. For my examples I used the open data provided by Exeter council. On Exeter council’s website there is a lot of interesting rich datasets. This included property listings owned by ECC and also their spend over the last 2 years. I used both of these datasets as source data. Much like most open data, you get what you pay for. There was a small issue with the data, it in separate files – there were 15 files for the last 15 months.

Sure you can use Excel, and copy and paste each file in to one long list, but what if you want that stored somewhere you can quickly get to, or somewhere you can share that data with colleagues? You guessed it, we need a database to store this data. This session was designed for anyone to attend and the replicate at home. I didn’t want attendees to have to download SQL Express, and a lot of what I am demonstrating is cloud based, so Azure was the obvious choice.

So we needed a database in Azure (for this demo I am using the newer portal V2). Head over to https://portal.azure.com/ and create an account. Once there complete the following steps to spin up a brand new SQL database.

1. Open the market place and search for SQL Database
2. Hit create
3. Give your database a name and configure the settings
• Server – Create a new server as required.
• Choose you pricing model
• Configure further as required
• Hit create

Within 40 seconds you have a working database in the cloud. If you have trouble logging in to the database you may need to alter the firewall settings to allow the IP address you’re connecting from.

Now we have our functioning database we can begin to load it with data. To do this we first need to create a few tables to store the data.

First we create the spend data:

```CREATE TABLE [dbo].[SpendData]( [Body Name] [varchar](50) NULL, [Trans Date] [datetime2](7) NULL, [Supplier Name] [varchar](50) NULL, [Trans Ref] [int] NULL, [Net Amount] [decimal](15, 2) NULL, [Expense Description] [varchar](50) NULL, [Expense Code] [int] NULL, [Service Code] [varchar](50) NULL, [BVACOP] [varchar](50) NULL, [Category] [int] NULL )```

Then the building location data

```CREATE TABLE [dbo].[BuildingLocations]( [Uprn] [varchar](50) NULL, [AssetRef] [varchar](50) NULL, [AssetName] [varchar](1000) NULL, [Postcode] [varchar](50) NULL, [Easting] [decimal](18, 0) NULL, [Northing] [decimal](18, 0) NULL, [InterestType] [varchar](50) NULL, [AssetType] [varchar](150) NULL, [AssetUse] [varchar](150) NULL )```

I also used a script from codeproject to dynamically create a date tally table (These are so important. If you don’t have a numbers table and a dates tally then go and make one now) http://www.codeproject.com/Articles/647950/Create-and-Populate-Date-Dimension-for-Data-Wareho. Now we have 3 tables, 2 empty and 1 with 20 years of dates. Let’s load the missing data.

To do this I have used SQL Server Integration services – SSIS. SSIS is Microsoft’s ETL tool for moving and transforming data. The package is designed to be run several times to enable this the first step is to TRUNCATE the existing tables. After that we want to loop over all the files (I have simply pulled all the data from the ECC website) then a final transfer for the building data. The more data you have the more that will be imported – just keep adding the spend data to the folder. This SSIS package is available to download at the bottom of the page.

Once the data was loaded we headed over to Excel again a loaded the data in to a PowerPivot model. I am using Excel 2013 so PowerPivot is already installed but not enabled. For those using 2010 go and grab the add-in from here. Once installed enable the add-in and open PowerPivot. On the Home ribbon we looked at how to import the data from Azure.

Click on From database
Select SQL Server
Fill in credentials
Select tables.

Now we have our data we build our basic model by using the diagram view to connect the 2 tables.

Now we can ask more complicated questions of our data such as which quarter is the most money spent, we can analyse by month, year and any other column in our date tally. But really this isn’t good enough! Our end user needs to know a lot about Excel to really get what they need. We are creating something for someone who really has no idea about BI tools, maybe you managing director. So to really begin to ask questions of our data we need Power BI, in particular the Power BI Q & A.

At the time of writing, Power BI is still in preview. There is no cost structure in place and you need a corporate account to use it (You can’t use a gmail or live account). Head over to https://powerbi.microsoft.com/ and register yourself.

What we want to do is save our PowerPivot model and suck that up in to Power BI.

1. Click on the big yellow “Get Data” button in the bottom left to begin
2. From there we want to get data from files
3. Select “Local File”
4. Navigate to the PowerPivot model
5. Hit open

You should see an “Importing data” window

Once that is complete we can start literally asking questions of our data. Hit the pin icon when you’re happy to pin the item to your dashboard. You can start creating interactive reports that link from your dashboard and offer a more detailed view of your data.

That is a very high level introduction to working with data in Power BI. I will do some more blogs on Power BI in the coming weeks, once I get my head around the Power BI API :S

## SQL Saturday #411 Cambridge

WOW! What a week. In less than 7 days I have been accepted to speak at 3 SQL Saturday conferences! The schedule currently looks like this:

July – SQL Saturday Manchester
August – SQL Saturday Oslo
September – SQL Saturday Cambridge

SQL Saturday in the UK has become something of a regular occurrence. In 2015 alone have 4 (Exeter, Manchester, Cambridge and Southampton), but it all started in Cambridge with the first UK based SQL Saturday back in 2012. I missed that event but went to the following event in 2013. I had a blast! The only downside to the conference was that I didn’t spend enough time in Cambridge! I am hoping to change that this time and take in some of the sights. What I remember from last time were bikes! Bikes everywhere! I don’t have the best luck with bicycles – Ask me when you see me – I will show you the scars….

This will be the 3rd time that Mark and the team have organised SQL Saturday Cambridge and they have certainly gone big! 2 full days of training and 6 tracks on the Saturday! That is bigger than big! That is MASSIVE! In 2013 SQL Saturday Cambridge tried something different by making the even half SQL half SharePoint. I don’t use SharePoint, but like many I administer it from time-to-time. In 2013 I went to a SharePoint session and really enjoyed it. I will do the same in September.

Looking forward to seeing you there. Register now: http://www.sqlsaturday.com/411/EventHome.aspx

Thursday 10th September 2015

Penelope Coventry – Automating business processes with SharePoint, Office 365 and Azure

Benjamin Nevarez – SQL Server Query Tuning & Optimization

Niko Neugebauer – A Day with Columnstore Indexes

Friday 11th September 2015

Rasmus Reinholdt Nielsen – Building tomorrows ETL architecture with BIML and MDS today

Kennie Nybo Pontoppidan – SQL Anti-Patterns

Mark Broadbent & Regis Baccaro – Implementing Business Intelligence & SharePoint High Availability Solutions

Microsoft – Building a Data Driven Strategy