Dynamically importing Excel files with SSIS & C#

Hi everyone.

Last week my team was asked to automate a slow task the business was running on a daily basis. Every day they received an Excel file from one of our providers – this had been happening for almost 5 years, and they had amassed over 1,500 files. The existing process involved a lengthy VBA script inside Excel that opened each file, amalgamated the data and found the most recent transaction. Any idea how long that took? 6 hours! Every day, an entire box was dedicated to this task – this needed to change.

The project seemed quite simple however there were a few factors which made the simple task more complicated.

1. The volume of files – 1500+
2. The sheet names were different on each workbook, with no logic in the naming – This was previously a manual task.
3. Different types of files – Excel versions 97-03 (xls) & 2007 (xlsx)
4. The data did not start at row 1 — It started at row 7

You will find this sort of operation, or at least parts of it, in almost any organisation – it is a very common SSIS task. So I wanted to detail my method for tackling something like this in the hope someone else might be able to use it. The sample files are available to download below.

So let’s plan how we can tackle each of the main issues.

The volume of files.

In the downloads you will find a file called master.xlsm. This workbook contains a little script which will generate as many files as you need, all with different sheet names and sample data. This is not a very quick process, so I suggest you keep the number of files small and set it going while you’re making a coffee – it will produce roughly 40 files a minute. I have created 500 and duplicated them 3 times for the purpose of the demo.

Whenever I need to work with a load of files I start thinking about how I can loop over each file in turn. There are many ways to do this – SSIS is one option, but PowerShell or even Log Parser would also work. For this example I am using SSIS. This task is incredibly easy in SSIS: just drop a Foreach Loop Container on the canvas and configure it.

[Screenshot: Foreach Loop Container on the control flow]

Before we configure the foreach loop to point at the folder we need to loop over, we first need to create a variable to hold the output (the file path). When creating a process like this I want it to be self-auditing. By this I mean that when we are processing files, I want to know which files have been processed, when, and whether or not the transfer was a success. So I am going to first create the variable to hold the fully qualified path and then write that out to a log table. You can see an example of this below. I have created a variable called XL_FilePath; it is scoped at the package level (available to all child tasks), has a data type of String and a default value of one of my files (to make a later step easier).

[Screenshot: the XL_FilePath variable]

Now we have both the foreach loop and the variable we can double-click on the Foreach Loop Container to configure it. There are a few options we want to set here.

Enumerator: Foreach File Enumerator
Expressions: Blank (we are not using any expressions)
Folder: This wants to be set to the location of your files. For my example that is – C:\HyperBI\XL_SSIS\XL_Files
Files: this relates to the naming structure of your files. I know all my files are Excel workbooks so I will leave this as “*.*”.
Retrieve file name: You have 3 options here. Fully qualified – C:\HyperBi\XL_SSIS\XL_Files\1.xls, Name and extension – 1.xls, and Name only – 1.
Traverse subfolders: The final option is whether we want to go through sub folders. All of the sample data is in one folder, so I will leave this unchecked.

[Screenshot: Foreach Loop Editor – Collection page]

Now we need to associate the variable with the qualified path. On the left, when you select Variable Mappings you will be given the option to associate the output of the loop with a variable. Select the variable we created earlier. SSIS will add an index of 0 – SSIS uses a 0-based index, 0 being the first column returned. You can see below that if we pipe the variable into a script task to MessageBox the value of the variable we get the following. There is a good blog here which details how to set this up.

[Screenshot: message box showing the XL_FilePath value]

The next step is to load this into SQL Server so we can loop over the files and load our data, ready for processing and auditing. This is done by creating a table called SSIS_FILE_LOAD_LOG. This table has 4 columns.

CREATE TABLE SSIS_FILE_LOAD_LOG(
SSIS_FILE_ID INT IDENTITY PRIMARY KEY
, SSIS_FILE_PATH VARCHAR(250)
, SSIS_AUDIT_DATE DATETIME DEFAULT GETDATE()
, SSIS_AUDIT_OUTCOME VARCHAR(50)
);

With this we can now add an Execute SQL task and connect it to our server/table to load the results. In the event that something goes wrong in the load I don’t want to have to reload all the previous files, hence the outcome column – we will check that the outcome is either NULL or ‘FAIL’ when we process. Each time this runs we want to add rows for new files only, so we don’t want to insert files that have already been processed or are pending processing. This is done by checking whether the file has been logged before inserting the new row. By adding the following TSQL to our Execute SQL task we can ensure we only capture each file path once.

IF NOT EXISTS (SELECT 1 FROM SSIS_FILE_LOAD_LOG WHERE SSIS_FILE_PATH = ?)
INSERT INTO SSIS_FILE_LOAD_LOG (SSIS_FILE_PATH, SSIS_AUDIT_DATE) VALUES (?, NULL)

The IF NOT EXISTS part checks whether the record is already in the table; if it is, nothing happens. The two “?” placeholders relate to variables and are configured on the Parameter Mapping screen. You will notice the parameter names are 0 and 1 – the ordinal positions of the parameters.

[Screenshot: Execute SQL Task – Parameter Mapping]

After running this I have a table containing 1500 file paths, ready to be processed.
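If you want to sanity-check the log at this point, a quick count of the rows that have not yet succeeded will do the job – a small illustrative query against the table we created above:

SELECT  COUNT(*) AS Files_To_Process
FROM    SSIS_FILE_LOAD_LOG
WHERE   COALESCE(SSIS_AUDIT_OUTCOME, 'FAIL') <> 'SUCCESS';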

The data doesn’t start on row 1

This is a pain but it is an easy fix, as long as you know the sheet name. In the sample data I have included, the data starts at row 5 and is in columns A:D. We want to skip the first four rows and process the data without headers.

[Screenshot: sample Excel file layout]

Add a Data Flow Task, followed by an Excel source – you will also need to add an Excel connection manager pointing at one of the files. By default the connection will read from A1 and not A5. You can alter this by changing the range SSIS is looking at. The easiest way to do this is to create a new variable called XL_Reference with the value of ‘ZDJbagZ5$A5:D’. This is the sheet name plus the start and end of your data (yours will be different as the files are random): A5 is the start and D is the end. I have omitted a row number (e.g. D200) as I don’t know where the data ends. You can see the configuration of this below:

[Screenshot: Excel Source configured with the XL_Reference variable]

This works, however, only if we know the name of the sheet ahead of time. As we do not know the sheet name of our next file we need to add some C#.

The sheet names were different on each workbook, with no logic in the naming

SSIS is great at many things and one of those things is processing lots of Excel files. The problem is that they all have to have the same structure; when that changes, processing becomes an issue. The names of the sheets were different in every workbook – in some it was the name of the file, in others the date, the word extract, or simply Sheet1. For this example every sheet name is randomly generated. In order to process the data we will need to know what the sheet name is before we process it. How can we do this? With either VB or C# inside a script task.

Things get a little more advanced now – don’t worry, stick with it and all will be OK, I promise (grab the files and play along). If you have not worked with script tasks before they can be a little daunting. They come in 2 flavours, VB.NET and C#. I am not a big fan of VB and I was a C#.NET developer for a short time, so I prefer C#. You could use VB, however the following solution is all C#. Once you have added the script task you will need to add the XL_FilePath variable to the read-only variables and XL_Reference to the read-write variables. Then edit the script and add the following:

To the namespaces (top of the file)

using Excel = Microsoft.Office.Interop.Excel;

To the main function (replace all existing code)

var xl_file_full = Dts.Variables["User::XL_FilePath"].Value;

// Start Excel, open the workbook read-only and grab the first worksheet
Excel.Application xlApp = new Excel.Application();
Excel.Workbook xlWorkBook = xlApp.Workbooks.Open(xl_file_full.ToString(), 0, true, 5, "", "", true, Excel.XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
Excel.Worksheet xlWorkSheet = (Excel.Worksheet)xlWorkBook.Worksheets.get_Item(1);

// Capture the sheet name before closing Excel
string sheetName = xlWorkSheet.Name;

// Close the workbook and quit Excel
xlWorkBook.Close(false);
xlApp.Quit();
xlApp = null;

// Force garbage collection so the Excel COM objects are released
GC.Collect();
GC.WaitForPendingFinalizers();

// Sheet name plus the range the data flow should read
Dts.Variables["User::XL_Reference"].Value = sheetName + "$A5:D";

Dts.TaskResult = (int)ScriptResults.Success;

Once done you will also need to add some references (right-click References in the Solution Explorer window). You will need a reference to the Microsoft.CSharp assembly and also to the Microsoft Excel COM interop (Microsoft.Office.Interop.Excel). Your references should look like this once done.

[Screenshot: script task references]

Let’s examine what is happening block by block.

using Excel = Microsoft.Office.Interop.Excel;

Here you’re adding a using alias so that when you refer to “Excel” in the code you actually mean “Microsoft.Office.Interop.Excel”.

Break down:

Define a variable and pass in the SSIS variable XL_FilePath.

var xl_file_full = Dts.Variables["User::XL_FilePath"].Value;

Create a new instance of Excel, open the workbook passed in via the variable, grab a reference to the first worksheet and capture its name.

Excel.Application xlApp = new Excel.Application();
Excel.Workbook xlWorkBook = xlApp.Workbooks.Open(xl_file_full.ToString(), 0, true, 5, "", "", true, Excel.XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
Excel.Worksheet xlWorkSheet = (Excel.Worksheet)xlWorkBook.Worksheets.get_Item(1);
string sheetName = xlWorkSheet.Name;

Close the workbook, quit Excel and clear the application variable.

xlWorkBook.Close(false);
xlApp.Quit();
xlApp = null;

Free up memory by initiating the garbage collection.

GC.Collect();
GC.WaitForPendingFinalizers();

Set the SSIS variable XL_Reference to the sheet name plus the reference to the data (A5:D), and report success back to SSIS.

Dts.Variables["User::XL_Reference"].Value = sheetName + "$A5:D";
Dts.TaskResult = (int)ScriptResults.Success;

Straightforward once you know what it is doing. The garbage collection is really important: if you omit it you will be left with 1,500 open Excel sessions and your server will die!

Different types of files – Excel versions 97-03 (xls) & 2007 (xlsx)

SSIS will read an Excel 97-2003 (xls) file through the Excel 2007 connection; it is backwards compatible by default. Awesome, one less thing to worry about. So now we have all the parts working independently, we need to string them all together.

First we need to load the results of the initial file loop into an ADO object so we can loop over them. In order to do this we need to do 2 things:

1. Create a new variable called XL_ADO_Object, with a data type of Object.
2. Create another Execute SQL task. This time we will be selecting from the log table, using the following TSQL:

SELECT SSIS_FILE_ID, SSIS_FILE_PATH FROM SSIS_FILE_LOAD_LOG WHERE COALESCE(SSIS_AUDIT_OUTCOME, 'X') <> 'SUCCESS'

We then need to set the result set to Full result set; this allows SSIS to keep hold of the results of the Execute SQL task. We want to load these results into our newly created ADO object variable. You can assign them on the Result Set page: add a new result set with a name of 0 and a variable name of User::XL_ADO_Object. Once this is complete we have an object loaded with the 2 columns from our table (the ID and the file path).

[Screenshot: Execute SQL Task – Result Set mapping]

We need another Foreach Loop Container to iterate over the results in our ADO object. We want to load the output of each row into variables: the one we created earlier for the file path will work fine, and we need another to hold the primary key of our audit table (which you will need to create before the next step – I have created a variable called Audit_ID with a type of Int32). We will use this ID to record whether the file was processed or not. To configure the foreach loop, change the collection enumerator to Foreach ADO Enumerator and set the source to the object variable you created. Configure the variable mappings to Audit_ID with an index of 0 and XL_FilePath with an index of 1 (0 = column 1, 1 = column 2).

Now we can move the script task and the data flow task inside the second foreach loop and connect them – script task first. We want to use the variable as the source of our Excel file, so we need to configure the Excel Connection Manager to use it before continuing. Under the properties of the Excel connection, add an expression and set ExcelFilePath to be equal to the XL_FilePath variable.

[Screenshot: Excel connection ExcelFilePath expression]

We now need to expand the data flow task to move the data to SQL Server. Add an OLE DB destination and connect it to your table. The TSQL for creating that table is below.

CREATE TABLE SSIS_XL_TO_SQL (
    [Invoice_ID] INT IDENTITY PRIMARY KEY,
    [Invoice_Date] datetime,
    [Customer_ID] float,
    [Order_Qty] float,
    [Order_Amount] float,
    [Source_FilePath] nvarchar(250),
    [Processed_Date] datetime
)

You will notice there is also a column for the processed date and one for the source file path – if there is an issue with the data we can quickly track it back to the file it came from. To populate these we need to add 2 derived columns. Between the source and destination add a Derived Column transformation. Add a new column with the expression GETDATE() – to capture the date and time of the execution – then add another which references the XL_FilePath variable. Connect the 2 new columns and the 4 from the Excel file to our new table.

[Screenshot: data flow with derived columns]

The final step we have is to add another Execute SQL task to update the log file with the outcome of the movement. Connect another SQL task and add the following TSQL:

UPDATE SSIS_FILE_LOAD_LOG SET SSIS_AUDIT_OUTCOME = 'SUCCESS', SSIS_AUDIT_DATE = GETDATE() WHERE SSIS_FILE_ID = ?

You will need to configure the parameter mapping to use the Audit_ID variable. Add this on the success of the data flow task, then create another Execute SQL task to act as the update when the process fails – the same statement with ‘SUCCESS’ changed to ‘FAIL’, as shown below. Connect both the script task and the data flow task to the fail update with logical OR failure constraints. And with that we have a working Excel processor that will work for any number of files.
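For reference, the failure branch runs the same statement with the outcome flipped – a minimal sketch:

UPDATE SSIS_FILE_LOAD_LOG
SET    SSIS_AUDIT_OUTCOME = 'FAIL'
       , SSIS_AUDIT_DATE = GETDATE()
WHERE  SSIS_FILE_ID = ?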

[Screenshot: the completed package]

Thanks for reading.

Download all files here.

 

SSRS Inception – Part 2


Hi everyone, this is the second part of this series looking at creating a monitoring solution for SQL Server Reporting Services. The second instalment looks to cover the following:

  1. The process of backing up your existing SSRS instance
  2. The structure of the SSRSDB

The process of backing up your existing SSRS instance

My personal preference for getting data from the ReportServer database to somewhere you can work with it, without causing issues on your production server, is to take a backup and restore it to another database. I have called this ReportServer_Clone (original, right! :P).

The process will be as follows:

  1. Take a backup of the ReportServer – you don’t need the ReportServer_TempDB.
  2. Restore that elsewhere
  3. Load the ReportDW with your data – Via SSIS.
  4. Run the usage reports

You might be asking: why can’t I just query the SSRS ReportServer directly? Well you can – but at your own risk. The ReportServer is a temperamental old gal. You can cause a whole lot of issues by making just the smallest of changes outside of Report Manager.

As I have stated, the backup and restore process is just my preference; it is one of many options available for getting your data out of ReportServer. You might choose to use transactional replication of the key tables (refer to part one for a list of key tables). Other alternatives would be to use an HADR option – AlwaysOn availability groups or mirroring with snapshots. Any of these will work well for you. Personally, for the size of my environment, I opted for a simple backup and restore.
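If you want a starting point, the nightly copy can be as simple as the two statements below. This is a sketch only – the backup path and the logical/physical file names are illustrative, so check yours with RESTORE FILELISTONLY before restoring:

BACKUP DATABASE ReportServer
TO DISK = N'D:\Backups\ReportServer.bak'
WITH COPY_ONLY, INIT;

RESTORE DATABASE ReportServer_Clone
FROM DISK = N'D:\Backups\ReportServer.bak'
WITH REPLACE
     , MOVE 'ReportServer' TO N'D:\Data\ReportServer_Clone.mdf'
     , MOVE 'ReportServer_log' TO N'D:\Data\ReportServer_Clone_log.ldf';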

The process will back up the ReportServer and restore it to your server of choice; you can do this locally or push the backup to a different server. Once you have a restored version we will add a view and then persist that view to a table. The view we will add is from Bret Stateham (B). Bret has a fantastic CodePlex project that does a lot of the hard work for you; I have made only very minor amendments to it. Bret, if you’re reading this: thank you for your fantastic scripts. You can find Bret’s CodePlex project here: SSRS queries.

Now that we have a working restored copy of the ReportServer we will want to look at the data SSRS is generating. The easiest way to do this is to query the ExecutionLog3 view. This is the monitoring that ships out of the box from Microsoft; it gives you a very basic view of what is happening in your environment, and Bret has expanded on it with ‘SSRS Queries’. For me and my organisation, the users wanted something in a style they understood – they already use SSRS, so why not use SSRS as the presentation layer? Before we get into the presentation layer, though, we need to touch on where the data is going to be stored. As a data warehouse developer, creating a DW seemed the best option.

The structure of the SSRSDB

There are many factors that went in to the decision to model this data as a DW. The key reason is to capture history.

ReportServer will only hold the history for 60 days. By modelling this into a DW we can keep the history for as long as we feel necessary. If your ReportServer has become very large and you are experiencing performance issues, or possible issues with storage, then you could reduce the amount of history stored to 14 days and let SSRSDB handle the storage on a larger, slower disk – or even Azure.

There are so many moving parts in SSRS, and SSRS changes all the time. An example of change might be an alteration to the TSQL of a query, report styling, users who have access, new columns added, columns removed, a change of shared data source, etc. There is an awful lot of information inside ReportServer that is lost when changes happen. Does this scenario sound familiar? A user calls you and claims they had access to a report last week but can’t open it now. How do you prove that they did have access? Simple answer: track their access daily.

In part one of this series we looked at the main tables in the ReportServer. The ReportServer hinges around a central table – the execution log (ExecutionLogStorage). When modelling this into a DW I was working towards a standard Kimball star schema. If you have read the Kimball Group’s books then you might be familiar with their 4-step process for defining a DW.

  1. Select the business process – We want to monitor the execution of SSRS reports.
  2. Declare the grain – We will monitor at the per-execution grain.
  3. Identify the dimensions – Calendar, User, Item etc.
  4. Identify the facts – Report executions.

Now we have the 4 steps defined we can model them into our star schema.

[Diagram: SSRSDB star schema]

We have a few different types of dimensions and facts in this project: standard slowly changing dimensions and also a collection of role-playing dimensions. Role-playing dimensions are used to simplify the model where we have many different dates associated with an execution/item – when the item was executed, when it was created, when it was last updated etc. We could have physically created multiple versions of the calendar dimension, however that would have involved maintaining multiple versions – and I really hate doing more work than I need to! So we have a series of views based on the calendar dimension.

To capture change we will be implementing type 2 slowly changing dimensions – apologies to some if it feels like I am teaching you to suck eggs. For those who are unfamiliar with Kimball DW design, a slowly changing dimension tracks history over time and comes in several flavours; you can read more about them here. SSRSDB uses predominantly type 2 SCDs. A type 2 SCD captures history by adding a new row for every change. To ensure we know which rows are current we also typically add a current flag (a bit/boolean value) and active-from and active-to dates. This allows you to quickly see which dimension attributes reflect the production system while maintaining history, and allows us to track user movement.
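To make that concrete, a type 2 user dimension might look something like the sketch below – the column names here are illustrative, not the exact SSRSDB definitions:

CREATE TABLE Dim_User (
    User_Key INT IDENTITY PRIMARY KEY   -- surrogate key
    , User_ID UNIQUEIDENTIFIER          -- business key from ReportServer
    , User_Name NVARCHAR(260)
    , Is_Current BIT NOT NULL DEFAULT 1 -- current flag
    , Active_From DATETIME NOT NULL
    , Active_To DATETIME NULL           -- NULL while the row is current
);

When an attribute changes, the current row is closed off (Is_Current set to 0 and Active_To stamped) and a new row is inserted.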

We also have multiple types of fact table. Fact_Executions is a standard transactional fact table: each row represents a row in the execution log tables in the ReportServer. Alongside this there are also 2 factless fact tables to monitor changes in report structure and user permissions. Now that we have our SSRSDB model we can begin to design a solution to load it with data from the ReportServer_Clone database.

We will look more at the ETL phase of SSRSDB in part 3.

Thanks for reading!

SQLBits 2015 – Super Heroes!

[Image: sketch by Pyramid Analytics]

(Thanks to Pyramid Analytics for the amazing sketch – They accurately captured my physique!)

Before I get into making all those who were unable to attend SQLBits peanut butter and jelly, I would first like to thank all the organisers, speakers and the wonderful helpers! Without them the conference would not be possible, or at least it wouldn’t run smoothly! A special thanks to Annette Allen for running the helpers (hard!) and for allowing me to be part of such an awesome bunch of helpers – I don’t want to say that the helpers this year were the best ever, but we were pretty damn awesome!

For those of you who do not know, SQLBits is the largest SQL Server conference in Europe, with more than 1500 attendees. For the last few years SQLBits has been a 3-day conference, with day 1 being a pre-conference training day, followed by 2 days of hour-long sessions. This year the SQLBits organisers opted to add another day to the bill in the form of a mixed pre-conference training day / session day. I really like this idea as the conference is always over in a flash (pun intended).


Day 0: aka “Travel day” – How many bags!
I am based in the UK so didn’t quite have the same journey times as other attendees (who came from as far as America and South Africa). My journey time was a little under 5 hours by taxi, train, tube and then the DLR. I was a helper last year and frankly I loved it! You get a completely different experience when wearing a green shirt – I recommend it to you all. I was helping again this year, so was eager to get to the ExCeL centre early to help out. Upon arriving at the ExCeL it was clear that SQLBits was not a little fish any more. If you have not been to the ExCeL before, it is MASSIVE! By my estimate it is half a mile long (which made finding a vending machine fun!). If that isn’t enough, there were 2 other conferences and a movie being filmed at the same time – massive! Over the 4 days I racked up 30 miles on my Fitbit. Anyway, I digress.

If you are under any idea that running a conference is easy, take a look at the amount of bags that needed to be packed (~1800). Sounds like a lot of work but it is made easy by amazing company! Upon arriving I was able to catch up with old friends and make some new ones! As always there was beer and pizza for those who helped (always a good bribe for IT nerds!).

After the bag packing we headed to the Aloft (the hotel connected to the ExCeL) for a well-deserved catch up and some drinks. My junior developer came with me this year, which was awesome – he has had to listen to me talking about Bits for the last year, and this year he got to be there. A particular highlight of the evening for me was spending some time chatting to Itzik Ben-Gan about TSQL – if you don’t know Itzik then you should check out his books (http://www.amazon.co.uk/Itzik-Ben-Gan/e/B001IGQENW).

[Photo: conference bags]

Day 1: Pre-conference day & the boat ride
After a 06:15 wake-up call I was off to the ExCeL for day 1 of SQLBits. The weather was brilliant, crisp and clear, which made for a very pleasant walk from my hotel to the ExCeL. Helper briefing at 07:30 was followed by an hour and a half of checking in lovely attendees. I was monitoring Adam Jorgenson and Bradley “King of costume changes” Ball’s “Optimising your company’s data lifecycle” session at 09:00. This was a great session with a real mixture of content: DBA, BI and machine learning. I particularly enjoyed Bradley’s many costume changes!

Train hard, party harder should be the tag line of SQLBits! Following a day of training was the SQLBits boat tour of the Thames. How do you get 300 SQLBits attendees from the ExCeL to the boat? Squeeze them all in the DLR! This was a lot of fun!

This was a great evening and I got to see London from a different perspective. Back to the hotel for an early night!

[Photo: the boat trip]

Day 2: Standard sessions & the pub quiz
Another 06:15 start for another 07:30 helper briefing. This was a half pre-conference training and half general sessions day. I was too slow with my selection and missed out on some amazing sessions. My junior attended David Peter Hansen’s SSIS internals pre-con, which I would have loved to attend – hopefully I will have the chance to see it again in Exeter in April (http://sqlsatexeter.azurewebsites.net/). I spent a lot of the day hanging around the community area chatting to folks about user groups and SQL Saturday conferences – needless to say I had a blast! You guys are amazing!

When I did get to a session I attended the following:

Deadlocking for mere mortals – Jonathan Kehayias. Great session. I love deadlock graphs!
Data loading performance presentation – Matan Yungman. Matan is a gent! Great session, great speaker, great podcast!
Azure SQL DB – Nigel Ellis – Azure SQL DB 12 is funky! Really great session!

This was a very DBA-flavoured day for me. As a BI developer I typically tend to stick with what I know, which in the past has been really limiting. Stepping out of my comfort zone and expanding my knowledge in a different area allows me to understand more about the engine, which in turn helps me write better queries. By only attending 3 sessions I missed a lot of great content, but not to worry – SQLBits puts all its content online after the event! There are almost 12 conferences’ worth of content on their website: hundreds of hours of expert training, all FREE!

I have attended the last 3 SQLBits and every year I have been incredibly envious of the teams who win all the amazing SQLBits prizes. This year there were so many awesome prizes on offer, ranging from t-shirts to Blu-ray boxsets and large Lego sets. My team this year came 3rd (I think), which meant we missed out on the prizes. 🙁 Nevertheless it was a great quiz!

Day 3: Standard sessions, the Cow note (as it will hence be known) & the super party!
As a helper I spent the Friday morning posted downstairs directing people to registration and the keynote. The keynote was interesting. This year it was delivered by Rohan Kumar – Director of Software Engineering at Microsoft. In true superhero style Rohan appeared from the Batmobile! Did he go on to tell us about SQL Server vNext? No! We learnt about wi-fi cows. There were some genius tweets going back and forth throughout the session.

[Image: tweets from the keynote]

After the Cow note, I got back to attending sessions.

Practical T-SQL Tips and Tricks to Coding – SQL Sentry. I got to see some great tools in action. Plan explorer is great!
Query Tuning Mastery: Clash of the Row Goals – Adam Machanic. Adam is a great speaker! You don’t write sp_Whoisactive without knowing your stuff!
Deep into Isolation levels – Boris Hristov. Another great session. I love Boris’ Google hangouts, so it was great to meet him in person.
Indexing Internals – Denny Cherry. At this point in the day my brain had melted! Denny always delivers a great session!

After the last session it was a quick dash back to the hotel before the party. SQLBits is known for its amazing parties! In 2013 I was dressed like a dragon, 2014 saw me head-to-toe as a steampunk gent, and 2015 was superhero themed. I LOVE superheroes! I must have decided and then un-decided 50 times before settling on my costume idea. I am a massive fan of Adventure Time, so this year I was Finn! Unfortunately 95% of SQLBits attendees had never seen Adventure Time. I ate, drank, partied, rode a zip line and even battled aliens with Steve Jones. Yet again this was an amazing event.

[Photo: the party]

Day 4: Saturday sessions & the trip home 🙁
Saturday is the FREE community day and is chock-full of content. Sessions started at 08:15 and ran until 18:00 – pretty full on!

I attended the following sessions:

How to decide if your database is a good candidate for Virtualisation – Denny Cherry
Analytics Master Class – exploiting exotic patterns in data – Mark Whitehorn. This session was the best session at SQLBits and possibly the best I have attended at any conference. Mark is a professor at Dundee University and his teaching style was spot on. The content was tricky but demonstrated in an accessible way. After a regular session there are typically a few people left asking questions; testament to Mark, half the room were waiting to ask more. I really hope you all watch this session when it is available.
The BI Power Hour – Matt Masson. They say what happens in the Power Hour stays in the power hour – It was awesome 😛
Introducing the Reimagined Power BI Platform – Jen Underwood. Power BI is great!

And with that, SQLBits was over for another year and this year was great! I had a blast and met some really amazing people – especially those who knew me through this blog!

See you all next year!

SSRS Inception – Part 1


Hi everyone,

This is the first part of a series of blogs looking at a session I am delivering at SQL Saturday Exeter on the 25th of April 2015. The session is titled “SSRS Inception: Reporting on Reporting Services”. In this session I hope to give you the necessary tools and information you need to begin monitoring SSRS.

Why should we monitor SSRS? 

In SSRS there is no obvious way to see which reports are being used and how often. If you have a large SSRS environment this can have an impact on how your users consume your reports. How do you know if no one is looking at a report because it takes too long to run? With this in mind I went about developing a way to see which reports are being used, how often and by whom.

How to monitor SSRS

Before we begin Microsoft has a warning about playing with the SSRS database:

The table structure for the databases is optimized for server operations and should not be modified or tuned. Microsoft might change the table structure from one release to the next. If you modify or extend the database, you might limit or prevent the capability to perform future upgrades or apply service packs. You might also introduce changes that impair report server operations… All access to a report server database must be handled through the report server. [1]

OK, so it is clear Microsoft do not want you to mess with your production instance of SSRS, and if you do, your instance might no longer be supported by Microsoft. You have been warned. But… that doesn’t mean we can’t play with a replicated version and leave the primary relatively untouched. To do this you have a few options: you could use transactional replication, log shipping, restoring a nightly backup or something different – I leave this part up to you. It really depends on how often you want your data to refresh. For my purposes a nightly run is fine.

But before we get into how we can do this, let’s first have a look around the tables which comprise the SSRS DB.

SSRS database architecture 

I am using SQL Server Reporting Services with SQL Server 2012. When you configure SSRS you’re prompted to create a database to hold the definitions of your reports. When you connect to SQL Server you’re able to see that database listed, along with its TempDB.

[Screenshot: the ReportServer databases in SSMS]

If you expand this database you will see 34 tables. These tables are used to manage report items, subscriptions, schedules, report snapshots, system properties, data sources, datasets, symmetric keys and the execution log details – which is where most of this session is focused.

To support monitoring our executions we are going to need data from the following tables:

Catalog – The catalog holds all the juicy metadata about all the objects stored in SSRS.
Datasets – Info about published shared datasets.
DataSources – Info about published shared data sources.
ExecutionLogStorage – This is where the magic happens: the who, how, when and why. All your questions are answered here!
ModelItemPolicy – Bridge between Catalog and Policies.
Policies – Used as part of the SSRS security
PolicyUserRole – Used as part of the SSRS security
ReportSchedule – Bridges Catalog, Schedule & subscriptions together.
Schedule – Details about when and how often a report is set to auto generate.
Subscriptions – Who is subscribed to receive a report.
SecData – All the detail about who has access.
Users – Who has access.

Before we go on, there are some caveats to consider before you start working with the SSRS DB.

1. Firstly, a lot of the content is stored as XML. To get at all the best bits of data you need to be able to write basic XML queries (XPath) – there is a small example after this list.

2. A lot of what you find in the SSRS DB has very little documentation. For example, there are tonnes of integer values for attributes like type, but there is no reference/enum/lookup table for them – this is common for almost all the tables. Where this happens I have done my best to outline what each value relates to.

3. The SSRS DB will only store execution data for as long as it is configured to; by default that is 60 days – so when you start this process it is unlikely you will be able to mine anything more than 60 days of executions. This can be altered by logging on to the SSRS DB and updating the ConfigurationInfo table to set ExecutionLogDaysKept to something greater than 60 – depending on how many users and reports you have, something like 120 is fine.

UPDATE  ConfigurationInfo
SET     Value = '365'
WHERE   Name = 'ExecutionLogDaysKept'
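As an example of the first caveat, report definitions live in the Catalog table’s Content column, which is stored as an image type, so it has to be converted before you can query the XML. The query below is an illustrative sketch rather than part of the SSRSDB scripts – run it against your restored clone, not production, and verify the type codes against your own instance:

SELECT  Name
        , CONVERT(XML, CONVERT(VARBINARY(MAX), Content)) AS Report_Definition
FROM    ReportServer_Clone.dbo.[Catalog]
WHERE   Type = 2 -- 2 = report; one of those undocumented integer values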

As the SSRS DB only keeps executions for a short amount of time (unless that has been changed), I have modelled this project as a data warehouse – in the Kimball style. We essentially have a set of dimension tables such as Dim_Catalog and Dim_User, along with the associated Dim_Date and Dim_Time dimensions. This follows the basic Kimball star schema and is loaded as often as you need it to be, using SSIS. As part of the scripts section there is a script to create a job that will execute the SSIS package. The SSIS package isn’t very complicated and I will go into more detail about it in the next blog.

At the end of this series of blogs I will make the scripts & packages available to download. Included will be everything you need to set up a clone of your SSRS DB, extract data from that, build where the data is to be loaded (the SSRSDB) and all the views for interrogating the data warehouse. Also included are a set of reports that will allow you to manage your reporting server. There is a lot more built in to this that I will also blog about and make available.

Within the SSRSDB there are additional tables that allow you to create a metadata-driven layer on top of your reports. This requires you to load information about each report into the ReportServerDW; to help you do this, an Excel file is also included in the download. Once this has been populated, a job can be created (again, included in the download) that will notify the users who asked for a report to be created (via email) that their report has not been used for X days, and give them the option to save it. If the user fails to save their report it is either removed or hidden. This is configured per report in the Excel document.

Another pretty cool thing this project can do is act as source control for your reports. OK, it isn’t really source control because it doesn’t allow all the nice things like branching, however it will store a version (in XML) of each iteration of a report that is uploaded. I will also include an SSIS package that will export all the historic versions of a report.

Think this is neat? Me too. In the next instalment we will look more into the data being stored and the SSIS package which is used to load it.

Thanks for reading.

[1] https://msdn.microsoft.com/en-gb/library/ms156016.aspx

Presenting at SQL Saturday Exeter 2015 (YAY!)


This year I have 3 new year’s resolutions.

  1. Start a blog – Done! You’re reading it 😛
  2. Gain the MCSE in both BI and the data platform – Working on it.
  3. Start presenting at user groups and conferences – Read on

I have been attending conferences for about the last 3 years. SQL Saturday Exeter in 2013 was the first database conference I attended. I had no idea what to expect; I remember thinking that everyone there would be a database guru and I would be so out of my depth. However, this was not the case. SQL Saturday Exeter is run by the SQL South West UG, which at the time I had only been attending for about 4 months. The event was a great success! I learnt tonnes, and it sparked an interest that just keeps getting bigger – presenting. To date I have delivered lightning talks at SSW but nothing longer than 15 minutes. Since then I have attended about 10 conferences and I always learn loads!

Now in its third year, SQL Saturday Exeter is running again on the 24th and 25th of April. There is a paid-for training day on the Friday and a free day on the Saturday (click here to see the brilliant selection of sessions). I am somewhat biased when it comes to SQL Saturday Exeter as I am co-organising the event, which meant I was unsure whether I would have enough time to deliver a session. However, I bit the bullet and submitted 3 sessions (on PowerPivot, TSQL, and reporting on SQL Server Reporting Services). The last one was selected! YAY!

So on the 24th of April at 15:00, in Exeter, you can come and see me present for the first time at a conference on “SSRS inception: Reporting on Reporting services” – Abstract below.

Still using Reporting Services? Yeah, me too! Does this scenario sound familiar? You already have a large SSRS environment but the users still want more reports. You create a new, all-singing, all-dancing dashboard that removes the need for a bunch of reports – awesome. You launch it and explain to the business that you are deprecating the old reports. “WAIT! We still need that for Jill in HR to do her end-of-month”, or “We might still need that” – these types of comments are what lead to having a load of reports that are not being used.

Wouldn’t it be great if you could see who is viewing a report, in what format and how often, all inside your current SSRS environment? With this data wouldn’t it then be even better if we could automate a process to email the owner of that report to say “HA! I told you no one looked at this report!”. In this session we will look at how we can do all of this and more by mining the SSRS DB.

This session came about when I started working in my current role. I joined a company with a lot of SSRS reports (>150 – a mixture of dashboards and operational reporting). I was sure there was no possible way that all of these reports were actually being used, so I set about making something that would prove me right. This has now been in production for about 5 months.

So what will you learn from this session?

  • You will learn about the SSRS DB and what it holds.
  • How to mine the SSRS DB for gold.
  • How to do this in a way that does not upset Microsoft.
  • A little bit about data warehousing.
  • And a little bit of SSIS.

A lot has gone into this project and I want it to be easy to use and understand, so over the next few weeks, before the session, I will write blogs covering how each part is created. All scripts, SSIS packages and the slides will be available to download before the session.

I am really looking forward to delivering my first conference session. I hope you like it and I look forward to seeing you in April.

If you cannot make it in April, I will also be presenting this session at the SQL South West UG on the 16th of February, and at the Cardiff UG on the 31st of March.

Exeter, UK. Leading the way in Big Data


In recent months Big Data seems to have hit Exeter. With the announcement of a 97 million pound investment in a new big data “Super Computer”, the Exeter-based MET Office is set to make the South West a “world-leader in weather and climate change” and Big Data [1].

This increased attention on the South West seems to have prompted a surge of Big Data related training and conferences.

Unlocking Big Data – Investing in Human Capital 28/01/2015 [2]

The aim of the event is to help the Heart of the South West realise its Big Data potential by shaping the conversation and coming together to define the focus for our investment in Big Data Human Capital. It will also discuss possible funding proposals to the European Structural and Investment Fund Strategy.

There are 5 sessions tackling subjects such as building a Big Data team and the Big Data revolution. This is a free to attend conference held at the MET Office on the 28th January, and is organised by the Heart Of The South West (HOSW). The HOSW are expecting this event to be well subscribed so register early. Register here. [2]

Date: 28/01/2015
Start time: 08:45
Location: MET Office

Data visualisation workshop – 28/01/2015 [3]

The workshop is designed to encourage collaborations and discussions between bioscientists, mathematicians, systems biologists, physicists, geographers, medics, engineers, psychologists, computing scientists and anyone else with an interest in visualising large and/or complex data.

Unfortunately this workshop is on the same day as “Unlocking Big Data”. This workshop is being organised as part of Exeter University’s Big Data Challenges Series and The Exeter Imaging Network. This workshop is designed to promote cross-disciplinary networking regarding the challenges of visualising and interpreting data.

You can find details here.

Date: 28/01/2015
Start time: 13:30
Location: Exeter University Washington Singer 219

ExIStA-YSS: Early-Career Stats Showcase – 11/02/2015 [4]

The evening will begin with a series of eight-minute talks from eight speakers whose professions span the statistical spectrum, giving a unique insight into the variety of different careers that are available, and a flavour of the current challenges and future trends within each area.

Following on from the data vis workshop, Exeter University will be hosting an evening showcasing the work of early-career statisticians and of those working with statistics in the South West. The evening will feature 8-minute talks followed by snacks. Free to attend. Register via their website [4].

Date: 11/02/2015
Start time: 17:00
Location: Exeter University, Meeting Rooms 1-3 of the Forum, on the University of Exeter’s Streatham campus

SQL Saturday Exeter 2015 – 24/04/2015 & 25/04/2015 [5]
(Disclosure, I am an organiser of SQL Saturday Exeter).

Now in its third year, SQL Saturday Exeter is a community-driven, free-to-attend event covering a huge variety of topics for every type of data professional. Sessions have not been announced yet, but in recent years session topics have included parallel data warehousing, data visualisation, and R & Hadoop for Big Data.

24th April – Friday pre-conference training day. Whole-day training on one of a number of sessions from world-class SQL Server professionals. Sessions have not been announced yet; for more details follow #SQLSatExeter on Twitter or via the SQLSouthWest website [5].

25th April – SQL Saturday. Approximately 30 training sessions over 4 tracks covering all areas of working with SQL Server. Speakers will be a mixture of MVPs, international speakers and people making their first appearances as speakers. Admittance to the Saturday event is free; all costs are covered by donations and sponsorship.

Please register soon as attendance is limited and tickets go fast!

Register here [6]

Date: 24/04/2015 & 25/04/2015
Start time: 08:00
Location: Jurys Inn, Exeter.

With 4 events in 4 months, Exeter is storming ahead in the Big Data sphere. If you’re having a big data event let me know in the comments.

[1] http://www.metoffice.gov.uk/news/releases/archive/2014/new-hpc
[2] http://www.eventbrite.co.uk/e/unlocking-big-data-investing-in-human-capital-registration-14775029511
[3] http://www.exeter.ac.uk/news/events/details/index.php?event=2562
[4] http://www.exista.org/events/exista-yss-showcase/#d.en.427409
[5] http://sqlsouthwest.co.uk/sql-saturday-372/
[6] https://www.sqlsaturday.com/372/eventhome.aspx

PIVOT! – Manual Pivot and the PIVOT function

[Image: “PIVOT!”]

(Whenever I hear PIVOT this is what I think of, PIVOTTTT!!!)

Right, to start this series of posts I wanted to write about a method of manipulating data I refer to as a “manual PIVOT”.

For those who have used the TSQL PIVOT you will know it enables you to transpose data like you can in Excel (rows become columns, etc.). For those who don’t already use it, have you ever looked at the syntax in TSQL? It is horrible and almost impossible to remember (see below).

SELECT <non-pivoted column>,
    [first pivoted column] AS <column name>,
    [second pivoted column] AS <column name>,
    ...
    [last pivoted column] AS <column name>
FROM
    (<SELECT query that produces the data>)
    AS <alias for the source query>
PIVOT
(
    <aggregation function>(<column being aggregated>)
FOR
[<column that contains the values that will become column headers>]
    IN ( [first pivoted column], [second pivoted column],
    ... [last pivoted column])
) AS <alias for the pivot table>;

You can however create something similar using conditional logic.

If you’re using a version of SQL Server before SQL Server 2012 then you can use a CASE expression; on SQL Server 2012 or later you can use either a CASE or the newer inline IF (IIF). If you are unfamiliar with the CASE expression, the syntax is as follows:

CASE WHEN Expression THEN Result WHEN Expression THEN Result ELSE Result END as x

Each “Expression” is a boolean predicate; the first WHEN whose expression evaluates to TRUE determines the result returned, and the ELSE covers the remaining rows.
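For completeness, IIF is just shorthand for a two-branch CASE. A quick illustrative example (not tied to the queries below):

-- IIF(condition, value_if_true, value_if_false) is available from SQL Server 2012
SELECT IIF(1 = 1, 'Yes', 'No') AS Result;
-- equivalent to: CASE WHEN 1 = 1 THEN 'Yes' ELSE 'No' END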

For all of these examples I will use the AdventureWorks2012 database. If you don’t have this you can grab it from CodePlex

use AdventureWorks2012;
GO
 
SELECT CustomerID, OrderDate, TotalDue FROM Sales.SalesOrderHeader ORDER BY CustomerID

[Screenshot: query results]
The query above produces a simple list of what each customer has spent, ordered by CustomerID. What if we want to know how much each customer has spent each year? Well, we could write something like this:

SELECT  CustomerID
		, YEAR(OrderDate) as 'OrderYear'
        , SUM(TotalDue) AS 'OrderTotal'
FROM    Sales.SalesOrderHeader
GROUP BY CustomerID, YEAR(OrderDate)
ORDER BY CustomerID

[Screenshot: query results]

This is fine, but what if the requirement was to have the years as columns so we can see the spend trend? This is where we can use conditional logic to create a manual pivot. Using CASE and YEAR(OrderDate) we can check which year each sale relates to.

SELECT  CustomerID
        , CASE WHEN YEAR(ORDERDate) = 2005 THEN (TotalDue) END AS '2005_OrderTotal'        
		, CASE WHEN YEAR(ORDERDate) = 2006 THEN (TotalDue) END AS '2006_OrderTotal'
        , CASE WHEN YEAR(ORDERDate) = 2007 THEN (TotalDue) END AS '2007_OrderTotal'
        , CASE WHEN YEAR(ORDERDate) = 2008 THEN (TotalDue) END AS '2008_OrderTotal'
        , CASE WHEN YEAR(ORDERDate) = 2009 THEN (TotalDue) END AS '2009_OrderTotal'
FROM    Sales.SalesOrderHeader
ORDER BY CustomerID

[Screenshot: query results]

The problem with this is that we have multiple rows for each customer. Where a customer has spend in 2005, all the other years on that row will show NULL. As we are looking for the total amount spent in each year, we can simply SUM these totals and GROUP BY the CustomerID.

SELECT  CustomerID
        , SUM(CASE WHEN YEAR(ORDERDate) = 2005 THEN (TotalDue) END) AS '2005_OrderTotal'        
		, SUM(CASE WHEN YEAR(ORDERDate) = 2006 THEN (TotalDue) END) AS '2006_OrderTotal'
        , SUM(CASE WHEN YEAR(ORDERDate) = 2007 THEN (TotalDue) END) AS '2007_OrderTotal'
        , SUM(CASE WHEN YEAR(ORDERDate) = 2008 THEN (TotalDue) END) AS '2008_OrderTotal'
        , SUM(CASE WHEN YEAR(ORDERDate) = 2009 THEN (TotalDue) END) AS '2009_OrderTotal'
FROM    Sales.SalesOrderHeader
GROUP BY CustomerID
ORDER BY CustomerID

[Screenshot: query results]

This now displays one row for each customer, with the amounts spent between 2005 and 2009 as columns.

This method works for all sorts of use cases and isn’t restricted to mathematical calculations. You can use it, for example, to return the top 3 products sold to each customer as columns.

; WITH RankedProducts AS 
(
SELECT  CustomerID
		, p.Name
		, SUM(OrderQty) OrderQty
		, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY SUM(OrderQty) DESC) OrderRank
FROM    Sales.SalesOrderDetail
		INNER JOIN Sales.SalesOrderHeader ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID
        INNER JOIN Production.Product p ON p.ProductID = Sales.SalesOrderDetail.ProductID
GROUP BY 
	CustomerID
		, p.Name
)
SELECT CustomerID
	, MAX(CASE WHEN OrderRank = 1 THEN Name ELSE NULL END) AS 'ProductSales1'
	, MAX(CASE WHEN OrderRank = 2 THEN Name ELSE NULL END) AS 'ProductSales2'
	, MAX(CASE WHEN OrderRank = 3 THEN Name ELSE NULL END) AS 'ProductSales3'
FROM RankedProducts
GROUP BY 
	CustomerID

[Screenshot: query results]

This works by applying a MAX to the product Name based on the OrderRank. OrderRank is calculated using a window function – for more on window functions, see an upcoming post in this series. MAX works here because aggregate functions ignore NULLs, so for each rank the product name wins out over the NULLs produced by the other rows.

You can use this approach to work out all sorts of scenarios.
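And since this post started with the PIVOT operator’s syntax, here is the first yearly-totals example rewritten with the built-in PIVOT function for comparison – a sketch against the same Sales.SalesOrderHeader table:

-- Same result as the manual pivot, using the PIVOT operator
SELECT  CustomerID
        , [2005] AS '2005_OrderTotal'
        , [2006] AS '2006_OrderTotal'
        , [2007] AS '2007_OrderTotal'
        , [2008] AS '2008_OrderTotal'
        , [2009] AS '2009_OrderTotal'
FROM    (
            SELECT CustomerID, YEAR(OrderDate) AS OrderYear, TotalDue
            FROM   Sales.SalesOrderHeader
        ) AS src
PIVOT   (
            SUM(TotalDue) FOR OrderYear IN ([2005], [2006], [2007], [2008], [2009])
        ) AS pvt
ORDER BY CustomerID;

Both produce the same output; the CASE version just tends to be easier to remember and extend.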

Thanks for reading.

Let’s get started

So, I have been meaning to start blogging for the last few years. I love TSQL and I really enjoy solving tricky queries using a combination of TSQL tricks. So this blog will focus on those topics, as well as BI and DBA related fun.