Over the last 2 days I have been partaking in lots of data related fun! On Friday 26th June I attended the ODI Devon’s first conference “Beyond the smart city” #btsc15. It was a great event. As part of this Myself and the SQL South West team were asked to do some technical sessions on Saturday (27th). I love getting out there and meeting new people so jumped at the opportunity. The session was aimed at individuals who might not know the MS BI stack very well, or someone who knew what there was but possibly not how to use it – No deep dives, etc.
My session was about exploring “Open data”. To do this I used a variety of Microsoft tools (Excel, PowerPivot, Azure, Azure SQL Database, SSIS and Power BI). I wanted to use this blog post the walk you through how I went about preparing for this session and as a reference for those who attended the session. I had one goal, to demonstrate how anyone can quickly “Ask questions about their data”.
So what did we do?
- We looked at where we can get open data
- We analysed that data in Excel using pivot tables, answering basic questions
- We wanted more data and somewhere to store it so we created an Azure SQL database
- We loaded the data into Azure using SSIS
- We analysed the data further in PowerPivot and built a model, but it wasn’t good enough
- We imported out PowerPivot model in to Power BI and started asking questions of our data
Let’s look at that in a bit more detail.
To start off I looked at where to find open data. For my examples I used the open data provided by Exeter council. On Exeter council’s website there is a lot of interesting rich datasets. This included property listings owned by ECC and also their spend over the last 2 years. I used both of these datasets as source data. Much like most open data, you get what you pay for. There was a small issue with the data, it in separate files – there were 15 files for the last 15 months.
Sure you can use Excel, and copy and paste each file in to one long list, but what if you want that stored somewhere you can quickly get to, or somewhere you can share that data with colleagues? You guessed it, we need a database to store this data. This session was designed for anyone to attend and the replicate at home. I didn’t want attendees to have to download SQL Express, and a lot of what I am demonstrating is cloud based, so Azure was the obvious choice.
So we needed a database in Azure (for this demo I am using the newer portal V2). Head over to https://portal.azure.com/ and create an account. Once there complete the following steps to spin up a brand new SQL database.
- Open the market place and search for SQL Database
- Hit create
- Give your database a name and configure the settings
- Server – Create a new server as required.
- Choose you pricing model
- Configure further as required
- Hit create
Within 40 seconds you have a working database in the cloud. If you have trouble logging in to the database you may need to alter the firewall settings to allow the IP address you’re connecting from.
Now we have our functioning database we can begin to load it with data. To do this we first need to create a few tables to store the data.
First we create the spend data:
CREATE TABLE [dbo].[SpendData]( [Body Name] [varchar](50) NULL, [Trans Date] [datetime2](7) NULL, [Supplier Name] [varchar](50) NULL, [Trans Ref] [int] NULL, [Net Amount] [decimal](15, 2) NULL, [Expense Description] [varchar](50) NULL, [Expense Code] [int] NULL, [Service Code] [varchar](50) NULL, [BVACOP] [varchar](50) NULL, [Category] [int] NULL )
Then the building location data
CREATE TABLE [dbo].[BuildingLocations]( [Uprn] [varchar](50) NULL, [AssetRef] [varchar](50) NULL, [AssetName] [varchar](1000) NULL, [Postcode] [varchar](50) NULL, [Easting] [decimal](18, 0) NULL, [Northing] [decimal](18, 0) NULL, [InterestType] [varchar](50) NULL, [AssetType] [varchar](150) NULL, [AssetUse] [varchar](150) NULL )
I also used a script from codeproject to dynamically create a date tally table (These are so important. If you don’t have a numbers table and a dates tally then go and make one now) http://www.codeproject.com/Articles/647950/Create-and-Populate-Date-Dimension-for-Data-Wareho. Now we have 3 tables, 2 empty and 1 with 20 years of dates. Let’s load the missing data.
To do this I have used SQL Server Integration services – SSIS. SSIS is Microsoft’s ETL tool for moving and transforming data. The package is designed to be run several times to enable this the first step is to TRUNCATE the existing tables. After that we want to loop over all the files (I have simply pulled all the data from the ECC website) then a final transfer for the building data. The more data you have the more that will be imported – just keep adding the spend data to the folder. This SSIS package is available to download at the bottom of the page.
Once the data was loaded we headed over to Excel again a loaded the data in to a PowerPivot model. I am using Excel 2013 so PowerPivot is already installed but not enabled. For those using 2010 go and grab the add-in from here. Once installed enable the add-in and open PowerPivot. On the Home ribbon we looked at how to import the data from Azure.
Click on From database
Select SQL Server
Fill in credentials
Now we have our data we build our basic model by using the diagram view to connect the 2 tables.
Now we can ask more complicated questions of our data such as which quarter is the most money spent, we can analyse by month, year and any other column in our date tally. But really this isn’t good enough! Our end user needs to know a lot about Excel to really get what they need. We are creating something for someone who really has no idea about BI tools, maybe you managing director. So to really begin to ask questions of our data we need Power BI, in particular the Power BI Q & A.
At the time of writing, Power BI is still in preview. There is no cost structure in place and you need a corporate account to use it (You can’t use a gmail or live account). Head over to https://powerbi.microsoft.com/ and register yourself.
What we want to do is save our PowerPivot model and suck that up in to Power BI.
- Click on the big yellow “Get Data” button in the bottom left to begin
- From there we want to get data from files
- Select “Local File”
- Navigate to the PowerPivot model
- Hit open
You should see an “Importing data” window
Once that is complete we can start literally asking questions of our data. Hit the pin icon when you’re happy to pin the item to your dashboard. You can start creating interactive reports that link from your dashboard and offer a more detailed view of your data.
That is a very high level introduction to working with data in Power BI. I will do some more blogs on Power BI in the coming weeks, once I get my head around the Power BI API :S
Thanks for reading