Calculating distance with SSIS and the Google distance matrix API

APIHeader
Recently I set up an interactive map in R & Shiny to show the distance between where a customer lives and one of the sites where a customer attends. What I wanted to do was to find out how long it took each person to get to the site using public transport, which proved difficult in R, so i settled for a “as the crow flies” approach using pythagoras. This worked, but since then I have had in the back of my mind a burning question “what if I could used the Google API to give me that distance”. This is what I have now done, and this is how you do it.

Google have many APIs that can be used at little to no cost. The Google distance matrix API allows you to enter a starting position and one or more end positions, and the API will return the distance based on a mode of transport of your choice (walking, driving, transit – public transport). All you need to do is register and get an API key – https://developers.google.com/maps/documentation/distance-matrix/

Once you have an API key you will need to keep hold of this. This key allows you to call the API, no key, no result. If you read the documentation here you can get a few examples of how to use the key. If you change the URL below to include your API key where YOUR_API_KEY is currently you will retrieve the JSON for that journey. By default the GDM is set to driving distance, but this can be changed. Also this is currently JSON, if you would prefer to have the result as XML, change json? for xml?.

https://maps.googleapis.com/maps/api/distancematrix/json?units=imperial&origins=40.6655101,-73.89188969999998&destinations=40.6905615%2C-73.9976592%7C40.6905615%2C-73.9976592%7C40.6905615%2C-73.9976592%7C40.6905615%2C-73.9976592%7C40.6905615%2C-73.9976592%7C40.6905615%2C-73.9976592%7C40.659569%2C-73.933783%7C40.729029%2C-73.851524%7C40.6860072%2C-73.6334271%7C40.598566%2C-73.7527626%7C40.659569%2C-73.933783%7C40.729029%2C-73.851524%7C40.6860072%2C-73.6334271%7C40.598566%2C-73.7527626&key=YOUR_API_KEY

The most basic use is to calculate the distance from A to B. This is what we will look at using SSIS. The API takes in an origin and a destination. These can either be a lat & lng combo or an address which we be translated into Lat & lng when executed. To demonstrate the process I have created a database and loaded it with some sample customer and store data. We have 2 customers and 4 stores. These customers are linked to different stores. You can download the example files at the bottom of the page.

--CREATE DATABASE WebService_Example;
GO
 
-- ###################################################################################
-- Create tables
-- ###################################################################################
USE WebService_Example;
GO
 
DROP TABLE dbo.CustomerAddresses;
CREATE TABLE dbo.CustomerAddresses 
(
CustomerAddressID INT IDENTITY PRIMARY KEY
, CustomerName Varchar(150)
, CustomerAddress VARCHAR(500)
);
GO
 
DROP TABLE dbo.StoreAddresses;
CREATE TABLE dbo.StoreAddresses
(
StoreAddressID INT IDENTITY PRIMARY KEY
, StoreName VARCHAR(150)
, StoreType VARCHAR(20)
, StoreAddress VARCHAR(150)
); 
GO
 
DROP TABLE dbo.CustomerStoreAddress;
CREATE TABLE CustomerStoreAddress
(
CustomerAddressID INT
, StoreAddressID INT
)
 
DROP TABLE dbo.CustomerAddressDistance;
CREATE TABLE dbo.CustomerAddressDistance
(
CustomerName Varchar(150)
, CustomerAddress VARCHAR(500)
, StoreName VARCHAR(150)
, StoreAddress VARCHAR(150)
, Distance VARCHAR(50)
, EstimatedTime VARCHAR(50)
, ModeOfTransport VARCHAR(50)
)
 
 
-- ###################################################################################
-- Load tables
-- ###################################################################################
 
INSERT INTO dbo.CustomerAddresses
        ( CustomerName
        , CustomerAddress )
VALUES
('A Customer', '46 Church Road, Alphington, Exeter, EX2 8SZ')
,('B Customer', 'Flat 9, Landsdown Terrace, Exeter, EX2 4JJ')
 
INSERT INTO dbo.StoreAddresses
        ( StoreName
        , StoreType
        , StoreAddress )
VALUES
('Tesco Extra', 'Vale', 'Exeter Vale Shopping Centre, Russell Way, Exeter EX2 7EZ')
, ('Tesco Express', 'Heavitree', '85 Fore St, Heavitree, Exeter, Devon, EX1 2RN')
, ('Sainsburys Superstore', 'Alphington', 'Alphington Rd, Exeter EX2 8NH')
, ('Sainsburys Local', 'Sidwell', '12 Sidwell St, Exeter EX4 6NN');
 
INSERT INTO dbo.CustomerStoreAddress
        ( CustomerAddressID
        , StoreAddressID )
VALUES
(1,1)
, (1,2)
, (1,3)
, (1,4)
, (2,3)
, (2,4)
 
-- ###################################################################################
-- Query
-- ###################################################################################
 
SELECT 
	CA.CustomerName
	, CA.CustomerAddress AS 'Org'
	, SA.StoreName
	, SA.StoreAddress AS 'Des'
FROM dbo.CustomerAddresses CA
INNER JOIN dbo.CustomerStoreAddress CSA ON CA.CustomerAddressID = CSA.CustomerAddressID
INNER JOIN dbo.StoreAddresses SA ON CSA.StoreAddressID = SA.StoreAddressID

This will build the tables and populate the tables that are used in the SSIS project.

The SSIS project consists of a data flow task (DFT), which contains and OLEDB source, a Derived column transformation (to add the columns which are to be loaded), a script task (C#) and the an OLEDB destination to load the results. There are 2 variables, the mode of transport (walking, driving, transit) and also your API key. Below is the C# required to call the API and shred the XML.

 
#region Namespaces
using System;
using System.Data;
using System.Web;
using System.Windows.Forms;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using Microsoft.SqlServer.Dts.Runtime;
using System.Net;
using System.Xml;
// Make sure you add any missing references
#endregion
 
 
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
 
    public override void PreExecute()
    {
        base.PreExecute();
    }
 
 
    public override void PostExecute()
    {
        base.PostExecute();
    }
 
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
 
        // Create a new web client 
        WebClient wc = new WebClient();
 
        // Select your mode of transport - Edit the var in SSIS
        String TransportMode = Variables.ModeOfTransport;
 
        // Add reference to your API
        String APIKey = Variables.DistanceAPIKey; 
 
        // Build the URL that is to be requested
        String uri = "https://maps.googleapis.com/maps/api/distancematrix/xml?units=imperial&origins=" + Row.Org + "&destinations=" + Row.Des + "&mode=" + TransportMode + "&key=" + APIKey;
 
        // Get the XML
        string geoCodeInfo = wc.DownloadString(uri);
        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(geoCodeInfo);
 
        // Shred the XML
        String Duration = Convert.ToString(xmlDoc.SelectSingleNode("DistanceMatrixResponse/row/element/duration/text").InnerText);
        String Distance = Convert.ToString(xmlDoc.SelectSingleNode("DistanceMatrixResponse/row/element/distance/text").InnerText);
        //String Fare = Convert.ToString(xmlDoc.SelectSingleNode("DistanceMatrixResponse/row/element/fare/text").InnerText);
 
        // Update the outgoing rows. 
        Row.Time = Duration;
        Row.Distance = Distance;
 
    }
 
}

The script component looks at the rows coming in and passes them into a URL which is executed by a web client. The results are then updated into the Time and Distance Rows.

Once the data has been loaded we can analyse the results and see what is the closest store by multiple methods of transport.

-- ###################################################################################
-- Check results
-- ###################################################################################
 
-- transit
-- driving
-- walking
SELECT
    CustomerName
  , CustomerAddress
  , StoreName
  , StoreAddress
  , Distance
  , EstimatedTime
  , ModeOfTransport
  , CASE WHEN EstimatedTime LIKE '%hour%' then CONVERT(TIME,REPLACE(REPLACE(EstimatedTime,' hour ', ':'), ' mins', '')) ELSE '00:' + REPLACE(EstimatedTime, ' Mins', '') END TimeInMinutes
  , ROW_NUMBER() OVER (PARTITION BY CustomerName ORDER BY CASE WHEN EstimatedTime LIKE '%hour%' then CONVERT(TIME,REPLACE(REPLACE(EstimatedTime,' hour ', ':'), ' mins', '')) ELSE '00:' + REPLACE(EstimatedTime, ' Mins', '') END) AS 'Rank'
FROM
    CustomerAddressDistance;

2016-05-14_17-07-30

Download scripts and solution: GoogleAPI SSIS.

Sun Modelling for Data Warehouses. Visio template

2016-05-12_14-29-31

Sun modelling is a technique for requirement gathering for a multidimensional data warehouse modelling. I have used or read about many different DW modelling techniques, however none is quite as simple and elegant as Sun Modelling. Sun Modelling was designed by Prof Mark Whitehorn (Dundee University). I won’t go into the details of how you execute a sun model today, however I will write about this in the future.

For the moment I have tried to simplify the process of building a sun model in visio. You can download a visio stencil which contains the main elements required to make a Sun Model. Also included is a sample visio document.

Download it here.

2016-05-12_14-37-50