(Whenever I hear PIVOT this is what I think of, PIVOTTTT!!!)
Right, to start this series of posts I wanted to write about a method of manipulating data I refer to as a “manual PIVOT”.
For those who have use the TSQL PIVOT you with know it enables you to transpose data like you can do in excel (Rows become columns etc). For those who don’t already use it have you ever looked at the syntax in TSQL? It is horrible and almost impossible to remember (see below).
SELECT, [first pivoted column] AS, [second pivoted column] AS, ... [last pivoted column] AS FROM (<select>) AS PIVOT ( () FOR  IN ( [first pivoted column], [second pivoted column], ... [last pivoted column]) ) AS ; </select>
You can however create something similar using conditional logic.
If you’re using a version of SQL Server before SQL2012 then you can use a CASE statement, for those on a version greater than or equal to SQL2012 you can use either a CASE or the new Inline If (IIF). If you are unfamiliar with the CASE statement the syntax is as follows:
CASE WHEN Expression THEN Result WHEN Expression THEN Result ELSE Result END as x
Each “Expression” is a boolean, supplement with conditional logic which evaluates to TRUE.
For all of these examples I will use the AdventureWorks2012 database. If you don’t have this you can grab it from CodePlex
use AdventureWorks2012; GO SELECT CustomerID, OrderDate, TotalDue FROM Sales.SalesOrderHeader ORDER BY CustomerID
The query above produces a simple list of what each customer has spent ordered by the customerID. What if we want to know how much each customer had spent each year? Well we could write something like this:
SELECT CustomerID , YEAR(OrderDate) as 'OrderYear' , SUM(TotalDue) AS 'OrderTotal' FROM Sales.SalesOrderHeader GROUP BY CustomerID, YEAR(OrderDate) ORDER BY CustomerID
This is fine but what if the requirement was to have years as columns to see the spend trend. This is where we can use the conditional logic to create a manual pivot. Using CASE and the YEAR(OrderDate) we can check to see which year the sales relate to.
SELECT CustomerID , CASE WHEN YEAR(ORDERDate) = 2005 THEN (TotalDue) END AS '2005_OrderTotal' , CASE WHEN YEAR(ORDERDate) = 2006 THEN (TotalDue) END AS '2006_OrderTotal' , CASE WHEN YEAR(ORDERDate) = 2007 THEN (TotalDue) END AS '2007_OrderTotal' , CASE WHEN YEAR(ORDERDate) = 2008 THEN (TotalDue) END AS '2008_OrderTotal' , CASE WHEN YEAR(ORDERDate) = 2009 THEN (TotalDue) END AS '2009_OrderTotal' FROM Sales.SalesOrderHeader ORDER BY CustomerID
The problem with this is that we have multiple rows for our customers. Where a customer has had a spend in 2005, all other years will show NULL. As we are looking for the total amount spent in each year we can simply SUM these totals and GROUP BY the CustomerID.
SELECT CustomerID , SUM(CASE WHEN YEAR(ORDERDate) = 2005 THEN (TotalDue) END) AS '2005_OrderTotal' , SUM(CASE WHEN YEAR(ORDERDate) = 2006 THEN (TotalDue) END) AS '2006_OrderTotal' , SUM(CASE WHEN YEAR(ORDERDate) = 2007 THEN (TotalDue) END) AS '2007_OrderTotal' , SUM(CASE WHEN YEAR(ORDERDate) = 2008 THEN (TotalDue) END) AS '2008_OrderTotal' , SUM(CASE WHEN YEAR(ORDERDate) = 2009 THEN (TotalDue) END) AS '2009_OrderTotal' FROM Sales.SalesOrderHeader GROUP BY CustomerID ORDER BY CustomerID
This is now displaying one row for each customer with the amounts spent between 2005 and 2009 as columns.
This method works with all sorts of use cases and isn’t restricted to mathematical calculations. You can use this method to to calculate the top 3 products sold for each customer as columns for example.
; WITH RankedProducts AS ( SELECT CustomerID , p.Name , SUM(OrderQty) OrderQty , ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY SUM(OrderQty) DESC) OrderRank FROM Sales.SalesOrderDetail INNER JOIN Sales.SalesOrderHeader ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID INNER JOIN Production.Product p ON p.ProductID = Sales.SalesOrderDetail.ProductID GROUP BY CustomerID , p.Name ) SELECT CustomerID , MAX(CASE WHEN OrderRank = 1 THEN Name ELSE NULL END) AS 'ProductSales1' , MAX(CASE WHEN OrderRank = 2 THEN Name ELSE NULL END) AS 'ProductSales2' , MAX(CASE WHEN OrderRank = 3 THEN Name ELSE NULL END) AS 'ProductSales3' FROM RankedProducts GROUP BY CustomerID
This works by apply in a MAX to the Name based on the OrderRank. OrderRank is calculated using a window function for more on window functions, see an upcoming post in this series. MAX works because it treats the Name as alphanumeric, which allows the MAX to take the word over the NULL as that is higher alphanumerically.
You can use this to work out all sorts scenarios.
Thanks for reading.