Archive for the ‘Data Warehousing’ Category.

Date Dimension Stored procedure using a CTE

Have been playing around a little more making my time dimension creation and population a little more efficient

Let’s say we have a dimension table created like so


CREATE TABLE [dbo].[CALENDAR](
[CAL_DATE] [datetime] NULL,
[CAL_DAY] [varchar](50) NULL,
[CAL_DAY_IN_WEEK] [int] NULL,
[CAL_DAY_IN_MONTH] [int] NULL,
[CAL_DAY_OF_YEAR] [int] NULL,
[CAL_WEEK] [int] NULL,
[CAL_MONTH] [varchar](50) NULL,
[CAL_SHORT_MONTH] [varchar](50) NULL,
[CAL_MONTH_IN_YEAR] [int] NULL,
[CAL_QUARTER] [int] NULL,
[CAL_HALF_YEAR] [int] NULL,
[CAL_YEAR] [int] NULL,
[IS_WEEKEND] [int] NULL,
[WORKING_DAY_NO] [int] NULL
) ON [PRIMARY]
GO

We can then use a CTE to generate a list of data for this table efficiently (well, more efficiently than a loop)

Example: (generates data for 1 Jan 2010 to 31 Dec 2012).


WITH mycte as
(
select cast('2010-01-01' as datetime) DateValue,
case when datepart(dw, cast('2010-01-01' as datetime) -1) < 6 
then 0 else 1 end IS_WEEKEND

union all
select DateValue + 1,
case when datepart(dw, DateValue) < 6 then 0 else 1 end IS_WEEKEND
from mycte
where DateValue + 1 < '2012-12-31'
)
select DateValue as CAL_DATE
, datename(dw, DateValue) CAL_DAY
, datepart(dw, DateValue-1) CAL_DAY_IN_WEEK
, datepart(dd, DateValue) CAL_DAY_IN_MONTH
, datepart(dy, DateValue) CAL_DAY_OF_YEAR
, datepart(ww, DateValue) CAL_WEEK
, datename(mm, DateValue) CAL_MONTH
, substring(datename(mm, DateValue), 1, 3) CAL_SHORT_MONTH
, datepart(mm, DateValue) CAL_MONTH_IN_YEAR
, datepart(qq, DateValue) CAL_QUARTER
, case when datepart(mm, DateValue) < 7 then 1 else 2 end CAL_HALF_YEAR
, datepart(yy, DateValue) CAL_YEAR
, IS_WEEKEND
, (SELECT sum(IS_WEEKEND) FROM mycte
WHERE
DateValue and datepart(yy, DateValue) = datepart(yy, m.DateValue)
and datename(mm, DateValue) = datename(mm, m.DateValue)
) AS WORKING_DAY_NO
FROM mycte m
OPTION (MAXRECURSION 0)

So, to make this more efficient, how about wrapping this up in a stored procedure so we can just pass the start and end dates we need.


-- ==================================================================================
-- Author: Will Riley (www.wills-blog.com | Twitter @troublewithdata)
-- Create date: 2012-03-27
-- Description: Populate a date dimension table using a CTE based stored procedure
--
-- Note: Use it like so:-
--
-- INSERT INTO [dbo].[CALENDAR]
-- execute MyDateGenerator '2010-01-01','2011-12-31'
--
-- You also need to create the calendar table
-- ==================================================================================

USE [TestDW]
GO

IF EXISTS (SELECT * FROM sysobjects WHERE id = object_id(N’[dbo].[MyDateGenerator]‘) AND OBJECTPROPERTY(id, N’IsProcedure’) = 1)
BEGIN
DROP PROCEDURE [dbo].[MyDateGenerator]
END

GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE [dbo].[MyDateGenerator]
@FromDate datetime, @ToDate datetime

AS BEGIN

SET NOCOUNT ON;

WITH mycte as
(
select cast(@FromDate as datetime) DateValue,
case when datepart(dw, cast(@FromDate as datetime) -1) < 6 then 0 else 1 end IS_WEEKEND
union all
select DateValue + 1,
case when datepart(dw, DateValue) < 6 then 0 else 1 end IS_WEEKEND
from mycte
where DateValue + 1 < @ToDate
)
select DateValue as CAL_DATE
, datename(dw, DateValue) CAL_DAY
, datepart(dw, DateValue-1) CAL_DAY_IN_WEEK
, datepart(dd, DateValue) CAL_DAY_IN_MONTH
, datepart(dy, DateValue) CAL_DAY_OF_YEAR
, datepart(ww, DateValue) CAL_WEEK
, datename(mm, DateValue) CAL_MONTH
, substring(datename(mm, DateValue), 1, 3) CAL_SHORT_MONTH
, datepart(mm, DateValue) CAL_MONTH_IN_YEAR
, datepart(qq, DateValue) CAL_QUARTER
, case when datepart(mm, DateValue) < 7 then 1 else 2 end CAL_HALF_YEAR
, datepart(yy, DateValue) CAL_YEAR
, IS_WEEKEND
, (SELECT sum(IS_WEEKEND) FROM mycte
WHERE
DateValue and datepart(yy, DateValue) = datepart(yy, m.DateValue)
and datename(mm, DateValue) = datename(mm, m.DateValue)
) AS WORKING_DAY_NO

FROM mycte m

OPTION (MAXRECURSION 0)
END
GO

Now all we need to do is to run it like so:


INSERT INTO [dbo].[CALENDAR]
execute MyDateGenerator '2010-01-01','2011-12-31'

I could imagine this being useful within an incremental SSIS package to add a new “chunk” of data to your time dimension…

Intro to BI

Houston Neal of Software Advice has written a nice article to introduce BI to newcomers to the subject.

If the words “extract transform load” sound like a foreign language, you’re in the right place. Making sense of the business intelligence (BI) software market can be challenging for even the most technical of software buyers. It’s complex stuff.

You can read the full article here

SQL Server 2008 R2 RTM

Live from the conference call

Just 20 months after the release of SQL Server 2008, SQL Server 2008 R2 (“Gemini”/”Kilimanjaro”) is today released to manufacturing…

  • Available from Technet/MSDN 3rd May 2010
  • Available Worldwide from 13th May 2010

The conference call featured Ted Kummert and Tom Casey of Microsoft.

Parallel Data Warehouse (project “Madison”) will be released later in the year.

Introduction to Master Data Services

The MSDEV people are about to launch a series of training courses for Master Data Services that covers early concepts, setup, model building, configuration, security model setup and the object model. This should be a great series and promises to be a solid introduction to the product.

Have a look here

SQL 2008 R2 Master Data Services

Well, I finally got around to taking a look at R2 and specifically Master Data Services (due mostly to some strange decisions on my current data migration project).

First impressions – I actually quite like it. I can see that it is going to require some deep late-night learning to completely get my head around it all but it’s certainly not what I would call a Version 1 product (actually given that it’s born out of Stratature +EDM it’s more like version 4/5) . The new web based UI is pretty clunky though and for sure left me a little frustrated at times. However mosts of my other issues primarily centred around “de-nannying” Windows Server 2008 R2… all I can say (from a guy who likes to be fairly agile) is AAAAAARRRRGGGGHHH!

So here was the set up.

Virtual Server

  • 2 x Processors (2.33ghz)
  • 4GB RAM
  • 64-bit Windows Server 2008 R2 (Enterprise)

SQL Server

  • 2008 R2 Enterprise CTP2 (Eval)

Some things I needed to do for later on but may as well have done here

  • Upped the server from default to run IIS, Web Services etc
  • Ensured .Net 3.5 sp1 was on board

After the SQL Install, you need to run the MDS install from the SQL Install Media.

Rather than reinvent the wheel, I will point you to the msdn installation area and two excellent posts to get installed, web app configured (this took me a couple of attempts until I found Sutha’s article) and run in some test data to see how it works.

Installation

Web App Config

Baby Steps (Nick Barclay’s 7 step tutorials)


And the result was this

first hierarchy

first hierarchy









One last useful link – when you are importing data, there are a set of error codes applied to the datasets (nice touch). The problem is that they are not documented anywhere (crap touch…attention to detail!!). The result, I spent a couple of hours looking for ERR210000 and what I might have done wrong only to find from Kirk’s helpful list that ERR210000 means “all is well” (great “ERR” can now mean “OK”)

Have a play…


SQL Server 2008 R2 November CTP

It’s available today for Technet & MSDN subscribers…

SQL Server 2008 R2 is the next generation of the Microsoft SQL Server database platform, planned for release in the first half of calendar year 2010. MSDN and TechNet Subscribers can download the SQL Server 2008 R2 November CTP today. Non-subscribers will be able to download the November CTP later this month. The R2 release brings significant new capabilities and enhancements that can help your business keep pace with today’s growing and changing data management needs. Increase productivity and reduce data management costs by taking advantage of an even more scalable platform with comprehensive database and application management tools. SQL Server 2008 R2 helps users to build rich analysis and reports and helps organizations improve the quality of their data. Find out more, download the Community Technology Preview pre-release and let us know what you think. Also, join the SQL Server communities around the world and the SQL Server development team in the conversation.

More on the SQL MERGE statement

Here’s a link to a useful article by Kristen Hodges about using the new 2008 MERGE SQL construct to carry out Type 1 & Type 2 Dimension updates

Article

Based on our experience, this is certainly worth a look over the old SCD component – we have seen some major performance improvements using MERGE over the SCD components (and other workarounds that we had previously used).