Archive for the ‘SSIS’ Category.

Date Dimension Stored procedure using a CTE

Have been playing around a little more making my time dimension creation and population a little more efficient

Let’s say we have a dimension table created like so


CREATE TABLE [dbo].[CALENDAR](
[CAL_DATE] [datetime] NULL,
[CAL_DAY] [varchar](50) NULL,
[CAL_DAY_IN_WEEK] [int] NULL,
[CAL_DAY_IN_MONTH] [int] NULL,
[CAL_DAY_OF_YEAR] [int] NULL,
[CAL_WEEK] [int] NULL,
[CAL_MONTH] [varchar](50) NULL,
[CAL_SHORT_MONTH] [varchar](50) NULL,
[CAL_MONTH_IN_YEAR] [int] NULL,
[CAL_QUARTER] [int] NULL,
[CAL_HALF_YEAR] [int] NULL,
[CAL_YEAR] [int] NULL,
[IS_WEEKEND] [int] NULL,
[WORKING_DAY_NO] [int] NULL
) ON [PRIMARY]
GO

We can then use a CTE to generate a list of data for this table efficiently (well, more efficiently than a loop)

Example: (generates data for 1 Jan 2010 to 31 Dec 2012).


WITH mycte as
(
select cast('2010-01-01' as datetime) DateValue,
case when datepart(dw, cast('2010-01-01' as datetime) -1) < 6 
then 0 else 1 end IS_WEEKEND

union all
select DateValue + 1,
case when datepart(dw, DateValue) < 6 then 0 else 1 end IS_WEEKEND
from mycte
where DateValue + 1 < '2012-12-31'
)
select DateValue as CAL_DATE
, datename(dw, DateValue) CAL_DAY
, datepart(dw, DateValue-1) CAL_DAY_IN_WEEK
, datepart(dd, DateValue) CAL_DAY_IN_MONTH
, datepart(dy, DateValue) CAL_DAY_OF_YEAR
, datepart(ww, DateValue) CAL_WEEK
, datename(mm, DateValue) CAL_MONTH
, substring(datename(mm, DateValue), 1, 3) CAL_SHORT_MONTH
, datepart(mm, DateValue) CAL_MONTH_IN_YEAR
, datepart(qq, DateValue) CAL_QUARTER
, case when datepart(mm, DateValue) < 7 then 1 else 2 end CAL_HALF_YEAR
, datepart(yy, DateValue) CAL_YEAR
, IS_WEEKEND
, (SELECT sum(IS_WEEKEND) FROM mycte
WHERE
DateValue and datepart(yy, DateValue) = datepart(yy, m.DateValue)
and datename(mm, DateValue) = datename(mm, m.DateValue)
) AS WORKING_DAY_NO
FROM mycte m
OPTION (MAXRECURSION 0)

So, to make this more efficient, how about wrapping this up in a stored procedure so we can just pass the start and end dates we need.


-- ==================================================================================
-- Author: Will Riley (www.wills-blog.com | Twitter @troublewithdata)
-- Create date: 2012-03-27
-- Description: Populate a date dimension table using a CTE based stored procedure
--
-- Note: Use it like so:-
--
-- INSERT INTO [dbo].[CALENDAR]
-- execute MyDateGenerator '2010-01-01','2011-12-31'
--
-- You also need to create the calendar table
-- ==================================================================================

USE [TestDW]
GO

IF EXISTS (SELECT * FROM sysobjects WHERE id = object_id(N’[dbo].[MyDateGenerator]‘) AND OBJECTPROPERTY(id, N’IsProcedure’) = 1)
BEGIN
DROP PROCEDURE [dbo].[MyDateGenerator]
END

GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE [dbo].[MyDateGenerator]
@FromDate datetime, @ToDate datetime

AS BEGIN

SET NOCOUNT ON;

WITH mycte as
(
select cast(@FromDate as datetime) DateValue,
case when datepart(dw, cast(@FromDate as datetime) -1) < 6 then 0 else 1 end IS_WEEKEND
union all
select DateValue + 1,
case when datepart(dw, DateValue) < 6 then 0 else 1 end IS_WEEKEND
from mycte
where DateValue + 1 < @ToDate
)
select DateValue as CAL_DATE
, datename(dw, DateValue) CAL_DAY
, datepart(dw, DateValue-1) CAL_DAY_IN_WEEK
, datepart(dd, DateValue) CAL_DAY_IN_MONTH
, datepart(dy, DateValue) CAL_DAY_OF_YEAR
, datepart(ww, DateValue) CAL_WEEK
, datename(mm, DateValue) CAL_MONTH
, substring(datename(mm, DateValue), 1, 3) CAL_SHORT_MONTH
, datepart(mm, DateValue) CAL_MONTH_IN_YEAR
, datepart(qq, DateValue) CAL_QUARTER
, case when datepart(mm, DateValue) < 7 then 1 else 2 end CAL_HALF_YEAR
, datepart(yy, DateValue) CAL_YEAR
, IS_WEEKEND
, (SELECT sum(IS_WEEKEND) FROM mycte
WHERE
DateValue and datepart(yy, DateValue) = datepart(yy, m.DateValue)
and datename(mm, DateValue) = datename(mm, m.DateValue)
) AS WORKING_DAY_NO

FROM mycte m

OPTION (MAXRECURSION 0)
END
GO

Now all we need to do is to run it like so:


INSERT INTO [dbo].[CALENDAR]
execute MyDateGenerator '2010-01-01','2011-12-31'

I could imagine this being useful within an incremental SSIS package to add a new “chunk” of data to your time dimension…

Download details: SQL Server 2008 R2 Update for Developers Training Kit (January 2011 Update)

This is a pretty good resource for those people wanting to get started with the latest versions of the Microsoft BI stack.

Download details: SQL Server 2008 R2 Update for Developers Training Kit (January 2011 Update)

Using TFS 2010 with BIDS 2008 R2

Sadly, out of the box, you can’t connect from BIDS 2008 (the SQL 2008 or 2008 R2 version) to TFS 2010. I guess this will come with Denali where hopefully the new Juneau Development Environment will be integrated with TFS 2010.

I managed to get SQL Server 2008 R2 BI development tools (BIDS 2008) to work with TFS 2010 by doing the following:

 1. Install BI Development Tools that come with SQL Server 2008 R2

2. Install VS2008 Team Explorer (for BIDS/VS2008 from MSFT available here)

3. Install VS2008 Service Pack 1 (available here)

4. Install  VS2008 SP1 Forward Compatibility Update for TFS 2010 (available here)

Once I had done that, I was able to connect to a TFS 2010 Project and successfully check VS2008 projects in and out of source control, raise bugs etc.

Have attached a few screen grabs showing the integration from BIDS and also from the Web Client Portal (basically Sharepoint for TFS) showing some of the out of the box dashboard stuff.

Note that when entering a TFS server to connect to in BIDS, a change from TFS 2008 means that you now need to fully qualify your TFS Project

When asked for which TFS Server to connect to, type http://VM2008BITFS:8080/TFS (assuming you use default ports etc)

TFS Team Web Access

Team Web Access in TFS 2010

TFS Project Dashboard

Portal Dashboard in TFS 2010

BIDS Check In

BIDS check-in & source control access (similar to TFS 2008)

SSIS: DBSTATUS_UNAVAILABLE

Scenario: Your SSIS package fails and you see error messages in your log files along the lines of

Description: Failed to retrieve long data for column “xxxxx”.

End Error  Error: 2010-03-03 16:30:02.84     Code: 0xC020901C     Source: Data Flow Task 1 Source – Query [1]

Description: There was an error with output “OLE DB Source Output” (11) on component “Source – Query” (1).

The column status returned was: “DBSTATUS_UNAVAILABLE“.

You will probably find that the column “xxxxx” is something like a text or picture data type (i.e. BLOB or similar) and what’s happening is that SQL Server is either running out of  Disk Space to store this stuff.

Step one is obviously to check if you actually need all this data and optimise your dataflow by removing any unnecessary columns.

If you can’t do that, try setting the BLOBTempStoragePath and BufferTempStoragePath to Fast Drives, or at least drives with more disk space.

See these posts for more details

http://connect.microsoft.com/SQLServer/feedback/details/346675/ssis-runs-out-of-temp-file-names

http://blogs.msdn.com/sqlperf/archive/2007/05/01/set-blobtempstoragepath-and-buffertempstoragepath-to-fast-drives.aspx

More on the SQL MERGE statement

Here’s a link to a useful article by Kristen Hodges about using the new 2008 MERGE SQL construct to carry out Type 1 & Type 2 Dimension updates

Article

Based on our experience, this is certainly worth a look over the old SCD component – we have seen some major performance improvements using MERGE over the SCD components (and other workarounds that we had previously used).

Kilimanjaro – is this the knockout punch from Microsoft?


SQL Server Kilimanjaro and Project Madison

This week the BI Conference has kicked off with an announcement!  There will be a BI focused release of SQL Server code named Kilimanjaro which is expected H1 2010 with CTPs available within the next 12 months.  Kilimanjaro is focused on "People Ready BI".  
A key component of Kilimanjaro is "Project Gemini", this is all about self-service BI using Excel.  The vision behind Gemini is to free IT resources by giving business intelligence analysis and reporting capabilities to the end-users without IT losing control.

Is this where Microsoft finally fill the gap left when they shoe-horned some parts of Proclarity in to PPS, leaving themselves without a great stand-alone OLAP client tool? Let’s hope so… I am a little uncertain as to the direction that they are taking BI in. As a once frustrated end-user, I can see how the masses of information workers will love the capabilities. I can also see that this may well lead to a huge increase in unstructured an potentially conflicting information if not managed very carefully.

At the same Microsoft will be releasing "Project Madison", this is the integration of Datallegro which will offer you an easy scale out solution to support data warehouses of up to 100s of terabytes.

All in all very exciting news and I will keep you up-to-date whenever I learn more. I am certainly hoping for more information on the management of the data, how this fits with the overall DW framework, and mysterious lack of information to date on PerformancePoint2…

SQL 2008 Goes “Live”

Although SQL Server 2008 was “launched” last February at a marketing event that also featured Windows Server 2008 and Visual Studio 2008, this week Microsoft finally released its database software to manufacturing.

SQL Server 2008 is available to MSDN and TechNet subscribers and will also be available for evaluation downloads. Pricing for the database will be the same as the previous version, SQL Server 2005.

 

 

http://msdn.microsoft.com/en-us/subscriptions/downloads/default.aspx Has Ent, Dev, Std, Web and Workgroup for subscriber downloads, can’t see Express or Ent Trial yet on the non subscriber pages.

Also Eval Edition and Feature Pack are now available

If you need some help installing, see this SQL School Video