What is Data Warehouse?
In a very simple word it means, “It's a place where we store
all of our data”.
How it is different from Database?
Usually data warehouse is also a database. The records from
multiple data sources (may be some other databases)are collected and stored in
Denormalized manner. It normally stores months or years of data to support
historical analysis
What is Denormalization?
Normalization is a database designing technique which makes
sure that there will not be any redundant data.
It makes our data more reliable (because there will not be
any repeating data. Exiting data will be referred wherever required)
It makes database management easy.
It reduces the size of database.
What is Business Intelligence?
BI is all about leveraging our existing data and converting
them into information or we can say Knowledge. We use this knowledge for making
decisions in company.
BI or Business Intelligence is simply a solution for
Collect information from multiple data sources
Transform that data into meaningful information
And finally show data to users with elegant presentation.
Introduction to Business Intelligence development studio /
Sql Server Data tools
In the Microsoft world we will use “Business Intelligence
development studio” commonly known as BIDS for this purpose.
It’s an IDE which will let us develop Data Analysis and
Business Intelligence solutions.
It has special project types and tools for developing “Sql
server Integration Services”, “Sql server Analysis services” and “Sql server
reporting services”.
Basic idea on SSIS,
SSAS and SSRS
Step 1. Open Sql Server Data Tools
Step 2. Click on File >> New >> Project. A dialog
box will popup similar to like this
As
you can see, it lets us create three kind of project. In order to perform a
complete business intelligence task we need to go up with all these three
projects.
1. Integration Services – SSIS – Sql server
Integration services
It
will let us perform wide range of data migration tasks. It let us collect data
from various data sources and store them into central location.
2. Analysis Services – SSAS –Sql Server
Analysis services
It
will let us analyze the data
3. Reporting Services – SSRS – Sql Server
Reporting services
It
will let us create reports from analyzed data and present it to end user.
Start with SSIS
SSIS
basically performs three basic things,
- Collect data from various sources. – We call it Extraction(E)
- Data obtained from different sources may or may not be same format. So first convert all of them according to business needs - We call it Transformation(T)
- Load them into one big data source(mostly Data Warehouse) – We call it Load(L)
Together
termed as ETL process.
Simple ETL process
using SSIS
Problem statement
You have Two excel files as follows,
·
You will collect data from first file (Datafile.xlsx).
·
Convert that data so that it match to second excel file format
(merge Title, FirstName and LastName and call them as Name).
·
Dump final result to second file (Result.xlsx).
Step by Step Demo
Step 1. Create
New Project
Click File >> New >> Project. Select Integration
Services from the group. Specify some nice name. Say Ok.
It will open up SSIS designer which you will use for creating
and maintaining Integration service packages. It looks like follows,
In the solution explorer under “SSIS packages”
folder you will see one default package created with name “Package,dtsx”. If
you want you can simply “rename it”or “remove it and add new one” (right click
the folder and say “New SSIS Package”).
Note: Package is simply a collection of
connections, control flow elements, data flow elements, event handlers,
parameters etc. We will talk about each one of this as move further.
Step 2. Create Connection Manager for Excel File
2.1 Right click Connection Manager and Say New Connection.
2.2
Select Excel from the popup and click on Add.
2.3
Click the browse button and select the excel file and click on OK.
Step 3. Rename
Connection Managers
3.1 Right click the connection manager just added and rename it
to SourceExcelManager.
Step 4. Create Destination connection manager
4.1 Follow the Step no 3 and create one more connection manager
pointing to Result.xlsx file.
4.2 Rename connection manager to ExcelDestinationManager.
Step 5. Create control flow – Pass data from Source Excel to
Destination Excel.
5.1 Make sure control flow tab is selected in SSIS designer.
5.2
Select data flow task from the toolbox and drag it into designer.
5.3
Rename Data Flow Task to “Source excel to Destination excel transfer task”
Control Flow
Control
flow will be used to define the workflow. As the name implies it control the
flow of execution.
·
What all tasks need to
be executed?
·
What will be the
sequence?
·
Whether tasks need to
be executed in loops or not?
Etc.
Etc.
Step 6.
CreateData Flow.
Data Flow
Data Flow defines the flow of data between source and
destination.
6.1 Double click the control flow created in last step.
6.2 It will take you to second tab that is Data Flow tab.
Step 7. Create Excel Source
7.1 Now you will see a different SSIS toolbox all together. In
toolbox you will see couple of groups defined like Sources, Destinations and
Transformations.
Take excel source from Source group and place it in designer.
Step 8.
Configure Excel Source
8.1 Red Cross mark on excel source indicates that, it’s not
configured yet. Double click the excel source. It will show up a dialog box
something like this.
8.2 Select Data Source as "SourceExcelManager", Data
Access Mode as "Table or View" and Name of the sheet as
"DataSheet1" (Name of the sheet in the excel file).
Note: This
Excel Source will perform the Extraction Task (E) in the ETL process
Step 9. Create
Derived Column
9.1 From the SSIS toolbox from Transformation group drag Derived
column to SSIS designer.
Step 10. Connect Source to Derived Column
10.1 Click the Excel source added in prior step.
You can see a small blue arrow attached to the source. We call
it “Data Flow Path”.
Data Flow Path: It
lets you define how data will flow.
Click on the blue arrow and connect it to Derived Column.
Note: we
will speak about the red arrow in one of the future article in the series.
Step 11.
Configure the derived column
11.1 Double click the derived column. Popup looks like follows.
11.2 Put down Derived Column Name as Name,
Select Derived Column as <add>and expression as Title + “ “ + FirstName + “ “ +
LastName
11.3 Click Ok.
11.3 Click Ok.
Note: This
DerivedColumn will perform the Transformation Task (T) in the ETL process
Step 12.
Create Excel Destination
12.1 Add Excel Destination from the Destination group in SSIS
toolbox.
Note: This
Excel Destination will perform the Load Task (L) in the ETL process
Step 13. Connect Derived Column to Excel Destination
13.1 Just like step no 10, connect derived column to excel
destination.
Step 14. Configure Excel Destination
14.1 Double click the Excel destination, popup looks like
follow.
14.2 Set connection Manager to “ExcelConnectionManager”, Data
Access mode to “Table or View” and Name of the excel sheet to “Datasheet1”.
14.3 Click on mapping and make sure its proper, if not make sure
to do it before proceeding.
Note: In
our case, mapping will be already done by the IDE itself (because names of
columns are matching).
14.4 Click ok.
14.4 Click ok.
Step 15. Execute package
15.1 Press F5.
On successful execution you will get a screen something like
this.
16.2 Open the Result.xlsx file and confirm the output.
We have successfully completed our very first
lab on SSIS.
Hope you enjoyed reading this. Your comments,
votes and suggestions motivates us for writing more stuffs like this.
0 comments:
Post a Comment