Sunday, 21 May 2017

SSIS- Interview Questions

Q: What is SSIS? How it is related with SQL Server.
SQL Server Integration Services (SSIS) is a component of SQL Server which can be used to perform a wide range of Data Migration and ETL operations. SSIS is a component in MSBI process of SQL Server.
This is a platform for Integration and Workflow applications. It is known for a fast and flexible OLTP and OLAP extensions used for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases and multidimensional data sets.

Q: What is a workflow in SSIS 2014 ?
Workflow is a set of instructions on to specify the Program Executor on how to execute tasks and containers within SSIS Packages.

Q. What is a project and Package in SSIS?
Project is a container for developing packages. Package is nothing but an object. It implements the functionality of ETL — Extract, Transform and Load — data.

Q. What are the 4 elements (tabs) that you see on a default package designer in BIDS?
Control Flow, Data Flow, event Handler and package explorer. (Parameters – 2012 Data Tools)

Q. What is a Control flow and Data Flow elements in SSIS?

Control Flow:

Control flow element is one that performs any function or provides structure or control the flow of the elements. There must be at least one control flow element in the SSIS package. In SSIS a workflow is called a control-flow. A control-flow links together our modular data-flows as a series of operations in order to achieve a desired result.

A control flow consists of one or more tasks and containers that execute when the package runs. To control order or define the conditions for running the next task or container in the package control flow

Data Flow:

All ETL tasks related to data are done by data flow elements. It is not necessary to have a data flow element in the SSIS package. A data flow consists of the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations. Before you can add a data flow to a package, the package control flow must include a Data Flow task. The Data Flow task is the executable within the SSIS package that creates, orders, and runs the data flow. A separate instance of the data flow engine is opened for each Data Flow task in a package.


What are the 3 data flow components in SSIS?

Ans:

Source
Transformation
Destination

Q. What are connections and connection managers in SSIS?

Ans:

Connection as its name suggests is a component to connect to any source or destination from SSIS — like a sql server or flat file or lot of other options that SSIS provides. Connection manager is a logical representation of a connection.

12) Explain what is connection managers in SSIS?

While gathering data from different sources and writing it to a destination, connection managers are helpful.  Connection manager facilitates the connection to the system that include information’s like data provider information, server name, authentication mechanism, database name, etc.

Q. What is the use of Check Points in SSIS?

SSIS provides a Checkpoint capability which allows a package to restart at the point of failure.

Q. Name Transformations available in SSIS?

DATACONVERSION: Converts columns data types from one to another type. It stands for Explicit Column Conversion.

DATAMININGQUERY: Used to perform data mining query against analysis services and manage Predictions Graphs and Controls.

DERIVEDCOLUMN: Create a new (computed) column from given expressions.

EXPORTCOLUMN: Used to export a Image specific column from the database to a flat file.

FUZZYGROUPING: Used for data cleansing by finding rows that are likely duplicates.

FUZZYLOOKUP: Used for Pattern Matching and Ranking based on fuzzy logic.

AGGREGATE: It applies aggregate functions to Record Sets to produce new output records from aggregated values.

AUDIT: Adds Package and Task level Metadata: such as Machine Name, Execution Instance, Package Name, Package ID, etc..

CHARACTERMAP: Performs SQL Server column level string operations such as changing data from lower case to upper case.

MULTICAST: Sends a copy of supplied Data Source onto multiple Destinations.

CONDITIONALSPLIT: Separates available input into separate output pipelines based on Boolean Expressions configured for each output.

COPYCOLUMN: Add a copy of column to the output we can later transform the copy keeping the original for auditing.

IMPORTCOLUMN: Reads image specific column from database onto a flat file.

LOOKUP: Performs the lookup (searching) of a given reference object set to a data source. It is used for exact matches only.

MERGE: Merges two sorted data sets into a single data set into a single data flow.

MERGEJOIN: Merges two data sets into a single dataset using a join junction.

ROWCOUNT: Stores the resulting row count from the data flow / transformation into a variable.

ROWSAMPLING: Captures sample data by using a row count of the total rows in dataflow specified by rows or percentage.

UNIONALL: Merge multiple data sets into a single dataset.

PIVOT: Used for Normalization of data sources to reduce anomalies by converting rows into columns

UNPIVOT: Used for de-normalizing the data structure by converts columns into rows in case of building Data Warehouses.

What is conditional split?

As the name suggest, this transformation splits the data based on condition and route them to different path. The logic for this transformation is based on CASE statement. The condition for this transformation is an expression. This transformation also provides us with default output, where rows matching no condition are routed. Conditional split is useful in scenarios like Telecom industry data you want to divide the customer data on gender, condition would be:
GENDER == ‘F’

Continue Reading →

Saturday, 20 May 2017

SSIS - A Basic Demo

Creating SSIS project and getting started

So let’s understand the very first requirement of the project. Goal of this demo is understanding SSIS project basics.

In this Demo we are going to learn, how we can Upload data from a txt file to Sql server database.

We have Customer.txt file as follows. (Make sure you create one and save it somewhere for demo purpose.)


Note: DOB is in “dd/mm/yyyy” format.

First requirement is,loading data from above text file into Customer table in the SSISDB database in Sql server management studio. (Create the database and table in your machine for demo purpose)


Step 1 Create SSIS project.


Goto File menu and click on New Project Option. 


Select Integration Service Project and click Ok button. (Change the Project name if you want.)


Step 2 Create New Package

·         In SSIS world Package is an executable file.
·         Visual studio provides a UI interface called SSIS designer for designing packages.
·         Internally package is an XML file which will be executed by special utility called dtsexec.This utility will installed as a part of MSBI installation. 
·         Packages will be have an extension called DTSX which stands for Data transformation services executable. In earlier version of sql server to perform ETL we had a feature called Data Transformation services. DTSX is named after it.
                          
Step 3 Design Control Flow

Double click the new created package in solution explorer.
As you can see in SSIS designer we have several tabs Control Flow, Data Flow, Parameters, Event Handlers, and Package Explorer. We will look into each one of these tabs one by one.
Right now we are interested in Control Flow tab.

·         This tab will let us decide what need to be done.
·         You will notice we have SSIS toolbox in left side. If it is not available in your demo then you can get it from View>>Other Windows>>SSIS toolbox.

Toolbox contains tasks like Data Flow Task, Execute Sql Task etc. Each task let us achieve some different behaviour.

·         Right now our requirement asks us to load data from a text file to Sql server database and for that we will be required DataFlow Task. Simply drag the task from SSIS toolbox to SSIS designer.
Right click the newly created Data Flow Task and select rename. Name it as CsvCustomer to TblCustomer.



Step 3 Add DestinationConnection Manager

Other than all these tabs, SSIS designer also provides something called Connection Managers section. It's located in the bottom Corner of the designer.
Simply right click the area and select New Ado.Net Connection…”


Click New button

Enter Server Name, Enter Credential, and Select Database and click OK.



Click Ok again.
Rename connection manager if you want.


Step 4 Add SourceConnection Manager

Now its time to add Source Connection Manager.
Right click the connection manager area but this time select New Flat file Connection...
It will launch Flat file Connection manager editor.


In left side of the dialog couple of sections are defined like General, Columns, and Advanced etc.
Select Columns sections. No need to change any settings at this moment. Simply click Ok.
                          

Step 5 Configure Data Flow Task

After that double click Data Flow task. It will take you to Data flow tab.


Step 6 Add Source

Data flow tab is the one which will actually decide ETL. Here we will define, from where to where data will flow and if there is any transformation required or not.
As soon as you move to the Data Flow tab, you will notice a change in the SSIS toolbox.


As you can see, tasks in the toolbox is segregated into three groups Sources, Transforms and Destinations. Common is a special group which contain mostly used sources, transformations and destinations.
We are interested in Flat file source. You will find it in Other Sources section. Simply drag it to SSIS designer and rename it to CustomerCsv

Step 7 Configure Source
Double click the CustomerCsv source. It will launch Flat file source editor. Select “Flat File Connection Manager from the dropdown.



Move to the column section and make any changes if required and click ok.

Step 8 – Add Destination

Drag Ado.Net destination from “Other Destinations” section to SSIS designer and rename it to “TblCustomer”. 

Step 9 – Configure Destination

Destination task cannot configured unless and until it have a proper input.
Now click the “CustomerCsv” source. You will notice two arrows coming out of it. Blue one and red one. We will talk about red one later. Blue one is the one though which data will flow. Hence this arrow is called as “Data Flow Path”. Take that arrow and connect it to “TblCustomer” destination.



Step 10- Configure Destination (Continued)

Double click the “TblCustomer” destination again. In the “configuration editor window” select Connection manager from dropdown and table to “Customer”



Click on “Mappings” section and confirm that all mappings are correct.



Click ok.

Step 11- Execute and Test the package

As I said sometime back, Package will be executed by a special utility called “DtsExec.exe”. Visual studio makes our life easy at the time of development. For testing simply press F5 ☻ everything else will be handled by Visual studio and package start executing.


Open database table and check the records.


Finally achieved.☻☻☻

Continue Reading →

Monday, 15 May 2017

MSBI (SSIS)

What is Data Warehouse?
In a very simple word it means, “It's a place where we store all of our data”.

How it is different from Database?
Usually data warehouse is also a database. The records from multiple data sources (may be some other databases)are collected and stored in Denormalized manner. It normally stores months or years of data to support historical analysis

What is Denormalization?
Normalization is a database designing technique which makes sure that there will not be any redundant data.
It makes our data more reliable (because there will not be any repeating data. Exiting data will be referred wherever required)
It makes database management easy.
It reduces the size of database.

What is Business Intelligence?
BI is all about leveraging our existing data and converting them into information or we can say Knowledge. We use this knowledge for making decisions in company.

BI or Business Intelligence is simply a solution for
Collect information from multiple data sources
Transform that data into meaningful information
And finally show data to users with elegant presentation.

Introduction to Business Intelligence development studio / Sql Server Data tools
In the Microsoft world we will use “Business Intelligence development studio” commonly known as BIDS for this purpose.

It’s an IDE which will let us develop Data Analysis and Business Intelligence solutions.
It has special project types and tools for developing “Sql server Integration Services”, “Sql server Analysis services” and “Sql server reporting services”.
With the release of Sql Server 2012, BIDS was renamed to Sql Server Data tools.


Basic idea on SSIS, SSAS and SSRS
Step 1. Open Sql Server Data Tools
Step 2. Click on File >> New >> Project. A dialog box will popup similar to like this


As you can see, it lets us create three kind of project. In order to perform a complete business intelligence task we need to go up with all these three projects.

1. Integration Services – SSIS – Sql server Integration services
It will let us perform wide range of data migration tasks. It let us collect data from various data sources and store them into central location.

2. Analysis Services – SSAS –Sql Server Analysis services
It will let us analyze the data

3. Reporting Services – SSRS – Sql Server Reporting services
It will let us create reports from analyzed data and present it to end user.

Start with SSIS
SSIS basically performs three basic things,

  •            Collect data from various sources. – We call it Extraction(E)
  •       Data obtained from different sources may or may not be same format. So first convert all of them according to business needs - We call it Transformation(T)
  •             Load them into one big data source(mostly Data Warehouse) – We call it Load(L)
Together termed as ETL process.

Simple ETL process using SSIS

Problem statement

You have Two excel files as follows,


·         You will collect data from first file (Datafile.xlsx).
·         Convert that data so that it match to second excel file format (merge Title, FirstName and LastName and call them as Name).
·         Dump final result to second file (Result.xlsx).

Step by Step Demo

Step 1. Create New Project
Click File >> New >> Project. Select Integration Services from the group. Specify some nice name. Say Ok.

It will open up SSIS designer which you will use for creating and maintaining Integration service packages. It looks like follows,


In the solution explorer under “SSIS packages” folder you will see one default package created with name “Package,dtsx”. If you want you can simply “rename it”or “remove it and add new one” (right click the folder and say “New SSIS Package”).
Note: Package is simply a collection of connections, control flow elements, data flow elements, event handlers, parameters etc. We will talk about each one of this as move further.
Step 2. Create Connection Manager for Excel File
2.1 Right click Connection Manager and Say New Connection.

2.2 Select Excel from the popup and click on Add.

2.3 Click the browse button and select the excel file and click on OK.

Step 3. Rename Connection Managers
3.1 Right click the connection manager just added and rename it to SourceExcelManager.

Step 4. Create Destination connection manager
4.1 Follow the Step no 3 and create one more connection manager pointing to Result.xlsx file.
4.2 Rename connection manager to ExcelDestinationManager.
Step 5. Create control flow – Pass data from Source Excel to Destination Excel.
5.1 Make sure control flow tab is selected in SSIS designer.

5.2 Select data flow task from the toolbox and drag it into designer.

5.3 Rename Data Flow Task to “Source excel to Destination excel transfer task”

Control Flow
Control flow will be used to define the workflow. As the name implies it control the flow of execution.
·         What all tasks need to be executed?
·         What will be the sequence?
·         Whether tasks need to be executed in loops or not?
Etc.
Step 6. CreateData Flow.
Data Flow
Data Flow defines the flow of data between source and destination.
6.1 Double click the control flow created in last step.
6.2 It will take you to second tab that is Data Flow tab.

Step 7. Create Excel Source
7.1 Now you will see a different SSIS toolbox all together. In toolbox you will see couple of groups defined like Sources, Destinations and Transformations.
Take excel source from Source group and place it in designer.

Step 8. Configure Excel Source
8.1 Red Cross mark on excel source indicates that, it’s not configured yet. Double click the excel source. It will show up a dialog box something like this.

8.2 Select Data Source as "SourceExcelManager", Data Access Mode as "Table or View" and Name of the sheet as "DataSheet1" (Name of the sheet in the excel file).
Note: This Excel Source will perform the Extraction Task (E) in the ETL process
Step 9. Create Derived Column
9.1 From the SSIS toolbox from Transformation group drag Derived column to SSIS designer.
Step 10. Connect Source to Derived Column
10.1 Click the Excel source added in prior step.
You can see a small blue arrow attached to the source. We call it “Data Flow Path”.

Data Flow Path: It lets you define how data will flow.
Click on the blue arrow and connect it to Derived Column.

Note: we will speak about the red arrow in one of the future article in the series.
Step 11. Configure the derived column
11.1 Double click the derived column. Popup looks like follows.

11.2 Put down Derived Column Name as Name, Select Derived Column as <add>and expression as Title + “ “ + FirstName + “ “ + LastName
11.3 Click Ok.
Note: This DerivedColumn will perform the Transformation Task (T) in the ETL process
Step 12. Create Excel Destination
12.1 Add Excel Destination from the Destination group in SSIS toolbox.
Note: This Excel Destination will perform the Load Task (L) in the ETL process
Step 13. Connect Derived Column to Excel Destination
13.1 Just like step no 10, connect derived column to excel destination.
Step 14. Configure Excel Destination
14.1 Double click the Excel destination, popup looks like follow.

14.2 Set connection Manager to “ExcelConnectionManager”, Data Access mode to “Table or View” and Name of the excel sheet to “Datasheet1”.
14.3 Click on mapping and make sure its proper, if not make sure to do it before proceeding.

Note: In our case, mapping will be already done by the IDE itself (because names of columns are matching).
14.4 Click ok.
Step 15. Execute package
15.1 Press F5.
On successful execution you will get a screen something like this.

16.2 Open the Result.xlsx file and confirm the output.

We have successfully completed our very first lab on SSIS.

Hope you enjoyed reading this. Your comments, votes and suggestions motivates us for writing more stuffs like this.


Continue Reading →

Topics

ADFS (1) ADO .Net (1) Ajax (1) Angular (47) Angular Js (15) ASP .Net (14) Authentication (4) Azure (3) Breeze.js (1) C# (49) CD (1) CI (2) CloudComputing (2) Coding (8) CQRS (1) CSS (2) Design_Pattern (7) DevOps (4) DI (3) Dotnet (10) DotnetCore (19) Entity Framework (4) ExpressJS (4) Html (4) IIS (1) Javascript (17) Jquery (8) Lamda (3) Linq (10) microservice (4) Mongodb (1) MVC (46) NodeJS (8) React (10) SDLC (1) Sql Server (32) SSIS (3) SSO (1) TypeScript (3) UI (1) UnitTest (2) WCF (14) Web Api (16) Web Service (1) XMl (1)

Dotnet Guru Archives