Dotnet Guru ~

Monday, 9 April 2018

Group by and Partition by - SQL Server

Suraj Kumar Maddheshiya April 09, 2018 Sql Server No comments

Difference between Group by and Partition by

Now we will list out below difference between two

Group by
The SQL GROUP BY clause can be used in a SELECT statement to collect data across
multiple records and group the results by one or more columns.

In more simple words GROUP BY statement is used in conjunction with the
aggregate functions to group the result-set by one or more columns.

Reduces the no. of records
In select we need to use only columns which are used in group by. but we can use aggregate functions.
In filter condition we need to use having clause instead of where clause.

Partition By

No. of records will not be reduced. Instead of that it will add one extra column.
In select we can use N no. of columns. No restrictions.
We can use where clause in filter condition apart from partition column.

Now we will try to learn the difference by looking into below the examples.

Take below table data which is used in the the article.

id  firstname                   lastname                    Mark
-------------------------------------------------------------------
1   arun                        prasanth                    40
2   ann                         antony                      45
3   sruthy                      abc                         41
6   new                         abc                         47
1   arun                        prasanth                    45
1   arun                        prasanth                    49
2   ann                         antony                      49

Group by Syntax:

SELECT expression1, expression2, ... expression_n, 
       aggregate_function (aggregate_expression)
FROM tables
WHERE conditions
GROUP BY expression1, expression2, ... expression_n;

We can apply GroupBy in our table

select SUM(Mark) marksum, firstname from TableA group by id,firstName

Result:

marksum  firstname
----------------
94      ann                      
134     arun                     
47      new                      
41      sruthy

In our real table we have 7 rows and when we apply group by id, the server group the results based on id

In simple words
here group by normally reduces the number of rows returned by rolling them up and calculating Sum for each row.

Partition by Query:

select SUM(Mark) OVER (PARTITION BY id) AS marksum, firstname from TableA

marksum firstname 
-------------------
134     arun                     
134     arun                     
134     arun                     
94      ann                      
94      ann                      
41      sruthy                   
47      new

look at the results it will partition the rows and results all rows not like group by.

****************************************************************

--BELOW IS A SAMPLE WHICH OUTLINES THE SIMPLE DIFFERENCES

--READ IT AND THEN EXECUTE IT

--THERE ARE THREE ROWS OF EACH COLOR INSERTED INTO THE TABLE

--CREATE A database called testDB

--CREATE DATABASE testDB;

-- use testDB

USE [TestDB]

-- create Paints table

CREATE TABLE [dbo].[Paints](

[Color] [varchar](50) NULL,

[glossLevel] [varchar](50) NULL

) ON [PRIMARY]

-- Populate Table

insert into paints (color, glossLevel)

select 'red', 'eggshell'

union

select 'red', 'glossy'

union

select 'red', 'flat'

union

select 'blue', 'eggshell'

union

select 'blue', 'glossy'

union

select 'blue', 'flat'

union

select 'orange', 'glossy'

union

select 'orange', 'flat'

union

select 'orange', 'eggshell'

union

select 'green', 'eggshell'

union

select 'green', 'glossy'

union

select 'green', 'flat'

union

select 'black', 'eggshell'

union

select 'black', 'glossy'

union

select 'black', 'flat'

union

select 'purple', 'eggshell'

union

select 'purple', 'glossy'

union

select 'purple', 'flat'

union

select 'salmon', 'eggshell'

union

select 'salmon', 'glossy'

union

select 'salmon', 'flat'

-------------------------------------

SELECT * FROM paints

/* COMPARE 'GROUP BY' color to 'OVER (PARTITION BY Color)' */

-- GROUP BY Color

-- row quantity defined by group by aggregate (count(*)) defined by group by

select count(*) as 'count',Color from paints group by color

-- OVER (PARTITION BY... Color

-- row quantity defined by main query aggregate defined by OVER-PARTITION BY

select color, glossLevel, count(*) OVER (Partition by color) from paints

/* COMPARE 'GROUP BY' color, glossLevel to 'OVER (PARTITION BY Color, GlossLevel)' */

-- GROUP BY Color, GlossLevel

-- row quantity defined by GROUP BY aggregate (count(*)) defined by GROUP BY

select count(*) from paints group by color, glossLevel

-- Partition by Color, GlossLevel

-- row quantity defined by main query

-- aggregate (count(*)) defined by OVER-PARTITION BY

select color, glossLevel, count(*) OVER (Partition by color, glossLevel) from paints

Continue Reading →

WCF Concurrency

Suraj Kumar Maddheshiya April 09, 2018 WCF No comments

Introduction
Concurrency is the control of multiple threads active in an InstanceContext at any given time. This is controlled using the System.ServiceModel.ServiceBehaviorAttribute. ConcurrencyMode is the ConcurrencyMode enumeration.

WCF concurrency will help us to configure how WCF service instances can serve multiple requests at the same time.

There are three basic types of concurrency supported by WCF 4.0:

Single Concurrency Mode
Multiple Concurrency Mode
Reentrant Concurrency Mode

Single Concurrency Mode

When the service is set to Single Concurrency Mode, each instance context is allowed to have a maximum of one thread processing messages at the same time. In other words, WCF will provide synchronization with the service instance and not allow concurrent calls with a synchronization lock. In short only one request will proceed at any time and the next request must wait until the first request does not proceed.

Every incoming request must try to acquire the sync lock; if no lock is found then it allows access to the service and this request makes a sync lock. When finished operations, WCF will unlock the sync lock and allow other requests to come in.

[ServiceBehavior(ConcurrencyMode = ConcurrencyMode.Single)]
public class Service1 : IService1{
     public string GetData(int value)
     {
          return string.Format("You entered: {0}", value);
     }
}

Multiple Concurrency Mode

When the service is set to Multiple Concurrency Mode, the service allows multiple accesses at the same time. Simply service instance is not associated with any sync lock. So that concurrent calls are allowed on the service instance. WCF does not create any queue for client messages and replays them as soon as they arrive. Each service has multiple threads processing messages concurrently. The service implementation must be thread-safe to use this concurrency mode.

With Concurrency Mode Multiple, threads can call an operation at any time. It is our responsibility to guard our state with locks.

[ServiceBehavior(ConcurrencyMode = ConcurrencyMode.Multiple)]
public class Service1 : IService1{
    readonly object ThisLock = new object();
    public string GetData(int value)
    {
        string myRetString = string.Empty;
        lock (this.ThisLock)
        {
             myRetString = string.Format("You entered: {0}", value);
        }
        return myRetString;
    }
}

Reentrant Concurrency Mode

The Reentrant concurrency mode is nothing but a modified version of the single concurrency mode. Similar to single concurrency, reentrant concurrency is associated with a service instance and also sync lock. So that a concurrent call on the same instance is never called. In other words multiple calls on the same instance is not allowed.

[ServiceBehavior(ConcurrencyMode = ConcurrencyMode.Reentrant)]
public class Service3 : IService1{
     public string GetData(int value)
     {
          return string.Format("You entered: {0}", value); ;
     }
}

The service instance is single-threaded and accepts reentrant calls. When the reentrant service calls out, the service state must be in a consistent state, because others could be allowed into the service instance while the service calls out.

However, if the reentrant service call to another service or a callback, and that call chain (or causality) somehow wind its way back to the service instance.

The only case where a service configured with the Single Concurrency Mode can call back to its clients is when the callback contract operation is configured as one-way because there will not be a reply message to contend for the lock.

Continue Reading →

Sunday, 8 April 2018

Abstraction vs Encapsulation in .Net

Suraj Kumar Maddheshiya April 08, 2018 C# 2 comments

Encapsulation: Encapsulate hides variables or some implementation that may be changed so often in a class to prevent outsiders access it directly. They must access it via getter and setter methods.

Abstraction: Abstraction is used to showing what is relevant but in a higher degree(class, interface). Clients use an abstract class(or interface) do not care about who or which it was, they just need to know what it can do.

Constructor in abstract class
There's an implicit call to the base constructor prior to the derived constructor execution. Keep in mind that unlike interfaces, abstract classes do contain implementation. That implementation may require field initialization or other instance members. Note the following example and the output:

abstract class Animal
   {
       public string DefaultMessage { get; set; }

       public Animal()
       {
           Console.WriteLine("Animal constructor called");
           DefaultMessage = "Default Speak";
       }
       public virtual void Speak()
       {
           Console.WriteLine(DefaultMessage);
       }
   }

    class Dog : Animal
    {
        public Dog(): base()//base() redundant.  There's an implicit call to base here.
        {
            Console.WriteLine("Dog constructor called");
        }
        public override void Speak()
        {
            Console.WriteLine("Custom Speak");//append new behavior
            base.Speak();//Re-use base behavior too
        }
    }

Although we cannot directly construct an Animal with new, the constructor is implicitly called when we construct a Dog.

OUTPUT:
Animal constructor called
Dog constructor called
Custom Speak
Default Speak

IN Short :
Encapsulation: hiding data using getters and setters etc.
Abstraction: hiding implementation using abstract classes and interfaces etc.

Continue Reading →

Thursday, 5 April 2018

Clustered and Non clustered Index - SQL Server

Suraj Kumar Maddheshiya April 05, 2018 Sql Server No comments

Database indexes are used to improve the speed of data retrievel from database table with a large number of records. Database indexes (both clustered indexes and non-clustered indexes) are quite similar to book indexes in their functionality. A book index allows you to go straight to the different topics discussed in the book. If you want to search for a specific topic, you just go to index, find the page number that contains the topic that you are looking for and then can go straight to that page. Without an index, you would have to search the whole book.

Database indexes work in the same way. Without indexes you would have to search the whole table in order to perform a specific database operation. With indexes, you do not have to scan through all the table records. The index points you directly to the record that you are searching for, significantly reducing your query execution time.

SQL Server indexes can be divided into two main types:

Clustered Indexes
Non-Clustered Indexes

Let’s first start with a clustered index.

Clustered Index

A clustered index physically sorts the data in database table. there can be only one clustered index per table. By default a clustered index is created on a primary key column.

Let’s create a dummy table with primary key column to see the default clustered index.

Execute the following script:

CREATE DATABASE Hospital

CREATETABLEPatients
(
id INT PRIMARY KEY,
name VARCHAR(50) NOT NULL,
gender VARCHAR(50) NOT NULL,
age INT NOT NULL
)

The above script creates a dummy database Hospital. The database has 4 columns: id, name, gender, age. The id column is the primary key column. When the above script is executed, a clustered index is automatically created on the id column. To see all the indexes in a table, you can use the “sp_helpindex” stored procedure.

EXECUTE sp_helpindex Patients

You will see in output window, the index name, description and the column on which the index is created.

If you add a new record to the Patients table, it will be stored in ascending order of the value in the id column. If the first record you insert in the table has an id of three, the record will be stored in the third row instead of the first row since clustered index maintains physical order.

Custom Clustered Indexes

You can create your own clustered indexes. However, before you can do that you have to create the existing clustered index. We have one clustered index due to primary key column. If we remove the primary key constraint, the default cluster will be removed. The following script removes the primary key constraint.

ALTER TABLE Patients
DROP CONSTRAINT PK__Patients__3213E83F3DFAFAAD
GO

The following script creates a custom index “IX_tblPatient_Age” on the age column of the Patients table. Owing to this index, all the records in the Patients table will be stored in ascending order of the age.

CREATE CLUSTERED INDEX IX_tblPatient_Age
ON Patients(age ASC)

Let’s now add a few dummy records in the Patients table to see if they are actually inserted in the ascending order of age:

USE Hospital

INSERT INTO Patients
VALUES
(1, 'Sara', 'Female', 34),
(2, 'Jon', 'Male', 20),
(3, 'Mike', 'Male', 54),
(4, 'Ana', 'Female', 10),
(5, 'Nick', 'Female', 29)

In the above script, we add 5 dummy records. Notice the values for the age column. They have random values and are not in any logical order. However, since we have created a clustered index, the records will be actually inserted in the ascending order of the value in the age column. You can verify this by selecting all the records from the Patients table.

SELECT * FROM Patients

You can see that records are ordered in the ascending order of the values in the age column.

For the tables without a clustered index, data is stored in an unordered heap. Heaps do not have an ordered structure, they do not have a natural order. This can get problematic as the table grows in size and data gets fragmented over time.

Non-Clustered Indexes

A non-clustered index is also used to speed up search operations. Unlike a clustered index, a non-clustered index doesn’t physically sort data into a table. In fact, a non-clustered index is stored in a separate location from the data table. A non-clustered index is like a book index, which is located separately from the main contents of the book. Since non-clustered indexes are located in a separate location, there can be multiple non-clustered indexes per table.

To create a non-clustered index, you have to use the “CREATE NONCLUSTERED” statement. The rest of the syntax remains the same as the syntax for creating a clustered index. The following script creates a non-clustered index “IX_tblPatient_Name” which sorts the records in ascending order of the name.

use Hospital
CREATE NONCLUSTERED INDEX IX_tblPatient_Name
ON Patients(name ASC)

The above script will create an index which contains the names of the patients and the address of their corresponding records as shown below:

Name	Record Address
Sara	Record Address
Jon	Record Address
Mike	Record Address
Nicky	Record Address
Sara	Record Address

Here, the “Record address” in each row is the reference to the actual table records for the Patients with corresponding names.

For example, if you want to retrieve age and gender of the patient named “Mike”, the database will first search “Mick” in the non-clustered index “IX_tblPatient_Name” and from the non-clustered index it will fetch the actual record reference and will use that to return actual age and gender of the Patient named “Mike”

Since a database has to make two searches, first in the non-clustered index and then in the actual table, non-clustered indexes can be slower for search operations. However, for INSERT and UPDATE operations, non-clustered indexes are faster since the order of the records only needs to be updated in the index and not in the actual table.

When to Use Clustered or Non-Clustered Indexes

Now that you know the differences between a clustered and a non-clustered index, let’s see the different scenarios for using each of them.

1. Number of Indexes

This is pretty obvious. If you need to create multiple indexes on your database, go for non-clustered index since there can be only one clustered index.

2. SELECT Operations

If you want to select only the index value that is used to create and index, non-clustered indexes are faster. For example, if you have created an index on the “name” column and you want to select only the name, non-clustered indexes will quickly return the name.

However, if you want to select other column values such as age, gender using the name index, the SELECT operation will be slower since first the name will be searched from the index and then the reference to the actual table record will be used to search the age and gender.

On the other hand, with clustered indexes since all the records are already sorted, the SELECT operation is faster if the data is being selected from columns other than the column with clustered index.

3. INSERT/UPDATE Operations

The INSERT and UPDATE operations are faster with non-clustered indexes since the actual records are not required to be sorted when an INSERT or UPDATE operation is performed. Rather only the non-clustered index needs updating.

4. Disk Space

Since, non-clustered indexes are stored at a separate location than the original table, non-clustered indexes consume additional disk space. If disk space is a problem, use a clustered index.

5. Final Verdict

As a rule of thumb, every table should have at least one clustered index preferably on the column that is used for SELECTING records and contains unique values. The primary key column is an ideal candidate for a clustered index.

On the other hand columns that are often involved in INSERT and UPDATE queries should have a non-clustered index assuming that disk space isn’t a concern.

Reference: https://www.sqlshack.com/, https://www.sqlservertutorial.net, https://www.spotlightcloud.io/

Continue Reading →

Dotnet Guru ~

Monday, 9 April 2018

Group by and Partition by - SQL Server

WCF Concurrency

Sunday, 8 April 2018

Abstraction vs Encapsulation in .Net

Thursday, 5 April 2018

Clustered and Non clustered Index - SQL Server

Topics

Pages

Dotnet Guru Archives