This article (Part I) explores one of the feature (of SQL server 2008 and higher) of providing the auditing abilities i.e. CDC – Change Data Capture. We will discuss the intrinsic of CDC in Part II.
Introduction and background:
Auditing considerations are important while
designing the applications especially enterprise ones. In transaction based
system, auditing and tracking are crucial component of any system from the
perspective of security as well as data. Data auditing needs to be employed for
configuration or master data wherein the changes would have large impact.
Traditional techniques are using triggers,
stored procedures or hooking the logic in front end. All these approaches have
pros and cons. Triggers were popular choice but there inherent performance
issues or possibility of bottlenecks along-with dependency on table structure is
making them tricky.
Stored procedures are also preferred but it
has limitations of not able to track changes outside it.
Triggers and stored procedures are not
agnostic of table structure changes and also prone to bugs.
Although triggers are automatic, most of
the database architects are refraining themselves from recommending those. Not
to miss is that triggers also add maintenance overheads.
The third approach taken is where the
auditing mechanism is implemented in UI through auditing components and such
calls are hooked to the places where auditing is to be accomplished. This looks
simpler but has lot of disadvantages as it has to be implemented at many places
introducing redundancy, additional database trip and also such operation needs
to be included in a transaction along-with the database operation. This also incurs maintenance overhead.
To address such issues, Microsoft has come
up with more sophisticated and atomic approach called CDC- Change Data Capture.
It provides ability to record all data change operations such as INSERT, UPDATE
and DELETE. The data is captured asynchronously which means it is not captured
in transactions containing these operations rather it is captured through the
transaction logs reducing the performance impact.
Objective
To learn about the CDC feature of SQL
server (version 2008 and higher)
Using the code
Note: For
the examples, the schema for Employee table from sample database i.e.
AdvntureWorks in SQL 2008 R2 is used (after some changes in structure). For
demonstration, the scripts are also attached.
Setting up CDC has few
administrative tasks to be performed on the database for who CDC is to be
implemented.
Let’s
create a database called “TrialDB”.
USE master;
go
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'TrialDB')
DROP DATABASE TrialDB;
go
CREATE DATABASE
TrialDB;
Go
The next step is to enable the CDC for this
database.
USE TrialDB;
GO
EXECUTE sys.sp_cdc_enable_db;
GO
USE master;
go
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'TrialDB')
DROP DATABASE TrialDB;
go
CREATE DATABASE
TrialDB;
Go
The next step is to enable the CDC for this
database.
USE TrialDB;
GO
EXECUTE sys.sp_cdc_enable_db;
GO
The execution of this command results into
3 actions
1.
Create the Schema named “cdc”
2.
Create database user named
“cdc”
3.
Create 5 tables with schema
“cdc”
This is depicted in following diagram.

Is it all done? Not yet, the CDC is enabled
for database but it has to be enabled for individual tables too.
Let’s create a table called “Employee”.
This table originally belongs to database “AdventureWorks”.
SET ANSI_NULLS
ON
GO
SET QUOTED_IDENTIFIER
ON
GO
CREATE TABLE
[Employee](
[EmployeeID] [int] IDENTITY(1,1) NOT NULL,
[NationalIDNumber] [nvarchar](15) NOT NULL,
[ContactID] [int] NOT
NULL,
[LoginID] [nvarchar](256) NULL,
[ManagerID] [int] NULL,
[Title] [nvarchar](50) NULL,
[BirthDate] [datetime] NULL,
[MaritalStatus] [nchar](1) NULL,
[Gender] [nchar](1) NULL,
[HireDate] [datetime] NULL,
[VacationHours] [smallint] NULL,
[SickLeaveHours] [smallint] NULL,
[rowguid] [uniqueidentifier] ROWGUIDCOL NULL,
[ModifiedDate] [datetime] NULL,
CONSTRAINT
[PK_Employee_EmployeeID] PRIMARY KEY CLUSTERED
(
[EmployeeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS
= ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON
[PRIMARY]
GO
Once the table is created, the CDC can be
enabled as shown in following screenshot. This shows the script as well as a
new table created named “cdc.Employee_CT”.
Is it all done? Not yet, the CDC is enabled
for database but it has to be enabled for individual tables too.
Let’s create a table called “Employee”.
This table originally belongs to database “AdventureWorks”.

Hurray, the CDC is set up for database table and
table and we are ready to plunge into database operations and see the results
thereof.
INSERT:
Following query would insert a record into Employee table
INSERT INTO
[TrialDB].[DBO].[Employee]
([NationalIDNumber]
,[ContactID]
,[LoginID]
,[ManagerID]
,[Title]
,[BirthDate]
,[MaritalStatus]
,[Gender]
,[HireDate]
,[VacationHours]
,[SickLeaveHours]
)
VALUES
(14417807
,1209
,'trial\guy1'
,16
,'Production Technician - WC60'
,NULL
,'M'
,'M'
,GETDATE()
,21
,30
)
So this INSERT operation shall be captured
along-with data and we can confirm this with following query.

We can see the data inserted captured
along-with some system data.
__$operation denotes the database operation
1: DELETE
2: INSERT
3: Before UPDATE
4: After UPDATE
UPDATE: We will update the inserted record
and change the data for column “NationalIDNumber” as depicted below.
As seen, there are 2 records in the CDC
table one for record before update and other for after update. It is
interesting and very useful for security sensitive systems.
Let’s insert one more record.
INSERT INTO
[TrialDB].[DBO].[Employee]
([NationalIDNumber]
,[ContactID]
,[LoginID]
,[ManagerID]
,[Title]
,[BirthDate]
,[MaritalStatus]
,[Gender]
,[HireDate]
,[VacationHours]
,[SickLeaveHours]
)
VALUES
(14417809
,1210
,'trial\guy1'
,16
,'Production Technician - WC60'
,NULL
,'M'
,'M'
,GETDATE()
,21
,30
)
GO
Following is snapshot of data in CDC table
where we can see the records for this insert.

DELETE: We would delete the first record
and see how it is captured.

Summary and Conclusion
The auditing and tracking are important
system need for enterprise applications. Especially in Finance domain where the
systems are transactions base, it attains far more importance. Traditionally the triggers, stored procedure
and front end components were implemented. They have pro and cons and fit for
particular scenarios. This has performance impacts as well as maintenance
overhead. With Microsoft’s CDC feature in SQL server 2008, it seems that it is fool
proof feature addressing the needs of atomicity, performance and more detailed
and sophisticated approach.
We saw a practical example and also went
through database operations and their data capture.
This looks like very promising feature and
it would be easy to settle for at least new project. For existing systems,
there is little one time overhead for setting up the CDC but it could be worth
given its benefits.
HAPPY PROGRAMMING!!!
Reference
http://msdn.microsoft.com/en-us/library/bb522489(v=sql.105).aspx