CDC in SQL server 2008 R2 (Part II)

Vishvvas
Posted by in Sql Server category on for Intermediate level | Points: 250 | Views : 8133 red flag
Rating: 5 out of 5  
 1 vote(s)

This article (Part II) discusses the intrinsic of CDC as one of the feature (of SQL server 2008 and higher) for providing the data auditing abilities

Introduction

In the previous article (http://www.dotnetfunda.com/articles/article1924-cdc-in-sql-server-2008-r2-part-i.aspx), we discussed about auditing needs in general and solutions for data auditing. Also we saw how to implement CDC and consequent results.

In this article, we would dive deep to know what the architecture of CDC and how it is structured in the SQL server 2008.

Objective


To learn about the intrinsic of CDC feature of SQL server (version 2008 and higher) To learn about the intrinsic of CDC feature of SQL server (version 2008 and higher)


Description


Note: SQL 2008 R2 is referenced and also couple of MSDN articles, image/s are referred for this write-up.

The diagram which depicts the data flow for CDC is as below

 (Image taken from http://msdn.microsoft.com/en-us/library/cc645937%28v=sql.105%29.aspx)

As seen in the diagram, the data capture process captures the data from transaction logs resulting better performance by avoiding overload on DML operations and transactions. The snapshot for captured changed data is retrieved through the query functions (as shown above) which are used for data warehouse.

CDC database tables

After reading the transaction logs, it populates the change tables. This change tables are mentioned in the Part I still we would revisit them.

[cdc].[captured_columns]

[cdc].[change_tables]

[cdc].[ddl_history]

[cdc].[index_columns]

[cdc].[lsn_time_mapping]

[cdc].[<capture_instance>_CT]

Apart from the last one, these are kind of system table for CDC and actual data would be captured in separate tables which would correspond to the database tables for which CDC is to be enabled (as indicated last in the list)

CDC Database Stored Procedures

Following stored procedures are created in database as a part of enabling the CDC for particular database.

sys.sp_cdc_add_job (Transact-SQL)

sys.sp_cdc_generate_wrapper_function (Transact-SQL)

sys.sp_cdc_change_job (Transact-SQL)

sys.sp_cdc_get_captured_columns (Transact-SQL)

sys.sp_cdc_cleanup_change_table (Transact-SQL)

sys.sp_cdc_get_ddl_history (Transact-SQL)

sys.sp_cdc_disable_db (Transact-SQL)

sys.sp_cdc_help_change_data_capture (Transact-SQL)

sys.sp_cdc_disable_table (Transact-SQL)

sys.sp_cdc_help_jobs (Transact-SQL)

sys.sp_cdc_drop_job (Transact-SQL)

sys.sp_cdc_scan (Transact-SQL)

sys.sp_cdc_enable_db (Transact-SQL)

sys.sp_cdc_start_job (Transact-SQL)

sys.sp_cdc_enable_table (Transact-SQL)

sys.sp_cdc_stop_job (Transact-SQL)

These are employed for

1.       Configure CDC

2.       Manage (i.e. create and run) the CDC agent jobs

3.       Provide the metadata to CDC consumers i.e. the one who would be utilizing CDC which could be a data warehouse of simple system for display history

CDC Database Functions

Following functions are created to retrieve the information about the changes which are captured as part of CDC.

sys.fn_cdc_has_column_changed (Transact-SQL)

sys.fn_cdc_increment_lsn (Transact-SQL)

sys.fn_cdc_decrement_lsn (Transact-SQL)         

sys.fn_cdc_is_bit_set (Transact-SQL)

sys.fn_cdc_get_column_ordinal (Transact-SQL)

sys.fn_cdc_map_lsn_to_time (Transact-SQL)

sys.fn_cdc_get_max_lsn (Transact-SQL)

sys.fn_cdc_map_time_to_lsn (Transact-SQL)

sys.fn_cdc_get_min_lsn (Transact-SQL)

cdc.fn_cdc_get_net_changes_<capture_instance> (Transact-SQL)         

cdc.fn_cdc_get_all_changes_<capture_instance> (Transact-SQL)

The last 2 functions returns the rows for change applied for data. Other functions are administrative and related to meta data.

SQL Server Agent Jobs

There are 2 SQL server agent jobs for CDC

      1.       To populate the change tables à Capture Job

2.       To clean-up the change tablesà Clean-up job

These jobs are created through stored procedures once a database table is enabled for CDC. The “Capture Job” starts immediately and removed when CDC is disabled for database. The creation and removal can be controlled by administrators.

Limitations of CDC

CDC doesn’t support the capturing of changes in case of computed columns, usage of column-set in case of sparse columns. For timestamp data type the data is converted into binary.

Querying and meta data

Let’s check out for the columns in the tables meant for captured data.

(i)                   __$operation  denotes the database operation

1: DELETE             2: INSERT             3: Before UPDATE            4: After UPDATE

(ii)                __$start_lsn  : It’s a Log Sequence Number (LSN) of data type binary(10) and is common in one transaction.

(iii)               __$end_lsn : Meant for informational purpose

(iv)              __$seqval : Represents a sequence value for order (sequence) in a transaction

(v)                __$update_mask : Serves as a bit mask for identifying which columns from change tables are changed.

Conclusion

The data auditing through CDC is really useful feature. In the part I of this series we saw practical example. We delved deep in CDC intrinsic and touch based the data flow overview, the stored procedures, agent jobs and querying etc. in this part.

Definitely there is a bit of administrative and monitoring overhead involved in CDC but it is taken care by the atomic, unified, configurable execution of capturing the changed data.

Hope these articles have helped to understand the minimum fundamental aspects of CDC for actual implementation and also implementation steps.

HAPPY PROGRAMMING!!!

Reference

http://msdn.microsoft.com/en-us/library/bb522489(v=sql.105).aspx

http://msdn.microsoft.com/en-us/library/cc645858.aspx

 



Page copy protected against web site content infringement by Copyscape

About the Author

Vishvvas
Full Name: Vishwas Sutar
Member Level: HonoraryPlatinum
Member Status: Member,MVP
Member Since: 5/30/2011 2:13:10 AM
Country: India

http://www.dotnetfunda.com
Extensive and rich experience across gamut of technologies and programming languages like PB,VB,C++,VB.NET, C#, Classic ASP,ASP.NET, ASP.NET MVC.

Login to vote for this post.

Comments or Responses

Login to post response

Comment using Facebook(Author doesn't get notification)