
As seen in the diagram, the data capture
process captures the data from transaction logs resulting better performance by
avoiding overload on DML operations and transactions. The snapshot for captured
changed data is retrieved through the query functions (as shown above) which
are used for data warehouse.
CDC database tables
After reading the transaction logs, it
populates the change tables. This change tables are mentioned in the Part I
still we would revisit them.
[cdc].[captured_columns]
[cdc].[change_tables]
[cdc].[ddl_history]
[cdc].[index_columns]
[cdc].[lsn_time_mapping]
[cdc].[<capture_instance>_CT]
Apart from the last one, these are kind of
system table for CDC and actual data would be captured in separate tables which
would correspond to the database tables for which CDC is to be enabled (as
indicated last in the list)
CDC Database Stored Procedures
Following stored procedures are created in
database as a part of enabling the CDC for particular database.
sys.sp_cdc_add_job (Transact-SQL)
sys.sp_cdc_generate_wrapper_function (Transact-SQL)
sys.sp_cdc_change_job (Transact-SQL)
sys.sp_cdc_get_captured_columns
(Transact-SQL)
sys.sp_cdc_cleanup_change_table
(Transact-SQL)
sys.sp_cdc_get_ddl_history (Transact-SQL)
sys.sp_cdc_disable_db (Transact-SQL)
sys.sp_cdc_help_change_data_capture
(Transact-SQL)
sys.sp_cdc_disable_table (Transact-SQL)
sys.sp_cdc_help_jobs (Transact-SQL)
sys.sp_cdc_drop_job (Transact-SQL)
sys.sp_cdc_scan (Transact-SQL)
sys.sp_cdc_enable_db (Transact-SQL)
sys.sp_cdc_start_job (Transact-SQL)
sys.sp_cdc_enable_table (Transact-SQL)
sys.sp_cdc_stop_job (Transact-SQL)
These are employed for
1.
Configure CDC
2.
Manage (i.e. create and run)
the CDC agent jobs
3.
Provide the metadata to CDC
consumers i.e. the one who would be utilizing CDC which could be a data
warehouse of simple system for display history
CDC Database Functions
Following functions are created to retrieve
the information about the changes which are captured as part of CDC.
sys.fn_cdc_has_column_changed
(Transact-SQL)
sys.fn_cdc_increment_lsn (Transact-SQL)
sys.fn_cdc_decrement_lsn (Transact-SQL)
sys.fn_cdc_is_bit_set (Transact-SQL)
sys.fn_cdc_get_column_ordinal
(Transact-SQL)
sys.fn_cdc_map_lsn_to_time (Transact-SQL)
sys.fn_cdc_get_max_lsn (Transact-SQL)
sys.fn_cdc_map_time_to_lsn (Transact-SQL)
sys.fn_cdc_get_min_lsn (Transact-SQL)
cdc.fn_cdc_get_net_changes_<capture_instance>
(Transact-SQL)
cdc.fn_cdc_get_all_changes_<capture_instance>
(Transact-SQL)
The last 2 functions returns the rows for
change applied for data. Other functions are administrative and related to meta
data.
SQL Server Agent Jobs
There are 2 SQL server agent jobs for CDC
1.
To populate the change tables à Capture Job
2.
To clean-up the change tablesà Clean-up job
These jobs
are created through stored procedures once a database table is enabled for CDC.
The “Capture Job” starts immediately and removed when CDC is disabled for
database. The creation and removal can be controlled by administrators.
Limitations of CDC
CDC doesn’t support the capturing of
changes in case of computed columns, usage of column-set in case of sparse
columns. For timestamp data type the data is converted into binary.
Querying and meta data
Let’s check out for the columns in the
tables meant for captured data.
(i)
__$operation denotes the database operation
1: DELETE 2:
INSERT 3: Before UPDATE 4: After UPDATE
(ii)
__$start_lsn
: It’s a Log Sequence Number (LSN) of
data type binary(10) and is common in one transaction.
(iii)
__$end_lsn
: Meant for informational purpose
(iv)
__$seqval
: Represents a sequence value for order (sequence) in a transaction