Concepts for Data Engineers: Change Data Capture — CDC

Cássio Bolba
5 min read · May 9, 2023

In this series I’m introducing several important concepts that new Data Engineers should be aware of. The topics I have covered so far:
Data Modelling
CDC
Idempotency

I also have two series about Python:
🐍 Efficient Python
🐍 Software Engineering with Python

Change Data Capture (CDC) is a method of identifying and capturing changes made to a database. It lets businesses keep track of every modification made to their data, including inserts, updates, and deletes. CDC is a critical tool for businesses, particularly those that deal with large volumes of data, because it keeps downstream systems current, so analysis and decisions rest on up-to-date data. In this article, we will discuss CDC, its benefits, how it works, and its importance.

How CDC Works

CDC works by capturing changes made to a database and forwarding them to a target system. It reads the log files that the database management system generates, which record every change made to the database. CDC monitors these log files and captures the changes either in real time or in batches.
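To make the log-reading idea concrete, here is a minimal sketch in Python. It is a simplification, not how any particular database exposes its log: the `ChangeEvent` fields, the in-memory `CHANGE_LOG` list, and the `capture_batch` helper are all hypothetical names invented for illustration. The key idea it shows is the checkpoint: the reader remembers the last log position it processed, so each run picks up only the newer changes.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """One entry in the change log (hypothetical shape, for illustration)."""
    position: int          # where the entry sits in the log
    op: str                # "insert", "update" or "delete"
    key: int               # primary key of the affected row
    row: Optional[dict]    # new row contents; None for deletes


# Stand-in for the log files a real DBMS would generate.
CHANGE_LOG = [
    ChangeEvent(1, "insert", 1, {"name": "Ana", "city": "Porto"}),
    ChangeEvent(2, "insert", 2, {"name": "Bo", "city": "Oslo"}),
    ChangeEvent(3, "update", 1, {"name": "Ana", "city": "Lisbon"}),
    ChangeEvent(4, "delete", 2, None),
]


def read_changes(log: List[ChangeEvent], checkpoint: int) -> Iterator[ChangeEvent]:
    """Yield only the log entries written after the checkpoint."""
    for event in log:
        if event.position > checkpoint:
            yield event


def capture_batch(
    log: List[ChangeEvent], checkpoint: int, max_events: int
) -> Tuple[List[ChangeEvent], int]:
    """Capture up to max_events new changes and return the new checkpoint."""
    batch: List[ChangeEvent] = []
    for event in read_changes(log, checkpoint):
        batch.append(event)
        if len(batch) == max_events:
            break
    # If nothing new was found, the checkpoint stays where it was.
    new_checkpoint = batch[-1].position if batch else checkpoint
    return batch, new_checkpoint


# First run starts from position 0 and captures three changes.
batch, checkpoint = capture_batch(CHANGE_LOG, checkpoint=0, max_events=3)
# Second run resumes from the saved checkpoint and sees only the delete.
next_batch, checkpoint = capture_batch(CHANGE_LOG, checkpoint, max_events=3)
```

Calling `capture_batch` on a schedule gives batch-style capture, while calling it in a tight loop as new entries arrive approximates real-time capture. In both cases the checkpoint is what prevents changes from being missed or read twice.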

Once CDC captures the changes, it forwards them to the target system. The target system can be another database or a data warehouse, where the captured data can be…
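Forwarding captured changes to a target can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the target is a plain dictionary keyed by primary key, and the `apply_change` function and the `(op, key, row)` tuple layout are hypothetical, not part of any real CDC tool.

```python
from typing import Dict, Optional


def apply_change(
    target: Dict[int, dict], op: str, key: int, row: Optional[dict]
) -> None:
    """Apply one captured change to the target table (a dict in this sketch)."""
    if op in ("insert", "update"):
        # Treat both as an upsert so replaying the same change is harmless.
        target[key] = row
    elif op == "delete":
        # pop with a default does nothing if the row is already gone.
        target.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


# Changes as they might arrive from the capture step, in log order.
captured = [
    ("insert", 1, {"name": "Ana", "city": "Porto"}),
    ("insert", 2, {"name": "Bo", "city": "Oslo"}),
    ("update", 1, {"name": "Ana", "city": "Lisbon"}),
    ("delete", 2, None),
]

target_table: Dict[int, dict] = {}
for op, key, row in captured:
    apply_change(target_table, op, key, row)

print(target_table)  # {1: {'name': 'Ana', 'city': 'Lisbon'}}
```

Applying the changes in log order matters: the target ends up with the latest version of row 1 and no row 2, which mirrors the source after all four changes.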
