Python, Beam and Google Dataflow: From Batch to Streaming in a few lines

Is it possible to turn my batch pipeline into a streaming one without a headache? YES, with Apache Beam.

Cássio Bolba
3 min read · Jun 17, 2023

Just before we start:

Not a Medium member yet? Consider signing up through my referral link to take advantage of everything Medium has to offer, for just $5 a month!

I created a very simple Apache Beam script for a batch task, using Beam's Direct Runner, that is, running locally (not on a Spark or Dataflow engine). The data consumed is flight data, with fields such as flight number, origin, destination, departure delay, arrival delay…

Sample data

So I created a routine that keeps only the records with a positive arrival delay, found in column 8 (counting from index 0), together with the corresponding airport in column 4. The script looks like this:

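The embedded script from the original post isn't reproduced here, so below is a minimal sketch of such a batch pipeline. It assumes the input is a comma-separated file (the file name, header handling, and output prefix are illustrative, not from the original):

```python
import apache_beam as beam

# Batch pipeline on the Direct Runner (runs locally, no Spark/Dataflow needed).
with beam.Pipeline(runner="DirectRunner") as pipeline:
    (
        pipeline
        # Read the flight records as lines of text (hypothetical file name).
        | "Read flights" >> beam.io.ReadFromText("flights_sample.csv", skip_header_lines=1)
        # Split each CSV line into its columns.
        | "Split columns" >> beam.Map(lambda line: line.split(","))
        # Keep only records whose arrival delay (column 8, zero-indexed) is positive.
        | "Filter delayed arrivals" >> beam.Filter(lambda cols: float(cols[8]) > 0)
        # Emit the airport (column 4) together with its arrival delay.
        | "Pick airport and delay" >> beam.Map(lambda cols: (cols[4], float(cols[8])))
        # Write the results to text files with this output prefix.
        | "Write results" >> beam.io.WriteToText("delayed_flights")
    )
```

The same structure carries over when switching runners later: only the pipeline options (runner, streaming flag, I/O sources) need to change, not the transforms themselves.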