Apache Beam, Python and GCP: Deploying a Batch Pipeline on Google DataFlow
In this article, we will describe how to deploy a batch pipeline, created locally, to Google Dataflow, in a very simplified way.

In the previous article (here), we explored how to change a pipeline from batch to streaming with just a few extra lines, which shows the versatility of Apache Beam.
Here, we will walk through deploying a batch pipeline, created locally, to Google Dataflow in a very simplified way. There are other, more or less complex, ways to deploy; the complexity depends mostly on your level of Python knowledge.
Shall we get our hands dirty?
Create a Service Account
Go to IAM & Admin > Service Accounts > + Create > name your SA > Create:

Then grant the Dataflow Worker role > Click Done

Once the SA is created, click the three dots to its right and select Create Key > JSON > Create

Done: the SA (Service Account) is created and its key exported; the JSON file should be in your Downloads folder! Here are some more details on how to use the Python SDK with Dataflow.
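To let your local code authenticate with that key, one common approach is to point the GOOGLE_APPLICATION_CREDENTIALS environment variable at the JSON file. Below is a minimal sketch, assuming the key was moved out of Downloads into your project folder and renamed dataflow-sa-key.json (a hypothetical file name):

```python
import os

# Hypothetical path: adjust to wherever you saved the exported JSON key.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "dataflow-sa-key.json"
```

You can also export the same variable in your shell instead of setting it in code; either way, the Google client libraries and Beam's GCP I/O pick it up automatically.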
In your Local Environment
If you are running Apache Beam with the Direct Runner, i.e. locally, you already have the Apache Beam packages installed. Now also install the Apache Beam SDK extras for GCP with the following command in CMD or your terminal of choice:
pip install apache-beam[gcp]
This SDK allows your local Apache Beam code, which runs with the Direct Runner (it's worth researching the other possible runners, such as Spark and Flink), to be converted and…
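To make the switch concrete, here is a minimal sketch of how a local pipeline can be pointed at Dataflow just by changing its options. The project ID, region, bucket and file paths below are hypothetical placeholders, not values from this article:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# flags=[] prevents Beam from trying to parse sys.argv as pipeline options.
options = PipelineOptions(
    flags=[],
    runner="DataflowRunner",                    # swap for "DirectRunner" to run locally
    project="my-gcp-project",                   # hypothetical project ID
    region="us-central1",                       # region where the Dataflow job will run
    temp_location="gs://my-bucket/temp",        # hypothetical bucket for temporary files
    staging_location="gs://my-bucket/staging",  # hypothetical bucket for staged code
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/sample.csv")
        | "ToUpper" >> beam.Map(lambda line: line.upper())
        | "Write" >> beam.io.WriteToText("gs://my-bucket/output/result")
    )
```

The pipeline code itself stays the same; only the options change, which is exactly what makes moving from the Direct Runner to Dataflow so painless.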