Apache Beam, Python and GCP: Deploying a Streaming Pipeline on Google DataFlow using PubSub
Continuing a series of articles on Apache Beam, this one describes, in a simplified way, how to deploy a locally created streaming pipeline to Google Dataflow.
I have already published two articles about Apache Beam: one about how simple it is to convert a batch pipeline to streaming in this framework ( here ), and another about how to deploy a batch pipeline on Google Dataflow ( here ). In this third article, I want to show, in a simplified way, how to publish a streaming pipeline that consumes data from a Pub/Sub subscription and writes to another Pub/Sub topic.
CREATE SERVICE ACCOUNT
Go to IAM & Admin > Service Accounts > + Create > name your SA > Create:
Then give Dataflow Worker permission > Click Done:
Once created, go to the 3 dots to the right of the created SA, and click Create Key > Select JSON > Create:
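If you prefer the command line over the console, the same three steps can be sketched with `gcloud` (the project ID `my-project` and SA name `dataflow-sa` are assumptions; adjust them to your own):

```shell
# 1. Create the service account
gcloud iam service-accounts create dataflow-sa \
    --display-name="dataflow-sa"

# 2. Grant the Dataflow Worker role on the project
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:dataflow-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/dataflow.worker"

# 3. Export a JSON key for the service account
gcloud iam service-accounts keys create ~/Downloads/dataflow-sa-key.json \
    --iam-account="dataflow-sa@my-project.iam.gserviceaccount.com"
```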
They are exactly the same steps that I described in the article on the batch pipeline.
Done: the SA (Service Account) is created and its key exported — the JSON file should be in your Downloads folder!
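To let the SDK pick up this key locally, the usual approach is to point the `GOOGLE_APPLICATION_CREDENTIALS` environment variable at the downloaded file (the file name below is an assumption — use whatever name your key was saved with):

```shell
# Make the exported service-account key visible to Google client libraries
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/Downloads/my-sa-key.json"
```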
YOUR LOCAL ENVIRONMENT
Below, I describe exactly the same instructions as in the article about the batch pipeline… But it still needs to be done, and in case you haven’t read the previous article, here are the details:
If you are already running Apache Beam with the Direct Runner, i.e. locally, you already have the Apache Beam packages installed. Now also install the Apache Beam SDK packages for GCP with the following command…
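The command referred to is presumably the GCP extra of the Beam SDK, along the lines of:

```shell
# Installs Apache Beam together with its Google Cloud dependencies
# (Pub/Sub, Dataflow runner, GCS, etc.)
pip install "apache-beam[gcp]"
```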