Pythian Blog: Technical Track

Datascape podcast episode 26: a brief tour of the Google Cloud Platform

In episode 26 of the Datascape podcast, we give you a whirlwind tour of the Google Cloud Platform, otherwise known as GCP. We can’t cover everything, of course, but we take you through a couple of different reference architectures and cover the features that are most exciting, to us anyway. To help navigate this tour, I’ve invited John Laham and Bjoern Rost to join us and share the inner workings of GCP, what they’ve found most exciting and most challenging about the platform, and how you can educate yourself further about it and similar platforms. John and Bjoern have a wealth of knowledge on GCP and walk us through everything from pipeline transformation, to Google Cloud Dataflow and Dataproc, to BigQuery, to Google’s Stackdriver. We also learn more about the platform’s security features and other serverless services, and why we shouldn’t think of GCP as a tool, but rather as a building block. If you’re looking to expand your knowledge and gain deeper insight into this platform, this episode is for you!

Key points from this episode:

• Learn more about the architecture of the batch processing pipeline in GCP.
• The three stages of the pipeline in GCP: ingest, transform, and load.
• Find out more about the Google tools that perform the pipeline transformation.
• Learn more about Google Cloud Dataflow and the role of Dataproc.
• When to use Dataflow versus Dataproc, and how long to run them.
• Why cloud storage services like Amazon S3 are probably the most serverless services we use.
• Discover more about BigQuery and how it differs from Oracle and SQL Server.
• Why you shouldn’t think of GCP as a tool, but rather as a building block.
• How to secure your platform and ensure that services are able to speak to each other.
• Learn more about Google’s IAM and the power of service account features.
• Find out more about bridging security between on-prem and Google Cloud.
• How Google Cloud Identity and G Suite work, and more about their features.
• The beauty of the Cloud SQL proxy when it comes to authenticating services.
• Discover just how auditable security is on GCP straight out of the box.
• Why you need to be extremely diligent in how you create and assign service accounts.
• Learn more about Google’s Stackdriver and why it is such a powerful platform.
• Helpful tools and resources for listeners to learn more about GCP.
• And much more!

Links mentioned in today’s episode:

Google Cloud Dataflow
BigQuery
Google Tableau
Apache Daemon
Kubernetes
Bigtable
Firebase
G Suite
Certified Data Engineers
Data Science Book
Coursera courses
Udemy
Cloudscape Podcast
