Pythian Blog: Technical Track

Five things to know before migrating your data warehouse to Google BigQuery

  In September, Google announced enhancements to BigQuery, the Google Cloud Platform service for large-scale data analytics. So how do you know if Google BigQuery is the right tool for what you want to do—and if it is, what’s the best way of migrating your data into it? We’ve compiled some tips to help companies answer these and related questions. Here are five things you should know before migrating your data warehouse to Google BigQuery:  

1. Know your requirements

It’s important to appreciate what Google BigQuery is, and what it isn’t. It is a powerful analytics engine for processing big datasets, which means that if you’re working with a small dataset, you won’t realize its full potential. There are also some specific database features it doesn’t support: it’s not an OLTP database, and it doesn’t provide locking, multi-row or multi-table transactions, primary keys, or referential integrity. Make sure your data and your data processing goals are well aligned with BigQuery’s capabilities.

2. Validate your assumptions

If BigQuery seems like the right fit, test it out. Identify a “lighthouse” project—some kind of leading initiative or an area with substantial cost or performance impact—to put BigQuery through its paces. As you do, set measurable goals for performance, cost and usability, and see how Google BigQuery delivers.

3. Think integration

Google BigQuery will give you a powerful analytics engine, but you’ll most likely still want to draw on your existing tools for data transformation, visualization, and so on. Confirm ahead of time how BigQuery will integrate with the rest of your data environment and determine how you’ll need to adjust your existing data pipeline to make it fit.

4. Factor your costs

Make sure you understand Google’s pricing structure and which option will work best for your enterprise. On Google Cloud Platform, computational resources and storage are billed independently, and both are usage-based. Storage is always billed by volume, with an automatic discount for long-term data (currently, tables that haven’t been modified for more than 90 days). Firms that prefer a predictable monthly cost can instead reserve computational resources at a flat rate.
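To make that pricing model concrete, here is a minimal back-of-the-envelope estimator. The rates below are illustrative assumptions based on published on-demand pricing at the time of writing (roughly $5 per TB scanned, $0.02 per GB/month for active storage, $0.01 per GB/month for long-term storage); always check Google’s pricing page for current numbers before budgeting.

```python
# Back-of-the-envelope BigQuery cost model. The rates are illustrative
# assumptions only -- see https://cloud.google.com/bigquery/pricing for
# the numbers that actually apply to your project.
ON_DEMAND_PER_TB = 5.00         # USD per TB of data scanned by queries
ACTIVE_STORAGE_PER_GB = 0.02    # USD per GB/month, active storage
LONGTERM_STORAGE_PER_GB = 0.01  # USD per GB/month, tables idle > 90 days

def monthly_cost(scanned_tb, active_gb, longterm_gb):
    """Estimate one month of on-demand query spend plus storage spend."""
    return (scanned_tb * ON_DEMAND_PER_TB
            + active_gb * ACTIVE_STORAGE_PER_GB
            + longterm_gb * LONGTERM_STORAGE_PER_GB)

# Example workload: 10 TB scanned, 500 GB active, 2,000 GB idle data.
print(round(monthly_cost(10, 500, 2000), 2))  # prints 80.0
```

A model like this also makes the flat-rate decision easier: once your projected on-demand query spend consistently exceeds the reserved-capacity price, reserving computational resources becomes the more predictable option.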

5. Look beyond migration

Get familiar with the ways Google BigQuery lets you monitor and analyze service usage, so you’ll have the analytics you need to evaluate performance, resource demand and cost over time. BigQuery supports Stackdriver integration, audit logging and more, so it should yield insights that you can convert into action for your organization, but confirm this up front.

Beyond these considerations, you will also need to plan storage for things like a landing zone, archiving and staging. And with Google’s new Cloud Storage offerings (Multi-Regional, Nearline and Coldline - https://cloud.google.com/storage/pricing), adopting a Google Cloud Platform solution may be even more attractive. These new storage classes will benefit any organization that recognizes that data is key to driving better business outcomes, providing more storage options to help companies create the optimal performance/price mix, well-priced low-latency storage, and multi-regional storage that improves reliability and responsiveness for companies with geographically dispersed consumers.

If you’d like a more in-depth look at BigQuery and the migration process, download our white paper, A Framework for Migrating Your Data Warehouse to Google BigQuery.
