Pythian Blog: Technical Track

Google announces feature releases on BigQuery and other updates

Recently, I joined Chris Presley in an episode of the Cloudscape Podcast to share some of the latest news around Google Cloud Platform (GCP). Some of the highlights of our discussion were:
  • BigQuery now available in the London region, with column-based time partitioning of tables
  • Cloud Spanner DML: GA
  • Cloud NAT: Beta
  • Google-managed certificates for Load Balancers: Beta
  • Preemptible Cloud TPUs: GA
  • Cloud IoT Core – Device Commands: Beta
DATA ANALYTICS

BigQuery is now available in the London region, with support for column-based time partitioned tables

BigQuery is one of my favorite services in GCP, and I’m always excited when it gets new feature releases. There are two important releases to note.

First, GCP has released the new regions that were promised. BigQuery is now available in the London region, and in the Singapore and Sydney regions as well. Basically, we are seeing BigQuery being rolled out across multiple regions, which is pretty much in line with what Google has been publishing and pushing in order to roll out BigQuery across the globe. This is helpful for a lot of international customers who were always struggling with data sovereignty issues and didn’t want data to leave their country. Having the data closer to home really helps with security and with compliance with multiple regulations.

The second big release from BigQuery’s perspective is the ability to have column-based partitions on specific date fields. This is a very big release in terms of being able to manage data much more effectively. Also part of this release is support for DML statements against these partitions. It’s not regular standard SQL; it’s a little bit customized in terms of how BigQuery handles partitioning.

One thing to consider: BigQuery is a multi-tenant solution, and it enforces strict quotas and limits on the number of partitions that you can process for a given table. So even though these are very good releases, there are still some limitations on the total number of partitions you can have on a table. This is because of the underlying architecture, that is, how BigQuery was created, but there are rumors making the rounds that Google is going to raise these limits and try to make them a little bit more flexible.
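To make the partition limit a little more concrete, here is a minimal Python sketch that counts how many daily partitions a date range would need and checks that count against a cap. It is only an illustration, not BigQuery's API: the function names are hypothetical, and the cap of 2,500 is an assumed placeholder rather than an official figure, so check the current BigQuery quota documentation for the real number.

```python
from datetime import date

# Assumed placeholder, NOT an official figure: look up the real per-table
# partition limit in the BigQuery quota documentation.
ASSUMED_MAX_PARTITIONS = 2500


def daily_partition_count(first_day: date, last_day: date) -> int:
    """Number of daily partitions needed to cover first_day..last_day inclusive."""
    if last_day < first_day:
        raise ValueError("last_day must not be before first_day")
    return (last_day - first_day).days + 1


def fits_partition_limit(
    first_day: date,
    last_day: date,
    max_partitions: int = ASSUMED_MAX_PARTITIONS,
) -> bool:
    """True if a table partitioned by day over this range stays within the cap."""
    return daily_partition_count(first_day, last_day) <= max_partitions


if __name__ == "__main__":
    # One (non-leap) year of daily partitions.
    print(daily_partition_count(date(2018, 1, 1), date(2018, 12, 31)))  # 365
    # Ten years of daily partitions against the assumed cap.
    print(fits_partition_limit(date(2010, 1, 1), date(2019, 12, 31)))  # False
```

Under that assumed cap, one year of daily partitions fits comfortably, while ten years of daily partitions on a single table does not.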
DATABASES

Cloud Spanner DML: GA

I also want to talk about the latest Cloud Spanner updates and the cloud funnel, which are both really exciting.

When you have a lot of tools as a database developer and suddenly those tools are taken away from you, it’s never a happy situation. This is exactly what happened with Cloud Spanner, because it did not really have DML statements. Simple things like insert, update and delete were not really a part of Cloud Spanner because of the way it was developed. It’s not a traditional database solution. There was a lot of pushback from the industry: “You really need these statements. We cannot do everything using REST API services because we are doing a lot of migrations, and it’s not always possible to convert things into the model in which Spanner operates.”

So finally, Google went ahead and added Data Manipulation Language (DML) support, which is now in General Availability (GA) on Cloud Spanner. On top of that, they’ve already released JDBC drivers which allow these DML statements to be executed on Cloud Spanner databases as well.

This is actually a pretty big release. For those of us working with numerous customers, this was one of the primary complaints that we heard over and over again. We had to use third-party tools and other ways to do DML, but now, with this rolled into Spanner, it’s going to be much easier. I expect much higher adoption of Cloud Spanner going forward, just because this feature really helps.

COMPUTE

Cloud Network Address Translation (NAT): Beta

Cloud NAT allows Google VMs that are within a secure zone to communicate outbound to the internet. So basically, the IPs or any other firewall rules do not need to be configured at the VM level. The network security can all be taken care of at the NAT level, which is also highly available and fully managed.
This is only an outbound connection that lets a VM communicate with the internet and with global services without exposing the VM publicly. This was a catch-up that Google did in order to provide a little bit more security. But note that it’s only applicable to outbound connections. There is no inbound connection as far as these NATs are concerned. And you also need to bring your own servers in order to do the routing. This is one update that GCP did which was a little bit difficult to follow, especially using the cloud firewalls that were available within the networking aspects of GCP.

NETWORKING

Google-managed certificates for Load Balancers: Beta

Managed certificates for load balancers are another example of a problem that many customers were experiencing, which one of the cloud providers then took on and offered as a managed service. Managing SSL certificates was always a problem: when they should be used, when they expire, and then allocating dedicated resources (people) or tools to manage all that. What GCP has now announced is managed SSL for load balancers, which means you can use Google-managed SSL certificates. This managed service tracks and manages the SSL certificates: when they expire, auto-renewals and all of the other things that you typically require from an SSL lifecycle management perspective. You can always bring your own SSL certificates, that’s not a problem, but this is another service that is available to leverage as a fully managed one.

AI AND MACHINE LEARNING

Preemptible Cloud TPUs: GA

This is a pretty big and exciting announcement, especially for all the machine learning geeks out there. You can now use TPUs as preemptible instances on GCP. You can have multiple preemptible instances using TPUs as well.
So any type of elastic scaling that you need, in terms of using preemptible instances and reducing your cost during seasonal peaks of running your machine learning models and algorithms, can definitely be accomplished now.

I have a feeling that Google is not only investing a lot in TPUs (because of their TensorFlow program), but also working very closely with Nvidia in order to enable GPUs as well. Right now you can also use GPUs within containers in Kubernetes. Essentially, you can have this fully managed, without looking at the VMs, for both TPUs and GPUs. This is pretty awesome. So they have a very consolidated approach, and from a backend standpoint I see that Google is working very closely with Nvidia. They have some of the latest Nvidia P4s and Tesla 100s being released on GCP first, even before some of the other major cloud providers. GCP had to do a lot of catching up on this because it was lagging on speed, but it is pretty much there now.

This is an important announcement, especially for very elastic workloads or for people who need that level of elasticity in running their machine learning workloads, whether on GPUs or on TPUs. We are seeing a lot of adoption in this particular space, especially with moving a lot of high-performance computing onto these newer models, scaling up as much as needed and then scaling back down pretty quickly on GCP.

INTERNET OF THINGS

Cloud IoT Core – Device Commands: Beta

One of the reasons I wanted to include Internet of Things (IoT) updates is that we have never really talked about GCP IoT on this podcast before, and I wanted to spend some time discussing GCP’s capabilities in this space. GCP was definitely a laggard in the IoT space, but with a couple of new announcements in the last few months, it has really started to catch up.

A little bit of context here: one of the things that GCP announced was Cloud IoT Edge, which is essentially software plus a chip that can be deployed in IoT devices.
So they have a chip, basically an Edge TPU, which you can deploy on your IoT device, and it can do machine learning intensive activities on the device itself, even before the data reaches your network. This is one of the biggest capabilities they are going after: pushing a lot of compute to the edge, and then being able to do a lot of derivative actions on the network and on the compute engines in the cloud.

To this effect, they have done a couple of releases aimed at managing these IoT devices a little bit better. First, there are device commands that you can publish from your central cloud infrastructure out to your devices through Cloud IoT Core on GCP. This is in beta. So all the messages and commands that are available on your IoT devices can now be sent over MQTT from your central place all the way out to the IoT Edge devices (Cloud IoT Core supports the standard MQTT and HTTP protocols).

The second part of this, again in beta, is being able to send all the device logs back into the central network, where you can identify, access, turn on, turn off and handle all of the other regular device logging events. You can have all of that consolidated and visualized in Stackdriver.

So basically the IoT ecosystem from GCP is now a little bit more consolidated and well-baked. There are a few aspects of it which are still in beta, but we are hoping that within the next couple of months everything will be in General Availability. I definitely feel that at this point in time GCP has a very strong and compelling IoT story. They are pushing a lot of work down to the edge and are really making a big bet on edge computing as a priority for enabling IoT services on GCP.
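As a rough sketch of the device side of the commands feature, the Python below shows how a device might decide whether an incoming MQTT message is a command addressed to it. It assumes the topic layout of Cloud IoT Core's MQTT bridge, where a device subscribes to /devices/{device-id}/commands/#. The helper names and the device ID "thermostat-1" are hypothetical, and no MQTT client library is involved; only the topic matching logic is shown.

```python
from typing import Optional


def commands_filter(device_id: str) -> str:
    """Topic filter a device subscribes to for commands (assumed layout)."""
    return f"/devices/{device_id}/commands/#"


def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT-style topic matching with '+' (one level) and '#' (remaining levels)."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches the parent level and anything below it.
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


def extract_command(device_id: str, topic: str, payload: bytes) -> Optional[str]:
    """Return the decoded command if the topic is a command for this device."""
    if not topic_matches(commands_filter(device_id), topic):
        return None
    return payload.decode("utf-8")


if __name__ == "__main__":
    # A command addressed to this device is decoded and returned.
    print(extract_command("thermostat-1", "/devices/thermostat-1/commands", b"reboot"))
    # A message on another device's topic is ignored (None).
    print(extract_command("thermostat-1", "/devices/thermostat-2/commands", b"reboot"))
```

In a real device, a check like this would sit inside the message callback of whichever MQTT client library is in use.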
This was a summary of the GCP topics we discussed during the podcast. Chris also welcomed Warner Chaves (Microsoft Azure), who discussed topics related to his expertise. Hear the full conversation, and be sure to subscribe to the podcast to be notified when a new episode is released.
