Pythian Blog: Technical Track

Examining select announcements from Microsoft Ignite 2018

I recently joined Chris Presley for a special version of his podcast, Cloudscape, to talk about some of the recent announcements at Microsoft Ignite 2018. Topics of discussion included:
  • SQL 2019
  • Azure SQL DB Hyperscale and Managed Instance
  • Cosmos DB multi-master
  • Kubernetes virtual node preview and Kubernetes in Azure Stack
Every time these conferences happen, there is an avalanche of announcements. I’d like to give readers a summary of this event, but before we dive in, I want to clarify that this list is not exhaustive. There were so many announcements that I cherry-picked my favorites and the ones I thought were the biggest. If you want to see the entire list, check out Azure’s blog.
Cloud is a big deal now, and this event covered hybrid cloud and a lot of interesting open source integrations with Microsoft. Microsoft is increasingly trying to sell Azure not as a Microsoft-centric cloud, but as a cloud where you can do open source innovation as well. The other pillars are data and AI. Microsoft keeps developing its data-related products and services and pushing for more aggressive releases of all of them.
I watched part of Satya Nadella’s keynote and heard some pretty interesting client stories, such as a big one with Coca-Cola. Microsoft was also on stage with Adobe and SAP to announce an open data initiative: the three companies are going to share a common data model so that any customer can pull the data they own from each of them in one unified model. I thought that was a very interesting move.

SQL 2019

Let’s take a look at the SQL Server 2019 announcement. I think the biggest item, and probably the most surprising for a lot of people, is that HDFS and Spark are going to be bundled with SQL 2019. From what I understand, Microsoft is going to support them as part of the same product family. If you deploy your Spark cluster using this bundle, then your SQL Server support will also cover whatever you are running on the Spark cluster.
Based on my client experience, I don’t think people who are already running their own big data clusters are going to change their process or workflow to suddenly use the distribution that is bundled with SQL 2019.
I think it’s difficult to expect big changes like this from a company that already has an established big data infrastructure or services. So, who is the audience for this? I had a conversation with Jack Vaughn, a tech writer for TechTarget. He covered the Ignite announcements as well and he asked me this exact same thing. I told him I think this is for people who find the big data ecosystem intimidating and don’t know where to start. If they’re already comfortable with SQL Server and with the Microsoft workflow, process and support, then this gives them an easy way to follow the documentation and get up and running fast. They can follow the process that is laid out in the installer and they are going to end up with an HDFS cluster. It’s probably not going to be a huge investment at the very beginning, but it gives them an easy way to get started.
Another thing that’s coming in the 2019 release is that the capabilities of PolyBase have been expanded quite a bit. Microsoft has added support for PolyBase to connect to Oracle, Teradata, MongoDB and, I believe, PostgreSQL as well. To make it really clear for people who are not familiar with PolyBase: it is a parallel processing data loading and querying technology, and it performs better than a linked server. You can build an entire layer of virtualized schemas on SQL Server that all tie back to different backend databases. It turns SQL Server into your hub for all these different data sources and your main point of integration.
That ties into the question of why you would want to use SQL Server with HDFS. Well, if you’re already going to be using SQL Server as your point of data integration for Oracle, Teradata and MongoDB, then why not also make it the point of integration for your big data cluster?

Azure SQL DB Hyperscale and Managed Instance

Hyperscale is a new way to run Azure SQL Database.
The big change is that Microsoft is changing the core architecture of how its database system works and optimizing it for the cloud and for how Azure is run. Until now, Azure SQL Database was essentially SQL Server turned into a database-as-a-service offering, with features landing in the cloud first as Microsoft moved to a cloud-first model. But the way the pieces fit together was still the same as if you were running it on bare metal. That part had not changed until now.
Hyperscale takes the different modules of SQL Server and decouples the compute from the storage engine in a way that is optimized for the elasticity of the cloud. Hyperscale, for example, will support SQL databases that can grow to a hundred terabytes without being tied down to a specific compute configuration.
Scaling up and down used to be a size-of-data operation: how long it took depended on how much data you had. With Hyperscale, because these two layers are decoupled, you can scale your compute up or down a lot faster regardless of the size of your database. A database that is 20 GB will scale up or down just as fast as a database that is 20 TB.
That’s not the only value proposition for Hyperscale. Microsoft re-architected the log so that log performance will be faster than on a regular SQL Database. For very high throughput OLTP workloads, you’re going to get a lot better performance and be able to scale out to more replicas faster. By changing this entire module of SQL Server and making it cloud-aware, Microsoft can redistribute the log records to many replicas.
For clients with high-performance, read-only scalability requirements, Hyperscale is a really good fit: it is still a fully managed service, but it improves on many of the pain points and bottlenecks you might currently hit when running very intensive workloads on Azure SQL Database. Right now, Hyperscale is in preview and it is for SQL Database only.
It is very interesting because this is a database engine that has literally been re-architected for the cloud. I think the goal eventually is to expand it so that there will be Hyperscale for elastic pools and Hyperscale for Managed Instance.
The other big announcement for the Azure SQL family is that Managed Instance is now generally available. If you’re not familiar with Managed Instance, it takes the concept of the instance that is core to the SQL Server retail product and offers it as a service in Azure. The point of this service is to be a very low-friction migration experience for people who already run SQL Server.

Cosmos DB multi-master

If you’re not familiar with Cosmos DB, check out our Datascape Podcast, because we did a deep dive into Cosmos a few months ago. We also covered on Cloudscape at some point that Cosmos had the multi-master capability in preview. The big announcement now is that the multi-master feature is GA. To give everybody an idea of what this accomplishes:
  • Multiple database copies work in read and write mode, and the service handles data replication and consistency between them.
  • You can have a global database deployed in any Azure region around the world.
  • You have multiple ways to handle conflicts: either use the automated system, which is based on Microsoft's global clock system to do last-writer-wins resolution, or implement your own conflict resolution code.
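To make the two conflict-handling options above concrete, here is a minimal sketch in plain Python. It does not use the Cosmos DB SDK and is not how the service implements resolution internally; the `Write` record, the integer `timestamp` standing in for the service's clock, the tie-break on region name, and the `custom_merge` rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    region: str      # region where the write was accepted
    value: str       # the document content, simplified to a string
    timestamp: int   # hypothetical stand-in for the service's clock; higher is later


def last_writer_wins(a: Write, b: Write) -> Write:
    """Keep the later write. Ties are broken by region name (an assumption)
    so that every replica picks the same winner."""
    return max(a, b, key=lambda w: (w.timestamp, w.region))


def custom_merge(a: Write, b: Write) -> Write:
    """A hypothetical application-defined rule: keep both values, oldest first."""
    first, second = sorted((a, b), key=lambda w: (w.timestamp, w.region))
    return Write(second.region, f"{first.value}+{second.value}", second.timestamp)


# Two regions accept conflicting writes to the same item.
california = Write("california", "cart=[book]", 100)
tokyo = Write("tokyo", "cart=[pen]", 105)

winner = last_writer_wins(california, tokyo)
print(winner.region, winner.value)  # tokyo cart=[pen]
```

Last-writer-wins is simple but silently discards the losing write, which is why a custom rule such as the merge above can be preferable when both updates carry information you want to keep.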
No other database-as-a-service offering in the world does this right now. I can deploy a database that is in California, in Germany and in Tokyo and have the local users all read and write to their local copies. Under the covers, the data is replicated to all the other data centers. And while Amazon DynamoDB has the idea of global tables, the implementation only provides eventual consistency; it doesn’t provide any of the other guarantees that Cosmos does.
Also, the big drawback to DynamoDB’s global tables is that you have to have an empty table to add regions. That’s not the case with Cosmos. Let’s say you had California and Germany and, after the fact, you decided you wanted to onboard a new site in Brazil. You can just add the region with a few clicks or one line of PowerShell, onboard your users in Brazil, and that’s it. You don’t have to make any changes or preparations to add regions.
There is no vendor lock-in anymore because you can choose to use one of the open APIs. The proprietary API is the document/SQL-based API, which is the one that was released when the service started as DocumentDB. Since then, the Cassandra API has gone GA, which was also announced at Ignite, and a MongoDB API has been available for some time. I believe Microsoft is also working to enable HBase as another API for Cosmos DB.
The idea is that if your application is built with the Cassandra API and you decide that you don’t want to use Cosmos for some reason, you can build yourself a Cassandra cluster and bring the workload on-premises. Your application is going to keep working because you were already using the Cassandra driver. You don’t even have to use a proprietary driver for it to work. You use your Cassandra driver and you just point the connection to Cosmos DB.
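As a rough sketch of what “just point the connection” can look like, the snippet below assumes the open source DataStax Python driver (`pip install cassandra-driver`). The two host names are placeholders, and `CassandraEndpoint` and `connect` are helper names invented for this example. Port 9042 is the Cassandra default; 10350 with TLS is what the Cosmos DB Cassandra API uses to the best of my knowledge, so check the current Azure documentation before relying on it.

```python
import ssl
from dataclasses import dataclass


@dataclass(frozen=True)
class CassandraEndpoint:
    host: str
    port: int
    use_tls: bool


# Placeholder hosts: substitute your own cluster and Cosmos DB account.
ON_PREM = CassandraEndpoint("cassandra.internal.example", 9042, use_tls=False)
COSMOS = CassandraEndpoint("myaccount.cassandra.cosmos.azure.com", 10350, use_tls=True)


def connect(endpoint: CassandraEndpoint, username: str, password: str):
    """Open a session using the same driver code for either endpoint."""
    # Imported here so the endpoint definitions can be loaded
    # even on a machine where the driver is not installed.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    ssl_context = ssl.create_default_context() if endpoint.use_tls else None
    cluster = Cluster(
        [endpoint.host],
        port=endpoint.port,
        auth_provider=PlainTextAuthProvider(username=username, password=password),
        ssl_context=ssl_context,
    )
    return cluster.connect()
```

Switching between the two targets is then a matter of passing `COSMOS` or `ON_PREM` to `connect`; the queries the application issues through the returned session stay the same.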
The beauty of this being offered as a service is that the logistics and infrastructure complexity of running multi-master databases across continents is prohibitive for the vast majority of organizations. With Cosmos DB, on the other hand, I can go into the portal right now and it will take me 10 minutes to set it all up.

Kubernetes virtual node preview and Kubernetes in Azure Stack

There were also some Kubernetes announcements, such as a capability called virtual node for Kubernetes. It is in preview right now and it gives you the ability to burst Kubernetes as necessary for your workload. You can deploy a Kubernetes cluster and, if necessary, use the virtual node feature to spin up extra nodes for it based on workload demand.
We talked a few episodes ago on Cloudscape about how Microsoft was deploying Azure Stack in more regions around the world. Now there is an announcement that the Kubernetes service is part of Azure Stack. If you are using the regular Kubernetes service in Azure Stack, you could move your workload pretty much anywhere. You could deploy those same apps on Kubernetes in AWS, on Kubernetes in Google Cloud or on Kubernetes in the Azure cloud. So you can remain cloud agnostic if this is something valuable to you.
Listen to the full conversation to learn about more announcements we discussed. Be sure to subscribe to the podcast to be notified when a new episode is released.
