
News and updates from Microsoft Azure

I recently joined Chris Presley for his podcast, Cloudscape, to talk about what's happening in the world of cloud. I shared the most recent events surrounding Microsoft Azure. Topics of discussion included:
  • Azure Data Factory copy tool updates
  • Microsoft Azure Stack expands availability to 92 countries
  • Azure Backup for SQL Server in preview
  • User defined restore points for SQL DW
  • Traffic Analytics is now GA
  • Azure Standard SSD Managed Disks in preview
  • Disaster Recovery (DR) for Azure Virtual Machines (VMs) using Azure Site Recovery (ASR) is now GA
Azure Data Factory copy tool updates

Microsoft has been on a tear lately with Azure Data Factory. I think it's because the previous generation, Azure Data Factory V1, was a weakness in the Azure data story. Now they have V2 rolling out, and they continue to ship updates for it every month. (Readers can check out past episodes of Cloudscape, where we've talked about ADF V2 quite a bit.)

This month, they've improved the copy tool quite a bit. Often the first part of extract, transform, load (ETL) is simply taking the data out, and creating those steps over and over becomes very mechanical, especially when there are only small differences between them, because you have to build each one manually. It quickly turns into boilerplate work. That's where the copy tool comes into play. It is a nice interface for setting everything up: you press "generate" and it generates the pipelines for all the tables, and it can generate pipelines for all the different sources. It supports the 70+ sources that ADF supports, including SaaS offerings such as Salesforce, Marketo, etc. You can say, "Apply to all the tables in this database," and it automatically goes out and creates all the pipelines for you. It's a way to get the first part of the ETL you're building finished very quickly, and it is a lot more scalable in terms of effort. (A rough sketch of what one of these generated pipeline definitions looks like follows the Azure Stack update below.)

Microsoft Azure Stack expands availability to 92 countries

The big news for Azure Stack is that you can buy a configurable appliance that can be scaled vertically and horizontally, put it in your data center, and it is as if you had Azure on your own premises. There are many reasons why you would want to do that. If you want a true hybrid or transparent implementation of Azure, for example, you can have your private cloud running on an Azure Stack appliance and select specific workloads to move to the public cloud. This lets you take advantage of the flexibility and elasticity of the public cloud when needed, while keeping your sensitive workloads on-premises on the same platform.

Up until now, Azure Stack has been available in about 49 countries. Microsoft has announced that, starting last month, they are rolling it out to 92 countries. The big deal here is that instead of staying in those 49 countries and waiting to see what the market says about Azure Stack, they are increasing their distribution reach and putting these appliances in the hands of more people around the world. I assume it must be doing well enough in the market to justify the increased investment. I have spoken to some people who sell Azure Stack, and they say some companies see it as a differentiator, something none of the other cloud providers are offering right now.

For people like us who are immersed in the cloud world, it seems a bit unusual that anyone would be interested in a local cloud appliance. What we need to consider is that some businesses need to answer: how do I roll back from the cloud, and how do I do it in a way that is really easy? These businesses see Azure Stack as a way to mitigate risk. Let's say the regulatory environment changes and suddenly they can't run in the public cloud anymore. How do they move back on-premises in the least disruptive way? Azure Stack pretty much is that answer, because it supports the same APIs.
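Going back to the Data Factory copy tool for a moment, here is a minimal sketch of the kind of pipeline definition the tool generates for each table. The table names, dataset names, and storage details are made up, and the JSON is a simplified version of the Data Factory v2 pipeline schema; the copy tool builds and deploys these for you, but you could push similar definitions yourself through the ADF REST API or SDKs.

```python
import json

# Hypothetical list of source tables discovered in the database.
tables = ["SalesOrderHeader", "SalesOrderDetail", "Customer"]

def copy_pipeline_for(table):
    """Build a simplified ADF v2 pipeline definition with one Copy activity.

    Assumes two parameterized datasets already exist in the factory:
    'SqlSourceTable' (pointing at the source database) and
    'BlobSinkFolder' (pointing at a blob storage container).
    """
    return {
        "name": f"CopyPipeline_{table}",
        "properties": {
            "activities": [
                {
                    "name": f"Copy_{table}",
                    "type": "Copy",
                    "inputs": [{
                        "referenceName": "SqlSourceTable",
                        "type": "DatasetReference",
                        "parameters": {"tableName": table},
                    }],
                    "outputs": [{
                        "referenceName": "BlobSinkFolder",
                        "type": "DatasetReference",
                        "parameters": {"folderName": table.lower()},
                    }],
                    "typeProperties": {
                        "source": {"type": "SqlSource"},
                        "sink": {"type": "BlobSink"},
                    },
                }
            ]
        },
    }

# The copy tool generates something like one of these per table (or a single
# pipeline with many activities); here we just print them for inspection.
for table in tables:
    print(json.dumps(copy_pipeline_for(table), indent=2))
```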
Azure backup and restore points

There are a couple of backup and recovery updates this month for Azure. The first is the new, tight integration between the Microsoft-managed backup service and SQL Server. Thinking back, it was interesting that the backup service came out without SQL Server integration, considering it is Microsoft's flagship database; I guess it was just a matter of time. Now you just point the backup service at your SQL Server VM, or, if you give it access to your subscription, it can do discovery for you and find which VMs are running SQL Server. At that point, it's very much policy-based backup and recovery: you pick the schedule, the types of backups you are doing, how long you want to keep them, and whether you want long-term retention.

The nice thing about it is that it is not just a backup service; it also has baked-in restore capabilities. Through either PowerShell or the portal interface, you can select a backup of a database, ask to restore it to a point in time, and the backup service figures out the restore sequence for you. It is not just sucking data out and dumping it somewhere; it has logic that understands what it is backing up and can restore you to a specific point in time. I think this is a big deal because a lot of people have no clue how to restore. (A small sketch of that restore-sequencing logic follows at the end of this section.)

The other update is user-defined restore points for Azure SQL Data Warehouse, which is interesting because a lot of people don't think much about backing up data warehouses. From the very beginning, Azure SQL Data Warehouse promised a daily snapshot, and it has continued to improve to the point of an 8-hour RPO with 7 days of retention at the moment. So you can always recover, and at most you will lose 8 hours of data if something happens or somebody really screws up an ETL and destroys some data. What they've added now is the ability to create user-defined snapshots. Let's say you are doing some sort of bigger operation on your data; you can create a snapshot whenever you want and it will be kept in the system for a week. At any point, you can restore that snapshot as a new data warehouse. This is for situations where you're doing a big refactoring or a big ETL job and you want the security of being able to say, "Oh, we screwed something up and we need to go back to that snapshot." Or let's say you are producing some sort of big report: you can take the snapshot, continue with your data warehousing operations, and later restore that snapshot. If you needed that report as of that exact moment in time, you can run the report and its workload against the restored snapshot and then destroy it; if you don't destroy it, it will automatically be removed after seven days. (A sketch of creating one of these restore points programmatically also follows below.)
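To illustrate the "figures out the sequence for you" part: restoring a SQL Server database to a point in time means picking the right full backup, the latest differential after it, and then the chain of log backups up to the target time. The sketch below is not the Azure Backup implementation, just a minimal example of that selection logic against made-up backup metadata.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Backup:
    kind: str               # "full", "diff", or "log"
    finished_at: datetime

def restore_chain(backups, target):
    """Return the backups to restore, in order, for a point-in-time restore.

    Picks the latest full backup taken before the target time, the latest
    differential based on it (if any), then the log backups needed to roll
    forward to the requested point in time.
    """
    fulls = [b for b in backups if b.kind == "full" and b.finished_at <= target]
    if not fulls:
        raise ValueError("no full backup covers the requested point in time")
    base = max(fulls, key=lambda b: b.finished_at)

    diffs = [b for b in backups
             if b.kind == "diff" and base.finished_at < b.finished_at <= target]
    chain = [base] + ([max(diffs, key=lambda b: b.finished_at)] if diffs else [])

    start = chain[-1].finished_at
    logs = sorted((b for b in backups if b.kind == "log" and b.finished_at > start),
                  key=lambda b: b.finished_at)
    for log in logs:
        chain.append(log)
        if log.finished_at >= target:   # this log backup covers the target time
            break
    return chain

# Example: daily full, a mid-day differential, and hourly log backups.
backups = [
    Backup("full", datetime(2018, 7, 1, 0, 0)),
    Backup("diff", datetime(2018, 7, 1, 12, 0)),
    Backup("log",  datetime(2018, 7, 1, 13, 0)),
    Backup("log",  datetime(2018, 7, 1, 14, 0)),
    Backup("log",  datetime(2018, 7, 1, 15, 0)),
]
for b in restore_chain(backups, datetime(2018, 7, 1, 13, 30)):
    print(b.kind, b.finished_at)
```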
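And for the user-defined restore points, here is a hedged sketch of what creating one programmatically might look like against the ARM REST endpoint for the data warehouse, using azure-identity for the token. The subscription, resource group, server, and database names are placeholders, and the api-version shown is the one documented when the feature was in preview, so verify it against the current documentation; the same operation is also exposed through PowerShell and the Azure SDKs.

```python
import requests
from azure.identity import DefaultAzureCredential

# Placeholder identifiers -- replace with your own.
subscription = "00000000-0000-0000-0000-000000000000"
resource_group = "my-dw-rg"
server = "my-sql-server"
database = "my-data-warehouse"

# Acquire an ARM token (works with az login, managed identity, env vars, etc.).
token = DefaultAzureCredential().get_token("https://management.azure.com/.default")

url = (
    "https://management.azure.com"
    f"/subscriptions/{subscription}/resourceGroups/{resource_group}"
    f"/providers/Microsoft.Sql/servers/{server}/databases/{database}"
    "/restorePoints?api-version=2017-03-01-preview"  # api-version: check current docs
)

# Label the restore point so it is easy to find before a risky ETL run.
resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {token.token}"},
    json={"restorePointLabel": "before-big-etl-refactor"},
)
resp.raise_for_status()
print("restore point request accepted:", resp.status_code)
```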
Disaster Recovery (DR) for Azure Virtual Machines (VMs) using Azure Site Recovery (ASR) is now GA

For somebody not familiar with it, Azure Site Recovery is a VM replication service that has been marketed as a way to have DR in the cloud for your on-premises workloads. It supports Hyper-V and VMware machines that you have on-premises and replicates them to an Azure region. Now you can do the same thing for Azure VMs themselves. You can select, for example, your East US VM fleet, use Azure Site Recovery for disaster recovery to West US, and Site Recovery handles all the replication under the covers for your VMs.

For people who haven't tried it and are curious, Site Recovery has a really cool agent that you can run on-premises; it will tell you which VMs you should run on Azure and the type and size of storage you are going to need, and it gives you a cost estimate. It is a very interesting service, and the big announcement is simply the expansion to make it compatible with Azure VMs.

Traffic Analytics is now GA

Traffic Analytics is all about analyzing and visualizing the network traffic in and out of your Azure subscription, as well as inside it. You can use it for performance reasons, for example, to identify hot spots in your topology where you are maxing out on network bandwidth. You can also use it as a security tool, because it can give you a flow diagram of where data is moving in and out of your network perimeter. Let's say you find out that data is suddenly moving between subnet A and subnet B and that is not supposed to happen; Traffic Analytics lets you find data leaks like that in your configuration, and it is all done through the analytics service for you. Maybe you are running a high-performance link in a subnet that is not really being utilized, while you're maxing out another link in another subnet; that is the one you need to scale up. Until now, finding these things meant processing a lot of metrics, knowing what to look for, and manually checking configurations. Traffic Analytics pulls all of that into a central management dashboard and makes it a lot easier to spot these issues, which are very important nowadays.

Azure Standard SSD Managed Disks in preview

When Azure first came out with its regular standard disks, they were all hard-disk-based: what we would call low-IOPS, high-latency devices. Another problem was that the latency of the standard hard disk drives was unpredictable, which is a big deal for IO-sensitive workloads like databases. Microsoft said, "Well, obviously we understand these are too slow, so we are going to come out with premium storage SSDs." That's how Premium SSDs came to be, and they are what we run most of our clients' database workloads on: low-latency, very predictable, high-IOPS storage. The problem was that there was no solution in the middle. Amazon, for example, has general-purpose SSDs, which are a step up from hard disk drives but a lot cheaper than provisioned-IOPS SSDs. The new Standard SSD disks come in as that middle ground. They are SSD-based and have more or less similar IOPS and throughput to the regular hard disk drives, but with lower latency and, most importantly, predictable performance. They're not meant for workloads that need high-speed IO, but I think they are going to be the new standard for pretty much all VMs. People are going to stop using the standard HDD-based disks, keeping them only for backups or really low-priority files and things like that. Most production, non-IO-intensive VMs, running a web server for example, are just going to run on Standard SSD from now on.

This was a summary of the Microsoft Azure topics we discussed during the podcast. Chris also welcomed John Laham (Google Cloud Platform) and Greg Baker (Amazon Web Services), who discussed topics related to their areas of expertise. Listen to the full conversation and be sure to subscribe to the podcast to be notified when a new episode is released.
