Microsoft Azure for Industry Podcast

Diving Deep on Azure Storage Solutions, Part 1

Episode Summary

In this first of a two-part series, Vamshi Kommineni takes us through Azure Storage and Data Lake from the perspective of industry use cases and relevant capabilities. Not only do we learn about the different capabilities of these services, but we also discuss their use at hyper-scale and what that can mean for customers. Vamshi shares several customer stories that highlight the technology’s evolution and usage, along with some common usage patterns.

Show Links

Azure Storage Solutions

Azure Files

Azure Blob Storage

Azure Data Lake

Azure Disk Storage

Azure Archive Storage

Guest

Vamshidhar Kommineni

Principal PM Manager for Azure Storage at Microsoft

Vamshi and his team focus on Azure Storage and Data Lake, as well as some other technologies. He is responsible for business strategy and growth along with industry relationships in Media and Entertainment, Automotive, Healthcare, and more.

Find Vamshi on LinkedIn

Episode Transcription

David:

Welcome to the Azure for Industry podcast. We're your hosts, David Starr and Paul Maher. In this podcast, you hear from thought leaders across various industries, discussing technology trends and innovation, sharing how Azure is helping transform business. You'll also hear directly from Microsoft thought leaders on how our products and services are meeting industries' continually evolving needs.

 

David:

Vamshidhar Kommineni is principal program manager for Azure Storage at Microsoft. Vamshi and his team focus on Azure Storage and Data Lake. He's responsible for business strategy and growth along with industry relationships like media and entertainment, automotive, healthcare, and more industries in addition to that. Welcome to the show Vamshi, it's a pleasure to have you.

 

Vamshi:

Thank you, David. It's great to be here.

 

David:

And today, unfortunately, I'm missing my partner, Paul, who is always on the show with us. Paul Maher's unable to make it today. So it's just going to be you and me, but I think we're going to have a great conversation. So to start, I wonder if you could tell us a little bit about your team and the products that you make.

 

Vamshi:

Great. Thanks, David. So the Azure Storage team provides the unstructured data platform for the Azure Cloud. Fundamentally, if you think about it in terms of sort of your on-premises storage, we provide all the different unstructured data services. We have block storage that's attached to virtual machines. We call it Azure Disks, and you get that in different flavors, from hard disk-backed to SSD-backed, all the way up to what we call ultra disks for mission-critical workloads. In relation to your shared storage or file storage, we have a couple of different services. We have Azure Files, which provides SMB access to your shared storage and has a great connector for Windows file servers on-premises. We also have Azure NetApp Files, which is a service that we ship together with our partner NetApp, that allows us to have a great experience for our customers using shared storage in Azure, and particularly for NetApp customers.

 

Vamshi:

On the object storage side, we have Azure Blob Storage, which is sort of the canonical thing. When you talk about Cloud storage, you think about object storage, and Azure Blob Storage is our object storage platform. Recently we've also extended Azure Blob Storage to have Data Lake capabilities. So it's significantly enhanced now and supports Hadoop workloads and other kinds of analytics queries against data stored in the Azure Blob Storage Data Lake.

 

David:

When you mentioned Hadoop, are you talking about that typical sort of lambda data processing pattern that we see?

 

Vamshi:

Absolutely. Yeah, I think it's sort of this notion, and it's kind of separated into a couple of different paths. One is you have data streaming in from lots of devices and things like that. And Blob Storage can function as a great data sink, where you can take in data from millions of clients and store that. But then usually there's a post-processing step, where you hook it up to a Databricks cluster or run a MapReduce job using a Hadoop cluster. And with the Data Lake storage capabilities, you can seamlessly mount that data as a Hadoop file system into one of these analytics clusters.
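
As a rough illustration of what addressing Data Lake storage as a Hadoop file system can look like from the analytics side, the sketch below builds the kind of abfss:// URI that the Hadoop ABFS driver uses. The helper function is hypothetical (it is not part of any Azure SDK), and the account, file system, and path names are invented placeholders.

```python
def abfs_uri(account: str, filesystem: str, path: str = "") -> str:
    """Build an ABFS URI of the form used by the Hadoop ABFS driver.

    Hypothetical helper for illustration; every name passed in is a placeholder.
    """
    # Data Lake capabilities are exposed through the account's "dfs" endpoint.
    host = f"{account}.dfs.core.windows.net"
    # Drop leading slashes so the URI has exactly one slash after the host.
    clean_path = path.lstrip("/")
    return f"abfss://{filesystem}@{host}/{clean_path}"


if __name__ == "__main__":
    # "contosodata" and "telemetry" are invented example names.
    print(abfs_uri("contosodata", "telemetry", "devices/2020/06/"))
```

An analytics cluster that has been given credentials for the account could then read from a URI of this shape much as it would from any other Hadoop-compatible path.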

 

David:

That's great. And one of the things I don't want to pass over too quickly, because I think it gets brushed aside a little bit in conversation, but it's so helpful and handy, is the file storage service. Because so many people in enterprises are used to clicking on the Z drive, right? And the Z drive is some shared file store on some server under an IT person's desk, right? And file storage is a similar service, but it's provided by Azure and backed up. We have a single source and we can mount that through SMB mounts, right?

 

Vamshi:

Yeah, no, that's a great point, David. And I think file servers in particular can be a pain point for IT professionals, because you host mission-critical data for your application. So let's say you're an architectural firm or something, you have all your CAD/CAM files on there. And any downtime on those file servers is pretty painful for both the IT department and all of the internal folks that are dependent on those file shares. That's the reason why, with both Azure Files and Azure NetApp Files, we've taken this notion of a managed PaaS service and sort of melded the convenience and compatibility of a file share with the power and flexibility of a fully managed file service. So when you use Azure Files, what you're getting is actually kind of a serverless managed file service, if you will. There's no one Windows file server that's serving up your Z drive anymore.

 

Vamshi:

It is the entire distributed Azure storage platform. So you have no single points of failure, and you get all the scale and pay-as-you-go flexibility and all of that with Azure Files. But on the other side, it just looks like an SMB share that you can net use to and use on your server machines or on your newer client machines. And as you mentioned, other things kind of get very easy as well. Backup becomes just a couple of clicks, and you're able to get snapshots and snapshot management and all of the life-cycle management for that. So you can do previous version restores and things to that effect. And further, with Azure Files in particular, we have a connector for on-prem Windows file servers, Azure File Sync, that allows you to take your Windows file server and convert it into an on-premises cache for your file data that's now in the Cloud. So local branch offices without great Internet access can have great file performance through that connector.
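
To make the "SMB share that you can net use to" remark a little more concrete, here is a minimal sketch that builds the UNC path of an Azure Files share and the arguments for a Windows net use command. The function names, storage account, share, and drive letter are all hypothetical placeholders, and authentication (which a real mount needs) is deliberately left out.

```python
def azure_files_unc_path(account: str, share: str) -> str:
    """Return the UNC path of an Azure Files share (placeholder names)."""
    # Shape: \\<account>.file.core.windows.net\<share>
    return f"\\\\{account}.file.core.windows.net\\{share}"


def net_use_command(drive_letter: str, account: str, share: str) -> list[str]:
    """Build the argument list for mapping the share to a drive letter.

    Illustrative only: credentials are not included, so this is not a
    complete mount recipe.
    """
    return ["net", "use", f"{drive_letter}:", azure_files_unc_path(account, share)]


if __name__ == "__main__":
    # "contosofiles" and "projects" are invented example names.
    print(" ".join(net_use_command("Z", "contosofiles", "projects")))
```

The point of the sketch is only that, from the client's side, the share is addressed like any other SMB server path, even though no single file server sits behind it.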

 

Vamshi:

So, yeah, you're absolutely right. It's sort of super-important not to forget about them, and to make sure customers and partners have a great path while migrating applications into the Cloud.

 

David:

And as long as we're on general Azure Storage, I have to mention something. You said Blob is sort of the canonical unstructured storage. And Blob Storage is so handy, because each blob that is present is addressable via URL. And people are doing some very interesting things with this. I was doing this myself the other day, hosting a single-page application inside of a Blob store, and it serves up the application just fine. So there are a lot of inventive ways people have started using Blob. You were talking about very large volume and data processing, and that's great stuff too, obviously, but there are some inventive ways people are using Blob stores. So that's interesting.
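
As a small sketch of what "addressable via URL" means in practice, the helper below assembles the URL of a single blob from a storage account, container, and blob name. The function is hypothetical and the names are invented; whether such a URL can actually be fetched anonymously depends on how access to the container is configured, which is not shown here.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the URL of one blob (illustrative helper, placeholder names)."""
    # Percent-encode the blob name but keep "/" so virtual folders stay readable.
    encoded_name = quote(blob_name, safe="/")
    return f"https://{account}.blob.core.windows.net/{container}/{encoded_name}"


if __name__ == "__main__":
    # "contosoweb" and "site" are invented example names.
    print(blob_url("contosoweb", "site", "assets/app main.js"))
```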

 

Vamshi:

Yeah, definitely, David. I think even for sort of hobby developers and folks like us who love to tinker with stuff, Blob Storage can give you storage for pennies on the gigabyte and really give you a flexible, powerful platform. And it's used in a number of different app dev frameworks. It can be used to host data behind a WordPress site. It's really quite flexible and scalable, not just for, as you say, the enterprise scenarios, but also sort of the day-to-day scenarios. And it really is flexible and capable in that respect.

 

David:

Well, we've talked about a few use cases here, but I'm very curious how we use these tools at Microsoft ourselves. What kind of scale are we supporting internally? Because Microsoft's a fairly large company. We use our own products, of course. So what do you see internally done with Azure Storage?

 

Vamshi:

Yeah, I think that's a great question, David. And really, Azure Storage was one of the first services that we built as part of the Azure platform. It's now into its second decade, believe it or not. The first preview versions of Azure Storage were launched in 2008, and the GA of Azure in 2010 included Azure Storage.

 

David:

That's fascinating to think it's been around that long. It takes a [crosstalk 00:09:37].

 

Vamshi:

Yeah, absolutely. I think we're sort of used to thinking of Cloud as a recent phenomenon, but the development of Azure goes back to around 2007, when a small group of us started experimenting and building the platform here at Microsoft. So, over those years, one of the strong tenets for us has always been the following, which is, we can't really ask our customers to trust their mission-critical data to us if we don't host our own mission-critical data, as Microsoft, on our own storage platform. In some sense, the canonical term, or the old term for this, is dogfooding, right? That is, do you eat your own dog food? And do you host all of your mission-critical data on the platform? And we have been for nearly that entire time over the last decade, to the point where if you look at any Microsoft service today, it's probably running on top of Azure Storage, either directly or indirectly, for any kind of data persistence needs.

 

Vamshi:

I'll give you a couple of examples of this, just to kind of set a sense of the scale. Let's start with sort of enterprise data, right? What do you think of when you think about enterprise information worker data? You think about Office 365, SharePoint Online, Teams, OneDrive for Business. All of those services are hosted and store their data in Azure Blob Storage. So when you store a document into SharePoint Online, or you collaborate on a document, what's actually happening is that data ultimately ends up as blobs in Blob Storage, and there's exabytes of data from our Office 365 customers across tens of thousands of companies, all hosted at very high scale inside the Azure platform and Azure Blob Storage in particular. You look at Teams, and as we've all been working remotely for the last few months, the usage of Teams for sort of video calls and recordings and all of that has just exploded. And all of that data is actually stored in Blob Storage as well. The transcoding happens in Azure Media Services for those recorded calls and videos, and the hosting is done on Blob Storage.

 

Vamshi:

Another good example, kind of transitioning over to the consumer side of the services that Microsoft runs: there's a whole bunch of Xbox services that rely on Azure Blob Storage for their persistence layer. So if you game, or you have kids who game, and they're using Cloud saved games, that's stored on Azure Blob Storage. Any kind of game downloads or game assets that are stored, those all kind of come off of Blob Storage. And another good example in that world is if you have kids and they're into Minecraft. There's a lot of Azure-related functionality that Minecraft depends on. So there's sort of this huge gamut, from our enterprise services all the way into our consumer services, that are reliant on these unstructured data services that the Azure Storage team ships.

 

David:

Well, you said exabytes, which is not a term most people use in their day-to-day jobs. So that's obviously huge scale. And since the services have been around so long, I guess it just makes sense that we've taken more and more dependencies on them over time. How about Data Lake?

 

Vamshi:

Yeah, it's a good question. Data Lake capabilities on the platform are a more recent thing. The GA of the Data Lake capability on top of Blob Storage was February of 2019. And even in that realm, I think we have a bunch of different internal services using Data Lake capabilities in order to run business analytics, post-mortem analytics on clickstream data, view data, et cetera, to kind of build out that picture. So we'll talk a little bit more, I think, later about external industry examples and the like, and even there, we've seen great adoption of the Data Lake capabilities. Because there's this notion of having huge amounts of data, petabytes or tens of petabytes of, call it data exhaust, if you will, that firms are collecting and then analyzing and trying to build insights out of. That's a place where you take the scale of object storage with the cost efficiency, connect that up with the analytics capabilities, and you've got a really, really good option for data usage.

 

David:

Data exhaust is a great term. I haven't heard that before. Does that get to the idea that a lot of organizations just have this philosophy of let's store everything, and then we'll come back and search for patterns, whereas others are looking to make some intelligent decisions about what data to store ahead of time?

 

Vamshi:

Yeah, no, that's a great point. And this has evolved over time. If you look over a couple of decades in the storage industry, historically storage was very expensive and measured in how much it cost you to store a petabyte of data. So the trend was always to aggregate, to build these data warehouses and cubes and sort of have very structured data storage, right? And that's the history that served us well for a long time, with products like SQL Server and the like, and data warehouses in various [inaudible 00:15:56]. But what's happened over the last decade, as Cloud storage technology has really sort of driven the cost of Cloud storage down, is you have this notion where you can actually start to say, I want to keep everything or almost everything, right?

 

Vamshi:

But the challenge now becomes one of not, can I store this, or do I need to aggregate it, but how do you make sense of that data, right? How do you draw insights from petabytes and petabytes of data that no human can ever really look through? And that's where I think you've had this revolution of analytics technologies, starting sort of with the web search engines and MapReduce jobs and things. But now, you know, you have technologies like Databricks and Azure's own Synapse Analytics that help you sort of bring these very, very disparate, very unstructured data sources together and run queries against them, deploy machine learning algorithms against them. So you really have this kind of paradigm shift, enabled by these underlying technologies, that really has changed the dynamic on that.

 

Vamshi:

So I think, for us as well in that focus, we clearly see that where, across a bunch of different industries, customers now come to us and say, hey, we want to store every piece of data we ever generate or will ever generate. Do you have a system that scales for me? Do you have a system that allows me to extract insights? And finally, do you have a way to do that cost efficiently? Because like everything else in life, our budgets are not infinite. And we have to operate within those things. And that's the sort of essential set of challenges that we look at nowadays in the Azure Storage platform.

 

David:

And now let's take a moment out to listen to this very important message.

 

Speaker 3:

Did you know the Microsoft Commercial Marketplace allows you to find and purchase leading Microsoft certified solutions from Microsoft partners? The Microsoft Commercial Marketplace includes Microsoft AppSource and Azure Marketplace. Each storefront serves unique customer requirements and different target audiences, so publishers can ensure solutions are available to the right customers. For applications that integrate with Microsoft 365 products, visit appsource.microsoft.com. Get solutions tailored to your industry that work with the products that you already use. For B2B Azure-based solutions, visit azuremarketplace.microsoft.com. Here, you can discover, try, and deploy the Cloud software solutions you want.

 

David:

One of the things of course that happens at Microsoft is that our product development is often driven by developer requests. Developers are the heart and soul of using our products, of course, but what current or pending capabilities of Azure Storage or Data Lake are the result of industry requests that you might be getting?

 

Vamshi:

Definitely. One of the really fun parts about working on Cloud technologies is you have this very tight loop between your customers, your partners, your developers, and ourselves, as the folks building the platform. It's actually really fun, because you can build things in a much more agile way. You don't have multiyear product cycles. You can learn about what customers are trying to do on the platform and go build those capabilities. So I'll take a few examples and kind of talk about them in different industries at a high level. If you look at, let's say, healthcare, right? The notion of hosting regulated data with HIPAA compliance and other compliance that's geo-locale specific has been an interesting journey for us. Even something as simple as ingesting data from a hospital or a medical software partner, where you have to have these very complex chains of custody just to be able to prove data provenance, prove the data's not being tampered with. These regulated industries have a lot of these challenges, and it's not just healthcare, it's financial services, it's pharmaceutical industries.

 

Vamshi:

So that's driven a lot of good functionality over the years, both in our Data Box product, which allows us to do offline ingestion, as well as the kinds of data verification and, you know, what kind of hashes we store against data. And all the way into launching, a couple of years ago, WORM storage for object storage. So WORM capabilities on top of object storage, across all tiers of object storage, is not something we sort of started with, but it was something that our financial services customers made very clear: they needed something that had those same capabilities that they use on premises. So that's a good example. Another example is with retail and media. When you're talking about serving data to customers, historically we'd do this off of our object storage platform.

 

Vamshi:

And one of the things that popped up with these customers is, hey, we're trying to use Blob Storage as the backing storage for a user-visible action. But when we have a cache miss from the CDN and come back to object storage, the time to first byte is super-critical. And that's not something object store systems do well by default, because they tend to be systems built for aggregate throughput rather than for getting one piece of data very, very quickly. That led us to build what today is Azure Premium Blob Storage. And we have some leading retailers that serve all of their product catalog images off of those systems. And that's sort of a very interesting thing. Or we have media customers who do 4K video editing, again, off Azure Premium Blob Storage.

 

Vamshi:

And these are use cases that were very directly supplied by customers in these industries. Another area I'll touch on is high performance computing, right? The move of high performance computing into the Cloud across different industries has probably done a lot to drive the capabilities of our platform over the years. For example, when we started out, we didn't really think of core object performance, right? We always thought about scale-out as opposed to scale-up. And that's a great place where, if you look at oil and gas, autonomous driving, financial services, all of these industries had high performance computing needs that really required us to make some deep investments in scale of the object storage system.

 

David:

Those are great observations. And I really want to thank you for this first of two discussions on Azure Storage, Vamshi. There is going to be a lot more available in part two. The second part of this interview will be available shortly, listeners. So watch your feed. And in our next episode, we're going to drill down even deeper on how industry demand is moving the state of the art forward for data storage services. Vamshi, thank you so much for joining me here on the Azure for Industry podcast.

 

Vamshi:

Thank you, David. It was great to be here.

 

David:

Thank you for joining us for this episode of the Azure for Industry podcast, the show that explores how industry experts are transforming our world with Azure. For show topic recommendations or other feedback, reach out to us at industrypodcast@microsoft.com.