Azure Stack HCI in 2024

What is Azure Stack HCI?

It seems like a trite question that should have a straightforward answer, but the reality is that the answer would vary massively depending on when you asked it.

For the sake of sanity, I’m not going to delve too far back into the history of the Azure Stack HCI name. Instead, I’m going to start with the second-to-latest iteration.

In October 2020, Microsoft launched Azure Stack HCI version 20H2. From the release notes at the time:

Azure Stack HCI is a hyperconverged infrastructure (HCI) cluster solution that hosts virtualized Windows and Linux workloads and their storage in a hybrid on-premises environment. Azure hybrid services enhance the cluster with capabilities such as cloud-based monitoring, Site Recovery, and VM backups. These services also provide a central view of all your Azure Stack HCI deployments in the Azure portal. You can manage the cluster with your existing tools including Windows Admin Center, System Center, and PowerShell.

This can be broadly represented with a diagram like the below, albeit without touching on the management experience with WAC, VMM, and Azure.

In this iteration of Azure Stack HCI, the solution that you purchased and deployed was an Operating System which connected to Azure to provide a hybrid cloud experience. It’s been a brilliant solution, and one which has delivered great outcomes to our partners and customers for the last 3+ years.

One of the core reasons for the creation of the Azure Stack HCI OS as its own specialized operating system was to be able to deliver a faster update cadence - in the old Windows Server world, major releases are every three years. 2016, 2019, 2022, 2025… With Azure Stack HCI OS, Microsoft were able to move to a more rapid cadence of 2+ updates per year, greater than 6x the release velocity.

In this guise, there were two paths that an OEM could take to deliver a qualified and supported solution to their customers - these two program categories were the Validated Node category and the Integrated System category.

The requirements an OEM had to adhere to in order to qualify and sell each of these were clearly defined, and are summarized in the table below.

A Validated Node had a requirement to be validated once, and then have hardware support provided for five years.

An Integrated System had to be validated at the time of each major release and supported as a solution, including the HCI software, for five years. It also required a level of deployment automation, support integration, and factory installation of the operating system. This is how Azure Stack HCI operated from 20H2 to 22H2.

After three years in the market, I think we’d all hope that a solution would evolve into its next generation, and so in February 2024, Azure Stack HCI 23H2 was launched. Given that its naming was incremental from 22H2, you’d be forgiven for thinking that it was more of the same, but the reality is that what Azure Stack HCI is changed fundamentally on that day.

The simplest way I can describe the evolution of Azure Stack HCI from 22H2 to 23H2 is that in 22H2 it was a Cloud-Connected Operating System, while in 23H2 it’s a Hybrid Cloud Solution, of which one constituent part is the operating system.

The below diagram is again not exhaustive, but provides a fair representation of what an Azure Stack HCI 23H2 deployment now looks like in comparison to the Azure Stack HCI of yester-year.

The first thing to note is that there are a bunch of new components deployed with every Azure Stack HCI 23H2 system. There’s a VM that’s created at deployment time called the Arc Resource Bridge (ARB). At its simplest, the ARB is a Linux VM containing a whole bunch of Microsoft services which provide deep integration up-stack into Azure, help manage the new Lifecycle Management Engine, and provide the core management components for the on-premises Azure Kubernetes Service. The ARB is an appliance: you can’t log in to it. It’s there to deliver a series of outcomes that improve your Azure hybrid experience and capabilities.

The next item to note is the addition of a series of Azure Arc extensions with every deployment. In the screenshot below you’ll see that the Azure Edge Device Management, Telemetry and Diagnostics, and Remote Support extensions are installed. These Arc Extensions provide core functionality to the Azure Stack HCI solution, delivering lifecycle and supportability features.

You’ll notice in the screenshot above that there’s also an ‘Add Capabilities’ button. If I click on this, it’ll take me to a suite of add-on functionality that I can enable for my Azure Stack HCI solution. In the screenshot below, note that both Windows Admin Center and Disaster Recovery are listed as Extensions.

If I install the Windows Admin Center extension, then guess what happens? A new Arc Extension is deployed to my cluster, providing Windows Admin Center in Azure connectivity to my cluster.

This, for me, exemplifies the core directional evolution of Azure Stack HCI as a solution. If we go back to Azure Stack HCI 20H2-22H2, its core purpose was to deliver operating system features at a faster cadence than had previously been possible with Windows Server.

Azure Stack HCI 23H2 takes this a step further - instead of focusing solely on Operating System updates, it’s concerned with how to deliver new Azure Stack HCI features and new hybrid capabilities at an even more rapid cadence. If you’re familiar with how Azure Arc extensions work, you’ll recognize that by using this mechanism to deliver Azure Stack HCI features, the ability to release, update, break/fix, and generally lifecycle these cloud capabilities is dramatically accelerated.

This then in turn leads naturally to two additional major changes to Azure Stack HCI as a solution - #1, the pace at which updates are released, and #2, how said updates are validated end to end.

Azure Stack HCI 23H2 follows a new release cadence based on the concept of Release Trains.

A Release Train consists of an Operating System Baseline, plus the next six months of its cumulative updates, plus Azure Stack HCI feature updates (via ARB and Arc), plus OEM updates via the Solution Builder Extension (SBE).
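To make that composition concrete, here’s a minimal Python sketch of a Release Train as a plain data structure. The class name, field names, and sample values are my own placeholders for illustration; none of this is part of any Microsoft tooling.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ReleaseTrain:
    """Hypothetical model of one Release Train; all names are illustrative."""
    os_baseline: str                                             # e.g. "2311"
    cumulative_updates: List[str] = field(default_factory=list)  # OS CUs on this train
    feature_updates: List[str] = field(default_factory=list)     # delivered via ARB and Arc
    sbe_updates: List[str] = field(default_factory=list)         # OEM content via the SBE


# Placeholder values only; real update names will differ.
train = ReleaseTrain(
    os_baseline="2311",
    cumulative_updates=["2311.2", "2311.3"],
    feature_updates=["placeholder-feature-update"],
    sbe_updates=["placeholder-oem-update"],
)

# Print each of the four constituent parts of the train.
for part, value in asdict(train).items():
    print(f"{part}: {value}")
```

The point of the sketch is simply that the OS baseline is one field among four, which is the shift from "operating system" to "solution" in miniature.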

The below diagram from Microsoft Learn provides a representation of the Release Train lifecycle.

The first thing to note is that there will always be two release trains running and supported concurrently.

Let’s take the example of a fresh deployment which has the 2311.0 OS Baseline installed in November. In January, the 2311.2 update is released containing OS Cumulative Updates, plus Azure Stack HCI feature updates. Before I update my system, I want assurance that these updates have been tested and validated end to end - we’ll come back to that at the end of this blog. Assuming I’m comfortable with updating, I update to 2311.2, and all goes smoothly.

The next month, the 2311.3 update is released, and at the same time the 2402 OS Baseline is released. The immediate assumption might be that 2402 is newer, better, faster, stronger, but remember that the Azure Stack HCI features are decoupled from the OS updates. What this means in practice is that whether I am running the 2311.3 update on the 2311 Release Train, or the 2402.0 update on the new 2402 Release Train, the Azure Stack HCI features I’m consuming are the same. So I update to 2311.3, and all works great.

From here, I have the option to update to the 2402 Release Train, or to remain on the 2311 Release Train for another two months. Why might I want to remain on the 2311 Release Train? Typically for stability reasons. In general, we find that customers prefer to wait for 1-2 CU cycles before jumping between major OS updates. The beauty of Azure Stack HCI in its new form is that you don’t need to compromise on the Azure Stack HCI features you’re receiving by remaining on the prior Release Train. This is a tremendous benefit.

So I continue on through the 2311 Release Train until 2311.5 is released, at which point I decide to move Release Train. I update from 2311.4 to 2402.1, and then from there to 2402.2.

Through all of this I have had the choice as to when to move operating system release, and I haven’t had to compromise on my Azure Stack HCI feature experience. This is a complete shift in how Azure Stack HCI has operated up to this point, and it’s really worth understanding that running on either of the currently supported Release Trains is designed to provide you with a consistent Azure Stack HCI feature experience.
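The parity between the two trains in that walk-through can be sketched in a few lines of Python. This assumes, purely from the example above, that a label like 2311.3 means "the 2311 baseline plus three monthly updates", so two labels that land on the same calendar month carry the same feature level. The Release class and the same_feature_level helper are hypothetical names of my own, not Microsoft tooling.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Release:
    baseline: str  # YYMM of the OS baseline, e.g. "2311"
    update: int    # update number on that train, e.g. 3 in "2311.3"

    @classmethod
    def parse(cls, label: str) -> "Release":
        # "2402" with no suffix is treated as update 0 of that train.
        baseline, _, update = label.partition(".")
        return cls(baseline, int(update or 0))

    def calendar_month(self) -> Tuple[int, int]:
        # Assumption: one update per month, counted from the baseline month.
        year = 2000 + int(self.baseline[:2])
        month = int(self.baseline[2:])
        total = year * 12 + (month - 1) + self.update
        return total // 12, total % 12 + 1


def same_feature_level(a: str, b: str) -> bool:
    """True when both labels fall in the same calendar month (assumed parity)."""
    return Release.parse(a).calendar_month() == Release.parse(b).calendar_month()


for old, new in [("2311.3", "2402.0"), ("2311.4", "2402.1"), ("2311.2", "2402.0")]:
    print(old, new, same_feature_level(old, new))
```

Under that assumption, 2311.3 pairs with 2402.0 and 2311.4 pairs with 2402.1, which matches the update path described above.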

So then, with all of this new functionality, all of these new components, this more rapid update cadence, and all of these new release vehicles, how can you be confident that everything in this new full stack has been rigorously tested and qualified? This is where the Premier Solution category comes in: a new Azure Stack HCI solution category, announced and launched in the same timeframe as Azure Stack HCI 23H2.

I would argue that the Premier Solution category isn’t an incremental or optional addition to the HCI Catalog. Instead, it represents a necessary leap forward in how we now approach Azure Stack HCI solution qualification and ongoing testing.

Returning to the comparison table between Validated and Integrated, and now adding in the Premier Solution category, there are two line items that I want to draw attention to in relation to the content of this blog specifically.

1 - Solution Testing Requirement - Continuously - Full Stack
2 - Full Stack Validation by Microsoft in their own labs

Each of these new requirements for the Premier Solution category is designed to provide a level of confidence and assurance around the testing and qualification of Azure Stack HCI updates that was not possible in previous solution categories. Having a more complex (and exciting!) solution that’s way more than just an operating system requires a whole new way of providing continuous validation and testing. To that end, Premier Solutions require deep engineering integration between OEM and Microsoft through a CI/CD testing pipeline. That pipeline provides assurance from disk to cloud and everything in between, so that when you click ‘Go’ on an update, you can be confident that you’ll have a good outcome.

While I work for Dell, this blog has been written from a more generic point of view covering the baseline requirements of Azure Stack HCI 23H2, and the bare minimum entry requirements to deliver a Premier Solution. Future blogs will delve deeper into additional value that Dell brings over and above these base requirements, but for the time being I thought it important to take a more generic approach, to show how Azure Stack HCI itself has evolved from Operating System to Solution, and how we as OEMs, Partners, and Customers all need to adapt our practices and assumptions around the solution to be able to provide the best outcomes on it.