Managing virtual desktop infrastructure in Azure can be challenging, particularly when it comes to monitoring and analyzing performance metrics. With so many dashboards and data sources to navigate, IT administrators might need help knowing where to start when accessing relevant performance metrics for Azure Virtual Desktop. ControlUp provides the necessary visibility with a simple, comprehensive IT management solution offering a single pane of glass for monitoring and troubleshooting Azure Virtual Desktop environments.
It does this by aggregating data from a variety of sources, including the machines themselves, Azure, and Azure Virtual Desktop, to provide a holistic view of the environment. After collecting and analyzing data from endpoints, the network, applications, and infrastructure components, ControlUp leverages real-time metrics like CPU usage, memory consumption, disk utilization, and network latency to provide detailed insights into the performance and health of Azure Virtual Desktop environments.
Starting with our latest version of ControlUp Real-Time DX, we’ve integrated Azure Virtual Desktop service monitoring, expanded Azure cost metrics and added FSLogix monitoring to give you a complete picture of your Azure Virtual Desktops. Read on to learn all about the new Azure Virtual Desktop integration and what it can offer you in your environment.
Breaking down your Azure costs can be complicated. With over 100 Azure cost metrics tracked in our user interface, Solve, we’ve made it our goal to bring as much cost visibility into ControlUp as possible and to make those metrics easy to digest.
Higher-level costing metrics are summed at the subscription level, and lower-level costing metrics are summed for resource groups. ControlUp makes it easy to get a clear picture of how these costs relate.
When focused on the subscription level (Figure 1), ControlUp shows the costing metrics for each subscription. This eases managing the costs of multiple subscriptions by giving you visibility to the costs for each in a classic spreadsheet-like way for easy comparisons and sorting. The summary widget in the middle gives you the total costs of all subscriptions.
Figure 1: Subscription level costing metrics.
With ControlUp Real-Time DX 8.8, we’ve added more costing metrics at the resource group level (Figure 2). Now each resource group appears in the same easily digestible and sortable “grid” view, so you can understand what is consuming costs in your resource groups. This makes unexpected costs easy to spot and gives you the information to drill down quickly.
Did you forget to disable boot diagnostics? A non-zero AZ Storage Account Costs value will quickly bring to light which resource groups have machines with it enabled.
Did someone enable Bastion to work on a machine but forget to disable it afterward? The AZ Bastion Cost metric will bring it to light quickly!
Figure 2: Resource Group level costing metrics.
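To make the idea of spotting unexpected costs concrete, here is a minimal Python sketch of the same check done by hand. The resource group names, field names, and cost values are all hypothetical; they do not reflect ControlUp’s or Azure’s actual export schema.

```python
# Hypothetical per-resource-group cost rows (illustrative names and values only).
resource_groups = [
    {"name": "rg-avd-eus", "storage_account_cost": 0.00, "bastion_cost": 0.00},
    {"name": "rg-avd-wus", "storage_account_cost": 4.12, "bastion_cost": 0.00},
    {"name": "rg-test", "storage_account_cost": 0.00, "bastion_cost": 27.36},
]


def unexpected_costs(rows, metrics=("storage_account_cost", "bastion_cost")):
    """Return {resource group name: {metric: cost}} for every non-zero metric."""
    flagged = {}
    for row in rows:
        hits = {m: row[m] for m in metrics if row.get(m, 0) > 0}
        if hits:
            flagged[row["name"]] = hits
    return flagged


flagged = unexpected_costs(resource_groups)
for name, hits in flagged.items():
    print(name, hits)
```

Any resource group that shows up in the result is one where a feature such as boot diagnostics or Bastion may have been left on.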
The new displays added to Solve for the Azure Virtual Desktop integration make it a snap to understand the relationships between your Azure Virtual Desktop resources. The top third of the image in Figure 3 shows the topology bar, followed by the summary widget bar and the grid view of all the monitored resources.
Figure 3: Azure Virtual Desktop in Solve.
Azure Virtual Desktop has some logical constructs, shown in an easy-to-digest relationship layout provided by ControlUp Solve. Only those resources relevant to that area are presented when focusing on an area of interest. In Figure 4, I’ve selected a specific Azure Virtual Desktop Workspace to focus on (Workspace-EUS), and Solve shows only those resources related to that workspace. ControlUp’s ‘context and focus’ approach makes seeing only the relevant resources and drilling down into different areas straightforward.
From left to right we can see the following resources apply to the AVD Workspace I’m focusing on (Workspace-EUS):
Figure 4: Azure Virtual Desktop topology focused on Azure Virtual Desktop Workspace.
All resources from the real-time grid get some metrics aggregated and summarized in the widget bar, giving you a quick overview of the health of these monitored resources. At the bottom is the real-time grid displaying metrics for the area of focus you have selected.
When designing this display, it became apparent that understanding the resources related to the workspace was paramount. Something as simple as finding out how many users are connected to all the machines in a workspace is more complex when using the Azure Portal.
Figure 5: Summary and real-time grid view.
If you used only the Azure Portal to get a count of all user sessions in a Workspace, you would click about a dozen times, and even more if you have numerous host pools. To illustrate gathering this simple metric without ControlUp: in the Azure Portal, you’d have to navigate to the Azure Virtual Desktop display, select the “Workspaces” blade, select your workspace, select “Application Groups”, select your first application group, select the “Host pool”, and then note the number of “Total sessions.” To go back, you must click the “X” twice to return to the Workspaces display. This is seven total clicks for one metric for one host pool within one application group within one workspace. If you have multiple host pools, the click count grows with every one of them.
ControlUp does this all for you automatically, giving you the tools to understand and keep a pulse on the performance of your AVD environment.
Figure 6: Pulling metrics from related and child resources.
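The roll-up described above can be sketched in a few lines of Python. This is only an illustration of the aggregation, not how ControlUp implements it; the application group names, host pool names, and session counts are hypothetical, and only the workspace name comes from the example above.

```python
# Hypothetical topology: a workspace contains application groups,
# and each application group is backed by one host pool.
workspace = {
    "name": "Workspace-EUS",
    "application_groups": [
        {"name": "ag-desktop", "host_pool": "hp-pooled-1"},
        {"name": "ag-apps", "host_pool": "hp-pooled-1"},
        {"name": "ag-finance", "host_pool": "hp-pooled-2"},
    ],
}

# Hypothetical "Total sessions" value per host pool.
host_pool_sessions = {"hp-pooled-1": 42, "hp-pooled-2": 17}


def total_sessions(ws, sessions_by_pool):
    """Sum sessions over the distinct host pools reachable from a workspace."""
    # Several application groups can share a host pool, so de-duplicate first.
    pools = {ag["host_pool"] for ag in ws["application_groups"]}
    return sum(sessions_by_pool.get(pool, 0) for pool in pools)


print(workspace["name"], total_sessions(workspace, host_pool_sessions))
```

De-duplicating the host pools matters: counting per application group would double-count sessions whenever two groups share a pool.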
Within a Workspace, you have Application Groups. These logical groupings can present desktops or applications to your users. ControlUp gathers this information and produces a report showing the status of your Application Groups. Is the application group publishing a desktop or an application? How many applications? Are there any MSIX AppAttach applications? All of these questions and more are answered with ease with ControlUp.
Figure 7: Azure Virtual Desktop Application Groups.
When working with customers, we found a limit within Azure Virtual Desktop that customers needed to understand. This limit is brought to the forefront with a new metric called “AVD % Application Group Service Limit,” which is explained upon hovering over the metric (Figure 8).
Figure 8: ControlUp shows what percentage of published applications was consumed for the Application Group limit of 500 published apps.
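The arithmetic behind a percentage-of-limit metric like this one is simple. Here is a minimal sketch, assuming the limit of 500 published apps quoted in the Figure 8 caption; the function name is hypothetical and the rounding is my own choice, not necessarily what ControlUp displays.

```python
APP_GROUP_PUBLISHED_APP_LIMIT = 500  # limit quoted in the Figure 8 caption


def service_limit_pct(published_apps, limit=APP_GROUP_PUBLISHED_APP_LIMIT):
    """Percentage of the published-app limit consumed by an application group."""
    if published_apps < 0:
        raise ValueError("published_apps must not be negative")
    return round(100.0 * published_apps / limit, 1)


print(service_limit_pct(125))  # 125 of 500 published apps
```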
People often create or make system changes, and nobody knows who and why, which can drive people batty! Luckily, this data is stored in Azure. ControlUp pulls this data and displays it in the Application Groups view to identify who created or modified it and when. With these two pieces of information, you can put together the puzzle of why something is working differently and troubleshoot accordingly.
Figure 9: Identifying who created or made changes to an application or desktop.
Are you in healthcare or another environment with published applications? Gaining insights into all the published applications in your environment in one view might seem like a dream, and I’m happy to report that ControlUp is here to deliver. If you have multiple applications published across multiple application groups assigned to various Host Pools, you can go mad trying to fish your target application out!
The ControlUp Remote Apps view displays all your published resources in the easily digestible, sortable, and filterable grid view. ControlUp shows whether applications are published command line applications or MSIX AppAttach applications and their properties.
Figure 10: Azure Virtual Desktop Remote Apps.
In the grid view (Figure 11), you can quickly and easily identify which Subscription, Host Pool, and Resource Group an application is a part of.
Figure 11: Azure Virtual Desktop Remote Apps in the grid view for quickly identifying where the application resides.
Host Pools are an integral part of the Azure Virtual Desktop architecture. They are a logical construct grouping virtual machines that reside in Azure so that you can manage them as a whole. Since the machines share similar properties, understanding their current state and the differences between Host Pools can help you know their performance or limitations.
Figure 12: Host Pools shown in focus.
The ControlUp grid view keeps a near real-time display of the properties of your host pools (Figures 13-15). We bring the Azure Virtual Desktop Compute Cost of the machines within a host pool front and center to provide a clear understanding of costs. The number of available machines, the number of user sessions, and the percentage of machines available to a host pool are all metrics tracked and displayed in a digestible manner.
Figure 13: Various properties within the different host pools.
Figure 14: More properties tracked within the host pools.
Figure 15: Even more properties tracked within the host pools.
Is the Host Pool Load Balancing algorithm set correctly? Is Start VM on Connect configured, and for which host pools? Out of the many machines in a host pool, how many machines have a user session on them?
All of these questions and more are easily answered with ControlUp Solve. With ControlUp automation, it’s simple to ensure consistency within your environment. Should all Azure region “eastus” host pools have Start VM on Connect enabled? ControlUp Automation can make this happen in a snap!
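A consistency rule such as “every eastus host pool should have Start VM on Connect enabled” boils down to a filter over host pool properties. The sketch below shows that logic in Python; the host pool records and field names are hypothetical and are not ControlUp’s or Azure’s schema.

```python
# Hypothetical host pool properties (illustrative field names and values).
host_pools = [
    {"name": "hp-eus-1", "region": "eastus", "start_vm_on_connect": True},
    {"name": "hp-eus-2", "region": "eastus", "start_vm_on_connect": False},
    {"name": "hp-wus-1", "region": "westus", "start_vm_on_connect": False},
]


def out_of_spec(pools, region, setting, expected):
    """Names of host pools in `region` whose `setting` differs from `expected`."""
    return [
        pool["name"]
        for pool in pools
        if pool["region"] == region and pool.get(setting) != expected
    ]


print(out_of_spec(host_pools, "eastus", "start_vm_on_connect", True))
```

The westus pool is ignored because the rule only applies to eastus; only the eastus pool with the setting disabled is reported.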
ControlUp has tracked Azure-specific metrics, including power state and others, for a while. Now we’ve included many Azure Virtual Desktop-specific metrics, making solving common Azure Virtual Desktop issues much faster and easier.
The summary widget bar (Figure 16) aggregates the resources within focus to give you a quick overview of important metrics. The grid view shows you the real-time performance of your Azure Virtual Desktop machines.
Figure 16: Machines view of Azure Virtual Desktop Machines.
With the ControlUp grid view, it becomes clear whether your environment is in spec or out of spec, and what differs between machines. Gathering this information with a PowerShell script or via the Azure Portal could take days in a small or medium environment and even longer in a large enterprise environment. ControlUp can show Azure Virtual Desktop-specific metrics up-to-the-minute!
Within Figure 17, I can quickly identify issues within my environment. The Azure Virtual Desktop Agent version appears out of date for several machines. I can also see when the Azure Virtual Desktop Agent status was last updated and if the machine is in Drain mode or what Azure saw for its health checks on the machines.
Figure 17: View of more Azure Virtual Desktop specific metrics for your machines.
Drain mode is interesting because unused Azure Virtual Desktop machines cost you money. Imagine you want to do Windows Updates or upgrade an application on the machines. Your standard procedure is to enable drain mode, wait for the user sessions to go to 0, do your work, and disable drain mode.
Figure 18: Sorting by Azure Virtual Desktop Drain Mode.
I’ve worked in environments where this work is done across a couple hundred machines, and sometimes removing drain mode gets missed as the last step. Beyond the immediate visibility within the grid view to highlight which machines are misconfigured, you can use ControlUp Automation in two ways. It can execute the Drain Mode toggle at a specific time with a scheduled trigger, perhaps at the end of your change window. Alternatively, it can execute after some time has elapsed since the property changed.
For instance, to maximize availability, you can have a requirement that work requiring drain mode should be completed within a couple of hours. To ensure that availability will be there, you can create a Trigger within ControlUp that will execute a toggle to remove Drain Mode when it detects it’s been set to “Enabled,” but only after 3 hours (Figure 19).
Figure 19: Using ControlUp Automation to execute against properties within Azure Virtual Desktop is easy.
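The decision the trigger makes can be sketched as a small Python function. This is only an illustration of the rule described above (drain mode enabled for at least 3 hours); the function and parameter names are hypothetical and this is not ControlUp’s trigger engine.

```python
from datetime import datetime, timedelta

MAX_DRAIN = timedelta(hours=3)  # the 3-hour window from the example above


def should_disable_drain(drain_enabled, enabled_since, now, max_drain=MAX_DRAIN):
    """True when drain mode has been enabled for at least `max_drain`."""
    return bool(drain_enabled) and (now - enabled_since) >= max_drain


# Hypothetical timestamps for a machine put into drain mode at 09:00.
enabled_at = datetime(2023, 5, 1, 9, 0)
print(should_disable_drain(True, enabled_at, datetime(2023, 5, 1, 11, 30)))
print(should_disable_drain(True, enabled_at, datetime(2023, 5, 1, 12, 0)))
```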
But what if you are troubleshooting a more complex problem that will take more than that time? ControlUp Automation’s filter editor allows you to add any inclusions or exclusions on whatever properties you want! You can add an Azure Tag to the specific machines you want to exclude to prevent the toggle. As shown in Figure 20, if you add the string “Test Machine” somewhere in an Azure Tag and then do the following in the filter editor:
Drain mode will not be toggled to disabled if it detects the “Test Machine” string within an Azure Tag.
Figure 20: View of the Filter Editor when customizing a trigger in ControlUp Automation.
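In plain code, the exclusion amounts to a substring check over a machine’s Azure Tags. The sketch below assumes the marker can appear in either the tag name or the tag value; the machine records and helper names are hypothetical, and the actual filter is configured in the ControlUp filter editor rather than written like this.

```python
def is_excluded(tags, marker="Test Machine"):
    """True if `marker` appears in any Azure Tag name or value."""
    return any(
        marker in key or marker in str(value) for key, value in tags.items()
    )


def machines_to_toggle(machines):
    """Names of drain-mode machines that are not protected by the marker tag."""
    return [
        m["name"]
        for m in machines
        if m["drain_mode"] and not is_excluded(m["tags"])
    ]


# Hypothetical machines: one is tagged so the toggle skips it.
machines = [
    {"name": "avd-0", "drain_mode": True, "tags": {"Env": "Prod"}},
    {"name": "avd-1", "drain_mode": True, "tags": {"Purpose": "Test Machine - debugging"}},
    {"name": "avd-2", "drain_mode": False, "tags": {}},
]
print(machines_to_toggle(machines))
```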
Azure Virtual Desktop has terrific technology to connect your users to their resources with the fastest performance possible. Usually, when you connect to a virtual computer, your connection goes through a gateway that helps make sure everything is working correctly. But with RDP Shortpath, your connection can go directly to the virtual computer without going through the gateway. This can make your connection faster and more reliable.
However, your machines need to be configured to provide Shortpath. ControlUp tracks your machines’ configuration to report whether Shortpath is working or disabled. When users are frustrated that things are not performing as they should, this reporting saves you headaches by giving you the information instantly (Figure 21).
Figure 21: Screenshot of the grid view where you can identify whether you’re configured for Shortpath.
As Microsoft has come out with some killer features for Azure Virtual Desktop, we’ve made it a goal to make as many of these configurations visible to you as possible. The Azure Virtual Desktop Agent Last Update and Screen Capture Protection metrics (Figure 22) are examples of us ensuring this information is brought to light. The Azure Virtual Desktop Agent Last Update metric provides information on whether Microsoft could update the Azure Virtual Desktop agent and, most importantly, when. If you require Screen Capture Protection in your environment, ControlUp brings misconfigurations to light, or you can tie in ControlUp Automation to automatically enable it when it detects that it’s disabled.
Figure 22: Grid view displaying Azure Virtual Desktop Agent information and Screen Capture Protection.
One of the remarkable things about ControlUp Solve is its malleability. While looking at hundreds of metrics in a grid view is fantastic for comparing various resources, sometimes you want a complete summary of an individual machine, including all its properties.
By focusing on a specific Azure Virtual Desktop machine, ControlUp Solve will present you with a summary view of all related Azure Virtual Desktop properties. This is great if you want to get a more information-dense overview of the properties of a particular machine or to see what properties are available to add to the grid view.
Figure 23: Summary view of all related Azure Virtual Desktop properties.
Since Azure Virtual Desktop leverages FSLogix to provide an incredibly performant and flexible user profile solution, we wanted to offer a complete picture of how your Azure Virtual Desktop machines are performing and build an FSLogix integration into ControlUp. FSLogix pairs amazingly with Azure Virtual Desktop but is agnostic in that it works just as well with Citrix or VMware Horizon solutions. This, too, makes ControlUp an agnostic solution that will monitor FSLogix on any of the platforms mentioned. Read more about our FSLogix integration.
Figure 24: FSLogix integration displaying attached containers and configuration properties.
ControlUp’s session metrics are critical to understanding the experience your users are receiving, and our integration with Azure Virtual Desktop provides a potent tool for troubleshooting and understanding performance characteristics.
Figure 25: Azure Virtual Desktop Session Metrics.
Earlier, I touched upon RDP Shortpath, what it is at a machine level, and why it’s essential to try and ensure your users are configured and using it. Microsoft’s Remote Desktop client for Windows will summarize the performance of your connection with and without Shortpath.
To simplify this, one of the most significant advantages Shortpath provides is the UDP transport protocol. When Shortpath is enabled, UDP can cut the effective latency of an interaction roughly in half. Although the screenshots (Figures 26 and 27) show similar latency, TCP is a reliable protocol that requires the receiver to acknowledge the data it is sent. You must wait for two trips: one to the destination and an acknowledgment back. Although various tricks can work around this limitation, it is inherent in TCP and adds latency for numerous transactions.
On the other hand, UDP sends its packets, and if the receiver doesn’t get them, they are lost. This lack of guaranteed delivery matters less for real-time traffic, where a retransmitted packet would arrive too late to be useful because new data is already being sent. This makes UDP a suitable protocol for real-time interactions, and its use should be monitored to ensure your users get their best performance.
Figure 26: Remote Desktop view when not using Shortpath.
Figure 27: Remote Desktop view when using Shortpath.
ControlUp can report on the state of your users’ connections and whether or not they are using Shortpath (and UDP). We have developed two new metrics reporting this state (Figure 28). The AVD Session Connection Transport column reports whether the TCP or UDP protocol is being used, and the Azure Virtual Desktop Session Connection Type column reports whether Shortpath is used for the user.
Suppose your users are complaining about poor performance. In that case, ControlUp can help by using these columns and identifying whether they are consuming Azure Virtual Desktop with maximal performance or if there is an opportunity to increase performance by ensuring they are using RDP Shortpath.
One caveat about RDP Shortpath is that it is not supported on the Microsoft Remote Desktop client for macOS. As of May 2023, macOS machines will always connect via TCP.
Figure 28: Grid view displaying the Session Connection Type and Connection Transport columns, which report the protocol in use and whether Shortpath is used.
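If you exported those two columns, finding the sessions worth investigating is a simple filter. The Python sketch below is an illustration only; the session records and field names are hypothetical. It skips macOS clients because, as noted above, they always connect via TCP as of May 2023.

```python
# Hypothetical session rows (illustrative field names and values).
sessions = [
    {"user": "alice", "transport": "UDP", "client_os": "Windows"},
    {"user": "bob", "transport": "TCP", "client_os": "Windows"},
    {"user": "carol", "transport": "TCP", "client_os": "macOS"},
]


def shortpath_candidates(rows):
    """Users on TCP whose client could be using RDP Shortpath (UDP) instead."""
    return [
        s["user"]
        for s in rows
        if s["transport"] == "TCP" and s["client_os"] != "macOS"
    ]


print(shortpath_candidates(sessions))
```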
Microsoft released an article to assist in troubleshooting issues using some built-in performance counters. At ControlUp, we immediately recognized the value of these counters and built them into our solution for customer consumption. With Azure Virtual Desktop, these have become even more critical.
Figure 29: Real-time performance session metrics
Average Encoding Time might be one of the most essential counters for understanding the effort your Azure Virtual Desktop machines are expending to ensure a smooth user experience. It tracks how long Remote Desktop Services took to capture and compress an individual frame. With this one counter, we can see if a machine can maintain a high level of experience for your users. This counter is reported in milliseconds (ms). If you have a target of 30 frames per second (fps) of experience for your users, this value needs to be at or below 33ms. If it is higher than 33ms, then the machine cannot capture and compress all the visual data in time to guarantee 30fps performance.
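The 33ms figure is just the per-frame time budget: 1000ms divided by the target frame rate. A minimal Python sketch of that arithmetic follows; the function names are hypothetical.

```python
def frame_budget_ms(target_fps):
    """Time available to capture and compress one frame, in milliseconds."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return 1000.0 / target_fps


def can_sustain(avg_encoding_time_ms, target_fps):
    """True if the average encoding time fits within the per-frame budget."""
    return avg_encoding_time_ms <= frame_budget_ms(target_fps)


print(round(frame_budget_ms(30), 1))  # budget for 30fps
print(can_sustain(25, 30), can_sustain(40, 30))
```

The same function gives the budget for other targets, for example roughly 16.7ms for 60fps.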
Tricks that have evolved to reduce CPU load on the servers and reduce bandwidth also come into play, producing data that might seem nonsensical at first. For example, you might have an Average Encoding Time much faster than required to deliver a 30fps average, but you still only see a Frames Per Second count of less than 30. This is because Microsoft (and other EUC vendors) realized that it doesn’t always make sense to have the server work as hard as possible, streaming data that stays the same or changes very little. The encoder should only work as hard as it needs to, giving more CPU resources to the applications to do whatever they need. If your users are looking at a static document, then the Frames Per Second rate will be reduced until it needs to ramp up again. This operates very similarly to the relatively new “variable refresh rate displays” that mobile phones use to reduce their processing and increase battery life.
If the display isn’t changing, there is no reason to send the same frame repeatedly down the wire. By monitoring both the Average Encoding Time and the Frames Per Second counters, you get the complete picture of whether your servers can keep up when demand rises (Figure 30).
Figure 30: Grid view displaying Average Encoding Time and Frames Per Second
ControlUp Solve also reports on three “Frames Skipped/Sec” metrics. Microsoft does a reasonably good job of attempting to figure out whether frames were skipped due to insufficient resources on the client, on the network, or on the Azure Virtual Desktop server. With these metrics, you can drill down your troubleshooting of performance issues even further!
Figure 31: Frames Skipped/Sec Metrics
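One way to read the three metrics together is to ask which one dominates. The sketch below is a deliberately simple illustration of that idea; the function name and sample values are hypothetical, and real troubleshooting would look at trends over time rather than a single sample.

```python
def dominant_skip_cause(client, network, server):
    """Return the cause with the most skipped frames per second, or None."""
    causes = {"client": client, "network": network, "server": server}
    cause, value = max(causes.items(), key=lambda item: item[1])
    return cause if value > 0 else None


# Hypothetical sample: most skipped frames are attributed to the network.
print(dominant_skip_cause(client=0.5, network=6.0, server=0.0))
```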
ControlUp has never been a monitoring solution only. ControlUp provides actionable information and the ability to remediate or execute based on that information. Did a user report a stuck session and need it logged off? Solve provides a single pane of glass to search and take action.
Figure 32: Screenshot of Solve Actions available in ControlUp.
Did a user experience a longer-than-expected logon? ControlUp provides the ability to get further information on logon performance after performing a complete Logon Duration Analysis directly from Solve (Figure 33).
Figure 33: Screenshot of a Logon Duration Analysis performed in ControlUp Solve.
The transition to remote work was one of the most challenging adjustments for IT teams in the past few years. Home networks are built for sporadic traffic bursts and asymmetric links, with varying degrees of latency and bandwidth. Because the quality of home routers varies as much as the wireless standards that seem to change every six months, troubleshooting connectivity issues at scale has proven to be beyond frustrating. To assist with troubleshooting these heterogeneous networks, ControlUp released RemoteDX technology, which provides rich information on the performance of a user’s home network.
For Windows clients, ControlUp offers a plugin for Remote Desktop that allows home network data to be brought into Solve. With our latest release of ControlUp, we’ve added even more actionable information in RemoteDX, including details about the ISP, the client, and more.
Let me run through a scenario most of us have heard before. Your user complains, “My desktop is slow. My applications are slow. It’s frustrating to use our technology!” You find the user is working from home. What do you do? The IT team is often blind to the user’s home network, so guess-and-pray troubleshooting becomes the norm.
What next? Send them a laptop. Did that fix it? No. Send a network cable and ask them to plug in directly to their router. Did that fix it? No. Ask them to upgrade their home router. Did that fix it? No. Ask them to upgrade their home internet. Did that fix it? No. You tell the user, “I don’t know, you just have to come into the office, I guess.” This is truly a frustrating exercise for all.
Enter RemoteDX with its new expanded metrics. Instead of guess-and-test troubleshooting, poor network performance is brought to light (Figure 34). Immediately, we have some answers.
Figure 34: Screenshot of RemoteDX highlighting an ISP performance issue.
The first “hop” (192.168.1.1) is operating quite well, which tells us that the issue was never with the home router or network, and plugging in directly to the router or replacing the router or the laptop wouldn’t resolve the slowness. Therefore, the issue was with their Internet Service Provider (ISP). A route or other configuration issue on the ISP contributed to the delay. With this information, the user can call the ISP and report their performance issue, and IT can monitor the issue to ensure it is resolved and the user’s performance returns to normal.
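The reasoning here is to walk the hops in order and find the first one where latency jumps. A minimal Python sketch of that logic follows. Only the first-hop address comes from the example above; the ISP hop address, the latency values, and the 100ms threshold are hypothetical.

```python
# Hop 1 is the home router from the example; the rest is hypothetical data.
hops = [
    {"hop": 1, "address": "192.168.1.1", "latency_ms": 3},
    {"hop": 2, "address": "203.0.113.1", "latency_ms": 210},  # made-up ISP hop
]


def first_slow_hop(path, threshold_ms=100):
    """Return the first hop whose latency exceeds `threshold_ms`, or None."""
    for hop in path:
        if hop["latency_ms"] > threshold_ms:
            return hop
    return None


slow = first_slow_hop(hops)
if slow is None:
    print("No slow hop found")
elif slow["hop"] == 1:
    print("Suspect the home router or local network")
else:
    print(f"Suspect the network beyond the home router, starting at hop {slow['hop']}")
```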
In addition to the new ISP network hop, RemoteDX tracks the device’s CPU utilization. This helps us check off another possible reason the user may be experiencing issues. If the network metrics look fine but the user is still reporting poor performance and the CPU utilization is high, then something might be causing contention on their device. This additional information can help you quickly get to the root cause of an issue and troubleshoot accordingly.
As a Canadian, I love an old Wayne Gretzky quote. It goes, “I skate to where the puck is going, not to where it has been.”
As of the publication of this blog, Microsoft offers Azure Virtual Desktop on Azure Stack HCI in preview. This means you can use the Azure Virtual Desktop service on top of an Azure Stack HCI host on-premises. Customers have been requesting this ever since Windows Virtual Desktop was released, so it is a big deal. Though it is in preview today, ControlUp will fully support Azure Virtual Desktop on Azure Stack HCI when it goes to General Availability (GA).
Figure 35: View of ControlUp fully supporting Azure Virtual Desktop on Azure Stack HCI (when GA).
Today, ControlUp supports Azure Stack HCI hosts the same as Hyper-V hosts. With the latest release of ControlUp, we also support Azure Virtual Desktop workloads on Azure Stack HCI. All of the screenshots and actions I wrote about in this article are from using Azure Virtual Desktop on Azure Stack HCI while being monitored and actioned against by ControlUp! Additionally, all the Azure Virtual Desktop metrics you saw were on workloads running on-premises with Azure Stack HCI hosts! Cool stuff, huh?
These are exciting times in the EUC space, and ControlUp will be with you every step of the way. If you’d like, download ControlUp and see firsthand how ControlUp can bring you the necessary visibility into your Azure Virtual Desktop environment and troubleshooting capabilities to close help desk tickets faster, reduce downtime, and improve your user experience.