I’m working with a customer who’s having an intermittent issue. The database that Citrix connects to occasionally becomes unavailable — why, no one knows. Since Citrix has a 90 second delay from the database outage to the local host cache (LHC) activating, this outage is really disruptive. If it happens during the morning logon rush or a shift change, you could be preventing hundreds or thousands of end users from working.
Citrix has asked for some CDF traces. However, this problem might only last for a few minutes and resolve itself by the time the ticket comes.
The ControlUp Automation Platform can assist with intermittent problems in two ways: augmented troubleshooting and automatic remediation. This post will cover augmented troubleshooting.
ControlUp can pick up on the events (like the one described above) as they are detected by the broker(s) and immediately start the information capture. Since this issue is intermittent, the capture will stop when the issue resolves itself.
Typically, the ask is to get multiple traces. This means letting the environment exist in the database down state until enough CDF trace files are captured for Citrix to analyze.
Even though Citrix has asked only for CDF traces, we can do better. I’ve configured four additional scripts to provide us even more context for what the issue might be.
The first script is a continuous ping test to validate connectivity.
The second script leverages the Citrix Test-BrokerDBConnection cmdlet.
The third script executes a packet capture of network traffic.
The fourth executes a CDF trace.
The first thing you need to do is set up our environment so you can execute CDF traces.
Download CDFControl and extract it somewhere in your system. For the purposes of this post, I’ve set it up in this path: C:SwinstCDFControl
Open CDFControl on your Citrix Desktop Delivery Controller (DDC). Under the “Trace Categories” menu, select “All Modules”.
Under “Tools” select “Create CTL File”.
Save the file in your CDFControl folder.
In my examples, I’ll be storing the CDFControl.exe here: “C:SwinstCDFControlCDFControl.exe”.
Copy the whole folder with CDFControl and your CTL file to all other Citrix Desktop Delivery Controllers.
You are now set up to gather additional information.
The first script will start a continuous ping against our database with timestamps.
The second script will use the Citrix cmdlet’s to execute a database health check.
The third script starts a network packet capture.
The fourth script will start CDF Trace.
Script 1 – Continuous Ping until the database responds
#requires -version 3 $ErrorActionPreference = 'Stop' $ProgressPreference = 'silentlyContinue' <# .SYNOPSIS Starts a continuous ping test against the Citrix database and logs the output. .DESCRIPTION Starts a continuous ping test against the Citrix database and logs the output if the Citrix Broker Service cannot contact the database. Another event-based task will be created and linked to event ID 1200, Citrix Broker Service, Info to stop the ping. .NOTES Write-Output commands are captured by ControlUp and put into the ControlUp Auditing events. So make them nice :) #> #region variables #Change these paths to reflect your environment $TraceOutput = "$($args[0])" #Database restored event log values [string]$sourceIdentifier = 'Citrix Broker Service' [string]$eventLogName = 'Application' [string]$eventProvider = 'Citrix Broker Service' [int]$eventId = '1200' [string]$eventName = 'EntryWritten' #endregion variables <# Trigger on an event log entry being created @guyrleech 2019 #> $eventLog = Get-EventLog -List | Where-Object { $_.Log -eq $eventLogName } try { $latestOutageEvent = (Get-EventLog -LogName Application -Source "Citrix Broker Service" -After (Get-Date -Format yyyy-MM-dd) | Where {$_.EventID -eq 1201})[0] } catch { Write-Error "Unable to find the Event Id 1201 from source Citrix Broker Service for today." } if( ! $eventLog ) { Throw "Unable to find event log `"$eventLogName`"" } $DBConnection = Get-ItemPropertyValue -Path HKLM:SOFTWARECitrixDesktopServerDataStoreConnectionsController -Name ConnectionString $server = (($DBConnection -split ";")[0] -split "=")[1] $1201TimeStamp = Get-Date $latestOutageEvent.TimeWritten -Format yyyy-MM-dd_HH-mm-ss Unregister-Event -SourceIdentifier $sourceIdentifier -Force -ErrorAction SilentlyContinue Register-ObjectEvent -InputObject $eventLog -EventName $eventName -SourceIdentifier $sourceIdentifier Write-Verbose "Going into main loop @ $(Get-Date -Format G)" try { While( $true ) { $pingobj = Test-NetConnection -ComputerName $server $outputObj = New-Object PSObject $outputObj | Add-Member -MemberType NoteProperty -Name "Time" -Value (Get-Date -Format yyyy-MM-dd_HH-mm-ss) $outputObj | Add-Member -MemberType NoteProperty -Name "Server" -Value ($server) foreach ($property in ($pingobj | Get-Member | where {$_.MemberType -eq "Property"})) { Switch ($property.Name) { "RemoteAddress" { $outputObj | Add-Member -MemberType NoteProperty -Name "$($property.Name)" -Value ($pingobj.$($property.Name).IPAddressToString) } "PingReplyDetails" { $outputObj | Add-Member -MemberType NoteProperty -Name "PingReplyDetails (RTT)" -Value ($pingobj.$($property.Name).RoundTripTime) } "InterfaceAlias" { $outputObj | Add-Member -MemberType NoteProperty -Name "$($property.Name)" -Value ($pingobj.$($property.Name)) } "SourceAddress" { $outputObj | Add-Member -MemberType NoteProperty -Name "$($property.Name)" -Value ($pingobj.$($property.Name).IpAddress) } "PingSucceeded" { $outputObj | Add-Member -MemberType NoteProperty -Name "$($property.Name)" -Value ($pingobj.$($property.Name)) } } } $outputObj | Export-Csv -Path "$traceOutput$($1201TimeStamp)_PingTest.txt" -NoClobber -Append -NoTypeInformation $eventRaised = Wait-Event -Timeout 2 #this allows the ping loop to continue if( $eventRaised -and $eventRaised.SourceIdentifier -eq $sourceIdentifier ) { $event = $eventRaised.SourceEventArgs.Entry if( $event ) { if( $event.InstanceId -eq $eventId -and $event.Source -eq $eventProvider ) { Write-Host "Monitored event raised at $($eventRaised.TimeGenerated)" Write-Host "Stopping CDF Trace." Start-Process -FilePath $CDFControlEXEPath -ArgumentList @("-stop","-noprompt") if ($data.IsCompleted -ne $true) { $pscmd.Stop() } if ($data2.IsCompleted -ne $true) { $pscmd2.Stop() } break } } } elseif( $eventRaised ) { Write-Verbose "Caught unexpected event $eventRaised" Start-Sleep -Seconds 1 } $eventRaised | Remove-Event } } catch { Throw $_ } finally { Write-Verbose "Unregistering event source identifier $sourceIdentifier" Unregister-Event -SourceIdentifier $sourceIdentifier -Force }
Script 2 – Test the Citrix Broker Database using the Citrix cmdlets
#requires -version 3 $ErrorActionPreference = 'Stop' <# .SYNOPSIS Tests the Citrix Broker Database. .DESCRIPTION Starts a test of the Broker Database Connection if the Citrix Broker Service cannot contact the database. #> #region variables #Change these paths to reflect your environment $TraceOutput = "$($args[0])" $eventLog = Get-EventLog -List | Where-Object { $_.Log -eq $eventLogName } try { $latestOutageEvent = (Get-EventLog -LogName Application -Source "Citrix Broker Service" -After (Get-Date -Format yyyy-MM-dd) | Where {$_.EventID -eq 1201})[0] } catch { Write-Error "Unable to find the Event Id 1201 from source Citrix Broker Service for today." } #Get the timestamp for this last event. $1201TimeStamp = Get-Date $latestOutageEvent.TimeWritten -Format yyyy-MM-dd_HH-mm-ss ipmo Citrix.Broker.Commands $DBConnection = Get-ItemPropertyValue -Path HKLM:SOFTWARECitrixDesktopServerDataStoreConnectionsController -Name ConnectionString $testDBConnection = Test-BrokerDBConnection -DBConnection $DBConnection Write-Output "Service Status`n--------------" | Out-file "$TraceOutput$($1201TimeStamp)_TestDBConnection.txt" $testDBConnection.ServiceStatus | Out-file "$TraceOutput$($1201TimeStamp)_TestDBConnection.txt" -Append $keys = $testDBConnection.ExtraInfo.keys foreach ($key in $keys) { Write-Output "`n" | Out-file "$TraceOutput$($1201TimeStamp)_TestDBConnection.txt" -Append $key | Out-File "$TraceOutput$($1201TimeStamp)_TestDBConnection.txt" -Append Write-Output "--------------" | Out-file "$TraceOutput$($1201TimeStamp)_TestDBConnection.txt" -Append $testDBConnection.ExtraInfo.$key | Out-File "$TraceOutput$($1201TimeStamp)_TestDBConnection.txt" -Append }
Script 3 – Capture network traffic
#requires -version 3 $ErrorActionPreference = 'Stop' $ProgressPreference = 'silentlyContinue' <# .SYNOPSIS Starts a network capture and stops when the database is restored. .DESCRIPTION Starts a network capture and stops when the Citrix Broker Service cannot contact the database. Another event-based task will be created and linked to event ID 1200, Citrix Broker Service, Info to stop the capture. .NOTES Write-Output commands are captured by ControlUp and put into the ControlUp Auditing events. So make them nice :) #> #region variables #Change these paths to reflect your environment $TraceOutput = "$($args[0])" #Database restored event log values [string]$sourceIdentifier = 'Citrix Broker Service' [string]$eventLogName = 'Application' [string]$eventProvider = 'Citrix Broker Service' [int]$eventId = '1200' [string]$eventName = 'EntryWritten' #endregion variables #region functions #ConvertTo-PCAP taken from the great Malcolm McCaffery. Code reused with his permissions. Please see his blog posting here for more details: https://chentiangemalc.wordpress.com/2018/10/08/convert-netsh-trace-etl-to-pcap-with-powershell/ function ConvertTo-PCAP { [CmdletBinding()] param( [Parameter(Position=0)] [ValidateScript({ if( -Not ($_ | Test-Path) ){ throw "File or folder $_ does not exist" } if($_.Extension -ne ".etl"){ throw "Source file must be .etl file" } return $true })] [System.IO.FileInfo]$Path, [Parameter(Position=1)] [ValidateScript({ if( -Not ($path.DirectoryName | Test-Path) ){ throw "File or folder does not exist" } if($_.Extension -ne ".pcap") { throw "Destination file must be .pcap file" } return $true })] [System.IO.FileInfo]$Destination, [Parameter(Position=2)] [Uint32]$MaxPacketSizeBytes = 65536) $csharp_code = @' using System; using System.Collections.Generic; using System.Diagnostics.Eventing.Reader; using System.IO; using System.Linq; using System.Text; using System.Threading.Tasks; namespace chentiangemalc { public static class NetworkRoutines { public static long ConvertEtlToPcap(string source, string destination, UInt32 maxPacketSize) { int result = 0; using (BinaryWriter writer = new BinaryWriter(File.Open(destination, FileMode.Create))) { UInt32 magic_number = 0xa1b2c3d4; UInt16 version_major = 2; UInt16 version_minor = 4; Int32 thiszone = 0; UInt32 sigfigs = 0; UInt32 snaplen = maxPacketSize; UInt32 network = 1; // LINKTYPE_ETHERNET writer.Write(magic_number); writer.Write(version_major); writer.Write(version_minor); writer.Write(thiszone); writer.Write(sigfigs); writer.Write(snaplen); writer.Write(network); long c = 0; long t = 0; using (var reader = new EventLogReader(source, PathType.FilePath)) { EventRecord record; while ((record = reader.ReadEvent()) != null) { // Console.WriteLine(String.Format("record: {0}",record.ToXml())); c++; t++; if (c == 10000) { Console.WriteLine(String.Format("Processed {0} events with {1} packets processed",t,result)); c = 0; } using (record) { if (record.ProviderName == "Microsoft-Windows-NDIS-PacketCapture") { if (record.Properties.Count() >= 4) { if (record.Properties[2].Value.GetType() == typeof(UInt32)) { //Console.WriteLine(String.Format("record.properties: {0}",record.Properties.Count())); result++; DateTime timeCreated = (DateTime)record.TimeCreated; //Console.WriteLine(String.Format("record.TimeCreated: {0}",record.TimeCreated)); UInt32 ts_sec = (UInt32)((timeCreated.Subtract(new DateTime(1970, 1, 1))).TotalSeconds); //Console.WriteLine(String.Format("record.ts_sec: {0}",ts_sec)); UInt32 ts_usec = (UInt32)(((timeCreated.Subtract(new DateTime(1970, 1, 1))).TotalMilliseconds) - ((UInt32)((timeCreated.Subtract(new DateTime(1970, 1, 1))).TotalSeconds * 1000))) * 1000; //Console.WriteLine(String.Format("record.ts_usec: {0}",ts_usec)); //Console.WriteLine(String.Format("record.incl_len: {0}",record.Properties[2].Value.GetType())); UInt32 incl_len = (UInt32)record.Properties[2].Value; //Console.WriteLine(String.Format("record.incl_len: {0}",record.Properties[2].Value)); if (incl_len > maxPacketSize) { Console.WriteLine(String.Format("Packet size of {0} exceeded max packet size {1}, packet ignored",incl_len,maxPacketSize)); } UInt32 orig_len = incl_len; writer.Write(ts_sec); writer.Write(ts_usec); writer.Write(incl_len); writer.Write(orig_len); writer.Write((byte[])record.Properties[3].Value); } } } } } } } return result; } } } '@ Add-Type -Type $csharp_code $result = [chentiangemalc.NetworkRoutines]::ConvertEtlToPcap($Path.FullName,$Destination.FullName,$MaxPacketSizeBytes) Write-Host "$result packets converted." } #endregion functions <# Trigger on an event log entry being created @guyrleech 2019 #> $eventLog = Get-EventLog -List | Where-Object { $_.Log -eq $eventLogName } try { $latestOutageEvent = (Get-EventLog -LogName Application -Source "Citrix Broker Service" -After (Get-Date -Format yyyy-MM-dd) | Where {$_.EventID -eq 1201})[0] } catch { Write-Error "Unable to find the Event Id 1201 from source Citrix Broker Service for today." } if( ! $eventLog ) { Throw "Unable to find event log `"$eventLogName`"" } $1201TimeStamp = Get-Date $latestOutageEvent.TimeWritten -Format yyyy-MM-dd_HH-mm-ss Unregister-Event -SourceIdentifier $sourceIdentifier -Force -ErrorAction SilentlyContinue Register-ObjectEvent -InputObject $eventLog -EventName $eventName -SourceIdentifier $sourceIdentifier Write-Verbose "Going into main loop @ $(Get-Date -Format G)" if ((Get-NetEventSession).SessionStatus -eq "Running") { Write-Error "Exisiting trace is underway. Aborting." Exit } if ((Get-NetEventSession).SessionStatus -eq "NotRunning") { Get-NetEventSession | Remove-NetEventSession } try { New-NetEventSession -Name "Session1" -LocalFilePath "$traceOutput$($1201TimeStamp)_capturetraffic.etl" -ErrorAction Stop | Out-Null Add-NetEventPacketCaptureProvider -SessionName "Session1" -ErrorAction Stop | Out-Null Start-NetEventSession -Name "Session1" -ErrorAction Stop | Out-Null } catch { Write-Error "Starting capture failed with : $($error[0])" } try { While( $true ) { $eventRaised = Wait-Event -Timeout 2 #this allows the ping loop to continue if( $eventRaised -and $eventRaised.SourceIdentifier -eq $sourceIdentifier ) { $event = $eventRaised.SourceEventArgs.Entry if( $event ) { if( $event.InstanceId -eq $eventId -and $event.Source -eq $eventProvider ) { Write-Host "Monitored event raised at $($eventRaised.TimeGenerated)" Write-Host "Stopping Packet Capture." Stop-NetEventSession -Name "Session1" Remove-NetEventSession -Name "Session1" #convert to cap file for wireshark ConvertTo-PCAP -Path "$traceOutput$($1201TimeStamp)_capturetraffic.etl" -Destination "$TraceOutput$($1201TimeStamp).pcap" break } } } elseif( $eventRaised ) { Write-Verbose "Caught unexpected event $eventRaised" Start-Sleep -Seconds 1 } $eventRaised | Remove-Event } } catch { Throw $_ } finally { Write-Verbose "Unregistering event source identifier $sourceIdentifier" Unregister-Event -SourceIdentifier $sourceIdentifier -Force }
Script 4 – Execute a CDF Trace during the outage
#requires -version 3 $ErrorActionPreference = 'Stop' <# .SYNOPSIS Starts CDF Trace if an existing trace is not running. Targeted specifically for database outages noticed by the Citrix Broker Service. .DESCRIPTION Starts CDF Trace if the Citrix Broker Service cannot contact the database. Another event-based task will be created and linked to event ID 1200, Citrix Broker Service, Info to stop the CDF trace. #> #region variables #Change these paths to reflect your environment $CDFControlEXEPath = "$($args[0])" $CTLFile = "$($args[1])" $TraceOutput = "$($args[2])" #Database restored event log values [string]$sourceIdentifier = 'Citrix Broker Service' [string]$eventLogName = 'Application' [string]$eventProvider = 'Citrix Broker Service' [int]$eventId = '1200' [string]$eventName = 'EntryWritten' #endregion variables try { #see if an existing CDF trace is running if ((logman -ets | Select-String -SimpleMatch "CDFCONTROL_TRACE_SESSION" -Quiet) -ne $true) { Write-Error "No CDF Trace Session found." -ErrorAction Stop } else { Write-Host "An existing CDF Trace is already running." } } catch { Write-Host "Starting a CDF Trace." Start-Process -FilePath $CDFControlEXEPath -ArgumentList @("-start","-guids","$CTLFile","-path","$traceOutput") <# Trigger on an event log entry being created @guyrleech 2019 #> $eventLog = Get-EventLog -List | Where-Object { $_.Log -eq $eventLogName } if( ! $eventLog ) { Throw "Unable to find event log `"$eventLogName`"" } Unregister-Event -SourceIdentifier $sourceIdentifier -Force -ErrorAction SilentlyContinue Register-ObjectEvent -InputObject $eventLog -EventName $eventName -SourceIdentifier $sourceIdentifier Write-Verbose "Going into main loop @ $(Get-Date -Format G)" try { While( $true ) { $eventRaised = Wait-Event if( $eventRaised -and $eventRaised.SourceIdentifier -eq $sourceIdentifier ) { $event = $eventRaised.SourceEventArgs.Entry if( $event ) { if( $event.InstanceId -eq $eventId -and $event.Source -eq $eventProvider ) { Write-Host "Monitored event raised at $($eventRaised.TimeGenerated)" Write-Host "Stopping CDF Trace." Start-Process -FilePath $CDFControlEXEPath -ArgumentList @("-stop","-noprompt") break } } } elseif( $eventRaised ) { Write-Verbose "Caught unexpected event $eventRaised" Start-Sleep -Seconds 1 } $eventRaised | Remove-Event } } catch { Throw $_ } finally { Write-Verbose "Unregistering event source identifier $sourceIdentifier" Unregister-Event -SourceIdentifier $sourceIdentifier -Force } }
Now you need to configure the trigger. For Citrix database outages you can key in on a specific Citrix event. In this case, that’s Event ID 1201 from the Citrix Broker Service:
You then can configure an Event-based trigger with the following properties:
Now, you attach multiple script actions to the trigger.
By default, ControlUp will execute automation sequentially, with each script waiting until the other has completed before starting the next. For this scenario you need to set up the automation for simultaneous script execution. Simply add a registry value to the ControlUp Monitor:
Registry Key: HKEY_USERSS-1-5-20SoftwareSmart-XControlUpMonitorSvc
Name: MaxAutoExecJobsTarget
Type: REG_DWORD
Value: (in decimal) number of allowed simultaneous scripts
Let’s see it in action:
In my next blog I’ll cover how to use ControlUp Automation to automatically remediate the problem.