Tap into a wealth of knowledge from your peers, industry experts, and the ControlUp team, deepening your understanding of ControlUp products and solutions.
Tap into a wealth of knowledge from your peers, industry experts, and the ControlUp team, deepening your understanding of ControlUp products and solutions.
The challenges of troubleshooting intermittent problems
I’m working with a customer who’s having an intermittent issue. The database that Citrix connects to occasionally becomes unavailable — why, no one knows. Since Citrix has a 90 second delay from the database outage to the local host cache (LHC) activating,this outage is really disruptive. If it happens during the morning logon rush or a shift change, you could be preventing hundreds or thousands of end users from working.Citrix has asked for some CDF traces. However, this problem might only last for a few minutes and resolve itself by the time the ticket comes. The ControlUp Automation Platform can assist with intermittent problems in two ways: augmented troubleshooting and automatic remediation. This post will cover augmented troubleshooting.
Augmented Troubleshooting
ControlUp can pick up on the events (like the one described above) as they are detected by the broker(s) and immediately start the information capture. Since this issue is intermittent, the capture will stop when the issue resolves itself.Typically, the ask is to get multiple traces. This means letting the environment exist in the database down state until enough CDF trace files are captured for Citrix to analyze. Even though Citrix has asked only for CDF traces, we can do better. I’ve configured four additional scripts to provide us even more context for what the issue might be. The first script is a continuous ping test to validate connectivity.
The second script leverages the Citrix Test-BrokerDBConnection cmdlet.
The third script executes a packet capture of network traffic.
The fourth executes a CDF trace.
CDF Trace Setup
The first thing you need to do is set up our environment so you can execute CDF traces.Download CDFControl and extract it somewhere in your system. For the purposes of this post, I’ve set it up in this path: C:SwinstCDFControlOpen CDFControl on your Citrix Desktop Delivery Controller (DDC). Under the “Trace Categories” menu, select “All Modules”.Under “Tools” select “Create CTL File”.Save the file in your CDFControl folder.In my examples, I’ll be storing the CDFControl.exe here: “C:SwinstCDFControlCDFControl.exe”.Copy the whole folder with CDFControl and your CTL file to all other Citrix Desktop Delivery Controllers.You are now set up to gather additional information.
Script Actions
The first script will start a continuous ping against our database with timestamps.
The second script will use the Citrix cmdlet’s to execute a database health check.
The third script starts a network packet capture.
The fourth script will start CDF Trace.Script 1 – Continuous Ping until the database responds
#requires -version 3
$ErrorActionPreference = 'Stop'
$ProgressPreference = 'silentlyContinue'
<#
.SYNOPSIS
Starts a continuous ping test against the Citrix database and logs the output.
.DESCRIPTION
Starts a continuous ping test against the Citrix database and logs the output if the Citrix Broker Service cannot contact the database. Another event-based task will be created and linked to event
ID 1200, Citrix Broker Service, Info to stop the ping.
.NOTES
Write-Output commands are captured by ControlUp and put into the ControlUp Auditing events. So make them nice :)
#>
#region variables
#Change these paths to reflect your environment
$TraceOutput = "$($args[0])"
#Database restored event log values
[string]$sourceIdentifier = 'Citrix Broker Service'
[string]$eventLogName = 'Application'
[string]$eventProvider = 'Citrix Broker Service'
[int]$eventId = '1200'
[string]$eventName = 'EntryWritten'
#endregion variables
<#
Trigger on an event log entry being created
@guyrleech 2019
#>
$eventLog = Get-EventLog -List | Where-Object { $_.Log -eq $eventLogName }
try {
$latestOutageEvent = (Get-EventLog -LogName Application -Source "Citrix Broker Service" -After (Get-Date -Format yyyy-MM-dd) | Where {$_.EventID -eq 1201})[0]
} catch {
Write-Error "Unable to find the Event Id 1201 from source Citrix Broker Service for today."
}
if( ! $eventLog )
{
Throw "Unable to find event log `"$eventLogName`""
}
$DBConnection = Get-ItemPropertyValue -Path HKLM:SOFTWARECitrixDesktopServerDataStoreConnectionsController -Name ConnectionString
$server = (($DBConnection -split ";")[0] -split "=")[1]
$1201TimeStamp = Get-Date $latestOutageEvent.TimeWritten -Format yyyy-MM-dd_HH-mm-ss
Unregister-Event -SourceIdentifier $sourceIdentifier -Force -ErrorAction SilentlyContinue
Register-ObjectEvent -InputObject $eventLog -EventName $eventName -SourceIdentifier $sourceIdentifier
Write-Verbose "Going into main loop @ $(Get-Date -Format G)"
try
{
While( $true )
{
$pingobj = Test-NetConnection -ComputerName $server
$outputObj = New-Object PSObject
$outputObj | Add-Member -MemberType NoteProperty -Name "Time" -Value (Get-Date -Format yyyy-MM-dd_HH-mm-ss)
$outputObj | Add-Member -MemberType NoteProperty -Name "Server" -Value ($server)
foreach ($property in ($pingobj | Get-Member | where {$_.MemberType -eq "Property"})) {
Switch ($property.Name) {
"RemoteAddress" { $outputObj | Add-Member -MemberType NoteProperty -Name "$($property.Name)" -Value ($pingobj.$($property.Name).IPAddressToString) }
"PingReplyDetails" { $outputObj | Add-Member -MemberType NoteProperty -Name "PingReplyDetails (RTT)" -Value ($pingobj.$($property.Name).RoundTripTime) }
"InterfaceAlias" { $outputObj | Add-Member -MemberType NoteProperty -Name "$($property.Name)" -Value ($pingobj.$($property.Name)) }
"SourceAddress" { $outputObj | Add-Member -MemberType NoteProperty -Name "$($property.Name)" -Value ($pingobj.$($property.Name).IpAddress) }
"PingSucceeded" { $outputObj | Add-Member -MemberType NoteProperty -Name "$($property.Name)" -Value ($pingobj.$($property.Name)) }
}
}
$outputObj | Export-Csv -Path "$traceOutput$($1201TimeStamp)_PingTest.txt" -NoClobber -Append -NoTypeInformation
$eventRaised = Wait-Event -Timeout 2 #this allows the ping loop to continue
if( $eventRaised -and $eventRaised.SourceIdentifier -eq $sourceIdentifier )
{
$event = $eventRaised.SourceEventArgs.Entry
if( $event )
{
if( $event.InstanceId -eq $eventId -and $event.Source -eq $eventProvider )
{
Write-Host "Monitored event raised at $($eventRaised.TimeGenerated)"
Write-Host "Stopping CDF Trace."
Start-Process -FilePath $CDFControlEXEPath -ArgumentList @("-stop","-noprompt")
if ($data.IsCompleted -ne $true) { $pscmd.Stop() }
if ($data2.IsCompleted -ne $true) { $pscmd2.Stop() }
break
}
}
}
elseif( $eventRaised )
{
Write-Verbose "Caught unexpected event $eventRaised"
Start-Sleep -Seconds 1
}
$eventRaised | Remove-Event
}
}
catch
{
Throw $_
}
finally
{
Write-Verbose "Unregistering event source identifier $sourceIdentifier"
Unregister-Event -SourceIdentifier $sourceIdentifier -Force
}
#requires -version 3
$ErrorActionPreference = 'Stop'
<#
.SYNOPSIS
Starts CDF Trace if an existing trace is not running. Targeted specifically for database outages noticed by the Citrix Broker Service.
.DESCRIPTION
Starts CDF Trace if the Citrix Broker Service cannot contact the database. Another event-based task will be created and linked to event
ID 1200, Citrix Broker Service, Info to stop the CDF trace.
#>
#region variables
#Change these paths to reflect your environment
$CDFControlEXEPath = "$($args[0])"
$CTLFile = "$($args[1])"
$TraceOutput = "$($args[2])"
#Database restored event log values
[string]$sourceIdentifier = 'Citrix Broker Service'
[string]$eventLogName = 'Application'
[string]$eventProvider = 'Citrix Broker Service'
[int]$eventId = '1200'
[string]$eventName = 'EntryWritten'
#endregion variables
try {
#see if an existing CDF trace is running
if ((logman -ets | Select-String -SimpleMatch "CDFCONTROL_TRACE_SESSION" -Quiet) -ne $true) {
Write-Error "No CDF Trace Session found." -ErrorAction Stop
} else {
Write-Host "An existing CDF Trace is already running."
}
} catch {
Write-Host "Starting a CDF Trace."
Start-Process -FilePath $CDFControlEXEPath -ArgumentList @("-start","-guids","$CTLFile","-path","$traceOutput")
<#
Trigger on an event log entry being created
@guyrleech 2019
#>
$eventLog = Get-EventLog -List | Where-Object { $_.Log -eq $eventLogName }
if( ! $eventLog )
{
Throw "Unable to find event log `"$eventLogName`""
}
Unregister-Event -SourceIdentifier $sourceIdentifier -Force -ErrorAction SilentlyContinue
Register-ObjectEvent -InputObject $eventLog -EventName $eventName -SourceIdentifier $sourceIdentifier
Write-Verbose "Going into main loop @ $(Get-Date -Format G)"
try
{
While( $true )
{
$eventRaised = Wait-Event
if( $eventRaised -and $eventRaised.SourceIdentifier -eq $sourceIdentifier )
{
$event = $eventRaised.SourceEventArgs.Entry
if( $event )
{
if( $event.InstanceId -eq $eventId -and $event.Source -eq $eventProvider )
{
Write-Host "Monitored event raised at $($eventRaised.TimeGenerated)"
Write-Host "Stopping CDF Trace."
Start-Process -FilePath $CDFControlEXEPath -ArgumentList @("-stop","-noprompt")
break
}
}
}
elseif( $eventRaised )
{
Write-Verbose "Caught unexpected event $eventRaised"
Start-Sleep -Seconds 1
}
$eventRaised | Remove-Event
}
}
catch
{
Throw $_
}
finally
{
Write-Verbose "Unregistering event source identifier $sourceIdentifier"
Unregister-Event -SourceIdentifier $sourceIdentifier -Force
}
}
Configuring the Trigger for Automation
Now you need to configure the trigger. For Citrix database outages you can key in on a specific Citrix event. In this case, that’s Event ID 1201 from the Citrix Broker Service:You then can configure an Event-based trigger with the following properties:Now, you attach multiple script actions to the trigger. By default, ControlUp will execute automation sequentially, with each script waiting until the other has completed before starting the next. For this scenario you need to set up the automation for simultaneous script execution. Simply add a registry value to the ControlUp Monitor:Registry Key: HKEY_USERSS-1-5-20SoftwareSmart-XControlUpMonitorSvcName: MaxAutoExecJobsTargetType: REG_DWORDValue: (in decimal) number of allowed simultaneous scriptsLet’s see it in action:In my next blog I’ll cover how to use ControlUp Automation to automatically remediate the problem.
Trentent Tye, a Tech Person of Interest, is based out of Canada and its many, many feet of snow. FUN FACT: Trentent came to ControlUp because, as a former customer, the product impacted his life in so many positive ways—from reducing stress, time to remediation, increased job satisfaction, and more—he had to be our evangelist. Now an integral part of ControlUp’s Product Marketing Team, he educates our customers, pours his heart and soul into the product, and generally makes ControlUp a better place. Trentent recently moved to be closer to family. He does not recommend moving during a pandemic.