Scheduled Troubleshooting Design

Introduction

Scheduled Troubleshooting with Discovery Admin enables the automated troubleshooting of Logs generated by ServiceNow Discovery on a recurring basis.

A good Scheduled Troubleshooting design ensures that we are comprehensively troubleshooting the Logs generated by ServiceNow Discovery Schedules, consistently and reliably.

This article walks through what is needed for a successful Scheduled Troubleshooting Design.

WHY (is Scheduled Troubleshooting Important)

Designing (and configuring) Scheduled Troubleshooting is the first step towards successfully operationalizing the maintenance of ServiceNow Discovery.

Scheduled Troubleshooting with Discovery Admin automatically gives us actionable Incident Error Codes (with corresponding Root Cause and Remediations) aligned with each iterative run of ServiceNow Discovery.

These recurring data points set the foundation for Visualization (Dashboards and Reports) and Incident Generation, changing how we operationalize the maintenance of ServiceNow Discovery.

WHO (should be involved with the Scheduled Troubleshooting Design)

The design for Scheduled Troubleshooting with Discovery Admin requires a similar mindset to designing ServiceNow Discovery Schedules.

The best inputs to the Design process come from the Team that designed the ServiceNow Discovery Schedules and understands why they were configured in a particular way.

It is possible that the original Team may not be available to provide the necessary inputs.

In this case, we should analyze how the Discovery Schedules are currently running in Production and use the results of the analysis as inputs for the Design of the Scheduled Troubleshooting.

WHAT (should be scoped for Scheduled Troubleshooting)

To determine WHAT to scope for Scheduled Troubleshooting, we need to determine the following:

Which Discovery Schedules are in scope:
- These are typically recurring (non-ad-hoc) Schedules that scan Configuration Items.
How many Daisy Chains are configured corresponding to the scoped Discovery Schedules above:
- While many Customers have a single Daisy Chain that scans the complete network periodically, it is common to have multiple Daisy Chains that scan different parts of the Network based on geography, business unit, network segmentation, etc.
How long does each Daisy Chain take to complete:
- While many Customers have Daisy Chains that complete in a few hours, it is common to have scenarios where Daisy Chains may take more than a day to complete.
- However, for effective troubleshooting, it is important to ensure that a single Daisy Chain does not take more than 3 days to complete. This is to allow enough time for Discovery Admin to complete the troubleshooting before ServiceNow auto-purges the ECC Queue.
- If the ECC Queue gets auto-purged BEFORE Discovery Admin can get to the corresponding ServiceNow Discovery Logs, Discovery Admin assigns the Incident Error Code (IEC) P0.NoECC.00.
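
The 3-day guideline above can be checked with a few lines of script. The following is a minimal sketch, assuming the Daisy Chain durations have already been measured; the input shape and the chain names are hypothetical and not part of Discovery Admin.

```javascript
// Guideline from this section: a single Daisy Chain should not take
// more than 3 days to complete.
var MAX_DAISY_CHAIN_HOURS = 3 * 24;

// Returns the names of the Daisy Chains that exceed the guideline.
// The { name, durationHours } shape is a hypothetical input format.
function findLongRunningChains(chains) {
  return chains
    .filter(function (chain) { return chain.durationHours > MAX_DAISY_CHAIN_HOURS; })
    .map(function (chain) { return chain.name; });
}

var longRunning = findLongRunningChains([
  { name: 'US-PROD', durationHours: 30 },   // hypothetical chain
  { name: 'EMEA-PROD', durationHours: 80 }  // hypothetical chain
]);
console.log(longRunning); // [ 'EMEA-PROD' ]
```

Any Daisy Chain returned by this check is a candidate for being split into smaller Daisy Chains before the Scheduled Troubleshooting is designed.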

In summary, use the following Design guidelines:

One Scheduled Troubleshooting per ServiceNow Discovery Schedule Daisy Chain
Grouping of these ServiceNow Discovery Schedules can be done leveraging the naming convention of the Discovery Schedules
If the ServiceNow Discovery naming conventions do not lend themselves to appropriate grouping, consider leveraging any unused attribute on the Discovery Schedule Table to help group common ServiceNow Discovery Schedules
Dot-walking to attributes on the ServiceNow Discovery Schedule (via the Discovery Status) leads to a more consistent design, as it eliminates the variability in the duration of the Discovery Status record

Example 1:

A Customer has one Daisy Chain that scans the whole network in two days.
This Customer would need one Scheduled Troubleshooting which runs at 2:00 am the day after the Daisy Chain is complete and looks at all the corresponding ServiceNow Discovery Logs (generated by the Daisy Chain) created in the past two days.

Example 2:

A Customer has two Daisy Chains (A and B) that scan the whole network within one day each.
This Customer would need two Scheduled Troubleshooting Records which run at 2:00 am the day after each of the Daisy Chains (A and B) complete and look at all the corresponding ServiceNow Discovery Logs (generated by the respective Daisy Chains) created on the previous day.

Other Considerations:

Prior to finalizing the Scheduled Troubleshooting Design, execute the Scheduled Troubleshooting to get insights into approximately how long the corresponding Troubleshooting takes.
Make sure the Troubleshooting triggered by Scheduled Troubleshooting does not run for more than 16 hours. If it does, reduce the number of Discovery Status Records selected by the Scheduled Troubleshooting and spread the Discovery Status Records over multiple Scheduled Troubleshooting Records.
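
One way to spread the work over multiple Scheduled Troubleshooting Records is to group Discovery Schedules so that each group stays under the 16-hour limit. The sketch below uses a simple "largest first, first group with room" heuristic. This is an illustrative approach rather than anything Discovery Admin does itself, and the schedule names and hour estimates are hypothetical.

```javascript
// Limit from this section: one Troubleshooting run should stay under 16 hours.
var MAX_TROUBLESHOOTING_HOURS = 16;

// Groups schedules so that each group's estimated hours stay within maxHours.
// A schedule that exceeds maxHours on its own still gets a group of its own.
function splitIntoGroups(schedules, maxHours) {
  // Largest first, so big schedules claim a group before small ones fill gaps.
  var sorted = schedules.slice().sort(function (a, b) { return b.hours - a.hours; });
  var result = [];
  sorted.forEach(function (schedule) {
    var target = null;
    for (var i = 0; i < result.length; i++) {
      if (result[i].totalHours + schedule.hours <= maxHours) {
        target = result[i];
        break;
      }
    }
    if (!target) {
      target = { totalHours: 0, schedules: [] };
      result.push(target);
    }
    target.schedules.push(schedule.name);
    target.totalHours += schedule.hours;
  });
  return result;
}

var plan = splitIntoGroups([
  { name: 'A', hours: 10 }, // hypothetical estimates from a trial run
  { name: 'B', hours: 7 },
  { name: 'C', hours: 5 },
  { name: 'D', hours: 4 }
], MAX_TROUBLESHOOTING_HOURS);
console.log(JSON.stringify(plan));
```

Each resulting group would map to one Scheduled Troubleshooting Record. Here the 26 hours of estimated work end up in two groups of 15 and 11 hours.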

WHEN (should the Scheduled Troubleshooting be run)

The WHEN is configured on the Scheduled Troubleshooting Form after the Scheduled Troubleshooting is created.

However, this may require us to revisit the WHAT and thus is an important consideration during the design process.

In short, the WHAT and WHEN work in conjunction with each other as inputs for the Scheduled Troubleshooting Design.

The Scheduled Troubleshooting should be configured to run Weekly (instead of Daily, Monthly or Periodically). The following two questions should be addressed for the Weekly Scheduled Troubleshooting Design:

Which Day of the Week should the Scheduled Troubleshooting run
What time of the Day should the Scheduled Troubleshooting run

This decision needs to be made for every ServiceNow Discovery Schedule Daisy Chain taking into account the following:

How often does the ServiceNow Discovery Schedule Daisy Chain Run
- Daisy Chains can run Daily, Weekly or Periodically
How long does the Scheduled Troubleshooting take to complete
- Discovery Admin can scan about 1 Million Logs a Day
- Leveraging the 'Priority Flag', Discovery Admin can scan an additional 1 Million Logs a Day in parallel
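
The throughput figure above gives a quick way to estimate how long a Scheduled Troubleshooting will take. The following is a rough sketch, assuming throughput is constant at about 1 Million Logs a Day per queue; actual speed varies by instance, so treat the result as a planning estimate only.

```javascript
// Throughput quoted above: about 1 Million Logs a Day per queue.
var LOGS_PER_DAY_PER_QUEUE = 1000000;
// Upper bound recommended for a single Troubleshooting run.
var MAX_RUN_HOURS = 16;

// Rough run-time estimate, in hours, for one Scheduled Troubleshooting.
function estimateRunHours(logCount) {
  return (logCount / LOGS_PER_DAY_PER_QUEUE) * 24;
}

// True when the estimate stays within the 16-hour guideline.
function fitsInOneRun(logCount) {
  return estimateRunHours(logCount) <= MAX_RUN_HOURS;
}

console.log(estimateRunHours(500000)); // 12
console.log(fitsInOneRun(750000));     // false (the estimate is 18 hours)
```

When the estimate does not fit, the Logs can be split across two Scheduled Troubleshooting Records, one of them using the Priority Flag so that both run in parallel.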

Example 1:

A Customer has one Daisy Chain that runs on Saturday Morning and scans the whole network in two days (i.e. completes the ServiceNow Discovery Scan on Sunday evening).
This Customer would need a corresponding Scheduled Troubleshooting which runs at 2:00 am every Monday (Weekly) and looks at all the corresponding ServiceNow Discovery Logs (generated by the Daisy Chain) created in the past two days.

Example 2:

A Customer has two Daisy Chains (A and B) that scan the whole network within one day each.
This Customer would need two Scheduled Troubleshooting Records which run at 2:00 am the day after each of the Daisy Chains (A and B) complete and look at all the corresponding ServiceNow Discovery Logs (generated by the respective Daisy Chains) created on the previous day.
Since both the Daisy Chains run within a day, we can still run the corresponding Scheduled Troubleshooting with a Weekly cadence
- Explore the Priority Flag in the Additional Considerations Section
- Explore the Run Next Attribute in the Additional Considerations Section
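
The day and time in the examples above can be derived from when the Daisy Chain starts and how long it runs. The sketch below computes the first 2:00 am slot after the Daisy Chain completes. The start hour (06:00) and duration (36 hours) are assumed values chosen to match "Saturday Morning" and "Sunday evening" in Example 1.

```javascript
var DAYS = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'];

// Returns the first weekly slot at runHour on or after the Daisy Chain completes.
// startDay is a day name, startHour and runHour are hours on a 24-hour clock.
function nextRunSlot(startDay, startHour, durationHours, runHour) {
  var dayIndex = DAYS.indexOf(startDay);
  if (dayIndex === -1) { throw new Error('Unknown day: ' + startDay); }
  // Completion time, in hours since Sunday 00:00.
  var endHour = dayIndex * 24 + startHour + durationHours;
  // Hours to wait after completion until the clock next reads runHour.
  var wait = ((runHour - (endHour % 24)) + 24) % 24;
  // Wrap around the end of the week.
  var slot = (endHour + wait) % (7 * 24);
  return { day: DAYS[Math.floor(slot / 24)], hour: runHour };
}

var firstRun = nextRunSlot('Saturday', 6, 36, 2);
console.log(firstRun.day + ' at ' + firstRun.hour + ':00'); // Monday at 2:00
```

The slot returned is the earliest candidate. 'Max Run Time' on the Discovery Schedule still needs to guarantee that the Daisy Chain has finished by then.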

Other Considerations:

'Max Run Time' on 'ServiceNow Discovery Schedules' should be configured to ensure they are completed (or canceled) before the Scheduled Troubleshooting is scheduled to start.
No two Scheduled Troubleshooting Records should be scheduled to start at the same time, unless the Priority flag on the Scheduled Troubleshooting Form is active on ONE of them.
The attribute 'Progress (Analysis)' on the Troubleshooting Form provides the speed of the Analysis in Logs / Minute. If this value is less than 500 Logs / Minute, revisit WHEN the Scheduled Troubleshooting Records are configured to run, to ensure that the Troubleshooting Records are not being queued.
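
The start-time rule above can be validated against a planned design before anything is configured. The following is a minimal sketch, assuming the design is kept as a simple list; the record shape and names are hypothetical.

```javascript
// Returns the start slots that break the rule: a slot may be shared only by
// two records, exactly one of which has the Priority flag active.
function findStartConflicts(records) {
  var bySlot = {};
  records.forEach(function (record) {
    var key = record.day + ' ' + record.time;
    bySlot[key] = bySlot[key] || [];
    bySlot[key].push(record);
  });
  return Object.keys(bySlot).filter(function (key) {
    var group = bySlot[key];
    if (group.length === 1) { return false; }
    var priorityCount = group.filter(function (r) { return r.priority; }).length;
    return !(group.length === 2 && priorityCount === 1);
  });
}

var conflicts = findStartConflicts([
  { name: '01-US-PROD-MONDAY', day: 'Monday', time: '02:00', priority: false },
  { name: '02-US-DMZ-MONDAY', day: 'Monday', time: '02:00', priority: true },
  { name: '03-EU-PROD-WEDNESDAY', day: 'Wednesday', time: '02:00', priority: false },
  { name: '04-EU-DMZ-WEDNESDAY', day: 'Wednesday', time: '02:00', priority: false }
]);
console.log(conflicts); // [ 'Wednesday 02:00' ]
```

The Monday pair is allowed because one record uses the Priority flag. The Wednesday pair would be queued one behind the other, so one of the two needs a different start time or the Priority flag.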

WHERE (do we go to configure a new Scheduled Troubleshooting)

The WHAT inputs from the use cases above are configured first, on the Discovery Status List View Filter. The WHEN inputs are then configured on the Scheduled Troubleshooting Form.

The following attributes should be in EVERY Discovery Status Query Filter on the Scheduled Troubleshooting Form:

Created = [Derived from the Inputs Above] (Time-box the Discovery Status Records)
- Use the 'ON' option for selecting filters like 'Yesterday' or 'Last week' or 'Last 7 days'
- Use the 'RELATIVE' option for selecting filters like 'After' N 'Hours ago' or 'After' N 'Days ago'
Description = Scheduled (filter on Scheduled Discovery Status Records vs Ad-hoc Discovery Status Records)
Discover = Configuration Items (Horizontal IP-Based Discovery)
Schedule CONDITION [Derived from the Inputs Above] (Group Discovery Status Records)
Schedule DOT WALK [Derived from the Inputs Above] (Advanced Grouping of Discovery Status Records)
- Schedule.Active = true
- Schedule.Discovery Run Type = Weekly

NOTE: Discover = Cloud Resources is NOT Supported by Discovery Admin
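
For reference, the filter attributes listed above could be assembled into a single query string as sketched below. The `^` separator is the standard way ServiceNow joins AND conditions in an encoded query. The field names (`description`, `discover`, `dscheduler`), the choice value `CIs`, the date operator text, and the name prefix are all assumptions for illustration. The reliable way to get the exact string is to build the filter in the Discovery Status List View and copy the query from there.

```javascript
// Joins individual conditions into one encoded query string.
// Each condition is { field, operator, value }; '^' means AND.
function buildStatusQuery(conditions) {
  return conditions.map(function (condition) {
    return condition.field + condition.operator + condition.value;
  }).join('^');
}

// All field names and values below are assumed; verify them on your instance.
var statusQuery = buildStatusQuery([
  // Time-box the Discovery Status Records (assumed form of the 'ON Yesterday' filter).
  { field: 'sys_created_on', operator: 'ON',
    value: 'Yesterday@javascript:gs.beginningOfYesterday()@javascript:gs.endOfYesterday()' },
  // Scheduled (not ad-hoc) Discovery Status Records.
  { field: 'description', operator: '=', value: 'Scheduled' },
  // Horizontal IP-Based Discovery.
  { field: 'discover', operator: '=', value: 'CIs' },
  // Dot-walk to the Discovery Schedule for grouping.
  { field: 'dscheduler.active', operator: '=', value: 'true' },
  { field: 'dscheduler.name', operator: 'STARTSWITH', value: 'US-PROD' }
]);
console.log(statusQuery);
```

Keeping the conditions as a list makes it easier to reuse the same time-box and scope conditions across several Scheduled Troubleshooting Records and vary only the grouping condition.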

HOW (is a new Scheduled Troubleshooting created)

Details on how to create and manage Scheduled Troubleshooting Records are explained here, with a Demo walking through all the steps to configure a Scheduled Troubleshooting.

This page also contains additional tips and pointers to help address common questions when configuring the Scheduled Troubleshooting.

However, review this complete article and finalize the design, before configuring the Scheduled Troubleshooting.

Naming

As a part of creating a new Scheduled Troubleshooting Record via the Discovery Status List View, we need to provide a unique name for the Scheduled Troubleshooting Record.

The naming conventions should align with the Daisy Chain and the combination of parameters derived from the WHAT and the WHEN section.

Since the name of the Scheduled Troubleshooting is used as a filter in Reports and Dashboards, keep the name(s) of the Scheduled Troubleshooting short, intuitive, and unique.

It is recommended to lock in the naming convention of all the Scheduled Troubleshooting Record(s) BEFORE creating the first Scheduled Troubleshooting Record.

Include one or more of the following in the Scheduled Troubleshooting Naming Convention:

Start the Scheduled Troubleshooting Name with a Number, so it is easier to sort on the Scheduled Troubleshooting List View, especially when Scheduled Troubleshootings are Daisy Chained.
Consider including the following in the naming Convention
- Location
- Environment
- Business Unit
- Day of the Week
- Discovery Schedule(s)
For Example
- 01-US-PROD-MONDAY
- 03-STL-DMZ-WEDNESDAY
- 07-DC1-GLOBAL_IT-FRIDAY
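
A small helper can keep names consistent with the convention above. This is a minimal sketch; the function name and the normalization rules (upper case, underscores for spaces, two-digit prefix) are assumptions inferred from the examples.

```javascript
// Builds a name such as 07-DC1-GLOBAL_IT-FRIDAY from a sequence number and parts.
function buildTroubleshootingName(sequence, parts) {
  // Two-digit prefix so names sort correctly on the List View.
  var prefix = sequence < 10 ? '0' + sequence : String(sequence);
  var tokens = parts.map(function (part) {
    return String(part).trim().toUpperCase().replace(/\s+/g, '_');
  });
  return [prefix].concat(tokens).join('-');
}

console.log(buildTroubleshootingName(7, ['DC1', 'Global IT', 'Friday']));
// 07-DC1-GLOBAL_IT-FRIDAY
```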

Run Next (Reference Attribute)

Just like we can Daisy Chain ServiceNow Discovery Schedules using the 'Run After' Attribute, we can also Daisy Chain Discovery Admin Scheduled Troubleshooting Records using the 'Run Next' Attribute via the Scheduled Troubleshooting List View or Form.

This automatically triggers the next Scheduled Troubleshooting when the current one finishes execution.

Daisy Chaining of Scheduled Troubleshooting eliminates the need to keep a time buffer between two Scheduled Troubleshooting Records thereby increasing the throughput of Discovery Admin.

Note:

Canceling the current Troubleshooting will NOT trigger the Scheduled Troubleshooting in the 'Run Next' attribute, effectively canceling the Daisy Chain.
The 'Run Next' attribute on Scheduled Troubleshooting works differently than the 'Run After' attribute on Discovery Schedule.
'Run After' on the Discovery Schedule Form references the previous Discovery Schedule which runs before the current one.
'Run Next' on the Scheduled Troubleshooting Form references the next Scheduled Troubleshooting which will run after the current one.
This difference in implementation is attributed to the fact that empirically, we want to know what is running next on the Troubleshooting Record instead of what ran before.

Considerations:

Scheduled Troubleshootings having a shorter completion time should be configured earlier in the Daisy Chain as compared to longer running Scheduled Troubleshootings.
To prevent accidental runs, configure the Run attribute as 'Run = Once' for all the Daisy Chained Scheduled Troubleshooting Records, except the first Scheduled Troubleshooting Record in the Daisy Chain.
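
The two considerations above can be turned into a simple planning step: order the records by expected completion time, point each 'Run Next' at the following record, and set 'Run = Once' on every record except the first. The sketch below assumes the first record runs Weekly, as recommended earlier in this article. The record names and hour estimates are hypothetical.

```javascript
// Produces the Run and Run Next values for a Daisy Chain of
// Scheduled Troubleshooting Records, shortest run first.
function planDaisyChain(records) {
  var ordered = records.slice().sort(function (a, b) {
    return a.estimatedHours - b.estimatedHours;
  });
  return ordered.map(function (record, index) {
    var next = ordered[index + 1];
    return {
      name: record.name,
      // Only the first record is scheduled; the rest are triggered by Run Next.
      run: index === 0 ? 'Weekly' : 'Once',
      // Empty string marks the end of the Daisy Chain.
      runNext: next ? next.name : ''
    };
  });
}

var chainPlan = planDaisyChain([
  { name: 'EU-PROD', estimatedHours: 6 }, // hypothetical estimates
  { name: 'US-DMZ', estimatedHours: 2 },
  { name: 'US-PROD', estimatedHours: 4 }
]);
console.log(JSON.stringify(chainPlan));
```

Deciding the order before naming the records makes it possible to number the names in the same sequence as the Daisy Chain.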

Priority (Boolean Attribute)

Scheduled Troubleshooting can be configured to use a separate prioritized background queue using the 'Priority' flag.

As a result, we can run up to two Scheduled Troubleshooting Records at the same time (one with the Priority Flag selected and the other with the Priority Flag unselected).

This can be used to scan more logs or kept as a buffer for doing any additional analysis, without disturbing the ongoing cadence.

This is particularly useful when two long running Scheduled Troubleshooting Records need to be run during the same time window.

This attribute can be configured on the Form or the List View.

If automated Incident Generation is enabled (see the section below), the Priority Flag allows for parallel Incident Generation as well.

Generate Incidents (Boolean Attribute)

We can control the Incident Generation at the granularity of the Scheduled Troubleshooting. This allows us to scan and report on the results on a more regular basis and have Incidents generated less frequently.

Execute Now (UI Action)

If there is a need to analyze a predetermined set of Discovery Status Records (corresponding to Discovery Schedules), a Scheduled Troubleshooting can be configured to be On-Demand and we can leverage the { Execute Now } UI Action to run the analysis.

This is a great way to run an ad-hoc analysis for a set of Discovery Schedules, which would be otherwise cumbersome to filter and select via the Discovery Status List View every time they need to be analyzed.

Any applicable filter can be saved via the Scheduled Troubleshooting Record.

This is also helpful when ad-hoc Troubleshooting is needed for a predetermined set of Logs, in non-Prod.

Note:

{ Execute Now } does not take into consideration fields like Run Next, Priority, and Generate Incidents. Because it is manually initiated, it is treated as ad-hoc Troubleshooting rather than Scheduled Troubleshooting.

Run

This is a ServiceNow out-of-the-box feature.

After creating the Scheduled Troubleshooting Record, you can configure one of the following options on the Form:

Run = On Demand

This is the default value populated when a new Scheduled Troubleshooting is created.

Run = Once

We also can run the Scheduled Troubleshooting once, by specifying the time when the analysis should run.

HINT: Mousing over 'Starting' displays the exact time of execution. This is a good way to validate the execution time if the system timezone differs from the timezone configured on your user profile.

Run = Business Calendar

Provides the ability to select a pre-configured Business Calendar to allow complete flexibility on when a Scheduled Troubleshooting should be executed.

Conditional (Boolean Attribute)

This is a ServiceNow out-of-the-box advanced feature allowing for a script to provide additional conditions.

The Scheduled Troubleshooting should NOT be configured to start after the successful completion of the Discovery Schedule(s) you are planning to Troubleshoot. Additionally, the cadence of the Scheduled Troubleshooting must match that of the corresponding Discovery Schedule(s).

Unique requirements, like triggering a Scheduled Troubleshooting right after a Discovery Status finishes, can be accommodated by leveraging the out-of-the-box Conditional attribute on the Scheduled Troubleshooting Form. However, this approach isn't recommended because the troubleshooting then depends on the variability of the Discovery Schedules (and Discovery Admin can detect bad schedules).

The OOB Conditional field remains available for scripting any advanced conditions that cannot be configured via the out-of-the-box Scheduling UI.

API (Advanced Scheduling)

NOTE: Make sure the call is made by code in the Discovery Admin Application Scope.

Scheduled Troubleshooting can be triggered via any custom code in ServiceNow by using the following:

utilQuickNexusDiscovery().insertTroubleshootingFromSchedule(strQueryString, strScheduledJob, strTriggerType, strRunNext)
- strQueryString = Discovery Status Query Filter [Required]
- strScheduledJob = Unique Name [Required]
- strTriggerType = ea_st OR ea_st_priority [Required]
- strRunNext = Daisy Chain Scheduled Troubleshooting [Optional: pass an empty string as a parameter]
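
A thin wrapper around the call above can catch missing or invalid parameters before the API is invoked. The following is a sketch under stated assumptions: the wrapper name, the options object, and the stub are hypothetical, and the query string is only an assumed example of a Discovery Status filter. Inside an instance, the first argument would be the object returned by utilQuickNexusDiscovery(), called from code in the Discovery Admin Application Scope.

```javascript
// Trigger types listed in the parameter description above.
var TRIGGER_TYPES = ['ea_st', 'ea_st_priority'];

// Validates the inputs, then forwards them to the API object passed in.
// `api` must expose insertTroubleshootingFromSchedule().
function triggerScheduledTroubleshooting(api, options) {
  if (!options.query) { throw new Error('Discovery Status query filter is required'); }
  if (!options.name) { throw new Error('Unique name is required'); }
  if (TRIGGER_TYPES.indexOf(options.triggerType) === -1) {
    throw new Error('Trigger type must be ea_st or ea_st_priority');
  }
  // Run Next is optional: pass an empty string when there is no Daisy Chain.
  return api.insertTroubleshootingFromSchedule(
    options.query, options.name, options.triggerType, options.runNext || '');
}

// Hypothetical stand-in so the sketch can run outside ServiceNow.
var recordedCalls = [];
var stubApi = {
  insertTroubleshootingFromSchedule: function (query, name, triggerType, runNext) {
    recordedCalls.push([query, name, triggerType, runNext]);
  }
};

triggerScheduledTroubleshooting(stubApi, {
  query: 'description=Scheduled^dscheduler.active=true', // assumed filter
  name: '01-US-PROD-MONDAY',
  triggerType: 'ea_st'
});
console.log(recordedCalls[0]);
```

Passing the API object in as an argument keeps the validation logic testable without a ServiceNow instance. Use 'ea_st_priority' as the trigger type to run on the prioritized queue.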