VM usage is monitored for all VMs on the DSC - SP2000 Platform. These systems support three severity levels for CPU usage alarms: Minor, Major, and Critical. Each alarm severity has one onset threshold and one abatement threshold. Thresholds can be set from 1 to 100%; the default setting is 0 (zero). Any of these severity-based alarms can be disabled by setting both the onset and abatement thresholds for that level to zero.
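
The threshold rules above can be sketched as a small validation helper. This is an illustration only: the function names are hypothetical and not part of the platform, and the handling of a level where only one of the two thresholds is zero is an assumption, since the text only describes disabling a level by setting both to zero.

```python
def is_disabled(onset: int, abatement: int) -> bool:
    """A severity level is disabled when both thresholds are zero."""
    return onset == 0 and abatement == 0


def validate_level(onset: int, abatement: int) -> None:
    """Raise ValueError if an enabled level has a threshold outside 1..100.

    Assumption: a level with only one threshold set to zero is treated
    as invalid, because the documented way to disable is both at zero.
    """
    if is_disabled(onset, abatement):
        return
    for name, value in (("onset", onset), ("abatement", abatement)):
        if not 1 <= value <= 100:
            raise ValueError(f"{name} threshold {value} is outside 1..100")


validate_level(0, 0)    # disabled level (the default), accepted
validate_level(50, 40)  # enabled level within range, accepted
```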

The term onset refers to the rising or falling VM usage value that, when reached, generates an alarm. The term abatement refers to the rising or falling VM usage value that, when reached, clears the alarm.

VM usage alarms are raised on a per slot basis, not a per core basis; each VM usage severity alarm is raised on the first instance of a VM core usage crossing the onset threshold.

Subsequent VM cores whose usage crosses the onset threshold do not raise an alarm. VM usage alarms are then cleared when the VM usage recovers under the abatement threshold for that severity on all VM cores.
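
The per-slot behavior described above can be sketched as a small state machine for one severity level. This is a minimal illustration, not the slotmon or sysmon implementation: the class name `SlotAlarm` and its method are hypothetical, and treating "crossing" the onset as usage greater than or equal to the threshold is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SlotAlarm:
    """Hypothetical per-slot alarm state for a single severity level."""
    onset: float
    abatement: float
    active: bool = False

    def update(self, core_usage: Sequence[float]) -> Optional[str]:
        """Process one sample of per-core usage; return "raise", "clear", or None."""
        if not self.active:
            # The first core to reach the onset threshold raises the slot alarm.
            if any(u >= self.onset for u in core_usage):
                self.active = True
                return "raise"
        elif all(u < self.abatement for u in core_usage):
            # The alarm clears only when every core is under the abatement threshold.
            self.active = False
            return "clear"
        # Further cores crossing the onset while the alarm is active raise nothing.
        return None


alarm = SlotAlarm(onset=70, abatement=60)
samples = ([10, 75], [80, 75], [65, 20], [55, 20])
events = [alarm.update(s) for s in samples]
# events == ["raise", None, None, "clear"]
```

In the second sample a second core crosses the onset but no new alarm is raised, and in the third sample one core is still above the abatement threshold, so the alarm stays active.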

 

The VM usage alarm control process ensures the following:

  • previous alarms are cleared at system startup

  • alarms are not created during the VM startup

  • alarms are raised as required in high CPU usage situations

  • alarms are cleared when the CPU usage is below the set threshold

  • alarms are logged in sysmon and slotmon

A sysmon trap and a syslog entry are generated every time a CPU core crosses an alarm threshold level. The trap and log contain the Core Number, Percent Usage, Slot Number, and the severity of the alarm level that was crossed.

The enhanced sysmon functions are as follows:

  • Introduced abatement and onset thresholds to minimize the number of CPU usage generated alarms

  • Reset alarm levels to zero on start-up to clear any previously raised CPU usage alarms

  • Changed the sleep time in cpu_usage_checker.sh from 1 second to 10 seconds to match the CPU usage numbers shown using: top -d 10

The enhanced slotmon functions are as follows:

  • Increased the initial timer to 5 seconds before a check for CPU usage alarms takes place

  • Proper handling of slotmon internal state on card extraction or loss of connection: related/existing dashboard CPU usage alarms are cleared and the internal state is reset

The following is recommended to configure the CPU usage threshold:

  • CPU usage alarm thresholds need to be engineered to match the expected/normal levels that the system is likely to encounter. As with all multi-threaded computers, this depends on many factors, including traffic levels, background activities, and so on. It is recommended that the typical CPU usage levels experienced during high and low messaging traffic periods be analyzed to determine appropriate CPU usage threshold values.

  • The threshold levels should be set to values which produce alarms when CPU usage reaches levels not considered normal (for example, beyond what is seen during peak traffic periods). If a certain threshold setting frequently gets triggered and investigation indicates that the system is operating normally and that the usage spike is due to normal system operations combined with traffic levels, then the threshold level should probably be raised.

  • If the system is behaving normally, but its CPU usage levels are fluctuating between high and low values frequently, then having the abatement and onset threshold levels set farther apart may be warranted to avoid a flood of alarm raises and clears from masking out real system congestion states.
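
The effect of the gap between onset and abatement thresholds can be illustrated with a short simulation. This is a sketch under stated assumptions: the usage trace is invented for illustration, the function name is hypothetical, and a single usage value per sample stands in for the per-core checks.

```python
def count_alarm_events(samples, onset, abatement):
    """Count alarm raises plus clears for a series of usage samples (percent)."""
    active = False
    events = 0
    for usage in samples:
        if not active and usage >= onset:
            active = True
            events += 1  # alarm raised
        elif active and usage < abatement:
            active = False
            events += 1  # alarm cleared
    return events


# Invented trace of a healthy system whose usage hovers around 70%.
trace = [45, 72, 66, 73, 64, 71, 67, 74, 40]

narrow = count_alarm_events(trace, onset=70, abatement=68)  # 8 events
wide = count_alarm_events(trace, onset=70, abatement=60)    # 2 events
```

With thresholds 2 points apart, the same trace produces eight raise or clear events; with thresholds 10 points apart it produces one raise and one clear.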

The recommended settings for the onset and abatement values are as follows:

  • CPU Usage Minor Alarm Abatement 40%

  • CPU Usage Minor Alarm Onset 50%

  • CPU Usage Major Alarm Abatement 60%

  • CPU Usage Major Alarm Onset 70%

  • CPU Usage Critical Alarm Abatement 80%

  • CPU Usage Critical Alarm Onset 90%
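
The recommended values above form a ladder in which each abatement threshold sits below its onset and each severity sits above the previous one. A minimal sketch of that structure follows; the dictionary layout and function names are hypothetical and do not reflect how the platform stores its configuration.

```python
SEVERITY_ORDER = ("minor", "major", "critical")

# Recommended values from the list above, in percent.
RECOMMENDED = {
    "minor": {"abatement": 40, "onset": 50},
    "major": {"abatement": 60, "onset": 70},
    "critical": {"abatement": 80, "onset": 90},
}


def is_strictly_increasing(thresholds):
    """Check abatement < onset per level and that levels do not overlap."""
    flat = []
    for name in SEVERITY_ORDER:
        flat += [thresholds[name]["abatement"], thresholds[name]["onset"]]
    return all(a < b for a, b in zip(flat, flat[1:]))


def highest_onset_reached(usage, thresholds=RECOMMENDED):
    """Return the highest severity whose onset the usage has reached, else None."""
    reached = None
    for name in SEVERITY_ORDER:
        if usage >= thresholds[name]["onset"]:
            reached = name
    return reached


ladder_ok = is_strictly_increasing(RECOMMENDED)  # True
level = highest_onset_reached(75)                # "major"
```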

Minor, Major, and Critical alarm threshold levels can be set in the Web-based User Interface (Web UI).

To view and configure the CPU usage threshold:

  1. From the Main Menu, click Processes.

  2. Using the Process Details on drop-down list, select the CPU for which you want to see the CPU usage threshold.

  3. Click Update.

  4. Click CPU Usage Threshold.

  5. Configure the CPU usage alarm as required.

  6. Click Update.
