Introduction

Resource and statistics monitoring is an important function for managing network resources on the SBC Core. The SBC exposes data about its internal operations through a number of data points related to resource utilization on the device. The SBC collects and retains this data in a series of interval statistics tables, which are accessible primarily through the user interfaces (EMA and CLI) and through Insight EMS (if applicable).

The interval duration and the number of intervals retained are configurable. The defaults are a 15-minute collection interval and a retention policy of 4 intervals, which gives the operator access to one hour’s worth of statistical data directly on the device.

The SBC collects interval statistics data on CPU utilization, memory utilization, DSP utilization, system policers, call admission controls, and trunk group (call) statistics.

Caution

Ribbon strongly cautions against using the data from the current interval in any performance monitoring application. The current interval is accumulating data and the value of that data depends on where along the interval timeline the data is sampled. For example, the average CPU utilization sampled in the first few seconds of an interval may vary significantly from the average CPU utilization across the entire interval.

Instead of using the current interval, Ribbon recommends sampling data from the most recently completed interval and trending the statistics across multiple intervals to build a profile of utilization over time.

The SBC allows the operator to access data from the current interval and from the interval statistics table related to each object being monitored. The current interval consists of a statistics table entry that is accumulating data for the next interval report. The interval statistics table is a series of statistics table entries containing data from the most recently completed n intervals (where n is the number of intervals in the retention policy configuration).
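The following minimal sketch illustrates the recommendation above: it discards the still-accumulating current interval and trends only completed intervals. The IntervalSample record, its field names, and the sample values are hypothetical and chosen for illustration only; they do not reflect the SBC's actual table schema or any EMA or CLI output.

```python
from dataclasses import dataclass


@dataclass
class IntervalSample:
    # Hypothetical record layout, not the SBC's actual schema.
    number: int       # interval sequence number (higher means more recent)
    average: float    # average utilization (%) reported for the interval
    completed: bool   # False for the still-accumulating current interval


def completed_trend(samples):
    """Return average utilization of completed intervals, oldest first."""
    done = sorted((s for s in samples if s.completed), key=lambda s: s.number)
    return [s.average for s in done]


samples = [
    IntervalSample(101, 42.0, True),
    IntervalSample(104, 12.0, False),  # current interval: ignored
    IntervalSample(103, 55.0, True),
    IntervalSample(102, 47.0, True),
]

trend = completed_trend(samples)
latest_completed = trend[-1]  # value from the most recently completed interval
```

In this example the low reading from the current interval does not distort the trend, because only completed intervals contribute to it.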

Note
Refer to Links to Statistics Pages to navigate to a particular EMA or CLI statistics page.

Status vs. Statistics

Certain objects can be monitored based on the status of that object. An Ethernet port, for example, is either up or down. An Ethernet port with a status of down is a problem worth investigating. It does not really matter if the port is bouncing up and down over several status samples or simply remains down.

For other objects whose state may vary from moment to moment, it is more informative to collect statistics about that object. CPU utilization, for example, may read 100% at any given instant; however, this is only significant if utilization remains at 100% over several samples.

Note
Because monitoring instantaneous utilization can be misleading, it is better to rely on statistics which take into account both average and peak utilization over a predefined period.
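As a rough illustration of why a summary over a period is more informative than one instantaneous reading, the sketch below reduces a set of utilization samples to low, average, and high values. The function name and the sample values are made up for this example.

```python
def summarize(samples):
    """Reduce utilization samples (%) to low, average, and high values."""
    return {
        "low": min(samples),
        "average": sum(samples) / len(samples),
        "high": max(samples),
    }


# One momentary spike to 100% among otherwise moderate readings.
readings = [20.0, 100.0, 30.0, 30.0]
summary = summarize(readings)
```

Here the peak is 100% while the average is 45%, so the spike alone does not indicate sustained load.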

Interval Settings

The SBC allows the operator to configure interval settings separately for system utilization statistics and for trunk group related call statistics. The Utilization Monitor Statistics Interval settings control collection of CPU, memory, DSP, and other resource utilization statistics. The more general Interval Statistics settings control collection of trunk group level statistics.

 

Note

To configure Utilization Monitor settings, see:

To configure general interval statistics, see:

CPU Utilization Statistics

Unlike older systems that report a single CPU utilization value, the SBC uses a multi-core processor and records CPU utilization for each CPU core separately. The SBC accumulates statistics about CPU utilization including the high, average and low utilization statistics for each CPU core for each interval period.

The SBC itself monitors CPU utilization as part of congestion management on the device. The software samples CPU utilization every second and uses a 3-sample average of the highest running core as the value of the current CPU utilization for the purposes of determining congestion. It is important to recognize that the contribution to high CPU utilization for determining congestion could come from a single CPU core or from multiple different CPU cores in a series of samples.
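The sampling behavior described above can be sketched as follows. This is one interpretation of the description (take the busiest core in each one-second sample, then average the last three of those peaks) and is not the SBC's actual implementation. The class name and sample values are hypothetical.

```python
from collections import deque


class CongestionCpuEstimator:
    """Average of the highest-running core over the last few samples."""

    def __init__(self, window=3):
        # Keeps only the most recent `window` per-sample peaks.
        self._peaks = deque(maxlen=window)

    def add_sample(self, per_core_utilization):
        # The busiest core may be a different core in each sample.
        self._peaks.append(max(per_core_utilization))
        return sum(self._peaks) / len(self._peaks)


est = CongestionCpuEstimator()
est.add_sample([10.0, 90.0, 20.0])          # core 1 is busiest
est.add_sample([80.0, 30.0, 20.0])          # core 0 is busiest
value = est.add_sample([15.0, 25.0, 70.0])  # core 2 is busiest
```

In this example three different cores contribute to the result, which shows how the congestion value can be high even though no single core stays busy across all samples.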

If the operator is looking for a single CPU trend indicator in the interval statistics, Ribbon recommends using the Maximum of the average CPU utilizations across all 16 cores from a given collection interval. The core selected may be different for each interval, but this trend will be closer to how the system uses CPU utilization for congestion reporting.
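A minimal sketch of that trend indicator is shown below. It assumes the per-core averages have already been retrieved and arranged as a mapping from interval number to per-core values; that layout and the numbers are illustrative assumptions, not an SBC export format.

```python
def cpu_trend_indicator(core_averages_by_interval):
    """Map each interval number to the maximum per-core average (%)."""
    return {
        number: max(cores.values())
        for number, cores in sorted(core_averages_by_interval.items())
    }


# Hypothetical data: interval number -> {core index: average utilization %}
data = {
    202: {0: 71.0, 1: 58.0, 2: 44.0},
    201: {0: 35.0, 1: 62.5, 2: 40.0},
}
indicator = cpu_trend_indicator(data)
```

Note that the core supplying the maximum differs between the two intervals (core 1, then core 0).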

Ribbon cautions that the system may exhibit congestion responses even though the interval statistics never show the congestion threshold being crossed. This is because the interval collection and reporting methods differ from the CPU sampling algorithm used by the congestion management system.

Ribbon recommends watching for the Maximum of the average CPU utilizations to rise above 80% utilization for two or more consecutive intervals as the threshold for an alert or warning. Although the SBC can certainly sustain this level of utilization, continuous operation above 80% utilization may indicate a need for additional resources or additional action to move traffic off the device.

Continuous operation at 95% or higher is cause for concern. In addition to the average CPU utilizations reported in the interval statistics, the operator should also look to the congestion statistics to determine if the device is operating in overload and take action to reduce or limit traffic.
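The thresholds above could be applied to the trend indicator along the following lines. This is a sketch only: the function name and return labels are hypothetical, and applying the same two-interval rule to the 95% level is an assumption, since the text says only "continuous operation".

```python
def classify_cpu_trend(indicators, warn=80.0, critical=95.0, consecutive=2):
    """Classify the most recent completed intervals, ordered oldest first.

    `indicators` holds the maximum of the per-core averages (%) per interval.
    """
    recent = indicators[-consecutive:]
    if len(recent) < consecutive:
        return "ok"  # not enough completed intervals to judge a trend
    if all(v >= critical for v in recent):
        return "critical"  # assumption: same consecutive rule as the warning
    if all(v > warn for v in recent):
        return "warning"
    return "ok"


status = classify_cpu_trend([60.0, 82.0, 85.0])
```

A single interval above 80% that is followed by a lower one does not trigger the warning, which matches the intent of watching for sustained rather than momentary load.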

Monitoring of CPU utilization trends will benefit from shorter intervals than the 15-minute default interval duration.

Note

Ribbon recommends setting the system utilization interval to 5 minutes with a retention policy of 12 periods. This improves the granularity of the statistics samples, allowing the operator to more readily observe spikes or dips in average utilization in each subsequent period.
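As a quick check of the arithmetic, the sketch below compares the history retained on the device under the default settings and under the recommended settings. The function name is illustrative.

```python
def retention_minutes(interval_minutes, intervals_retained):
    """Total history kept on the device, in minutes."""
    return interval_minutes * intervals_retained


default_window = retention_minutes(15, 4)      # default settings
recommended_window = retention_minutes(5, 12)  # recommended settings
```

Both configurations retain 60 minutes of history. The recommended settings divide that hour into 12 data points instead of 4.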

The Number field in the left-most column represents the Interval Number. This is a 32-bit, incrementing sequence number that identifies a particular interval. The lower the sequence number, the earlier the interval; the most recent interval has the highest sequence number.

Refer to Request System - CLI to display CPU Usage history via CLI.

Refer to Congestion Control for additional congestion control details, such as signaling and media overload protection, adaptive overload control, monitoring congestion controls and IP Peer overload traffic throttling.

Memory Utilization Statistics

As with CPU Utilization, Memory Utilization is tracked through Memory Utilization Interval Statistics. As there is a single, common memory space for all processes on the SBC, memory utilization is a bit simpler to understand and interpret than CPU utilization.

Memory utilization will expand as more traffic is carried on the device. Some memory objects are created in pools where the objects are assigned to a “free pool” for use by the software and returned to the pool when not in use. This memory is allocated, but never released to the operating system. Other objects are created on demand and released back to the operating system when no longer needed.

In general, memory utilization will tend to grow and then stabilize on the SBC. Memory usage in excess of 80% is cause for an alert or warning. Memory usage in excess of 90% is cause for concern and may require restarting the device to clear memory usage in the system.

Ribbon recommends watching the average memory utilization for the interval periods. Memory utilization is relatively stable when compared to CPU utilization and does not really benefit from shorter sampling intervals. Memory utilization and CPU utilization, however, are controlled by the same timer settings, so Ribbon recommends the 5-minute interval for both CPU and memory.
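The memory thresholds described in this section can be expressed as a simple check on the average memory utilization of a completed interval. The function name and return labels below are hypothetical.

```python
def classify_memory(average_percent):
    """Classify average memory utilization (%) for a completed interval."""
    if average_percent > 90.0:
        return "concern"   # may require restarting the device
    if average_percent > 80.0:
        return "warning"
    return "ok"


levels = [classify_memory(v) for v in (75.0, 85.0, 92.0)]
```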

Measure Memory Usage of Each SBC Process

The SBC Core uses the OAM Event Log memusage command to log the memory usage of each process over a configurable interval. The SBC generates a memory log, which it uses to capture and log process heap memory usage over time.

The following limitations apply in this release: 

  • Memory consumption is not reported through interval statistics.
  • Memory usage is reported at the process level, not for individual threads/tasks.

The number of bytes used by each active process is captured in the memory usage log file. Each log entry identifies the process it describes. For example, a log entry has the following format:
113 03282017 073341.007995:1.01.00.00006.MAJOR .PRS: memusage: 1516445696
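For operators who post-process these logs, the sketch below parses an entry of the form shown above. The pattern is inferred from that single example, so the field meanings are assumptions rather than a documented format: a leading sequence number, a date in MMDDYYYY form, a time with microseconds, and a process tag (PRS in the example) preceding the memusage value in bytes. The function name is illustrative.

```python
import re
from datetime import datetime

# Pattern inferred from one example entry; field meanings are assumptions.
ENTRY = re.compile(
    r"^(?P<seq>\d+)\s+(?P<date>\d{8})\s+(?P<time>\d{6}\.\d+):"
    r"(?P<header>\S+)\s*\.(?P<process>\w+):\s*memusage:\s*(?P<bytes>\d+)\s*$"
)


def parse_memusage(line):
    """Return process tag, byte count, and timestamp, or None if no match."""
    m = ENTRY.match(line)
    if m is None:
        return None
    # Assumes the date is MMDDYYYY and the time is HHMMSS.microseconds.
    stamp = datetime.strptime(f"{m['date']} {m['time']}", "%m%d%Y %H%M%S.%f")
    return {
        "process": m["process"],
        "bytes": int(m["bytes"]),
        "timestamp": stamp,
    }


entry = parse_memusage(
    "113 03282017 073341.007995:1.01.00.00006.MAJOR .PRS: memusage: 1516445696"
)
```

Lines that do not match the assumed layout return None rather than raising an error, so the function can be applied to a log file that contains other entry types.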

The memory usage details are logged to the hard drive in the directory: /var/log/sonus/sbx/evlog 

Note

Use the log number to locate the correct log file. For example:

/var/log/sonus/sbx/evlog/<log number>.mem

where the <log number>.mem is the memory usage log file.

For configuration details, refer to Event Log - CLI.

Trunk Group Statistics

The SBC collects separate statistics from each of the trunk groups configured on the device. Shortening the collection interval for trunk group-related call statistics could therefore have a negative impact on system performance for configurations with large numbers of trunk groups. Ribbon recommends leaving the interval settings at 15 minutes with a retention policy of 4 periods (the default values) for trunk group and call statistics.

Traffic Management

Data collected for IP trunk groups are available to network management systems. The Global-Objects MIB files are used to report IP trunk group traffic data (call performance statistics). For IP trunk groups, the MIB also includes bandwidth and media performance statistics as defined below.

Call performance statistics for IP trunk groups are measured using a logical call leg resource. Statistics are reported for every trunk group, for the current interval and also for a number of past intervals. The number of intervals and the time period of each interval are configurable.

This data, along with other data reported by SBC and other systems in the network, is analyzed by traffic managers to identify trouble spots or congestion issues in the network to determine facility needs. Traffic managers can then react to this information and implement appropriate traffic controls.
