Ribbon CNF Observability

Overview

All CNF applications produce logs and metrics that are essential to understanding their state and health. Because pods are typically ephemeral, this data must be sent to a centralized observability backend. The Ribbon CNFs support EFK and Kafka as backends for centralized logging, and Prometheus as the metrics backend. Interaction with the backends typically occurs through an integrated telemetry agent.

Logging

The CNF framework logs of all microservices/containers are streamed directly to an Elastic backend when Elastic is configured, and are streamed using RAMP when Kafka is configured. When a pod restarts, crashes, or is evicted, any logs stored locally in an ephemeral file system are lost. The Observability framework streams the logs to the backend so that they remain available and easily searchable.

The new CNF logging format adheres to the Elastic Common Schema, which allows interoperation with storage backends that handle structured data. An example of the new format is shown below. Additional details are provided upon request.

Supported logging mechanisms include:

Streaming the logs to stdout, using the default Kubernetes logging framework to collect the logs.
Streaming the logs to the backend using Elasticsearch.

Old Format

Format: Size, filter flag, month, day, year, hour, minute, second, tenths of seconds, shelf, slot, instance, sequence number, level, subsystem, trace type, trace name, event text.

Example: "206 05022023 131215.414913:1.01.00.00004.MAJOR   .SM: *HwModuleServer::procNodeCeProductCapability: serverName: vsbc1, capName: SPS100, capValue: not present, checkType: Minimum Required ActualCeName vsbc1"

New Format

Format: year, month, day, hour, minute, second, tenths of seconds, time zone, level, Size, shelf, slot, instance, sequence number, trace type, subsystem, trace name, event text.

Example: "2023-05-02 11:52:12,367076 UTC MAJOR    132 1.01.00.00001 .SM: *ConfigManager::configUpdater: ALARM CLEARED - Able to process config updates"
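
The new format can be split into its fields with a regular expression. The Python sketch below is illustrative only: the pattern is inferred from the example line above rather than from a published grammar, and the helper name parse_cnf_log_line and the group names are hypothetical. It assumes that the single character before the subsystem is the trace type, and it keeps the trace name and event text together in one field.

```python
import re

# Pattern inferred from the example line above; an assumption, not an official grammar.
NEW_FORMAT = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}),(?P<fraction>\d+) "
    r"(?P<timezone>\S+) "
    r"(?P<level>\S+)\s+"
    r"(?P<size>\d+) "
    r"(?P<shelf>\d+)\.(?P<slot>\d+)\.(?P<instance>\d+)\.(?P<sequence>\d+) "
    r"(?P<trace_type>\S)(?P<subsystem>\w+): "
    r"(?P<text>.*)$"
)


def parse_cnf_log_line(line):
    """Return a dict of fields, or None if the line does not match the pattern."""
    match = NEW_FORMAT.match(line.strip())
    return match.groupdict() if match else None


sample = (
    "2023-05-02 11:52:12,367076 UTC MAJOR    132 1.01.00.00001 .SM: "
    "*ConfigManager::configUpdater: ALARM CLEARED - Able to process config updates"
)
fields = parse_cnf_log_line(sample)
print(fields["level"], fields["subsystem"], fields["sequence"])  # MAJOR SM 00001
```

Lines that do not match (for example, old-format lines) return None, so a caller can fall back to treating them as unstructured text.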

Sample Application Logs with Metadata details added by Observability Agent

worker-7.rco-ocp1.lab.rbbn.com.manjumain-sc-67c989b4c7-wb77s.sbc-parser.SCDebug.log: [1686734744.373162000, {"time":"2023-06-14 09:25:44,373162 UTC","container":{"runtime":"cri","labels":{"element":"sc","group":"manjumain","name":"manjumain-sc","ns_uuid":"nsbnknscpscpk101","app":"manjumain-sc","app_kubernetes_io/instance":"manjumain","appVersion":"12.0.0-R000","tenant":"sbc","cnf_uuid":"bnknscpscpk101","pod-template-hash":"67c989b4c7","cnfc_uuid":"bnknscpscpk101-sc"}},"topic":"sc-pod-logs","host.geo.timezone":"Etc/UTC","orchestrator.resource.name":"manjumain-sc-67c989b4c7-wb77s","orchestrator.resource.id":"20dbb57f-a0d7-4bc6-ab11-27daadd53d30","log.file.path":"/var/log/SCDebug.log","source.bytes":291,"event.original":"2023-06-14 09:25:44,373162 UTC Info     131 1.01.00.00002 .CPX: *CpxIcmsProcRegSlvCeNumMsg: 0/47 /oam/accounting/radius/radiusServerStatus","orchestrator.namespace":"sbc-svt","message":"131 1.01.00.00002 .CPX: *CpxIcmsProcRegSlvCeNumMsg: 0/47 /oam/accounting/radius/radiusServerStatus","log.level":"Info","ecs.version":"1.6.0"}]

worker-7.rco-ocp1.lab.rbbn.com.manjumain-sc-67c989b4c7-wb77s.sbc-parser.SCDebug.log: [1686734744.373352000, {"time":"2023-06-14 09:25:44,373352 UTC","container":{"runtime":"cri","labels":{"element":"sc","group":"manjumain","name":"manjumain-sc","ns_uuid":"nsbnknscpscpk101","app":"manjumain-sc","app_kubernetes_io/instance":"manjumain","appVersion":"12.0.0-R000","tenant":"sbc","cnf_uuid":"bnknscpscpk101","pod-template-hash":"67c989b4c7","cnfc_uuid":"bnknscpscpk101-sc"}},"topic":"sc-pod-logs","host.geo.timezone":"Etc/UTC","orchestrator.resource.name":"manjumain-sc-67c989b4c7-wb77s","orchestrator.resource.id":"20dbb57f-a0d7-4bc6-ab11-27daadd53d30","log.file.path":"/var/log/SCDebug.log","source.bytes":431,"event.original":"2023-06-14 09:25:44,373352 UTC Info     121 1.01.00.00003 .CPX: *CpxIcmsProcRegSlvCeNumMsg 13 47 /oam/accounting/cdrServer/status","orchestrator.namespace":"sbc-svt","message":"121 1.01.00.00003 .CPX: *CpxIcmsProcRegSlvCeNumMsg 13 47 /oam/accounting/cdrServer/status","log.level":"Info","ecs.version":"1.6.0"}]

worker-7.rco-ocp1.lab.rbbn.com.manjumain-sc-67c989b4c7-wb77s.sbc-parser.SCDebug.log: [1686734744.373446000, {"time":"2023-06-14 09:25:44,373446 UTC","container":{"runtime":"cri","labels":{"element":"sc","group":"manjumain","name":"manjumain-sc","ns_uuid":"nsbnknscpscpk101","app":"manjumain-sc","app_kubernetes_io/instance":"manjumain","appVersion":"12.0.0-R000","tenant":"sbc","cnf_uuid":"bnknscpscpk101","pod-template-hash":"67c989b4c7","cnfc_uuid":"bnknscpscpk101-sc"}},"topic":"sc-pod-logs","host.geo.timezone":"Etc/UTC","orchestrator.resource.name":"manjumain-sc-67c989b4c7-wb77s","orchestrator.resource.id":"20dbb57f-a0d7-4bc6-ab11-27daadd53d30","log.file.path":"/var/log/SCDebug.log","source.bytes":561,"event.original":"2023-06-14 09:25:44,373446 UTC Info     122 1.01.00.00004 .CPX: *CpxIcmsProcRegSlvCeNumMsg: 0/47 /oam/accounting/cdrServer/status","orchestrator.namespace":"sbc-svt","message":"122 1.01.00.00004 .CPX: *CpxIcmsProcRegSlvCeNumMsg: 0/47 /oam/accounting/cdrServer/status","log.level":"Info","ecs.version":"1.6.0"}]
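
Each sample record above has the shape "tag: [timestamp, {JSON object}]". The following Python sketch splits such a record into its parts, assuming that this shape holds for every record. The helper parse_agent_record is hypothetical, and the sample is shortened from the first record above. Note that dotted names such as log.level are flat keys in these records, not nested objects.

```python
import json


def parse_agent_record(line):
    """Split 'tag: [timestamp, {...}]' into (tag, timestamp, fields)."""
    tag, sep, payload = line.partition(": [")
    if not sep:
        raise ValueError("record does not contain ': ['")
    # Restore the opening bracket so the payload is a valid JSON array.
    timestamp, fields = json.loads("[" + payload)
    return tag, timestamp, fields


# Shortened sample; the real records carry more labels and fields.
sample = (
    "worker-7.rco-ocp1.lab.rbbn.com.manjumain-sc-67c989b4c7-wb77s.sbc-parser.SCDebug.log: "
    '[1686734744.373162000, {"container":{"labels":{"element":"sc"}},'
    '"topic":"sc-pod-logs","orchestrator.namespace":"sbc-svt",'
    '"log.level":"Info","ecs.version":"1.6.0"}]'
)
tag, timestamp, fields = parse_agent_record(sample)
print(fields["log.level"], fields["container"]["labels"]["element"])  # Info sc
```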

Configuring Logs in Helm

The following example Helm values configure the Observability backend server for logging.

      logs:
        enable: true
        disablePvcLogs: False # Disables PVC logging
        defaultTopic: default # log messages to this Kafka topic by default when a topic for the container parsers or system is not specified
        system: true
        systemTopic: system # system logs will be sent to this Kafka topic, defaultTopic is used if this property is not defined
        telemetry:
          enable: True # Process the otel agent logs
          topic: rbbn-otel-agent # Open telemetry logs will be sent to this Kafka topic
        podlogs: false
        logPrefix:
          nodeId: true
          containerId: false
        # Based on the observability backend server used for logging, uncomment the elasticsearch or kafka section
        backendServers: # Example of backend servers. To disable a log backend, provide an empty array of endpoints, for example: kafka: []
          stdout:
            enable: False # When true, all OBS and app logs are sent through STDOUT
          elasticsearch: 
            # pick the endpoint based on the location from where you are deploying your CNF
              - endpoint: elasticsearch.apps.blr-ocp1.lab.rbbn.com:443
                auth: True
                username: ELASTICSEARCH_USER
                password: ELASTICSEARCH_PASSWD
                index: platform-filebeat
          kafka: []
          #  - endpoint: ip:port
          #    sslCert: KAFKA_CERT
          #    sslKey: KAFKA_KEY
          #    sslCa: KAFKA_CA

    # Observability Backend - Elasticsearch Credentials
    elasticsearchCreds:
      create: True
      user:
        key: ELASTICSEARCH_USER
        value:
      password:
        key: ELASTICSEARCH_PASSWD
        value:

    # Observability Backend - KAFKA Credentials
    # all values should be base64 encoded.
    kafkaCreds:
      create: False
      #path: /etc/kafka/ssl
      #sslCert:
        #key: KAFKA_CERT
        #value:
      #sslKey:
        #key: KAFKA_KEY
        #value:
      #sslCa:
        #key: KAFKA_CA
        #value:
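
The comments in the values above state that a log backend is disabled by giving it an empty endpoint array, and that stdout is controlled by its own enable flag. The Python sketch below applies that rule to values that have already been loaded into a dictionary (for example, by a YAML parser). The helper enabled_backends is hypothetical and not part of the chart, and treating logs.enable as a switch for all backends is an assumption.

```python
def enabled_backends(logs):
    """List the log backends that the given 'logs' values would enable."""
    # Assumption: logs.enable gates every backend.
    if not logs.get("enable", False):
        return []
    servers = logs.get("backendServers", {})
    active = []
    if servers.get("stdout", {}).get("enable", False):
        active.append("stdout")
    for name in ("elasticsearch", "kafka"):
        # An empty (or missing) endpoint list disables the backend.
        if servers.get(name):
            active.append(name)
    return active


# Values mirroring the example above, reduced to the relevant keys.
logs = {
    "enable": True,
    "backendServers": {
        "stdout": {"enable": False},
        "elasticsearch": [
            {
                "endpoint": "elasticsearch.apps.blr-ocp1.lab.rbbn.com:443",
                "index": "platform-filebeat",
            }
        ],
        "kafka": [],
    },
}
print(enabled_backends(logs))  # ['elasticsearch']
```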

SBC CNe Logging Backwards Compatibility

To keep the SBC CNe debug, system, and security logs backward compatible with the old format, use the following Event Log CLI commands. These commands switch the format of the corresponding log files at runtime.

set oam eventLog typeAdmin debug cnfLogFormat <disable | enable>
set oam eventLog typeAdmin system cnfLogFormat <disable | enable>
set oam eventLog typeAdmin security cnfLogFormat <disable | enable>

CNF Log Format Considerations

By default, the cnfLogFormat flag is enabled in the CNF environment and disabled in non-CNF environments.
The debug, system, and security logs use the new CNF log format when this flag is enabled and the old format when it is disabled.
The trace and pkt files always follow the old format.
The audit and mem files always follow the new format.
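
The considerations above amount to a small decision table. A minimal Python sketch of it follows. The function name log_format and the lowercase log type strings are hypothetical, chosen only for illustration.

```python
FLAG_CONTROLLED = {"debug", "system", "security"}  # follow cnfLogFormat
ALWAYS_OLD = {"trace", "pkt"}                      # always the old format
ALWAYS_NEW = {"audit", "mem"}                      # always the new format


def log_format(log_type, cnf_log_format_enabled):
    """Return 'new' or 'old' for a log type, following the rules listed above."""
    if log_type in ALWAYS_OLD:
        return "old"
    if log_type in ALWAYS_NEW:
        return "new"
    if log_type in FLAG_CONTROLLED:
        return "new" if cnf_log_format_enabled else "old"
    raise ValueError(f"unknown log type: {log_type}")


for name in ("debug", "trace", "audit"):
    print(name, log_format(name, cnf_log_format_enabled=False))
```

With the flag disabled, this prints old for debug and trace, and new for audit.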

Ribbon CNF Metrics

As part of the Ribbon CNF solution, the system metrics (e.g., CPU, memory, disk read/write) and application performance metrics (e.g., number of calls per trunk group, active calls, attempted calls) are collected and monitored using Prometheus. The metrics are sent in the Prometheus format. In the Observability backend (e.g., a Grafana dashboard), multiple queries are available to display these metrics graphically.

Once the metrics are stored in a time-series database such as Prometheus, they can be queried and displayed. Ribbon provides Grafana dashboard templates with the solution to visualize these metrics.
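
Stored metrics can also be read programmatically through the standard Prometheus HTTP API (the /api/v1/query endpoint). The Python sketch below builds an instant-query URL and decodes a vector response. The server address and the metric name sbc_active_calls are hypothetical placeholders, not names exported by the Ribbon CNFs, and the response is hand-written in the documented Prometheus format so that no network access is needed.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build a Prometheus instant-query URL (HTTP API /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def vector_values(response_text):
    """Map each series' sorted label set to its float value for a vector result."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    return {
        tuple(sorted(item["metric"].items())): float(item["value"][1])
        for item in body["data"]["result"]
    }


# Hypothetical server address and metric name, for illustration only.
url = instant_query_url("http://prometheus.example:9090", 'sbc_active_calls{pod=~".*-sc-.*"}')

# Hand-written response in the standard Prometheus vector format.
sample = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "sbc_active_calls", "pod": "example-sc-0"},
         "value": [1683100500, "42"]},
    ]},
})
print(url)
print(vector_values(sample))
```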

Metrics-based Alerts

You may optionally install the provided metrics-based alerting rules, written in the Prometheus query language (PromQL), directly through the Helm chart if you are using the prometheus-operator.

SBC CNe System and Application Performance Metrics

Interval Statistics

The interval statistics files are generated by the isbc/slb/cs pods at the configured time interval. The files are shared with the OAM Pod through a PVC.

The OAM Pod aggregates the statistics files and provides consolidated statistics (PM) files to the RAMP.
The OAM Pod populates the database with per-pod and aggregated performance statistics, so both pod-wise and aggregated interval statistics can be viewed through the CLI.
The individual pod-level interval statistics are also streamed to the observability backend.

In a VNF environment, managed pods connected to an EMS and streamed interval statistics to it individually. In a CNF environment, only the OAM Pod is connected to the RAMP, and it streams the aggregated/consolidated interval statistics from all pods to the RAMP. All existing performance statistics are supported, although some are replaced by new CNF statistics.

New CNF-equivalent commands are introduced where the SBC CNe cannot meaningfully aggregate statistics data from multiple pods, for the following reasons:

One or more fields are of a non-integer type, such as IP address, time, or average data.
Aggregation is possible, but the aggregated data would not add value because the statistics command is meant to reflect pod-specific data (such as memory/CPU utilization information).

In these cases, a new key, cnfPodName, is added to the CNF-equivalent commands.
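
The aggregation rule described above can be sketched in Python as follows. This only illustrates the idea and is not how the OAM Pod is implemented. The function names, the non_additive parameter (for integer fields such as time or averages that should not be summed), and the pod names are all hypothetical.

```python
def is_summable(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return type(value) is int


def aggregate_or_split(per_pod, non_additive=()):
    """per_pod maps a pod name to {field: value}.

    Returns ("aggregated", totals) when every field is an additive integer
    counter, otherwise ("per_pod", rows) where each row carries cnfPodName.
    """
    rows = list(per_pod.items())
    all_counters = all(
        is_summable(value) and field not in non_additive
        for _, stats in rows
        for field, value in stats.items()
    )
    if all_counters:
        totals = {}
        for _, stats in rows:
            for field, value in stats.items():
                totals[field] = totals.get(field, 0) + value
        return "aggregated", totals
    return "per_pod", [{"cnfPodName": pod, **stats} for pod, stats in rows]


calls = {"pod-a": {"attemptedCalls": 3}, "pod-b": {"attemptedCalls": 4}}
print(aggregate_or_split(calls))      # ('aggregated', {'attemptedCalls': 7})

addresses = {"pod-a": {"ipAddress": "10.0.0.1"}}
print(aggregate_or_split(addresses))  # ('per_pod', [{'cnfPodName': 'pod-a', 'ipAddress': '10.0.0.1'}])
```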

Existing versus New CNF Stats:

Existing StatsName	CNF StatsName
IpAclOverallStats	CnfIpAclOverallStats
IpAclRuleStats	CnfIpAclRuleStats
IpGeneralGroupStats	CnfIpGeneralGroupStats
IpPolicingAclOffendersListIntStats	CnfIpPolicingAclOffendersListIntStats
IpPolicingAggregateOffendersIntStats	CnfIpPolicingAggregateOffendersIntStats
IpPolicingArpOffendersListIntStats	CnfIpPolicingArpOffendersListIntStats
IpPolicingBadEtherIpHdrOffendersIntStats	CnfIpPolicingBadEtherIpHdrOffendersIntStats
IpPolicingDiscardRuleOffendersIntStats	CnfIpPolicingDiscardRuleOffendersIntStats
IpPolicingIpSecDecryptOffendersIntStats	CnfIpPolicingIpSecDecryptOffendersIntStats
IpPolicingMediaOffendersIntStats	CnfIpPolicingMediaOffendersIntStats
IpPolicingRogueMediaIntStats	CnfIpPolicingRogueMediaIntStats
IpPolicingSrtpDecryptOffendersIntStats	CnfIpPolicingSrtpDecryptOffendersIntStats
IpPolicingSystemIntStats	CnfIpPolicingSystemIntStats
IpPolicinguFlowOffendersListIntStats	CnfIpPolicinguFlowOffendersListIntStats
LinkDetectionGroupStats	CnfLinkDetectionGroupStats
SysCpuUtilIntStatsSts	CnfSysCpuUtilIntStatsSts
SysMemoryUtilIntStatsSts	CnfSysMemoryUtilIntStatsSts
SystemCongestionIntervalStats	CnfSystemCongestionIntervalStats
TcpGeneralGroupStats	CnfTcpGeneralGroupStats
DiamNodeRfIntervalStatistics	CnfDiamNodeRfIntervalStatistics
EthernetPortMgmtStats	CnfEthernetPortMgmtStats
EthernetPortPacketStats	CnfEthernetPortPacketStats
IcmpGeneralGroupStats	CnfIcmpGeneralGroupStats
sipOcsCallIntervalStatistics	cnfSipOcsCallIntervalStatistics

Example CNF-equivalent CLI commands to display pod-level data using cnfSipOcsCallIntervalStatistics:

admin@vsbc1> show status service SC podName ALL addressContext default zone PR_ZONE_INGRESS cnfSipOcsCallIntervalStatistics 24

Possible completions:
displaylevel - Depth to show
prupgrade-sc-8695fcdc64-jd8xp - This object indicates the PodName.
prupgrade-sc-8695fcdc64-mh42j - This object indicates the PodName.
prupgrade-sc-8695fcdc64-mshm7 - This object indicates the PodName.
Possible match completions:
attemptedCalls - Current Attempted ocs Call statistics.
establishedCalls - Current Established ocs Call statistics.
failedCalls - Current Failed ocs Call statistics.
intervalValid - The member indicating the validity of the interval.
pendingCalls - Current Pending ocs Call statistics.
rejectedCalls - Current SBX Rejected ocs Call statistics.
relayedCalls - Current Realyed ocs Invite to Engress side statistics.
successfulCalls - Current Successful ocs Call statistics.
time - The system up time when the interval statisitic is collected.

admin@vsbc1> show status service SC podName ALL addressContext default zone PR_ZONE_INGRESS cnfSipOcsCallIntervalStatistics 24 prupgrade-sc-8695fcdc64-jd8xp
cnfSipOcsCallIntervalStatistics 24 prupgrade-sc-8695fcdc64-jd8xp PR_INGRESS_TG {
intervalValid true;
time 1683091200;
attemptedCalls 0;
relayedCalls 0;
establishedCalls 0;
successfulCalls 0;
failedCalls 0;
pendingCalls 0;
rejectedCalls 0;
}
cnfSipOcsCallIntervalStatistics 24 prupgrade-sc-8695fcdc64-jd8xp PR_INGRESS_TG1 {
intervalValid true;
time 1683091200;
attemptedCalls 0;
relayedCalls 0;
establishedCalls 0;
successfulCalls 0;
failedCalls 0;
pendingCalls 0;
rejectedCalls 0;
}
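
For scripted checks, output of this shape (a header line ending in "{", followed by "key value;" lines and a closing "}") can be converted into dictionaries. The Python sketch below assumes the output has exactly that shape. The helper parse_status_blocks is hypothetical, and reading the header tokens as statistic name, interval number, pod name, and trunk group is an interpretation of the sample above.

```python
def parse_status_blocks(text):
    """Parse 'header {' / 'key value;' / '}' blocks into a list of dicts."""
    blocks, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("{"):
            # Keep the header tokens so the caller can identify the block.
            current = {"_header": line[:-1].split()}
        elif line == "}" and current is not None:
            blocks.append(current)
            current = None
        elif current is not None and line.endswith(";"):
            key, _, value = line[:-1].partition(" ")
            current[key] = value
    return blocks


# Shortened copy of the first block shown above.
output = """\
cnfSipOcsCallIntervalStatistics 24 prupgrade-sc-8695fcdc64-jd8xp PR_INGRESS_TG {
intervalValid true;
time 1683091200;
attemptedCalls 0;
}
"""
for block in parse_status_blocks(output):
    name, interval, pod, trunk_group = block["_header"]
    print(pod, trunk_group, block["attemptedCalls"])
```

Values are kept as strings; convert them with int() where a numeric comparison is needed.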

CLI example of callCountIntervalStatistics aggregated info:

admin@vsbc1> show table service SC podName ALL global callCountIntervalStatistics

                                                            ENHANCED  AMRNB  AMRWB  EVRC   NICE   MRF       SIP                      EV
               INTERVAL              CALL   ENCRYPT  SRTP   VIDEO     LEG    LEG    LEG    REC    SESSIONS  REC    TRANSCODE  PDCS   LE
NUMBER  NAME   VALID     TIME        COUNT  COUNT    COUNT  COUNT     COUNT  COUNT  COUNT  COUNT  COUNT     COUNT  COUNT      COUNT  CO
---------------------------------------------------------------------------------------------------------------------------------------
55      entry  true      1683100500  0      0        0      0         0      0      0      0      0         0      0          0      0
56      entry  true      1683100800  0      0        0      0         0      0      0      0      0         0      0          0      0
57      entry  true      1683101100  0      0        0      0         0      0      0      0      0         0      0          0      0
58      entry  true      1683101400  0      0        0      0         0      0      0      0      0         0      0          0      0

Current Statistics

In a CNF environment, the service level statistics from all SC Pods are aggregated at the OAM and presented to the CLI.

A service-level, cluster-wide Current Statistics example is shown below:

admin@vsbc1> show table service SC podName ALL global callCountCurrentStatistics

                              ENHANCED  AMRNB  AMRWB  EVRC   NICE   MRF       SIP                      EVS    SILK            SLB
       CALL   ENCRYPT  SRTP   VIDEO     LEG    LEG    LEG    REC    SESSIONS  REC    TRANSCODE  PDCS   LEG    LEG    LICENSE  SESSIONS
NAME   COUNT  COUNT    COUNT  COUNT     COUNT  COUNT  COUNT  COUNT  COUNT     COUNT  COUNT      COUNT  COUNT  COUNT  MODE     COUNT
---------------------------------------------------------------------------------------------------------------------------------------
entry  36129  0        0      0         0      0      0      0      0         0      0          0      0      0      domain   36129
[ok][2023-05-03 16:35:07]
admin@vsbc1>

Pod-level view of current statistics:

admin@vsbc1> show table service SC podName npbasedtones-sc-86647c7d94-djg2s global callCountCurrentStatistics

                              ENHANCED  AMRNB  AMRWB  EVRC   NICE   MRF       SIP                      EVS    SILK            SLB
       CALL   ENCRYPT  SRTP   VIDEO     LEG    LEG    LEG    REC    SESSIONS  REC    TRANSCODE  PDCS   LEG    LEG    LICENSE  SESSIONS
NAME   COUNT  COUNT    COUNT  COUNT     COUNT  COUNT  COUNT  COUNT  COUNT     COUNT  COUNT      COUNT  COUNT  COUNT  MODE     COUNT
---------------------------------------------------------------------------------------------------------------------------------------
entry  12173  0        0      0         0      0      0      0      0         0      0          0      0      0      domain   12173
[ok][2023-05-03 16:36:18]
admin@vsbc1>

Status Commands

At the service level, a key is added to all status commands to prevent data aggregation, because CNF status details are meant to reflect pod-specific data.

admin@vsbc1> show table service SC podName ALL addressContext default zone MR_ZONE_INGRESS trunkGroupQoeStatus
                                                                   INBOUND                                   OUTBOUND
                                                                   RFACTOR    INBOUND                        RFACTOR    OUTBOUND
                                                          INBOUND  NUM        RFACTOR              OUTBOUND  NUM        RFACTOR
                                                          RFACTOR  CRITICAL   NUM MAJOR            RFACTOR   CRITICAL   NUM MAJOR
                                                 INBOUND  FROM     THRESHOLD  THRESHOLD  OUTBOUND  FROM      THRESHOLD  THRESHOLD  CURR
POD NAME                          NAME           RFACTOR  SBXBOOT  BREACHED   BREACHED   RFACTOR   SBXBOOT   BREACHED   BREACHED   ASR
---------------------------------------------------------------------------------------------------------------------------------------
npbasedtones-sc-86647c7d94-djg2s  MR_TG_INGRESS  94       94       0          0          94        94        0          0          90
npbasedtones-sc-86647c7d94-plxp6  MR_TG_INGRESS  94       94       0          0          94        94        0          0          90
npbasedtones-sc-86647c7d94-xf625  MR_TG_INGRESS  94       94       0          0          94        94        0          0          90
[ok][2023-05-03 17:02:06]
admin@vsbc1>

Action Commands

At the service level, the podName object is added to the action command response to differentiate the responses from multiple pods.

Action commands at the local level (for example, oam, system, addressContext, etc.) are unchanged.

admin@vsbc1> request service SC podName ALL oam eventLog typeAdmin debug rolloverLogNow
response {
    podname npbasedtones-sc-86647c7d94-plxp6
    result success
    reason
}
response {
    podname npbasedtones-sc-86647c7d94-xf625
    result success
    reason
}
response {
    podname npbasedtones-sc-86647c7d94-djg2s
    result success
    reason
}
[ok][2023-05-03 17:37:52]
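
Because each pod returns its own response block, a script that issues an action command may want to confirm that every pod succeeded. The Python sketch below parses response blocks of the shape shown above and lists the pods whose result is not success. The helper names are hypothetical, and the failure entry in the sample (including its result and reason values) is invented for illustration.

```python
def parse_action_responses(text):
    """Parse 'response { ... }' blocks into a list of dicts."""
    responses, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "response {":
            current = {}
        elif line == "}" and current is not None:
            responses.append(current)
            current = None
        elif current is not None and line:
            # A key with no value (such as an empty reason) maps to "".
            key, _, value = line.partition(" ")
            current[key] = value
    return responses


def failed_pods(responses):
    """Return the pod names whose result is anything other than 'success'."""
    return [r.get("podname") for r in responses if r.get("result") != "success"]


# The second block is a made-up failure, added only to exercise the check.
output = """\
response {
    podname npbasedtones-sc-86647c7d94-plxp6
    result success
    reason
}
response {
    podname npbasedtones-sc-86647c7d94-xf625
    result failure
    reason example-reason
}
"""
print(failed_pods(parse_action_responses(output)))
```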

All CLI commands for unsupported features are hidden in a CNF deployment.

CNF Traps/Alarms

The CNF traps from all pods are consolidated on the OAM Pod and streamed to RAMP's Observability backend. The traps include Kubernetes-specific details: the varbinds nodeName, podName, and containerName are added to identify the originator of the trap from the CLI/RAMP. In addition, the keyword 'Cnf' is appended to the CNe trap names. A new set of MIBs is introduced for CNe traps.

Use the following CLI command to view the alarms in the OAM Pod. In this example, the pod name (lplcnf1344-sc-696bb7958f-gflgq) is used as the key, so only the alarms raised from this pod are shown.

Example

admin@vsbc1> show status AlarmsCnf currentStatus lplcnf1344-sc-696bb7958f-gflgq
 
currentStatus lplcnf1344-sc-696bb7958f-gflgq 4379 {
clearType AUTOMATIC;
timestamp 2022-01-31T10:41:01-00:00;
initialTimestamp 2022-01-31T10:01:01-00:00;
localTimestamp 2022-01-31T05:41:01;
localInitialTimestamp 2022-01-31T05:01:01;
count 9;
desc "Debug Event Log filter level is set to INFO. Set to MAJOR if finished troubleshooting";
reporter EVLOG;
severity Major;
acknowledgeState unAcknowledge;
comment "";
}
