This document captures the prerequisites for the Ribbon CNF Products (SBC, PSX, RAMP), including hardware, software, and related add-ons.


Software Requirements

| Item | Version/Type | Additional Information |
|------|--------------|-------------------------|
| Argo Rollout | 1.1 or above | Only if Canary upgrade is performed using Argo Rollout |
| FluxCD | v2 (release v0.31.4) | Only if GitOps-based installation is performed |
| Helm | 3 or above | |
| Kubernetes (K8s) | 1.23 or above | |
| Linux Kernel | 4.18 or 5.14 | The Linux kernel installed on the worker nodes configured as part of the Kubernetes cluster |
| OpenShift Container Platform | 4.8 or above | |
Note

The explicit versions captured here are the ones qualified as part of our CNF solution testing. Generally, the Ribbon CNF solution should also work with later releases.

Hardware Requirements

NIC

One of the following:

  • Intel® Ethernet Network Adapter X710
  • Intel® Ethernet Network Adapter X550
  • Intel® Ethernet Network Adapter E810
  • Mellanox ConnectX-5 or ConnectX-6

Processor

If an AMD processor is used, AMD-specific tuning must be performed. The parameters and recommended settings are as follows:

  • AMD Core Performance Boost: Enable
    Enables the processor to transition to a higher frequency than its rated speed if it has available power and is within its temperature specifications.
  • AMD Fmax Boost Limit Control: Auto
    Sets the maximum processor boost frequency. Auto will allow the processor to run at the highest possible boost frequencies.
  • AMD I/O Virtualization Technology: Enable
    Enables capabilities provided by AMD I/O Virtualization (IOMMU) functionality.
  • AMD SMT Option: Enable
    Enables Multi-Threading. When enabled, each physical processor core operates as two logical processor cores.
  • Page Table Entry Speculative Lock Scheduling: Enable
    Disabling this feature impacts performance.
  • Processor x2APIC Support: Auto
    This parameter enables operating systems to run more efficiently on high core count configurations. It also optimizes interrupt distribution in virtualized environments. Setting this option to Auto configures the OS to enable this feature when the logical core count is equal to or greater than 255 and disables it if it is less than 255. 
  • SR-IOV: Enable
  • Power Regulator: Static High-Performance Mode
  • HW Prefetcher: Disable
  • Minimum Processor Idle Power Core C-State: No C-States
  • Data Fabric C-State Enable: Disable
  • NUMA Memory Domains per Socket: One memory domain per socket
    This is equivalent to disabling Sub-NUMA Clustering.

For Intel Processors, BIOS Settings are as follows:

  • CPU Power Management Power Regulator: Maximum Performance or Static High Performance
  • Intel Hyper-Threading: Enabled
  • Intel Turbo Boost: Enabled
  • Intel VT-x (Virtualization Technology): Enabled
  • Thermal Configuration: Optimal Cooling or Maximum Cooling
  • Minimum Processor Idle Power Core C-State: No C-states
  • Minimum Processor Idle Power Package C-State: No C-states
  • Energy Performance BIAS: Max Performance
  • Sub-NUMA Clustering: Disabled
  • HW Prefetcher: Disabled
  • SRIOV: Enabled
  • Intel® VT-d: Enabled
Storage

Storage Classes:

  • block
  • file

Network Interface Requirements

SBC

| Interface | Network Types | Minimum Bandwidth | Additional Information |
|-----------|---------------|-------------------|-------------------------|
| mgt0 | macvlan, ovs, sriov | 1 Gbps | Management communication |
| ha0 | macvlan, ovs with whereabouts | 10 Gbps | Inter-pod communication |
| pkt0 & pkt1 | sriov | 10 Gbps | Signaling and media packets |

PSX

| Interface | Network Types | Minimum Bandwidth | Additional Information |
|-----------|---------------|-------------------|-------------------------|
| mgt0 | macvlan with whereabouts | 1 Gbps | Management communication with RAMP |
| eth1 | macvlan with whereabouts | 10 Gbps | D+ traffic, DBaaS communication |

RAMP

| Interface | Network Types | Minimum Bandwidth | Additional Information |
|-----------|---------------|-------------------|-------------------------|
| eth1 | macvlan | 1 Gbps | Northbound and southbound interface |

SBC Specific Requirements

Cluster and Node Level Settings

Each configuration item below lists the requirement/use case and how to check whether it is configured/set/enabled.
Hugepages

Real-time processing of media (RTP) packets requires faster memory access.

A huge page size of 1 Gi is required.


$kubectl describe node <node name>

$cat /proc/cmdline
default_hugepagesz=1G hugepagesz=1G hugepages=128 +

#hugeadm --pool-list

$cat /sys/fs/cgroup/hugetlb/kubepods.slice/hugetlb.1GB.limit_in_bytes
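
A minimal pod-spec sketch showing how a workload consumes the 1 Gi hugepages (pod name, image, and sizes are illustrative, not Ribbon defaults):

apiVersion: v1
kind: Pod
metadata:
  name: hugepages-demo                # hypothetical name
spec:
  containers:
  - name: app
    image: example.com/app:latest     # placeholder image
    resources:
      requests:
        hugepages-1Gi: "2Gi"          # served from the 1G pool configured above
        memory: "2Gi"
        cpu: "2"
      limits:
        hugepages-1Gi: "2Gi"          # hugepage requests must equal limits
        memory: "2Gi"
        cpu: "2"
    volumeMounts:
    - name: hugepage
      mountPath: /hugepages
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages               # backed by the pre-allocated hugepage pool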
CPU Manager

Real-time processing of signaling and media (RTP) packets with low latency and better performance requires dedicated, isolated CPU cores with fewer context switches.

The CPU Manager Policy should be set to 'static':

cpuManagerPolicy: static

$cat /var/lib/kubelet/cpu_manager_state

{"policyName":"static","defaultCpuSet":"0-3","checksum":611748604}

SR-IOV

Required for high throughput of signaling and media packets, which need dedicated bandwidth for media.

Driver Requirements

  • vfio_pci driver:
    Intel® Ethernet Network Adapter X710
    Intel® Ethernet Network Adapter X550
    Intel® Ethernet Network Adapter E810
  • mlx5_core driver (default driver):
    Mellanox ConnectX-5 and ConnectX-6
    Note: No extra drivers are required for these adapters.
  1. Verify whether the Linux kernel running on the nodes supports SR-IOV.
    # grep -i sriov /boot/config-$(uname -r)

  2. Check the interfaces for SR-IOV support.
    $lspci | grep Eth
    $lspci -s <specific ethernet controller> -vnnn

  3. Check whether the SR-IOV CNI/operator/plugins are installed and running in the cluster.
    $kubectl get pods -A | grep sriov

  4. Check at the node level support.
    $kubectl describe node <node name>

  5. Check at the cluster level (for all nodes).
    $kubectl get nodes -o json | jq '.items[].status.allocatable'
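
On OCP, VFs are typically provisioned through the SR-IOV Network Operator. A minimal SriovNetworkNodePolicy sketch, assuming that operator is installed (policy name, resource name, VF count, and PF name are placeholders for your environment):

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: pkt-vf-policy                         # hypothetical policy name
  namespace: openshift-sriov-network-operator
spec:
  resourceName: pktvfs                        # hypothetical resource name
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 8                                   # illustrative VF count
  nicSelector:
    pfNames: ["ens1f0"]                       # placeholder physical function
  deviceType: vfio-pci                        # per the driver requirements above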

Multus CNI

In Kubernetes, each pod has only one network interface (apart from a loopback).
Voice traffic handling requires dedicated network interfaces to handle signaling and media traffic.
The Multus CNI plugin enables attaching multiple network interfaces to pods. Multus acts as a meta-plugin (a CNI plugin that can call multiple other CNI plugins).

$kubectl get pods --all-namespaces | grep -i multus

Check whether the Multus conf file exists:
/etc/cni/net.d/00-multus.conf 

Check the Multus binary present under:
/opt/cni/bin
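
A minimal NetworkAttachmentDefinition sketch for a macvlan attachment using whereabouts IPAM (attachment name, master interface, and IP range are placeholders):

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: ha0-net            # hypothetical attachment name
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens192",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.10.0/24"
      }
    }

A pod then references the attachment through the k8s.v1.cni.cncf.io/networks annotation.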

NUMA

If there are multiple NUMA nodes, all resources for a given pod (CPU, memory, and NICs) should come from the same NUMA node.

In the Performance profile:

numa:
  topologyPolicy: single-numa-node

#oc get performanceprofiles.performance.openshift.io

#oc describe performanceprofiles.performance.openshift.io <profile name>
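
For reference, a PerformanceProfile sketch carrying the single-numa-node policy (profile name and CPU sets are illustrative):

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: manual                      # example profile name
spec:
  cpu:
    isolated: "4-31"                # placeholder isolated cores
    reserved: "0-3"                 # placeholder host-reserved cores
  numa:
    topologyPolicy: single-numa-node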

Real-Time Scheduling

To allow assigning real-time scheduling (SCHED_FIFO) to SWe_NP threads inside the pod, the following host setting is required:

echo -1 > /proc/sys/kernel/sched_rt_runtime_us

This setting removes the limit on the CPU bandwidth available to real-time threads.
The default value of /proc/sys/kernel/sched_rt_runtime_us is 950000.

Note

This setting is not persistent across reboots. The above bash command must be included in one of the host initialization scripts; a persistent alternative is sketched below.

cat /proc/sys/kernel/sched_rt_runtime_us    
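
One way to make the setting persistent, assuming the host applies /etc/sysctl.d drop-ins at boot (the file name is illustrative):

# /etc/sysctl.d/99-rt-sched.conf -- hypothetical drop-in file
kernel.sched_rt_runtime_us = -1

Apply it immediately with sysctl --system.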
Kernel Same-Page Merging

Kernel Same-page Merging (KSM) is a technology that finds common memory pages inside a Linux system and merges them to save memory. If one of the copies is updated, a new copy is created, so the function is transparent to the processes on the system. For hypervisors, KSM is highly beneficial when multiple guests run the same level of operating system. However, the scanning process introduces overhead that may cause applications to run slower, which is not desirable.

To turn off KSM:

#systemctl disable ksm

#systemctl disable ksmtuned
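
On hosts that expose the KSM sysfs interface, verify that KSM is off (0 means merging is disabled):

$cat /sys/kernel/mm/ksm/run
0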


Role and Role Binding

Privileges are required to edit/patch/view resources such as endpoints/services (epu, for updating the ha0 IP), deployments (hpa, to scale the deployment count), pods (to fetch the ha0 IP from annotations), and so forth.

Example
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: {{ .Values.global.namespace }}
  name: {{ .Release.Name }}-calculator-role

rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get"]
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["roles"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get", "list", "patch"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list"]
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["get", "list"]
- apiGroups: ["apps"]
  resources: ["deployments/scale", "statefulsets/scale"]
  verbs: ["get", "patch", "update"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "patch"]
- apiGroups: ["metrics.k8s.io"]
  resources: ["pods"]
  verbs: ["get", "list"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ .Release.Name }}-calculator-role-binding
  namespace: {{ .Values.global.namespace }}
subjects:
- kind: ServiceAccount
  name: {{ .Values.global.serviceAccount.name }}
  namespace: {{ .Values.global.namespace }}
roleRef:
  kind: Role
  name: {{ .Release.Name }}-calculator-role
  apiGroup: rbac.authorization.k8s.io


PVC

The SBC CNF requires creating PVCs in both RWX (ReadWriteMany) and RWO (ReadWriteOnce) modes.

The cluster must support creating a minimum of 15 PVCs (storage size depends on the type, between 100 MB and 20 GB).
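
A minimal PVC sketch in RWX mode (name, storage class, and size are placeholders; the storage class must map to the block/file classes listed earlier):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sbc-shared-pvc              # hypothetical name
spec:
  accessModes:
  - ReadWriteMany                   # RWO volumes use ReadWriteOnce instead
  storageClassName: file            # placeholder storage class
  resources:
    requests:
      storage: 1Gi                  # illustrative size within the 100 MB - 20 GB range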


CPU Frequency

The CPU Frequency setting determines the operating clock speed of the processor and, in turn, the system performance. Red Hat offers a set of built-in tuning profiles and a tool called tuned-adm that helps configure the required tuning profile.

Applying the "throughput-performance" profile is recommended, allowing the processor to operate at maximum frequency.


Apply the 'throughput-performance" tuning profile.

#tuned-adm profile throughput-performance

This configuration is persistent across reboots and takes effect immediately. There is no need to reboot the host after configuring the profile.

Determine the active tuning profile:

#tuned-adm active

Current active profile: throughput-performance

Container Privileges

Some SBC CNF Pods/Containers need root privileges (e.g., SC container).  

All of the containers run in privileged mode.


automountServiceAccountToken

Ribbon SBC CNF containers must access Kubernetes API resources from within the container application to support Horizontal Pod Autoscaling and inter-pod communication using the eth1 interface. This is needed for most of the pods.

This requires "automountServiceAccountToken" to be enabled.
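
A minimal sketch of the field on the ServiceAccount (the name and namespace are placeholders):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: sbc-serviceaccount            # hypothetical name
  namespace: sbc                      # placeholder namespace
automountServiceAccountToken: true    # mount the API token into pods using this SA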


Argo Operator

If a progressive (canary) upgrade using Argo is required, the Argo Operator must be instantiated.


Coredump Handler Requirements

This set of requirements is not mandatory. It is needed only if the customer deploys the Ribbon Coredump Handler tool, which collects core dump files from crashed containers. A pod-spec sketch of these permissions follows the list below.

  • Read/Write "Bind Mount" permissions are required for the host path where core dumps are stored.
  • "SysCtl" is required to set the core pattern on the host.

Shielding CPU Cores from Interrupts

Isolating interrupts (IRQs) from real-time workloads like SC and SLB onto different dedicated CPUs (host-reserved cores) can minimize or eliminate latency in real-time environments.

Approach 1

  • On the OpenShift (OCP) Kubernetes platform:

Set "globallyDisableIrqLoadBalancing" in the performance profile to "true" to shield the isolated cores from IRQs. 

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: manual
spec:
  globallyDisableIrqLoadBalancing: true 
  • On non-OCP K8s platforms:

On the worker nodes, the /etc/sysconfig/irqbalance file must be updated so that either the IRQBALANCE_BANNED_CPULIST or the IRQBALANCE_BANNED_CPUS parameter (depending on the version of the irqbalance service) lists the CPUs to be banned from IRQ servicing; an example entry is shown after the list below.

  1. All CPUs that are part of the isolcpus configuration in the grub line must be isolated from IRQ servicing through the above configuration, because it is not known in advance which set of CPUs the container workload will use.
  2. After updating the configuration, restart the irqbalance service using systemctl restart irqbalance.service.
  3. Take extra care while allocating host-reserved cores: allocate a sufficient number of CPUs for the host processes, because all IRQs will land only on the host-reserved cores, increasing their CPU utilization.
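
An example banned-CPU entry, assuming an irqbalance version that supports IRQBALANCE_BANNED_CPULIST (the CPU list is a placeholder and must match the isolcpus set):

# /etc/sysconfig/irqbalance
# Ban the isolated cores from IRQ servicing (placeholder CPU list)
IRQBALANCE_BANNED_CPULIST=4-31

Older irqbalance versions use IRQBALANCE_BANNED_CPUS with a hexadecimal CPU mask instead.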

Approach 2

This approach applies to the OCP K8s environment. For certain workloads, the host-reserved CPUs are not always sufficient for handling device interrupts; for this reason, device interrupts are not globally disabled on the isolated CPUs.

Device interrupts are load-balanced between all isolated and reserved CPUs to avoid overloading CPUs, except for CPUs where a guaranteed pod is running.

Guaranteed pod CPUs are prevented from processing device interrupts when the pod annotation irq-load-balancing.crio.io is defined with the value "disable".

When configured, CRI-O disables device interrupts only when the pod is running.

The corresponding update is visible (after the latency-sensitive workload pod has been scheduled) in the /etc/sysconfig/irqbalance file, which will then contain the container CPUs in the IRQ banned list.
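
A metadata fragment sketch showing the annotation on a latency-sensitive pod (the pod name is hypothetical):

metadata:
  name: sc-example-pod
  annotations:
    irq-load-balancing.crio.io: "disable"   # CRI-O keeps IRQs off this pod's CPUs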

From the SBC CNF helm chart perspective, the following setting needs to be configured:

  1. disableIrqBalance: a boolean value.
    To disable interrupt request processing on the vCPUs allocated to the pod for enhanced performance, set this value to true.
    true: Enhances performance by banning IRQs from landing on the cores of the latency-sensitive workload.
    false (default): Allows interrupt request handling on the vCPUs allocated to the latency-sensitive workload.
  2. performanceProfileName: a string value.
    When IRQ handling needs to be disabled, set this parameter to the name of the OCP performance profile configured on the worker nodes hosting the latency-sensitive pods.

    When the first approach is used, leave both parameters at their default values (i.e., false and ""). A sample Helm values snippet is shown below.
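
A hedged sketch of the corresponding Helm values for Approach 2 (the profile name is a placeholder):

disableIrqBalance: true
performanceProfileName: "manual"    # placeholder OCP performance profile name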

Additional worker node configuration for Approach 2

The irqbalance service is restarted every time the /etc/sysconfig/irqbalance file is updated with a container's CPU details as part of container scheduling.

Since the SC pods are dynamically scalable entities driven by the traffic offered to the SBC CNF cluster, SC pods are frequently created and destroyed, resulting in frequent restarts of the irqbalance service on the worker node.

By default, the system allows five restarts (StartLimitBurst) in 10 seconds (StartLimitIntervalSec), which is not sufficient during certain scaling events, especially the initial scale-out of the SC deployment to the minimum number of active SC pods immediately after Helm installation.

Therefore, the irqbalance service configuration file /usr/lib/systemd/system/irqbalance.service should be updated with StartLimitBurst set to 60 to accommodate the maximum number of irqbalance service restarts upon SC pod instantiation.

A sample configuration would be as follows:

[Unit]
Description=irqbalance daemon
ConditionVirtualization=!container

[Service]
EnvironmentFile=/etc/sysconfig/irqbalance
ExecStart=/usr/sbin/irqbalance --foreground $IRQBALANCE_ARGS
StartLimitBurst=60
# ^ new parameter (default is 5)

[Install]
WantedBy=multi-user.target

After modifying the irqbalance.service unit file, you need to reload systemd and then restart the service for the changes to take effect:

  1. Reload systemd to pick up the changes to the unit files:

    systemctl daemon-reload
  2. Restart or reload the service:

    systemctl restart irqbalance.service

    or

    systemctl reload irqbalance.service

     

For more information, refer to: https://docs.openshift.com/container-platform/4.15/scalability_and_performance/cnf-low-latency-tuning.html


Linux Capabilities

Some SBC CNF Pods require Linux capabilities for specific functions. 

Capability/securityContext (a securityContext sketch follows the list):
  • NET_ADMIN
  • SYS_RAWIO
  • SYS_RESOURCE
  • FOWNER
  • IPC_LOCK
  • IPC_OWNER
  • KILL
  • LEASE
  • MKNOD
  • NET_BIND_SERVICE
  • NET_RAW
  • SYS_BOOT
  • SYS_MODULE
  • DAC_OVERRIDE
  • DAC_READ_SEARCH
  • SETFCAP
  • SETPCAP
  • SETUID
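
A securityContext sketch granting such capabilities to a container (the list here is abbreviated; use the full list above):

securityContext:
  capabilities:
    add:
    - NET_ADMIN
    - SYS_RESOURCE
    - IPC_LOCK
    # ...remaining capabilities from the list above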

Kernel Parameters

See SBC Kernel Parameters for the complete list of SBC Kernel Parameters.

Centralized Policy Server (PSX) Specific Requirements

Cluster and Node Level Settings

Each configuration item below lists the requirement/use case and how to check whether it is configured/set/enabled.
Hugepages

A huge page size of 1 Gi is required for faster memory access.

$kubectl describe node <node name>

$cat /proc/cmdline
default_hugepagesz=1G hugepagesz=1G hugepages=128 +

#hugeadm --pool-list

$cat /sys/fs/cgroup/hugetlb/kubepods.slice/hugetlb.1GB.limit_in_bytes

Multus CNI

PSX instances use non-eth0 interfaces to communicate among themselves for DB sync/replication and with RAMP for registration. Multus enables this.

$kubectl get pods --all-namespaces | grep -i multus

Check whether the Multus conf file exists:
/etc/cni/net.d/00-multus.conf 

Check the Multus binary present under:
/opt/cni/bin


Role and Role Binding

Privileges are required to edit/patch/view resources such as endpoints/services (epu, for updating the ha0 IP), deployments (hpa, to scale the deployment count), pods (to fetch the ha0 IP from annotations), and so forth.

How to create a Role/RoleBinding:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: {{ .Values.global.namespace }}
  name: {{ .Release.Name }}-calculator-role

rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get"]
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["roles"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get", "list", "patch"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list"]
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["get", "list"]
- apiGroups: ["apps"]
  resources: ["deployments/scale", "statefulsets/scale"]
  verbs: ["get", "patch", "update"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "patch"]
- apiGroups: ["metrics.k8s.io"]
  resources: ["pods"]
  verbs: ["get", "list"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ .Release.Name }}-calculator-role-binding
  namespace: {{ .Values.global.namespace }}
subjects:
- kind: ServiceAccount
  name: {{ .Values.global.serviceAccount.name }}
  namespace: {{ .Values.global.namespace }}
roleRef:
  kind: Role
  name: {{ .Release.Name }}-calculator-role
  apiGroup: rbac.authorization.k8s.io



PVC

The PSX CNF requires the ability to create PVCs in both RWX (ReadWriteMany) and RWO (ReadWriteOnce) modes.



Container Privileges

Some PSX CNF Pods/Containers need root privileges (the primary and replica Pods).

Linux Capabilities

Some of the PSX Pods require Linux capabilities for specific functions. The following is the complete list of Capabilities required:

Capability/securityContext:
  • NET_ADMIN
  • SYS_RAWIO
  • SYS_RESOURCE
  • FOWNER
  • IPC_LOCK
  • IPC_OWNER
  • KILL
  • LEASE
  • MKNOD
  • NET_BIND_SERVICE
  • NET_RAW
  • SYS_BOOT
  • SYS_MODULE
  • DAC_OVERRIDE
  • DAC_READ_SEARCH
  • SETFCAP
  • SETPCAP
  • SETUID

allowPrivilegeEscalation: true
privileged: true


Kernel Parameters

There are no specific kernel parameters that need to be tuned for PSX.

Ribbon Application Management Platform (RAMP) Specific Requirements

Cluster and Node Level Settings

Each configuration item below lists the requirement/use case and how to check whether it is configured/set/enabled.
Multus CNI

RAMP uses non-eth0 interfaces for its northbound and southbound communication. Multus enables this.

Multus CNI plugin enables attaching multiple network interfaces to Pods. Multus acts as a meta-plugin (a CNI plugin that can call multiple other CNI plugins).

$kubectl get pods --all-namespaces | grep -i multus

Check whether the Multus conf file exists:
/etc/cni/net.d/00-multus.conf

Check that the Multus binary is present under:
/opt/cni/bin

Role and Role Binding

Privileges are required to edit/patch/view resources such as endpoints/services (epu, for updating the ha0 IP), deployments (hpa, to scale the deployment count), pods (to fetch the ha0 IP from annotations), and so forth.

Example
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: {{ .Values.global.namespace }}
  name: {{ .Release.Name }}-calculator-role

rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get"]
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["roles"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get", "list", "patch"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list"]
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["get", "list"]
- apiGroups: ["apps"]
  resources: ["deployments/scale", "statefulsets/scale"]
  verbs: ["get", "patch", "update"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "patch"]
- apiGroups: ["metrics.k8s.io"]
  resources: ["pods"]
  verbs: ["get", "list"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ .Release.Name }}-calculator-role-binding
  namespace: {{ .Values.global.namespace }}
subjects:
- kind: ServiceAccount
  name: {{ .Values.global.serviceAccount.name }}
  namespace: {{ .Values.global.namespace }}
roleRef:
  kind: Role
  name: {{ .Release.Name }}-calculator-role
  apiGroup: rbac.authorization.k8s.io

PVC

The RAMP CNF requires the ability to create PVCs in both RWX (ReadWriteMany) and RWO (ReadWriteOnce) modes.

Linux Capabilities

Some of the RAMP Pods require Linux capabilities for specific functions. The following is the complete list of required capabilities:

Capability/securityContext:

  • NET_ADMIN
  • SYS_RAWIO
  • SYS_RESOURCE
  • FOWNER
  • IPC_LOCK
  • KILL
  • NET_RAW
  • AUDIT_WRITE
  • SYS_CHROOT

Kernel Parameters

| Parameter | Value |
|-----------|-------|
| sysctl kernel.pid_max | >= 4096 |
| fs.inotify.max_user_instances | 8192 |
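
A sketch of setting these persistently through a sysctl drop-in, assuming a systemd-based host (the file name is illustrative):

# /etc/sysctl.d/99-ramp.conf -- hypothetical drop-in file
# kernel.pid_max below is the required minimum; larger values are acceptable
kernel.pid_max = 4096
fs.inotify.max_user_instances = 8192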