The following sections contain VM performance tuning recommendations. These recommendations are general guidelines and are not exhaustive. Refer to the documentation provided by your Linux OS and KVM host vendors. For example, Red Hat provides extensive documentation on using virt-manager and optimizing VM performance. Refer to the Red Hat Virtualization Tuning and Optimization Guide for details.
To perform the tuning procedures for a VM instance, log on to the host system as the root user.
General Recommendations
Recommended BIOS Settings
For GPU transcoding, ensure all power supplies are plugged into the server.
CPU Frequency Setting on the Host
The CPU frequency setting determines the operating clock speed of the processor and, in turn, the system performance. Red Hat offers a set of built-in tuning profiles and a tool called tuned-adm that helps configure the required tuning profile.
Ribbon recommends applying the throughput-performance tuning profile, which makes the processor operate at maximum frequency.
- Find out the active tuning profile
# tuned-adm active
Current active profile: powersave
- Apply throughput-performance tuning profile
# tuned-adm profile throughput-performance
This configuration is persistent across reboots and takes effect immediately. There is no need to reboot the host after configuring this tuning profile.
Processor and CPU Details
To determine the host system's processor and CPU details, perform the following steps:
Execute the following command to list the host's logical CPUs and the cores they map to:
lscpu -p
In the command output, the first column lists the logical CPU number of a CPU as used by the Linux kernel. The second column lists the logical core number. Use this information for vCPU pinning.
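As a rough illustration of how the two columns can be used, the sketch below groups logical CPUs by core so that sibling threads are easy to spot. The helper name siblings_by_core and the sample input are assumptions made for illustration; on a real host, pipe the output of lscpu -p into the function instead.

```shell
#!/usr/bin/env bash
# Group logical CPUs by core from `lscpu -p` style input (CPU,Core,... per line).
# siblings_by_core is a hypothetical helper name, not part of any tool.
siblings_by_core() {
    awk -F, '
        /^#/ { next }                       # skip the comment header lines
        NF >= 2 {
            if ($2 in cpus) { cpus[$2] = cpus[$2] " " $1 }
            else            { cpus[$2] = $1 }
        }
        END { for (c in cpus) print c ": " cpus[c] }
    ' | sort -n
}

# Illustrative excerpt only. Real usage: lscpu -p | siblings_by_core
siblings_by_core <<'EOF'
# CPU,Core,Socket,Node
0,0,0,0
1,1,1,1
36,0,0,0
37,1,1,1
EOF
```

Each output line shows a core number followed by the logical CPUs that share it, which are the sibling threads to keep together when pinning.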
Persistent CPU Pinning
CPU pinning ensures that a VM only gets CPU time from a specific CPU or set of CPUs. Pinning is performed on each logical CPU of the guest VM against each core ID in the host system. The CPU pinning information is lost every time the VM instance is shut down or restarted. To avoid entering the pinning information again, update the KVM configuration XML file on the host system.
- Ensure that no two VM instances are allocated the same physical cores on the host system.
- Ensure that all the VMs hosted on the physical server are pinned.
- To create vCPU to hyper-thread pinning, pin consecutive vCPUs to sibling threads (logical cores) of the same physical core. Identify the sibling threads from the output returned by the lscpu command on the host.
- Do not include the 0th physical core of the host in pinning, because most host management and kernel threads are spawned on the 0th core by default.
Use the following steps to update the pinning information in the KVM configuration XML file:
- Shut down the VM instance.
Enter the following command.
virsh
The virsh prompt displays.
Enter the following command to edit the VM instance:
virsh # edit <KVM_instance_name>
Search for the vcpu placement attribute and enter the CPU pinning information.
Tip: Ensure that no two VM instances have the same physical core affinity. For example, if VM1 is assigned an affinity of 0,1,2,3, ensure that no other VM is pinned to 0, 1, 2, 3, 8, 9, 10, or 11, because these CPUs belong to the physical cores assigned to VM1. Also assign an affinity to all other VM instances running on the same host; otherwise, the VMs without affinity may impact the performance of the VMs that have it.
Enter the following command to save and exit the XML file.
:wq
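In libvirt domain XML, pinning entries are expressed as vcpupin elements inside a cputune element. The sketch below prints such an element for a list of host CPUs. The helper name gen_cputune and the CPU numbers are assumptions for illustration, not values taken from this guide; substitute the sibling threads identified on your own host.

```shell
#!/usr/bin/env bash
# Print a libvirt <cputune> element that pins vCPU 0, 1, 2, ... to the host
# CPUs given as arguments, in order. gen_cputune is a hypothetical helper name.
gen_cputune() {
    local vcpu=0 host_cpu
    echo "<cputune>"
    for host_cpu in "$@"; do
        printf "  <vcpupin vcpu='%s' cpuset='%s'/>\n" "$vcpu" "$host_cpu"
        vcpu=$(( vcpu + 1 ))
    done
    echo "</cputune>"
}

# Example: four vCPUs on host CPUs 2, 38, 3, 39, assuming 2/38 and 3/39 are
# sibling threads on this hypothetical host (core 0 is left to the host).
gen_cputune 2 38 3 39
```

Passing the host CPUs in sibling order keeps consecutive vCPUs on the threads of one physical core, as recommended above.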
CPU Mode Configuration
Set the VM CPU mode to host-model using a virsh command in the host system. Use the following steps to edit the VM CPU mode:
- Shut down the VM instance.
Enter the following command.
virsh
The virsh prompt displays.
Enter the following command to edit the VM instance:
edit <KVM_instance_name>
Search for the cpu mode attribute and edit it to set the mode to host-model.
Tip: Ensure the topology details entered are identical to the topology details set while creating the VM instance. For example, if the topology was set to 1 socket, 2 cores, and 2 threads, enter the same details in this XML file.
Enter the following command to save and exit the XML file.
:wq
Enter the following command to start the VM instance.
start <KVM_instance_name>
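For orientation, the sketch below prints a cpu element in libvirt domain XML syntax with the host-model mode and a topology child element. The helper name gen_cpu_mode is an assumption, and the element in your own domain XML may carry additional attributes or children that should be preserved.

```shell
#!/usr/bin/env bash
# Print a libvirt <cpu> element using host-model and the given topology.
# Arguments: sockets, cores, threads. gen_cpu_mode is a hypothetical helper.
gen_cpu_mode() {
    printf "<cpu mode='host-model'>\n"
    printf "  <topology sockets='%s' cores='%s' threads='%s'/>\n" "$1" "$2" "$3"
    printf "</cpu>\n"
}

# Example matching the topology mentioned in the tip: 1 socket, 2 cores, 2 threads.
gen_cpu_mode 1 2 2
```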
Increasing the Transmit Queue Length for virt-io Interfaces
This section is applicable only for virt-io based interfaces.
By default, the transmit queue length is set to 500. To increase the transmit queue length to 4096, use the following procedure:
Execute the following command to identify the available interfaces:
virsh
The virsh prompt displays.
Execute the following command.
domiflist <VM_instance_name>
The list of active interfaces displays.
Execute the following command to increase the transmit queue lengths for the tap interfaces.
ifconfig <interface_name> txqueuelen <length>
The interface_name is the name of the interface you want to change, and length is the new queue length. For example, ifconfig macvtap4 txqueuelen 4096.
Execute the following command to verify the transmit queue length of the interface.
ifconfig <interface_name>
The command output includes the txqueuelen value for the interface.
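When a VM has several interfaces, the per-interface command can be generated from the domiflist listing. The sketch below is a dry run that only prints the commands. It assumes the listing has two header lines followed by Interface, Type, Source, Model, and MAC columns; the helper names and the sample rows are hypothetical.

```shell
#!/usr/bin/env bash
# List interfaces whose Model column is "virtio" in `virsh domiflist` style
# input (two header lines, then Interface Type Source Model MAC).
virtio_ifaces() {
    awk 'NR > 2 && $4 == "virtio" { print $1 }'
}

# Print (not run) the command that raises the queue length for each interface.
txqueuelen_cmds() {
    local len="$1" ifname
    virtio_ifaces | while read -r ifname; do
        echo "ifconfig $ifname txqueuelen $len"
    done
}

# Hypothetical sample. Real usage:
#   virsh domiflist <VM_instance_name> | txqueuelen_cmds 4096
txqueuelen_cmds 4096 <<'EOF'
 Interface   Type     Source   Model    MAC
---------------------------------------------------------
 macvtap4    direct   em1      virtio   52:54:00:00:00:01
 macvtap5    direct   em2      e1000    52:54:00:00:00:02
EOF
```

Review the printed commands before running them as root on the host.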
Kernel Samepage Merging (KSM) Settings
Kernel samepage merging (KSM) is a technology that finds common memory pages inside a Linux system and merges the pages to save memory resources. If one of the copies is updated, a new copy is created, so the function is transparent to the processes on the system. For hypervisors, KSM is highly beneficial when multiple guests run the same level of the operating system. However, the scanning process adds overhead that may cause applications to run slower, which is not desirable.
Turn off KSM in the host.
Deactivate KSM by stopping the ksmtuned and the ksm services as shown below. This does not persist across reboots.
# systemctl stop ksm
# systemctl stop ksmtuned
Disable KSM persistently as shown below:
# systemctl disable ksm
# systemctl disable ksmtuned
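To confirm the result, the kernel exposes the KSM run state in /sys/kernel/mm/ksm/run. The sketch below translates that value into a word. The helper name ksm_state is an assumption, and the optional path argument exists only so the function can be tried against a test file.

```shell
#!/usr/bin/env bash
# Describe the KSM run state read from /sys/kernel/mm/ksm/run:
#   0 = ksmd stopped (pages already merged stay merged)
#   1 = ksmd running
#   2 = ksmd stopped and all merged pages unmerged
# ksm_state is a hypothetical helper name.
ksm_state() {
    local run_file="${1:-/sys/kernel/mm/ksm/run}"
    if [ ! -r "$run_file" ]; then
        echo "unknown"
        return 1
    fi
    case "$(cat "$run_file")" in
        0) echo "stopped" ;;
        1) echo "running" ;;
        2) echo "stopped-unmerged" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Report the state on this host; prints "unknown" if the file is not available.
ksm_state || true
```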
Host Pinning
To avoid performance impact on VMs due to host-level Linux services, host pinning isolates physical cores where a guest VM is hosted from physical cores where the Linux host processes/services run.
In this example, physical core 0 (logical CPUs 0 and 36) and physical core 1 (logical CPUs 1 and 37) are reserved for Linux host processes.
The CPUAffinity option in /etc/systemd/system.conf sets the affinity for systemd by default, as well as for everything it launches, unless a unit's .service file overrides the CPUAffinity setting with its own value. Configure the CPUAffinity option in /etc/systemd/system.conf.
Execute the following command:
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Thread(s) per core: 2
Core(s) per socket: 18
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2699.984
BogoMIPS: 4604.99
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71
To dedicate physical cores 0 and 1 to host processing, specify CPUAffinity as 0 1 36 37 in the file /etc/systemd/system.conf, as shown below, and then restart the system.
CPUAffinity=0 1 36 37
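The CPUAffinity value can be derived from the lscpu -p listing by collecting every logical CPU that belongs to the reserved cores. The sketch below does this for a comma-separated list of core numbers. The helper name affinity_for_cores and the sample rows are assumptions for illustration; feed it the real lscpu -p output on your host.

```shell
#!/usr/bin/env bash
# Print the logical CPUs that belong to the given cores, space-separated.
# $1: comma-separated core numbers; stdin: `lscpu -p` style lines (CPU,Core,...).
# affinity_for_cores is a hypothetical helper name.
affinity_for_cores() {
    awk -F, -v cores="$1" '
        BEGIN { n = split(cores, want, ","); for (i = 1; i <= n; i++) keep[want[i]] = 1 }
        /^#/ { next }                  # skip the comment header lines
        ($2 in keep) { print $1 }      # column 2 is the core, column 1 the CPU
    ' | sort -n | paste -sd' ' -
}

# Illustrative excerpt only. Real usage: lscpu -p | affinity_for_cores 0,1
echo "CPUAffinity=$(affinity_for_cores 0,1 <<'EOF'
# CPU,Core,Socket,Node
0,0,0,0
1,1,1,1
2,2,0,0
36,0,0,0
37,1,1,1
38,2,0,0
EOF
)"
```

With the sample rows, cores 0 and 1 map to logical CPUs 0, 1, 36, and 37, which matches the setting shown above.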
Back VMs with 1G Hugepages
The number of hugepages is decided based on the total memory available on the host.
Configure the huge page size as 1G and number of huge pages by appending the following line to the kernel command line options in /etc/default/grub. In the example below, the host has a total of 256G memory, out of which 200G is configured as hugepages.
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8 crashkernel=auto intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=200 rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
Regenerate the GRUB2 configuration as shown below:
If your system uses BIOS firmware, execute the following command:
# grub2-mkconfig -o /boot/grub2/grub.cfg
On a system with UEFI firmware, execute the following command:
# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
Add the following lines to your instance XML file using virsh edit <instanceName>. The example below is for a 32G instance:
<memory unit='KiB'>33554432</memory>
<currentMemory unit='KiB'>33554432</currentMemory>
<memoryBacking>
  <hugepages>
    <page size='1048576' unit='KiB' nodeset='0'/>
  </hugepages>
</memoryBacking>
The previous example pins the VM on NUMA node 0. To host a second VM on NUMA node 1, use nodeset='1'.
Restart the host.
Obtain the PID of the VM from the following command:
ps -eaf | grep qemu | grep -i <vm_name>
Execute the following command to verify VM memory is received from a single NUMA node:
numastat -p <vmpid>
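The KiB figures in the instance XML follow directly from the instance size: 32 GiB is 32 x 1024 x 1024 KiB, and a 1G page is 1048576 KiB. The sketch below shows the conversion and a way to read the configured page count from /proc/meminfo after the reboot. The helper names are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Convert GiB to KiB, as used in the <memory> and <page size=...> values.
# gib_to_kib and hugepages_total are hypothetical helper names.
gib_to_kib() {
    echo $(( $1 * 1024 * 1024 ))
}

# Extract the HugePages_Total count from /proc/meminfo style input on stdin.
hugepages_total() {
    awk '/^HugePages_Total:/ { print $2 }'
}

echo "32 GiB instance = $(gib_to_kib 32) KiB"
echo "1 GiB page      = $(gib_to_kib 1) KiB"
# Real usage on the host after the reboot: hugepages_total < /proc/meminfo
```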
Disable Flow Control
This setting is optional and depends on NIC capability. Not all NICs allow you to modify the flow control parameters. If the NICs support it, use the following steps to disable flow control:
- Log in to the system as the root user.
Execute the following command to disable flow control for interfaces attached to the SWe VM.
ethtool -A <interface name> rx off tx off autoneg off
Tip: Use the <interface name> from the actual configuration.
Example:
ethtool -A p4p3 rx off tx off autoneg off
ethtool -A p4p4 rx off tx off autoneg off
ethtool -A em3 rx off tx off autoneg off
ethtool -A em4 rx off tx off autoneg off
Tuning Interrupt Requests (IRQs)
This section applies only to virt-io-based packet interfaces. Virt-IO networking works by sending interrupts on the host core. SBC VM performance can be impacted if frequent interrupt processing occurs on any core of the VM. To avoid this, the affinity of the IRQs for a virtio-based packet interface should be different from the cores assigned to the SBC VM.
The /proc/interrupts file lists the number of interrupts per CPU, per I/O device. IRQs have an associated "affinity" property, "smp_affinity", that defines which CPU cores are allowed to execute the interrupt service routine (ISR) for that IRQ. Refer to the distribution guidelines of the host OS for the exact steps to locate and specify the IRQ affinity settings for a device.
External Reference: https://access.redhat.com/solutions/2144921
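As a rough sketch of the general approach, the snippet below finds the IRQ numbers whose /proc/interrupts line matches a device name and prints, without running, the commands that would restrict them to a CPU list through /proc/irq/<n>/smp_affinity_list. The helper names, the device name em3, and the sample rows are assumptions; which device name to match and which CPUs to use depend on your host, so follow the host OS guidelines for the actual values.

```shell
#!/usr/bin/env bash
# Print IRQ numbers from /proc/interrupts style input whose line matches $1.
# irqs_matching and irq_affinity_cmds are hypothetical helper names.
irqs_matching() {
    awk -v pat="$1" '$1 ~ /^[0-9]+:$/ && $0 ~ pat { sub(/:$/, "", $1); print $1 }'
}

# Print (not run) the commands that set the affinity of each matching IRQ.
# $1: device name pattern, $2: CPU list such as 0-1
irq_affinity_cmds() {
    local pattern="$1" cpu_list="$2" irq
    irqs_matching "$pattern" | while read -r irq; do
        echo "echo $cpu_list > /proc/irq/$irq/smp_affinity_list"
    done
}

# Hypothetical sample. Real usage: irq_affinity_cmds em3 0-1 < /proc/interrupts
irq_affinity_cmds em3 0-1 <<'EOF'
            CPU0       CPU1
  98:          0       1234   PCI-MSI 1048576-edge   em3-TxRx-0
  99:          7          0   PCI-MSI 1048577-edge   em3-TxRx-1
 120:         10          0   PCI-MSI 512000-edge    ahci
EOF
```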
OVS-DPDK Virtio Interfaces - Performance Tuning Recommendations
Follow the OpenStack-recommended performance settings for host and guest. Refer to VNF Performance Tuning for details.
Make sure that physical network adapters, Poll Mode Driver (PMD) threads, and pinned CPUs for the instance are all on the same NUMA node. This is mandatory for optimal performance.
PMD threads are the threads that do the heavy lifting for userspace switching. They perform tasks such as continuous polling of input ports for packets, classifying packets once received, and executing actions on the packets once they are classified.
- Set the queue size for virtio interfaces to 1024 by updating the Director template.
NovaComputeExtraConfig: - nova::compute::libvirt::tx_queue_size: '"1024"'
NovaComputeExtraConfig: - nova::compute::libvirt::rx_queue_size: '"1024"'
- Configure the following dpdk parameters in host ovs-dpdk:
- Make sure two pairs of Rx/Tx queues are configured for host dpdk interfaces.
To validate, issue the following command during ovs-dpdk bring-up:
ovs-vsctl get Interface dpdk0 options
For background details, see http://docs.openvswitch.org/en/latest/howto/dpdk/
- Enable per-port memory, which means each port uses a separate mem-pool for receiving packets instead of the default shared mem-pool:
ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true
- Configure 4096 MB of huge page memory on each socket:
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=4096,4096
- Make sure to spawn the appropriate number of PMD threads so that each port/queue can be serviced by a particular PMD thread. The PMD threads must be pinned to dedicated cores/hyper-threads that are on the same NUMA node as the network adapter and guest, are isolated from the kernel, and are not used by the guest for any other purpose. Set the pmd-cpu-mask accordingly.
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x40001004000100
The example above sets PMD threads to run on logical cores 8, 26, 36, and 54, which occupy two physical cores (8/36 and 26/54 are sibling hyper-threads).
- Restart ovs-vswitchd after the changes:
systemctl status ovs-vswitchd
systemctl restart ovs-vswitchd
- The port and Rx queue assignment to PMD threads is crucial for optimal performance. Refer to http://docs.openvswitch.org/en/latest/topics/dpdk/pmd/ for more details. The affinity is a comma-separated list of <queue_id>:<core_id> entries, which must be set for each port.
ovs-vsctl set interface dpdk0 other_config:pmd-rxq-affinity="0:8,1:26"
ovs-vsctl set interface vhub89b3d58-4f other_config:pmd-rxq-affinity="0:36"
ovs-vsctl set interface vhu6d3f050e-de other_config:pmd-rxq-affinity="1:54"
In the example above, the PMD thread on core 8 will read queue 0 and PMD thread on core 26 will read queue 1 of dpdk0 interface.
Alternatively, you can use the default assignment of port/Rx queues to PMD threads and enable the auto-load-balance option so that OVS places the threads on cores based on load.
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
ovs-appctl dpif-netdev/pmd-rxq-rebalance
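The pmd-cpu-mask value used earlier in this section is a hexadecimal bitmask with one bit set per logical core. The sketch below computes such a mask from a list of core numbers. The helper name cpu_mask is an assumption, and because it relies on 64-bit shell arithmetic it only handles core numbers below 63.

```shell
#!/usr/bin/env bash
# Build a hexadecimal CPU bitmask with one bit set per listed logical core.
# cpu_mask is a hypothetical helper name; valid for core numbers 0 to 62.
cpu_mask() {
    local mask=0 cpu
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '0x%x\n' "$mask"
}

# The logical cores from the example in this section.
cpu_mask 8 26 36 54   # prints 0x40001004000100
```

The result can then be passed to other_config:pmd-cpu-mask with ovs-vsctl as shown in the procedure.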
Troubleshooting
- To check the port/Rx queue distribution among PMD threads, enter the command:
ovs-appctl dpif-netdev/pmd-rxq-show
- To check the PMD thread stats (actual CPU usage), use the following command and check the "processing cycles" and "idle cycles" values:
ovs-appctl dpif-netdev/pmd-stats-clear && sleep 10 && ovs-appctl dpif-netdev/pmd-stats-show
- To check packet drops on host dpdk interfaces, use the following command and check the rx_dropped/tx_dropped counters:
watch -n 1 'ovs-vsctl get interface dpdk0 statistics|sed -e "s/,/\n/g" -e "s/[\",\{,\}, ]//g" -e "s/=/ =⇒ /g"'
For additional details on troubleshooting performance issues and packet drops in an OVS-DPDK environment, refer to the following page:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/ovs-dpdk_end_to_end_troubleshooting_guide/validating_an_ovs_dpdk_deployment#find_the_ovs_dpdk_port_physical_nic_mapping_configured_by_os_net_config
Benchmarking
Setup details:
- Platform: RHOSP13
- Host OS: RHEL7.5
- Processor: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
- 1 Provider Network configured for Management Interface
- 1 Provider Network configured for HA Interface
- OVS+DPDK enabled for packet interfaces (pkt0 and pkt1)
- 2 pairs of Rx/Tx queues in host dpdk interfaces
- 1 Rx/Tx queue in guest virtio interface
- 4 PMD threads pinned to 4 hyper threads (i.e. using up 2 physical cores)
Guest Details:
- SSBC - 8vcpu/18GB RAM/100GB HDD
- MSBC - 10vcpu/20GB RAM/100 GB HDD
Benchmarking was performed in a D-SBC setup with up to 30k pass-through sessions using the recommendations described in this document.
You may require additional cores for PMD threads for higher session counts.
External References
https://docs.openvswitch.org/en/latest/howto/dpdk/
https://docs.openvswitch.org/en/latest/topics/dpdk/pmd/