This section covers common issues found in the SBC SWe in AWS setup, and the actions necessary to verify and troubleshoot them.


Multiple HFE setups reusing the pkt0 subnet

Important

Please make note of this commonly-experienced issue.

Each HFE+SBC triplet must use a different pkt0 subnet. If you use the same subnet for multiple HFE+SBC triplets, all of the SBCs send packets to the HFE that was created last. This limitation is due to the AWS networking architecture.

Action:

  1. There is no recovery path from here!
    1. Delete all HFE+SBC triplets except the one created last.
  2. You can redirect all non-AWS (Internet-bound) traffic to a particular instance/port by modifying the AWS subnet routing table (see the AWS CLI sketch below). This routing table modification affects the entire subnet, not just one port.
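For reference, a minimal sketch of such a subnet-wide redirect using the AWS CLI; the route-table and network-interface IDs below are placeholders:

  # Redirect all Internet-bound (0.0.0.0/0) traffic for the subnet to one
  # instance port (ENI); substitute your own IDs for the placeholders.
  aws ec2 replace-route \
      --route-table-id rtb-0123456789abcdef0 \
      --destination-cidr-block 0.0.0.0/0 \
      --network-interface-id eni-0123456789abcdef0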
Warning

Do not re-use the pkt0 subnet to create multiple SBC and HFE instances; instead, create small subnets (for example, a /28) for each SBC+HFE triplet you need.
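As a sketch, a dedicated small subnet per triplet can be created as follows; the VPC ID and CIDR block are placeholders, and /28 is the smallest subnet size AWS permits:

  # Create a small, dedicated pkt0 subnet for one SBC+HFE triplet.
  # The VPC ID and CIDR block are placeholders.
  aws ec2 create-subnet \
      --vpc-id vpc-0123456789abcdef0 \
      --cidr-block 10.0.1.0/28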

Error message in AWSSwitchover.log

Common problems are that either the Security Groups configuration doesn't allow outgoing traffic to the AWS API gateway, or the EIP is not attached to eth0 (mgt0) on the SBC and HFE.

Action:

  1. Make sure that the AWS API gateway is accessible via the default routes (see the reachability sketch after this list).
  2. Occasionally, the SBC fails to join an HA pair because it populates an incorrect UserData metaVariable table in confd.
    1. In order to create an HA pair, ensure the secondary IPs are assigned on the pkt0 and pkt1 ports of either the assigned active or standby.
    2. If an instance fails to see the IPs attached on either of these two instances, the SBC fails to join the cluster.
      1. Check the LCA and cloud-init logs:
        /var/log/cloud-init.log
        /var/log/sonus/lca/lca.log
  3. If the SIPs on pkt0 and pkt1 (or the ports themselves) are assigned only after the instance is created, the LCA logs may report the SIPs as missing on both the active and standby.
    1. Make sure all of the required resources are created by the scripts/tools provided by Ribbon.
      (Any local modification may result in a race condition which prevents the SBC from booting up.)
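To confirm step 1 above from the SBC/HFE, a quick reachability check; the endpoint shown is an example for us-east-1, so substitute your region's:

  # Verify the AWS API endpoint resolves and is reachable via the default route.
  nslookup ec2.us-east-1.amazonaws.com
  curl -sv https://ec2.us-east-1.amazonaws.com 2>&1 | head
  # Scan the LCA log (step 2.2.1) for recent errors:
  grep -i error /var/log/sonus/lca/lca.log | tail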

SBC/HFE doesn't work in ABC region

Most of the problems reported are due to an incorrectly configured IAM policy/role, or an incorrect name/permissions assigned to the S3 bucket that stores the HFE.sh script.

Action:

  1. Check that the IAM role and policy are correct for your release.
  2. Make sure that the S3 bucket name given in the HFE user data is correct, and that the IAM role attached to the HFE instance can download a script from the S3 bucket.
Note

 HFE and SBC IAM roles/permissions are not identical.
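A quick way to confirm which IAM role an instance actually carries, and that it can read the bucket; the bucket name below is a placeholder:

  # Show the IAM role attached to this instance via the instance metadata
  # service (IMDSv2 setups additionally require a session token).
  curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
  # Confirm the role can list the bucket named in the HFE user data.
  aws s3 ls s3://<your-hfe-bucket>/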

SBC/HFE doesn't come up when I use templates that worked in a previous release (AMI)

Old HFE.sh scripts and templates may not work as expected with a new build. You must treat the AMI, templates, and HFE.sh script as one logical release (build), and use them together.

Action:

  1. Make sure you use the template and HFE.sh script versions that match each new release (these templates and scripts are bundled in a build's orca/rel directory).

Public IP (EIP) on UAC and/or UAS making calls to public port of SBC

Action:

  1. Make sure that the signaling IP and all negotiated media ports use public IPs.
  2. If any private IPs exist in the signaling configuration, you must correct this, because private IPs do not work with the HFE.

Some packets are not going through HFE, and the rest of the packets follow HFE for public calls

This is due to an incorrect configuration on the SBC: outgoing packets either use private IPs, or are not sent out via pkt0.

The HFE can drop packets only if:

  • a user restarts networking on it manually or changes its default route, or
  • the HFE routes are modified.

Action:

  1. Verify the SBC configuration is correct.
  2. Do not debug the HFE node that is dropping selective packets as the first debug step. The HFE does not process the information: whatever it receives from the SBC (on eth2) goes out via eth0.
  3. Do not modify the default route. The default route serves all peers without copying peer configuration from the SBC to the HFE.

    An HFE networking service restart changes the default route on the HFE, so the HFE stops forwarding packets to public UACs/end-points. DO NOT restart the networking service on the HFE. If you need to redo the network setup, re-run the HFE.sh script with the "setup" command line argument, as shown below.
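For example, using the script location seen in the HFE logs later in this section (adjust the path if your copy lives elsewhere):

  # Re-apply the HFE networking configuration instead of restarting networking.
  bash /home/ec2-user/HFE/HFE.sh setup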

The HFE fails to route traffic to the SBC

Possible causes:

  • An incorrect HFE.sh script
  • An incorrect version used
  • A new AWS Linux release showing 'ip route' output in a different format
    • Check the output of the commands run by HFE.sh on the HFE instance

Ribbon doesn't maintain the HFE AMI (only the configuration script for the HFE). The latest AWS Linux AMI ID is selected while launching the HFE from the AWS template, so any change in the 'ip addr' or 'ip route' output may require changes in the HFE.sh script to configure the HFE properly.
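To compare what HFE.sh parses with what the instance actually reports, inspect the raw output on the HFE:

  # The output format of these commands can change between Amazon Linux
  # releases; HFE.sh parses them, so check them first when routing setup fails.
  ip addr show
  ip route show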

The instance fails to come up and there is no SSH access

Action:

  1. Try to SSH on the HA port. If this also fails, continue to the next step.
  2. Detach the root disk of the instance and attach it to any other Linux instance (which has the lvm2 tools installed) in AWS EC2. An AWS CLI sketch of the detach/attach steps follows this procedure.
    NOTE
    The following commands assume that the disk is attached as xvdf on the Linux machine. A disk can be detached/attached only when an instance is in the stopped state.
    1. Attach the volume (disk).


    2. Once the disk is attached, mount it using the following commands:

      vgchange -ay
      mount /dev/debian/root /mnt
      # The root of the SWe instance is available at /mnt

      The following SBC SWe logs are available:

      • OpenClovis logs: /mnt/var/log/sonus/sbx/openclovis
      • Tailf logs: /mnt/var/log/sonus/sbx/tailf
      • SYS/DBG logs: /mnt/var/log/sonus/evlog
      • Core dump: /mnt/var/log/sonus/sbx/coredump
      • Utility scripts: /mnt/opt/sonus/sbx/scripts
    3. Unmount the disk using the following commands:

      umount /mnt/var/log/sonus/
      umount /mnt
      vgchange -an
    4. Detach the volume (disk).
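The detach/attach steps above can also be performed with the AWS CLI; a sketch with placeholder IDs (remember that the instance must be stopped first):

  # Stop the broken instance and move its root volume to a helper instance,
  # where it appears as xvdf. All IDs are placeholders.
  aws ec2 stop-instances --instance-ids i-0123456789abcdef0
  aws ec2 detach-volume --volume-id vol-0123456789abcdef0
  aws ec2 attach-volume \
      --volume-id vol-0123456789abcdef0 \
      --instance-id i-0fedcba9876543210 \
      --device /dev/sdf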


I have AWS infrastructure questions

Can we host the EIP on some AWS service instead of the HFE? How about the AWS NAT service?

No, there is no such AWS service as of now. The AWS NAT service is designed to give internet access to private instances; calls cannot be initiated from a public UAC using this service.
A competitor showed us the use of sub-second switchover. Is this possible?

Most probably it is service continuity (using DNS), and all new requests are going to some other SBC.

Check what happened to the existing calls during their fail-over demo.

Can we have HA across multiple AZs?

No. For call continuity we rely on IP address movement, and AWS doesn't allow us to move IPs across AZs. This is an infrastructure limitation.

Can we use AWS MAC spoofing?

No. AWS drops all such packets silently.

Can we use GARP?

There is no use in sending GARP; AWS doesn't give access to the virtual L2 switch. All the provided networking is L3.

What do I need to know about Security Groups?

Do not block outgoing traffic because AWS API communication is required by the SBC HA and HFE.

Caution
  • Do not open your Security Groups too wide to allow all traffic from the Internet.
  • Always use PEM-based authentication (ssh/login).
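As a sketch, an outbound-allow-all rule (the default for a new Security Group) looks like this with the AWS CLI; the group ID is a placeholder:

  # Allow all outbound traffic so the SBC/HFE can reach the AWS APIs.
  # This matches the default egress rule of a new Security Group; do NOT
  # mirror it on the inbound side. The group ID is a placeholder.
  aws ec2 authorize-security-group-egress \
      --group-id sg-0123456789abcdef0 \
      --ip-permissions 'IpProtocol=-1,IpRanges=[{CidrIp=0.0.0.0/0}]'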
Can I change MetaData?

You can't change MetaData; only UserData can be modified.


What is the purpose of the HFE IPs?

The HFE Primary IP handles primary IP traffic, and is also used to:

  • upgrade and configure the SBC, and
  • download the HFE.sh script

The HFE Secondary IP handles all SIP EIPs to support call flows.

Note

All secondary IP traffic is directed to the SBC.


What do I do if I experience issues related to downloading and executing HFE.sh?

Action:

  1. First, enter the command below on the HFE to check whether the HFE.sh script can be downloaded from S3.

    aws s3 cp s3://aws-quickstart/quickstart-ribbon-sbc/scripts/HFE.sh ${HFE_FILE}


    If the script is downloaded but HFE.sh fails to run, go to the next step.

  2. Verify the path is correct in HFE logs.

    [root@ip-172-31-10-30 log]# bash /var/lib/cloud/instances/i-0d4ab8238b5935f85/user-data.txt
    download: s3://aws-quickstart/quickstart-ribbon-sbc/scripts/HFE.sh to ../../home/ec2-user/HFE/HFE.sh
    2019-08-29 09:32:11 Copied HFE script from S3 bucket.
    REMOTE_SSH_MACHINE_IP=
    /var/lib/cloud/instances/i-0d4ab8238b5935f85/user-data.txt: line 18: !timestamp: command not found
    
    [root@ip-172-31-10-30 log]# bash /var/lib/cloud/instances/i-0d4ab8238b5935f85/user-data.txt
    download: s3://aws-quickstart/quickstart-ribbon-sbc/scripts/HFE.sh to ../../home/ec2-user/HFE/HFE.sh
    2019-08-29 09:33:20 Copied HFE script from S3 bucket.
    REMOTE_SSH_MACHINE_IP=
    [root@ip-172-31-10-30 log]# ls /var/lib/cloud/instances/
    i-0d4ab8238b5935f85
    [root@ip-172-31-10-30 log]#
    
    

The SBC is not able to contact AWS service(s)

The SBC contacts AWS service(s) in the following cases:

  • Get the peer's data – This is done for cloud-init and LCA
  • Switchover in an HA setup
  • Getting the HFE's data in an HFE-based HA setup
    • The HFE collects the SBC's data
  • SBC services use AWS APIs to send data to CloudWatch, e.g. interval-stats, trc-anonymization, cloud-watch-service, SNMP logs, act-anonymization, etc.
  • The Metering API service is used by the post-paid metering SBC product to bill customers for actual call usage
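Before working through the checklist below, a quick smoke test of API access from the instance can save time; a sketch (the region is a placeholder):

  # Confirm credentials are available and basic API calls succeed.
  aws sts get-caller-identity
  # Confirm the CloudWatch endpoint is reachable (used for interval-stats etc.).
  aws cloudwatch list-metrics --max-items 1 --region us-east-1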

Action:

If the SBC fails to contact AWS API(s), check the following:

  1. Check DNS resolution – You must have the AWS-provided DNS configured in the VPC, and it must be the first DNS server configured on the SBC.
    1. If you have a custom DNS configured in the SBC, you must ensure your DNS has the latest IPs of all AWS services.
    2. If your DNS returns the wrong IP, all API communication will fail. Check the tshark output to debug further.
      Example:

      tshark -i mgt0 | grep -i dns

      Capturing on 'mgt0'

      98   6.584463 172.31.10.137 -> 172.31.10.54 DNS 87 Standard query 0xe264  A ec2.us-east-1.amazonaws.com

      99   6.584471 172.31.10.137 -> 172.31.10.54 DNS 87 Standard query 0x829b  AAAA ec2.us-east-1.amazonaws.com

      100   6.588009 172.31.10.54 -> 172.31.10.137 DNS 165 Standard query response 0xe264  A 13.249.37.5 A 99.86.226.181

      101   6.588020 172.31.10.54 -> 172.31.10.137 DNS 144 Standard query response 0x829b

      In this example, the SBC is not using the AWS-provided DNS. Someone configured the SBC to use the DNS at 172.31.10.54, and this DNS returns the wrong IPs for AWS services (A 13.249.37.5, A 99.86.226.181).

      The AWS-provided DNS is x.x.x.2, e.g. 172.31.0.2.

    3. Solution: Either remove the custom DNS configured on the SBC, or ensure the correct set of AWS service IPs is configured on your DNS (172.31.10.54 in this example).
  2. IAM role – Make sure the IAM role attached to the SBC has the correct set of permissions for the services you are debugging.
  3. Security Group Rules – Make sure your SG rules don't block any outgoing traffic; the egress rule should be 0.0.0.0/0 to allow all traffic.
  4. ACLs in AWS – Make sure AWS ACLs don't block any traffic.
  5. ACLs in SBC – Check if ACLs on the SBC are blocking traffic going towards the IP returned by the DNS. If there is no DNS reply, check that ACLs are opened for port 53 (UDP and TCP).
  6. AWS subnet routing table 
    1. Make sure the interface that the SBC picks to send API traffic is in a subnet which can send traffic to the AWS API server (check the SBC routes for the IP returned by the DNS).
    2. Check AWS subnet and routing table entries.
  7. EIP on SBC
    1. Check which interface is used for sending out traffic to the AWS API server (check the DNS IP and the SBC route table). This interface should use an EIP or NATGW.
    2. Alternatively, configure a VPC-end-point in your setup.
      NOTE: A VPC-end-point doesn't provide all services – if you require the metering service, you must configure either NATGW or an EIP.
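To verify items 1, 6, and 7 above in one pass, a sketch that compares the AWS-provided resolver with the configured one and checks which interface carries the API traffic:

  # Query the AWS-provided VPC resolver (x.x.x.2, e.g. 172.31.0.2) directly and
  # compare the answer with what the SBC's configured DNS returns.
  nslookup ec2.us-east-1.amazonaws.com 172.31.0.2
  # Check which interface/route the SBC would use for the returned IP
  # (substitute the IP your DNS actually returned).
  ip route get 13.249.37.5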

If you are using a VPC-end-point, make sure its IP is routed via mgt0 in the SBC.

VPC-end-points work only if their IP is in the mgt0 subnet, or in any subnet other than the HA0, pkt0, and pkt1 subnets (e.g. 10.54.10.x, 10.54.80.x).
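For reference, a sketch of creating an interface end-point for the EC2 API in a suitable (e.g. mgt0) subnet; all IDs and the region in the service name are placeholders:

  # Create an interface VPC end-point for the EC2 API in the mgt0 subnet.
  # The VPC, subnet, and security-group IDs are placeholders.
  aws ec2 create-vpc-endpoint \
      --vpc-id vpc-0123456789abcdef0 \
      --vpc-endpoint-type Interface \
      --service-name com.amazonaws.us-east-1.ec2 \
      --subnet-ids subnet-0123456789abcdef0 \
      --security-group-ids sg-0123456789abcdef0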

The following configuration will not work:

SBC: 

  1. mgt0 - 10.54.10.x
  2. HA0  -  10.54.20.x
  3. pkt0 - 10.54.30.x
  4. pkt1 - 10.54.40.x

  VPC-end-point IP - 10.54.30.10

The DNS will return IP 10.54.30.10 for AWS services, and the SBC's attempt to send out traffic via pkt0 will fail.

The IP addresses assigned to the interface names are incorrect on the HFE and calls are not working

Perform the following procedure if calls are not forwarded to the SBC through the HFE, and running the ip addr command indicates that the IP addresses are assigned to the wrong interface names.

  1. Run the ip addr command to identify the interface names that are swapped.
  2. Note the link/ether address associated with each IP address. The following example shows that eth1 and eth2 are swapped: 

    [ec2-user@ip-10-1-5-214 ~]$ ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
        link/ether 02:a4:b2:be:17:22 brd ff:ff:ff:ff:ff:ff
        inet 10.1.5.214/24 brd 10.1.5.255 scope global dynamic eth0
           valid_lft 3540sec preferred_lft 3540sec
        inet 10.1.5.58/24 brd 10.1.5.255 scope global secondary eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::a4:b2ff:febe:1722/64 scope link
           valid_lft forever preferred_lft forever
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
        link/ether 02:e2:8b:5e:96:1c brd ff:ff:ff:ff:ff:ff
        inet 10.1.3.97/24 brd 10.1.3.255 scope global dynamic eth1
           valid_lft 3550sec preferred_lft 3550sec
        inet6 fe80::e2:8bff:fe5e:961c/64 scope link
           valid_lft forever preferred_lft forever
    4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
        link/ether 02:31:53:a2:88:d6 brd ff:ff:ff:ff:ff:ff
        inet 10.1.1.62/24 brd 10.1.1.255 scope global dynamic eth2
           valid_lft 3550sec preferred_lft 3550sec
        inet6 fe80::31:53ff:fea2:88d6/64 scope link
           valid_lft forever preferred_lft forever
  3. Run the following command to update the associated ifcfg file with the correct Ether address for each incorrect interface: 

    sudo sed -i 's/^HWADDR=.*/HWADDR=<ether address>/' /etc/sysconfig/network-scripts/ifcfg-eth<interface number>

    Example: 

    sudo sed -i 's/^HWADDR=.*/HWADDR=02:31:53:a2:88:d6/' /etc/sysconfig/network-scripts/ifcfg-eth1
    sudo sed -i 's/^HWADDR=.*/HWADDR=02:e2:8b:5e:96:1c/' /etc/sysconfig/network-scripts/ifcfg-eth2
  4. Remove the persistent udev rules: sudo rm -f /etc/udev/rules.d/70-persistent-net.rules

  5. Reboot the instance: sudo reboot

  6. Run the ip addr command to verify the interfaces are correct. 

    [ec2-user@ip-10-1-5-214 ~]$ ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
        link/ether 02:a4:b2:be:17:22 brd ff:ff:ff:ff:ff:ff
        inet 10.1.5.214/24 brd 10.1.5.255 scope global dynamic eth0
           valid_lft 3495sec preferred_lft 3495sec
        inet 10.1.5.58/24 brd 10.1.5.255 scope global secondary eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::a4:b2ff:febe:1722/64 scope link
           valid_lft forever preferred_lft forever
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
        link/ether 02:31:53:a2:88:d6 brd ff:ff:ff:ff:ff:ff
        inet 10.1.1.62/24 brd 10.1.1.255 scope global dynamic eth1
           valid_lft 3495sec preferred_lft 3495sec
        inet6 fe80::31:53ff:fea2:88d6/64 scope link
           valid_lft forever preferred_lft forever
    4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
        link/ether 02:e2:8b:5e:96:1c brd ff:ff:ff:ff:ff:ff
        inet 10.1.3.97/24 brd 10.1.3.255 scope global dynamic eth2
           valid_lft 3496sec preferred_lft 3496sec
        inet6 fe80::e2:8bff:fe5e:961c/64 scope link
           valid_lft forever preferred_lft forever