This section covers common issues found in the SBC SWe in Azure setup, and the actions necessary for verification and/or troubleshooting.
This results from submitting invalid user-data. Submit only valid JSON to the SBC. For more information on valid JSON, refer to the SBC Userdata topic.
Action Steps:
To verify whether the problem occurs due to invalid JSON, perform the following steps:
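One quick way to check for JSON syntax errors before submitting the user-data is to run it through a JSON parser. The following is a minimal sketch; it assumes the user-data has been saved locally as userdata.json (a placeholder file name):
# Prints a parse error and exits non-zero if the JSON is invalid
python3 -m json.tool userdata.json > /dev/null
# Alternative, if jq is installed
jq empty userdata.json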
If the message "Connection error ongoing - No connection to SBC PKT ports from HFE" is continually written to HFE.log, it indicates that the HFE node cannot connect to the SBCs.
Action Steps:
Perform the following verification steps:
Using the CLI, verify that PKT0 and PKT1 are configured correctly. For more information on this process, refer to the Configuring PKT Ports topic.
Verify the IPs listed in HFE_conf.log are the ones attached to the SBC:
Go to /opt/HFE/log/.
Find the log entries that specify the IPs for the SBC; the entries are in the form:
<SBC instance name> - IP for <pkt0 | pkt1> is <IP>
Find the Alias IPs for the SBC:
Go to Virtual machines.
Click on the SBC instance.
Go to Settings > Networking.
Go to the PKT0 interface.
Click on the network interface.
Go to Settings > IP configurations.
Verify the secondary IP matches the IP logged in HFE_conf.log.
Repeat for the PKT1 interface.
Check the Security groups are correct:
Go to Network security groups.
Select the security group.
Go to Inbound security rules.
Verify the end point IPs are allowed.
Check the routes are correct:
Go to Route tables.
Select the route table.
Click on Routes and verify the routes point to the eth2 IP on the HFE node.
Click on Subnets and verify the route table is associated with both subnets. An equivalent set of checks using the Azure CLI is sketched after these steps.
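The same checks can be performed with the Azure CLI instead of the portal. The following is a minimal sketch; the resource group, NIC, security group, and route table names are placeholders for your deployment, and the grep pattern assumes the HFE_conf.log format shown above:
# IPs the HFE node expects for the SBC PKT ports
grep "IP for pkt" /opt/HFE/log/HFE_conf.log
# Secondary (alias) IPs actually attached to a PKT network interface
az network nic ip-config list --resource-group <resourceGroupName> --nic-name <pktNicName> -o table
# Inbound rules on the network security group
az network nsg rule list --resource-group <resourceGroupName> --nsg-name <nsgName> -o table
# Routes in the route table (next hops should be the eth2 IP on the HFE node)
az network route-table route list --resource-group <resourceGroupName> --route-table-name <routeTableName> -o table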
This indicates that either there is a configuration issue, or the firewall rules have not been updated correctly.
Action Steps:
Verify that the IP address you are trying to SSH from is correctly listed in the HFE node user-data. If it is not, update the line containing "REMOTE_SSH_MACHINE_IP":
/bin/echo "REMOTE_SSH_MACHINE_IP=\"10.27.178.4\"" >> $NAT_VAR
For more information, refer to Custom Data Example topic.
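If the user-data already contains the correct IP, one way to confirm the rule took effect on the HFE node itself is to inspect its firewall rules. This assumes the HFE node enforces the SSH restriction with iptables, which may not match your deployment; the IP below is the example address used above:
# Look for a rule referencing the remote SSH machine IP
sudo iptables -L -n -v | grep 10.27.178.4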
Check the HFE node logs in /opt/HFE/log/. For more information, refer to the HFE Node Logging topic.
Even without the accelerated NICs, the SWe instance sometimes starts; however, its performance is not guaranteed.
Action Steps:
Execute the following command to confirm the availability of the Mellanox NICs.
> lspci | grep Mellanox
The sample output below indicates the presence of Mellanox NICs:
83df:00:02.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] (rev 80)
9332:00:02.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] (rev 80)
If the Mellanox NICs are not present, de-allocate the instance and start it again.
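The de-allocate and start cycle can also be performed with the Azure CLI. The following is a minimal sketch with placeholder resource names:
# Stop and de-allocate the instance (releases the underlying compute)
az vm deallocate --resource-group <resourceGroupName> --name <sbcInstanceName>
# Start the instance again, then re-run the lspci check
az vm start --resource-group <resourceGroupName> --name <sbcInstanceName>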
If commands such as sudo apt update cause an error such as "connection timed out", you must add a manual route on the HFE node.
The following action steps will interrupt calls.
Action Steps:
Find the eth1 gateway IP in /opt/HFE/log/HFE_conf.log; for example:
2020-11-02 11:45:47 ETH1_GW 10.2.0.1
Add the route to make traffic go through the management interface:
sudo ip route add 0.0.0.0/0 via <Gateway IP> dev <management interface name> metric 10
ip route add 0.0.0.0/0 via 10.2.0.1 dev eth1 metric 10
Run the apt command:
sudo apt update
Remove the route you just added:
sudo ip route delete 0.0.0.0/0 via <Gateway IP> dev <management interface name> metric 10
ip route delete 0.0.0.0/0 via 10.2.0.1 dev eth1 metric 10
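The steps above can also be run as a short script. The following is a minimal sketch; it assumes the ETH1_GW entry appears in HFE_conf.log in the format shown above and that eth1 is the management interface:
# Pull the most recent eth1 gateway recorded in the HFE configuration log
GW=$(grep ETH1_GW /opt/HFE/log/HFE_conf.log | tail -1 | awk '{print $NF}')
# Temporarily route all traffic via the management interface
sudo ip route add 0.0.0.0/0 via "$GW" dev eth1 metric 10
# Run the package update
sudo apt update
# Remove the temporary route so normal traffic flow is restored
sudo ip route delete 0.0.0.0/0 via "$GW" dev eth1 metric 10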
This issue is generally caused by Linux failing to configure the network interfaces before the HFE_AZ.sh script is run.
Action Steps:
Using 'ip addr', verify whether any of the network interfaces (eth0, eth1, eth2) have a state of 'DOWN'.
An example of a good instance is:
rbbn@SBC-Terraform:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0d:3a:11:e1:ac brd ff:ff:ff:ff:ff:ff
    inet 10.2.1.4/24 brd 10.2.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::20d:3aff:fe11:e1ac/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0d:3a:11:eb:7c brd ff:ff:ff:ff:ff:ff
    inet 10.2.0.7/24 brd 10.2.0.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::20d:3aff:fe11:eb7c/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0d:3a:11:e2:5c brd ff:ff:ff:ff:ff:ff
    inet 10.2.3.4/24 brd 10.2.3.255 scope global eth2
       valid_lft forever preferred_lft forever
    inet6 fe80::20d:3aff:fe11:e25c/64 scope link
       valid_lft forever preferred_lft forever
5: enP1s1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master eth0 state UP group default qlen 1000
    link/ether 00:0d:3a:11:e1:ac brd ff:ff:ff:ff:ff:ff
6: enP3s3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master eth2 state UP group default qlen 1000
    link/ether 00:0d:3a:11:e2:5c brd ff:ff:ff:ff:ff:ff
rbbn@SBC-Terraform:~$
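To list only interfaces that are not up, a one-line filter such as the following can be used (this relies on the iproute2 brief output format):
# Prints one line per interface; any interface in state DOWN needs attention
ip -br link | grep DOWN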
This issue is caused either by the User Assigned Managed Identity not having the correct roles, or by Linux not configuring the network interfaces properly on the HFE node.
Action Steps:
Go to Virtual machines.
Click on the HFE instance.
Go to Settings > Identity.
Get the principalId for the Identity:
az identity show --name <identityName> --resource-group <resourceGroupName> --subscription <subscriptionId>
Get the roleDefinitionName:
az role assignment list --assignee <principalId> --subscription <subscriptionId>
Get the role definition, and verify action contains the correct permissions (refer to Create Role topic):
az role definition list --name <roleDefinitionName> --subscription <subscriptionId>
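To narrow the last command down to just the allowed actions, a JMESPath query can be appended. The following is a minimal sketch:
# Show only the actions granted by the role, for comparison against the Create Role topic
az role definition list --name <roleDefinitionName> --subscription <subscriptionId> --query "[].permissions[].actions"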