This section describes common issues found in SBC instances in GCP, and the actions needed to verify or troubleshoot them.
When I try to SSH in to the SBC as linuxadmin, I receive "Permission denied (publickey)"
This error implies that there are errors in the public key provided, or that you are using a different public key than the one provided.
Action:
- Check that the supplied linuxadmin key is correct:
- Go to Compute Engine > VM Instances.
- Click on the instance.
- Click on the SSH key.
- Verify the public key matches the result of running ssh-keygen -y -f <<key file>>.
- Verify the format is ssh-rsa ... linuxadmin.
- Verify that 'Block project-wide SSH keys' is selected:
- Go to Compute Engine > VM Instances.
- Click on the instance.
- Check 'SSH keys' for the checkbox.
- Verify that there are no 'SSH Keys' in the global Metadata.
- Go to Compute Engine > Metadata.
- If there are any SSH keys, remove them.
- When you find the issue:
- If it is an error in the key supplied, update the key and reboot the instance.
- If it is an error with global Metadata keys, recreate all SBC instances and restart all the HFE nodes to fetch the latest SBC information.
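The key comparison in the first check can be scripted. The following is a minimal sketch, assuming you have copied the output of ssh-keygen -y -f <<key file>> and the key entry shown in the console into two strings. The helper names parse_key_entry and keys_match are hypothetical and are not part of any SBC or GCP tooling. The sketch compares only the key type and the base64 body, and separately checks that the console entry has the form ssh-rsa ... linuxadmin.

```python
def parse_key_entry(entry):
    """Split an OpenSSH public key line into (type, base64 body, comment)."""
    parts = entry.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("not a public key line: %r" % entry)
    key_type, body = parts[0], parts[1]
    comment = parts[2].strip() if len(parts) == 3 else ""
    return key_type, body, comment


def keys_match(derived, console_entry, user="linuxadmin"):
    """Return (same_key, format_ok) for the derived key and the console entry."""
    d_type, d_body, _ = parse_key_entry(derived)
    c_type, c_body, c_comment = parse_key_entry(console_entry)
    # The comment field is ignored here: ssh-keygen -y may omit or change it.
    same_key = (d_type, d_body) == (c_type, c_body)
    # The console entry is expected to read "ssh-rsa ... linuxadmin".
    format_ok = c_type == "ssh-rsa" and c_comment == user
    return same_key, format_ok


if __name__ == "__main__":
    # Placeholder values for illustration only; paste the real lines here.
    derived = "ssh-rsa AAAAB3NzaEXAMPLE"
    console_entry = "ssh-rsa AAAAB3NzaEXAMPLE linuxadmin"
    print(keys_match(derived, console_entry))
```

If same_key is False, the instance holds a different key than the one you are connecting with. If format_ok is False, the entry does not end with the linuxadmin user name.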
The HFE.log is continually getting the error message "Connection error ongoing - No connection to SBC PKT ports from HFE"
If this log is continually written to the HFE.log, it implies that the HFE node cannot connect to the SBCs.
Action:
Ensure that PKT0 and PKT1 are configured correctly through the CLI. Refer to the section "CLI Configuration for Configuring PKT Ports" of the page Instantiating SBC SWe in GCP.
- Ensure that the IPs listed in the HFE_conf.log are the ones attached to the SBC:
- Go to /opt/HFE/log/.
- Find the logs which specify the IPs for the SBC; they are of the form:
<<SBC instance name>> - IP for <<pkt0/pkt1>> is <<IP>>.
- Find the Alias IPs for the SBC:
- Go to Compute Engine > VM Instances.
- Click on the instance.
In the Network interfaces table, observe the nic2 and nic3 and ensure that the IPs in the Alias IP ranges match.
- Check the VPC routes and firewalls are correct:
- Go to VPC network > VPC networks.
- Click on the VPC for PKT0.
- Click on Firewall rules and verify that the firewall rules exist, as described in the section "Google Firewall Rules" of the page Configure HFE Nodes in GCP.
- Click on Routes and verify the routes exist, as described in the section "Google Network Routes" of the page Configure HFE Nodes in GCP.
- Repeat for PKT1.
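The comparison between the logged IPs and the Alias IPs can be sketched as follows. This is an illustration only: it assumes the log lines contain the form <<SBC instance name>> - IP for <<pkt0/pkt1>> is <<IP>> quoted above, possibly with other text around it, and that you have typed the Alias IPs from the console into a dictionary by hand without any /32 suffix. The function names logged_pkt_ips and mismatches and the instance name sbc-a are hypothetical.

```python
import re

# Matches "<instance> - IP for <pkt0|pkt1> is <IPv4>" anywhere in a line.
LINE_RE = re.compile(
    r"(\S+) - IP for (pkt[01]) is (\d{1,3}(?:\.\d{1,3}){3})", re.IGNORECASE
)


def logged_pkt_ips(log_text):
    """Return {(instance, port): ip}; a later log line overrides an earlier one."""
    found = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            found[(match.group(1), match.group(2).lower())] = match.group(3)
    return found


def mismatches(logged, console_alias_ips):
    """List (key, logged_ip, console_ip) wherever the log and console disagree."""
    return [
        (key, ip, console_alias_ips.get(key))
        for key, ip in sorted(logged.items())
        if console_alias_ips.get(key) != ip
    ]


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample_log = "sbc-a - IP for pkt0 is 10.0.0.5.\nsbc-a - IP for pkt1 is 10.0.1.5."
    console = {("sbc-a", "pkt0"): "10.0.0.5", ("sbc-a", "pkt1"): "10.0.1.9"}
    for key, logged_ip, console_ip in mismatches(logged_pkt_ips(sample_log), console):
        print(key, "log:", logged_ip, "console:", console_ip)
```

An empty result means every IP the HFE node logged matches the Alias IP entered for the same instance and port.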
Every time I start my SBC instance, the instance stops itself after a few minutes.
This is the result of invalid user-data. The user-data must contain only valid JSON. Refer to the section "User Data Format" of the page Instantiating SBC SWe in GCP.
Action:
- Go to Compute Engine > VM Instances.
- Click on the instance.
- Go to Custom metadata.
- Click on user-data.
- Copy the user-data into a file and verify that it is valid JSON. For example:
- Linux utility jq:
jq . user-data.txt
- Python:
python -m json.tool user-data.txt
- If valid, the user data is printed out; else, an error is displayed.
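If it helps to see exactly where the JSON breaks, the same check can be written as a short Python function. This is a minimal sketch using only the standard json module; the function name check_user_data is hypothetical, and the sample string is a generic placeholder rather than real SBC user-data.

```python
import json


def check_user_data(text):
    """Return None if text is valid JSON, else a message locating the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno are 1-based positions of the first parse failure.
        return "line %d, column %d: %s" % (err.lineno, err.colno, err.msg)
    return None


if __name__ == "__main__":
    # Generic placeholder with a trailing comma, which JSON does not allow.
    sample = '{"key": "value",}'
    problem = check_user_data(sample)
    print("valid JSON" if problem is None else "invalid JSON at " + problem)
```

To check the copied user-data, read the saved file (for example user-data.txt) into a string and pass it to check_user_data.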
Calls are failing to reach the SBC.
Action:
- Ensure that there are no error logs in the HFE.log. Refer to the section "HFE Node Logging" of the page Configure HFE Nodes in GCP.
- Ensure that the end point of the traffic is allowed access through the VPC firewalls. Refer to the section "Google Firewall Rules" of the page Configure HFE Nodes in GCP.
One of my instances sent a broadcast message saying: "SplitBrain: Going for REBOOT to resolve id collision!!!"
This occurs when both instances are started simultaneously and try to communicate using the same ID. This is expected behavior. The system reboots and comes up as Standby.
I am unable to log in to my HFE via the mgmt interface
It implies that either there is a configuration issue, or the firewall rules are not updated correctly.
Action:
- Ensure the IP from which you send SSH requests is allowed through the VPC firewall. Refer to the section "Create Firewall Rules" of the page Configure VPC Networks.
- Ensure the IP from which you send SSH requests is set correctly in the HFE node user-data. Refer to the section "User Data Example" of the page Configure HFE Nodes in GCP.
Ensure that the updated line is similar to the following:
/bin/echo "REMOTE_SSH_MACHINE_IP=\"10.27.178.4\"" >> $NAT_VAR
- The HFE script may fail before creating the routes. In that case:
- Attempt to SSH in to NIC0 on the HFE node.
- Check the logs for errors in the directory /opt/HFE/log/. Refer to the section "HFE Node Logging" of the page Configure HFE Nodes in GCP.
I have correctly configured the SBC PKT ports, but I am still getting "Connection error ongoing - No connection to SBC PKT ports from HFE" in the HFE.log
The possible reason is that the SBC PKT interface is unable to find the HFE interface.
Action:
- Log in to the active SBC.
Run tshark on the port:
tshark -i pkt0/pkt1
Look for ARP error messages, such as:
0.999962 42:01:0a:00:41:e8 -> Broadcast ARP 42 Who has 10.0.65.231? Tell 10.0.65.232
Perform a switchover through the CLI:
request system admin <system_name> switchover
- Verify HFE connects to the new active SBC.
- If the issue is unresolved, perform the switchover again and verify again that the HFE connects to the new active SBC.
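When the capture is long, repeated ARP requests can be picked out with a small script. The following is a sketch only: it assumes you saved the tshark text output to a string, that requests look like the "Who has ... Tell ..." line shown above, and that replies use the "<IP> is at <MAC>" wording tshark normally prints for ARP replies. The function name unanswered_arp is hypothetical.

```python
import re
from collections import Counter

IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"
REQUEST_RE = re.compile(r"Who has (%s)\? Tell (%s)" % (IPV4, IPV4))
REPLY_RE = re.compile(r"(%s) is at " % IPV4)  # assumed reply wording


def unanswered_arp(capture_text):
    """Return {target_ip: request_count} for targets that never got a reply."""
    requests = Counter()
    answered = set()
    for line in capture_text.splitlines():
        request = REQUEST_RE.search(line)
        if request:
            requests[request.group(1)] += 1
            continue
        reply = REPLY_RE.search(line)
        if reply:
            answered.add(reply.group(1))
    return {ip: count for ip, count in requests.items() if ip not in answered}


if __name__ == "__main__":
    capture = (
        "0.999962 42:01:0a:00:41:e8 -> Broadcast ARP 42 "
        "Who has 10.0.65.231? Tell 10.0.65.232\n"
        "1.999962 42:01:0a:00:41:e8 -> Broadcast ARP 42 "
        "Who has 10.0.65.231? Tell 10.0.65.232\n"
    )
    print(unanswered_arp(capture))
```

A target IP that appears here with a growing count is one the SBC PKT interface keeps asking for without receiving an answer.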
My private endpoint is in a different subnet to the HFE interface and I am not receiving traffic
If the private endpoint is in a different subnet from the HFE interface, additional steps are required to allow traffic to reach the HFE.
Action:
- Allow traffic between the subnets:
Peer the two VPCs together. Refer to https://cloud.google.com/vpc/docs/using-vpc-peering for details.
Ensure that the peering is configured from both VPCs.
- Add the necessary Firewall rules to allow traffic between the peered subnets. Refer to https://cloud.google.com/vpc/docs/using-firewalls for details.
- If the HFE instance is 2.0, add the route to the subnet on the HFE instance:
- Get the gateway IP address of the HFE private subnet, which is the next hop used to reach the subnet where the private endpoint is located:
- Go to VPC network.
- Select VPC networks.
- Select the HFE private subnet from the table. The Gateway IP address is located under the Gateway heading.
- Update the HFE startup-script:
- Go to Compute Engine.
- Select the HFE instance.
- Select EDIT.
- Go to Custom metadata.
In the penultimate line of the startup script add the following command:
ip route add <endpoint CIDR> via <gateway ip> dev <ens5/eth1>
See the following startup script excerpt for example:
/bin/echo "Configured using HFE script - $HFE_FILE" >> $LOG_FILE
/bin/echo $(timestamp) " ========================= Done ==========================================" >> $LOG_FILE
ip route add 10.27.27.0/24 via 10.27.3.1 dev ens5
nohup $HFE_FILE setup > /dev/null 2>&1 &
- Reboot the HFE instance.
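Before editing the startup script, the route values can be sanity-checked with a short script. This is a minimal sketch using Python's standard ipaddress module. The function name route_command is hypothetical, and the HFE private subnet CIDR is an extra input assumed here, used only to confirm that the gateway is reachable on the HFE interface. The HFE subnet 10.27.3.0/24 in the example is an assumed value that is consistent with the gateway 10.27.3.1 in the excerpt above.

```python
import ipaddress


def route_command(endpoint_cidr, gateway_ip, hfe_subnet_cidr, device):
    """Build the ip route line, or return None when no extra route is needed."""
    # strict=True rejects CIDRs with host bits set, such as 10.27.27.5/24.
    endpoint_net = ipaddress.ip_network(endpoint_cidr, strict=True)
    hfe_net = ipaddress.ip_network(hfe_subnet_cidr, strict=True)
    gateway = ipaddress.ip_address(gateway_ip)

    if endpoint_net.subnet_of(hfe_net):
        # The endpoint is inside the HFE private subnet already.
        return None
    if gateway not in hfe_net:
        raise ValueError(
            "gateway %s is not inside the HFE subnet %s" % (gateway, hfe_net)
        )
    return "ip route add %s via %s dev %s" % (endpoint_net, gateway, device)


if __name__ == "__main__":
    # Endpoint CIDR, gateway, and device are taken from the excerpt above;
    # the HFE subnet CIDR is an assumption for illustration.
    print(route_command("10.27.27.0/24", "10.27.3.1", "10.27.3.0/24", "ens5"))
```

The function raises ValueError for a malformed CIDR or a gateway outside the HFE subnet, which catches the most common typing mistakes before the instance is rebooted.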
My new SBC HA with HFE setup is not accessible because the metaVariable table data is incomplete
When deploying an HA setup in a public cloud environment, each node must be able to query all other associated instances (peer SBC or HFE node) to obtain information about the other nodes. If there is a delay in creating any instance within the setup, the other nodes are unable to collect complete information and data is missing from the metaVariable table in the configuration database. The SBC application cannot start if cloud-init fails and the database is populated incorrectly.
To correct this issue, reboot both SBC instances from the console to ensure SSH works on the instances, and to allow the nodes to gather all of the required information.