This topic covers common issues found in SBC GCP instances and the action steps needed to verify or fix them.


When I try to SSH in to the SBC as linuxadmin, I receive "Permission denied (publickey)".

This error generally means there is some kind of error in the supplied public key, or that a different public key has been retrieved.

Action Steps:

  1. Check that the supplied linuxadmin key is correct:
    1. Go to Compute Engine > VM Instances.
    2. Click on the instance.
    3. Click on the SSH key.
    4. Verify the public key matches the output of running ssh-keygen -y -f <<key file>> (see the example after these steps).
    5. Verify the format is ssh-rsa ... linuxadmin.
  2. Verify that 'Block project-wide SSH keys' is selected:
    1. Go to Compute Engine > VM Instances.
    2. Click on the instance.
    3. Under 'SSH keys', verify that the 'Block project-wide SSH keys' checkbox is selected.
  3. Verify that there are no 'SSH Keys' entries in the global Metadata:
    1. Go to Compute Engine > Metadata.
    2. If any 'SSH Keys' entries exist, remove them.
  4. Once the issue has been found:
    1. If it was an error in the supplied key, update the key and reboot the instance.
    2. If it was an error with the global Metadata keys, all SBC instances must be completely recreated, and the HFE node must be restarted to pick up the latest SBC information.
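
For example, both the key check and the global metadata check can be run from a workstation with the gcloud CLI installed (a minimal sketch; the key file path is an example):

    # Derive the public key from the supplied private key; the output should
    # match the key stored on the instance, in the format ssh-rsa ... linuxadmin
    ssh-keygen -y -f ~/.ssh/linuxadmin_key

    # Inspect the project-wide metadata for stray SSH key entries
    gcloud compute project-info describe --format="yaml(commonInstanceMetadata)"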


The HFE.log continually shows the error message: "Connection error ongoing - No connection to SBC PKT ports from HFE"

If this message is continually written to the HFE.log, the HFE node cannot connect to the SBCs.

Action Steps:

  1. Verify PKT0 and PKT1 are configured correctly through the CLI. See "CLI Configuration for Configuring PKT Ports" in Configuring SBCs in GCP.

  2. Verify the IPs listed in the HFE_conf.log are the ones attached to the SBC (see the example after these steps):
    1. Go to /home/ubuntu/HFE/log/.
    2. Find the logs which specify the IPs for the SBC - these are in the form: <<SBC instance name>> - IP for <<pkt0/pkt1>> is <<IP>>.
    3. Find the Alias IPs for the SBC:
      1. Go to Compute Engine > VM Instances.
      2. Click on the instance.
      3. In the Network interfaces table, look at nic2 and nic3 and verify that the IPs in the Alias IP ranges column match those from the log.

  3. Check that the VPC routes and firewalls are correct:
    1. Go to VPC network > VPC networks.
    2. Click on the VPC for PKT0.
    3. Click on Firewall rules and verify that the firewall rules outlined in Google Firewall Rules exist.
    4. Click on Routes and verify that the routes outlined in Google Network Routes exist.
    5. Repeat for PKT1.
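
As a cross-check for step 2, the IPs recorded by the HFE node can be compared with the instance's alias IP ranges from the command line (a sketch; the instance name and zone are examples, and the grep pattern assumes the log form shown above):

    # Pull the pkt0/pkt1 IPs the HFE node recorded for the SBC
    grep "IP for pkt" /home/ubuntu/HFE/log/HFE_conf.log

    # List the SBC instance's network interfaces, including Alias IP ranges
    gcloud compute instances describe example-sbc-1 --zone us-east1-b --format="yaml(networkInterfaces)"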


Every time I start my SBC instance, the instance stops itself after a few minutes.

This is the result of invalid user-data being entered. Only valid JSON is accepted for the SBC. Refer to User Data Format.

Action Steps:

  1. Go to Compute Engine > VM instances.

  2. Click on the instance.
  3. Go to Custom metadata.
  4. Click on user-data.
  5. Copy the user-data into a file and verify that it is valid JSON. For example:
    1. Linux utility jq: jq . user-data.txt.
    2. Python: python -m json.tool user-data.txt.
    3. If the user-data is valid, it is printed out; otherwise an error is displayed (see the example below).
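
Both tools print the parsed JSON on success and a parse error otherwise. A minimal illustration (the inline JSON strings are examples):

    # Valid JSON is pretty-printed back
    $ echo '{"key": "value"}' | jq .
    {
      "key": "value"
    }

    # Invalid JSON produces an error and a non-zero exit code
    $ echo '{"key": }' | python -m json.tool
    Expecting value: line 1 column 9 (char 8)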


Calls are failing to reach the SBC.

Action Steps:

  1. Verify there are no error logs in the HFE.log. See HFE Node Logging.
  2. Verify the endpoint of the traffic is allowed access through the VPC firewalls, as in the example below. See Google Firewall Rules.
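
If the gcloud CLI is available, the rules attached to a PKT VPC can be listed directly (a sketch; the network name is an example):

    # List the firewall rules on the PKT0 VPC and confirm the endpoint's
    # source range is allowed (see Google Firewall Rules)
    gcloud compute firewall-rules list --filter="network:example-pkt0-vpc"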


One of my instances sent a broadcast message saying: "SplitBrain: Going for REBOOT to resolve id collision!!!"

This is the result of both instances being started at the same time and trying to communicate with the same ID. This is expected behaviour. The system will reboot and should come up as Standby.


I am unable to log in to my HFE via the mgmt interface

This can mean there is a configuration issue, or that the firewall rules have not been updated correctly.

Action Steps:

  1. Verify the IP you are trying to SSH from is allowed through the VPC firewall. See Creating Firewall Rules for more information.
  2. Verify that the IP you are trying to SSH from is correctly set in the HFE node user-data. See User Data Example.
    1. The line that needs to be updated should look like this:
      1. /bin/echo "REMOTE_SSH_MACHINE_IP=\"10.27.178.4\"" >> $NAT_VAR.
  3. The HFE script may have failed before creating the routes:
    1. Attempt to SSH in to NIC0 on the HFE node.
    2. Check the logs in /home/ubuntu/HFE/log/ for errors (see the example below). See HFE Node Logging for more information.
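
A quick way to scan for problems once logged in via NIC0 (a sketch; the exact log file names may vary, see HFE Node Logging):

    # Show the most recent error lines across the HFE logs
    grep -i "error" /home/ubuntu/HFE/log/*.log | tail -n 20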