Route53 Setup
The Route53 test setup is depicted below:
The UAC and UAS can be in the same or different availability zones, but must be within the same VPC. If the UAC and UAS are in a different availability zone from the SBC, add routes in both zones (on the SIP machine as well as on the SBC), and make sure the corresponding security groups contain the matching entries.
How Route53 Works
In this setup, two HFEs are created in two different availability zones. The UAC and UAS can be in the same or different availability zones, but must be within the same VPC. When calls are made from the UAC to the UAS using the DNS FQDN, the UAC queries the DNS to resolve it. Route 53 periodically sends requests to the endpoint of each health check; it does not perform a health check when it receives a DNS query. Based on the responses, Route 53 decides whether each endpoint is healthy and uses that information to determine how to respond to queries.
Suppose HFE-1 is healthy and HFE-2 is unhealthy. The DNS then returns only the primary entry for the record (here, HFE-1), so calls take the first path: UAC > HFE-1 > SBC-1 > UAS. When both HFEs are healthy, new calls made with the same DNS FQDN receive records for both HFEs, so calls take either the second path (UAC > HFE-2 > SBC-2 > UAS) or the first path (UAC > HFE-1 > SBC-1 > UAS), depending on the configured policies.
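The record selection described above can be modelled in a few lines. This is a simplified sketch, not Route 53 itself: the `MultivalueRecord` type, the `answer_query` function, and the documentation-range addresses are hypothetical stand-ins for the two HFE records, and the "all unhealthy returns everything" branch mirrors the behavior observed in the ping tests later in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MultivalueRecord:
    set_id: str    # Set ID of the record, e.g. "hfe-1" (hypothetical)
    ip: str        # HFE address carried in the A record
    healthy: bool  # current status of the associated health check


def answer_query(records: List[MultivalueRecord]) -> List[str]:
    """Simplified model of a multivalue answer response for one record name.

    Only records whose health check is healthy are returned. If none are
    healthy, every record is returned. Route 53's limit on how many records
    a single response carries is not modelled here.
    """
    healthy = [r.ip for r in records if r.healthy]
    return healthy if healthy else [r.ip for r in records]


# HFE-1 healthy, HFE-2 unhealthy: only HFE-1's address reaches the UAC.
records = [
    MultivalueRecord("hfe-1", "198.51.100.1", healthy=True),
    MultivalueRecord("hfe-2", "198.51.100.2", healthy=False),
]
print(answer_query(records))  # ['198.51.100.1']
```

When both records are healthy, `answer_query` returns both addresses; which HFE the UAC then uses depends on the client and the configured policies, not on this model.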
Health Check
Route 53 health checks let you track the health status of your resources, such as web servers or mail servers, and take action when an outage occurs. A Route 53 health check can monitor the reachability or health of an EC2 instance in multiple ways:
- Directly monitor the endpoint using TCP/HTTP
- Status of other health checks (calculated health check)
- State of CloudWatch alarm
In addition, a Route 53 health check:
- supports advanced checks using HTTPS and response analysis.
- uses a standard request interval of 30 seconds, which can be set to "Fast" (10 seconds). Fast mode is charged extra.
- can probe only public IP addresses when monitoring an endpoint. This is a problem for internal nodes.
- runs probes from health checkers located throughout the world, and can be restricted to selected regions.
- can report the health of an EC2 instance based on CloudWatch alarms that check the available instance metrics.
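The first two monitoring modes map onto the `HealthCheckConfig` structure accepted by Route 53's `CreateHealthCheck` API. The sketch below only builds the request dictionaries; the helper names, the failure threshold of 3, and SIP port 5060 in the usage note are hypothetical choices, not values from this guide. It also encodes two constraints listed above: endpoint checks need a public IP, and the interval is either 30 s or 10 s.

```python
import ipaddress
from typing import List, Optional

# Route 53 supports two request intervals: standard (30 s) and fast (10 s, charged extra).
STANDARD_INTERVAL_S = 30
FAST_INTERVAL_S = 10


def tcp_endpoint_check(ip: str, port: int, fast: bool = False,
                       regions: Optional[List[str]] = None) -> dict:
    """Build a HealthCheckConfig that probes ip:port directly over TCP."""
    # Route 53 health checkers probe from the internet, so a private
    # (internal) address cannot be monitored with an endpoint check.
    if ipaddress.ip_address(ip).is_private:
        raise ValueError(f"{ip} is private; endpoint checks need a public IP")
    config = {
        "Type": "TCP",
        "IPAddress": ip,
        "Port": port,
        "RequestInterval": FAST_INTERVAL_S if fast else STANDARD_INTERVAL_S,
        "FailureThreshold": 3,  # assumed value: consecutive failures before Unhealthy
    }
    if regions is not None:
        # Probing can be limited to selected checker regions; Route 53
        # expects at least three of them.
        if len(regions) < 3:
            raise ValueError("select at least three health-checker regions")
        config["Regions"] = list(regions)
    return config


def calculated_check(child_ids: List[str], health_threshold: int) -> dict:
    """Build a HealthCheckConfig whose status is derived from other health checks."""
    return {
        "Type": "CALCULATED",
        "ChildHealthChecks": list(child_ids),
        "HealthThreshold": health_threshold,
    }
```

With boto3, a config would be submitted as `boto3.client("route53").create_health_check(CallerReference=str(uuid.uuid4()), HealthCheckConfig=tcp_endpoint_check("52.7.228.10", 5060))`. For internal HFE addresses, the CloudWatch alarm approach in the next part avoids the public-IP restriction.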
Creating Health Check by CloudWatch Alarm
From 07.00.00S404 onward, as part of the HFE auto-healing feature, the HFE reports a custom CloudWatch metric that drives a CloudWatch alarm. You can create a health check based on the alarm state.
To create a health check based on a CloudWatch alarm:
- Go to CloudWatch > Alarms and click Create Alarm.
- On the Select Metric page, go to Custom Metrics and choose HFE.
- Select the InstanceID of the HFE and click Next.
- On the Define Alarm page:
  - Enter a name and description for the alarm.
  - Set HFEState to less than 20, for 1 out of 1 datapoints.
  - Set Period to 1 Minute.
  - Set Statistics to Standard, and select Sample Count in the drop-down menu.
  - Set Treat missing data as to bad (breaching threshold).
  - Set Send notification to as required. If no SNS topic is listed, create one (https://aws.amazon.com/sns/).
- Click Create Alarm.
- Repeat the same procedure to create an alarm for each of the other HFE nodes.
- Create one health check for each CloudWatch alarm. Go to Route53 > Health Checks > Create health check.
- Enter a name for the health check.
- Set What to monitor to State of CloudWatch alarm.
- Select the correct CloudWatch region and the related CloudWatch alarm from the drop-down list.
- Under Health check status, set "When the alarm is in the INSUFFICIENT state" to Status is unhealthy, and clear Invert health check status.
- Click Next.
- Set Create alarm as required, then click Create health check. Repeat the same steps for each remaining CloudWatch alarm.
- After a few minutes, the health check status shows as Healthy.
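The console procedure above corresponds to two API calls: CloudWatch `PutMetricAlarm` and Route 53 `CreateHealthCheck` with a `CLOUDWATCH_METRIC` type. The sketch below builds their keyword arguments so the settings can be reviewed side by side with the steps. The namespace `HFE`, metric `HFEState`, and dimension name `InstanceID` are assumptions read off the console labels above, and the `hfe-state-<instance>` alarm naming scheme is hypothetical; check all of them in your own account.

```python
import uuid
from typing import Optional

# Assumed from the console steps above; verify in your own account.
HFE_NAMESPACE = "HFE"
HFE_METRIC = "HFEState"
HFE_DIMENSION = "InstanceID"


def hfe_alarm_kwargs(instance_id: str, period_s: int = 60,
                     sns_topic_arn: Optional[str] = None) -> dict:
    """Keyword arguments for CloudWatch put_metric_alarm, mirroring the console steps."""
    # Health checks don't support high-resolution alarms, so keep the
    # period at one minute or a multiple of it.
    if period_s < 60 or period_s % 60:
        raise ValueError("Period must be a multiple of 60 seconds")
    kwargs = {
        "AlarmName": f"hfe-state-{instance_id}",  # hypothetical naming scheme
        "AlarmDescription": f"HFEState sample count for {instance_id}",
        "Namespace": HFE_NAMESPACE,
        "MetricName": HFE_METRIC,
        "Dimensions": [{"Name": HFE_DIMENSION, "Value": instance_id}],
        "Statistic": "SampleCount",
        "Period": period_s,
        "EvaluationPeriods": 1,   # 1 out of 1 datapoints
        "DatapointsToAlarm": 1,
        "Threshold": 20.0,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # missing data is treated as bad
    }
    if sns_topic_arn:
        kwargs["AlarmActions"] = [sns_topic_arn]
    return kwargs


def alarm_health_check_kwargs(alarm_name: str, region: str) -> dict:
    """Keyword arguments for Route 53 create_health_check on a CloudWatch alarm."""
    return {
        "CallerReference": str(uuid.uuid4()),  # must be unique per request
        "HealthCheckConfig": {
            "Type": "CLOUDWATCH_METRIC",
            "AlarmIdentifier": {"Region": region, "Name": alarm_name},
            "InsufficientDataHealthStatus": "Unhealthy",
            "Inverted": False,
        },
    }
```

These would be passed as `boto3.client("cloudwatch").put_metric_alarm(**hfe_alarm_kwargs(...))` and `boto3.client("route53").create_health_check(**alarm_health_check_kwargs(...))`, once per HFE. The console's health check name corresponds to a `Name` tag, which can be added with `change_tags_for_resource`.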
Hosted Zone Configurations
- Create a hosted zone:
  - Navigate to Services > Networking & Content Delivery > Route53.
  - Select Hosted zones from the left-side panel.
  - Enter a DNS name (for example, "awsribbon.com").
  - For public hosted zones, the domain name must be registered (this can be done in Route 53 itself; charges apply). For intra-VPC communication during testing, any name can be used.
- Create record sets in the hosted zone:
  - Select the domain name you created and click Go to Record Sets. Ignore the default NS and SOA records if they are not used.
  - Create a new record set. Each record set name can be set to "svt.awsribbon.com".
    Note: For each HFE IP address and its associated health check, create a separate record set with the same name and the same routing policy. For a DNS query on that name (for example, "svt.awsribbon.com"), Route 53 returns all the healthy records in the response. For more information, refer to https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover-simple-configs.html.
  - Set Type to A - IPv4 address. (In the example diagram, the primary record is associated with HFE-1's eth0 IP and the secondary record with HFE-2's eth0 IP.)
  - Keep TTL at a minimum (3 seconds).
  - Enter the HFE IP address that this name maps to. The DNS returns this address when the record is used to generate a response.
  - In the Routing Policy drop-down menu, select Multivalue Answer. This routing policy lets you associate the record set with a health check.
  - Enter a Set ID that is unique among the records with this name.
  - Click the Yes radio button for Associate with Health Check.
  - Enter the name of the associated health check.
  - Click Create. Repeat the same steps for the other HFE.
- A health check is required for every HFE node so that Route 53 can judge each HFE's health.
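The hosted zone and record set steps above can also be expressed as request payloads for Route 53's `CreateHostedZone` and `ChangeResourceRecordSets` APIs. This sketch assumes a private hosted zone (the intra-VPC testing case); the VPC ID, region, HFE addresses, and health check IDs in the usage note are hypothetical placeholders, and the helper names are illustrative only.

```python
import uuid
from typing import List, Tuple


def private_zone_kwargs(domain: str, vpc_id: str, vpc_region: str) -> dict:
    """Keyword arguments for Route 53 create_hosted_zone (private zone for intra-VPC tests)."""
    return {
        "Name": domain,
        "CallerReference": str(uuid.uuid4()),  # must be unique per request
        "HostedZoneConfig": {"Comment": "HFE DNS failover test", "PrivateZone": True},
        "VPC": {"VPCRegion": vpc_region, "VPCId": vpc_id},
    }


def multivalue_record_changes(name: str, hfes: List[Tuple[str, str, str]],
                              ttl: int = 3) -> dict:
    """ChangeBatch creating one multivalue A record per HFE.

    hfes holds (set_id, ip, health_check_id) tuples. Every record shares the
    same name and routing policy; each carries its own health check.
    """
    set_ids = [set_id for set_id, _, _ in hfes]
    if len(set(set_ids)) != len(set_ids):
        raise ValueError("Set IDs must be unique for records with the same name")
    changes = []
    for set_id, ip, health_check_id in hfes:
        changes.append({
            "Action": "CREATE",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",
                "SetIdentifier": set_id,
                "MultiValueAnswer": True,
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip}],
                "HealthCheckId": health_check_id,
            },
        })
    return {"Changes": changes}
```

For example, `multivalue_record_changes("svt.awsribbon.com", [("hfe-1", "10.0.1.10", "<hc-id-1>"), ("hfe-2", "10.0.2.10", "<hc-id-2>")])` produces one batch that would be applied with `boto3.client("route53").change_resource_record_sets(HostedZoneId=zone_id, ChangeBatch=batch)`. Keeping both records in a single batch makes Route 53 apply them together.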
PING Test for DNS Query
Health status of both HFE records is Healthy.
sanayak@wfats2:~$ ping svt.awsribbon.com
PING svt.awsribbon.com (52.7.228.10) 56(84) bytes of data.
sanayak@wfats2:~$ ping svt.awsribbon.com
PING svt.awsribbon.com (18.235.153.112) 56(84) bytes of data.
Tshark Section
39 2.685251895 10.6.40.242 → 10.128.32.49 DNS 77 Standard query 0xcd6c A svt.awsribbon.com
40 2.762479717 10.70.52.162 → 10.6.40.242 TCP 60 55629 → 22 [ACK] Seq=1 Ack=1225 Win=63408 Len=0
41 2.777289076 10.128.32.49 → 10.6.40.242 DNS 109 Standard query response 0xcd6c A svt.awsribbon.com A 52.7.228.10 A 18.235.153.112
Health status of one of the HFE records is Unhealthy.
sanayak@wfats2:~$ ping svt.awsribbon.com
PING svt.awsribbon.com (18.235.153.112) 56(84) bytes of data.
Tshark Section
70 1.846439883 10.6.40.242 → 10.128.32.49 DNS 77 Standard query 0x386d A svt.awsribbon.com
71 1.866576942 10.128.32.49 → 10.6.40.242 DNS 93 Standard query response 0x386d A svt.awsribbon.com A 18.235.153.112
Health status of all HFE records is Unhealthy.
When all HFE statuses are unhealthy, Route53 returns all the records.
[root@ip-172-31-10-120 ~]# ping svt.awsribbon.com
PING svt.awsribbon.com (18.130.168.161) 56(84) bytes of data.
Tshark Section
30 1.884934 172.31.10.120 -> 172.31.0.2 DNS 77 Standard query 0x305d A svt.awsribbon.com
31 1.885306 172.31.0.2 -> 172.31.10.120 DNS 125 Standard query response 0x305d A 18.130.168.161 A 34.232.25.65 A 35.170.132.125
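To check which HFE addresses each response carried without reading every capture by eye, a small parser over the tshark summary lines is enough. This is a hypothetical helper (`answers_from_tshark` is not part of tshark or any AWS tool) that assumes the one-line-per-packet summary format shown in these captures and only extracts IPv4 A records.

```python
import re
from typing import List, Optional, Tuple

# Transaction ID that follows "Standard query response" in a tshark summary line.
_RESPONSE = re.compile(r"Standard query response (0x[0-9a-fA-F]+)")
# An answer of the form "A <IPv4 address>"; "A <hostname>" (the question) is skipped.
_A_RECORD = re.compile(r"\bA (\d{1,3}(?:\.\d{1,3}){3})\b")


def answers_from_tshark(line: str) -> Optional[Tuple[str, List[str]]]:
    """Return (transaction_id, [A-record IPs]) for a DNS response line, else None."""
    match = _RESPONSE.search(line)
    if match is None:
        return None  # a DNS query, or not a DNS line at all
    tail = line[match.end():]
    return match.group(1), _A_RECORD.findall(tail)


line = ("41 2.777289076 10.128.32.49 → 10.6.40.242 DNS 109 Standard query response "
        "0xcd6c A svt.awsribbon.com A 52.7.228.10 A 18.235.153.112")
print(answers_from_tshark(line))  # ('0xcd6c', ['52.7.228.10', '18.235.153.112'])
```

Comparing the returned list against the health check status of each HFE record shows directly whether Route 53 dropped the unhealthy record or, when every record was unhealthy, returned them all.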
- When an HFE instance goes down, it takes 2-3 minutes for the health check status to change from Healthy to Unhealthy.
- When an HFE instance comes up, it takes 2-4 minutes for the health check status to change from Unhealthy to Healthy (this includes the time for the instance to come up, the alarm to clear, and the health check status to change).
- The CloudWatch alarm Period must be at least 1 minute, because Route 53 health checks do not support high-resolution alarms.