IP Adress Failover for highly available AWS Services

IP Adress Failover for highly available AWS Services

Abstract

This documents describes the implementation of high available failover services for applications, which rely on IP address based communication only.
It shows how to configure IP addresses in a Virtual Private Network (VPC), which will route network traffic to a node A or an alternate node B as needed. The document describes how to change the setup of a VPC in case of a failure of node A. The document shows step by step how to automatically assign a service IP address to a standby node B when needed.
The document outlines two different technologies to achieve the same purpose. This allows the implementer to pick the technology that is most suitable for a given infrastructure and the switch over requirements.

Introduction

High available failover architectures are based on a concept where consumers reach a service provider A through a network connection. The core idea is to reroute the consumer traffic to a standby service B when the initial service A fails to provide a given service.
Amazon Web Services provides many building blocks to achieve the purpose to failover network consumers to a new network service. The following solutions are commonly used, they are however not subject of this document:

  • AWS Elastic Load Balancers (ELB) in conjunction with AWS Autoscaling allows rerouting traffic to other service providers when needed. AWS Elastic Load Balancers support many other protocols beyond http and https. They may however not work with some proprietary and legacy protocols.
  • Domain Name Service (DNS) failover with AWS Route 53 : This approach allows redirecting network consumers which lookup services by name to get redirected to a different network service provider. This concept works well with software solutions, which use name resolution to reach their service provider. Some legacy applications rely however on a communication through IP addresses only.

This paper focuses on network consumers which need to reach a service through a given, fixed IP address.

The two solutions work for any protocol. Both solutions require that a given network consumer can reconnect to the same IP address if the original service hangs and times out.

Amazon Web Services (AWS) offers two solutions to failover IP addresses, which should be chosen, based on the network and high availability requirements.

The first solution is based on the fact that AWS manages IP addresses as separate build blocks with the name “Elastic Network Interface” (ENI). Such an ENI hosts an IP address and it can be attached and detached on the fly from an EC2 instance. This allows redirecting the traffic to such an IP address by detaching and re-attaching ENIs to EC2 instances. The limitation of this concept is that it is limited to a single availability zone (AZ).

An IP address has to be part of a subnet. And a subnet has to be assigned to a given AZ. Attaching the same ENI to instances in two different AZs isn’t possible since a given IP address can belong to one subnet only.

This limitation may not be important for some high availability solutions. There may be however the need to leverage the key features of availability zones by running failover instances in two different availability zones.

The second solution is overcoming this limitation with Overlay IP addresses. AWS allows creating routing tables in a VPC which route any traffic for an IP address to an instance no matter where it is in the VPC. These IP addresses are called Overlay IP addresses.

The Overlay IP address can route traffic to instances in different availability zones. This comes with the challenge that the Overlay IP address has to be an IP address that isn’t part of the VPC. The general routing rules wouldn’t work otherwise.

On premises network consumers like desktops who try to access such an IP address have to be routed to the AWS VPC knowing that the Overlay IP address is not part of the regular subnet of the VPC. This leads to the extra effort to have to route on premises consumers to the AWS VPC with an additional subnet which isn’t part of the VPC itself.

Stefan Schneider Wed, 02/17/2016 - 11:04

IP Failover through Reassigning Network Interfaces

IP Failover through Reassigning Network Interfaces

This concept is based on the fact that multiple EC2 instances are reachable through their standard network interface (eth0) and an additional IP address that is used to provide the high available service. The high available service IP address has to belong to a different subnet. This IP address gets modeled with an AWS Elastic Network Interface (ENI). The ENI gets detached from an EC2 instance when the instance fails, it gets then attached to the EC2 instance which is supposed to take a service over.

ENI address reassignment The Elastic Network Interface with the name eni-x is currently attached to instance i-a and it can be attached on demand to instance i-b. The highly available service is provided through subnet A. The two service providing instances i-a and i-b can be reached through subnet B with their standard network interfaces (eth0).

This architecture requires at least two subnets:

  • A service provider subnet (here A)
  • A subnet which allows access to the normal (eth0) interfaces of the two instances (here B)

It takes the following steps to make such a failover scenario work

  1. Create and configure an ENI
  2. Enable the Linux instances to accept traffic and to send traffic back through the dynamically attached ENI
  3. Create policies for the two instances to allow them pull over the dynamic IP address
  4. Execute the appropriate AWS commands to detach and reattach the ENI

Create and configure an Elastic Network Interface (ENI)

Prerequisites to create an Elastic Network Interface are to have the following information:

  • Name of the AWS subnet in which the IP address fits
  • IP addresses which matches the CIDR of the AWS subnet
  • A security group for the ENI, which allows the required service, protocols to pass. The allowed protocols to pass are typically a subset of the protocols one would use for the primary interface in a non-high-available configuration.
Create ENI Interface

The creation can be done manually using the AWS console. You have to choose the EC2 console. Select the entry “Network Interfaces” in the left column. Click on “Create Network Interface” and you’ll see a dialog that looks like the one to the left.

 

Enter the required information. The ENI will be created. It’ll have a unique AWS-identifier in the form eni-XYZ,

The alternative is to use the command line with the AWS-CLI tools . The equivalent AWS-CLI command is:

create-network-interface --subnet-id <value>[--description <value>] [--private-ip-address <value>]

Network Configuration for the Linux Instances

Linux instances have to learn that they have to return the network traffic through the new network interface (eth1) once it is attached. It takes a number of instance specific routing changes once the interface gets attached. It’s important to undo these routing changes after the secondary network interface (eth1) gets detached.

The scripts below work for SLES 11 SP3 instances other Linux distributions will need the routing entries to be performed in different network configuration scripts.

/etc/sysconfig/network/scripts/ifup.local.eth1

The script below needs to be adopted by replacing the two following variables, which are printed in bold letters

  • DEFAULT-SUBNET-CIDR: This would be 10.1.0.0/8 according to the network diagram above
  • DEFAULT-ROUTER: The default router for the default subnet. This would be 10.1.0.1 for the network diagram from above:
#!/bin/bash
if [ "$1" = 'eth1' ]
then
ip route flush table MYHA
ip rule add from DEFAULT-SUBNET-CIDR table MYHA priority 100
ip route add default via DEFAULT-ROUTER dev eth1 table MYHA
fi

This script needs to be executable. A root user will have to perform this command to achieve this:

chmod +x /etc/sysconfig/network/scripts/ifup.local.eth1

/etc/sysconfig/network/scripts/ifdown.local.eth1

The script below needs to be adopted a replacing the variable shown in bold letters

• DEFAULT-SUBNET-CIDR: This would be 10.1.0.0/8 according to the network diagram above

#!/bin/bash
if [ "$1" = 'eth1' ]
        then
        ip route flush table MYHA
        ip rule del from DEFAULT-SUBNET-CIDR table MYHA priority 100 fi

This script needs to be executable. A root user will have to perform this command to achieve this:

chmod +x /etc/sysconfig/network/scripts/ifdown.local.eth1

Linking the Scripts to the right Directories

The scripts above need to be found by the help of soft links which have to be created by a root user the following way:

cd /etc/sysconfig/network/if-down.d
ln –s ../scripts/ifdown.local.eth1 ifdown.local.eth1
cd /etc/sysconfig/network/if-up.d
ln –s ../scripts/ifup.local.eth1 ifup.local.eth1

Adding an additional Routing Table

The scripts from this section will need an additional routing table. This table can be declared with the following command getting executed by a root user:

echo "100 MYHA" >> /etc/iproute2/rt_tables

Policies needed to Detach and Attach ENIs to EC2 Instances

It’s common that two highly available nodes monitor the other one and take action when the monitored node fails. It takes the following policy to enable a node to perform the required AWS configuration change. Attach this policy to all nodes which are supposed to change the network configuration:

{ 
  "Statement": [
      {
       "Sid": "Stmt1346888659253", 
       "Action": [
           "ec2:AttachNetworkInterface",
           "ec2:DescribeInstances",
           "ec2:DescribeInstanceStatus",
           "ec2:DescribeNetworkInterfaces",
           "ec2:DetachNetworkInterface",
           "ec2:DescribeNetworkInterfaceAttribute",
         ], 
         "Effect": "Allow",
         "Resource": [ 
             "*" 
         ] 
      } 
    ] 
}

Scripts to detach and reattach an ENI

The script getInterface.sh below is an example of how the AWS-CLI can be used to dynamically attach an Elastic Network Interface. It requires the dynamic IP to be entered as first command line parameter. The second command line parameter is the primary IP address of the system, which will then host the dynamic address.

The script

  1. Identifies the name of the ENI by using it’s IP address
  2. it determines the EC2 instance from which the ENI needs to be detached
  3. it detaches the ENI
  4. it waits until the operation has completed
  5. it attaches the ENI to the second system once it’s available
#!/bin/bash
# This scripts detaches as secondary interface from an instance.
# It then attaches the interface to the instance where it has been executed
# 
# Command line parameter
#=======================
# First parameter: IP address which needs to be detached and moved to a
# different instance 
echo "Move IP adress: $1 to system with primary IP address $2" 
INTERFACE=`ec2-describe-network-interfaces | grep $1 | \
awk /NETWORKINTERFACE/'{print $2 }'` 
TONODE=`curl -silent http://169.254.169.254/latest/meta-data/instance-id` 
echo "move eni: $INTERFACE to instance id: $TONODE " 
DETACH=`aws ec2 describe-network-interfaces --network-interface-ids $INTERFACE | \
awk /ATTACHMENT/'{print $3 }'` INTERFACESTATUS=`aws ec2 describe-network-interfaces --network-interface-ids $INTERFACE | \
awk -F"\t" /NETWORKINTERFACE/'{print $10 }'`
echo "$DETACH to be detached. Current interface status is $INTERFACESTATUS"
aws ec2 detach-network-interface --attachment-id $DETACH --force
echo "Command to detach Interface $INTERFACE submitted"
while [ "$INTERFACESTATUS" = 'in-use' ]
do
 echo "Will sleep 1 second"
 sleep 1
 INTERFACESTATUS=`aws ec2 describe-network-interfaces --network-interface-ids $INTERFACE | \
 awk -F"\t" /NETWORKINTERFACE/'{print $10 }'`
Done
echo "Will attach interface $INTERFACE to $TONODE "
aws ec2 attach-network-interface --instance-id $TONODE --network-interface-id $INTERFACE

 More Resources

 

Stefan Schneider Wed, 02/17/2016 - 11:14

Anonymous (not verified)

Wed, 07/26/2017 - 15:55

Thanks for providing such a great resource of knowledge! I'd like to understand - what is the difference between IP Address failover by the method of attaching/re-attaching an ENI (eth1) versus following the private-IP-reassignment approach in the article here: AWS article: Leveraging Multiple IP Addresses for Virtual IP Address Fail-over in 6 Simple Steps

It seems that using private-IP-address-reassignment would not require changes to the Linux instance network config scripts and be a simpler solution? But are there advantages/limitations to consider between the 2 approaches where a=ENI re-assignment, and b=secondary IP address reassignment?

Many thanks for your input!

Keep in mind that the AWS networking infrastructure needs to know that it will send traffic to a given IP address. I personally prefer "ENI reattach" since the IP address is only once in the routing system. Updating secondary IP addresses may theoretically lead to duplicate IP addresses. I guess it won't matter...

IP Failover with Overlay IP Addresses

IP Failover with Overlay IP Addresses

 AWS networking allows creating routing entries for routing tables, which direct all traffic for an IP address to an EC2 instance. This concept allows directing the traffic to any instance in a Virtual Private Network (VPC) no matter which subnet it is in and no matter which availability zone (AZ) it is in. Changing this routing entry for the subnets in a given VPC allows redirecting traffic when needed. This concept is known as “IP Overlay” routing in AWS. It is normally being used in a static way for routers and Network Address Translation (NAT) instances. Overlay IP routing can however be used in a dynamic fashion.

diagrams with overlay IP address

The diagram in Figure X shows a network topology in which this concept can get used. Two instances named node1 (EC2 instance i-a) and node2 (EC2 instance i-b) are connected to two different subnets. The two subnets are assigned to the same VPC in two different Availability Zones (AZ). It is not mandatory that both nodes are located in different availalibility zones and subnets, it’s however desirable in many cases. Failover nodes in high availability architectures should be independent of common failure root causes.

Both nodes are part of the same Virtual Private Network (VPC). Both subnets share the same routing table named rtb_A.

The idea is to route traffic from on premises consumers or consumers from within the VPC to the IP address 10.2.0.1 in this case. It’s important that the IP address is outside of the Classless Inter-Domain Routing (CIDR) block of the VPC.

It takes 4 steps to route traffic through an Overlay IP address to EC2 node1 or node2

  1. Create a routing entry in the routing table which sends the traffic to the EC2 instance in question
  2.  Disable the source/destination check for the network traffic to the two instances in the EC2 network management. The AWS network doesn’t by default send network packets to instances which don’t match the normal routing entries
  3. Enable the operating system of the EC2 instances to accept these packets
  4. The two EC2 instances are likely to monitor each other. They are likely to initiate the routing change when needed. The EC2 instances require policies in the IAM roles which authorize them make these changes in the routing table

Creating and managing the routing Entries

The AWS command line interface (AWS-CLI) allows creating such a route with the command:

aws ec2 create-route --route-table-id ROUTE_TABLE --destination-cidr-block CIDR --instance-id INSTANCE

Where as ROUTE_TABLE is the identifier of the routing table which needs to me modified. CIDR is an IP address with the filter. INSTANCE is the node to which the traffic gets directed.

Once the route exists it can be changed whenever traffic is supposed to be routed to a different node with the command:

aws ec2 replace-route --route-table-id ROUTE_TABLE --destination-cidr-block CIDR --instance-id INSTANCE

There are chances if there is a need to delete such a route entry. This happens with the command:

aws ec2 delete-route --route-table-id ROUTE_TABLE --destination-cidr-block CIDR 

It may be as well important to check for the current status of the routing table. A routing table can be checked with this command:

aws ec2 describe-route-tables --route-table-ids ROUTE_TABLE

The output will list all routing entries. The user will have to filter out the line with the CIDR in question.

Disable the Source/Destination Check for the Failover Instances

AWS console, change source destination check

The source/destination check can be disabled through the EC2 console. It takes the execution of the following pull down menu in the console for both EC2 instances (see left).

The same operation can be performed through scripts using the AWS command line interface (AWS-CLI). The following command needs to be executed one time for both instances, which are supposed to receive traffic from the Overlay IP address:

 ec2-modify-instance-attribute EC2-INSTANCE --source-dest-check false

The system on which this command gets executed needs temporarily a role with the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1424870324000",
      "Effect": "Allow",
      "Action": [ "ec2:ModifyInstanceAttribute"],
      "Resource": [
       "arn:aws:ec2:REGION:ACCOUNT-ID:instance/INSTANCE-A",
       "arn:aws:ec2:REGION:ACCOUNT-ID:instance/INSTANCE-B"
      ]
    }
  ]
}

Replace the individual parameters (bold letters) for the region, the account identifier and the two identifiers for the EC2 instances with the placeholders in bold letters.

Configure the Network Interfaces to receive the Network Traffic of the Overlay IP Address

Linux systems need the overlay IP addresses to be configured as secondary IP address on their standard interface eth0. This can be achieved by the command:

ip address add OVERLAY-IPD/CIDR dev eth0:1

The tools to make the secondary IP address permanent vary across the Linux distributions. Please use the individual documentation to lookup the commands.

Enable the Instances to change the Routes

Switching routes from node to node typically happens in failover cluster. Failover clusters with two nodes monitor each other and take action when the other node doesn’t seem to be alive anymore. The following policy has to be applied to the EC2 instances, which are supposed to monitor each other and be allowed to switch the route when needed:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1424870324000",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceAttribute",
                "ec2:DescribeTags"
             ],
            "Resource": "*"
        },
        {
            "Sid": "Stmt1424860166260",
            "Action": [
                "ec2:CreateRoute",
                "ec2:DeleteRoute",
                "ec2:DescribeRouteTables",
                "ec2:ReplaceRoute"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:ec2:region-name:account-id:route-table/rtb-XYZ"
       } 
     ]
}

Replace the following variables with the appropriate names:

  • region-name : the name of the AWS region
  • account-id : The name of the AWS account in which the policy is getting used
  • rtb-XYZ : The identifier of the routing table which needs to be updated
Stefan Schneider Wed, 02/17/2016 - 11:50

Anonymous (not verified)

Thu, 09/15/2016 - 03:32

Hi,

First I'd like to say, you blog is fantastic. We've implemented SAPHANASR configuration out lined, and the Overlay IP seems to work fine only when working within AWS. (subnet to subnet)

However, we're having issues directing traffic from our main site over a site to site VPN which leverages the AWS VPN Gateway. Was this part successfully implemented? Anything special had to be done on the main site other that implementing a route to the Overlay IP?

I'm asking to know if this is an issue on our side?

Thanks,

Philippe

Yes,
overlay IP addresses can be reached from inside the VPN only.
You will want to run your users like application servers inside the VPN.
You will need a proxy like an ELB/ALB or a SAP GUI router if you want to reach the Overlay IP address from outside of the VPC.

Hi Stefan,

Thanks for a wonderful explanation.However i am not still able to understand how ELN/ALB will help n forwarding the traffic to the Overlay IP address.Since ELB has no understanding of the OverlayIP address which is plumbed inside the instance and no where mapped in the (just in the route tables).
So how will ELB/ALB will forward traffic to the Overlay IP address.

Hi,

this page is based on my daily work.

ELBs are great to load balance http and https protocols.
ELBs are great for web facing applications.

I'm working with legacy ERP applications which use lasting TCP connections and my users want to use everything in a VPC and intranet setup.
My GUI users may be idle for more than an hour (see: http://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-i…).

Anonymous (not verified)

Mon, 07/29/2019 - 17:39

In reply to by Stefan Schneider

Hi Stefan

Was there any response for the question asked above?

I'm pasting the question again here -

Thanks for a wonderful explanation.However i am not still able to understand how ELN/ALB will help in forwarding the traffic to the Overlay IP address.Since ELB has no understanding of the OverlayIP address which is plumbed inside the instance and no where mapped in the (just in the route tables).
So how will ELB/ALB will forward traffic to the Overlay IP address.

An ELB and an Overlay IP address means applying two solutions for one problem. You can use an ELB exclusively. The ELB may end an IP connection after an hour.You will have to make sure that only one target is getting configured at a time. Please touch base with your SAP savy architect from AWS to discuss this problem. There's a lot of innovation at AWS.

Anonymous (not verified)

Fri, 05/25/2018 - 00:47

Hi,

Have a question, how a subnet CIRD could be 10.1.0.0/8 when VPC is 10.1.0.0/16 (lower number of IPs)

Stefan Schneider

Thu, 05/31/2018 - 20:07

In reply to by Anonymous (not verified)

Correct observation. This is a typo in the diagram. I'll fix it.

Stalin (not verified)

Sat, 11/07/2020 - 02:42

Great documentation!!

I just had one real quick question, will this Overlay IP get relased upon a reboot? Just in case we want to applying monthly paches on these cluster nodes. If yes, what is best way to manage it.

Best,
Stalin

jayanth (not verified)

Thu, 11/12/2020 - 18:14

How will i communicate overlay ip, if the overlay ip is from different VPC

Stefan Schneider

Thu, 11/26/2020 - 10:16

In reply to by jayanth (not verified)

Any IP address within the VPC will be routed to the EC2 instance behind. This happens because they (hopefully) use the same routing table. You may have to maintain multiple routing tables if needed. An overlay IP address can't be reached from outside the VPC. The routing won't work :-( .