Using all IPv6 addresses on Elastic Network Interfaces in EC2 instances
Each ENI comes with a limited number of IPv4 and IPv6 addresses (current numbers are here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#AvailableIpPerENI), but what happens if for some reason we need more than a single ENI can support? The answer is to use multiple ENIs attached to the same instance. Whilst this works out of the box for IPv4, IPv6 requires some further setup.
The problem
AWS networking is unlike any "regular" network infrastructure. For example, in its default configuration, multiple ENIs on the same EC2 instance can be connected to the same subnet (for both IPv4 and IPv6). A sample instance might have the following configuration (when DHCPv6 and RA are enabled):
# ip -6 a sh dev ens5
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
inet6 2406:da1c:150:2e01::a0/128 scope global dynamic noprefixroute
valid_lft 383sec preferred_lft 83sec
inet6 2406:da1c:150:2e01::a3/128 scope global dynamic noprefixroute
valid_lft 383sec preferred_lft 83sec
inet6 2406:da1c:150:2e01::a2/128 scope global dynamic noprefixroute
valid_lft 383sec preferred_lft 83sec
inet6 2406:da1c:150:2e01::a1/128 scope global dynamic noprefixroute
valid_lft 383sec preferred_lft 83sec
inet6 fe80::57:76ff:fec9:bd8c/64 scope link
valid_lft forever preferred_lft forever
# ip -6 a sh dev ens6
3: ens6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
inet6 2406:da1c:150:2e01::a7/128 scope global dynamic noprefixroute
valid_lft 392sec preferred_lft 92sec
inet6 2406:da1c:150:2e01::a6/128 scope global dynamic noprefixroute
valid_lft 392sec preferred_lft 92sec
inet6 2406:da1c:150:2e01::a5/128 scope global dynamic noprefixroute
valid_lft 392sec preferred_lft 92sec
inet6 2406:da1c:150:2e01::a4/128 scope global dynamic noprefixroute
valid_lft 392sec preferred_lft 92sec
inet6 fe80::8:b3ff:fefa:f10e/64 scope link
valid_lft forever preferred_lft forever
The first obvious thing to notice is that the global IPv6 addresses are all /128s, and only the link-local (scope link) ones are part of a network.
Let's have a look at routing:
# ip -6 r sh
::1 dev lo proto kernel metric 256 pref medium
2406:da1c:150:2e01::/64 dev ens5 proto ra metric 100 pref medium
2406:da1c:150:2e01::/64 dev ens6 proto ra metric 200 pref medium
blackhole fd00::7:7d80/122 dev lo proto bird metric 1024 pref medium
fe80::/64 dev ens6 proto kernel metric 256 pref medium
fe80::/64 dev ens5 proto kernel metric 256 pref medium
default via fe80::b:f3ff:fea0:fbe dev ens5 proto ra metric 100 expires 1799sec pref medium
default via fe80::b:f3ff:fea0:fbe dev ens6 proto ra metric 200 expires 1799sec pref medium
The odd thing here is that we have two default routes with different metrics. In practical terms, that means all outbound traffic leaves via ens5, the route with the lower metric.
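To make the "lowest metric wins" point concrete, here is a minimal sketch that reads the default routes and prints the interface the kernel would prefer. The function name preferred_default_dev is made up for illustration, and the parsing assumes the `ip -6 route show default` output format shown above.

```shell
# Hypothetical helper: read `ip -6 route show default` output on stdin and
# print the device of the default route with the lowest metric.
preferred_default_dev() {
    awk '
        $1 == "default" {
            dev = ""; metric = ""
            # Walk the key/value pairs that follow "default".
            for (i = 2; i < NF; i++) {
                if ($i == "dev")    dev = $(i + 1)
                if ($i == "metric") metric = $(i + 1)
            }
            if (dev != "" && metric != "" && (!found || metric + 0 < best)) {
                found = 1; best = metric + 0; best_dev = dev
            }
        }
        END { if (found) print best_dev }
    '
}

# Usage on a live system (not executed here):
#   ip -6 route show default | preferred_default_dev
```

With the two default routes from the listing above as input, this should pick ens5, which matches what the packet captures show.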
So let's debug some packets to see how they flow, first to an address on the ens5 interface (pinging from a host outside of AWS):
# tcpdump -ni ens5 icmp6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
12:11:48.850639 IP6 2406:e001:3:270e::2 > 2406:da1c:150:2e01::a1: ICMP6, echo request, seq 0, length 16
12:11:48.850692 IP6 2406:da1c:150:2e01::a1 > 2406:e001:3:270e::2: ICMP6, echo reply, seq 0, length 16
Nothing unusual to see here - packets enter and leave as expected.
What about an IP assigned to the ens6 interface?
# tcpdump -ni ens6 icmp6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens6, link-type EN10MB (Ethernet), capture size 262144 bytes
12:16:03.061318 IP6 2406:e001:3:270e::2 > 2406:da1c:150:2e01::a7: ICMP6, echo request, seq 11, length 16
We can only see the ICMPv6 request, but not the response. That's because the response is leaving via ens5:
# tcpdump -ni ens5 icmp6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
12:17:13.304446 IP6 2406:da1c:150:2e01::a7 > 2406:e001:3:270e::2: ICMP6, echo reply, seq 81, length 16
However, the host we're pinging from never receives the response:
ping6 2406:da1c:150:2e01::a7
PING6(56=40+8+8 bytes) 2406:e001:3:270e::2 --> 2406:da1c:150:2e01::a7
^C
--- 2406:da1c:150:2e01::a7 ping6 statistics ---
100 packets transmitted, 0 packets received, 100.0% packet loss
What if we delete the default route via ens5?
ip -6 r d default via fe80::b:f3ff:fea0:fbe dev ens5
Now we can ping the second IP:
ping6 2406:da1c:150:2e01::a7
PING6(56=40+8+8 bytes) 2406:e001:3:270e::2 --> 2406:da1c:150:2e01::a7
16 bytes from 2406:da1c:150:2e01::a7, icmp_seq=0 hlim=47 time=33.753 ms
16 bytes from 2406:da1c:150:2e01::a7, icmp_seq=1 hlim=47 time=34.124 ms
^C
--- 2406:da1c:150:2e01::a7 ping6 statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/std-dev = 33.753/33.939/34.124/0.185 ms
However, the first one is no longer reachable. Also, that default route comes back quite quickly with the next RA packet, which breaks the connectivity again. This happens even with source-destination-check disabled on the ENI.
The solution
So the problem is that response traffic has to use a different egress interface depending on which IPv6 address it originates from. Luckily, this can be easily solved using the standard Linux iproute2 package.
Linux has a concept of multiple routing tables, but by default only a few of them are used. On Ubuntu, their names are stored in /etc/iproute2/rt_tables; that file provides a mapping between human-readable names and numeric table IDs:
#
# reserved values
#
255 local
254 main
253 default
0 unspec
#
# local
#
#1 inr.ruhep
(The mapping is optional: you can refer to a table by its number instead.)
To determine which table should be used for each packet, the Linux kernel inspects the rules. Again, by default they're fairly simple:
# ip -6 rule sh
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
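Rules are walked in ascending priority order, and for each matching rule the kernel consults the named table until one of them yields a route. A rough sketch of that ordering: the helper below (tables_for_source, a made-up name) reads `ip -6 rule show` output and lists the tables that would be consulted for a given source address. It only understands "from all" and exact host matches, which is a simplification of what the kernel really supports.

```shell
# Hypothetical helper: given `ip -6 rule show` output on stdin, print the
# tables consulted, in priority order, for a packet with source address $1.
# Simplification: only "from all" and exact host matches are recognised.
tables_for_source() {
    awk -v src="$1" '
        $2 == "from" && ($3 == "all" || $3 == src || $3 == src "/128") {
            for (i = 4; i <= NF; i++)
                if ($(i - 1) == "lookup") { printf "%s%s", sep, $i; sep = " " }
        }
        END { if (sep != "") print "" }
    '
}

# Usage on a live system (not executed here):
#   ip -6 rule show | tables_for_source 2406:da1c:150:2e01::a7
```

With only the three default rules above, every source address ends up with the same answer (local, main, default), which is exactly why replies from ens6 addresses follow the main table's default route out of ens5.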
Routes for each of those tables can be inspected:
# ip -6 r sh table local
local ::1 dev lo proto kernel metric 0 pref medium
local 2406:da1c:150:2e01::a0 dev ens5 proto kernel metric 0 pref medium
local 2406:da1c:150:2e01::a1 dev ens5 proto kernel metric 0 pref medium
local 2406:da1c:150:2e01::a2 dev ens5 proto kernel metric 0 pref medium
local 2406:da1c:150:2e01::a3 dev ens5 proto kernel metric 0 pref medium
local 2406:da1c:150:2e01::a4 dev ens6 proto kernel metric 0 pref medium
local 2406:da1c:150:2e01::a5 dev ens6 proto kernel metric 0 pref medium
local 2406:da1c:150:2e01::a6 dev ens6 proto kernel metric 0 pref medium
local 2406:da1c:150:2e01::a7 dev ens6 proto kernel metric 0 pref medium
anycast fe80:: dev ens5 proto kernel metric 0 pref medium
anycast fe80:: dev ens6 proto kernel metric 0 pref medium
local fe80::8:b3ff:fefa:f10e dev ens6 proto kernel metric 0 pref medium
local fe80::57:76ff:fec9:bd8c dev ens5 proto kernel metric 0 pref medium
multicast ff00::/8 dev ens6 proto kernel metric 256 pref medium
multicast ff00::/8 dev ens5 proto kernel metric 256 pref medium
So, to put it all together, we need:
- 2 new route tables (let's call them ens5 and ens6), each with a default route pointing down its own interface
- A number of rules to tell the kernel which packets should be subject to that special routing
Creating route tables
The next hop for IPv6 traffic always seems to be the same fe80::b:f3ff:fea0:fbe address, regardless of the actual network.
So first let's create entries in /etc/iproute2/rt_tables:
5 ens5
6 ens6
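If this step is scripted rather than edited by hand, it helps to make it safe to run more than once. A minimal sketch, assuming the file layout shown earlier; ensure_rt_table is a hypothetical helper name, and the optional third argument exists only so the function can be tried against a scratch file.

```shell
# Hypothetical helper: append "ID NAME" to an rt_tables file unless a
# non-comment line already uses that ID or that NAME.
ensure_rt_table() {
    id=$1
    name=$2
    file=${3:-/etc/iproute2/rt_tables}
    if [ -f "$file" ] && awk -v id="$id" -v name="$name" '
            /^[[:space:]]*#/ { next }            # skip comment lines
            $1 == id || $2 == name { found = 1 }
            END { exit !found }
        ' "$file"; then
        return 0                                  # already present
    fi
    printf '%s\t%s\n' "$id" "$name" >> "$file"
}

# Usage (as root, not executed here):
#   ensure_rt_table 5 ens5
#   ensure_rt_table 6 ens6
```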
Next, populate the route tables:
ip -6 route add default via fe80::b:f3ff:fea0:fbe dev ens6 table ens6
ip -6 route add default via fe80::b:f3ff:fea0:fbe dev ens5 table ens5
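Rather than hard-coding the gateway, the same commands can be derived from the RA-learned default routes. The sketch below only prints the commands so they can be reviewed first. emit_table_routes is a made-up name, it assumes each table is named after its interface (as in the rt_tables entries above), and it uses `replace` instead of `add` so that re-running it does not fail on an existing route.

```shell
# Hypothetical helper: read `ip -6 route show default` output on stdin and
# print one per-interface "default route" command for each entry, using the
# interface name as the table name.
emit_table_routes() {
    awk '
        $1 == "default" && $2 == "via" {
            gw = $3; dev = ""
            for (i = 4; i < NF; i++)
                if ($i == "dev") dev = $(i + 1)
            if (dev != "")
                printf "ip -6 route replace default via %s dev %s table %s\n", gw, dev, dev
        }
    '
}

# Usage on a live system (not executed here); review the output, then pipe
# it to sh as root:
#   ip -6 route show default | emit_table_routes
```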
Finally, add the source-based rules:
ip -6 rule add from 2406:da1c:150:2e01::a7/128 table ens6
ip -6 rule add from 2406:da1c:150:2e01::a6/128 table ens6
ip -6 rule add from 2406:da1c:150:2e01::a5/128 table ens6
ip -6 rule add from 2406:da1c:150:2e01::a4/128 table ens6
ip -6 rule add from 2406:da1c:150:2e01::a0/128 table ens5
ip -6 rule add from 2406:da1c:150:2e01::a3/128 table ens5
ip -6 rule add from 2406:da1c:150:2e01::a2/128 table ens5
ip -6 rule add from 2406:da1c:150:2e01::a1/128 table ens5
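Typing one rule per address gets tedious, so here is a sketch that generates them from the addresses actually present on the instance. It parses the one-line-per-address output of `ip -6 -o addr show`, keeps only the global-scope addresses, and again assumes the table is named after the interface. emit_source_rules is a hypothetical name.

```shell
# Hypothetical helper: read `ip -6 -o addr show` output on stdin and print a
# source-based rule for every global-scope address, pointing at the table
# named after the interface that holds the address.
emit_source_rules() {
    awk '
        $3 == "inet6" {
            scope = ""
            for (i = 5; i < NF; i++)
                if ($i == "scope") scope = $(i + 1)
            if (scope == "global") {
                split($4, a, "/")        # drop the prefix length
                printf "ip -6 rule add from %s/128 table %s\n", a[1], $2
            }
        }
    '
}

# Usage on a live system (not executed here); review the output, then pipe
# it to sh as root:
#   ip -6 -o addr show | emit_source_rules
```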
Now, we can reach IPs on both interfaces from outside:
# ping6 -c 3 2406:da1c:150:2e01::a7
PING6(56=40+8+8 bytes) 2406:e001:3:270e::2 --> 2406:da1c:150:2e01::a7
16 bytes from 2406:da1c:150:2e01::a7, icmp_seq=0 hlim=47 time=37.032 ms
16 bytes from 2406:da1c:150:2e01::a7, icmp_seq=1 hlim=47 time=34.345 ms
16 bytes from 2406:da1c:150:2e01::a7, icmp_seq=2 hlim=47 time=34.597 ms
--- 2406:da1c:150:2e01::a7 ping6 statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 34.345/35.325/37.032/1.212 ms
# ping6 -c 3 2406:da1c:150:2e01::a1
PING6(56=40+8+8 bytes) 2406:e001:3:270e::2 --> 2406:da1c:150:2e01::a1
16 bytes from 2406:da1c:150:2e01::a1, icmp_seq=0 hlim=47 time=33.583 ms
16 bytes from 2406:da1c:150:2e01::a1, icmp_seq=1 hlim=47 time=35.032 ms
16 bytes from 2406:da1c:150:2e01::a1, icmp_seq=2 hlim=47 time=35.264 ms
--- 2406:da1c:150:2e01::a1 ping6 statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 33.583/34.626/35.264/0.744 ms
This configuration can be added to rc.local, for example, to make sure it's applied every time the instance starts.
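As a rough idea of what such a script could look like, here is a sketch using the gateway and addresses from this article. The RUN switch and the function names are invented for illustration: by default the script only prints the commands, and it assumes the ens5 and ens6 entries already exist in /etc/iproute2/rt_tables and that the addresses have been assigned by the time it runs.

```shell
#!/bin/sh
# Sketch of an rc.local-style script. GATEWAY and the address lists are the
# values from this article; adjust them for your instance.
# RUN is a hypothetical switch: the default "echo" prints each command,
# an empty RUN executes it (root required).
RUN=${RUN-echo}
GATEWAY=fe80::b:f3ff:fea0:fbe

# setup_iface DEV ADDR...: default route in table DEV, plus one source rule
# per address.
setup_iface() {
    dev=$1
    shift
    # "replace" creates or updates the route, so re-running is harmless.
    $RUN ip -6 route replace default via "$GATEWAY" dev "$dev" table "$dev"
    for addr in "$@"; do
        # Drop any stale copy of the rule first to avoid duplicates.
        $RUN ip -6 rule del from "$addr/128" table "$dev" 2>/dev/null || true
        $RUN ip -6 rule add from "$addr/128" table "$dev"
    done
}

setup_all() {
    setup_iface ens5 2406:da1c:150:2e01::a0 2406:da1c:150:2e01::a1 \
                     2406:da1c:150:2e01::a2 2406:da1c:150:2e01::a3
    setup_iface ens6 2406:da1c:150:2e01::a4 2406:da1c:150:2e01::a5 \
                     2406:da1c:150:2e01::a6 2406:da1c:150:2e01::a7
}

# To preview the commands, end the script with:   setup_all
# To apply them from rc.local, start it with RUN set to the empty string.
```

Deleting each rule before adding it means the script can be run repeatedly without piling up duplicate rules.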
Credits
Photo by Onur K on Unsplash