2016-10-14 20:59:53

by Mike Walker

Subject: Layer 2 over IPv6 GRE and path MTU discovery

When using a layer 2 GREv6 tunnel (ip6gretap), I am using a Linux
bridge to push Ethernet frames from an Ethernet port to the GREv6
device.

Here is an example of the topology:

PC -> eth0 -> grebridge -> gre6dev -> (internet) -> GRE endpoint -> Remote host

In this case, the PC connected to the Ethernet port is using IPv6 to
communicate with the remote host, so the source and destination IP of
the traffic being sent by the PC are both IPv6 addresses. So we have
an IPv6 header, Ethernet header, then GRE header once the
encapsulation is done.

Sometimes these packets are too large for the GRE tunnel's MTU. When
this happens, the router's kernel wants to send an ICMP "packet too
big" error message back to the PC.

However, the router has no routing information for the PC. The path
from the PC to the remote host is all supposed to be layer 2. The
router is not configured to route traffic to the PC or the remote
host, only to bridge the layer 2 frames.

What happens then is that Linux tries to send an ICMP error and either
can't find a route or sends it via its default route, neither of which
does any good.

If the PC doesn't get this ICMP error, it will not know why the
packets were dropped, or even that they were dropped. It's an ICMP
black hole scenario, right?

So, one solution I tried was hacking the kernel so that if it's trying
to send this ICMP "packet too big" error to a host, and we know it's a
layer 2 GRE tunnel, instead of the normal logic, force the ICMP error
message to be sent back out via the network interface the offending
packet was received on.

This mostly worked: the PC receives the ICMP error and adjusts its
path MTU, so in the future it will know to fragment the packet if it's
too big.
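
For reference, here is a minimal Python sketch of what such an ICMPv6
"packet too big" message looks like on the wire (type 2, code 0, a
32-bit MTU field, then as much of the offending packet as fits within
the 1280-byte IPv6 minimum MTU). This is not the kernel code; the
function names and the addresses in the usage note are made up for
illustration.

```python
import ipaddress
import struct

ICMPV6_NEXT_HEADER = 58      # IPv6 next-header value for ICMPv6
ICMPV6_PACKET_TOO_BIG = 2    # ICMPv6 type for "packet too big"
IPV6_MIN_MTU = 1280
IPV6_HEADER_LEN = 40
ICMPV6_HEADER_LEN = 8        # type, code, checksum, MTU


def internet_checksum(data: bytes) -> int:
    """One's-complement 16-bit checksum (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_packet_too_big(src: str, dst: str, mtu: int, offending: bytes) -> bytes:
    """Return the ICMPv6 message (without the IPv6 header) reporting `mtu`."""
    # The whole error packet must not exceed the IPv6 minimum MTU.
    max_quote = IPV6_MIN_MTU - IPV6_HEADER_LEN - ICMPV6_HEADER_LEN
    quoted = offending[:max_quote]
    # Checksum field is zero while the checksum is being computed.
    body = struct.pack("!BBHI", ICMPV6_PACKET_TOO_BIG, 0, 0, mtu) + quoted
    # IPv6 pseudo-header: src, dst, upper-layer length, 3 zero bytes, next header.
    pseudo = (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + struct.pack("!I3xB", len(body), ICMPV6_NEXT_HEADER)
    )
    checksum = internet_checksum(pseudo + body)
    return body[:2] + struct.pack("!H", checksum) + body[4:]
```

Something like build_packet_too_big("fe80::1", "fe80::2", 1442, pkt)
would produce the message body. The checksum covers the source address
through the pseudo-header, so whatever source IP is chosen has to be
settled before the message is built.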

Problem is, I don't know what source IP and MAC address I should be
using when I send back this ICMP error to the PC. Normally this
network path doesn't have any layer 3 address, and even the MAC
address normally is transparent / unknown to the PC. For my prototype
I simply set the source IP of the ICMP error to whatever was the
destination IP of the packet that was too big. I let the kernel use
the MAC address of either the bridge or eth0.

I couldn't seem to find any RFC that says how this should be handled.
Any ideas?


2016-10-15 15:21:01

by Erik Auerswald

Subject: Re: Layer 2 over IPv6 GRE and path MTU discovery

Hi Mike,

On Fri, Oct 14, 2016 at 01:59:49PM -0700, Mike Walker wrote:
> When using a layer 2 GREv6 tunnel (ip6gretap), I am using a Linux
> bridge to push Ethernet frames from an Ethernet port to the GREv6
> device.
>
> Here is an example of the topology:
>
> PC -> eth0 -> grebridge -> gre6dev -> (internet) -> GRE endpoint -> Remote host
>
> In this case, the PC connected to the Ethernet port is using IPv6 to
> communicate with the remote host, so the source and destination IP of
> the traffic being sent by the PC are both IPv6 addresses. So we have
> an IPv6 header, Ethernet header, then GRE header once the
> encapsulation is done.
>
> Sometimes these packets are too large for the GRE tunnel's MTU. When
> this happens, the router's kernel wants to send an ICMP "packet too
> big" error message back to the PC.

The proper way to handle this is to adjust the MTU of both the "PC"
and the "Remote host" to reflect the properties of the GRE tunnel.
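
As a rough sketch of the arithmetic, the MTU to configure on both hosts
is the path MTU between the tunnel endpoints minus the encapsulation
overhead. The function and parameter names below are made up for
illustration, and the figures assume an untagged inner Ethernet frame
and no outer IPv6 extension headers other than the optional tunnel
encapsulation limit option.

```python
IPV6_HEADER = 40      # outer IPv6 header
GRE_BASE_HEADER = 4   # GRE flags/version and protocol type
GRE_OPTION = 4        # checksum, key and sequence number add 4 bytes each
ETHERNET_HEADER = 14  # inner Ethernet header, no VLAN tag
ENCAP_LIMIT_OPTION = 8  # destination options header carrying the encap limit


def inner_ip_mtu(path_mtu, key=False, checksum=False, sequence=False,
                 encap_limit=False):
    """Largest inner IP packet that fits through the tunnel unfragmented."""
    overhead = IPV6_HEADER + GRE_BASE_HEADER + ETHERNET_HEADER
    overhead += GRE_OPTION * sum((key, checksum, sequence))
    if encap_limit:
        overhead += ENCAP_LIMIT_OPTION
    return path_mtu - overhead


print(inner_ip_mtu(1500))                    # plain GRE over a 1500-byte path
print(inner_ip_mtu(1500, key=True))          # with a GRE key
print(inner_ip_mtu(1500, encap_limit=True))  # with the encap limit option
```

Which of the optional fields apply depends on how the tunnel is
configured, so check the actual tunnel settings before picking a value.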

> However, the router has no routing information for the PC. The path
> from the PC to the remote host is all supposed to be layer 2. The
> router is not configured to route traffic to the PC or the remote
> host, only to bridge the layer 2 frames.

Therefore the end points of this virtual Ethernet link need to know the
MTU for this link.

Alternatively, you can try to fragment packets inside the tunnel to fake
the 1500B MTU commonly assumed. That is supported by commercial networking
gear; I have not yet looked for a possible GNU/Linux implementation.

> What happens then is Linux tries to send an ICMP error, it can't find
> the route, or else it sends it to its default route, none of which do
> any good.
>
> If the PC doesn't get this ICMP error, it will not know why the
> packets were dropped, or it won't even know they were dropped. It's
> an ICMP blackhole scenario right?
>
> So, one solution I tried was hacking the kernel so that if it's trying
> to send this ICMP "packet too big" error to a host, and we know it's a
> layer 2 GRE tunnel, instead of the normal logic, force the ICMP error
> message to be sent back out via the network interface the offending
> packet was received on.
>
> This mostly worked, the PC receives the ICMP error and adjusts its
> path MTU, so in the future it will know to fragment the packet if it's
> too big.
>
> Problem is, I don't know what source IP and mac address I should be
> using when I send back this ICMP error to the PC. Normally this
> network path doesn't have any layer 3 address, and even the mac
> address normally is transparent / unknown to the PC. For my prototype
> I simply set the source IP of the ICMP error to whatever was the
> destination IP of the packet that was too big. I let the kernel use
> the mac address of either the bridge or eth0.
>
> I couldn't seem to find any RFC that says how this should be handled.
> Any ideas?

If you do want to use this hack, I'd suggest using a MAC and IPv6
address owned by the tunnel endpoint. You could use a link-local
address for this, just to ensure you do not create frames that clash
with legitimate IPv6 / MAC combinations.
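
One way to get a matching pair is to derive the link-local address from
the MAC address using the modified EUI-64 scheme. The sketch below is
only an illustration; the function name and the MAC address in the
usage note are made up, and an interface may well use a differently
generated link-local address (e.g. a randomized one).

```python
import ipaddress


def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Build a fe80::/64 address from a 48-bit MAC (modified EUI-64)."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC address.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes.fromhex("fe80") + bytes(6)  # fe80::/64
    return ipaddress.IPv6Address(prefix + interface_id)
```

For example, link_local_from_mac("00:11:22:33:44:55") gives
fe80::211:22ff:fe33:4455.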

Best regards,
Erik
--
A distributed system is one in which the failure of a computer you didn't
even know existed can render your own computer unusable.
-- Leslie Lamport