From: Mike Walker
Date: Fri, 14 Oct 2016 13:59:49 -0700
Subject: Layer 2 over IPv6 GRE and path MTU discovery
To: lartc@vger.kernel.org, linux-kernel@vger.kernel.org

I am using a layer 2 GREv6 tunnel (ip6gretap) with a Linux bridge that pushes Ethernet frames from an Ethernet port to the GREv6 device. Here is an example of the topology:

    PC -> eth0 -> grebridge -> gre6dev -> (internet) -> GRE endpoint -> Remote host

In this case, the PC connected to the Ethernet port is using IPv6 to communicate with the remote host, so the source and destination addresses of the traffic sent by the PC are both IPv6 addresses. Once the encapsulation is done, the PC's IPv6 packet is wrapped in its Ethernet header and then in a GRE header, which the tunnel in turn carries inside an outer IPv6 header.

Sometimes these packets are too large for the GRE tunnel's MTU. When this happens, the router's kernel wants to send an ICMP "packet too big" error message back to the PC. However, the router has no routing information for the PC. The path from the PC to the remote host is supposed to be entirely layer 2: the router is not configured to route traffic to the PC or the remote host, only to bridge the layer 2 frames. So when Linux tries to send the ICMP error, it either cannot find a route or sends the message toward its default route, and neither does any good.

If the PC doesn't get this ICMP error, it will not know why the packets were dropped, or even that they were dropped. This is an ICMP black hole scenario, right?
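To make "too large for the GRE tunnel's MTU" concrete, here is a rough sketch of the arithmetic. The header sizes are the standard ones, but the choices are assumptions, not details from my setup: no IPv6 extension headers on the outer packet, a bare GRE header with no key/sequence/checksum fields, untagged inner Ethernet, and a 1500-byte MTU on the path between the tunnel endpoints. A real tunnel may add more overhead than this.

```python
# Header sizes in bytes (assumed configuration, see the note above).
OUTER_IPV6_HEADER = 40      # fixed IPv6 header, no extension headers
GRE_BASE_HEADER = 4         # GRE without key, sequence or checksum fields
INNER_ETHERNET_HEADER = 14  # untagged Ethernet header of the bridged frame


def tunnel_overhead():
    """Bytes added in front of the PC's IP packet by the layer 2 tunnel."""
    return OUTER_IPV6_HEADER + GRE_BASE_HEADER + INNER_ETHERNET_HEADER


def largest_inner_ip_packet(underlay_mtu):
    """Largest IP packet from the PC that still fits after encapsulation."""
    return underlay_mtu - tunnel_overhead()


def fits(inner_ip_packet_len, underlay_mtu):
    """True if the PC's packet can cross the tunnel without being dropped."""
    return inner_ip_packet_len <= largest_inner_ip_packet(underlay_mtu)


if __name__ == "__main__":
    underlay_mtu = 1500  # assumed MTU between the two tunnel endpoints
    print("overhead:", tunnel_overhead())
    print("largest inner packet:", largest_inner_ip_packet(underlay_mtu))
    print("full-size 1500-byte packet fits:", fits(1500, underlay_mtu))
```

Under these assumptions a PC that fills its own 1500-byte Ethernet MTU will always exceed what the tunnel can carry, which is why the "packet too big" feedback matters.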
So, one solution I tried was hacking the kernel: when it is about to send this ICMP "packet too big" error to a host and we know the packet came through a layer 2 GRE tunnel, it skips the normal logic and forces the ICMP error back out via the network interface on which the offending packet was received. This mostly worked: the PC receives the ICMP error and adjusts its path MTU, so in the future it knows to fragment the packet if it is too big.

The problem is that I don't know what source IP and MAC address I should use when I send this ICMP error back to the PC. Normally this network path doesn't have any layer 3 address, and even its MAC address is normally transparent / unknown to the PC. For my prototype I simply set the source IP of the ICMP error to the destination IP of the packet that was too big, and I let the kernel use the MAC address of either the bridge or eth0. I couldn't find any RFC that says how this should be handled. Any ideas?
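For reference, here is a user-space sketch of the message the prototype ends up sending. It is not the kernel patch itself, only an illustration of the ICMPv6 "Packet Too Big" layout (type 2, code 0, 32-bit MTU, then as much of the offending packet as fits within the 1280-byte minimum IPv6 MTU). The function names and the sample addresses are made up for the example. Using the offending packet's destination as the error's source address is just my prototype's choice, not something I found in a standard.

```python
import ipaddress
import struct

ICMPV6_PACKET_TOO_BIG = 2
NEXT_HEADER_ICMPV6 = 58
IPV6_HEADER_LEN = 40
IPV6_MIN_MTU = 1280
PTB_HEADER_LEN = 8  # type, code, checksum, MTU


def internet_checksum(data):
    """16-bit ones' complement checksum used by ICMPv6."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src, dst, upper_len):
    """IPv6 pseudo-header that the ICMPv6 checksum covers."""
    return src + dst + struct.pack("!I3xB", upper_len, NEXT_HEADER_ICMPV6)


def build_packet_too_big(offending_packet, mtu):
    """Return (src, dst, icmpv6_message) for an oversized IPv6 packet."""
    orig_src = offending_packet[8:24]   # source address field
    orig_dst = offending_packet[24:40]  # destination address field
    # Prototype choice (assumption): answer "as" the original destination.
    err_src, err_dst = orig_dst, orig_src
    # Quote the offending packet, capped so the error fits in 1280 bytes.
    quoted = offending_packet[:IPV6_MIN_MTU - IPV6_HEADER_LEN - PTB_HEADER_LEN]
    body = struct.pack("!BBHI", ICMPV6_PACKET_TOO_BIG, 0, 0, mtu) + quoted
    csum = internet_checksum(pseudo_header(err_src, err_dst, len(body)) + body)
    message = body[:2] + struct.pack("!H", csum) + body[4:]
    return err_src, err_dst, message


# Hypothetical 1500-byte packet from the PC (documentation-prefix addresses).
pc = ipaddress.IPv6Address("2001:db8::1").packed
remote = ipaddress.IPv6Address("2001:db8::2").packed
payload = bytes(1460)
sample = struct.pack("!IHBB", 6 << 28, len(payload), 59, 64) + pc + remote + payload

err_src, err_dst, msg = build_packet_too_big(sample, 1442)
print(ipaddress.IPv6Address(err_src), "->", ipaddress.IPv6Address(err_dst), len(msg))
```

The MTU value of 1442 here is only a placeholder for whatever the tunnel can actually carry. The open question from the post remains: which source address (and which MAC address on the frame) a bridging-only device should use for this message.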