Subject: Re: Source address fib invalidation on IPv6
From: David Ahern
To: "Jason A. Donenfeld", Netdev
Cc: WireGuard mailing list, LKML
Date: Fri, 11 Nov 2016 15:14:52 -0700
Message-ID: <31e050e2-0499-a77e-f698-86e58ad2fa6b@cumulusnetworks.com>

On 11/11/16 12:29 PM, Jason A. Donenfeld wrote:
> Hi folks,
>
> If I'm replying to a UDP packet, I generally want to use a source
> address that's the same as the destination address of the packet to
> which I'm replying. For example:
>
> Peer A sends packet: src = 10.0.0.1, dst = 10.0.0.3
> Peer B replies with: src = 10.0.0.3, dst = 10.0.0.1
>
> But let's complicate things. Let's say Peer B has multiple IPs on an
> interface: 10.0.0.2, 10.0.0.3. The default route uses 10.0.0.2. In
> this case what do you think should happen?
>
> Case 1:
> Peer A sends packet: src = 10.0.0.1, dst = 10.0.0.3
> Peer B replies with: src = 10.0.0.2, dst = 10.0.0.1
>
> Case 2:
> Peer A sends packet: src = 10.0.0.1, dst = 10.0.0.3
> Peer B replies with: src = 10.0.0.3, dst = 10.0.0.1
>
> Intuition tells me the answer is "Case 2". If you agree, keep reading.
> If you disagree, stop reading here, and instead correct my poor
> intuition.
>
> So, assuming "Case 2", when Peer B receives the first packet, he notes
> that packet's destination address, so that he can use it as a source
> address next. When replying, Peer B sets the stored source address and
> calls the routing function:
>
> struct flowi4 fl = {
>     .saddr = from_daddr_of_previous_packet,
>     .daddr = from_saddr_of_previous_packet,
> };
> rt = ip_route_output_flow(sock_net(sock), &fl, sock);
>
> What if, however, by the time Peer B chooses to reply, his interface
> no longer has that source address? No problem, because
> ip_route_output_flow will return -EINVAL in that case. So, we can do
> this:
>
> struct flowi4 fl = {
>     .saddr = from_daddr_of_previous_packet,
>     .daddr = from_saddr_of_previous_packet,
> };
> rt = ip_route_output_flow(sock_net(sock), &fl, sock);
> if (unlikely(IS_ERR(rt))) {
>     fl.saddr = 0;
>     rt = ip_route_output_flow(sock_net(sock), &fl, sock);
> }
>
> And then all is good in the neighborhood. This solution works. Done.
>
> But what about IPv6? That's where we get into trouble:
>
> struct flowi6 fl = {
>     .saddr = from_daddr_of_previous_packet,
>     .daddr = from_saddr_of_previous_packet,
> };
> ret = ipv6_stub->ipv6_dst_lookup(sock_net(sock), sock, &dst, &fl);
>
> In this case, IPv6 returns a valid dst, when no interface has the
> source address anymore! So, there's no way to know whether or not the
> source address for replying has gone stale. We don't have a means of
> falling back to inaddr_any for the source address.

What do you mean by 'valid dst'? IPv6 returns net->ipv6.ip6_null_entry
on lookup failures, so yes, dst is non-NULL, but that does not mean the
lookup succeeded.

For example, take a look at ip6_dst_lookup_tail():

        if (!*dst)
                *dst = ip6_route_output_flags(net, sk, fl6, flags);

        err = (*dst)->error;
        if (err)
                goto out_err_release;

Perhaps I should add dst->error to the fib tracepoints ...

>
> Primary question: is this behavior a bug? Or is this some consequence
> of a fundamental IPv6 difference with v4?
> Or is something else happening here?
>
> Thanks,
> Jason
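
To make the dst->error point concrete, here is a rough userspace sketch, not kernel code. Everything prefixed with mock_ is hypothetical and exists only for illustration, including the address table and the choice of -ENETUNREACH for the sentinel. Whether the real IPv6 lookup fails for a stale source address is exactly the open question in this thread; the mock simply fails in that case so the error path can be exercised. It shows a lookup that never returns NULL, a wrapper that checks dst->error the way the ip6_dst_lookup_tail() excerpt does, and a caller that retries with an unspecified source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct dst_entry; only 'error' matters here. */
struct mock_dst {
	int error;	/* 0 on success, negative errno on failure */
};

/* Plays the role of net->ipv6.ip6_null_entry: non-NULL but carries an error.
 * The errno value is an assumption made for this sketch. */
static struct mock_dst mock_null_entry = { .error = -ENETUNREACH };
static struct mock_dst mock_good_entry = { .error = 0 };

/* Hypothetical set of addresses currently configured on the host. */
static const char *const mock_local_addrs[] = { "fd00::2" };

static bool mock_addr_is_local(const char *saddr)
{
	size_t n = sizeof(mock_local_addrs) / sizeof(mock_local_addrs[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(mock_local_addrs[i], saddr) == 0)
			return true;
	return false;
}

/* Mock route lookup. NULL saddr means "unspecified, let the stack choose".
 * It never returns NULL; failure is reported through the sentinel entry. */
static struct mock_dst *mock_route_output(const char *saddr)
{
	if (saddr && !mock_addr_is_local(saddr))
		return &mock_null_entry;
	return &mock_good_entry;
}

/* Same shape as the excerpt above: a non-NULL dst proves nothing,
 * so dst->error is what decides success or failure. */
static int mock_dst_lookup(const char *saddr, struct mock_dst **dst)
{
	int err;

	*dst = mock_route_output(saddr);
	assert(*dst != NULL);

	err = (*dst)->error;
	if (err) {
		*dst = NULL;	/* do not hand the sentinel back to the caller */
		return err;
	}
	return 0;
}

/* Caller-side fallback: try the remembered source first, then retry with
 * an unspecified source if that lookup reports an error. */
static int mock_reply_lookup(const char *remembered_saddr,
			     struct mock_dst **dst, bool *used_fallback)
{
	int err = mock_dst_lookup(remembered_saddr, dst);

	*used_fallback = false;
	if (err) {
		*used_fallback = true;
		err = mock_dst_lookup(NULL, dst);
	}
	return err;
}
```

With this mock, mock_reply_lookup("fd00::3", &dst, &fb) reports success with fb set, because the first lookup hands back the sentinel and the retry without a source address goes through. The point is only that the decision hangs off the error field (or the wrapper's return value), never off dst being non-NULL.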