Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1422814AbbENV5x (ORCPT ); Thu, 14 May 2015 17:57:53 -0400 Received: from lion.mk-sys.cz ([213.168.178.121]:35876 "EHLO lion.mk-sys.cz" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161135AbbENV5u (ORCPT ); Thu, 14 May 2015 17:57:50 -0400 X-Greylist: delayed 520 seconds by postgrey-1.27 at vger.kernel.org; Thu, 14 May 2015 17:57:50 EDT Date: Thu, 14 May 2015 23:49:07 +0200 From: Michal Kubecek To: Nicolas Dichtel Cc: "David S. Miller" , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Alexey Kuznetsov , James Morris , Hideaki YOSHIFUJI , Patrick McHardy , roopa Subject: Re: [PATCH net v2 2/2] ipv6: fix ECMP route replacement Message-ID: <20150514214907.GA20301@lion> References: <5554F073.4080501@6wind.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <5554F073.4080501@6wind.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2888 Lines: 78 On Thu, May 14, 2015 at 08:58:59PM +0200, Nicolas Dichtel wrote: > Le 13/05/2015 21:59, Michal Kubecek a ?crit : > >When replacing an IPv6 multipath route with "ip route replace", i.e. > >NLM_F_CREATE | NLM_F_REPLACE, fib6_add_rt2node() replaces only first > >matching route without fixing its siblings, resulting in corrupted > >siblings linked list; removing one of the siblings can then end in an > >infinite loop. > > > >Replacing the whole set of nexthops does IMHO make more sense than > >replacing a random one. We also need to remove the NLM_F_REPLACE flag > >after replacing old nexthops by first new so that each subsequent > >nexthop does not replace previous one. > > > >Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)") > >Signed-off-by: Michal Kubecek > >--- > > net/ipv6/ip6_fib.c | 17 ++++++++++++++--- > > net/ipv6/route.c | 8 +++++--- > > 2 files changed, 19 insertions(+), 6 deletions(-) > > > >diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c > >index 96dbffff5a24..abf4e4e5bdab 100644 > >--- a/net/ipv6/ip6_fib.c > >+++ b/net/ipv6/ip6_fib.c > >@@ -815,6 +815,8 @@ add: > > } > > > > } else { > >+ struct rt6_info *next; > >+ > > if (!found) { > > if (add) > > goto add; > >@@ -828,15 +830,24 @@ add: > > > > *ins = rt; > > rt->rt6i_node = fn; > >- rt->dst.rt6_next = iter->dst.rt6_next; > >+ > >+ /* skip potential siblings */ > >+ next = iter->dst.rt6_next; > >+ while (next && next->rt6i_metric == rt->rt6i_metric) > >+ next = next->dst.rt6_next; > I wonder if we should not loop over the siblings list here > (rt->rt6i_siblings). Only routes that match 'rt6_qualify_for_ecmp()' > are siblings. Problem with looping over the siblings list is that then we would have to find each of them in the (unidirectional) list linked by dst.rt6_next to be able to delete them from this list. Do we at least know that all routes in this list with matching metric and rt6_qualify_for_ecmp() are siblings? If so, we could still do the cleanup on one pass over the dst.rt6_next list. Michal Kubecek > > >+ rt->dst.rt6_next = next; > >+ > > atomic_inc(&rt->rt6i_ref); > > inet6_rt_notify(RTM_NEWROUTE, rt, info); > > if (!(fn->fn_flags & RTN_RTINFO)) { > > info->nl_net->ipv6.rt6_stats->fib_route_nodes++; > > fn->fn_flags |= RTN_RTINFO; > > } > >- fib6_purge_rt(iter, fn, info->nl_net); > >- rt6_release(iter); > >+ while (iter != next) { > >+ fib6_purge_rt(iter, fn, info->nl_net); > >+ rt6_release(iter); > >+ iter = iter->dst.rt6_next; > >+ } > Same here. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/