2015-05-13 09:50:37

by Michal Kubecek

[permalink] [raw]
Subject: [PATCH net 0/2] IPv6 ECMP route add/replace fixes

(1) When adding a nexthop of a multipath route fails (e.g. because of a
conflict with an existing route), we are supposed to delete nexthops
already added. However, currently we try to also delete all nexthops we
haven't even tried to add yet so that a "ip route add" command can
actually remove pre-existing routes if it fails.

(2) Attempt to replace a multipath route results in a broken siblings
linked list. Following commands (like "ip route del") can then either
follow a link into freed memory or end in an infinite loop (if the slab
object has been reused).

Michal Kubecek (2):
ipv6: do not delete previously existing ECMP routes if add fails
ipv6: fix ECMP route replacement

net/ipv6/ip6_fib.c | 17 ++++++++++++++---
net/ipv6/route.c | 9 ++++++---
2 files changed, 20 insertions(+), 6 deletions(-)

--
2.3.7


2015-05-13 09:50:40

by Michal Kubecek

[permalink] [raw]
Subject: [PATCH net 1/2] ipv6: do not delete previously existing ECMP routes if add fails

If adding a nexthop of an IPv6 multipath route fails, comment in
ip6_route_multipath() says we are going to delete all nexthops already
added. However, current implementation deletes even the routes it
hasn't even tried to add yet. For example, running

ip route add 1234:5678::/64 \
nexthop via fe80::aa dev dummy1 \
nexthop via fe80::bb dev dummy1 \
nexthop via fe80::cc dev dummy1

twice results in removing all routes first command added.

Limit the second (delete) run to nexthops that succeeded in the first
(add) run.

Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
Signed-off-by: Michal Kubecek <[email protected]>
---
net/ipv6/route.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d3588885f097..18b92c05b541 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2536,6 +2536,7 @@ beginning:
* next hops that have been already added.
*/
add = 0;
+ remaining = cfg->fc_mp_len - remaining;
goto beginning;
}
}
--
2.3.7

2015-05-13 09:50:46

by Michal Kubecek

[permalink] [raw]
Subject: [PATCH net 2/2] ipv6: fix ECMP route replacement

When replacing an IPv6 multipath route with "ip route replace", i.e.
NLM_F_CREATE | NLM_F_REPLACE, fib6_add_rt2node() replaces only first
matching route without fixing its siblings, resulting in corrupted
siblings linked list; removing one of the siblings can then end in an
infinite loop.

Replacing the whole set of nexthops does IMHO make more sense than
replacing a random one. We also need to remove the NLM_F_REPLACE flag
after replacing old nexthops by first new so that each subsequent
nexthop does not replace previous one.

Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
Signed-off-by: Michal Kubecek <[email protected]>
---
net/ipv6/ip6_fib.c | 17 ++++++++++++++---
net/ipv6/route.c | 8 +++++---
2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 96dbffff5a24..abf4e4e5bdab 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -815,6 +815,8 @@ add:
}

} else {
+ struct rt6_info *next;
+
if (!found) {
if (add)
goto add;
@@ -828,15 +830,24 @@ add:

*ins = rt;
rt->rt6i_node = fn;
- rt->dst.rt6_next = iter->dst.rt6_next;
+
+ /* skip potential siblings */
+ next = iter->dst.rt6_next;
+ while (next && next->rt6i_metric == rt->rt6i_metric)
+ next = next->dst.rt6_next;
+ rt->dst.rt6_next = next;
+
atomic_inc(&rt->rt6i_ref);
inet6_rt_notify(RTM_NEWROUTE, rt, info);
if (!(fn->fn_flags & RTN_RTINFO)) {
info->nl_net->ipv6.rt6_stats->fib_route_nodes++;
fn->fn_flags |= RTN_RTINFO;
}
- fib6_purge_rt(iter, fn, info->nl_net);
- rt6_release(iter);
+ while (iter != next) {
+ fib6_purge_rt(iter, fn, info->nl_net);
+ rt6_release(iter);
+ iter = iter->dst.rt6_next;
+ }
}

return 0;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 18b92c05b541..b9af963207fa 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2541,11 +2541,13 @@ beginning:
}
}
/* Because each route is added like a single route we remove
- * this flag after the first nexthop (if there is a collision,
+ * these flags after the first nexthop: if there is a collision,
* we have already fail to add the first nexthop:
- * fib6_add_rt2node() has reject it).
+ * fib6_add_rt2node() has reject it; when replacing, old
+ * nexthops are removed by first new, the rest is added to it.
*/
- cfg->fc_nlinfo.nlh->nlmsg_flags &= ~NLM_F_EXCL;
+ cfg->fc_nlinfo.nlh->nlmsg_flags &= ~(NLM_F_EXCL |
+ NLM_F_REPLACE);
rtnh = rtnh_next(rtnh, &remaining);
}

--
2.3.7

2015-05-13 12:29:09

by Nicolas Dichtel

[permalink] [raw]
Subject: Re: [PATCH net 1/2] ipv6: do not delete previously existing ECMP routes if add fails

Le 13/05/2015 11:50, Michal Kubecek a écrit :
> If adding a nexthop of an IPv6 multipath route fails, comment in
> ip6_route_multipath() says we are going to delete all nexthops already
> added. However, current implementation deletes even the routes it
> hasn't even tried to add yet. For example, running
>
> ip route add 1234:5678::/64 \
> nexthop via fe80::aa dev dummy1 \
> nexthop via fe80::bb dev dummy1 \
> nexthop via fe80::cc dev dummy1
>
> twice results in removing all routes first command added.
>
> Limit the second (delete) run to nexthops that succeeded in the first
> (add) run.
>
> Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
> Signed-off-by: Michal Kubecek <[email protected]>
> ---
> net/ipv6/route.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index d3588885f097..18b92c05b541 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -2536,6 +2536,7 @@ beginning:
> * next hops that have been already added.
> */
> add = 0;
> + remaining = cfg->fc_mp_len - remaining;
> goto beginning;
Not sure to understand your fix. At the label beginning, the code is:

beginning:
rtnh = (struct rtnexthop *)cfg->fc_mp;
remaining = cfg->fc_mp_len;

Hence, 'remaining' will be overridden. How does your patch work?

2015-05-13 12:49:46

by Michal Kubecek

[permalink] [raw]
Subject: Re: [PATCH net 1/2] ipv6: do not delete previously existing ECMP routes if add fails

On Wed, May 13, 2015 at 02:28:57PM +0200, Nicolas Dichtel wrote:
> Le 13/05/2015 11:50, Michal Kubecek a ?crit :
> >If adding a nexthop of an IPv6 multipath route fails, comment in
> >ip6_route_multipath() says we are going to delete all nexthops already
> >added. However, current implementation deletes even the routes it
> >hasn't even tried to add yet. For example, running
> >
> > ip route add 1234:5678::/64 \
> > nexthop via fe80::aa dev dummy1 \
> > nexthop via fe80::bb dev dummy1 \
> > nexthop via fe80::cc dev dummy1
> >
> >twice results in removing all routes first command added.
> >
> >Limit the second (delete) run to nexthops that succeeded in the first
> >(add) run.
> >
> >Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
> >Signed-off-by: Michal Kubecek <[email protected]>
> >---
> > net/ipv6/route.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> >diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> >index d3588885f097..18b92c05b541 100644
> >--- a/net/ipv6/route.c
> >+++ b/net/ipv6/route.c
> >@@ -2536,6 +2536,7 @@ beginning:
> > * next hops that have been already added.
> > */
> > add = 0;
> >+ remaining = cfg->fc_mp_len - remaining;
> > goto beginning;
> Not sure to understand your fix. At the label beginning, the code is:
>
> beginning:
> rtnh = (struct rtnexthop *)cfg->fc_mp;
> remaining = cfg->fc_mp_len;
>
> Hence, 'remaining' will be overridden. How does your patch work?

It does not, sorry. I'm still trying to find out what did I wrong while
testing. I'll need to move the initialization of remaining above the
label.

Michal Kubecek

2015-05-13 13:30:27

by Roopa Prabhu

[permalink] [raw]
Subject: Re: [PATCH net 1/2] ipv6: do not delete previously existing ECMP routes if add fails

On 5/13/15, 5:49 AM, Michal Kubecek wrote:
> On Wed, May 13, 2015 at 02:28:57PM +0200, Nicolas Dichtel wrote:
>> Le 13/05/2015 11:50, Michal Kubecek a ?crit :
>>> If adding a nexthop of an IPv6 multipath route fails, comment in
>>> ip6_route_multipath() says we are going to delete all nexthops already
>>> added. However, current implementation deletes even the routes it
>>> hasn't even tried to add yet. For example, running
>>>
>>> ip route add 1234:5678::/64 \
>>> nexthop via fe80::aa dev dummy1 \
>>> nexthop via fe80::bb dev dummy1 \
>>> nexthop via fe80::cc dev dummy1
>>>
>>> twice results in removing all routes first command added.
>>>
>>> Limit the second (delete) run to nexthops that succeeded in the first
>>> (add) run.
>>>
>>> Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
>>> Signed-off-by: Michal Kubecek <[email protected]>
>>> ---
>>> net/ipv6/route.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>>> index d3588885f097..18b92c05b541 100644
>>> --- a/net/ipv6/route.c
>>> +++ b/net/ipv6/route.c
>>> @@ -2536,6 +2536,7 @@ beginning:
>>> * next hops that have been already added.
>>> */
>>> add = 0;
>>> + remaining = cfg->fc_mp_len - remaining;
>>> goto beginning;
>> Not sure to understand your fix. At the label beginning, the code is:
>>
>> beginning:
>> rtnh = (struct rtnexthop *)cfg->fc_mp;
>> remaining = cfg->fc_mp_len;
>>
>> Hence, 'remaining' will be overridden. How does your patch work?
> It does not, sorry. I'm still trying to find out what did I wrong while
> testing. I'll need to move the initialization of remaining above the
> label.
>
This looks like a similar bug i was trying to fix some time back:
http://patchwork.ozlabs.org/patch/362296/

(I am not sure if my full patch is still valid. I was thinking of
re-spining it sometime soon. If you are interested in trying it out,
please do)

2015-05-13 19:59:47

by Michal Kubecek

[permalink] [raw]
Subject: [PATCH net v2 0/2] IPv6 ECMP route add/replace fixes

(1) When adding a nexthop of a multipath route fails (e.g. because of a
conflict with an existing route), we are supposed to delete nexthops
already added. However, currently we try to also delete all nexthops we
haven't even tried to add yet so that a "ip route add" command can
actually remove pre-existing routes if it fails.

(2) Attempt to replace a multipath route results in a broken siblings
linked list. Following commands (like "ip route del") can then either
follow a link into freed memory or end in an infinite loop (if the slab
object has been reused).

Michal Kubecek (2):
ipv6: do not delete previously existing ECMP routes if add fails
ipv6: fix ECMP route replacement

net/ipv6/ip6_fib.c | 17 ++++++++++++++---
net/ipv6/route.c | 11 +++++++----
2 files changed, 21 insertions(+), 7 deletions(-)

--
2.3.7

2015-05-13 19:59:50

by Michal Kubecek

[permalink] [raw]
Subject: [PATCH net v2 1/2] ipv6: do not delete previously existing ECMP routes if add fails

If adding a nexthop of an IPv6 multipath route fails, comment in
ip6_route_multipath() says we are going to delete all nexthops already
added. However, current implementation deletes even the routes it
hasn't even tried to add yet. For example, running

ip route add 1234:5678::/64 \
nexthop via fe80::aa dev dummy1 \
nexthop via fe80::bb dev dummy1 \
nexthop via fe80::cc dev dummy1

twice results in removing all routes first command added.

Limit the second (delete) run to nexthops that succeeded in the first
(add) run.

Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
Signed-off-by: Michal Kubecek <[email protected]>
---
net/ipv6/route.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d3588885f097..3821a3517478 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2504,9 +2504,9 @@ static int ip6_route_multipath(struct fib6_config *cfg, int add)
int attrlen;
int err = 0, last_err = 0;

+ remaining = cfg->fc_mp_len;
beginning:
rtnh = (struct rtnexthop *)cfg->fc_mp;
- remaining = cfg->fc_mp_len;

/* Parse a Multipath Entry */
while (rtnh_ok(rtnh, remaining)) {
@@ -2536,6 +2536,7 @@ beginning:
* next hops that have been already added.
*/
add = 0;
+ remaining = cfg->fc_mp_len - remaining;
goto beginning;
}
}
--
2.3.7

2015-05-13 19:59:58

by Michal Kubecek

[permalink] [raw]
Subject: [PATCH net v2 2/2] ipv6: fix ECMP route replacement

When replacing an IPv6 multipath route with "ip route replace", i.e.
NLM_F_CREATE | NLM_F_REPLACE, fib6_add_rt2node() replaces only first
matching route without fixing its siblings, resulting in corrupted
siblings linked list; removing one of the siblings can then end in an
infinite loop.

Replacing the whole set of nexthops does IMHO make more sense than
replacing a random one. We also need to remove the NLM_F_REPLACE flag
after replacing old nexthops by first new so that each subsequent
nexthop does not replace previous one.

Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
Signed-off-by: Michal Kubecek <[email protected]>
---
net/ipv6/ip6_fib.c | 17 ++++++++++++++---
net/ipv6/route.c | 8 +++++---
2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 96dbffff5a24..abf4e4e5bdab 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -815,6 +815,8 @@ add:
}

} else {
+ struct rt6_info *next;
+
if (!found) {
if (add)
goto add;
@@ -828,15 +830,24 @@ add:

*ins = rt;
rt->rt6i_node = fn;
- rt->dst.rt6_next = iter->dst.rt6_next;
+
+ /* skip potential siblings */
+ next = iter->dst.rt6_next;
+ while (next && next->rt6i_metric == rt->rt6i_metric)
+ next = next->dst.rt6_next;
+ rt->dst.rt6_next = next;
+
atomic_inc(&rt->rt6i_ref);
inet6_rt_notify(RTM_NEWROUTE, rt, info);
if (!(fn->fn_flags & RTN_RTINFO)) {
info->nl_net->ipv6.rt6_stats->fib_route_nodes++;
fn->fn_flags |= RTN_RTINFO;
}
- fib6_purge_rt(iter, fn, info->nl_net);
- rt6_release(iter);
+ while (iter != next) {
+ fib6_purge_rt(iter, fn, info->nl_net);
+ rt6_release(iter);
+ iter = iter->dst.rt6_next;
+ }
}

return 0;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 3821a3517478..cf842cb92b2f 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2541,11 +2541,13 @@ beginning:
}
}
/* Because each route is added like a single route we remove
- * this flag after the first nexthop (if there is a collision,
+ * these flags after the first nexthop: if there is a collision,
* we have already fail to add the first nexthop:
- * fib6_add_rt2node() has reject it).
+ * fib6_add_rt2node() has reject it; when replacing, old
+ * nexthops are removed by first new, the rest is added to it.
*/
- cfg->fc_nlinfo.nlh->nlmsg_flags &= ~NLM_F_EXCL;
+ cfg->fc_nlinfo.nlh->nlmsg_flags &= ~(NLM_F_EXCL |
+ NLM_F_REPLACE);
rtnh = rtnh_next(rtnh, &remaining);
}

--
2.3.7

2015-05-13 20:06:59

by Michal Kubecek

[permalink] [raw]
Subject: Re: [PATCH net 1/2] ipv6: do not delete previously existing ECMP routes if add fails

On Wed, May 13, 2015 at 06:30:20AM -0700, roopa wrote:
> >
> This looks like a similar bug i was trying to fix some time back:
> http://patchwork.ozlabs.org/patch/362296/
>
> (I am not sure if my full patch is still valid. I was thinking of
> re-spining it sometime soon. If you are interested in trying it out,
> please do)

It's essentially the same idea but I prefer to use variable remaining
which is already there and is needed anyway rather than introduce
three extra variables for the cleanup (which is quite similar to the
suggestion from Hannes' reply).

I've sent v2 few minutes ago.

Michal Kubecek

2015-05-14 18:54:11

by Nicolas Dichtel

[permalink] [raw]
Subject: Re: [PATCH net v2 1/2] ipv6: do not delete previously existing ECMP routes if add fails

Le 13/05/2015 21:59, Michal Kubecek a écrit :
> If adding a nexthop of an IPv6 multipath route fails, comment in
> ip6_route_multipath() says we are going to delete all nexthops already
> added. However, current implementation deletes even the routes it
> hasn't even tried to add yet. For example, running
>
> ip route add 1234:5678::/64 \
> nexthop via fe80::aa dev dummy1 \
> nexthop via fe80::bb dev dummy1 \
> nexthop via fe80::cc dev dummy1
>
> twice results in removing all routes first command added.
>
> Limit the second (delete) run to nexthops that succeeded in the first
> (add) run.
>
> Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
> Signed-off-by: Michal Kubecek <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>

2015-05-14 18:59:05

by Nicolas Dichtel

[permalink] [raw]
Subject: Re: [PATCH net v2 2/2] ipv6: fix ECMP route replacement

Le 13/05/2015 21:59, Michal Kubecek a écrit :
> When replacing an IPv6 multipath route with "ip route replace", i.e.
> NLM_F_CREATE | NLM_F_REPLACE, fib6_add_rt2node() replaces only first
> matching route without fixing its siblings, resulting in corrupted
> siblings linked list; removing one of the siblings can then end in an
> infinite loop.
>
> Replacing the whole set of nexthops does IMHO make more sense than
> replacing a random one. We also need to remove the NLM_F_REPLACE flag
> after replacing old nexthops by first new so that each subsequent
> nexthop does not replace previous one.
>
> Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
> Signed-off-by: Michal Kubecek <[email protected]>
> ---
> net/ipv6/ip6_fib.c | 17 ++++++++++++++---
> net/ipv6/route.c | 8 +++++---
> 2 files changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index 96dbffff5a24..abf4e4e5bdab 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -815,6 +815,8 @@ add:
> }
>
> } else {
> + struct rt6_info *next;
> +
> if (!found) {
> if (add)
> goto add;
> @@ -828,15 +830,24 @@ add:
>
> *ins = rt;
> rt->rt6i_node = fn;
> - rt->dst.rt6_next = iter->dst.rt6_next;
> +
> + /* skip potential siblings */
> + next = iter->dst.rt6_next;
> + while (next && next->rt6i_metric == rt->rt6i_metric)
> + next = next->dst.rt6_next;
I wonder if we should not loop over the siblings list here (rt->rt6i_siblings).
Only routes that match 'rt6_qualify_for_ecmp()' are siblings.

> + rt->dst.rt6_next = next;
> +
> atomic_inc(&rt->rt6i_ref);
> inet6_rt_notify(RTM_NEWROUTE, rt, info);
> if (!(fn->fn_flags & RTN_RTINFO)) {
> info->nl_net->ipv6.rt6_stats->fib_route_nodes++;
> fn->fn_flags |= RTN_RTINFO;
> }
> - fib6_purge_rt(iter, fn, info->nl_net);
> - rt6_release(iter);
> + while (iter != next) {
> + fib6_purge_rt(iter, fn, info->nl_net);
> + rt6_release(iter);
> + iter = iter->dst.rt6_next;
> + }
Same here.

2015-05-14 21:57:53

by Michal Kubecek

[permalink] [raw]
Subject: Re: [PATCH net v2 2/2] ipv6: fix ECMP route replacement

On Thu, May 14, 2015 at 08:58:59PM +0200, Nicolas Dichtel wrote:
> Le 13/05/2015 21:59, Michal Kubecek a ?crit :
> >When replacing an IPv6 multipath route with "ip route replace", i.e.
> >NLM_F_CREATE | NLM_F_REPLACE, fib6_add_rt2node() replaces only first
> >matching route without fixing its siblings, resulting in corrupted
> >siblings linked list; removing one of the siblings can then end in an
> >infinite loop.
> >
> >Replacing the whole set of nexthops does IMHO make more sense than
> >replacing a random one. We also need to remove the NLM_F_REPLACE flag
> >after replacing old nexthops by first new so that each subsequent
> >nexthop does not replace previous one.
> >
> >Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
> >Signed-off-by: Michal Kubecek <[email protected]>
> >---
> > net/ipv6/ip6_fib.c | 17 ++++++++++++++---
> > net/ipv6/route.c | 8 +++++---
> > 2 files changed, 19 insertions(+), 6 deletions(-)
> >
> >diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> >index 96dbffff5a24..abf4e4e5bdab 100644
> >--- a/net/ipv6/ip6_fib.c
> >+++ b/net/ipv6/ip6_fib.c
> >@@ -815,6 +815,8 @@ add:
> > }
> >
> > } else {
> >+ struct rt6_info *next;
> >+
> > if (!found) {
> > if (add)
> > goto add;
> >@@ -828,15 +830,24 @@ add:
> >
> > *ins = rt;
> > rt->rt6i_node = fn;
> >- rt->dst.rt6_next = iter->dst.rt6_next;
> >+
> >+ /* skip potential siblings */
> >+ next = iter->dst.rt6_next;
> >+ while (next && next->rt6i_metric == rt->rt6i_metric)
> >+ next = next->dst.rt6_next;
> I wonder if we should not loop over the siblings list here
> (rt->rt6i_siblings). Only routes that match 'rt6_qualify_for_ecmp()'
> are siblings.

Problem with looping over the siblings list is that then we would have
to find each of them in the (unidirectional) list linked by dst.rt6_next
to be able to delete them from this list. Do we at least know that all
routes in this list with matching metric and rt6_qualify_for_ecmp() are
siblings? If so, we could still do the cleanup on one pass over the
dst.rt6_next list.

Michal Kubecek

>
> >+ rt->dst.rt6_next = next;
> >+
> > atomic_inc(&rt->rt6i_ref);
> > inet6_rt_notify(RTM_NEWROUTE, rt, info);
> > if (!(fn->fn_flags & RTN_RTINFO)) {
> > info->nl_net->ipv6.rt6_stats->fib_route_nodes++;
> > fn->fn_flags |= RTN_RTINFO;
> > }
> >- fib6_purge_rt(iter, fn, info->nl_net);
> >- rt6_release(iter);
> >+ while (iter != next) {
> >+ fib6_purge_rt(iter, fn, info->nl_net);
> >+ rt6_release(iter);
> >+ iter = iter->dst.rt6_next;
> >+ }
> Same here.

2015-05-15 08:52:00

by Michal Kubecek

[permalink] [raw]
Subject: Re: [PATCH net v2 2/2] ipv6: fix ECMP route replacement

On Thu, May 14, 2015 at 11:49:07PM +0200, Michal Kubecek wrote:
> On Thu, May 14, 2015 at 08:58:59PM +0200, Nicolas Dichtel wrote:
> > Le 13/05/2015 21:59, Michal Kubecek a ?crit :
> > >When replacing an IPv6 multipath route with "ip route replace", i.e.
> > >NLM_F_CREATE | NLM_F_REPLACE, fib6_add_rt2node() replaces only first
> > >matching route without fixing its siblings, resulting in corrupted
> > >siblings linked list; removing one of the siblings can then end in an
> > >infinite loop.
> > >
> > >Replacing the whole set of nexthops does IMHO make more sense than
> > >replacing a random one. We also need to remove the NLM_F_REPLACE flag
> > >after replacing old nexthops by first new so that each subsequent
> > >nexthop does not replace previous one.
> > >
> > >Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
> > >Signed-off-by: Michal Kubecek <[email protected]>
> > >---
> > > net/ipv6/ip6_fib.c | 17 ++++++++++++++---
> > > net/ipv6/route.c | 8 +++++---
> > > 2 files changed, 19 insertions(+), 6 deletions(-)
> > >
> > >diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> > >index 96dbffff5a24..abf4e4e5bdab 100644
> > >--- a/net/ipv6/ip6_fib.c
> > >+++ b/net/ipv6/ip6_fib.c
> > >@@ -815,6 +815,8 @@ add:
> > > }
> > >
> > > } else {
> > >+ struct rt6_info *next;
> > >+
> > > if (!found) {
> > > if (add)
> > > goto add;
> > >@@ -828,15 +830,24 @@ add:
> > >
> > > *ins = rt;
> > > rt->rt6i_node = fn;
> > >- rt->dst.rt6_next = iter->dst.rt6_next;
> > >+
> > >+ /* skip potential siblings */
> > >+ next = iter->dst.rt6_next;
> > >+ while (next && next->rt6i_metric == rt->rt6i_metric)
> > >+ next = next->dst.rt6_next;
> > I wonder if we should not loop over the siblings list here
> > (rt->rt6i_siblings). Only routes that match 'rt6_qualify_for_ecmp()'
> > are siblings.
>
> Problem with looping over the siblings list is that then we would have
> to find each of them in the (unidirectional) list linked by dst.rt6_next
> to be able to delete them from this list. Do we at least know that all
> routes in this list with matching metric and rt6_qualify_for_ecmp() are
> siblings? If so, we could still do the cleanup on one pass over the
> dst.rt6_next list.

Hm... it's still a bit more complicated. In the "replace" case, we break
the loop once we find any route with matching metric, i.e. we can find a
non-ECMP one even if there are some ECMP siblings farther in the chain.
As far as I can see, replacing such route would cause an inconsistency
as nsiblings would no longer match the number of ECMP-able routes in the
chain.

IMHO it's not completely clear what the "replace" semantics should be
for multiple routes (and, worse, for a mix of non-ECMP and ECMP ones).
One possible approach would be

- when new route is ECMP-able, try to find an ECMP-able route and
replace it with all its siblings
- if there is none, fall back to replacing first matching non-ECMP
route (or just add if creating is allowed)
- when new route is not ECMP-able (can this really happen with
NLM_F_REPLACE?), replace first matching non-ECMP or insert new one

But I still rather feel like replacing all existing matching routes
would better reflect what I expect "replace" to do.

Michal Kubecek

> > >+ rt->dst.rt6_next = next;
> > >+
> > > atomic_inc(&rt->rt6i_ref);
> > > inet6_rt_notify(RTM_NEWROUTE, rt, info);
> > > if (!(fn->fn_flags & RTN_RTINFO)) {
> > > info->nl_net->ipv6.rt6_stats->fib_route_nodes++;
> > > fn->fn_flags |= RTN_RTINFO;
> > > }
> > >- fib6_purge_rt(iter, fn, info->nl_net);
> > >- rt6_release(iter);
> > >+ while (iter != next) {
> > >+ fib6_purge_rt(iter, fn, info->nl_net);
> > >+ rt6_release(iter);
> > >+ iter = iter->dst.rt6_next;
> > >+ }
> > Same here.

2015-05-15 16:12:20

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net v2 2/2] ipv6: fix ECMP route replacement

From: Michal Kubecek <[email protected]>
Date: Fri, 15 May 2015 10:51:52 +0200

> But I still rather feel like replacing all existing matching routes
> would better reflect what I expect "replace" to do.

What does IPV4 do?

2015-05-15 17:42:03

by Michal Kubecek

[permalink] [raw]
Subject: Re: [PATCH net v2 2/2] ipv6: fix ECMP route replacement

On Fri, May 15, 2015 at 12:12:12PM -0400, David Miller wrote:
> From: Michal Kubecek <[email protected]>
> Date: Fri, 15 May 2015 10:51:52 +0200
>
> > But I still rather feel like replacing all existing matching routes
> > would better reflect what I expect "replace" to do.
>
> What does IPV4 do?

Apparently there is some problem too as on 4.1-rc3, "ip route replace"
seems to always remove first matching IPv4 route (multipath or not) but
doesn't add the new one unless there was no matching route before
calling the command (then it behaves like "add").

I also tried 3.0 and it replaces first matching route, whether it was
multipath or not and whether the new one is multipath or not. This
behaviour cannot be directly transfered to IPv6 as there are no real
IPv6 multipath routes; instead, the set of all routes under a node with
the same metric satisfying rt6_qualify_for_ecmp() is handled as a
multipath route (so that there can be only one). The semantics I
suggested in my previous mail might be a reasonable approximation.

Michal Kubecek

2015-05-16 21:18:28

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net v2 2/2] ipv6: fix ECMP route replacement

From: Michal Kubecek <[email protected]>
Date: Fri, 15 May 2015 19:41:56 +0200

> The semantics I suggested in my previous mail might be a reasonable
> approximation.

Ok, then let's try for that.

2015-05-18 18:53:57

by Michal Kubecek

[permalink] [raw]
Subject: [PATCH net v3 0/2] IPv6 ECMP route add/replace fixes


(1) When adding a nexthop of a multipath route fails (e.g. because of a
conflict with an existing route), we are supposed to delete nexthops
already added. However, currently we try to also delete all nexthops we
haven't even tried to add yet so that a "ip route add" command can
actually remove pre-existing routes if it fails.

(2) Attempt to replace a multipath route results in a broken siblings
linked list. Following commands (like "ip route del") can then either
follow a link into freed memory or end in an infinite loop (if the slab
object has been reused).

v2: fix an omission in first patch

v3: change the semantics of replace operation to better match IPv4

Michal Kubecek (2):
ipv6: do not delete previously existing ECMP routes if add fails
ipv6: fix ECMP route replacement

net/ipv6/ip6_fib.c | 39 +++++++++++++++++++++++++++++++++++++--
net/ipv6/route.c | 14 +++++++++-----
2 files changed, 46 insertions(+), 7 deletions(-)

--
2.4.1

2015-05-18 18:54:01

by Michal Kubecek

[permalink] [raw]
Subject: [PATCH net v3 1/2] ipv6: do not delete previously existing ECMP routes if add fails

If adding a nexthop of an IPv6 multipath route fails, comment in
ip6_route_multipath() says we are going to delete all nexthops already
added. However, current implementation deletes even the routes it
hasn't even tried to add yet. For example, running

ip route add 1234:5678::/64 \
nexthop via fe80::aa dev dummy1 \
nexthop via fe80::bb dev dummy1 \
nexthop via fe80::cc dev dummy1

twice results in removing all routes first command added.

Limit the second (delete) run to nexthops that succeeded in the first
(add) run.

Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
Signed-off-by: Michal Kubecek <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>
---
net/ipv6/route.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d3588885f097..3821a3517478 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2504,9 +2504,9 @@ static int ip6_route_multipath(struct fib6_config *cfg, int add)
int attrlen;
int err = 0, last_err = 0;

+ remaining = cfg->fc_mp_len;
beginning:
rtnh = (struct rtnexthop *)cfg->fc_mp;
- remaining = cfg->fc_mp_len;

/* Parse a Multipath Entry */
while (rtnh_ok(rtnh, remaining)) {
@@ -2536,6 +2536,7 @@ beginning:
* next hops that have been already added.
*/
add = 0;
+ remaining = cfg->fc_mp_len - remaining;
goto beginning;
}
}
--
2.4.1

2015-05-18 18:54:08

by Michal Kubecek

[permalink] [raw]
Subject: [PATCH net v3 2/2] ipv6: fix ECMP route replacement

When replacing an IPv6 multipath route with "ip route replace", i.e.
NLM_F_CREATE | NLM_F_REPLACE, fib6_add_rt2node() replaces only first
matching route without fixing its siblings, resulting in corrupted
siblings linked list; removing one of the siblings can then end in an
infinite loop.

IPv6 ECMP implementation is a bit different from IPv4 so that route
replacement cannot work in exactly the same way. This should be a
reasonable approximation:

1. If the new route is ECMP-able and there is a matching ECMP-able one
already, replace it and all its siblings (if any).

2. If the new route is ECMP-able and no matching ECMP-able route exists,
replace first matching non-ECMP-able (if any) or just add the new one.

3. If the new route is not ECMP-able, replace first matching
non-ECMP-able route (if any) or add the new route.

We also need to remove the NLM_F_REPLACE flag after replacing old
route(s) by first nexthop of an ECMP route so that each subsequent
nexthop does not replace previous one.

Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
Signed-off-by: Michal Kubecek <[email protected]>
---
net/ipv6/ip6_fib.c | 39 +++++++++++++++++++++++++++++++++++++--
net/ipv6/route.c | 11 +++++++----
2 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 96dbffff5a24..bde57b113009 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -693,6 +693,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
{
struct rt6_info *iter = NULL;
struct rt6_info **ins;
+ struct rt6_info **fallback_ins = NULL;
int replace = (info->nlh &&
(info->nlh->nlmsg_flags & NLM_F_REPLACE));
int add = (!info->nlh ||
@@ -716,8 +717,13 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
(info->nlh->nlmsg_flags & NLM_F_EXCL))
return -EEXIST;
if (replace) {
- found++;
- break;
+ if (rt_can_ecmp == rt6_qualify_for_ecmp(iter)) {
+ found++;
+ break;
+ }
+ if (rt_can_ecmp)
+ fallback_ins = fallback_ins ?: ins;
+ goto next_iter;
}

if (iter->dst.dev == rt->dst.dev &&
@@ -753,9 +759,17 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
if (iter->rt6i_metric > rt->rt6i_metric)
break;

+next_iter:
ins = &iter->dst.rt6_next;
}

+ if (fallback_ins && !found) {
+ /* No ECMP-able route found, replace first non-ECMP one */
+ ins = fallback_ins;
+ iter = *ins;
+ found++;
+ }
+
/* Reset round-robin state, if necessary */
if (ins == &fn->leaf)
fn->rr_ptr = NULL;
@@ -815,6 +829,8 @@ add:
}

} else {
+ int nsiblings;
+
if (!found) {
if (add)
goto add;
@@ -835,8 +851,27 @@ add:
info->nl_net->ipv6.rt6_stats->fib_route_nodes++;
fn->fn_flags |= RTN_RTINFO;
}
+ nsiblings = iter->rt6i_nsiblings;
fib6_purge_rt(iter, fn, info->nl_net);
rt6_release(iter);
+
+ if (nsiblings) {
+ /* Replacing an ECMP route, remove all siblings */
+ ins = &rt->dst.rt6_next;
+ iter = *ins;
+ while (iter) {
+ if (rt6_qualify_for_ecmp(iter)) {
+ *ins = iter->dst.rt6_next;
+ fib6_purge_rt(iter, fn, info->nl_net);
+ rt6_release(iter);
+ nsiblings--;
+ } else {
+ ins = &iter->dst.rt6_next;
+ }
+ iter = *ins;
+ }
+ WARN_ON(nsiblings != 0);
+ }
}

return 0;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 3821a3517478..c73ae5039e46 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2541,11 +2541,14 @@ beginning:
}
}
/* Because each route is added like a single route we remove
- * this flag after the first nexthop (if there is a collision,
- * we have already fail to add the first nexthop:
- * fib6_add_rt2node() has reject it).
+ * these flags after the first nexthop: if there is a collision,
+ * we have already failed to add the first nexthop:
+ * fib6_add_rt2node() has rejected it; when replacing, old
+ * nexthops have been replaced by first new, the rest should
+ * be added to it.
*/
- cfg->fc_nlinfo.nlh->nlmsg_flags &= ~NLM_F_EXCL;
+ cfg->fc_nlinfo.nlh->nlmsg_flags &= ~(NLM_F_EXCL |
+ NLM_F_REPLACE);
rtnh = rtnh_next(rtnh, &remaining);
}

--
2.4.1

2015-05-19 20:51:09

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net v3 2/2] ipv6: fix ECMP route replacement

From: Michal Kubecek <[email protected]>
Date: Mon, 18 May 2015 20:54:00 +0200 (CEST)

> When replacing an IPv6 multipath route with "ip route replace", i.e.
> NLM_F_CREATE | NLM_F_REPLACE, fib6_add_rt2node() replaces only first
> matching route without fixing its siblings, resulting in corrupted
> siblings linked list; removing one of the siblings can then end in an
> infinite loop.
>
> IPv6 ECMP implementation is a bit different from IPv4 so that route
> replacement cannot work in exactly the same way. This should be a
> reasonable approximation:
>
> 1. If the new route is ECMP-able and there is a matching ECMP-able one
> already, replace it and all its siblings (if any).
>
> 2. If the new route is ECMP-able and no matching ECMP-able route exists,
> replace first matching non-ECMP-able (if any) or just add the new one.
>
> 3. If the new route is not ECMP-able, replace first matching
> non-ECMP-able route (if any) or add the new route.
>
> We also need to remove the NLM_F_REPLACE flag after replacing old
> route(s) by first nexthop of an ECMP route so that each subsequent
> nexthop does not replace previous one.
>
> Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
> Signed-off-by: Michal Kubecek <[email protected]>

Can I get some reviews please from interested parties?

As far as I can tell, this faithfully implements the policy we decided
upon at the end of the previous thread for this patch. But I could be
wrong :-)

Thanks.

2015-05-20 08:56:38

by Nicolas Dichtel

[permalink] [raw]
Subject: Re: [PATCH net v3 2/2] ipv6: fix ECMP route replacement

Le 18/05/2015 20:54, Michal Kubecek a écrit :
> When replacing an IPv6 multipath route with "ip route replace", i.e.
> NLM_F_CREATE | NLM_F_REPLACE, fib6_add_rt2node() replaces only first
> matching route without fixing its siblings, resulting in corrupted
> siblings linked list; removing one of the siblings can then end in an
> infinite loop.
>
> IPv6 ECMP implementation is a bit different from IPv4 so that route
> replacement cannot work in exactly the same way. This should be a
> reasonable approximation:
>
> 1. If the new route is ECMP-able and there is a matching ECMP-able one
> already, replace it and all its siblings (if any).
>
> 2. If the new route is ECMP-able and no matching ECMP-able route exists,
> replace first matching non-ECMP-able (if any) or just add the new one.
>
> 3. If the new route is not ECMP-able, replace first matching
> non-ECMP-able route (if any) or add the new route.
>
> We also need to remove the NLM_F_REPLACE flag after replacing old
> route(s) by first nexthop of an ECMP route so that each subsequent
> nexthop does not replace previous one.
>
> Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
> Signed-off-by: Michal Kubecek <[email protected]>
LGTM.
Acked-by: Nicolas Dichtel <[email protected]>

2015-05-20 16:03:47

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net v3 0/2] IPv6 ECMP route add/replace fixes

From: Michal Kubecek <[email protected]>
Date: Mon, 18 May 2015 20:53:50 +0200 (CEST)

> (1) When adding a nexthop of a multipath route fails (e.g. because of a
> conflict with an existing route), we are supposed to delete nexthops
> already added. However, currently we try to also delete all nexthops we
> haven't even tried to add yet so that a "ip route add" command can
> actually remove pre-existing routes if it fails.
>
> (2) Attempt to replace a multipath route results in a broken siblings
> linked list. Following commands (like "ip route del") can then either
> follow a link into freed memory or end in an infinite loop (if the slab
> object has been reused).
>
> v2: fix an omission in first patch
>
> v3: change the semantics of replace operation to better match IPv4

Series applied, thanks.