2016-10-24 09:28:57

by Jakub Sitnicki

[permalink] [raw]
Subject: [PATCH net-next 0/5] Route ICMPv6 errors with the flow when ECMP in use

The motivation for this series is to route ICMPv6 error messages
together with the flow they belong to when multipath routing is in
use. It intends to bring the ECMP routing in IPv6 stack on par with
IPv4.

This enables the use of tools that rely on ICMP error messages such as
traceroute and makes PMTU discovery work both ways. However, for it to
work IPv6 flow labels have to be same in both directions
(i.e. reflected) or need to be chosen in a manner that ensures that
the flow going in the opposite direction would actually be routed to a
given path.

Changes have been tested in a virtual setup with a topology as below:

Re1 --- Hs1
/
Hc --- Ri --- Rc
\
Re1 --- Hs2

Hc - client host
HsX - server host
Rc - core router
ReX - edge router
Ri - intermediate router

To test the changes, traceroute in UDP mode to the client host, with
flow label set, has been run from one of the server hosts. Full test
is available at [1].

-Jakub

[1] https://github.com/jsitnicki/tools/blob/master/net/tests/ecmp/test-ecmp-icmpv6-error-routing.sh


Jakub Sitnicki (5):
ipv6: Fold rt6_info_hash_nhsfn() into its only caller
net: Extend struct flowi6 with multipath hash
ipv6: Use multipath hash from flow info if available
ipv6: Compute multipath hash for sent ICMP errors from offending
packet
ipv6: Compute multipath hash for forwarded ICMP errors from offending
packet

include/linux/icmpv6.h | 2 ++
include/net/flow.h | 1 +
net/ipv6/icmp.c | 21 +++++++++++++++++++++
net/ipv6/route.c | 40 +++++++++++++++++++++++++++++-----------
4 files changed, 53 insertions(+), 11 deletions(-)

--
2.7.4


2016-10-24 09:29:06

by Jakub Sitnicki

[permalink] [raw]
Subject: [PATCH net-next 3/5] ipv6: Use multipath hash from flow info if available

Allow our callers to influence the choice of ECMP link by honoring the
hash passed together with the flow info. This will allow for special
treatment of ICMP errors which we would like to route over the same link
as the IP datagram that triggered the error.

Signed-off-by: Jakub Sitnicki <[email protected]>
---
net/ipv6/route.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 0514b35..1184c2b 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -430,9 +430,11 @@ static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
int strict)
{
struct rt6_info *sibling, *next_sibling;
+ unsigned int hash;
int route_choosen;

- route_choosen = get_hash_from_flowi6(fl6) % (match->rt6i_nsiblings + 1);
+ hash = fl6->mp_hash ? : get_hash_from_flowi6(fl6);
+ route_choosen = hash % (match->rt6i_nsiblings + 1);
/* Don't change the route, if route_choosen == 0
* (siblings does not include ourself)
*/
--
2.7.4

2016-10-24 09:29:23

by Jakub Sitnicki

[permalink] [raw]
Subject: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

Same as for the transmit path, let's do our best to ensure that received
ICMP errors that may be subject to forwarding will be routed the same
path as flow that triggered the error, if it was going in the opposite
direction.

Signed-off-by: Jakub Sitnicki <[email protected]>
---
net/ipv6/route.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 1184c2b..c0f38ea 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1150,6 +1150,30 @@ struct dst_entry *ip6_route_input_lookup(struct net *net,
}
EXPORT_SYMBOL_GPL(ip6_route_input_lookup);

+static u32 ip6_multipath_icmp_hash(const struct sk_buff *skb)
+{
+ const struct icmp6hdr *icmph = icmp6_hdr(skb);
+ const struct ipv6hdr *inner_iph;
+ struct ipv6hdr _inner_iph;
+
+ if (icmph->icmp6_type != ICMPV6_DEST_UNREACH &&
+ icmph->icmp6_type != ICMPV6_PKT_TOOBIG &&
+ icmph->icmp6_type != ICMPV6_TIME_EXCEED &&
+ icmph->icmp6_type != ICMPV6_PARAMPROB)
+ goto standard_hash;
+
+ inner_iph = skb_header_pointer(
+ skb, skb_transport_offset(skb) + sizeof(*icmph),
+ sizeof(_inner_iph), &_inner_iph);
+ if (!inner_iph)
+ goto standard_hash;
+
+ return icmpv6_multipath_hash(inner_iph);
+
+standard_hash:
+ return 0; /* compute it later, if needed */
+}
+
void ip6_route_input(struct sk_buff *skb)
{
const struct ipv6hdr *iph = ipv6_hdr(skb);
@@ -1168,6 +1192,8 @@ void ip6_route_input(struct sk_buff *skb)
tun_info = skb_tunnel_info(skb);
if (tun_info && !(tun_info->mode & IP_TUNNEL_INFO_TX))
fl6.flowi6_tun_key.tun_id = tun_info->key.tun_id;
+ if (unlikely(fl6.flowi6_proto == IPPROTO_ICMPV6))
+ fl6.mp_hash = ip6_multipath_icmp_hash(skb);
skb_dst_drop(skb);
skb_dst_set(skb, ip6_route_input_lookup(net, skb->dev, &fl6, flags));
}
--
2.7.4

2016-10-24 09:29:09

by Jakub Sitnicki

[permalink] [raw]
Subject: [PATCH net-next 4/5] ipv6: Compute multipath hash for sent ICMP errors from offending packet

Improve debuggability with tools like traceroute and make PMTUD work in
setups that make use of ECMP routing by sending ICMP errors down the
same path as the offending packet would travel, if it was going in the
opposite direction.

There is a caveat, flows in both directions need use the same
label. Otherwise packets from flow in the opposite direction and ICMP
errors will not be routed over the same ECMP link.

Export the function for calculating the multipath hash so that we can
use it also on receive side, when forwarding ICMP errors.

Signed-off-by: Jakub Sitnicki <[email protected]>
---
include/linux/icmpv6.h | 2 ++
net/ipv6/icmp.c | 21 +++++++++++++++++++++
2 files changed, 23 insertions(+)

diff --git a/include/linux/icmpv6.h b/include/linux/icmpv6.h
index 57086e9..6282e03 100644
--- a/include/linux/icmpv6.h
+++ b/include/linux/icmpv6.h
@@ -45,4 +45,6 @@ extern void icmpv6_flow_init(struct sock *sk,
const struct in6_addr *saddr,
const struct in6_addr *daddr,
int oif);
+struct ipv6hdr;
+extern u32 icmpv6_multipath_hash(const struct ipv6hdr *iph);
#endif
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index bd59c34..ab376b3d1 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -385,6 +385,26 @@ static struct dst_entry *icmpv6_route_lookup(struct net *net,
return ERR_PTR(err);
}

+u32 icmpv6_multipath_hash(const struct ipv6hdr *iph)
+{
+ struct flowi6 fl6;
+
+ /* Calculate the multipath hash from the offending IP datagram that
+ * triggered the ICMP error. The source and destination addresses are
+ * swapped as we do our best to route the ICMP message together with the
+ * flow it belongs to. However, flows in both directions have to have
+ * the same label (e.g. by using flow label reflection) for it to
+ * happen.
+ */
+ memset(&fl6, 0, sizeof(fl6));
+ fl6.daddr = iph->saddr;
+ fl6.saddr = iph->daddr;
+ fl6.flowlabel = ip6_flowinfo(iph);
+ fl6.flowi6_proto = iph->nexthdr;
+
+ return get_hash_from_flowi6(&fl6);
+}
+
/*
* Send an ICMP message in response to a packet in error
*/
@@ -484,6 +504,7 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
fl6.flowi6_oif = iif;
fl6.fl6_icmp_type = type;
fl6.fl6_icmp_code = code;
+ fl6.mp_hash = icmpv6_multipath_hash(hdr);
security_skb_classify_flow(skb, flowi6_to_flowi(&fl6));

sk = icmpv6_xmit_lock(net);
--
2.7.4

2016-10-24 09:29:03

by Jakub Sitnicki

[permalink] [raw]
Subject: [PATCH net-next 2/5] net: Extend struct flowi6 with multipath hash

Allow for functions that fill out the IPv6 flow info to also pass a hash
computed over the skb contents. The hash value will drive the multipath
routing decisions.

This is intended for special treatment of ICMPv6 errors, where we would
like to make a routing decision based on the flow identifying the
offending IPv6 datagram that triggered the error, rather than the flow
of the ICMP error itself.

Signed-off-by: Jakub Sitnicki <[email protected]>
---
include/net/flow.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/net/flow.h b/include/net/flow.h
index 035aa77..73ee3aa 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -143,6 +143,7 @@ struct flowi6 {
#define fl6_ipsec_spi uli.spi
#define fl6_mh_type uli.mht.type
#define fl6_gre_key uli.gre_key
+ __u32 mp_hash;
} __attribute__((__aligned__(BITS_PER_LONG/8)));

struct flowidn {
--
2.7.4

2016-10-24 09:30:17

by Jakub Sitnicki

[permalink] [raw]
Subject: [PATCH net-next 1/5] ipv6: Fold rt6_info_hash_nhsfn() into its only caller

Commit 644d0e656958 ("ipv6 Use get_hash_from_flowi6 for rt6 hash") has
turned rt6_info_hash_nhsfn() into a one-liner, so it no longer makes
sense to keep it around.

Also the accompanying documentation comment has become outdated, so just
remove it altogether.

Signed-off-by: Jakub Sitnicki <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
---
net/ipv6/route.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index bdbc38e..0514b35 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -425,16 +425,6 @@ static bool rt6_check_expired(const struct rt6_info *rt)
return false;
}

-/* Multipath route selection:
- * Hash based function using packet header and flowlabel.
- * Adapted from fib_info_hashfn()
- */
-static int rt6_info_hash_nhsfn(unsigned int candidate_count,
- const struct flowi6 *fl6)
-{
- return get_hash_from_flowi6(fl6) % candidate_count;
-}
-
static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
struct flowi6 *fl6, int oif,
int strict)
@@ -442,7 +432,7 @@ static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
struct rt6_info *sibling, *next_sibling;
int route_choosen;

- route_choosen = rt6_info_hash_nhsfn(match->rt6i_nsiblings + 1, fl6);
+ route_choosen = get_hash_from_flowi6(fl6) % (match->rt6i_nsiblings + 1);
/* Don't change the route, if route_choosen == 0
* (siblings does not include ourself)
*/
--
2.7.4

2016-10-24 09:39:33

by Hannes Frederic Sowa

[permalink] [raw]
Subject: Re: [PATCH net-next 2/5] net: Extend struct flowi6 with multipath hash

On 24.10.2016 11:28, Jakub Sitnicki wrote:
> Allow for functions that fill out the IPv6 flow info to also pass a hash
> computed over the skb contents. The hash value will drive the multipath
> routing decisions.
>
> This is intended for special treatment of ICMPv6 errors, where we would
> like to make a routing decision based on the flow identifying the
> offending IPv6 datagram that triggered the error, rather than the flow
> of the ICMP error itself.
>
> Signed-off-by: Jakub Sitnicki <[email protected]>

Acked-by: Hannes Frederic Sowa <[email protected]>

2016-10-24 09:40:08

by Hannes Frederic Sowa

[permalink] [raw]
Subject: Re: [PATCH net-next 3/5] ipv6: Use multipath hash from flow info if available

On 24.10.2016 11:28, Jakub Sitnicki wrote:
> Allow our callers to influence the choice of ECMP link by honoring the
> hash passed together with the flow info. This will allow for special
> treatment of ICMP errors which we would like to route over the same link
> as the IP datagram that triggered the error.
>
> Signed-off-by: Jakub Sitnicki <[email protected]>

Acked-by: Hannes Frederic Sowa <[email protected]>


2016-10-24 09:40:44

by Hannes Frederic Sowa

[permalink] [raw]
Subject: Re: [PATCH net-next 4/5] ipv6: Compute multipath hash for sent ICMP errors from offending packet

On 24.10.2016 11:28, Jakub Sitnicki wrote:
> Improve debuggability with tools like traceroute and make PMTUD work in
> setups that make use of ECMP routing by sending ICMP errors down the
> same path as the offending packet would travel, if it was going in the
> opposite direction.
>
> There is a caveat, flows in both directions need use the same
> label. Otherwise packets from flow in the opposite direction and ICMP
> errors will not be routed over the same ECMP link.
>
> Export the function for calculating the multipath hash so that we can
> use it also on receive side, when forwarding ICMP errors.
>
> Signed-off-by: Jakub Sitnicki <[email protected]>

Acked-by: Hannes Frederic Sowa <[email protected]>


2016-10-24 09:43:08

by Hannes Frederic Sowa

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

On 24.10.2016 11:28, Jakub Sitnicki wrote:
> Same as for the transmit path, let's do our best to ensure that received
> ICMP errors that may be subject to forwarding will be routed the same
> path as flow that triggered the error, if it was going in the opposite
> direction.
>
> Signed-off-by: Jakub Sitnicki <[email protected]>

Acked-by: Hannes Frederic Sowa <[email protected]>

2016-10-27 15:23:45

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net-next 0/5] Route ICMPv6 errors with the flow when ECMP in use

From: Jakub Sitnicki <[email protected]>
Date: Mon, 24 Oct 2016 11:28:47 +0200

> However, for it to work IPv6 flow labels have to be same in both
> directions (i.e. reflected) or need to be chosen in a manner that
> ensures that the flow going in the opposite direction would actually
> be routed to a given path.

My understanding is that this is not really guaranteed, and that
entities are nearly encouraged to set the flow label in whatever
manner makes sense for their use case.

I think we really cannot have any kind of hard dependency on how
flow labels are set and used by the internet.

2016-10-27 15:24:17

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net-next 4/5] ipv6: Compute multipath hash for sent ICMP errors from offending packet

From: Jakub Sitnicki <[email protected]>
Date: Mon, 24 Oct 2016 11:28:51 +0200

> diff --git a/include/linux/icmpv6.h b/include/linux/icmpv6.h
> index 57086e9..6282e03 100644
> --- a/include/linux/icmpv6.h
> +++ b/include/linux/icmpv6.h
> @@ -45,4 +45,6 @@ extern void icmpv6_flow_init(struct sock *sk,
> const struct in6_addr *saddr,
> const struct in6_addr *daddr,
> int oif);
> +struct ipv6hdr;
> +extern u32 icmpv6_multipath_hash(const struct ipv6hdr *iph);
> #endif

We do not use "extern" in external function declarations in header file any more.

2016-10-27 15:25:35

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

From: Jakub Sitnicki <[email protected]>
Date: Mon, 24 Oct 2016 11:28:52 +0200

> + inner_iph = skb_header_pointer(
> + skb, skb_transport_offset(skb) + sizeof(*icmph),
> + sizeof(_inner_iph), &_inner_iph);

Please do not style this call like this, put as many arguments as
you can on the first line.

inner_iph = skb_header_pointer(skb,
skb_transport_offset(skb) + sizeof(*icmph),
sizeof(_inner_iph), &_inner_iph);

And on the second and subsequent lines, indent to the first column after
the openning parenthesis of the first line.

2016-10-27 21:54:54

by Hannes Frederic Sowa

[permalink] [raw]
Subject: Re: [PATCH net-next 0/5] Route ICMPv6 errors with the flow when ECMP in use

Hi,

On 27.10.2016 17:23, David Miller wrote:
> From: Jakub Sitnicki <[email protected]>
> Date: Mon, 24 Oct 2016 11:28:47 +0200
>
>> However, for it to work IPv6 flow labels have to be same in both
>> directions (i.e. reflected) or need to be chosen in a manner that
>> ensures that the flow going in the opposite direction would actually
>> be routed to a given path.
>
> My understanding is that this is not really guaranteed, and that
> entities are nearly encouraged to set the flow label in whatever
> manner makes sense for their use case.

In general this is true.

> I think we really cannot have any kind of hard dependency on how
> flow labels are set and used by the internet.

Probably/Hopefully ECMP setups are set up by the same entity that also
operates the servers, thus they can easily control the reflection of
flow labels on those servers. This might be especially important for
anycast services hosted behind ECMP services.

If the flow labels don't match, these patches are just best effort and
don't improve nor worsen the situation (lot's of traffic afaik still
carries 0 as flow label which indeed does help).

Bye,
Hannes

2016-10-27 22:00:19

by Jakub Sitnicki

[permalink] [raw]
Subject: Re: [PATCH net-next 4/5] ipv6: Compute multipath hash for sent ICMP errors from offending packet

On Thu, Oct 27, 2016 at 03:24 PM GMT, David Miller wrote:
> From: Jakub Sitnicki <[email protected]>
> Date: Mon, 24 Oct 2016 11:28:51 +0200
>
>> diff --git a/include/linux/icmpv6.h b/include/linux/icmpv6.h
>> index 57086e9..6282e03 100644
>> --- a/include/linux/icmpv6.h
>> +++ b/include/linux/icmpv6.h
>> @@ -45,4 +45,6 @@ extern void icmpv6_flow_init(struct sock *sk,
>> const struct in6_addr *saddr,
>> const struct in6_addr *daddr,
>> int oif);
>> +struct ipv6hdr;
>> +extern u32 icmpv6_multipath_hash(const struct ipv6hdr *iph);
>> #endif
>
> We do not use "extern" in external function declarations in header file any more.

My mistake, will remote it.

2016-10-27 22:10:38

by Jakub Sitnicki

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

On Thu, Oct 27, 2016 at 03:25 PM GMT, David Miller wrote:
> From: Jakub Sitnicki <[email protected]>
> Date: Mon, 24 Oct 2016 11:28:52 +0200
>
>> + inner_iph = skb_header_pointer(
>> + skb, skb_transport_offset(skb) + sizeof(*icmph),
>> + sizeof(_inner_iph), &_inner_iph);
>
> Please do not style this call like this, put as many arguments as
> you can on the first line.
>
> inner_iph = skb_header_pointer(skb,
> skb_transport_offset(skb) + sizeof(*icmph),
> sizeof(_inner_iph), &_inner_iph);
>
> And on the second and subsequent lines, indent to the first column after
> the openning parenthesis of the first line.

FWIW, I had it styled like that and then changed it. Will change back.

In my defense - checkpatch.pl made me do it, Your Honor! (line too long)

2016-10-27 22:35:31

by Tom Herbert

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

On Mon, Oct 24, 2016 at 2:28 AM, Jakub Sitnicki <[email protected]> wrote:
> Same as for the transmit path, let's do our best to ensure that received
> ICMP errors that may be subject to forwarding will be routed the same
> path as flow that triggered the error, if it was going in the opposite
> direction.
>
Unfortunately our ability to do this is generally quite limited. This
patch will select the route for multipath, but I don't believe sets
the same link in LAG and definitely can't help switches doing ECMP to
route the ICMP packet in the same way as the flow would be. Did you
see a problem that warrants solving this case?

Tom


> Signed-off-by: Jakub Sitnicki <[email protected]>
> ---
> net/ipv6/route.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 1184c2b..c0f38ea 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1150,6 +1150,30 @@ struct dst_entry *ip6_route_input_lookup(struct net *net,
> }
> EXPORT_SYMBOL_GPL(ip6_route_input_lookup);
>
> +static u32 ip6_multipath_icmp_hash(const struct sk_buff *skb)
> +{
> + const struct icmp6hdr *icmph = icmp6_hdr(skb);
> + const struct ipv6hdr *inner_iph;
> + struct ipv6hdr _inner_iph;
> +
> + if (icmph->icmp6_type != ICMPV6_DEST_UNREACH &&
> + icmph->icmp6_type != ICMPV6_PKT_TOOBIG &&
> + icmph->icmp6_type != ICMPV6_TIME_EXCEED &&
> + icmph->icmp6_type != ICMPV6_PARAMPROB)
> + goto standard_hash;
> +
> + inner_iph = skb_header_pointer(
> + skb, skb_transport_offset(skb) + sizeof(*icmph),
> + sizeof(_inner_iph), &_inner_iph);
> + if (!inner_iph)
> + goto standard_hash;
> +
> + return icmpv6_multipath_hash(inner_iph);
> +
> +standard_hash:
> + return 0; /* compute it later, if needed */
> +}
> +
> void ip6_route_input(struct sk_buff *skb)
> {
> const struct ipv6hdr *iph = ipv6_hdr(skb);
> @@ -1168,6 +1192,8 @@ void ip6_route_input(struct sk_buff *skb)
> tun_info = skb_tunnel_info(skb);
> if (tun_info && !(tun_info->mode & IP_TUNNEL_INFO_TX))
> fl6.flowi6_tun_key.tun_id = tun_info->key.tun_id;
> + if (unlikely(fl6.flowi6_proto == IPPROTO_ICMPV6))
> + fl6.mp_hash = ip6_multipath_icmp_hash(skb);

I will point out that this is only
> skb_dst_drop(skb);
> skb_dst_set(skb, ip6_route_input_lookup(net, skb->dev, &fl6, flags));
> }
> --
> 2.7.4
>

2016-10-28 08:32:27

by Jakub Sitnicki

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

On Thu, Oct 27, 2016 at 10:35 PM GMT, Tom Herbert wrote:
> On Mon, Oct 24, 2016 at 2:28 AM, Jakub Sitnicki <[email protected]> wrote:
>> Same as for the transmit path, let's do our best to ensure that received
>> ICMP errors that may be subject to forwarding will be routed the same
>> path as flow that triggered the error, if it was going in the opposite
>> direction.
>>
> Unfortunately our ability to do this is generally quite limited. This
> patch will select the route for multipath, but I don't believe sets
> the same link in LAG and definitely can't help switches doing ECMP to
> route the ICMP packet in the same way as the flow would be. Did you
> see a problem that warrants solving this case?

The motivation here is to bring IPv6 ECMP routing on par with IPv4 to
enable its wider use, targeting anycast services. Forwarding ICMP errors
back to the source host, at the L3 layer, is what we thought would be a
step forward.

Similar to change in IPv4 routing introduced in commit 79a131592dbb
("ipv4: ICMP packet inspection for multipath", [1]) we do our best at
L3, leaving any potential problems with LAG at lower layer (L2)
unaddressed.

>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index 1184c2b..c0f38ea 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c

[...]

>> @@ -1168,6 +1192,8 @@ void ip6_route_input(struct sk_buff *skb)
>> tun_info = skb_tunnel_info(skb);
>> if (tun_info && !(tun_info->mode & IP_TUNNEL_INFO_TX))
>> fl6.flowi6_tun_key.tun_id = tun_info->key.tun_id;
>> + if (unlikely(fl6.flowi6_proto == IPPROTO_ICMPV6))
>> + fl6.mp_hash = ip6_multipath_icmp_hash(skb);
>
> I will point out that this is only

Sorry, looks like part of your reply got cut short. Could you repost?

-Jakub

[1] https://git.kernel.org/torvalds/c/79a131592dbb81a2dba208622a2ffbfc53f28bc0

2016-10-28 14:25:55

by Tom Herbert

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

On Fri, Oct 28, 2016 at 1:32 AM, Jakub Sitnicki <[email protected]> wrote:
> On Thu, Oct 27, 2016 at 10:35 PM GMT, Tom Herbert wrote:
>> On Mon, Oct 24, 2016 at 2:28 AM, Jakub Sitnicki <[email protected]> wrote:
>>> Same as for the transmit path, let's do our best to ensure that received
>>> ICMP errors that may be subject to forwarding will be routed the same
>>> path as flow that triggered the error, if it was going in the opposite
>>> direction.
>>>
>> Unfortunately our ability to do this is generally quite limited. This
>> patch will select the route for multipath, but I don't believe sets
>> the same link in LAG and definitely can't help switches doing ECMP to
>> route the ICMP packet in the same way as the flow would be. Did you
>> see a problem that warrants solving this case?
>
> The motivation here is to bring IPv6 ECMP routing on par with IPv4 to
> enable its wider use, targeting anycast services. Forwarding ICMP errors
> back to the source host, at the L3 layer, is what we thought would be a
> step forward.
>
> Similar to change in IPv4 routing introduced in commit 79a131592dbb
> ("ipv4: ICMP packet inspection for multipath", [1]) we do our best at
> L3, leaving any potential problems with LAG at lower layer (L2)
> unaddressed.
>
ICMP will almost certainly take a different path in the network than
TCP or UDP due to ECMP. If we ever get proper flow label support for
ECMP then that could solve the problem if all the devices do a hash
just on <srcIP, destIP, FlowLabel>.

If this patch is being done to be compatible with IPv4 I guess that's
okay, but it would be false advertisement to say this makes ICMP
follow the same path as the flow being targeted in an error.
Fortunately, I doubt anyone can have a dependency on this for ICMP.

In the realm of OAM with UDP encapsulation this requirement does come
up (that OAM messages can follow the same path as a particular flow).
That case is solvable by always using a UDP encapsulation with same
addresses, ports, and flow label. Unfortunately for that we still have
a few devices that insist on looking into the UDP payload to do
ECMP...

Tom

>>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>>> index 1184c2b..c0f38ea 100644
>>> --- a/net/ipv6/route.c
>>> +++ b/net/ipv6/route.c
>
> [...]
>
>>> @@ -1168,6 +1192,8 @@ void ip6_route_input(struct sk_buff *skb)
>>> tun_info = skb_tunnel_info(skb);
>>> if (tun_info && !(tun_info->mode & IP_TUNNEL_INFO_TX))
>>> fl6.flowi6_tun_key.tun_id = tun_info->key.tun_id;
>>> + if (unlikely(fl6.flowi6_proto == IPPROTO_ICMPV6))
>>> + fl6.mp_hash = ip6_multipath_icmp_hash(skb);
>>
>> I will point out that this is only
>
> Sorry, looks like part of your reply got cut short. Could you repost?
>
> -Jakub
>
> [1] https://git.kernel.org/torvalds/c/79a131592dbb81a2dba208622a2ffbfc53f28bc0

2016-10-30 13:03:17

by Jakub Sitnicki

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

On Fri, Oct 28, 2016 at 02:25 PM GMT, Tom Herbert wrote:
> On Fri, Oct 28, 2016 at 1:32 AM, Jakub Sitnicki <[email protected]> wrote:
>> On Thu, Oct 27, 2016 at 10:35 PM GMT, Tom Herbert wrote:
>>> On Mon, Oct 24, 2016 at 2:28 AM, Jakub Sitnicki <[email protected]> wrote:
>>>> Same as for the transmit path, let's do our best to ensure that received
>>>> ICMP errors that may be subject to forwarding will be routed the same
>>>> path as flow that triggered the error, if it was going in the opposite
>>>> direction.
>>>>
>>> Unfortunately our ability to do this is generally quite limited. This
>>> patch will select the route for multipath, but I don't believe sets
>>> the same link in LAG and definitely can't help switches doing ECMP to
>>> route the ICMP packet in the same way as the flow would be. Did you
>>> see a problem that warrants solving this case?
>>
>> The motivation here is to bring IPv6 ECMP routing on par with IPv4 to
>> enable its wider use, targeting anycast services. Forwarding ICMP errors
>> back to the source host, at the L3 layer, is what we thought would be a
>> step forward.
>>
>> Similar to change in IPv4 routing introduced in commit 79a131592dbb
>> ("ipv4: ICMP packet inspection for multipath", [1]) we do our best at
>> L3, leaving any potential problems with LAG at lower layer (L2)
>> unaddressed.
>>
> ICMP will almost certainly take a different path in the network than
> TCP or UDP due to ECMP. If we ever get proper flow label support for
> ECMP then that could solve the problem if all the devices do a hash
> just on <srcIP, destIP, FlowLabel>.

Sorry for my late reply, I have been traveling.

I think that either I am missing something here, or the proposed changes
address just the problem that you have described.

Yes, if we compute the hash that drives the route choice over the IP
header of the ICMP error, then there is no guarantee it will travel back
to the sender of the offending packet that triggered the error.

That is why, we look at the offending packet carried by an ICMP error
and hash over its fields, instead. We need, however, to take care of two
things:

1) swap the source with the destination address, because we are
forwarding the ICMP error in the opposite direction than the
offending packet was going (see icmpv6_multipath_hash() introduced in
patch 4/5); and

2) ensure the flow labels used in both directions are the same (either
reflected by one side, or fixed, e.g. not used and set to 0), so that
the 4-tuple we hash over when forwarding, <src addr, dst addr, flow
label, next hdr>, is the same both ways, modulo the order of
addresses.

> If this patch is being done to be compatible with IPv4 I guess that's
> okay, but it would be false advertisement to say this makes ICMP
> follow the same path as the flow being targeted in an error.
> Fortunately, I doubt anyone can have a dependency on this for ICMP.

I wouldn't want to propose anything that would be useless. If you think
that this is the case here, I would very much like to understand what
and why cannot work in practice.

Thanks for reviewing this series,
Jakub

2016-10-31 19:15:39

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

From: Jakub Sitnicki <[email protected]>
Date: Sun, 30 Oct 2016 14:03:11 +0100

> 2) ensure the flow labels used in both directions are the same (either
> reflected by one side, or fixed, e.g. not used and set to 0), so that
> the 4-tuple we hash over when forwarding, <src addr, dst addr, flow
> label, next hdr>, is the same both ways, modulo the order of
> addresses.

Even Linux, by default, does not do reflection.

See the flowlabel_consistency sysctl, which we set by default to '1'.

I think we need to think a lot more about how systems actually set and
use flowlabels.

Also, one issue I also had with this series was adding a new member
to the flow label. Is it possible to implement this like the ipv4
side did, by simply passing a new parameter around to the necessary
functions?

Thanks.

2016-10-31 19:25:41

by Tom Herbert

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

On Sun, Oct 30, 2016 at 6:03 AM, Jakub Sitnicki <[email protected]> wrote:
> On Fri, Oct 28, 2016 at 02:25 PM GMT, Tom Herbert wrote:
>> On Fri, Oct 28, 2016 at 1:32 AM, Jakub Sitnicki <[email protected]> wrote:
>>> On Thu, Oct 27, 2016 at 10:35 PM GMT, Tom Herbert wrote:
>>>> On Mon, Oct 24, 2016 at 2:28 AM, Jakub Sitnicki <[email protected]> wrote:
>>>>> Same as for the transmit path, let's do our best to ensure that received
>>>>> ICMP errors that may be subject to forwarding will be routed the same
>>>>> path as flow that triggered the error, if it was going in the opposite
>>>>> direction.
>>>>>
>>>> Unfortunately our ability to do this is generally quite limited. This
>>>> patch will select the route for multipath, but I don't believe sets
>>>> the same link in LAG and definitely can't help switches doing ECMP to
>>>> route the ICMP packet in the same way as the flow would be. Did you
>>>> see a problem that warrants solving this case?
>>>
>>> The motivation here is to bring IPv6 ECMP routing on par with IPv4 to
>>> enable its wider use, targeting anycast services. Forwarding ICMP errors
>>> back to the source host, at the L3 layer, is what we thought would be a
>>> step forward.
>>>
>>> Similar to change in IPv4 routing introduced in commit 79a131592dbb
>>> ("ipv4: ICMP packet inspection for multipath", [1]) we do our best at
>>> L3, leaving any potential problems with LAG at lower layer (L2)
>>> unaddressed.
>>>
>> ICMP will almost certainly take a different path in the network than
>> TCP or UDP due to ECMP. If we ever get proper flow label support for
>> ECMP then that could solve the problem if all the devices do a hash
>> just on <srcIP, destIP, FlowLabel>.
>
> Sorry for my late reply, I have been traveling.
>
> I think that either I am missing something here, or the proposed changes
> address just the problem that you have described.
>
> Yes, if we compute the hash that drives the route choice over the IP
> header of the ICMP error, then there is no guarantee it will travel back
> to the sender of the offending packet that triggered the error.
>
> That is why, we look at the offending packet carried by an ICMP error
> and hash over its fields, instead. We need, however, to take care of two
> things:
>
> 1) swap the source with the destination address, because we are
> forwarding the ICMP error in the opposite direction than the
> offending packet was going (see icmpv6_multipath_hash() introduced in
> patch 4/5); and
>
> 2) ensure the flow labels used in both directions are the same (either
> reflected by one side, or fixed, e.g. not used and set to 0), so that
> the 4-tuple we hash over when forwarding, <src addr, dst addr, flow
> label, next hdr>, is the same both ways, modulo the order of
> addresses.
>
>> If this patch is being done to be compatible with IPv4 I guess that's
>> okay, but it would be false advertisement to say this makes ICMP
>> follow the same path as the flow being targeted in an error.
>> Fortunately, I doubt anyone can have a dependency on this for ICMP.
>
> I wouldn't want to propose anything that would be useless. If you think
> that this is the case here, I would very much like to understand what
> and why cannot work in practice.
>
The normal hash for TCP or UDP using ECMP is over <protocol, srcIP,
dstIP, srcPort, dstPort>. For an ICMP packet ECMP would most likely be
done over <protocol, srcIP, dstIP>. There really is no way to ensure
that an ICMP packet will follow the same path as TCP or any other
protocol. Fortunately, this is really isn't so terrible. The Internet
has worked this way ever since routers started using ports as input to
ECMP and that hasn't caused any major meltdown.

Tom

> Thanks for reviewing this series,
> Jakub

2016-11-01 15:20:13

by Jakub Sitnicki

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

On Mon, Oct 31, 2016 at 07:15 PM GMT, David Miller wrote:
> From: Jakub Sitnicki <[email protected]>
> Date: Sun, 30 Oct 2016 14:03:11 +0100
>
>> 2) ensure the flow labels used in both directions are the same (either
>> reflected by one side, or fixed, e.g. not used and set to 0), so that
>> the 4-tuple we hash over when forwarding, <src addr, dst addr, flow
>> label, next hdr>, is the same both ways, modulo the order of
>> addresses.
>
> Even Linux, by default, does not do reflection.
>
> See the flowlabel_consistency sysctl, which we set by default to '1'.

Yes, unfortunately, if Linux-based hosts are used as sending/receiving
IPv6, ICMP error forwarding will not work out of the box. Users will be
burdened with adjusting the runtime network stack config, as you point
out, or otherwise instructing the apps to set the flow label,
e.g. traceroute6 -I <flow label> ...

> I think we need to think a lot more about how systems actually set and
> use flowlabels.

The only alternative I can think of, only for ECMP routing, is having a
toggle option that would exclude the flow label from the input to the
multipath hash.

We would be sacrificing the entropy that potentially comes from hashing
over the flow label, leading to better flow balancing. But we wouldn't
be making IPv6 multipath routing any worse than IPv4 is in that regard.
And user-space apps wouldn't need to resort to reflecting/setting the
label, just like with IPv4.

Is that something that you would consider a viable option?

> Also, one issue I also had with this series was adding a new member
> to the flow label. Is it possible to implement this like the ipv4
> side did, by simply passing a new parameter around to the necessary
> functions?

This was my initial approach, i.e. to mimic the IPv4 and pass the
multipath hash down the call chain via a parameter. However, I gave up
on it, thinking it will cause too much disturbance in the involved
functions' interfaces, when I realized that one of the call paths the
multipath hash would have to also be passed through is:

ip6_route_input_lookup
fib6_rule_lookup
fib_rules_lookup
fib6_rule_action
ip6_pol_route_input

To be honest, I was thinking that if extending flowi6 structure would
find acceptance, then maybe the new member could be at some point moved
to flowi_common and also used by IPv4 to avoid parameter passing there
as well.

Thanks,
Jakub

2016-11-01 15:35:15

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

From: Jakub Sitnicki <[email protected]>
Date: Tue, 01 Nov 2016 16:13:51 +0100

> On Mon, Oct 31, 2016 at 07:15 PM GMT, David Miller wrote:
>> From: Jakub Sitnicki <[email protected]>
>> Date: Sun, 30 Oct 2016 14:03:11 +0100
>>
>>> 2) ensure the flow labels used in both directions are the same (either
>>> reflected by one side, or fixed, e.g. not used and set to 0), so that
>>> the 4-tuple we hash over when forwarding, <src addr, dst addr, flow
>>> label, next hdr>, is the same both ways, modulo the order of
>>> addresses.
>>
>> Even Linux, by default, does not do reflection.
>>
>> See the flowlabel_consistency sysctl, which we set by default to '1'.
>
> Yes, unfortunately, if Linux-based hosts are used as sending/receiving
> IPv6, ICMP error forwarding will not work out of the box. Users will be
> burdened with adjusting the runtime network stack config, as you point
> out, or otherwise instructing the apps to set the flow label,
> e.g. traceroute6 -I <flow label> ...

I'm pretty sure that sysctl default was choosen intentionally, and we
actively are _encouraging_ the world to not depend upon reflection in
any way, shape, or form.

And it's kind of pointless to suggest a facility that can't work with
Linux endpoints out of the box.

This was the point I'm trying to make.

If the intentions of that sysctl default does pan out, the idea is for
the world to move towards arbitrary flow label settings, even perhaps
changing over time. The intention is to make this more, not less,
common. And the idea is to give maximum flexibility for endpoints to
set these flow labels, in order to increase entropy wherever possible.

I have a really hard time accepting a "fix" that depends upon behavior
that the Linux ipv6 stack doesn't even have.

2016-11-01 15:43:24

by Jakub Sitnicki

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

On Mon, Oct 31, 2016 at 07:25 PM GMT, Tom Herbert wrote:
> On Sun, Oct 30, 2016 at 6:03 AM, Jakub Sitnicki <[email protected]> wrote:
>> On Fri, Oct 28, 2016 at 02:25 PM GMT, Tom Herbert wrote:
>>>
>>> If this patch is being done to be compatible with IPv4 I guess that's
>>> okay, but it would be false advertisement to say this makes ICMP
>>> follow the same path as the flow being targeted in an error.
>>> Fortunately, I doubt anyone can have a dependency on this for ICMP.
>>
>> I wouldn't want to propose anything that would be useless. If you think
>> that this is the case here, I would very much like to understand what
>> and why cannot work in practice.
>>
> The normal hash for TCP or UDP using ECMP is over <protocol, srcIP,
> dstIP, srcPort, dstPort>. For an ICMP packet ECMP would most likely be
> done over <protocol, srcIP, dstIP>. There really is no way to ensure
> that an ICMP packet will follow the same path as TCP or any other
> protocol. Fortunately, this is really isn't so terrible. The Internet
> has worked this way ever since routers started using ports as input to
> ECMP and that hasn't caused any major meltdown.

Ahh, I see the problem now. Thank you for clearing it up for me.

You are right, for locally generated TCP/UDP traffic we are computing an
L4 hash (over the 5-tuple that you mentioned) that drives the multipath
routing. While when sending locally generated ICMP errors we are only
computing an L3 hash (over the mentioned 3-tuple).

I believe that is a problem affects both IPv6 and IPv4, and manifests
itself only when the offending packet that triggers the error is
destined for the ECMP router.

When an ECMP router is: (i) sending an ICMP error in reaction to a
packet that was to be forwarded, or (ii) forwarding an ICMP error,
everything works as expected. That is because when forwarding traffic we
limit ourselves to computing an L3 hash so that the IP fragments are
routed together. Right?

So, my understanding is that, with these changes, things are not perfect
but we are not worse than IPv4 right now.

Would you be okay with this series if I update the patch 4/5 description
to highlight the existing problem that you point out? A fix for this
IPv4/6 common issue could follow afterwards.

Thanks,
Jakub

2016-11-01 16:25:14

by Hannes Frederic Sowa

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

On 31.10.2016 20:25, Tom Herbert wrote:
> The normal hash for TCP or UDP using ECMP is over <protocol, srcIP,
> dstIP, srcPort, dstPort>. For an ICMP packet ECMP would most likely be
> done over <protocol, srcIP, dstIP>. There really is no way to ensure
> that an ICMP packet will follow the same path as TCP or any other
> protocol. Fortunately, this is really isn't so terrible. The Internet
> has worked this way ever since routers started using ports as input to
> ECMP and that hasn't caused any major meltdown.

The normal hash for forwarding is without srcPort or dstPort, so the
same as ICMP and especially also because of fragmentation problematic I
don't think a lot of routers use L4 port information for ECMP either.

Bye,
Hannes

2016-11-01 16:26:27

by Jakub Sitnicki

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

On Tue, Nov 01, 2016 at 03:35 PM GMT, David Miller wrote:
> From: Jakub Sitnicki <[email protected]>
> Date: Tue, 01 Nov 2016 16:13:51 +0100
>
>> On Mon, Oct 31, 2016 at 07:15 PM GMT, David Miller wrote:
>>> From: Jakub Sitnicki <[email protected]>
>>> Date: Sun, 30 Oct 2016 14:03:11 +0100
>>>
>>>> 2) ensure the flow labels used in both directions are the same (either
>>>> reflected by one side, or fixed, e.g. not used and set to 0), so that
>>>> the 4-tuple we hash over when forwarding, <src addr, dst addr, flow
>>>> label, next hdr>, is the same both ways, modulo the order of
>>>> addresses.
>>>
>>> Even Linux, by default, does not do reflection.
>>>
>>> See the flowlabel_consistency sysctl, which we set by default to '1'.
>>
>> Yes, unfortunately, if Linux-based hosts are used as sending/receiving
>> IPv6, ICMP error forwarding will not work out of the box. Users will be
>> burdened with adjusting the runtime network stack config, as you point
>> out, or otherwise instructing the apps to set the flow label,
>> e.g. traceroute6 -I <flow label> ...
>
> I'm pretty sure that sysctl default was choosen intentionally, and we
> actively are _encouraging_ the world to not depend upon reflection in
> any way, shape, or form.
>
> And it's kind of pointless to suggest a facility that can't work with
> Linux endpoints out of the box.
>
> This was the point I'm trying to make.
>
> If the intentions of that sysctl default does pan out, the idea is for
> the world to move towards arbitrary flow label settings, even perhaps
> changing over time. The intention is to make this more, not less,
> common. And the idea is to give maximum flexibility for endpoints to
> set these flow labels, in order to increase entropy wherever possible.
>
> I have a really hard time accepting a "fix" that depends upon behavior
> that the Linux ipv6 stack doesn't even have.

Fair enough. I'm not questioning the defaults or the benefits of
widespread use of flow labels.

I was trying to do this without changing as to how we hash the packets
and balance traffic over multiple paths, but that does yield a solution
that does not work out of the box with Linux endpoints. Hard to sell, I
agree.

As a potential way out, I can rework it so that we exclude the flow
label from the multipath hash. That way we lose some entropy (not worse
than IPv4), but do not depend any more on how flow labels are set
(flexible). This could be made runtime configurable, as it changes
existing behavior.

Thanks,
Jakub

2016-11-01 16:28:01

by Hannes Frederic Sowa

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

On 01.11.2016 16:35, David Miller wrote:
> I have a really hard time accepting a "fix" that depends upon behavior
> that the Linux ipv6 stack doesn't even have.

We actually support this feature:

commit df3687ffc6653e4d32168338b4dee20c164ed7c9
Author: Florent Fourcot <[email protected]>
Date: Fri Jan 17 17:15:03 2014 +0100

ipv6: add the IPV6_FL_F_REFLECT flag to IPV6_FL_A_GET

Add that time I tried to stick to the common practices at that time and
was against any kind of global sysctl enabling a reflection mode for
flowlabels. State of art was to keep some uniqueness in flowlabels, thus
also the introduction of the flowlabel_consistency label.

We basically can support reflection nowadays globally pretty easy, as we
have soften these rules a lot.

Bye,
Hannes

2016-11-01 16:39:09

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

From: Hannes Frederic Sowa <[email protected]>
Date: Tue, 1 Nov 2016 17:27:56 +0100

> On 01.11.2016 16:35, David Miller wrote:
>> I have a really hard time accepting a "fix" that depends upon behavior
>> that the Linux ipv6 stack doesn't even have.
>
> We actually support this feature:

But it is forbidden when the sysctl I mentioned is set, which is the
default.

I'm talking about default behavior, which is to not reflect.

2016-11-01 16:39:27

by Tom Herbert

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

On Tue, Nov 1, 2016 at 9:25 AM, Hannes Frederic Sowa
<[email protected]> wrote:
> On 31.10.2016 20:25, Tom Herbert wrote:
>> The normal hash for TCP or UDP using ECMP is over <protocol, srcIP,
>> dstIP, srcPort, dstPort>. For an ICMP packet ECMP would most likely be
>> done over <protocol, srcIP, dstIP>. There really is no way to ensure
>> that an ICMP packet will follow the same path as TCP or any other
>> protocol. Fortunately, this is really isn't so terrible. The Internet
>> has worked this way ever since routers started using ports as input to
>> ECMP and that hasn't caused any major meltdown.
>
> The normal hash for forwarding is without srcPort or dstPort, so the
> same as ICMP and especially also because of fragmentation problematic I
> don't think a lot of routers use L4 port information for ECMP either.
>
I don't think we can define a "normal hash". There is no requirement
that routers do ECMP a certain way, or that they do ECMP, or that for
that matter that they even consistently route packets for the same
flow. All of this is optimization, not something we can rely on
operationally. So in the general case, regardless of anything we do in
the stack, either ICMP packets will follow the same path as the flow
are they won't. If they don't things still need to to work. So I still
don't see what material benefit this patch gives us.

Tom

> Bye,
> Hannes
>

2016-11-01 16:54:18

by Hannes Frederic Sowa

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

Hello,

On 01.11.2016 17:39, Tom Herbert wrote:
> On Tue, Nov 1, 2016 at 9:25 AM, Hannes Frederic Sowa
> <[email protected]> wrote:
>> On 31.10.2016 20:25, Tom Herbert wrote:
>>> The normal hash for TCP or UDP using ECMP is over <protocol, srcIP,
>>> dstIP, srcPort, dstPort>. For an ICMP packet ECMP would most likely be
>>> done over <protocol, srcIP, dstIP>. There really is no way to ensure
>>> that an ICMP packet will follow the same path as TCP or any other
>>> protocol. Fortunately, this is really isn't so terrible. The Internet
>>> has worked this way ever since routers started using ports as input to
>>> ECMP and that hasn't caused any major meltdown.
>>
>> The normal hash for forwarding is without srcPort or dstPort, so the
>> same as ICMP and especially also because of fragmentation problematic I
>> don't think a lot of routers use L4 port information for ECMP either.
>>
> I don't think we can define a "normal hash". There is no requirement
> that routers do ECMP a certain way, or that they do ECMP, or that for
> that matter that they even consistently route packets for the same
> flow. All of this is optimization, not something we can rely on
> operationally. So in the general case, regardless of anything we do in
> the stack, either ICMP packets will follow the same path as the flow
> are they won't. If they don't things still need to to work. So I still
> don't see what material benefit this patch gives us.

There certainly is no standard ECMP hash algorithm. ;)

Even Linux IPv6 ECMP behaved like that for a long time (very bad). It
just routed put packets on different links without consulting any hash.
That exactly was the reason why it was unusable and was upgraded some
while ago.

I do remember a lot of IPv6 PMTU blackholes in the past, so every patch
that improves connectivity here seems valuable to me, even if it does
not fix the problem completely in the end.

Bye,
Hannes

2016-11-01 16:59:45

by Hannes Frederic Sowa

[permalink] [raw]
Subject: Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet

On 01.11.2016 17:39, David Miller wrote:
> From: Hannes Frederic Sowa <[email protected]>
> Date: Tue, 1 Nov 2016 17:27:56 +0100
>
>> On 01.11.2016 16:35, David Miller wrote:
>>> I have a really hard time accepting a "fix" that depends upon behavior
>>> that the Linux ipv6 stack doesn't even have.
>>
>> We actually support this feature:
>
> But it is forbidden when the sysctl I mentioned is set, which is the
> default.
>
> I'm talking about default behavior, which is to not reflect.

Oh, yes, understood.

I think we can flip this sysctl by default to off: current default
kernel config actually generates flow labels on its own, so the
description of this sysctl is violated by default anyway, as it doesn't
preserve the uniqueness anymore.

Bye,
Hannes