2018-02-07 16:40:00

by Yves-Alexis Perez

[permalink] [raw]
Subject: Regression for ip6-in-ip4 IPsec tunnel in 4.14.16

Hi Mike,

I noticed a regression in 4.14.16 stable kernel (I assume it's also
present in mainline).

I'm using an IPsec setup where I tunnel all my IP trafic to a gateway.
The tunnel can use either IPv6 or IPv4 (depending on what's available
locally) and will route both IPv4 and IPv6 (my gateway will assign both
family addresses).

The tunnel doesn't use ESP directly but rather encapsulates in UDP.

Starting with 4.14.16, IPv6 traffic is broken. When trying a simple ping
to an IPv6 address I get:

ping: sendmsg: Invalid argument

I bisected 4.14.15 to 4.14.16 and got the attached bisect log, which
ends with 8278804e05f6bcfe3fdfea4a404020752ead15a6 as the offending
commit. The -EINVAL is consistent with the “Invalid argument” return
from ping. I didn't try yet on a pure IPv6 setup (without IPsec
tunneling) but will followup when I have a chance to test it.

Reverting that commit on top of 4.14.17 fixes the problem.

If you need more info, please ask.

Regards,
--
Yves-Alexis


Attachments:
ip6-bisect.log (1.28 kB)
signature.asc (499.00 B)
This is a digitally signed message part
Download all attachments

2018-02-07 17:07:25

by Yves-Alexis Perez

[permalink] [raw]
Subject: Re: Regression for ip6-in-ip4 IPsec tunnel in 4.14.16

On Wed, 2018-02-07 at 17:38 +0100, Yves-Alexis Perez wrote:
> Starting with 4.14.16, IPv6 traffic is broken. When trying a simple ping
> to an IPv6 address I get:
>
> ping: sendmsg: Invalid argument

I forgot an important bit of information: due to the way routers usually broke
path MTU discovery by filtering ICMP, I'm lowering the MTU to 1280 (so exactly
IPV6_MIN_MTU) when using IPsec.

The MTU is configured by the IKE daemon (strongSwan, thus adding Tobias to
CC:) in routing table 220:

default via 192.168.28.254 dev eth0 proto static src 192.168.0.129 mtu 1280 advmss 1320

I'll try to printk the mtu before returning EINVAL to see why it's lower than
1280, but maybe the IP encapsulation is not correctly handled?

Regards,
--
Yves-Alexis


Attachments:
signature.asc (499.00 B)
This is a digitally signed message part

2018-02-07 17:24:11

by Yves-Alexis Perez

[permalink] [raw]
Subject: Re: Regression for ip6-in-ip4 IPsec tunnel in 4.14.16

On Wed, 2018-02-07 at 18:05 +0100, Yves-Alexis Perez wrote:
> I'll try to printk the mtu before returning EINVAL to see why it's lower than
> 1280, but maybe the IP encapsulation is not correctly handled?

I did:

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 3763dc01e374..d3c651158d35 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1215,7 +1215,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
mtu = np->frag_size;
}
if (mtu < IPV6_MIN_MTU)
- return -EINVAL;
+ printk("mtu: %d\n", mtu);
cork->base.fragsize = mtu;
if (dst_allfrag(rt->dst.path))
cork->base.flags |= IPCORK_ALLFRAG;

and I get:

févr. 07 18:19:50 scapa kernel: mtu: 1218

and it doesn't depend on the original packet size (same thing happens with
ping -s 100). It also happens with UDP (DNS) traffic, but apparently not with
TCP.

Regards,
--
Yves-Alexis


Attachments:
signature.asc (499.00 B)
This is a digitally signed message part

2018-02-07 18:52:03

by Mike Maloney

[permalink] [raw]
Subject: Re: Regression for ip6-in-ip4 IPsec tunnel in 4.14.16

On Wed, Feb 7, 2018 at 12:23 PM, Yves-Alexis Perez <[email protected]> wrote:
> On Wed, 2018-02-07 at 18:05 +0100, Yves-Alexis Perez wrote:
>> I'll try to printk the mtu before returning EINVAL to see why it's lower than
>> 1280, but maybe the IP encapsulation is not correctly handled?
>
> I did:
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 3763dc01e374..d3c651158d35 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1215,7 +1215,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
> mtu = np->frag_size;
> }
> if (mtu < IPV6_MIN_MTU)
> - return -EINVAL;
> + printk("mtu: %d\n", mtu);
> cork->base.fragsize = mtu;
> if (dst_allfrag(rt->dst.path))
> cork->base.flags |= IPCORK_ALLFRAG;
>
> and I get:
>
> févr. 07 18:19:50 scapa kernel: mtu: 1218
>
> and it doesn't depend on the original packet size (same thing happens with
> ping -s 100). It also happens with UDP (DNS) traffic, but apparently not with
> TCP.
>
> Regards,
> --
> Yves-Alexis

Hi Yves-Alexis -

I apologize for the problem. It seems to me that tunneling with an
outer MTU that causes the inner MTU to be smaller than the min, is
potentially problematic in other ways as well.

But also it could seem unfortunate that the code with my fix does not
look at actual packet size, but instead only looks at the MTU and then
fails, even if no packet was going to be so large. The intention of
my patch was to prevent a negative number while calculating the
maxfraglen in __ip6_append_data(). An alternative fix maybe to
instead return an error only if the mtu is less than or equal to the
fragheaderlen. Something like:

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 3763dc01e374..5d912a289b95 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1214,8 +1214,6 @@ static int ip6_setup_cork(struct sock *sk,
struct inet_cork_full *cork,
if (np->frag_size)
mtu = np->frag_size;
}
- if (mtu < IPV6_MIN_MTU)
- return -EINVAL;
cork->base.fragsize = mtu;
if (dst_allfrag(rt->dst.path))
cork->base.flags |= IPCORK_ALLFRAG;
@@ -1264,6 +1262,8 @@ static int __ip6_append_data(struct sock *sk,

fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
(opt ? opt->opt_nflen : 0);
+ if (mtu < fragheaderlen + 8)
+ return -EINVAL;
maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
sizeof(struct frag_hdr);
(opt ? opt->opt_nflen : 0);

But then we also have to convince ourselves that maxfraglen can never
be <= 0. I'd have to think about that.

I am not sure if others have thoughts on supporting MTUs configured
below the min in the spec.


Thanks.
--
Mike Maloney

2018-02-07 19:47:57

by Yves-Alexis Perez

[permalink] [raw]
Subject: Re: Regression for ip6-in-ip4 IPsec tunnel in 4.14.16

On Wed, 2018-02-07 at 13:50 -0500, Mike Maloney wrote:
> On Wed, Feb 7, 2018 at 12:23 PM, Yves-Alexis Perez <[email protected]>
>
> Hi Yves-Alexis -
>
> I apologize for the problem. It seems to me that tunneling with an
> outer MTU that causes the inner MTU to be smaller than the min, is
> potentially problematic in other ways as well.

Maybe. I tried with removing the MTU setting, and I get (on ping again)

févr. 07 20:44:01 scapa kernel: mtu: 1266

which means I would get -EINVAL on standards kernels, which is not really good
either.

> But also it could seem unfortunate that the code with my fix does not
> look at actual packet size, but instead only looks at the MTU and then
> fails, even if no packet was going to be so large. The intention of
> my patch was to prevent a negative number while calculating the
> maxfraglen in __ip6_append_data(). An alternative fix maybe to
> instead return an error only if the mtu is less than or equal to the
> fragheaderlen. Something like:
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 3763dc01e374..5d912a289b95 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1214,8 +1214,6 @@ static int ip6_setup_cork(struct sock *sk,
> struct inet_cork_full *cork,
> if (np->frag_size)
> mtu = np->frag_size;
> }
> - if (mtu < IPV6_MIN_MTU)
> - return -EINVAL;
> cork->base.fragsize = mtu;
> if (dst_allfrag(rt->dst.path))
> cork->base.flags |= IPCORK_ALLFRAG;
> @@ -1264,6 +1262,8 @@ static int __ip6_append_data(struct sock *sk,
>
> fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
> (opt ? opt->opt_nflen : 0);
> + if (mtu < fragheaderlen + 8)
> + return -EINVAL;
> maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
> sizeof(struct frag_hdr);
> (opt ? opt->opt_nflen : 0);
>
> But then we also have to convince ourselves that maxfraglen can never
> be <= 0. I'd have to think about that.
>
> I am not sure if others have thoughts on supporting MTUs configured
> below the min in the spec.
>
Here, the MTU is not below, so I'm not sure what happens.

Regards,
--
Yves-Alexis


Attachments:
signature.asc (499.00 B)
This is a digitally signed message part

2018-02-08 12:41:33

by Yves-Alexis Perez

[permalink] [raw]
Subject: Re: Regression for ip6-in-ip4 IPsec tunnel in 4.14.16

On Wed, 2018-02-07 at 20:46 +0100, Yves-Alexis Perez wrote:
> Maybe. I tried with removing the MTU setting, and I get (on ping again)
>
> févr. 07 20:44:01 scapa kernel: mtu: 1266
>
> which means I would get -EINVAL on standards kernels, which is not really good
> either.

Actually after rebooting on the Debian 4.14.17 kernel, with the outter MTU
unset (or set to 1360), I don't get the -EINVAL anymore, so maybe it'd be OK.

Unfortunately I have the feeling that debugging this will be a bit tricky for
other people. MTU tuning for tunnels are always a bit confusing, but having
the kernel return EINVAL on a ping doesn't really help narrowing it down to
that MTU setting. Not sure if some kind of logging could be added to help?

Regards,
--
Yves-Alexis


Attachments:
signature.asc (499.00 B)
This is a digitally signed message part