This change makes it so that we use icmpv6_send to report PMTU issues back
into tunnels in the case that the resulting packet is larger than the MTU
of the outgoing interface. Previously xfrm_local_error was being used in
this case, however this was resulting in no changes, I suspect due to the
fact that the tunnel itself was being kept out of the loop.
This patch fixes PMTU problems seen on ip6_vti tunnels and is based on the
behavior seen if the socket was orphaned. Instead of requiring the socket
to be orphaned this patch simply defaults to using icmpv6_send in the case
that the frame came though a tunnel.
Signed-off-by: Alexander Duyck <[email protected]>
---
net/ipv6/xfrm6_output.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 09c76a7b474d..6f9b514d0e38 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -72,6 +72,7 @@ static int xfrm6_tunnel_check_size(struct sk_buff *skb)
{
int mtu, ret = 0;
struct dst_entry *dst = skb_dst(skb);
+ struct xfrm_state *x = dst->xfrm;
mtu = dst_mtu(dst);
if (mtu < IPV6_MIN_MTU)
@@ -82,7 +83,7 @@ static int xfrm6_tunnel_check_size(struct sk_buff *skb)
if (xfrm6_local_dontfrag(skb))
xfrm6_local_rxpmtu(skb, mtu);
- else if (skb->sk)
+ else if (skb->sk && x->props.mode != XFRM_MODE_TUNNEL)
xfrm_local_error(skb, mtu);
else
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
@@ -149,11 +150,16 @@ static int __xfrm6_output(struct sock *sk, struct sk_buff *skb)
else
mtu = dst_mtu(skb_dst(skb));
- if (skb->len > mtu && xfrm6_local_dontfrag(skb)) {
- xfrm6_local_rxpmtu(skb, mtu);
- return -EMSGSIZE;
- } else if (!skb->ignore_df && skb->len > mtu && skb->sk) {
- xfrm_local_error(skb, mtu);
+ if (!skb->ignore_df && skb->len > mtu) {
+ skb->dev = dst->dev;
+
+ if (xfrm6_local_dontfrag(skb))
+ xfrm6_local_rxpmtu(skb, mtu);
+ else if (skb->sk && x->props.mode != XFRM_MODE_TUNNEL)
+ xfrm_local_error(skb, mtu);
+ else
+ icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+
return -EMSGSIZE;
}
On Wed, May 27, 2015 at 10:40:32AM -0700, Alexander Duyck wrote:
> This change makes it so that we use icmpv6_send to report PMTU issues back
> into tunnels in the case that the resulting packet is larger than the MTU
> of the outgoing interface. Previously xfrm_local_error was being used in
> this case, however this was resulting in no changes, I suspect due to the
> fact that the tunnel itself was being kept out of the loop.
>
> This patch fixes PMTU problems seen on ip6_vti tunnels and is based on the
> behavior seen if the socket was orphaned. Instead of requiring the socket
> to be orphaned this patch simply defaults to using icmpv6_send in the case
> that the frame came though a tunnel.
>
> Signed-off-by: Alexander Duyck <[email protected]>
Does this still work with normal tunnel mode and identical inner
and outer addresses? I recall we used to have a bug where in that
situation the kernel would interpret the ICMP message as a reduction
in outer MTU and thus resulting in a loop where the MTU keeps
getting smaller.
Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Thu, May 28, 2015 at 12:49:19PM +0800, Herbert Xu wrote:
> On Wed, May 27, 2015 at 10:40:32AM -0700, Alexander Duyck wrote:
> > This change makes it so that we use icmpv6_send to report PMTU issues back
> > into tunnels in the case that the resulting packet is larger than the MTU
> > of the outgoing interface. Previously xfrm_local_error was being used in
> > this case, however this was resulting in no changes, I suspect due to the
> > fact that the tunnel itself was being kept out of the loop.
> >
> > This patch fixes PMTU problems seen on ip6_vti tunnels and is based on the
> > behavior seen if the socket was orphaned. Instead of requiring the socket
> > to be orphaned this patch simply defaults to using icmpv6_send in the case
> > that the frame came though a tunnel.
> >
> > Signed-off-by: Alexander Duyck <[email protected]>
>
> Does this still work with normal tunnel mode and identical inner
> and outer addresses? I recall we used to have a bug where in that
> situation the kernel would interpret the ICMP message as a reduction
> in outer MTU and thus resulting in a loop where the MTU keeps
> getting smaller.
Right, I think this reintroduces a bug that I fixed some years ago with
commit dd767856a36e ("xfrm6: Don't call icmpv6_send on local error")
On Wed, May 27, 2015 at 10:40:32AM -0700, Alexander Duyck wrote:
> This change makes it so that we use icmpv6_send to report PMTU issues back
> into tunnels in the case that the resulting packet is larger than the MTU
> of the outgoing interface. Previously xfrm_local_error was being used in
> this case, however this was resulting in no changes, I suspect due to the
> fact that the tunnel itself was being kept out of the loop.
>
> This patch fixes PMTU problems seen on ip6_vti tunnels and is based on the
> behavior seen if the socket was orphaned. Instead of requiring the socket
> to be orphaned this patch simply defaults to using icmpv6_send in the case
> that the frame came though a tunnel.
We can use icmpv6_send() just in the case that the packet
was already transmitted by a tunnel device, otherwise we
get the bug back that I mentioned in my other mail.
Not sure if we have something to know that the packet
traversed a tunnel device. That's what I asked in the
thread 'Looking for a lost patch'.
On 05/27/2015 10:36 PM, Steffen Klassert wrote:
> On Wed, May 27, 2015 at 10:40:32AM -0700, Alexander Duyck wrote:
>> This change makes it so that we use icmpv6_send to report PMTU issues back
>> into tunnels in the case that the resulting packet is larger than the MTU
>> of the outgoing interface. Previously xfrm_local_error was being used in
>> this case, however this was resulting in no changes, I suspect due to the
>> fact that the tunnel itself was being kept out of the loop.
>>
>> This patch fixes PMTU problems seen on ip6_vti tunnels and is based on the
>> behavior seen if the socket was orphaned. Instead of requiring the socket
>> to be orphaned this patch simply defaults to using icmpv6_send in the case
>> that the frame came though a tunnel.
> We can use icmpv6_send() just in the case that the packet
> was already transmitted by a tunnel device, otherwise we
> get the bug back that I mentioned in my other mail.
>
> Not sure if we have something to know that the packet
> traversed a tunnel device. That's what I asked in the
> thread 'Looking for a lost patch'.
Okay I will try to do some more digging. From what I can tell right now
it looks like my ping attempts are getting hung up on the
xfrm_local_error in __xfrm6_output. I wonder if we couldn't somehow
make use of the skb->cb to store a pointer to the tunnel that could be
checked to determine if we are going through a VTI or not.
- Alex
On Thu, May 28, 2015 at 12:18:51AM -0700, Alexander Duyck wrote:
> On 05/27/2015 10:36 PM, Steffen Klassert wrote:
> >On Wed, May 27, 2015 at 10:40:32AM -0700, Alexander Duyck wrote:
> >>This change makes it so that we use icmpv6_send to report PMTU issues back
> >>into tunnels in the case that the resulting packet is larger than the MTU
> >>of the outgoing interface. Previously xfrm_local_error was being used in
> >>this case, however this was resulting in no changes, I suspect due to the
> >>fact that the tunnel itself was being kept out of the loop.
> >>
> >>This patch fixes PMTU problems seen on ip6_vti tunnels and is based on the
> >>behavior seen if the socket was orphaned. Instead of requiring the socket
> >>to be orphaned this patch simply defaults to using icmpv6_send in the case
> >>that the frame came though a tunnel.
> >We can use icmpv6_send() just in the case that the packet
> >was already transmitted by a tunnel device, otherwise we
> >get the bug back that I mentioned in my other mail.
> >
> >Not sure if we have something to know that the packet
> >traversed a tunnel device. That's what I asked in the
> >thread 'Looking for a lost patch'.
>
> Okay I will try to do some more digging. From what I can tell right
> now it looks like my ping attempts are getting hung up on the
> xfrm_local_error in __xfrm6_output. I wonder if we couldn't somehow
> make use of the skb->cb to store a pointer to the tunnel that could
> be checked to determine if we are going through a VTI or not.
Maybe it is as easy as the patch below, could you please test it?
Subject: [PATCH RFC] vti6: Add pmtu handling to vti6_xmit.
We currently rely on the PMTU discovery of xfrm.
However if a packet is localy sent, the PMTU mechanism
of xfrm tries to to local socket notification what
might not work for applications like ping that don't
check for this. So add pmtu handling to vti6_xmit to
report MTU changes immediately.
Signed-off-by: Steffen Klassert <[email protected]>
---
net/ipv6/ip6_vti.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index ff3bd86..13cb771 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -434,6 +434,7 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
struct dst_entry *dst = skb_dst(skb);
struct net_device *tdev;
struct xfrm_state *x;
+ int mtu;
int err = -1;
if (!dst)
@@ -468,6 +469,15 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
skb_dst_set(skb, dst);
skb->dev = skb_dst(skb)->dev;
+ mtu = dst_mtu(dst);
+ if (!skb->ignore_df && skb->len > mtu) {
+ skb_dst(skb)->ops->update_pmtu(dst, NULL, skb, mtu);
+
+ icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+
+ return -EMSGSIZE;
+ }
+
err = dst_output(skb);
if (net_xmit_eval(err) == 0) {
struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
--
1.9.1
On 05/28/2015 01:40 AM, Steffen Klassert wrote:
> On Thu, May 28, 2015 at 12:18:51AM -0700, Alexander Duyck wrote:
>> On 05/27/2015 10:36 PM, Steffen Klassert wrote:
>>> On Wed, May 27, 2015 at 10:40:32AM -0700, Alexander Duyck wrote:
>>>> This change makes it so that we use icmpv6_send to report PMTU issues back
>>>> into tunnels in the case that the resulting packet is larger than the MTU
>>>> of the outgoing interface. Previously xfrm_local_error was being used in
>>>> this case, however this was resulting in no changes, I suspect due to the
>>>> fact that the tunnel itself was being kept out of the loop.
>>>>
>>>> This patch fixes PMTU problems seen on ip6_vti tunnels and is based on the
>>>> behavior seen if the socket was orphaned. Instead of requiring the socket
>>>> to be orphaned this patch simply defaults to using icmpv6_send in the case
>>>> that the frame came though a tunnel.
>>> We can use icmpv6_send() just in the case that the packet
>>> was already transmitted by a tunnel device, otherwise we
>>> get the bug back that I mentioned in my other mail.
>>>
>>> Not sure if we have something to know that the packet
>>> traversed a tunnel device. That's what I asked in the
>>> thread 'Looking for a lost patch'.
>> Okay I will try to do some more digging. From what I can tell right
>> now it looks like my ping attempts are getting hung up on the
>> xfrm_local_error in __xfrm6_output. I wonder if we couldn't somehow
>> make use of the skb->cb to store a pointer to the tunnel that could
>> be checked to determine if we are going through a VTI or not.
> Maybe it is as easy as the patch below, could you please test it?
>
> Subject: [PATCH RFC] vti6: Add pmtu handling to vti6_xmit.
>
> We currently rely on the PMTU discovery of xfrm.
> However if a packet is localy sent, the PMTU mechanism
> of xfrm tries to to local socket notification what
> might not work for applications like ping that don't
> check for this. So add pmtu handling to vti6_xmit to
> report MTU changes immediately.
>
> Signed-off-by: Steffen Klassert <[email protected]>
> ---
> net/ipv6/ip6_vti.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
> index ff3bd86..13cb771 100644
> --- a/net/ipv6/ip6_vti.c
> +++ b/net/ipv6/ip6_vti.c
> @@ -434,6 +434,7 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
> struct dst_entry *dst = skb_dst(skb);
> struct net_device *tdev;
> struct xfrm_state *x;
> + int mtu;
> int err = -1;
>
> if (!dst)
> @@ -468,6 +469,15 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
> skb_dst_set(skb, dst);
> skb->dev = skb_dst(skb)->dev;
>
> + mtu = dst_mtu(dst);
> + if (!skb->ignore_df && skb->len > mtu) {
> + skb_dst(skb)->ops->update_pmtu(dst, NULL, skb, mtu);
> +
> + icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
> +
> + return -EMSGSIZE;
> + }
> +
> err = dst_output(skb);
> if (net_xmit_eval(err) == 0) {
> struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
That seems to be working for me. I'm able to ping and while the first
packet fails the second one and all that follow make it through
correctly after the ptmu update.
- Alex
On 05/28/2015 12:15 PM, Alexander Duyck wrote:
> On 05/28/2015 01:40 AM, Steffen Klassert wrote:
>> On Thu, May 28, 2015 at 12:18:51AM -0700, Alexander Duyck wrote:
>>> On 05/27/2015 10:36 PM, Steffen Klassert wrote:
>>>> On Wed, May 27, 2015 at 10:40:32AM -0700, Alexander Duyck wrote:
>>>>> This change makes it so that we use icmpv6_send to report PMTU
>>>>> issues back
>>>>> into tunnels in the case that the resulting packet is larger than
>>>>> the MTU
>>>>> of the outgoing interface. Previously xfrm_local_error was being
>>>>> used in
>>>>> this case, however this was resulting in no changes, I suspect due
>>>>> to the
>>>>> fact that the tunnel itself was being kept out of the loop.
>>>>>
>>>>> This patch fixes PMTU problems seen on ip6_vti tunnels and is
>>>>> based on the
>>>>> behavior seen if the socket was orphaned. Instead of requiring
>>>>> the socket
>>>>> to be orphaned this patch simply defaults to using icmpv6_send in
>>>>> the case
>>>>> that the frame came though a tunnel.
>>>> We can use icmpv6_send() just in the case that the packet
>>>> was already transmitted by a tunnel device, otherwise we
>>>> get the bug back that I mentioned in my other mail.
>>>>
>>>> Not sure if we have something to know that the packet
>>>> traversed a tunnel device. That's what I asked in the
>>>> thread 'Looking for a lost patch'.
>>> Okay I will try to do some more digging. From what I can tell right
>>> now it looks like my ping attempts are getting hung up on the
>>> xfrm_local_error in __xfrm6_output. I wonder if we couldn't somehow
>>> make use of the skb->cb to store a pointer to the tunnel that could
>>> be checked to determine if we are going through a VTI or not.
>> Maybe it is as easy as the patch below, could you please test it?
>>
>> Subject: [PATCH RFC] vti6: Add pmtu handling to vti6_xmit.
>>
>> We currently rely on the PMTU discovery of xfrm.
>> However if a packet is localy sent, the PMTU mechanism
>> of xfrm tries to to local socket notification what
>> might not work for applications like ping that don't
>> check for this. So add pmtu handling to vti6_xmit to
>> report MTU changes immediately.
>>
>> Signed-off-by: Steffen Klassert <[email protected]>
>> ---
>> net/ipv6/ip6_vti.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
>> index ff3bd86..13cb771 100644
>> --- a/net/ipv6/ip6_vti.c
>> +++ b/net/ipv6/ip6_vti.c
>> @@ -434,6 +434,7 @@ vti6_xmit(struct sk_buff *skb, struct net_device
>> *dev, struct flowi *fl)
>> struct dst_entry *dst = skb_dst(skb);
>> struct net_device *tdev;
>> struct xfrm_state *x;
>> + int mtu;
>> int err = -1;
>> if (!dst)
>> @@ -468,6 +469,15 @@ vti6_xmit(struct sk_buff *skb, struct net_device
>> *dev, struct flowi *fl)
>> skb_dst_set(skb, dst);
>> skb->dev = skb_dst(skb)->dev;
>> + mtu = dst_mtu(dst);
>> + if (!skb->ignore_df && skb->len > mtu) {
>> + skb_dst(skb)->ops->update_pmtu(dst, NULL, skb, mtu);
>> +
>> + icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
>> +
>> + return -EMSGSIZE;
>> + }
>> +
>> err = dst_output(skb);
>> if (net_xmit_eval(err) == 0) {
>> struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
>
> That seems to be working for me. I'm able to ping and while the first
> packet fails the second one and all that follow make it through
> correctly after the ptmu update.
>
> - Alex
It looks like I spoke too soon. It resolves it for IPv6, but IPv4 over
the tunnel has the same issue. Probably need to have some sort of
protocol based check to determine which version of the call to use.
- Alex
From: Steffen Klassert <[email protected]>
We currently rely on the PMTU discovery of xfrm.
However if a packet is localy sent, the PMTU mechanism
of xfrm tries to to local socket notification what
might not work for applications like ping that don't
check for this. So add pmtu handling to vti6_xmit to
report MTU changes immediately.
Signed-off-by: Steffen Klassert <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
---
So this version is slightly modified to cover the IPv4 case in addition to
the IPv6 case. With this patch I was able to run netperf over either an
IPv4 or IPv6 address routed over the ip6_vti tunnel.
net/ipv6/ip6_vti.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d25209657edc..3b5c1ea50d2f 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -435,6 +435,7 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
struct net_device *tdev;
struct xfrm_state *x;
int err = -1;
+ int mtu;
if (!dst)
goto tx_err_link_failure;
@@ -468,6 +469,19 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
skb_dst_set(skb, dst);
skb->dev = skb_dst(skb)->dev;
+ mtu = dst_mtu(dst);
+ if (!skb->ignore_df && skb->len > mtu) {
+ skb_dst(skb)->ops->update_pmtu(dst, NULL, skb, mtu);
+
+ if (skb->protocol == htons(ETH_P_IPV6))
+ icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+ else
+ icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+ htonl(mtu));
+
+ return -EMSGSIZE;
+ }
+
err = dst_output(skb);
if (net_xmit_eval(err) == 0) {
struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
From: Alexander Duyck <[email protected]>
Date: Fri, 29 May 2015 11:28:26 -0700
> From: Steffen Klassert <[email protected]>
>
> We currently rely on the PMTU discovery of xfrm.
> However if a packet is localy sent, the PMTU mechanism
> of xfrm tries to to local socket notification what
> might not work for applications like ping that don't
> check for this. So add pmtu handling to vti6_xmit to
> report MTU changes immediately.
>
> Signed-off-by: Steffen Klassert <[email protected]>
> Signed-off-by: Alexander Duyck <[email protected]>
Applied, thanks Andrew.
http://www.spinics.net/lists/linux-crypto/msg15101.html
> From: Steffen Klassert <steffen.klassert@xxxxxxxxxxx>
>
> We currently rely on the PMTU discovery of xfrm.
> However if a packet is localy sent, the PMTU mechanism
> of xfrm tries to to local socket notification what
> might not work for applications like ping that don't
> check for this. So add pmtu handling to vti6_xmit to
> report MTU changes immediately.
>
> Signed-off-by: Steffen Klassert <steffen.klassert@xxxxxxxxxxx>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@xxxxxxxxxx>
> ---
>
> So this version is slightly modified to cover the IPv4 case in addition to
> the IPv6 case. With this patch I was able to run netperf over either an
> IPv4 or IPv6 address routed over the ip6_vti tunnel.
We have the same issue. When we do a local ping to a remote device over
a v4 vti tunnel and an intermediate device has a low mtu, pmtu
discovery reduces the route's pmtu, and ping fails because it does not
handle the local error message generated by xfrm4_tunnel_check_size().
Your patch fixes our issue for v6 vti tunnels, but the issue still
exists for v4 tunnels. Is there any particular reason this patch was
not delivered for v4 tunnels too - i.e. in vti_xmit()?
On Wed, Feb 10, 2016 at 01:50:20AM +0000, Mark McKinstry wrote:
> >
> > So this version is slightly modified to cover the IPv4 case in addition to
> > the IPv6 case. With this patch I was able to run netperf over either an
> > IPv4 or IPv6 address routed over the ip6_vti tunnel.
> We have the same issue. When we do a local ping to a remote device over
> a v4 vti tunnel and an intermediate device has a low mtu, pmtu
> discovery reduces the route's pmtu, and ping fails because it does not
> handle the local error message generated by xfrm4_tunnel_check_size().
> Your patch fixes our issue for v6 vti tunnels, but the issue still
> exists for v4 tunnels. Is there any particular reason this patch was
> not delivered for v4 tunnels too - i.e. in vti_xmit()?
I don't remember why we fixed it just for ipv6, we probably need
a similar patch for ipv4.
Does the patch below help (compile tested only)?
Subject: [PATCH] vti: Add pmtu handling to vti_xmit.
We currently rely on the PMTU discovery of xfrm.
However if a packet is localy sent, the PMTU mechanism
of xfrm tries to to local socket notification what
might not work for applications like ping that don't
check for this. So add pmtu handling to vti_xmit to
report MTU changes immediately.
Signed-off-by: Steffen Klassert <[email protected]>
---
net/ipv4/ip_vti.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 5cf10b7..6862305 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -156,6 +156,7 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
struct dst_entry *dst = skb_dst(skb);
struct net_device *tdev; /* Device to other host */
int err;
+ int mtu;
if (!dst) {
dev->stats.tx_carrier_errors++;
@@ -196,6 +197,18 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
skb_dst_set(skb, dst);
skb->dev = skb_dst(skb)->dev;
+ mtu = dst_mtu(dst);
+ if (!skb->ignore_df && skb->len > mtu) {
+ skb_dst(skb)->ops->update_pmtu(dst, NULL, skb, mtu);
+ if (skb->protocol == htons(ETH_P_IP))
+ icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+ htonl(mtu));
+ else
+ icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+
+ return -EMSGSIZE;
+ }
+
err = dst_output(tunnel->net, skb->sk, skb);
if (net_xmit_eval(err) == 0)
err = skb->len;
--
1.9.1
On 17/02/16 20:08, Steffen Klassert wrote:
> On Wed, Feb 10, 2016 at 01:50:20AM +0000, Mark McKinstry wrote:
>>> So this version is slightly modified to cover the IPv4 case in addition to
>>> the IPv6 case. With this patch I was able to run netperf over either an
>>> IPv4 or IPv6 address routed over the ip6_vti tunnel.
>> We have the same issue. When we do a local ping to a remote device over
>> a v4 vti tunnel and an intermediate device has a low mtu, pmtu
>> discovery reduces the route's pmtu, and ping fails because it does not
>> handle the local error message generated by xfrm4_tunnel_check_size().
>> Your patch fixes our issue for v6 vti tunnels, but the issue still
>> exists for v4 tunnels. Is there any particular reason this patch was
>> not delivered for v4 tunnels too - i.e. in vti_xmit()?
> I don't remember why we fixed it just for ipv6, we probably need
> a similar patch for ipv4.
>
> Does the patch below help (compile tested only)?
>
> Subject: [PATCH] vti: Add pmtu handling to vti_xmit.
>
> We currently rely on the PMTU discovery of xfrm.
> However if a packet is localy sent, the PMTU mechanism
> of xfrm tries to to local socket notification what
> might not work for applications like ping that don't
> check for this. So add pmtu handling to vti_xmit to
> report MTU changes immediately.
>
> Signed-off-by: Steffen Klassert <[email protected]>
> ---
> net/ipv4/ip_vti.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
> index 5cf10b7..6862305 100644
> --- a/net/ipv4/ip_vti.c
> +++ b/net/ipv4/ip_vti.c
> @@ -156,6 +156,7 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
> struct dst_entry *dst = skb_dst(skb);
> struct net_device *tdev; /* Device to other host */
> int err;
> + int mtu;
>
> if (!dst) {
> dev->stats.tx_carrier_errors++;
> @@ -196,6 +197,18 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
> skb_dst_set(skb, dst);
> skb->dev = skb_dst(skb)->dev;
>
> + mtu = dst_mtu(dst);
> + if (!skb->ignore_df && skb->len > mtu) {
> + skb_dst(skb)->ops->update_pmtu(dst, NULL, skb, mtu);
> + if (skb->protocol == htons(ETH_P_IP))
> + icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
> + htonl(mtu));
> + else
> + icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
> +
> + return -EMSGSIZE;
> + }
> +
> err = dst_output(tunnel->net, skb->sk, skb);
> if (net_xmit_eval(err) == 0)
> err = skb->len;
This patch fixes our issue, thanks. In our scenario the tunnel path MTU
now gets updated so that subsequent large packets sent over the tunnel
get fragmented correctly.
On Thu, Feb 18, 2016 at 01:40:00AM +0000, Mark McKinstry wrote:
> This patch fixes our issue, thanks. In our scenario the tunnel path MTU
> now gets updated so that subsequent large packets sent over the tunnel
> get fragmented correctly.
I've applied this patch to the ipsec tree now.
Thanks for testing!
On 19/02/16 01:19, Steffen Klassert wrote:
> On Thu, Feb 18, 2016 at 01:40:00AM +0000, Mark McKinstry wrote:
>> This patch fixes our issue, thanks. In our scenario the tunnel path MTU
>> now gets updated so that subsequent large packets sent over the tunnel
>> get fragmented correctly.
> I've applied this patch to the ipsec tree now.
> Thanks for testing!
I spoke too soon. Upon further testing with this patch we have found it
causes
a skt buffer leak. This is problematic for us and can cause memory
exhaustion in
one of our test scenarios that has an IPv4 IPsec tunnel over a PPP link.
Also
the patch's -EMSGSIZE return value appears to be invalid because vti_xmit()
should be returning a type netdev_tx_t (NETDEV_TX_OK etc). It looks to
me that
this patch should really be doing a goto tx_error rather than doing an early
return with -EMSGSIZE. This would result in the skt buffer being freed,
NETDEV_TX_OK being returned (thus indicating vti_xmit() "took care of
packet"),
and the tx_errors counter being incremented (which seems like a reasonable
thing to do).
I think the original IPv6 patch probably has the same issues, and could be
causing a DOS attack vulnerability in recent Linux releases. If this patch's
code gets hit for every received packet then the box's memory will soon be
exhausted - e.g. a rogue device sends a stream of largish pkts through a box
with a vti interface, and ignores every ICMPV6_PKT_TOOBIG pkt sent back
to it.
On Wed, Feb 24, 2016 at 09:37:39PM +0000, Mark McKinstry wrote:
> On 19/02/16 01:19, Steffen Klassert wrote:
> > On Thu, Feb 18, 2016 at 01:40:00AM +0000, Mark McKinstry wrote:
> >> This patch fixes our issue, thanks. In our scenario the tunnel path MTU
> >> now gets updated so that subsequent large packets sent over the tunnel
> >> get fragmented correctly.
> > I've applied this patch to the ipsec tree now.
> > Thanks for testing!
> I spoke too soon. Upon further testing with this patch we have found it
> causes
> a skt buffer leak. This is problematic for us and can cause memory
> exhaustion in
> one of our test scenarios that has an IPv4 IPsec tunnel over a PPP link.
> Also
> the patch's -EMSGSIZE return value appears to be invalid because vti_xmit()
> should be returning a type netdev_tx_t (NETDEV_TX_OK etc). It looks to
> me that
> this patch should really be doing a goto tx_error rather than doing an early
> return with -EMSGSIZE. This would result in the skt buffer being freed,
> NETDEV_TX_OK being returned (thus indicating vti_xmit() "took care of
> packet"),
> and the tx_errors counter being incremented (which seems like a reasonable
> thing to do).
Yes, you are right here.
>
> I think the original IPv6 patch probably has the same issues, and could be
> causing a DOS attack vulnerability in recent Linux releases.
We need to fix both, ipv4 and ipv6. I'll care for it,
thanks for the report.
On Wed, Feb 24, 2016 at 09:37:39PM +0000, Mark McKinstry wrote:
> On 19/02/16 01:19, Steffen Klassert wrote:
> > On Thu, Feb 18, 2016 at 01:40:00AM +0000, Mark McKinstry wrote:
> >> This patch fixes our issue, thanks. In our scenario the tunnel path MTU
> >> now gets updated so that subsequent large packets sent over the tunnel
> >> get fragmented correctly.
> > I've applied this patch to the ipsec tree now.
> > Thanks for testing!
> I spoke too soon. Upon further testing with this patch we have found it
> causes
> a skt buffer leak. This is problematic for us and can cause memory
> exhaustion in
> one of our test scenarios that has an IPv4 IPsec tunnel over a PPP link.
The patch below is what I plan to apply on top of the original patch.
Subject: [PATCH] vti: Fix recource leeks on pmtu discovery
A recent patch introduced pmtu handling directly in the
vti transmit routine. Unfortunately we now return without
releasing the dst_entry and freeing the sk_buff. This patch
fixes the issue.
Fixes: 325b71fe0f57 ("vti: Add pmtu handling to vti_xmit.")
Reported-by: Mark McKinstry <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
---
net/ipv4/ip_vti.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 6862305..2ea2b6e 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -206,7 +206,8 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
else
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
- return -EMSGSIZE;
+ dst_release(dst);
+ goto tx_error;
}
err = dst_output(tunnel->net, skb->sk, skb);
--
1.9.1
On 04/03/16 20:05, Steffen Klassert wrote:
> On Wed, Feb 24, 2016 at 09:37:39PM +0000, Mark McKinstry wrote:
>> On 19/02/16 01:19, Steffen Klassert wrote:
>>> On Thu, Feb 18, 2016 at 01:40:00AM +0000, Mark McKinstry wrote:
>>>> This patch fixes our issue, thanks. In our scenario the tunnel path MTU
>>>> now gets updated so that subsequent large packets sent over the tunnel
>>>> get fragmented correctly.
>>> I've applied this patch to the ipsec tree now.
>>> Thanks for testing!
>> I spoke too soon. Upon further testing with this patch we have found it
>> causes
>> a skt buffer leak. This is problematic for us and can cause memory
>> exhaustion in
>> one of our test scenarios that has an IPv4 IPsec tunnel over a PPP link.
> The patch below is what I plan to apply on top of the original patch.
>
> Subject: [PATCH] vti: Fix recource leeks on pmtu discovery
>
> A recent patch introduced pmtu handling directly in the
> vti transmit routine. Unfortunately we now return without
> releasing the dst_entry and freeing the sk_buff. This patch
> fixes the issue.
>
> Fixes: 325b71fe0f57 ("vti: Add pmtu handling to vti_xmit.")
> Reported-by: Mark McKinstry <[email protected]>
> Signed-off-by: Steffen Klassert <[email protected]>
> ---
> net/ipv4/ip_vti.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
> index 6862305..2ea2b6e 100644
> --- a/net/ipv4/ip_vti.c
> +++ b/net/ipv4/ip_vti.c
> @@ -206,7 +206,8 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
> else
> icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
>
> - return -EMSGSIZE;
> + dst_release(dst);
> + goto tx_error;
> }
>
> err = dst_output(tunnel->net, skb->sk, skb);
Your patch adds a dst_release() call to my suggested fix, but this is
problematic because the kfree_skb() call at tx_error already takes care
of releasing dst - via kfree_skb() > __kfree_skb() > skb_release_all() >
skb_release_head_state() > skb_dst_drop()
> refdst_drop() > dst_release(). In our scenario your patch results in
a negative refcount kernel warning being generated in dst_release() for
every packet that is too big to go over the vti.
On Mon, Mar 14, 2016 at 09:52:05PM +0000, Mark McKinstry wrote:
> Your patch adds a dst_release() call to my suggested fix, but this is
> problematic because the kfree_skb() call at tx_error already takes care
> of releasing dst - via kfree_skb() > __kfree_skb() > skb_release_all() >
> skb_release_head_state() > skb_dst_drop()
> > refdst_drop() > dst_release(). In our scenario your patch results in
> a negative refcount kernel warning being generated in dst_release() for
> every packet that is too big to go over the vti.
Hm. I've just noticed that my pmtu test does not trigger this
codepath, so I did not see the warning.
Seems like we do the pmtu handling too late, it should happen before
we do skb_dst_set(). Also skb_scrub_packet() resets skb->ignore_df,
so checking ignore_df after skb_scrub_packet() does not make much sense.
I'll send an updated version after some more testing.
On Tue, Mar 15, 2016 at 01:28:01PM +0100, Steffen Klassert wrote:
> On Mon, Mar 14, 2016 at 09:52:05PM +0000, Mark McKinstry wrote:
> > Your patch adds a dst_release() call to my suggested fix, but this is
> > problematic because the kfree_skb() call at tx_error already takes care
> > of releasing dst - via kfree_skb() > __kfree_skb() > skb_release_all() >
> > skb_release_head_state() > skb_dst_drop()
> > > refdst_drop() > dst_release(). In our scenario your patch results in
> > a negative refcount kernel warning being generated in dst_release() for
> > every packet that is too big to go over the vti.
>
> Hm. I've just noticed that my pmtu test does not trigger this
> codepath, so I did not see the warning.
>
> Seems like we do the pmtu handling too late, it should happen before
> we do skb_dst_set(). Also skb_scrub_packet() resets skb->ignore_df,
> so checking ignore_df after skb_scrub_packet() does not make much sense.
>
> I'll send an updated version after some more testing.
>
I've added a testcase that triggers this codepath to my testing
environment. The patch below works for me, could you please test
if it fixes your problems?
Subject: [PATCH] vti: Add pmtu handling to vti_xmit.
We currently rely on the PMTU discovery of xfrm.
However if a packet is locally sent, the PMTU mechanism
of xfrm tries to do local socket notification what
might not work for applications like ping that don't
check for this. So add pmtu handling to vti_xmit to
report MTU changes immediately.
Signed-off-by: Steffen Klassert <[email protected]>
---
net/ipv4/ip_vti.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 5cf10b7..a917903 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -156,6 +156,7 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
struct dst_entry *dst = skb_dst(skb);
struct net_device *tdev; /* Device to other host */
int err;
+ int mtu;
if (!dst) {
dev->stats.tx_carrier_errors++;
@@ -192,6 +193,23 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
tunnel->err_count = 0;
}
+ mtu = dst_mtu(dst);
+ if (skb->len > mtu) {
+ skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+ if (skb->protocol == htons(ETH_P_IP)) {
+ icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+ htonl(mtu));
+ } else {
+ if (mtu < IPV6_MIN_MTU)
+ mtu = IPV6_MIN_MTU;
+
+ icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+ }
+
+ dst_release(dst);
+ goto tx_error;
+ }
+
skb_scrub_packet(skb, !net_eq(tunnel->net, dev_net(dev)));
skb_dst_set(skb, dst);
skb->dev = skb_dst(skb)->dev;
--
1.9.1
I've tested this patch in our scenario and I can confirm that it still
fixes all of our issues.
On 22/03/16 23:53, Steffen Klassert wrote:
> On Tue, Mar 15, 2016 at 01:28:01PM +0100, Steffen Klassert wrote:
>> On Mon, Mar 14, 2016 at 09:52:05PM +0000, Mark McKinstry wrote:
>>> Your patch adds a dst_release() call to my suggested fix, but this is
>>> problematic because the kfree_skb() call at tx_error already takes care
>>> of releasing dst - via kfree_skb() > __kfree_skb() > skb_release_all() >
>>> skb_release_head_state() > skb_dst_drop()
>>> > refdst_drop() > dst_release(). In our scenario your patch results in
>>> a negative refcount kernel warning being generated in dst_release() for
>>> every packet that is too big to go over the vti.
>> Hm. I've just noticed that my pmtu test does not trigger this
>> codepath, so I did not see the warning.
>>
>> Seems like we do the pmtu handling too late, it should happen before
>> we do skb_dst_set(). Also skb_scrub_packet() resets skb->ignore_df,
>> so checking ignore_df after skb_scrub_packet() does not make much sense.
>>
>> I'll send an updated version after some more testing.
>>
> I've added a testcase that triggers this codepath to my testing
> environment. The patch below works for me, could you please test
> if it fixes your problems?
>
> Subject: [PATCH] vti: Add pmtu handling to vti_xmit.
>
> We currently rely on the PMTU discovery of xfrm.
> However if a packet is locally sent, the PMTU mechanism
> of xfrm tries to do local socket notification what
> might not work for applications like ping that don't
> check for this. So add pmtu handling to vti_xmit to
> report MTU changes immediately.
>
> Signed-off-by: Steffen Klassert <[email protected]>
> ---
> net/ipv4/ip_vti.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
> index 5cf10b7..a917903 100644
> --- a/net/ipv4/ip_vti.c
> +++ b/net/ipv4/ip_vti.c
> @@ -156,6 +156,7 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
> struct dst_entry *dst = skb_dst(skb);
> struct net_device *tdev; /* Device to other host */
> int err;
> + int mtu;
>
> if (!dst) {
> dev->stats.tx_carrier_errors++;
> @@ -192,6 +193,23 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
> tunnel->err_count = 0;
> }
>
> + mtu = dst_mtu(dst);
> + if (skb->len > mtu) {
> + skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
> + if (skb->protocol == htons(ETH_P_IP)) {
> + icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
> + htonl(mtu));
> + } else {
> + if (mtu < IPV6_MIN_MTU)
> + mtu = IPV6_MIN_MTU;
> +
> + icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
> + }
> +
> + dst_release(dst);
> + goto tx_error;
> + }
> +
> skb_scrub_packet(skb, !net_eq(tunnel->net, dev_net(dev)));
> skb_dst_set(skb, dst);
> skb->dev = skb_dst(skb)->dev;
On Wed, Mar 30, 2016 at 09:04:03PM +0000, Mark McKinstry wrote:
> I've tested this patch in our scenario and I can confirm that it still
> fixes all of our issues.
I've applied the patch to the ipsec tree now.
Thanks for testing!