2016-11-07 11:41:14

by Vicente Jiménez

Subject: [Regression w/ patch] Restore network resistance to weird ICMP messages

Handle weird ICMP fragmentation needed messages with next hop MTU
equal to (or exceeding) dropped packet size

Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")

In a large corporate network, we spotted this weird ICMP message after
lengthy troubleshooting. See the attached capture file. Those ICMP
"network unreachable - fragmentation needed and don't fragment bit set"
messages are sent by a router that drops 1500-byte IP packets and fills
the next hop MTU ICMP field with 1500.

Those messages cause the TCP connection to stall, but only on newer
kernels. Older kernels set the path MTU to 1492 and communicate
successfully.

After checking the code and commit history, I spotted how commit
46517008e116 ("ipv4: Kill ip_rt_frag_needed().") from June 2012
changed ICMP message handling by removing the ip_rt_frag_needed()
function.

The relevant part of the ip_rt_frag_needed function that was removed is:

    if (new_mtu < 68 || new_mtu >= old_mtu) {
        /* BSD 4.2 derived systems incorrectly adjust
         * tot_len by the IP header length, and report
         * a zero MTU in the ICMP message.
         */
        if (mtu == 0 &&
            old_mtu >= 68 + (iph->ihl << 2))
            old_mtu -= iph->ihl << 2;
        mtu = guess_mtu(old_mtu);
    }
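
For readers who do not have the old source at hand, here is a minimal
sketch of the kind of plateau lookup that guess_mtu() performed. It is
written from memory of the pre-removal net/ipv4/route.c, so the function
name guess_mtu_sketch and the exact table values are assumptions (they
follow the style of the RFC 1191 plateau table), not a verified copy of
the kernel code. It does show why a dropped 1500-byte packet ended up
with a path MTU of 1492 on older kernels.

```c
#include <stddef.h>

/* Assumed plateau table, largest first (RFC 1191 style). */
static const unsigned short mtu_plateau_sketch[] = {
	32000, 17914, 8166, 4352, 2002, 1492, 576, 296, 216, 128
};

/* Return the first plateau strictly below the dropped packet length.
 * Falls back to 68, the minimum IPv4 MTU, when no plateau is smaller.
 */
unsigned short guess_mtu_sketch(unsigned short old_mtu)
{
	size_t n = sizeof(mtu_plateau_sketch) / sizeof(mtu_plateau_sketch[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if (old_mtu > mtu_plateau_sketch[i])
			return mtu_plateau_sketch[i];
	return 68;
}
```

With this table, guess_mtu_sketch(1500) yields 1492, which matches the
behaviour observed on older kernels.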


This condition handled the case where the next hop MTU was zero (less
than 68). That case is now handled by the protocol and was fixed by
commit 68b7107b6298 ("ipv4: icmp: Fix pMTU handling for rare case").

But the rarest case, where new_mtu (the next hop MTU) >= old_mtu (the
dropped packet length), was also removed. This commit restores that
check. Instead of using a table lookup as guess_mtu() does, it simply
sets the path MTU to the dropped packet size minus 2 bytes.

In our case, setting the path MTU to just 1498 (one iteration) worked.
This solution should converge in any case to a good value by small
steps. I don't think there's a need for a more complex solution.
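
To illustrate the convergence argument, here is a small userspace
sketch. It is not the attached patch: clamp_frag_needed_mtu() and
steps_to_converge() are hypothetical helpers, and the floor of 68 bytes
is an assumption added only to keep the arithmetic safe. It models a
router that always reports a next hop MTU equal to the size of the
packet it just dropped.

```c
#include <stdint.h>

#define MIN_IPV4_MTU 68u	/* assumed floor, not taken from the patch */

/* If the reported next hop MTU is not smaller than the dropped packet,
 * ignore it and step down 2 bytes from the dropped packet length.
 * A reported MTU below the dropped length is returned unchanged; the
 * zero/too-small case is handled elsewhere according to the thread.
 */
uint32_t clamp_frag_needed_mtu(uint32_t next_hop_mtu, uint16_t dropped_len)
{
	if (next_hop_mtu >= dropped_len) {
		if (dropped_len < MIN_IPV4_MTU + 2u)
			return MIN_IPV4_MTU;
		return (uint32_t)dropped_len - 2u;
	}
	return next_hop_mtu;
}

/* Count how many bogus ICMP messages are needed before packets fit
 * through a path whose real MTU is path_mtu.
 */
unsigned int steps_to_converge(uint16_t start_mtu, uint16_t path_mtu)
{
	uint16_t mtu = start_mtu;
	unsigned int steps = 0;

	while (mtu > path_mtu) {
		/* The router drops a packet of size mtu and reports mtu. */
		uint16_t next = (uint16_t)clamp_frag_needed_mtu(mtu, mtu);

		if (next == mtu)	/* floor reached, cannot go lower */
			break;
		mtu = next;
		steps++;
	}
	return steps;
}
```

In this model, steps_to_converge(1500, 1498) returns 1, the single
iteration we observed, and a path that only passes 1492 bytes would
need 4 iterations starting from 1500.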

The patched kernel worked perfectly, lowering the path MTU to 1498 from
the initial default interface value of 1500. This time I don't have a
capture file from inside the affected center, but all received packets
had a maximum size of 1498.

--
cheers
vicente


Attachments:
ICMP discarting and sugesting 1500 2.pcapng (87.56 kB)
0001-ipv4-icmp-Fix-pMTU-handling-for-rarest-case.patch (1.21 kB)

2016-11-10 01:22:15

by David Miller

Subject: Re: [Regression w/ patch] Restore network resistance to weird ICMP messages

From: Vicente Jiménez <[email protected]>
Date: Mon, 7 Nov 2016 12:11:59 +0100

> From bfc9a00e6b78d8eb60e46dacd7d761669d29a573 Mon Sep 17 00:00:00 2001
> From: Vicente Jimenez Aguilar <[email protected]>
> Date: Mon, 31 Oct 2016 13:10:29 +0100
> Subject: [PATCH] ipv4: icmp: Fix pMTU handling for rarest case
>
> Restore network resistance to weird ICMP fragmentation needed messages
> with next hop MTU equal to (or exceeding) dropped packet size
>
> Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
> Signed-off-by: Vicente Jimenez Aguilar <[email protected]>
> ---
> net/ipv4/icmp.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 38abe70..c0af1d2 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -776,6 +776,7 @@ static bool icmp_unreach(struct sk_buff *skb)
> struct icmphdr *icmph;
> struct net *net;
> u32 info = 0;
> + unsigned short old_mtu;
>
> net = dev_net(skb_dst(skb)->dev);
>

Order local variable declarations from longest to shortest line
please.

> + if ( info >= old_mtu )

There should be no space after the '(' and before the ')' in this
conditional.

2016-11-10 10:52:05

by Vicente Jiménez

Subject: Re: [Regression w/ patch] Restore network resistance to weird ICMP messages

Corrected patch attached.
Thanks for the advice.
I was unaware of those style policies.



--
regards
vicente


Attachments:
0001-ipv4-icmp-Fix-pMTU-handling-for-rarest-case.patch (1.24 kB)

2016-11-10 14:48:29

by David Miller

Subject: Re: [Regression w/ patch] Restore network resistance to weird ICMP messages

From: Vicente Jiménez <[email protected]>
Date: Thu, 10 Nov 2016 11:52:01 +0100

> Corrected patch attached.
> Thanks for the advice.
> I was unaware of those style policies.

This is not how to submit a fixed patch.

You must make a new, fresh, list posting fully formed and with
a clean Subject line and commit message.

Not as a reply to the discussion.