2022-10-07 11:37:00

by Vyacheslav Salnikov

[permalink] [raw]
Subject: bridge:fragmented packets dropped by bridge

Hi.

I switched from kernel versions 4.9 to 5.15 and found that the MTU on
the interfaces in the bridge does not change.
For example:
I have the following bridge:
bridge interface
br0 sw1
sw2
sw3

And I change with ifconfig MTU.
I see that br0 sw1..sw3 has changed MTU from 1500 -> 1982.

But if i send a ping through these interfaces, I get 1500(I added
prints for output)
I investigated the code and found the reason:
The following commit came in the new kernel:
https://github.com/torvalds/linux/commit/ac6627a28dbfb5d96736544a00c3938fa7ea6dfb

And the behavior of the MTU setting has changed:
>
> Kernel 4.9:
> if (net->ipv4.sysctl_ip_fwd_use_pmtu ||
> ip_mtu_locked(dst) ||
> !forwarding) <--- True
> return dst_mtu(dst) <--- 1982
>
>
> / 'forwarding = true' case should always honour route mtu /
> mtu = dst_metric_raw(dst, RTAX_MTU);
> if (mtu)
> return mtu;



Kernel 5.15:
>
> if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) ||
> ip_mtu_locked(dst) ||
> !forwarding) { <--- True
> mtu = rt->rt_pmtu; <--- 0
> if (mtu && time_before(jiffies, rt->dst.expires)) <-- False
> goto out;
> }
>
> / 'forwarding = true' case should always honour route mtu /
> mtu = dst_metric_raw(dst, RTAX_MTU); <---- 1500
> if (mtu) <--- True
> goto out;

As I see from the code in the end takes mtu from br_dst_default_metrics
> static const u32 br_dst_default_metrics[RTAX_MAX] = {
> [RTAX_MTU - 1] = 1500,
> };

Why is rt_pmtu now used instead of dst_mtu?
Why is forwarding = False called with dst_metric_raw?
Maybe we should add processing when mtu = rt->rt_pmtu == 0?
Could this be an error?


I found a thread discussing a similar problem. It suggested porting
the next patch:
Signed-off-by: Rundong Ge <[email protected]>
---
include/net/ip.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/include/net/ip.h b/include/net/ip.h
index 29d89de..0512de3 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -450,6 +450,8 @@ static inline unsigned int
ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
const struct sk_buff *skb)
{
+ if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
+ return min(skb->dev->mtu, IP_MAX_MTU);
if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;


Why was this patch not accepted in the end?
--
Best regards,
Slava.


2022-10-07 14:06:48

by Slade Watkins

[permalink] [raw]
Subject: Re: bridge:fragmented packets dropped by bridge

Hey there,

On 10/7/22 at 7:21 AM, Vyacheslav Salnikov wrote:
> Why was this patch not accepted in the end?

Huh...

I had to do just a little bit of digging to find the original thread,
but it really doesn't seem to me like there was a consensus on whether
or not to take the patch:
https://lore.kernel.org/lkml/[email protected]/T/#u

Reason I say that is that the thread is rather old, and died off quickly.

Someone involved with that patch may be able to offer the answers you're
looking for, if they haven't forgotten about it after all this time.

-srw

2022-10-13 01:43:59

by Vadim Fedorenko

[permalink] [raw]
Subject: Re: bridge:fragmented packets dropped by bridge

On 07.10.2022 12:21, Vyacheslav Salnikov wrote:
> Hi.
>
> I switched from kernel versions 4.9 to 5.15 and found that the MTU on
> the interfaces in the bridge does not change.
> For example:
> I have the following bridge:
> bridge interface
> br0 sw1
> sw2
> sw3
>
> And I change with ifconfig MTU.
> I see that br0 sw1..sw3 has changed MTU from 1500 -> 1982.
>
> But if i send a ping through these interfaces, I get 1500(I added
> prints for output)
> I investigated the code and found the reason:
> The following commit came in the new kernel:
> https://github.com/torvalds/linux/commit/ac6627a28dbfb5d96736544a00c3938fa7ea6dfb
>
> And the behavior of the MTU setting has changed:
>>
>> Kernel 4.9:
>> if (net->ipv4.sysctl_ip_fwd_use_pmtu ||
>> ip_mtu_locked(dst) ||
>> !forwarding) <--- True
>> return dst_mtu(dst) <--- 1982
>>
>>
>> / 'forwarding = true' case should always honour route mtu /
>> mtu = dst_metric_raw(dst, RTAX_MTU);
>> if (mtu)
>> return mtu;
>
>
>
> Kernel 5.15:
>>
>> if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) ||
>> ip_mtu_locked(dst) ||
>> !forwarding) { <--- True
>> mtu = rt->rt_pmtu; <--- 0
>> if (mtu && time_before(jiffies, rt->dst.expires)) <-- False
>> goto out;
>> }
>>
>> / 'forwarding = true' case should always honour route mtu /
>> mtu = dst_metric_raw(dst, RTAX_MTU); <---- 1500
>> if (mtu) <--- True
>> goto out;
>
> As I see from the code in the end takes mtu from br_dst_default_metrics
>> static const u32 br_dst_default_metrics[RTAX_MAX] = {
>> [RTAX_MTU - 1] = 1500,
>> };
>
> Why is rt_pmtu now used instead of dst_mtu?
> Why is forwarding = False called with dst_metric_raw?
> Maybe we should add processing when mtu = rt->rt_pmtu == 0?
> Could this be an error?
>
If you compare ipv4_mtu code from 4.9 you will see that the very first mtu value
is filled by rt->rt_pmtu value. I believe there were changes to the bridge code
where rt_pmtu value got empty or cleared.

I'm still looking for the root cause of the problem, will update you once I find it.


>
> I found a thread discussing a similar problem. It suggested porting
> the next patch:
> Signed-off-by: Rundong Ge <[email protected]>
> ---
> include/net/ip.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 29d89de..0512de3 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -450,6 +450,8 @@ static inline unsigned int
> ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
> static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
> const struct sk_buff *skb)
> {
> + if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
> + return min(skb->dev->mtu, IP_MAX_MTU);
> if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
> bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
>
>
> Why was this patch not accepted in the end?

2022-10-15 01:46:18

by Vadim Fedorenko

[permalink] [raw]
Subject: Re: bridge:fragmented packets dropped by bridge

On 07.10.2022 12:21, Vyacheslav Salnikov wrote:
> Hi.
>
> I switched from kernel versions 4.9 to 5.15 and found that the MTU on
> the interfaces in the bridge does not change.
> For example:
> I have the following bridge:
> bridge interface
> br0 sw1
> sw2
> sw3
>
> And I change with ifconfig MTU.
> I see that br0 sw1..sw3 has changed MTU from 1500 -> 1982.
>
> But if i send a ping through these interfaces, I get 1500(I added
> prints for output)
> I investigated the code and found the reason:
> The following commit came in the new kernel:
> https://github.com/torvalds/linux/commit/ac6627a28dbfb5d96736544a00c3938fa7ea6dfb
>
> And the behavior of the MTU setting has changed:
>>
>> Kernel 4.9:
>> if (net->ipv4.sysctl_ip_fwd_use_pmtu ||
>> ip_mtu_locked(dst) ||
>> !forwarding) <--- True
>> return dst_mtu(dst) <--- 1982
>>
>>
>> / 'forwarding = true' case should always honour route mtu /
>> mtu = dst_metric_raw(dst, RTAX_MTU);
>> if (mtu)
>> return mtu;
>
>
>
> Kernel 5.15:
>>
>> if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) ||
>> ip_mtu_locked(dst) ||
>> !forwarding) { <--- True
>> mtu = rt->rt_pmtu; <--- 0
>> if (mtu && time_before(jiffies, rt->dst.expires)) <-- False
>> goto out;
>> }
>>
>> / 'forwarding = true' case should always honour route mtu /
>> mtu = dst_metric_raw(dst, RTAX_MTU); <---- 1500
>> if (mtu) <--- True
>> goto out;
>
> As I see from the code in the end takes mtu from br_dst_default_metrics
>> static const u32 br_dst_default_metrics[RTAX_MAX] = {
>> [RTAX_MTU - 1] = 1500,
>> };
>
> Why is rt_pmtu now used instead of dst_mtu?
> Why is forwarding = False called with dst_metric_raw?
> Maybe we should add processing when mtu = rt->rt_pmtu == 0?
> Could this be an error?
>

Can you share kernel configs for both versions? Actually only one config value
is needed - CONFIG_BRIDGE_NETFILTER.

It will help me investigate the issue.

>
> I found a thread discussing a similar problem. It suggested porting
> the next patch:
> Signed-off-by: Rundong Ge <[email protected]>
> ---
> include/net/ip.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 29d89de..0512de3 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -450,6 +450,8 @@ static inline unsigned int
> ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
> static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
> const struct sk_buff *skb)
> {
> + if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
> + return min(skb->dev->mtu, IP_MAX_MTU);
> if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
> bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
>
>
> Why was this patch not accepted in the end?