2019-07-30 17:12:14

by Nikolay Aleksandrov

Subject: Re: [PATCH] bridge:fragmented packets dropped by bridge

On 30/07/2019 15:25, Rundong Ge wrote:
> Given the following setup:
> -modprobe br_netfilter
> -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
> -brctl addbr br0
> -brctl addif br0 enp2s0
> -brctl addif br0 enp3s0
> -brctl addif br0 enp6s0
> -ifconfig enp2s0 mtu 1300
> -ifconfig enp3s0 mtu 1500
> -ifconfig enp6s0 mtu 1500
> -ifconfig br0 up
>
>              multi-port
> mtu1500 - mtu1500|bridge|1500 - mtu1500
>   A                  |            B
>                   mtu1300
>
> With netfilter defragmentation/conntrack enabled, fragmented
> packets from A will be defragmented in prerouting and refragmented
> at postrouting.
> But in this scenario the bridge finds that frag_max_size (1500) is
> larger than the dst MTU stored in the fake_rtable, which is
> always equal to the bridge's MTU (1300), so the packets are dropped.
>
> This patch modifies ip_skb_dst_mtu() to use the out dev's MTU instead
> of the bridge's MTU when the bridge refragments.
>
> Signed-off-by: Rundong Ge <[email protected]>
> ---
> include/net/ip.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 29d89de..0512de3 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -450,6 +450,8 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
>  static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
>  					  const struct sk_buff *skb)
>  {
> +	if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
> +		return min(skb->dev->mtu, IP_MAX_MTU);
>  	if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
>  		bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
>
>

I don't think this is correct. There's a reason why the bridge chooses the smallest
possible MTU out of its members, and this is simply a hack to circumvent it.
If you really want to do that, just set the bridge MTU manually; we've added support
so it won't change automatically to the smallest. But then how do you pass packets
1500 -> 1300 in this setup?

You're talking about the frag_size check in br_nf_ip_fragment(), right?
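
The "smallest MTU out of its members" behaviour mentioned above can be illustrated with a small user-space model. This is only a sketch, not the kernel's implementation: the function name model_bridge_auto_mtu() is hypothetical, and it assumes the bridge MTU has not been set manually.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space model, not kernel code: return the smallest MTU among the
 * bridge ports, which is what an automatically configured bridge MTU
 * ends up being according to the reply above.
 */
static unsigned int model_bridge_auto_mtu(const unsigned int *port_mtu, size_t n)
{
	unsigned int min_mtu;
	size_t i;

	assert(port_mtu != NULL && n > 0);

	min_mtu = port_mtu[0];
	for (i = 1; i < n; i++) {
		if (port_mtu[i] < min_mtu)
			min_mtu = port_mtu[i];
	}
	return min_mtu;
}
```

For the setup in the patch description (ports with MTU 1300, 1500 and 1500) this model yields 1300, which is the value that ends up in the fake_rtable.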


2019-07-30 17:27:51

by Rundong Ge

Subject: Re: [PATCH] bridge:fragmented packets dropped by bridge

Yes, it is about the frag_size check in br_nf_ip_fragment(). As I said,
without br_netfilter the packet forwarding is fine.
It also seems odd to me that br_nf_dev_queue_xmit() uses the out dev's MTU
to decide whether to fragment, but
then br_nf_ip_fragment() uses the bridge's MTU to do the actual fragmentation.

In this case the fragmented packets fit the out dev's MTU but were
dropped, which I don't think is right.
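
The mismatch between the two checks can be sketched as a user-space model. This is a simplification under stated assumptions, not the kernel code: the struct and function names are hypothetical, and the model ignores the DF flag, GSO and any reserved header space that the real functions may take into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few sk_buff fields that matter here. */
struct model_pkt {
	unsigned int len;           /* length of the reassembled packet */
	unsigned int frag_max_size; /* largest fragment seen during defrag */
};

/*
 * Step 1, modelled on the decision described for br_nf_dev_queue_xmit():
 * refragmentation is attempted only if the packet exceeds the out dev MTU.
 */
static bool model_needs_fragment(const struct model_pkt *p,
				 unsigned int out_dev_mtu)
{
	assert(p != NULL);
	return p->len > out_dev_mtu;
}

/*
 * Step 2, modelled on the frag_max_size check in br_nf_ip_fragment():
 * the packet is dropped if the largest original fragment is bigger than
 * the MTU used for refragmentation.
 */
static bool model_refragment_ok(const struct model_pkt *p,
				unsigned int frag_mtu)
{
	assert(p != NULL);
	return p->frag_max_size == 0 || p->frag_max_size <= frag_mtu;
}
```

With a reassembled packet whose frag_max_size is 1500, step 1 uses the out dev MTU (1500) and decides to refragment, while step 2 compares against the bridge MTU (1300) and fails. If step 2 used 1500 as well, the same packet would pass.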


2019-08-26 03:15:50

by Rundong Ge

Subject: Re: [PATCH] bridge:fragmented packets dropped by bridge

Hi Nikolay,
My setup may not be common. May I ask whether there is any reason to
use the output port's MTU
for the re-fragment check but then use the bridge's MTU for the re-fragmentation itself?
Is it expected behavior that the bridge's MTU affects re-fragmentation of
FORWARD traffic?
I used to think the bridge's MTU would only affect OUTPUT
traffic sent from "br0".
The modification in this patch replaces the MTU taken from the
fake_rtable, which is only
used in the FORWARD re-fragmentation, so it won't affect local traffic from "br0".
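
The intended effect of the patch can be sketched in user space as follows. This is a model of the proposed change only: struct model_skb, its fields and MODEL_IP_MAX_MTU are hypothetical stand-ins for the kernel's sk_buff, the DST_FAKE_RTABLE flag and IP_MAX_MTU, and the sketch does not settle whether any other traffic can carry the fake rtable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_IP_MAX_MTU 0xFFFFU /* stands in for the kernel's IP_MAX_MTU */

/* Hypothetical stand-in for the sk_buff state the patch looks at. */
struct model_skb {
	bool dst_is_fake_rtable; /* models skb_dst(skb)->flags & DST_FAKE_RTABLE */
	unsigned int dst_mtu;    /* MTU stored in the dst (the bridge's MTU) */
	unsigned int dev_mtu;    /* MTU of skb->dev, 0 if there is no device */
};

/*
 * Model of the patched lookup: for a dst backed by the bridge's fake
 * rtable, prefer the out device MTU (capped at the maximum IP MTU);
 * otherwise keep using the MTU stored in the dst.
 */
static unsigned int model_skb_dst_mtu_patched(const struct model_skb *skb)
{
	assert(skb != NULL);

	if (skb->dst_is_fake_rtable && skb->dev_mtu != 0)
		return skb->dev_mtu < MODEL_IP_MAX_MTU ?
		       skb->dev_mtu : MODEL_IP_MAX_MTU;

	return skb->dst_mtu;
}
```

In this model a forwarded packet leaving through a 1500-byte port gets 1500 instead of the bridge's 1300, while a packet whose dst is not the fake rtable still gets the MTU stored in its dst.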

Thanks,
Raydodn