2019-01-10 00:51:17

by Ian Kumlien

Subject: [BUG] moving fq back to clock monotonic breaks my setup

Hi,

Just been through 5+ hours of bisecting and finally found
the culprit =)

commit fb420d5d91c1274d5966917725e71f27ed092a85 (refs/bisect/bad)
Author: Eric Dumazet <[email protected]>
Date: Fri Sep 28 10:28:44 2018 -0700

tcp/fq: move back to CLOCK_MONOTONIC

[--8<--]

This might be because my setup is a bit "odd".

Basically I have a firewall with four NICs. Two of them handle my normal
internet connection (firewall/MASQ/NAT) and the other two are assigned
to one bridge each.

The firewall is also my local caching DNS server and DHCP server,
which the VMs also use.
But with 4.20, DHCP replies disappeared before entering the bridge. I
couldn't even see them in tcpdump! (All NICs are ixgbe on an Atom SoC.)

I'm currently running a kernel with that patch reverted, but I'm also
wondering about possible ways forward, since I'm reverting a fix from
someone else...


2019-01-10 06:04:11

by Eric Dumazet

Subject: Re: [BUG] moving fq back to clock monotonic breaks my setup

On Wed, Jan 9, 2019 at 4:48 PM Ian Kumlien <[email protected]> wrote:
>
> Hi,
>
> Just been through 5+ hours of bisecting and finally found
> the culprit =)
>
> commit fb420d5d91c1274d5966917725e71f27ed092a85 (refs/bisect/bad)
> Author: Eric Dumazet <[email protected]>
> Date: Fri Sep 28 10:28:44 2018 -0700
>
> tcp/fq: move back to CLOCK_MONOTONIC
>
> [--8<--]
>
> This might be because my setup is a bit "odd".
>
> Basically I have a firewall with four NICs. Two of them handle my normal
> internet connection (firewall/MASQ/NAT) and the other two are assigned
> to one bridge each.
>
> The firewall is also my local caching DNS server and DHCP server,
> which the VMs also use.
> But with 4.20, DHCP replies disappeared before entering the bridge. I
> couldn't even see them in tcpdump! (All NICs are ixgbe on an Atom SoC.)
>
> I'm currently running a kernel with that patch reverted, but I'm also
> wondering about possible ways forward, since I'm reverting a fix from
> someone else...

I suggest you use the netdev@ mailing list instead of lkml.

Then, we probably need to clear skb->tstamp in more paths (you are
mentioning a bridge...)

See commit 8203e2d844d34af247a151d8ebd68553a6e91785 for reference.

Can you try:

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 5372e2042adfe20d3cd039c29057535b2413be61..bd4fa141420c92a44716bd93fcd8aa3d3310203a 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -53,6 +53,7 @@ int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb
 		skb_set_network_header(skb, depth);
 	}
 
+	skb->tstamp = 0;
 	dev_queue_xmit(skb);
 
 	return 0;

Thanks.
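
For readers following along, here is why a leftover timestamp could make packets vanish. This is a minimal userspace sketch, not kernel code. It assumes that fq reads a non-zero skb->tstamp as the earliest departure time on CLOCK_MONOTONIC (as the commit title suggests), and that the affected packets still carried a wall-clock value. The names struct pkt, hold_time_ns() and demo() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the single sk_buff field that matters here. */
struct pkt {
	int64_t tstamp_ns;	/* 0 means "no timestamp set" */
};

/*
 * Model of a pacing qdisc: a non-zero tstamp is taken as the earliest
 * departure time on the monotonic clock. Returns how long the packet
 * would be held back, in nanoseconds.
 */
int64_t hold_time_ns(const struct pkt *p, int64_t now_mono_ns)
{
	int64_t send_at = p->tstamp_ns ? p->tstamp_ns : now_mono_ns;

	return send_at > now_mono_ns ? send_at - now_mono_ns : 0;
}

/* Illustrative values only: one hour of uptime vs. a 2019 wall-clock stamp. */
void demo(void)
{
	const int64_t NSEC = 1000000000LL;
	int64_t now_mono = 3600 * NSEC;
	struct pkt p = { .tstamp_ns = 1547078400 * NSEC };

	/* A wall-clock value read as monotonic lies decades in the future. */
	assert(hold_time_ns(&p, now_mono) > 365 * 24 * 3600 * NSEC);

	/* Clearing the field lets the packet leave immediately. */
	p.tstamp_ns = 0;
	assert(hold_time_ns(&p, now_mono) == 0);
}
```

Under those assumptions, a packet stamped with wall-clock time would sit in the queue far longer than anyone would wait, which would look like the replies simply disappearing. Setting the field to 0 before transmit avoids that.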

2019-01-10 08:37:39

by Ian Kumlien

Subject: Re: [BUG] moving fq back to clock monotonic breaks my setup

On Thu, Jan 10, 2019 at 6:53 AM Eric Dumazet <[email protected]> wrote:
> On Wed, Jan 9, 2019 at 4:48 PM Ian Kumlien <[email protected]> wrote:
> >
> > Hi,
> >
> > Just been through 5+ hours of bisecting and finally found
> > the culprit =)
> >
> > commit fb420d5d91c1274d5966917725e71f27ed092a85 (refs/bisect/bad)
> > Author: Eric Dumazet <[email protected]>
> > Date: Fri Sep 28 10:28:44 2018 -0700
> >
> > tcp/fq: move back to CLOCK_MONOTONIC
> >
> > [--8<--]
> >
> > This might be because my setup is a bit "odd".
> >
> > Basically I have a firewall with four NICs. Two of them handle my normal
> > internet connection (firewall/MASQ/NAT) and the other two are assigned
> > to one bridge each.
> >
> > The firewall is also my local caching DNS server and DHCP server,
> > which the VMs also use.
> > But with 4.20, DHCP replies disappeared before entering the bridge. I
> > couldn't even see them in tcpdump! (All NICs are ixgbe on an Atom SoC.)
> >
> > I'm currently running a kernel with that patch reverted, but I'm also
> > wondering about possible ways forward, since I'm reverting a fix from
> > someone else...
>
> I suggest you use netdev@ mailing list instead of lkml
>
> Then, we probably need to clear skb->tstamp in more paths (you are
> mentioning bridge ...)
>
> See commit 8203e2d844d34af247a151d8ebd68553a6e91785 for reference.
>
> Can you try :
>
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index 5372e2042adfe20d3cd039c29057535b2413be61..bd4fa141420c92a44716bd93fcd8aa3d3310203a 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -53,6 +53,7 @@ int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb
>  		skb_set_network_header(skb, depth);
>  	}
>
> +	skb->tstamp = 0;
>  	dev_queue_xmit(skb);
>
>  	return 0;

This works, and so does: https://marc.info/?l=linux-netdev&m=154696956604748&w=2

Pointed out by Paolo (tested both separately)

2019-01-10 08:57:41

by Paolo Abeni

Subject: Re: [BUG] moving fq back to clock monotonic breaks my setup

On Thu, 2019-01-10 at 09:25 +0100, Ian Kumlien wrote:
> On Thu, Jan 10, 2019 at 6:53 AM Eric Dumazet <[email protected]> wrote:
> > On Wed, Jan 9, 2019 at 4:48 PM Ian Kumlien <[email protected]> wrote:
> > > Hi,
> > >
> > > Just been through 5+ hours of bisecting and finally found
> > > the culprit =)
> > >
> > > commit fb420d5d91c1274d5966917725e71f27ed092a85 (refs/bisect/bad)
> > > Author: Eric Dumazet <[email protected]>
> > > Date: Fri Sep 28 10:28:44 2018 -0700
> > >
> > > tcp/fq: move back to CLOCK_MONOTONIC
> > >
> > > [--8<--]
> > >
> > > This might be because my setup is a bit "odd".
> > >
> > > Basically I have a firewall with four NICs. Two of them handle my normal
> > > internet connection (firewall/MASQ/NAT) and the other two are assigned
> > > to one bridge each.
> > >
> > > The firewall is also my local caching DNS server and DHCP server,
> > > which the VMs also use.
> > > But with 4.20, DHCP replies disappeared before entering the bridge. I
> > > couldn't even see them in tcpdump! (All NICs are ixgbe on an Atom SoC.)
> > >
> > > I'm currently running a kernel with that patch reverted, but I'm also
> > > wondering about possible ways forward, since I'm reverting a fix from
> > > someone else...
> >
> > I suggest you use netdev@ mailing list instead of lkml
> >
> > Then, we probably need to clear skb->tstamp in more paths (you are
> > mentioning bridge ...)
> >
> > See commit 8203e2d844d34af247a151d8ebd68553a6e91785 for reference.
> >
> > Can you try :
> >
> > diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> > index 5372e2042adfe20d3cd039c29057535b2413be61..bd4fa141420c92a44716bd93fcd8aa3d3310203a 100644
> > --- a/net/bridge/br_forward.c
> > +++ b/net/bridge/br_forward.c
> > @@ -53,6 +53,7 @@ int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb
> >  		skb_set_network_header(skb, depth);
> >  	}
> >
> > +	skb->tstamp = 0;
> >  	dev_queue_xmit(skb);
> >
> >  	return 0;
>
> This works, and so does: https://marc.info/?l=linux-netdev&m=154696956604748&w=2
>
> Pointed out by Paolo (tested both separately)

Note: I cleared the tstamp in br_forward_finish() instead of
br_dev_queue_push_xmit() because I think the latter could also be
called in the local xmit path, via br_nf_post_routing().

We must preserve the tstamp in the output path, right?

Thanks,

Paolo
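
The placement argument above can be sketched in a few lines of userspace C. This is an illustration under assumptions, not the bridge code: struct frame and the model_*() functions are hypothetical stand-ins, and the call structure only mirrors what the note describes (a shared transmit step reached from both the forwarding path and the local output path).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet with a departure timestamp. */
struct frame {
	int64_t tstamp_ns;	/* 0 means "no timestamp set" */
};

/*
 * Shared transmit step. It is reached from both paths below, so clearing
 * the timestamp here would also wipe values set for locally sent packets.
 */
void model_push_xmit(struct frame *f)
{
	(void)f;	/* the timestamp is deliberately left untouched */
}

/* Forwarding path: a timestamp left over from receive is stale, drop it. */
void model_forward_finish(struct frame *f)
{
	f->tstamp_ns = 0;
	model_push_xmit(f);
}

/* Local output path: the sender's timestamp must reach the qdisc intact. */
void model_local_output(struct frame *f)
{
	model_push_xmit(f);
}

void demo_paths(void)
{
	struct frame forwarded = { .tstamp_ns = 123 };
	struct frame local = { .tstamp_ns = 456 };

	model_forward_finish(&forwarded);
	model_local_output(&local);

	assert(forwarded.tstamp_ns == 0);	/* cleared on forward */
	assert(local.tstamp_ns == 456);		/* preserved on local output */
}
```

Clearing in the forward-only function keeps the fix away from the shared step, so whatever the local sender put in the field survives.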





2019-01-11 09:48:17

by Eric Dumazet

Subject: Re: [BUG] moving fq back to clock monotonic breaks my setup

On Thu, Jan 10, 2019 at 12:55 AM Paolo Abeni <[email protected]> wrote:
>
> On Thu, 2019-01-10 at 09:25 +0100, Ian Kumlien wrote:


> > This works, and so does: https://marc.info/?l=linux-netdev&m=154696956604748&w=2
> >
> > Pointed out by Paolo (tested both separately)
>
> Note: I cleared the tstamp in br_forward_finish() instead of
> br_dev_queue_push_xmit() because I think the latter could be called
> also in the local xmit path, via br_nf_post_routing.
>
> We must preserve the tstamp in output path, right?
>

I was not aware of your patch, SGTM, thanks.

2019-01-11 12:35:41

by Ian Kumlien

Subject: Re: [BUG] moving fq back to clock monotonic breaks my setup

On Fri, Jan 11, 2019 at 10:35 AM Eric Dumazet <[email protected]> wrote:
> On Thu, Jan 10, 2019 at 12:55 AM Paolo Abeni <[email protected]> wrote:
> > On Thu, 2019-01-10 at 09:25 +0100, Ian Kumlien wrote:
>
>
> > > This works, and so does: https://marc.info/?l=linux-netdev&m=154696956604748&w=2
> > >
> > > Pointed out by Paolo (tested both separately)
> >
> > Note: I cleared the tstamp in br_forward_finish() instead of
> > br_dev_queue_push_xmit() because I think the latter could be called
> > also in the local xmit path, via br_nf_post_routing.
> >
> > We must preserve the tstamp in output path, right?
> >
>
> I was not aware of your patch, SGTM, thanks.

And you can add Tested-by: [email protected]