Hi,
I've came across the behavior where adding a child qdisc and then deleting
it again makes the networking dysfunctional (I guess that's because all of
a sudden there is absolutely no working qdisc on the device, although
there originally was a default one in the parent).
In a nutshell, is this expected behavior or bug?
=====
jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
jikos:~ # ping -c 1 nix.cz | head -2
PING nix.cz (195.47.235.3) 56(84) bytes of data.
64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.59 ms
jikos:~ # tc qdisc add dev eth0 parent 10:1 sfq
jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
qdisc sfq 8008: dev eth0 parent 10:1 limit 127p quantum 1514b depth 127 divisor 1024
jikos:~ # ping -c 1 nix.cz | head -2
PING nix.cz (195.47.235.3) 56(84) bytes of data.
64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.67 ms
jikos:~ # tc qdisc del dev eth0 parent 10:1 sfq
jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
jikos:~ # ping -c 1 nix.cz | head -2
PING nix.cz (195.47.235.3) 56(84) bytes of data.
[ ... nothing happens ... ]
^C
jikos:~ # tc qdisc add dev eth0 parent 10:1 sfq
jikos:~ # ping -c 1 nix.cz | head -2
PING nix.cz (195.47.235.3) 56(84) bytes of data.
64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.66 ms
=====
Thanks,
--
Jiri Kosina
On Thu, 14 Apr 2016, Jiri Kosina wrote:
> In a nutshell, is this expected behavior or bug?
Just to clarify what seems to suggest to me that this is rather a bug that
needs to be fixed (but apparently one that has been there for quite a long
time) can be demonstrated by this:
>
> =====
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
The above configuration works.
> jikos:~ # ping -c 1 nix.cz | head -2
> PING nix.cz (195.47.235.3) 56(84) bytes of data.
> 64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.59 ms
>
> jikos:~ # tc qdisc add dev eth0 parent 10:1 sfq
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
> qdisc sfq 8008: dev eth0 parent 10:1 limit 127p quantum 1514b depth 127 divisor 1024
>
> jikos:~ # ping -c 1 nix.cz | head -2
> PING nix.cz (195.47.235.3) 56(84) bytes of data.
> 64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.67 ms
>
> jikos:~ # tc qdisc del dev eth0 parent 10:1 sfq
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
The above configuration doesn't although it's identical to the working one
at the beginning.
> jikos:~ # ping -c 1 nix.cz | head -2
> PING nix.cz (195.47.235.3) 56(84) bytes of data.
> [ ... nothing happens ... ]
> ^C
--
Jiri Kosina
SUSE Labs
On Thu, 2016-04-14 at 16:44 +0200, Jiri Kosina wrote:
> Hi,
>
> I've came across the behavior where adding a child qdisc and then deleting
> it again makes the networking dysfunctional (I guess that's because all of
> a sudden there is absolutely no working qdisc on the device, although
> there originally was a default one in the parent).
>
> In a nutshell, is this expected behavior or bug?
This is the expected behavior.
If the kernel was suddenly doing a 'replace' when you ask a delete,
then the scripts doing a delete , than a add would break.
tc users are skilled admins ;)
On Thu, Apr 14, 2016 at 08:01:39AM -0700, Eric Dumazet wrote:
> On Thu, 2016-04-14 at 16:44 +0200, Jiri Kosina wrote:
> > Hi,
> >
> > I've came across the behavior where adding a child qdisc and then deleting
> > it again makes the networking dysfunctional (I guess that's because all of
> > a sudden there is absolutely no working qdisc on the device, although
> > there originally was a default one in the parent).
> >
> > In a nutshell, is this expected behavior or bug?
>
> This is the expected behavior.
OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
one upon deletion instead of noop_qdisc, hence I would describe
the situation using the words 'inconsistent' and 'accident' rather than
'expected'. :)
Anyhow, the problem with skilled admins is they accept quirks too easily
and just build their scripts around them - the same scripts we have to
keep compatible to then.
Cheers, Phil
On Thu, 14 Apr 2016, Phil Sutter wrote:
> OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> one upon deletion instead of noop_qdisc, hence I would describe
> the situation using the words 'inconsistent' and 'accident' rather than
> 'expected'. :)
Exactly. I'd again like to stress the fact that this configuration works:
jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
and this (after performing add/delete operation) doesn't:
jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
It's hard to spot a difference (hint: there is none).
Thanks,
--
Jiri Kosina
SUSE Labs
On Thu, 2016-04-14 at 17:34 +0200, Jiri Kosina wrote:
> On Thu, 14 Apr 2016, Phil Sutter wrote:
>
> > OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> > one upon deletion instead of noop_qdisc, hence I would describe
> > the situation using the words 'inconsistent' and 'accident' rather than
> > 'expected'. :)
>
> Exactly. I'd again like to stress the fact that this configuration works:
>
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
>
> and this (after performing add/delete operation) doesn't:
>
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
>
> It's hard to spot a difference (hint: there is none).
This is because some qdisc are not visible in the dump.
qdisc_list_add() uses a single list, so adding too much stuff in it
could slow down fast path (qdisc_lookup(), called from
qdisc_tree_reduce_backlog())
On Thu, 14 Apr 2016, Phil Sutter wrote:
> > > I've came across the behavior where adding a child qdisc and then deleting
> > > it again makes the networking dysfunctional (I guess that's because all of
> > > a sudden there is absolutely no working qdisc on the device, although
> > > there originally was a default one in the parent).
> > >
> > > In a nutshell, is this expected behavior or bug?
> >
> > This is the expected behavior.
>
> OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> one upon deletion instead of noop_qdisc, hence I would describe
> the situation using the words 'inconsistent' and 'accident' rather than
> 'expected'. :)
Would a patch that'd unify this in a sense that all qdiscs would assign
the default one upon deletion acceptable?
Thanks,
--
Jiri Kosina
SUSE Labs
On Thu, Apr 14, 2016 at 08:44:40AM -0700, Eric Dumazet wrote:
> On Thu, 2016-04-14 at 17:34 +0200, Jiri Kosina wrote:
> > On Thu, 14 Apr 2016, Phil Sutter wrote:
> >
> > > OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> > > one upon deletion instead of noop_qdisc, hence I would describe
> > > the situation using the words 'inconsistent' and 'accident' rather than
> > > 'expected'. :)
> >
> > Exactly. I'd again like to stress the fact that this configuration works:
> >
> > jikos:~ # tc qdisc show
> > qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
> >
> > and this (after performing add/delete operation) doesn't:
> >
> > jikos:~ # tc qdisc show
> > qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
> >
> > It's hard to spot a difference (hint: there is none).
>
> This is because some qdisc are not visible in the dump.
And those being invisible can be overridden using 'tc qd add', right?
AFAIR they're not listed because they don't properly register, so the
system doesn't care to override them. In this case we could change all
classful qdiscs to restore the default qdisc if a leaf qdisc is being
deleted instead of noop (which is probably not what the user wants
anyway).
Cheers, Phil
On Thu, 2016-04-14 at 18:22 +0200, Phil Sutter wrote:
> And those being invisible can be overridden using 'tc qd add', right?
> AFAIR they're not listed because they don't properly register, so the
> system doesn't care to override them. In this case we could change all
> classful qdiscs to restore the default qdisc if a leaf qdisc is being
> deleted instead of noop (which is probably not what the user wants
> anyway).
Even if they properly register, they are not visible.
Take a look at
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=95dc19299f741c986227ec33e23cbf9b3321f812
for some context.
When a default pfifo is created on say a HTB class, you do not see it by
default in a dump.
If you have 100 HTB classes, HTB created 100 pfifo just fine, and it
works, unless an admin tries to delete them maybe ;)
On Thu, 2016-04-14 at 18:08 +0200, Jiri Kosina wrote:
> On Thu, 14 Apr 2016, Phil Sutter wrote:
>
> > > > I've came across the behavior where adding a child qdisc and then deleting
> > > > it again makes the networking dysfunctional (I guess that's because all of
> > > > a sudden there is absolutely no working qdisc on the device, although
> > > > there originally was a default one in the parent).
> > > >
> > > > In a nutshell, is this expected behavior or bug?
> > >
> > > This is the expected behavior.
> >
> > OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> > one upon deletion instead of noop_qdisc, hence I would describe
> > the situation using the words 'inconsistent' and 'accident' rather than
> > 'expected'. :)
>
> Would a patch that'd unify this in a sense that all qdiscs would assign
> the default one upon deletion acceptable?
>
And what would be the chosen behavior ?
Relying on TBF installing a bfifo for you at delete would be hazardous.
For example CBQ got it differently than HFSC
If qdisc_create_dflt() fails in CBQ, we fail the 'delete', while HFSC
falls back to noop_qdisc, without warning the user :(
At least always using noop_qdisc is consistent. No magic there.
Doing 'unification' right now would break existing scripts.
This is too late, I am afraid.
On 16-04-14 01:49 PM, Eric Dumazet wrote:
> And what would be the chosen behavior ?
>
TBF is probably a bad example because it started life as
a classless qdisc. There was only one built-in fifo queue
that was shaped. Then someone made it classful and changed
this behavior. To me it sounds reasonable to have the
default behavior restored. At minimal consistency.
> Relying on TBF installing a bfifo for you at delete would be hazardous.
>
> For example CBQ got it differently than HFSC
>
> If qdisc_create_dflt() fails in CBQ, we fail the 'delete', while HFSC
> falls back to noop_qdisc, without warning the user :(
>
> At least always using noop_qdisc is consistent. No magic there.
>
> Doing 'unification' right now would break existing scripts.
>
> This is too late, I am afraid.
Sigh. So rant:
IMO, we should let any new APIs and API updates stay longer
in discussion. Or better mark them as unstable for sometime.
The excuse that "it is out in the wild therefore cant be changed"
is harmful because the timeline is "forever" whereas
patches are applied after a short period of posting
and discussions and sometimes not involving the right people.
It is like having a jury issuing a death sentence after 1 week of
deliberation. You cant take it back after execution.
cheers,
jamal
On Fri, 2016-04-15 at 08:42 -0400, Jamal Hadi Salim wrote:
> On 16-04-14 01:49 PM, Eric Dumazet wrote:
>
> > And what would be the chosen behavior ?
> >
>
> TBF is probably a bad example because it started life as
> a classless qdisc. There was only one built-in fifo queue
> that was shaped. Then someone made it classful and changed
> this behavior. To me it sounds reasonable to have the
> default behavior restored. At minimal consistency.
Then you need to save the initial qdisc (bfifo for TBF) in a special
place, to make sure the delete operation is guaranteed to succeed.
Or fail the delete if the bfifo can not be allocated.
I can tell that determinism if far more interesting than usability for
some users occasionally playing with tc.
Surely the silent fallback to noop_qdisc is wrong.
Anyway, we probably need to improve our ability to understand qdisc
hierarchies. Having some hidden qdiscs is the real problem here.
We need to add some hash table so that qdisc_match_from_root() does not
have to scan hundred of qdiscs.
From: Eric Dumazet <[email protected]>
Date: Fri, 15 Apr 2016 07:58:48 -0700
> Having some hidden qdiscs is the real problem here.
+1