2023-08-01 21:02:04

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH v3 net-next 00/10] Improve the taprio qdisc's relationship with its children

Changes in v3:
Fix ptp_mock compilation as module, fix small mistakes in selftests.

Changes in v2:
It was requested to add test cases for the taprio software and offload modes.
Those are patches 08 and 09.

That implies adding taprio offload support to netdevsim, which is patch 07.

In turn, that implies adding a PHC driver for netdevsim, which is patch 06.

v1 at:
https://lore.kernel.org/lkml/[email protected]/

Original message:

Prompted by Vinicius' request to consolidate some child Qdisc
dereferences in taprio:
https://lore.kernel.org/netdev/[email protected]/

I remembered that I had left some unfinished work in this Qdisc, namely
commit af7b29b1deaa ("Revert "net/sched: taprio: make qdisc_leaf() see
the per-netdev-queue pfifo child qdiscs"").

This patch set represents another stab at, essentially, what's in the
title. Not only does taprio not properly detect when it's grafted as a
non-root qdisc, but it also returns incorrect per-class stats.
Eventually, Vinicius' request is addressed too, although in a different
form than the one he requested (which was purely cosmetic).

Review from people more experienced with Qdiscs than me would be
appreciated. I tried my best to explain what I consider to be problems.
I am deliberately targeting net-next because the changes are too
invasive for net - they were reverted from stable once already.

Vladimir Oltean (10):
net/sched: taprio: don't access q->qdiscs[] in unoffloaded mode during
attach()
net/sched: taprio: keep child Qdisc refcount elevated at 2 in offload
mode
net/sched: taprio: try again to report q->qdiscs[] to qdisc_leaf()
net/sched: taprio: delete misleading comment about preallocating child
qdiscs
net/sched: taprio: dump class stats for the actual q->qdiscs[]
net: ptp: create a mock-up PTP Hardware Clock driver
net: netdevsim: use mock PHC driver
net: netdevsim: mimic tc-taprio offload
selftests/tc-testing: test that taprio can only be attached as root
selftests/tc-testing: verify that a qdisc can be grafted onto a taprio
class

MAINTAINERS | 7 +
drivers/net/Kconfig | 1 +
drivers/net/netdevsim/ethtool.c | 11 ++
drivers/net/netdevsim/netdev.c | 38 +++-
drivers/net/netdevsim/netdevsim.h | 2 +
drivers/ptp/Kconfig | 11 ++
drivers/ptp/Makefile | 1 +
drivers/ptp/ptp_mock.c | 175 ++++++++++++++++++
include/linux/ptp_mock.h | 38 ++++
net/sched/sch_taprio.c | 68 ++++---
.../tc-testing/tc-tests/qdiscs/taprio.json | 100 +++++++++-
11 files changed, 423 insertions(+), 29 deletions(-)
create mode 100644 drivers/ptp/ptp_mock.c
create mode 100644 include/linux/ptp_mock.h

--
2.34.1



2023-08-01 21:17:35

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH v3 net-next 01/10] net/sched: taprio: don't access q->qdiscs[] in unoffloaded mode during attach()

This is a simple code transformation with no intended behavior change,
just to make it absolutely clear that q->qdiscs[] is only attached to
the child taprio classes in full offload mode.

Right now we use the q->qdiscs[] variable in taprio_attach() for
software mode too, but that is quite confusing and avoidable. We use
it only to reach the netdev TX queue, but we could as well just use
netdev_get_tx_queue() for that.

Signed-off-by: Vladimir Oltean <[email protected]>
---
v1->v3: none

net/sched/sch_taprio.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 717ae51d94a0..d9ff75a08e12 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -2134,14 +2134,20 @@ static void taprio_attach(struct Qdisc *sch)

/* Attach underlying qdisc */
for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
- struct Qdisc *qdisc = q->qdiscs[ntx];
+ struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, ntx);
struct Qdisc *old;

if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
+ struct Qdisc *qdisc = q->qdiscs[ntx];
+
qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
- old = dev_graft_qdisc(qdisc->dev_queue, qdisc);
+ old = dev_graft_qdisc(dev_queue, qdisc);
} else {
- old = dev_graft_qdisc(qdisc->dev_queue, sch);
+ /* In software mode, attach the root taprio qdisc
+ * to all netdev TX queues, so that dev_qdisc_enqueue()
+ * goes through taprio_enqueue().
+ */
+ old = dev_graft_qdisc(dev_queue, sch);
qdisc_refcount_inc(sch);
}
if (old)
--
2.34.1


2023-08-01 21:54:06

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH v3 net-next 02/10] net/sched: taprio: keep child Qdisc refcount elevated at 2 in offload mode

Normally, Qdiscs have one reference on them held by their owner and one
held for each TXQ to which they are attached, however this is not the
case with the children of an offloaded taprio. Instead, the taprio qdisc
currently lives in the following fragile equilibrium.

In the software scheduling case, taprio attaches itself (the root Qdisc)
to all TXQs, thus having a refcount of 1 + the number of TX queues. In
this mode, the q->qdiscs[] children are not visible directly to the
Qdisc API. The lifetime of the Qdiscs from this private array lasts
until qdisc_destroy() -> taprio_destroy().

In the fully offloaded case, the root taprio has a refcount of 1, and
all child q->qdiscs[] also have a refcount of 1. The child q->qdiscs[]
are attached to the netdev TXQs directly and thus are visible to the
Qdisc API, however taprio loses a reference to them very early - during
qdisc_graft(parent==NULL) -> taprio_attach(). At that time, taprio frees
the q->qdiscs[] array to not leak memory, but interestingly, it does not
release a reference on these qdiscs because it doesn't effectively own
them - they are created by taprio but owned by the Qdisc core, and will
be freed by qdisc_graft(parent==NULL, new==NULL) -> qdisc_put(old) when
the Qdisc is deleted or when the child Qdisc is replaced with something
else.

My interest is to change this equilibrium such that taprio also owns a
reference on the q->qdiscs[] child Qdiscs for the lifetime of the root
Qdisc, including in full offload mode. I want this because I would like
taprio_leaf(), taprio_dump_class(), taprio_dump_class_stats() to have
insight into q->qdiscs[] for the software scheduling mode - currently
they look at dev_queue->qdisc_sleeping, which is, as mentioned, the same
as the root taprio.

The following set of changes is necessary:
- don't free q->qdiscs[] early in taprio_attach(), free it late in
taprio_destroy() for consistency with software mode. But:
- currently that's not possible, because taprio doesn't own a reference
on q->qdiscs[]. So hold that reference - once during the initial
attach() and once during subsequent graft() calls when the child is
changed.
- always keep track of the current child in q->qdiscs[], even for full
offload mode, so that we free in taprio_destroy() what we should, and
not something stale.

Signed-off-by: Vladimir Oltean <[email protected]>
---
v2->v3: none
v1->v2:
- fix refcount not dropping to 0 after a graft operation - spotted by
Paolo
- slightly reword commit message and comments

net/sched/sch_taprio.c | 39 +++++++++++++++++++++++++--------------
1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index d9ff75a08e12..41944197876a 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -2135,30 +2135,31 @@ static void taprio_attach(struct Qdisc *sch)
/* Attach underlying qdisc */
for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, ntx);
- struct Qdisc *old;
+ struct Qdisc *old, *dev_queue_qdisc;

if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
struct Qdisc *qdisc = q->qdiscs[ntx];

+ /* In offload mode, the root taprio qdisc is bypassed
+ * and the netdev TX queues see the children directly
+ */
qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
- old = dev_graft_qdisc(dev_queue, qdisc);
+ dev_queue_qdisc = qdisc;
} else {
/* In software mode, attach the root taprio qdisc
* to all netdev TX queues, so that dev_qdisc_enqueue()
* goes through taprio_enqueue().
*/
- old = dev_graft_qdisc(dev_queue, sch);
- qdisc_refcount_inc(sch);
+ dev_queue_qdisc = sch;
}
+ old = dev_graft_qdisc(dev_queue, dev_queue_qdisc);
+ /* The qdisc's refcount requires to be elevated once
+ * for each netdev TX queue it is grafted onto
+ */
+ qdisc_refcount_inc(dev_queue_qdisc);
if (old)
qdisc_put(old);
}
-
- /* access to the child qdiscs is not needed in offload mode */
- if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
- kfree(q->qdiscs);
- q->qdiscs = NULL;
- }
}

static struct netdev_queue *taprio_queue_get(struct Qdisc *sch,
@@ -2187,13 +2188,23 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
if (dev->flags & IFF_UP)
dev_deactivate(dev);

+ /* In offload mode, the child Qdisc is directly attached to the netdev
+ * TX queue, and thus, we need to keep its refcount elevated in order
+ * to counteract qdisc_graft()'s call to qdisc_put() once per TX queue.
+ * However, save the reference to the new qdisc in the private array in
+ * both software and offload cases, to have an up-to-date reference to
+ * our children.
+ */
+ *old = q->qdiscs[cl - 1];
if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
- *old = dev_graft_qdisc(dev_queue, new);
- } else {
- *old = q->qdiscs[cl - 1];
- q->qdiscs[cl - 1] = new;
+ WARN_ON_ONCE(dev_graft_qdisc(dev_queue, new) != *old);
+ if (new)
+ qdisc_refcount_inc(new);
+ if (*old)
+ qdisc_put(*old);
}

+ q->qdiscs[cl - 1] = new;
if (new)
new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;

--
2.34.1


2023-08-03 11:36:47

by Paolo Abeni

[permalink] [raw]
Subject: Re: [Intel-wired-lan] [PATCH v3 net-next 00/10] Improve the taprio qdisc's relationship with its children

Hi,

On Tue, 2023-08-01 at 21:24 +0300, Vladimir Oltean wrote:
> Changes in v3:
> Fix ptp_mock compilation as module, fix small mistakes in selftests.
>
> Changes in v2:
> It was requested to add test cases for the taprio software and offload modes.
> Those are patches 08 and 09.
>
> That implies adding taprio offload support to netdevsim, which is patch 07.
>
> In turn, that implies adding a PHC driver for netdevsim, which is patch 06.
>
> v1 at:
> https://lore.kernel.org/lkml/[email protected]/
>
> Original message:
>
> Prompted by Vinicius' request to consolidate some child Qdisc
> dereferences in taprio:
> https://lore.kernel.org/netdev/[email protected]/
>
> I remembered that I had left some unfinished work in this Qdisc, namely
> commit af7b29b1deaa ("Revert "net/sched: taprio: make qdisc_leaf() see
> the per-netdev-queue pfifo child qdiscs"").
>
> This patch set represents another stab at, essentially, what's in the
> title. Not only does taprio not properly detect when it's grafted as a
> non-root qdisc, but it also returns incorrect per-class stats.
> Eventually, Vinicius' request is addressed too, although in a different
> form than the one he requested (which was purely cosmetic).
>
> Review from people more experienced with Qdiscs than me would be
> appreciated. I tried my best to explain what I consider to be problems.
> I am deliberately targeting net-next because the changes are too
> invasive for net - they were reverted from stable once already.

The series LGTM, modulo the minor comments from Kurt on patch 6/10. I
agree they can be handled with follow-up patches.

Keeping it a little longer on PW: it would be great if someone from the
tc crew could have a look!

Thanks!

Paolo