2022-09-15 10:32:06

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs

taprio can only operate as root qdisc, and to that end, there exists the
following check in taprio_init(), just as in mqprio:

if (sch->parent != TC_H_ROOT)
return -EOPNOTSUPP;

And indeed, when we try to attach taprio to an mqprio child, it fails as
expected:

$ tc qdisc add dev swp0 root handle 1: mqprio num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
$ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
flags 0x0 clockid CLOCK_TAI
Error: sch_taprio: Can only be attached as root qdisc.

(extack message added by me)

But when we try to attach a taprio child to a taprio root qdisc,
surprisingly it doesn't fail:

$ tc qdisc replace dev swp0 root handle 1: taprio num_tc 8 \
map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
flags 0x0 clockid CLOCK_TAI
$ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
flags 0x0 clockid CLOCK_TAI

This is because tc_modify_qdisc() behaves differently when mqprio is
root, vs when taprio is root.

In the mqprio case, it finds the parent qdisc through
p = qdisc_lookup(dev, TC_H_MAJ(clid)), and then the child qdisc through
q = qdisc_leaf(p, clid). This leaf qdisc q has handle 0, so it is
ignored according to the comment right below ("It may be default qdisc,
ignore it"). As a result, tc_modify_qdisc() goes through the
qdisc_create() code path, and this gives taprio_init() a chance to check
for sch_parent != TC_H_ROOT and error out.

Whereas in the taprio case, the returned q = qdisc_leaf(p, clid) is
different. It is not the default qdisc created for each netdev queue
(both taprio and mqprio call qdisc_create_dflt() and keep them in
a private q->qdiscs[], or priv->qdiscs[], respectively). Instead, taprio
makes qdisc_leaf() return the _root_ qdisc, aka itself.

When taprio does that, tc_modify_qdisc() goes through the qdisc_change()
code path, because the qdisc layer never finds out about the child qdisc
of the root. And through the ->change() ops, taprio has no reason to
check whether its parent is root or not, just through ->init(), which is
not called.

The problem is the taprio_leaf() implementation. Even though code wise,
it does the exact same thing as mqprio_leaf() which it is copied from,
it works with different input data. This is because mqprio does not
attach itself (the root) to each device TX queue, but one of the default
qdiscs from its private array.

In fact, since commit 13511704f8d7 ("net: taprio offload: enforce qdisc
to netdev queue mapping"), taprio does this too, but just for the full
offload case. So if we tried to attach a taprio child to a fully
offloaded taprio root qdisc, it would properly fail too; just not to a
software root taprio.

To fix the problem, stop looking at the Qdisc that's attached to the TX
queue, and instead, always return the default qdiscs that we've
allocated (and to which we privately enqueue and dequeue, in software
scheduling mode).

Since Qdisc_class_ops :: leaf is only called from tc_modify_qdisc(),
the risk of unforeseen side effects introduced by this change is
minimal.

Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
Signed-off-by: Vladimir Oltean <[email protected]>
---
v1->v2: none

net/sched/sch_taprio.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index a3b4f92a9937..5bffc37022e0 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1949,12 +1949,14 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)

static struct Qdisc *taprio_leaf(struct Qdisc *sch, unsigned long cl)
{
- struct netdev_queue *dev_queue = taprio_queue_get(sch, cl);
+ struct taprio_sched *q = qdisc_priv(sch);
+ struct net_device *dev = qdisc_dev(sch);
+ unsigned int ntx = cl - 1;

- if (!dev_queue)
+ if (ntx >= dev->num_tx_queues)
return NULL;

- return dev_queue->qdisc_sleeping;
+ return q->qdiscs[ntx];
}

static unsigned long taprio_find(struct Qdisc *sch, u32 classid)
--
2.34.1


2022-09-15 21:59:16

by Vinicius Costa Gomes

[permalink] [raw]
Subject: Re: [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs

Vladimir Oltean <[email protected]> writes:

> taprio can only operate as root qdisc, and to that end, there exists the
> following check in taprio_init(), just as in mqprio:
>
> if (sch->parent != TC_H_ROOT)
> return -EOPNOTSUPP;
>
> And indeed, when we try to attach taprio to an mqprio child, it fails as
> expected:
>
> $ tc qdisc add dev swp0 root handle 1: mqprio num_tc 8 \
> map 0 1 2 3 4 5 6 7 \
> queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
> $ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
> map 0 1 2 3 4 5 6 7 \
> queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
> flags 0x0 clockid CLOCK_TAI
> Error: sch_taprio: Can only be attached as root qdisc.
>
> (extack message added by me)
>
> But when we try to attach a taprio child to a taprio root qdisc,
> surprisingly it doesn't fail:
>
> $ tc qdisc replace dev swp0 root handle 1: taprio num_tc 8 \
> map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
> flags 0x0 clockid CLOCK_TAI
> $ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
> map 0 1 2 3 4 5 6 7 \
> queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
> flags 0x0 clockid CLOCK_TAI
>
> This is because tc_modify_qdisc() behaves differently when mqprio is
> root, vs when taprio is root.
>
> In the mqprio case, it finds the parent qdisc through
> p = qdisc_lookup(dev, TC_H_MAJ(clid)), and then the child qdisc through
> q = qdisc_leaf(p, clid). This leaf qdisc q has handle 0, so it is
> ignored according to the comment right below ("It may be default qdisc,
> ignore it"). As a result, tc_modify_qdisc() goes through the
> qdisc_create() code path, and this gives taprio_init() a chance to check
> for sch_parent != TC_H_ROOT and error out.
>
> Whereas in the taprio case, the returned q = qdisc_leaf(p, clid) is
> different. It is not the default qdisc created for each netdev queue
> (both taprio and mqprio call qdisc_create_dflt() and keep them in
> a private q->qdiscs[], or priv->qdiscs[], respectively). Instead, taprio
> makes qdisc_leaf() return the _root_ qdisc, aka itself.
>
> When taprio does that, tc_modify_qdisc() goes through the qdisc_change()
> code path, because the qdisc layer never finds out about the child qdisc
> of the root. And through the ->change() ops, taprio has no reason to
> check whether its parent is root or not, just through ->init(), which is
> not called.
>
> The problem is the taprio_leaf() implementation. Even though code wise,
> it does the exact same thing as mqprio_leaf() which it is copied from,
> it works with different input data. This is because mqprio does not
> attach itself (the root) to each device TX queue, but one of the default
> qdiscs from its private array.
>
> In fact, since commit 13511704f8d7 ("net: taprio offload: enforce qdisc
> to netdev queue mapping"), taprio does this too, but just for the full
> offload case. So if we tried to attach a taprio child to a fully
> offloaded taprio root qdisc, it would properly fail too; just not to a
> software root taprio.
>
> To fix the problem, stop looking at the Qdisc that's attached to the TX
> queue, and instead, always return the default qdiscs that we've
> allocated (and to which we privately enqueue and dequeue, in software
> scheduling mode).
>
> Since Qdisc_class_ops :: leaf is only called from tc_modify_qdisc(),
> the risk of unforeseen side effects introduced by this change is
> minimal.
>
> Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
> Signed-off-by: Vladimir Oltean <[email protected]>
> ---

I spent a long time trying to think of cases that this would break,
could not think of anything:


Reviewed-by: Vinicius Costa Gomes <[email protected]>


Cheers,
--
Vinicius

2023-02-22 14:03:13

by Kurt Kanzenbach

[permalink] [raw]
Subject: Re: [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs

Hi Vladimir,

On Thu Sep 15 2022, Vladimir Oltean wrote:
> taprio can only operate as root qdisc, and to that end, there exists the
> following check in taprio_init(), just as in mqprio:
>
> if (sch->parent != TC_H_ROOT)
> return -EOPNOTSUPP;
>
> And indeed, when we try to attach taprio to an mqprio child, it fails as
> expected:
>
> $ tc qdisc add dev swp0 root handle 1: mqprio num_tc 8 \
> map 0 1 2 3 4 5 6 7 \
> queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
> $ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
> map 0 1 2 3 4 5 6 7 \
> queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
> flags 0x0 clockid CLOCK_TAI
> Error: sch_taprio: Can only be attached as root qdisc.
>
> (extack message added by me)
>
> But when we try to attach a taprio child to a taprio root qdisc,
> surprisingly it doesn't fail:
>
> $ tc qdisc replace dev swp0 root handle 1: taprio num_tc 8 \
> map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
> flags 0x0 clockid CLOCK_TAI
> $ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
> map 0 1 2 3 4 5 6 7 \
> queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
> flags 0x0 clockid CLOCK_TAI
>
> This is because tc_modify_qdisc() behaves differently when mqprio is
> root, vs when taprio is root.
>
> In the mqprio case, it finds the parent qdisc through
> p = qdisc_lookup(dev, TC_H_MAJ(clid)), and then the child qdisc through
> q = qdisc_leaf(p, clid). This leaf qdisc q has handle 0, so it is
> ignored according to the comment right below ("It may be default qdisc,
> ignore it"). As a result, tc_modify_qdisc() goes through the
> qdisc_create() code path, and this gives taprio_init() a chance to check
> for sch_parent != TC_H_ROOT and error out.
>
> Whereas in the taprio case, the returned q = qdisc_leaf(p, clid) is
> different. It is not the default qdisc created for each netdev queue
> (both taprio and mqprio call qdisc_create_dflt() and keep them in
> a private q->qdiscs[], or priv->qdiscs[], respectively). Instead, taprio
> makes qdisc_leaf() return the _root_ qdisc, aka itself.
>
> When taprio does that, tc_modify_qdisc() goes through the qdisc_change()
> code path, because the qdisc layer never finds out about the child qdisc
> of the root. And through the ->change() ops, taprio has no reason to
> check whether its parent is root or not, just through ->init(), which is
> not called.
>
> The problem is the taprio_leaf() implementation. Even though code wise,
> it does the exact same thing as mqprio_leaf() which it is copied from,
> it works with different input data. This is because mqprio does not
> attach itself (the root) to each device TX queue, but one of the default
> qdiscs from its private array.
>
> In fact, since commit 13511704f8d7 ("net: taprio offload: enforce qdisc
> to netdev queue mapping"), taprio does this too, but just for the full
> offload case. So if we tried to attach a taprio child to a fully
> offloaded taprio root qdisc, it would properly fail too; just not to a
> software root taprio.
>
> To fix the problem, stop looking at the Qdisc that's attached to the TX
> queue, and instead, always return the default qdiscs that we've
> allocated (and to which we privately enqueue and dequeue, in software
> scheduling mode).
>
> Since Qdisc_class_ops :: leaf is only called from tc_modify_qdisc(),
> the risk of unforeseen side effects introduced by this change is
> minimal.
>
> Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
> Signed-off-by: Vladimir Oltean <[email protected]>

This commit was backported to v5.15-LTS which results in NULL pointer
dereferences e.g., when attaching an ETF child qdisc to taprio.

From what I can see is, that the issue was reported back then and this
commit was reverted [1]. However, the revert didn't make it into
v5.15-LTS? Is there a reason for it? I'm testing 5.15.94-rt59 here.

Thanks,
Kurt

[1] - https://lore.kernel.org/all/[email protected]/


Attachments:
signature.asc (861.00 B)

2023-02-22 14:25:16

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs

+Greg, Sasha.

Hi Kurt,

On Wed, Feb 22, 2023 at 03:03:04PM +0100, Kurt Kanzenbach wrote:
> This commit was backported to v5.15-LTS which results in NULL pointer
> dereferences e.g., when attaching an ETF child qdisc to taprio.
>
> From what I can see is, that the issue was reported back then and this
> commit was reverted [1]. However, the revert didn't make it into
> v5.15-LTS? Is there a reason for it? I'm testing 5.15.94-rt59 here.
>
> Thanks,
> Kurt
>
> [1] - https://lore.kernel.org/all/[email protected]/

You are right; the patchwork-bot clearly says that the revert was
applied to net.git as commit af7b29b1deaa ("Revert "net/sched: taprio:
make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs""), but
the revert never made it to stable.

OTOH, the original patch did make it to, and still is in, linux-stable.
I have backport notification emails of the original to 5.4, 5.10, 5.15, 5.19.

Greg, Sasha, can you please pick up and backport commit af7b29b1deaa
("Revert "net/sched: taprio: make qdisc_leaf() see the per-netdev-queue
pfifo child qdiscs"") to the currently maintained stable kernels?

Thanks.

2023-02-23 13:03:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs

On Wed, Feb 22, 2023 at 04:25:07PM +0200, Vladimir Oltean wrote:
> +Greg, Sasha.

Hint, in the future, please cc: [email protected] with stable
kernel requests, I know I don't see stuff that isn't cc:ed there
normally when working on that tree.

>
> Hi Kurt,
>
> On Wed, Feb 22, 2023 at 03:03:04PM +0100, Kurt Kanzenbach wrote:
> > This commit was backported to v5.15-LTS which results in NULL pointer
> > dereferences e.g., when attaching an ETF child qdisc to taprio.
> >
> > From what I can see is, that the issue was reported back then and this
> > commit was reverted [1]. However, the revert didn't make it into
> > v5.15-LTS? Is there a reason for it? I'm testing 5.15.94-rt59 here.
> >
> > Thanks,
> > Kurt
> >
> > [1] - https://lore.kernel.org/all/[email protected]/
>
> You are right; the patchwork-bot clearly says that the revert was
> applied to net.git as commit af7b29b1deaa ("Revert "net/sched: taprio:
> make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs""), but
> the revert never made it to stable.
>
> OTOH, the original patch did make it to, and still is in, linux-stable.
> I have backport notification emails of the original to 5.4, 5.10, 5.15, 5.19.
>
> Greg, Sasha, can you please pick up and backport commit af7b29b1deaa
> ("Revert "net/sched: taprio: make qdisc_leaf() see the per-netdev-queue
> pfifo child qdiscs"") to the currently maintained stable kernels?

Now queued up, thanks.

greg k-h

2023-02-23 13:19:01

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs

On Thu, Feb 23, 2023 at 02:03:17PM +0100, Greg Kroah-Hartman wrote:
> On Wed, Feb 22, 2023 at 04:25:07PM +0200, Vladimir Oltean wrote:
> > +Greg, Sasha.
>
> Hint, in the future, please cc: [email protected] with stable
> kernel requests, I know I don't see stuff that isn't cc:ed there
> normally when working on that tree.

Ok, sorry, I got ahead of myself.

> > Greg, Sasha, can you please pick up and backport commit af7b29b1deaa
> > ("Revert "net/sched: taprio: make qdisc_leaf() see the per-netdev-queue
> > pfifo child qdiscs"") to the currently maintained stable kernels?
>
> Now queued up, thanks.

Thanks! I got email notifications for 5.4, 5.10 and 5.15.