2022-09-15 10:38:47

by Vladimir Oltean

Subject: [PATCH v2 net 0/2] Fixes for tc-taprio software mode

While working on some new features for tc-taprio, I found some strange
behavior which looked like bugs. I was able to eventually trigger a NULL
pointer dereference. This patch set fixes 2 issues I saw. Detailed
explanation in patches.

Changes in v2: dropped patch 3/3 (will resend to net-next).

Vladimir Oltean (2):
  net/sched: taprio: avoid disabling offload when it was never enabled
  net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo
    child qdiscs

 net/sched/sch_taprio.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

--
2.34.1


2022-09-15 10:52:26

by Vladimir Oltean

Subject: [PATCH v2 net 1/2] net/sched: taprio: avoid disabling offload when it was never enabled

In an incredibly strange API design decision, qdisc->destroy() gets
called even if qdisc->init() never succeeded, not exclusively since
commit 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation"),
but apparently also earlier (in the case of qdisc_create_dflt()).

The taprio qdisc does not fully acknowledge this when it attempts full
offload, because it starts off with q->flags = TAPRIO_FLAGS_INVALID in
taprio_init(), then it replaces q->flags with TCA_TAPRIO_ATTR_FLAGS
parsed from netlink (in taprio_change(), tail called from taprio_init()).

But in taprio_destroy(), we call taprio_disable_offload(), and this
determines what to do based on FULL_OFFLOAD_IS_ENABLED(q->flags).

But looking at the implementation of FULL_OFFLOAD_IS_ENABLED()
(a bitwise check of bit 1 in q->flags), it is invalid to call this macro
on q->flags when it contains TAPRIO_FLAGS_INVALID, because that is set
to U32_MAX, and therefore FULL_OFFLOAD_IS_ENABLED() will return true on
an invalid set of flags.

As a result, it is possible to crash the kernel if user space forces an
error between setting q->flags = TAPRIO_FLAGS_INVALID, and the calling
of taprio_enable_offload(). This is because drivers do not expect the
offload to be disabled when it was never enabled.

The error that we force here is to attach taprio as a non-root qdisc,
specifically as a child of an mqprio root qdisc:

$ tc qdisc add dev swp0 root handle 1: \
	mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
$ tc qdisc replace dev swp0 parent 1:1 \
	taprio num_tc 8 map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \
	sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
	flags 0x0 clockid CLOCK_TAI
Unable to handle kernel paging request at virtual address fffffffffffffff8
[fffffffffffffff8] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Call trace:
 taprio_dump+0x27c/0x310
 vsc9959_port_setup_tc+0x1f4/0x460
 felix_port_setup_tc+0x24/0x3c
 dsa_slave_setup_tc+0x54/0x27c
 taprio_disable_offload.isra.0+0x58/0xe0
 taprio_destroy+0x80/0x104
 qdisc_create+0x240/0x470
 tc_modify_qdisc+0x1fc/0x6b0
 rtnetlink_rcv_msg+0x12c/0x390
 netlink_rcv_skb+0x5c/0x130
 rtnetlink_rcv+0x1c/0x2c

Fix this by keeping track of the operations we made, and undo the
offload only if we actually did it.

I've added "bool offloaded" inside a 4-byte hole between "int clockid"
and "atomic64_t picos_per_byte". Now the first cache line looks like
this:

$ pahole -C taprio_sched net/sched/sch_taprio.o
struct taprio_sched {
	struct Qdisc * *           qdiscs;               /*     0     8 */
	struct Qdisc *             root;                 /*     8     8 */
	u32                        flags;                /*    16     4 */
	enum tk_offsets            tk_offset;            /*    20     4 */
	int                        clockid;              /*    24     4 */
	bool                       offloaded;            /*    28     1 */

	/* XXX 3 bytes hole, try to pack */

	atomic64_t                 picos_per_byte;       /*    32     0 */

	/* XXX 8 bytes hole, try to pack */

	spinlock_t                 current_entry_lock;   /*    40     0 */

	/* XXX 8 bytes hole, try to pack */

	struct sched_entry *       current_entry;        /*    48     8 */
	struct sched_gate_list *   oper_sched;           /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */

Fixes: 9c66d1564676 ("taprio: Add support for hardware offloading")
Signed-off-by: Vladimir Oltean <[email protected]>
---
v1->v2: none

net/sched/sch_taprio.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index db88a692ef81..a3b4f92a9937 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -67,6 +67,7 @@ struct taprio_sched {
 	u32 flags;
 	enum tk_offsets tk_offset;
 	int clockid;
+	bool offloaded;
 	atomic64_t picos_per_byte; /* Using picoseconds because for 10Gbps+
 				    * speeds it's sub-nanoseconds per byte
 				    */
@@ -1279,6 +1280,8 @@ static int taprio_enable_offload(struct net_device *dev,
 		goto done;
 	}

+	q->offloaded = true;
+
 done:
 	taprio_offload_free(offload);

@@ -1293,12 +1296,9 @@ static int taprio_disable_offload(struct net_device *dev,
 	struct tc_taprio_qopt_offload *offload;
 	int err;

-	if (!FULL_OFFLOAD_IS_ENABLED(q->flags))
+	if (!q->offloaded)
 		return 0;

-	if (!ops->ndo_setup_tc)
-		return -EOPNOTSUPP;
-
 	offload = taprio_offload_alloc(0);
 	if (!offload) {
 		NL_SET_ERR_MSG(extack,
@@ -1314,6 +1314,8 @@ static int taprio_disable_offload(struct net_device *dev,
 		goto out;
 	}

+	q->offloaded = false;
+
 out:
 	taprio_offload_free(offload);

--
2.34.1

2022-09-15 22:04:00

by Vinicius Costa Gomes

Subject: Re: [PATCH v2 net 1/2] net/sched: taprio: avoid disabling offload when it was never enabled

Vladimir Oltean <[email protected]> writes:

> In an incredibly strange API design decision, qdisc->destroy() gets
> called even if qdisc->init() never succeeded, not exclusively since
> commit 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation"),
> but apparently also earlier (in the case of qdisc_create_dflt()).
>
> The taprio qdisc does not fully acknowledge this when it attempts full
> offload, because it starts off with q->flags = TAPRIO_FLAGS_INVALID in
> taprio_init(), then it replaces q->flags with TCA_TAPRIO_ATTR_FLAGS
> parsed from netlink (in taprio_change(), tail called from taprio_init()).
>
> But in taprio_destroy(), we call taprio_disable_offload(), and this
> determines what to do based on FULL_OFFLOAD_IS_ENABLED(q->flags).
>
> But looking at the implementation of FULL_OFFLOAD_IS_ENABLED()
> (a bitwise check of bit 1 in q->flags), it is invalid to call this macro
> on q->flags when it contains TAPRIO_FLAGS_INVALID, because that is set
> to U32_MAX, and therefore FULL_OFFLOAD_IS_ENABLED() will return true on
> an invalid set of flags.
>
> As a result, it is possible to crash the kernel if user space forces an
> error between setting q->flags = TAPRIO_FLAGS_INVALID, and the calling
> of taprio_enable_offload(). This is because drivers do not expect the
> offload to be disabled when it was never enabled.
>
> The error that we force here is to attach taprio as a non-root qdisc,
> specifically as a child of an mqprio root qdisc:
>
> $ tc qdisc add dev swp0 root handle 1: \
> mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \
> queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
> $ tc qdisc replace dev swp0 parent 1:1 \
> taprio num_tc 8 map 0 1 2 3 4 5 6 7 \
> queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \
> sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
> flags 0x0 clockid CLOCK_TAI
> Unable to handle kernel paging request at virtual address fffffffffffffff8
> [fffffffffffffff8] pgd=0000000000000000, p4d=0000000000000000
> Internal error: Oops: 96000004 [#1] PREEMPT SMP
> Call trace:
> taprio_dump+0x27c/0x310
> vsc9959_port_setup_tc+0x1f4/0x460
> felix_port_setup_tc+0x24/0x3c
> dsa_slave_setup_tc+0x54/0x27c
> taprio_disable_offload.isra.0+0x58/0xe0
> taprio_destroy+0x80/0x104
> qdisc_create+0x240/0x470
> tc_modify_qdisc+0x1fc/0x6b0
> rtnetlink_rcv_msg+0x12c/0x390
> netlink_rcv_skb+0x5c/0x130
> rtnetlink_rcv+0x1c/0x2c
>
> Fix this by keeping track of the operations we made, and undo the
> offload only if we actually did it.
>
> I've added "bool offloaded" inside a 4 byte hole between "int clockid"
> and "atomic64_t picos_per_byte". Now the first cache line looks like
> below:
>
> $ pahole -C taprio_sched net/sched/sch_taprio.o
> struct taprio_sched {
> struct Qdisc * * qdiscs; /* 0 8 */
> struct Qdisc * root; /* 8 8 */
> u32 flags; /* 16 4 */
> enum tk_offsets tk_offset; /* 20 4 */
> int clockid; /* 24 4 */
> bool offloaded; /* 28 1 */
>
> /* XXX 3 bytes hole, try to pack */
>
> atomic64_t picos_per_byte; /* 32 0 */
>
> /* XXX 8 bytes hole, try to pack */
>
> spinlock_t current_entry_lock; /* 40 0 */
>
> /* XXX 8 bytes hole, try to pack */
>
> struct sched_entry * current_entry; /* 48 8 */
> struct sched_gate_list * oper_sched; /* 56 8 */
> /* --- cacheline 1 boundary (64 bytes) --- */
>
> Fixes: 9c66d1564676 ("taprio: Add support for hardware offloading")
> Signed-off-by: Vladimir Oltean <[email protected]>
> ---

Reviewed-by: Vinicius Costa Gomes <[email protected]>


Cheers,
--
Vinicius

2022-09-20 19:48:48

by patchwork-bot+netdevbpf

Subject: Re: [PATCH v2 net 0/2] Fixes for tc-taprio software mode

Hello:

This series was applied to netdev/net.git (master)
by Jakub Kicinski <[email protected]>:

On Thu, 15 Sep 2022 13:08:00 +0300 you wrote:
> While working on some new features for tc-taprio, I found some strange
> behavior which looked like bugs. I was able to eventually trigger a NULL
> pointer dereference. This patch set fixes 2 issues I saw. Detailed
> explanation in patches.
>
> Changes in v2: dropped patch 3/3 (will resend to net-next).
>
> [...]

Here is the summary with links:
- [v2,net,1/2] net/sched: taprio: avoid disabling offload when it was never enabled
https://git.kernel.org/netdev/net/c/db46e3a88a09
- [v2,net,2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs
https://git.kernel.org/netdev/net/c/1461d212ab27

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html