2022-11-13 17:25:06

by Hawkins Jiawei

[permalink] [raw]
Subject: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms

Syzkaller reports a memory leak as follows:
====================================
BUG: memory leak
unreferenced object 0xffff88810c287f00 (size 256):
comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
[<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
[<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
[<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
[<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
[<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
[<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
[<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
[<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
[<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
[<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
[<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
[<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
[<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
[<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
[<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
[<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
[<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
[<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
[<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
[<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
[<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
[<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
====================================

Kernel uses tcindex_change() to change an existing
traffic-control-indices filter properties. During the
process of changing, kernel clears the old
traffic-control-indices filter result, and updates it
by RCU assigning new traffic-control-indices data.

Yet the problem is that, kernel clears the old
traffic-control-indices filter result, without destroying
its tcf_exts structure, which triggers the above
memory leak.

This patch solves it by using tcf_exts_destroy() to
destroy the tcf_exts structure in old
traffic-control-indices filter result, after the
RCU grace period.

[Thanks to the suggestion from Jakub Kicinski and Cong Wang]

Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
Link: https://lore.kernel.org/all/[email protected]/
Reported-by: [email protected]
Tested-by: [email protected]
Cc: Cong Wang <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Signed-off-by: Hawkins Jiawei <[email protected]>
---
v2:
- remove all 'will' in commit message according to Jakub Kicinski
- add Fixes tag according to Jakub Kicinski
- remove all ifdefs according to Jakub Kicinski and Cong Wang
- add synchronize_rcu() before destorying old_e according to
Cong Wang

v1: https://lore.kernel.org/all/[email protected]/
net/sched/cls_tcindex.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 1c9eeb98d826..d2fac9559d3e 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
struct tcf_result cr = {};
int err, balloc = 0;
struct tcf_exts e;
+ struct tcf_exts old_e = {};

err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
if (err < 0)
@@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
}

if (old_r && old_r != r) {
+ old_e = old_r->exts;
err = tcindex_filter_result_init(old_r, cp, net);
if (err < 0) {
kfree(f);
@@ -510,6 +512,12 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
tcf_exts_destroy(&new_filter_result.exts);
}

+ /* Note: old_e should be destroyed after the RCU grace period,
+ * to avoid possible use-after-free by concurrent readers.
+ */
+ synchronize_rcu();
+ tcf_exts_destroy(&old_e);
+
if (oldp)
tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work);
return 0;
--
2.25.1



2022-11-15 11:58:20

by Paolo Abeni

[permalink] [raw]
Subject: Re: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms

On Mon, 2022-11-14 at 01:05 +0800, Hawkins Jiawei wrote:
> Syzkaller reports a memory leak as follows:
> ====================================
> BUG: memory leak
> unreferenced object 0xffff88810c287f00 (size 256):
> comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
> hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
> [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
> [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
> [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
> [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
> [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
> [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
> [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
> [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
> [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
> [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
> [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
> [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
> [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
> [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
> [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
> [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
> [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
> [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
> [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
> [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
> [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> ====================================
>
> Kernel uses tcindex_change() to change an existing
> traffic-control-indices filter properties. During the
> process of changing, kernel clears the old
> traffic-control-indices filter result, and updates it
> by RCU assigning new traffic-control-indices data.
>
> Yet the problem is that, kernel clears the old
> traffic-control-indices filter result, without destroying
> its tcf_exts structure, which triggers the above
> memory leak.
>
> This patch solves it by using tcf_exts_destroy() to
> destroy the tcf_exts structure in old
> traffic-control-indices filter result, after the
> RCU grace period.
>
> [Thanks to the suggestion from Jakub Kicinski and Cong Wang]
>
> Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
> Link: https://lore.kernel.org/all/[email protected]/
> Reported-by: [email protected]
> Tested-by: [email protected]
> Cc: Cong Wang <[email protected]>
> Cc: Jakub Kicinski <[email protected]>
> Signed-off-by: Hawkins Jiawei <[email protected]>
> ---
> v2:
> - remove all 'will' in commit message according to Jakub Kicinski
> - add Fixes tag according to Jakub Kicinski
> - remove all ifdefs according to Jakub Kicinski and Cong Wang
> - add synchronize_rcu() before destorying old_e according to
> Cong Wang
>
> v1: https://lore.kernel.org/all/[email protected]/
> net/sched/cls_tcindex.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> index 1c9eeb98d826..d2fac9559d3e 100644
> --- a/net/sched/cls_tcindex.c
> +++ b/net/sched/cls_tcindex.c
> @@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> struct tcf_result cr = {};
> int err, balloc = 0;
> struct tcf_exts e;
> + struct tcf_exts old_e = {};
>
> err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
> if (err < 0)
> @@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> }
>
> if (old_r && old_r != r) {
> + old_e = old_r->exts;
> err = tcindex_filter_result_init(old_r, cp, net);
> if (err < 0) {
> kfree(f);
> @@ -510,6 +512,12 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> tcf_exts_destroy(&new_filter_result.exts);
> }
>
> + /* Note: old_e should be destroyed after the RCU grace period,
> + * to avoid possible use-after-free by concurrent readers.
> + */
> + synchronize_rcu();

this could make tc reconfiguration potentially very slow. I'm wondering
if we can delegate the tcf_exts_destroy() to some workqueue?

Thanks!

Paolo


2022-11-15 13:08:32

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms

On Tue, 15 Nov 2022 at 12:36, Paolo Abeni <[email protected]> wrote:
>
> On Mon, 2022-11-14 at 01:05 +0800, Hawkins Jiawei wrote:
> > Syzkaller reports a memory leak as follows:
> > ====================================
> > BUG: memory leak
> > unreferenced object 0xffff88810c287f00 (size 256):
> > comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
> > hex dump (first 32 bytes):
> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > backtrace:
> > [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
> > [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
> > [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
> > [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
> > [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
> > [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
> > [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
> > [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
> > [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
> > [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
> > [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
> > [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
> > [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
> > [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
> > [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
> > [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
> > [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
> > [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
> > [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
> > [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
> > [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
> > [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> > [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > ====================================
> >
> > Kernel uses tcindex_change() to change an existing
> > traffic-control-indices filter properties. During the
> > process of changing, kernel clears the old
> > traffic-control-indices filter result, and updates it
> > by RCU assigning new traffic-control-indices data.
> >
> > Yet the problem is that, kernel clears the old
> > traffic-control-indices filter result, without destroying
> > its tcf_exts structure, which triggers the above
> > memory leak.
> >
> > This patch solves it by using tcf_exts_destroy() to
> > destroy the tcf_exts structure in old
> > traffic-control-indices filter result, after the
> > RCU grace period.
> >
> > [Thanks to the suggestion from Jakub Kicinski and Cong Wang]
> >
> > Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
> > Link: https://lore.kernel.org/all/[email protected]/
> > Reported-by: [email protected]
> > Tested-by: [email protected]
> > Cc: Cong Wang <[email protected]>
> > Cc: Jakub Kicinski <[email protected]>
> > Signed-off-by: Hawkins Jiawei <[email protected]>
> > ---
> > v2:
> > - remove all 'will' in commit message according to Jakub Kicinski
> > - add Fixes tag according to Jakub Kicinski
> > - remove all ifdefs according to Jakub Kicinski and Cong Wang
> > - add synchronize_rcu() before destorying old_e according to
> > Cong Wang
> >
> > v1: https://lore.kernel.org/all/[email protected]/
> > net/sched/cls_tcindex.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> > index 1c9eeb98d826..d2fac9559d3e 100644
> > --- a/net/sched/cls_tcindex.c
> > +++ b/net/sched/cls_tcindex.c
> > @@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> > struct tcf_result cr = {};
> > int err, balloc = 0;
> > struct tcf_exts e;
> > + struct tcf_exts old_e = {};
> >
> > err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
> > if (err < 0)
> > @@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> > }
> >
> > if (old_r && old_r != r) {
> > + old_e = old_r->exts;
> > err = tcindex_filter_result_init(old_r, cp, net);
> > if (err < 0) {
> > kfree(f);
> > @@ -510,6 +512,12 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> > tcf_exts_destroy(&new_filter_result.exts);
> > }
> >
> > + /* Note: old_e should be destroyed after the RCU grace period,
> > + * to avoid possible use-after-free by concurrent readers.
> > + */
> > + synchronize_rcu();
>
> this could make tc reconfiguration potentially very slow. I'm wondering
> if we can delegate the tcf_exts_destroy() to some workqueue?

call_rcu?

2022-11-15 17:30:36

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms

On Mon, 14 Nov 2022 01:05:08 +0800 Hawkins Jiawei wrote:
> diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> index 1c9eeb98d826..d2fac9559d3e 100644
> --- a/net/sched/cls_tcindex.c
> +++ b/net/sched/cls_tcindex.c
> @@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> struct tcf_result cr = {};
> int err, balloc = 0;
> struct tcf_exts e;
> + struct tcf_exts old_e = {};

This is not a valid way of initializing a structure.
tcf_exts_init() is supposed to be called.
If we add a list member to that structure this code will break, again.

> err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
> if (err < 0)
> @@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> }
>
> if (old_r && old_r != r) {
> + old_e = old_r->exts;
> err = tcindex_filter_result_init(old_r, cp, net);
> if (err < 0) {
> kfree(f);
> @@ -510,6 +512,12 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> tcf_exts_destroy(&new_filter_result.exts);
> }
>
> + /* Note: old_e should be destroyed after the RCU grace period,
> + * to avoid possible use-after-free by concurrent readers.
> + */
> + synchronize_rcu();
> + tcf_exts_destroy(&old_e);

I don't think this dance is required, @cp is a copy of the original
data, and the original (@p) is destroyed in a safe manner below.

> if (oldp)
> tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work);
> return 0;

2022-11-15 19:18:24

by Paolo Abeni

[permalink] [raw]
Subject: Re: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms

On Tue, 2022-11-15 at 09:02 -0800, Jakub Kicinski wrote:
> On Mon, 14 Nov 2022 01:05:08 +0800 Hawkins Jiawei wrote:
>
> > @@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> > }
> >
> > if (old_r && old_r != r) {
> > + old_e = old_r->exts;
> > err = tcindex_filter_result_init(old_r, cp, net);
> > if (err < 0) {
> > kfree(f);
> > @@ -510,6 +512,12 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> > tcf_exts_destroy(&new_filter_result.exts);
> > }
> >
> > + /* Note: old_e should be destroyed after the RCU grace period,
> > + * to avoid possible use-after-free by concurrent readers.
> > + */
> > + synchronize_rcu();
> > + tcf_exts_destroy(&old_e);
>
> I don't think this dance is required, @cp is a copy of the original
> data, and the original (@p) is destroyed in a safe manner below.

This code confuses me more than a bit, and I don't follow ?!? it looks
like that at this point:

* the data path could access 'old_r->exts' contents via 'p' just before
the previous 'tcindex_filter_result_init(old_r, cp, net);' but still
potentially within the same RCU grace period

* 'tcindex_filter_result_init(old_r, cp, net);' has 'unlinked' the old
exts from 'p' so that will not be freed by later
tcindex_partial_destroy_work() 

Overall it looks to me that we need some somewhat wait for the RCU
grace period,

Somewhat side question: it looks like that the 'perfect hashing' usage
is the root cause of the issue addressed here, and very likely is
afflicted by other problems, e.g. the data curruption in 'err =
tcindex_filter_result_init(old_r, cp, net);'.

AFAICS 'perfect hashing' usage is a sort of optimization that the user-
space may trigger with some combination of the tcindex arguments. I'm
wondering if we could drop all perfect hashing related code?

Paolo


2022-11-16 03:15:04

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms

On Tue, 15 Nov 2022 19:57:10 +0100 Paolo Abeni wrote:
> This code confuses me more than a bit, and I don't follow ?!?

It's very confusing :S

For starters I don't know when r != old_r. I mean now it triggers
randomly after the RCU-ification, but in the original code when
it was just a memset(). When would old_r ever not be null and yet
point to a different entry?

> it looks like that at this point:
>
> * the data path could access 'old_r->exts' contents via 'p' just before
> the previous 'tcindex_filter_result_init(old_r, cp, net);' but still
> potentially within the same RCU grace period
>
> * 'tcindex_filter_result_init(old_r, cp, net);' has 'unlinked' the old
> exts from 'p' so that will not be freed by later
> tcindex_partial_destroy_work() 
>
> Overall it looks to me that we need some somewhat wait for the RCU
> grace period,

Isn't it better to make @cp a deeper copy of @p ?
I thought it already is but we don't seem to be cloning p->h.
Also the cloning of p->perfect looks quite lossy.

> Somewhat side question: it looks like that the 'perfect hashing' usage
> is the root cause of the issue addressed here, and very likely is
> afflicted by other problems, e.g. the data curruption in 'err =
> tcindex_filter_result_init(old_r, cp, net);'.
>
> AFAICS 'perfect hashing' usage is a sort of optimization that the user-
> space may trigger with some combination of the tcindex arguments. I'm
> wondering if we could drop all perfect hashing related code?

The thought of "how much of this can we delete" did cross my mind :)

2022-11-16 07:52:51

by Hawkins Jiawei

[permalink] [raw]
Subject: Re: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms

Hi,

On Wed, 16 Nov 2022 at 01:02, Jakub Kicinski <[email protected]> wrote:
>
> On Mon, 14 Nov 2022 01:05:08 +0800 Hawkins Jiawei wrote:
> > diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> > index 1c9eeb98d826..d2fac9559d3e 100644
> > --- a/net/sched/cls_tcindex.c
> > +++ b/net/sched/cls_tcindex.c
> > @@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> >       struct tcf_result cr = {};
> >       int err, balloc = 0;
> >       struct tcf_exts e;
> > +     struct tcf_exts old_e = {};
>
> This is not a valid way of initializing a structure.
> tcf_exts_init() is supposed to be called.
> If we add a list member to that structure this code will break, again.

Yes, you are right. But the `old_e` variable here is used only for freeing
old_r->exts resource, `old_e` will be overwritten by old_r->exts content
as follows:

struct tcf_exts old_e = {};
...

if (old_r && old_r != r) {
        old_e = old_r->exts;
...
}
...
synchronize_rcu();
tcf_exts_destroy(&old_e);

So this patch uses `struct tcf_exts old_e = {}` here just for a cleared space.

2022-11-16 13:22:00

by Hawkins Jiawei

[permalink] [raw]
Subject: Re: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms

On Wed, 16 Nov 2022 at 10:44, Jakub Kicinski <[email protected]> wrote:
>
> On Tue, 15 Nov 2022 19:57:10 +0100 Paolo Abeni wrote:
> > This code confuses me more than a bit, and I don't follow ?!?
>
> It's very confusing :S
>
> For starters I don't know when r != old_r. I mean now it triggers
> randomly after the RCU-ification, but in the original code when
> it was just a memset(). When would old_r ever not be null and yet
> point to a different entry?

I am also confused about the code when I tried to fix this bug.

As for when `old_r != r`, according to the simplified
code below, this should be probably true if `p->perfect` is true
or `!p->perfect && !pc->h` is true(please correct me if I am wrong)

struct tcindex_filter_result new_filter_result, *old_r = r;
struct tcindex_data *cp = NULL, *oldp;
struct tcf_result cr = {};

/* tcindex_data attributes must look atomic to classifier/lookup so
* allocate new tcindex data and RCU assign it onto root. Keeping
* perfect hash and hash pointers from old data.
*/
cp = kzalloc(sizeof(*cp), GFP_KERNEL);

if (p->perfect) {
if (tcindex_alloc_perfect_hash(net, cp) < 0)
goto errout;
...
}
cp->h = p->h;

if (!cp->perfect && !cp->h) {
if (valid_perfect_hash(cp)) {
if (tcindex_alloc_perfect_hash(net, cp) < 0)
goto errout_alloc;

} else {
struct tcindex_filter __rcu **hash;

hash = kcalloc(cp->hash,
sizeof(struct tcindex_filter *),
GFP_KERNEL);

if (!hash)
goto errout_alloc;

cp->h = hash;
}
}
...

if (cp->perfect)
r = cp->perfect + handle;
else
r = tcindex_lookup(cp, handle) ? : &new_filter_result;

if (old_r && old_r != r) {
err = tcindex_filter_result_init(old_r, cp, net);
if (err < 0) {
kfree(f);
goto errout_alloc;
}
}

* If `p->perfect` is true, tcindex_alloc_perfect_hash() newly
alloctes cp->perfect.

* If `!p->perfect && !p->h` is true, cp->perfect or cp->h is
newly allocated.

In either case, r probably points to the newly allocated memory,
which should not equals to the old_r.

>
> > it looks like that at this point:
> >
> > * the data path could access 'old_r->exts' contents via 'p' just before
> > the previous 'tcindex_filter_result_init(old_r, cp, net);' but still
> > potentially within the same RCU grace period
> >
> > * 'tcindex_filter_result_init(old_r, cp, net);' has 'unlinked' the old
> > exts from 'p' so that will not be freed by later
> > tcindex_partial_destroy_work()
> >
> > Overall it looks to me that we need some somewhat wait for the RCU
> > grace period,
>
> Isn't it better to make @cp a deeper copy of @p ?
> I thought it already is but we don't seem to be cloning p->h.
> Also the cloning of p->perfect looks quite lossy.

Yes, I also think @cp should be a deeper copy of @p.

But it seems that in tcindex_alloc_perfect_hash(),
each @cp ->exts will be initialized by tcf_exts_init()
as below, and tcindex_set_parms() forgets to free the
old ->exts content, triggering this memory leak.(Please
correct me if I am wrong)

static int tcindex_alloc_perfect_hash(struct net *net,
struct tcindex_data *cp)
{
int i, err = 0;

cp->perfect = kcalloc(cp->hash, sizeof(struct tcindex_filter_result),
GFP_KERNEL | __GFP_NOWARN);

for (i = 0; i < cp->hash; i++) {
err = tcf_exts_init(&cp->perfect[i].exts, net,
TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
if (err < 0)
goto errout;
cp->perfect[i].p = cp;
}
}

static inline int tcf_exts_init(struct tcf_exts *exts, struct net *net,
int action, int police)
{
#ifdef CONFIG_NET_CLS_ACT
exts->type = 0;
exts->nr_actions = 0;
/* Note: we do not own yet a reference on net.
* This reference might be taken later from tcf_exts_get_net().
*/
exts->net = net;
exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *),
GFP_KERNEL);
if (!exts->actions)
return -ENOMEM;
#endif
exts->action = action;
exts->police = police;
return 0;
}

>
> > Somewhat side question: it looks like that the 'perfect hashing' usage
> > is the root cause of the issue addressed here, and very likely is
> > afflicted by other problems, e.g. the data curruption in 'err =
> > tcindex_filter_result_init(old_r, cp, net);'.
> >
> > AFAICS 'perfect hashing' usage is a sort of optimization that the user-
> > space may trigger with some combination of the tcindex arguments. I'm
> > wondering if we could drop all perfect hashing related code?
>
> The thought of "how much of this can we delete" did cross my mind :)

2022-11-16 13:44:48

by Hawkins Jiawei

[permalink] [raw]
Subject: Re: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms

On Wed, 16 Nov 2022 at 20:10, Hawkins Jiawei <[email protected]> wrote:
>
> On Wed, 16 Nov 2022 at 10:44, Jakub Kicinski <[email protected]> wrote:
> >
> > On Tue, 15 Nov 2022 19:57:10 +0100 Paolo Abeni wrote:
> > > This code confuses me more than a bit, and I don't follow ?!?
> >
> > It's very confusing :S
> >
> > For starters I don't know when r != old_r. I mean now it triggers
> > randomly after the RCU-ification, but in the original code when
> > it was just a memset(). When would old_r ever not be null and yet
> > point to a different entry?
>
> I am also confused about the code when I tried to fix this bug.
>
> As for when `old_r != r`, according to the simplified
> code below, this should be probably true if `p->perfect` is true
> or `!p->perfect && !pc->h` is true(please correct me if I am wrong)
>
> struct tcindex_filter_result new_filter_result, *old_r = r;
> struct tcindex_data *cp = NULL, *oldp;
> struct tcf_result cr = {};
>
> /* tcindex_data attributes must look atomic to classifier/lookup so
> * allocate new tcindex data and RCU assign it onto root. Keeping
> * perfect hash and hash pointers from old data.
> */
> cp = kzalloc(sizeof(*cp), GFP_KERNEL);
>
> if (p->perfect) {
> if (tcindex_alloc_perfect_hash(net, cp) < 0)
> goto errout;
> ...
> }
> cp->h = p->h;
>
> if (!cp->perfect && !cp->h) {
> if (valid_perfect_hash(cp)) {
> if (tcindex_alloc_perfect_hash(net, cp) < 0)
> goto errout_alloc;
>
> } else {
> struct tcindex_filter __rcu **hash;
>
> hash = kcalloc(cp->hash,
> sizeof(struct tcindex_filter *),
> GFP_KERNEL);
>
> if (!hash)
> goto errout_alloc;
>
> cp->h = hash;
> }
> }
> ...
>
> if (cp->perfect)
> r = cp->perfect + handle;
> else
> r = tcindex_lookup(cp, handle) ? : &new_filter_result;
>
> if (old_r && old_r != r) {
> err = tcindex_filter_result_init(old_r, cp, net);
> if (err < 0) {
> kfree(f);
> goto errout_alloc;
> }
> }
>
> * If `p->perfect` is true, tcindex_alloc_perfect_hash() newly
> alloctes cp->perfect.
>
> * If `!p->perfect && !p->h` is true, cp->perfect or cp->h is
> newly allocated.
>
> In either case, r probably points to the newly allocated memory,
> which should not equals to the old_r.

Sorry for the error. In the second case, `r` is possibly
pointing to the `&new_filter_result`, which is a stack variable
address, and should still not equal to the `old_r`.

>
> >
> > > it looks like that at this point:
> > >
> > > * the data path could access 'old_r->exts' contents via 'p' just before
> > > the previous 'tcindex_filter_result_init(old_r, cp, net);' but still
> > > potentially within the same RCU grace period
> > >
> > > * 'tcindex_filter_result_init(old_r, cp, net);' has 'unlinked' the old
> > > exts from 'p' so that will not be freed by later
> > > tcindex_partial_destroy_work()
> > >
> > > Overall it looks to me that we need some somewhat wait for the RCU
> > > grace period,
> >
> > Isn't it better to make @cp a deeper copy of @p ?
> > I thought it already is but we don't seem to be cloning p->h.
> > Also the cloning of p->perfect looks quite lossy.
>
> Yes, I also think @cp should be a deeper copy of @p.
>
> But it seems that in tcindex_alloc_perfect_hash(),
> each @cp ->exts will be initialized by tcf_exts_init()
> as below, and tcindex_set_parms() forgets to free the
> old ->exts content, triggering this memory leak.(Please
> correct me if I am wrong)
>
> static int tcindex_alloc_perfect_hash(struct net *net,
> struct tcindex_data *cp)
> {
> int i, err = 0;
>
> cp->perfect = kcalloc(cp->hash, sizeof(struct tcindex_filter_result),
> GFP_KERNEL | __GFP_NOWARN);
>
> for (i = 0; i < cp->hash; i++) {
> err = tcf_exts_init(&cp->perfect[i].exts, net,
> TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
> if (err < 0)
> goto errout;
> cp->perfect[i].p = cp;
> }
> }
>
> static inline int tcf_exts_init(struct tcf_exts *exts, struct net *net,
> int action, int police)
> {
> #ifdef CONFIG_NET_CLS_ACT
> exts->type = 0;
> exts->nr_actions = 0;
> /* Note: we do not own yet a reference on net.
> * This reference might be taken later from tcf_exts_get_net().
> */
> exts->net = net;
> exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *),
> GFP_KERNEL);
> if (!exts->actions)
> return -ENOMEM;
> #endif
> exts->action = action;
> exts->police = police;
> return 0;
> }
>
> >
> > > Somewhat side question: it looks like that the 'perfect hashing' usage
> > > is the root cause of the issue addressed here, and very likely is
> > > afflicted by other problems, e.g. the data curruption in 'err =
> > > tcindex_filter_result_init(old_r, cp, net);'.
> > >
> > > AFAICS 'perfect hashing' usage is a sort of optimization that the user-
> > > space may trigger with some combination of the tcindex arguments. I'm
> > > wondering if we could drop all perfect hashing related code?
> >
> > The thought of "how much of this can we delete" did cross my mind :)