2022-03-09 02:25:28

by Peng Liu

Subject: [PATCH 0/3] kunit: fix a UAF bug and do some optimization

This series fixes a use-after-free (UAF) that occurs when running the
time-consuming kfence test case test_gfpzero. The bug is easily
triggered by setting CONFIG_KFENCE_DYNAMIC_OBJECTS = 65535. The series
also makes some optimizations to the kunit tests.

Peng Liu (3):
kunit: fix UAF when run kfence test case test_gfpzero
kunit: make kunit_test_timeout compatible with comment
kfence: test: try to avoid test_gfpzero trigger rcu_stall

lib/kunit/try-catch.c | 3 ++-
mm/kfence/kfence_test.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

--
2.18.0.huawei.25


2022-03-09 02:25:40

by Peng Liu

Subject: [PATCH 2/3] kunit: make kunit_test_timeout compatible with comment

The comment in kunit_test_timeout() says that "300 * MSEC_PER_SEC"
represents 5 min. However, the return value is used as a jiffies count,
so this only holds when HZ = 1000. It is wrong on arm64, whose default
HZ is 250, and in other configurations. Use msecs_to_jiffies() to fix
this so that kunit_test_timeout() works as intended.

Signed-off-by: Peng Liu <[email protected]>
---
lib/kunit/try-catch.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c
index 6b3d4db94077..f7825991d576 100644
--- a/lib/kunit/try-catch.c
+++ b/lib/kunit/try-catch.c
@@ -52,7 +52,7 @@ static unsigned long kunit_test_timeout(void)
 	 * If tests timeout due to exceeding sysctl_hung_task_timeout_secs,
 	 * the task will be killed and an oops generated.
 	 */
-	return 300 * MSEC_PER_SEC; /* 5 min */
+	return 300 * msecs_to_jiffies(MSEC_PER_SEC); /* 5 min */
 }
 
 void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
--
2.18.0.huawei.25

2022-03-09 02:25:50

by Peng Liu

Subject: [PATCH 3/3] kfence: test: try to avoid test_gfpzero trigger rcu_stall

When CONFIG_KFENCE_DYNAMIC_OBJECTS is set to a large number, the kfence
kunit test case test_gfpzero consumes nearly all of a CPU's time, and
an RCU stall is reported, as shown in the following log captured on a
physical server.

rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 68-....: (14422 ticks this GP) idle=6ce/1/0x4000000000000002
softirq=592/592 fqs=7500 (t=15004 jiffies g=10677 q=20019)
Task dump for CPU 68:
task:kunit_try_catch state:R running task
stack: 0 pid: 9728 ppid: 2 flags:0x0000020a
Call trace:
dump_backtrace+0x0/0x1e4
show_stack+0x20/0x2c
sched_show_task+0x148/0x170
...
rcu_sched_clock_irq+0x70/0x180
update_process_times+0x68/0xb0
tick_sched_handle+0x38/0x74
...
gic_handle_irq+0x78/0x2c0
el1_irq+0xb8/0x140
kfree+0xd8/0x53c
test_alloc+0x264/0x310 [kfence_test]
test_gfpzero+0xf4/0x840 [kfence_test]
kunit_try_run_case+0x48/0x20c
kunit_generic_run_threadfn_adapter+0x28/0x34
kthread+0x108/0x13c
ret_from_fork+0x10/0x18

To avoid rcu_stall and unacceptable latency, a schedule point is
added to test_gfpzero.

Signed-off-by: Peng Liu <[email protected]>
---
mm/kfence/kfence_test.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
index caed6b4eba94..1b50f70a4c0f 100644
--- a/mm/kfence/kfence_test.c
+++ b/mm/kfence/kfence_test.c
@@ -627,6 +627,7 @@ static void test_gfpzero(struct kunit *test)
 			kunit_warn(test, "giving up ... cannot get same object back\n");
 			return;
 		}
+		cond_resched();
 	}
 
 	for (i = 0; i < size; i++)
--
2.18.0.huawei.25

2022-03-09 06:21:30

by Marco Elver

Subject: Re: [PATCH 0/3] kunit: fix a UAF bug and do some optimization

On Wed, 9 Mar 2022 at 02:29, 'Peng Liu' via kasan-dev
<[email protected]> wrote:
>
> This series is to fix UAF when running kfence test case test_gfpzero,
> which is time costly. This UAF bug can be easily triggered by setting
> CONFIG_KFENCE_DYNAMIC_OBJECTS = 65535. Furthermore, some optimization
> for kunit tests has been done.

Yeah, I've observed this problem before, so thanks for fixing.

It's CONFIG_KFENCE_NUM_OBJECTS (not "DYNAMIC") - please fix in all patches.


> Peng Liu (3):
> kunit: fix UAF when run kfence test case test_gfpzero
> kunit: make kunit_test_timeout compatible with comment
> kfence: test: try to avoid test_gfpzero trigger rcu_stall
>
> lib/kunit/try-catch.c | 3 ++-
> mm/kfence/kfence_test.c | 3 ++-
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> --
> 2.18.0.huawei.25
>
> --
> You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/20220309014705.1265861-1-liupeng256%40huawei.com.

2022-03-09 06:32:26

by Marco Elver

Subject: Re: [PATCH 2/3] kunit: make kunit_test_timeout compatible with comment

On Wed, 9 Mar 2022 at 02:29, 'Peng Liu' via kasan-dev
<[email protected]> wrote:
>
> In function kunit_test_timeout, it is declared "300 * MSEC_PER_SEC"
> represent 5min. However, it is wrong when dealing with arm64 whose
> default HZ = 250, or some other situations. Use msecs_to_jiffies to
> fix this, and kunit_test_timeout will work as desired.
>
> Signed-off-by: Peng Liu <[email protected]>

Does this need a:

Fixes: 5f3e06208920 ("kunit: test: add support for test abort")

?

> ---
> lib/kunit/try-catch.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c
> index 6b3d4db94077..f7825991d576 100644
> --- a/lib/kunit/try-catch.c
> +++ b/lib/kunit/try-catch.c
> @@ -52,7 +52,7 @@ static unsigned long kunit_test_timeout(void)
> * If tests timeout due to exceeding sysctl_hung_task_timeout_secs,
> * the task will be killed and an oops generated.
> */
> - return 300 * MSEC_PER_SEC; /* 5 min */
> + return 300 * msecs_to_jiffies(MSEC_PER_SEC); /* 5 min */

Why not just "300 * HZ" ?

> }
>
> void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
> --
> 2.18.0.huawei.25
>

2022-03-09 06:52:19

by Marco Elver

Subject: Re: [PATCH 2/3] kunit: make kunit_test_timeout compatible with comment

On Wed, 9 Mar 2022 at 07:32, liupeng (DM) <[email protected]> wrote:
>
> Thank you for your advice.
>
> On 2022/3/9 14:03, Marco Elver wrote:
> > On Wed, 9 Mar 2022 at 02:29, 'Peng Liu' via kasan-dev
> > <[email protected]> wrote:
> >> In function kunit_test_timeout, it is declared "300 * MSEC_PER_SEC"
> >> represent 5min. However, it is wrong when dealing with arm64 whose
> >> default HZ = 250, or some other situations. Use msecs_to_jiffies to
> >> fix this, and kunit_test_timeout will work as desired.
> >>
> >> Signed-off-by: Peng Liu <[email protected]>
> > Does this need a:
> >
> > Fixes: 5f3e06208920 ("kunit: test: add support for test abort")
> >
> > ?
>
> Yes, I will add this description.
>
> >> ---
> >> lib/kunit/try-catch.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c
> >> index 6b3d4db94077..f7825991d576 100644
> >> --- a/lib/kunit/try-catch.c
> >> +++ b/lib/kunit/try-catch.c
> >> @@ -52,7 +52,7 @@ static unsigned long kunit_test_timeout(void)
> >> * If tests timeout due to exceeding sysctl_hung_task_timeout_secs,
> >> * the task will be killed and an oops generated.
> >> */
> >> - return 300 * MSEC_PER_SEC; /* 5 min */
> >> + return 300 * msecs_to_jiffies(MSEC_PER_SEC); /* 5 min */
> > Why not just "300 * HZ" ?
>
> Because I have seen patch
>
> df3c30f6e904 ("staging: lustre: replace direct HZ access with kernel APIs").
>
> Here, both "300 * msecs_to_jiffies(MSEC_PER_SEC)" and "300 * HZ" are OK for me.

I see - let's keep as-is and use msecs_to_jiffies().

Thanks,
-- Marco

2022-03-09 07:24:14

by Peng Liu

Subject: Re: [PATCH 2/3] kunit: make kunit_test_timeout compatible with comment

Thank you for your advice.

On 2022/3/9 14:03, Marco Elver wrote:
> On Wed, 9 Mar 2022 at 02:29, 'Peng Liu' via kasan-dev
> <[email protected]> wrote:
>> In function kunit_test_timeout, it is declared "300 * MSEC_PER_SEC"
>> represent 5min. However, it is wrong when dealing with arm64 whose
>> default HZ = 250, or some other situations. Use msecs_to_jiffies to
>> fix this, and kunit_test_timeout will work as desired.
>>
>> Signed-off-by: Peng Liu <[email protected]>
> Does this need a:
>
> Fixes: 5f3e06208920 ("kunit: test: add support for test abort")
>
> ?

Yes, I will add this description.

>> ---
>> lib/kunit/try-catch.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c
>> index 6b3d4db94077..f7825991d576 100644
>> --- a/lib/kunit/try-catch.c
>> +++ b/lib/kunit/try-catch.c
>> @@ -52,7 +52,7 @@ static unsigned long kunit_test_timeout(void)
>> * If tests timeout due to exceeding sysctl_hung_task_timeout_secs,
>> * the task will be killed and an oops generated.
>> */
>> - return 300 * MSEC_PER_SEC; /* 5 min */
>> + return 300 * msecs_to_jiffies(MSEC_PER_SEC); /* 5 min */
> Why not just "300 * HZ" ?

Because I have seen patch

df3c30f6e904 ("staging: lustre: replace direct HZ access with kernel APIs").

Here, both "300 * msecs_to_jiffies(MSEC_PER_SEC)" and "300 * HZ" are OK for me.

>> }
>>
>> void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
>> --
>> 2.18.0.huawei.25
>>

2022-03-09 07:56:13

by Peng Liu

Subject: Re: [PATCH 0/3] kunit: fix a UAF bug and do some optimization

Good, I will send a revised series later.

On 2022/3/9 14:12, Marco Elver wrote:
> On Wed, 9 Mar 2022 at 02:29, 'Peng Liu' via kasan-dev
> <[email protected]> wrote:
>> This series is to fix UAF when running kfence test case test_gfpzero,
>> which is time costly. This UAF bug can be easily triggered by setting
>> CONFIG_KFENCE_DYNAMIC_OBJECTS = 65535. Furthermore, some optimization
>> for kunit tests has been done.
> Yeah, I've observed this problem before, so thanks for fixing.
>
> It's CONFIG_KFENCE_NUM_OBJECTS (not "DYNAMIC") - please fix in all patches.
>
Sorry for this mistake, I will check it in all patches.
>> Peng Liu (3):
>> kunit: fix UAF when run kfence test case test_gfpzero
>> kunit: make kunit_test_timeout compatible with comment
>> kfence: test: try to avoid test_gfpzero trigger rcu_stall
>>
>> lib/kunit/try-catch.c | 3 ++-
>> mm/kfence/kfence_test.c | 3 ++-
>> 2 files changed, 4 insertions(+), 2 deletions(-)
>>
>> --
>> 2.18.0.huawei.25
>>

2022-03-10 01:42:10

by Shuah Khan

Subject: Re: [PATCH 3/3] kfence: test: try to avoid test_gfpzero trigger rcu_stall

On 3/8/22 6:47 PM, Peng Liu wrote:
> When CONFIG_KFENCE_DYNAMIC_OBJECTS is set to a big number, kfence
> kunit-test-case test_gfpzero will eat up nearly all the CPU's
> resources and rcu_stall is reported as the following log which is
> cut from a physical server.
>
> rcu: INFO: rcu_sched self-detected stall on CPU
> rcu: 68-....: (14422 ticks this GP) idle=6ce/1/0x4000000000000002
> softirq=592/592 fqs=7500 (t=15004 jiffies g=10677 q=20019)
> Task dump for CPU 68:
> task:kunit_try_catch state:R running task
> stack: 0 pid: 9728 ppid: 2 flags:0x0000020a
> Call trace:
> dump_backtrace+0x0/0x1e4
> show_stack+0x20/0x2c
> sched_show_task+0x148/0x170
> ...
> rcu_sched_clock_irq+0x70/0x180
> update_process_times+0x68/0xb0
> tick_sched_handle+0x38/0x74
> ...
> gic_handle_irq+0x78/0x2c0
> el1_irq+0xb8/0x140
> kfree+0xd8/0x53c
> test_alloc+0x264/0x310 [kfence_test]
> test_gfpzero+0xf4/0x840 [kfence_test]
> kunit_try_run_case+0x48/0x20c
> kunit_generic_run_threadfn_adapter+0x28/0x34
> kthread+0x108/0x13c
> ret_from_fork+0x10/0x18
>
> To avoid rcu_stall and unacceptable latency, a schedule point is
> added to test_gfpzero.
>
> Signed-off-by: Peng Liu <[email protected]>
> ---
> mm/kfence/kfence_test.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
> index caed6b4eba94..1b50f70a4c0f 100644
> --- a/mm/kfence/kfence_test.c
> +++ b/mm/kfence/kfence_test.c
> @@ -627,6 +627,7 @@ static void test_gfpzero(struct kunit *test)
> kunit_warn(test, "giving up ... cannot get same object back\n");
> return;
> }
> + cond_resched();

This sounds like a band-aid - is there a better way to fix this?

> }
>
> for (i = 0; i < size; i++)
>

thanks,
-- Shuah