LinuxLists.cc - [PATCH 2/4] kcov: add interrupt handling self test

2024-06-04 13:46:22

Subject: [PATCH 2/4] kcov: add interrupt handling self test

Add a boot self test that can catch sprious coverage from interrupts.
The coverage callback filters out interrupt code, but only after the
handler updates preempt count. Some code periodically leaks out
of that section and leads to spurious coverage.
Add a best-effort (but simple) test that is likely to catch such bugs.
If the test is enabled on CI systems that use KCOV, they should catch
any issues fast.

Signed-off-by: Dmitry Vyukov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

---

In my local testing w/o the previous fix,
it immidiatly produced the following splat:

kcov: running selftest
BUG: TASK stack guard page was hit at ffffc90000147ff8
Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN PTI
...
kvm_set_cpu_l1tf_flush_l1d+0x5/0x20
sysvec_call_function+0x15/0xb0
asm_sysvec_call_function+0x1a/0x20
kcov_init+0xe4/0x130
do_one_initcall+0xbc/0x470
kernel_init_freeable+0x4fc/0x930
kernel_init+0x1c/0x2b0
---
kernel/kcov.c | 28 ++++++++++++++++++++++++++++
lib/Kconfig.debug | 9 +++++++++
2 files changed, 37 insertions(+)

diff --git a/kernel/kcov.c b/kernel/kcov.c
index c3124f6d5536..04136f80042f 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -1057,6 +1057,30 @@ u64 kcov_common_handle(void)
}
EXPORT_SYMBOL(kcov_common_handle);

+#ifdef CONFIG_KCOV_TEST
+static void __init selftest(void)
+{
+ volatile int i;
+
+ pr_err("running self test\n");
+ /*
+ * Test that interrupts don't produce spurious coverage.
+ * The coverage callback filters out interrupt code, but only
+ * after the handler updates preempt count. Some code periodically
+ * leaks out of that section and leads to spurious coverage.
+ * It's hard to call the actual interrupt handler directly,
+ * so we just loop here for ~400 ms waiting for a timer interrupt.
+ * We set kcov_mode to enable tracing, but don't setup the area,
+ * so any attempt to trace will crash.
+ */
+ current->kcov_mode = KCOV_MODE_TRACE_PC;
+ for (i = 0; i < (1 << 28); i++)
+ ;
+ current->kcov_mode = 0;
+ pr_err("done running self test\n");
+}
+#endif
+
static int __init kcov_init(void)
{
int cpu;
@@ -1076,6 +1100,10 @@ static int __init kcov_init(void)
*/
debugfs_create_file_unsafe("kcov", 0600, NULL, NULL, &kcov_fops);

+#ifdef CONFIG_KCOV_TEST
+ selftest();
+#endif
+
return 0;
}

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 59b6765d86b8..79836a15b6cb 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2171,6 +2171,15 @@ config KCOV_IRQ_AREA_SIZE
soft interrupts. This specifies the size of those areas in the
number of unsigned long words.

+config KCOV_TEST
+ bool "Test CONFIG_KCOV feature"
+ depends on KCOV
+ help
+ Sanity check for KCOV coverage collection.
+ Runs built-in self test on boot to detect some common issues.
+
+ If unsure, say N.
+
menuconfig RUNTIME_TESTING_MENU
bool "Runtime Testing"
default y
--
2.45.1.467.gbab1589fc0-goog

2024-06-04 15:27:12

by Alexander Potapenko

[permalink] [raw]

Subject: Re: [PATCH 2/4] kcov: add interrupt handling self test

On Tue, Jun 4, 2024 at 3:45 PM Dmitry Vyukov <[email protected]> wrote:
>
> Add a boot self test that can catch sprious coverage from interrupts.
> The coverage callback filters out interrupt code, but only after the
> handler updates preempt count. Some code periodically leaks out
> of that section and leads to spurious coverage.
> Add a best-effort (but simple) test that is likely to catch such bugs.
> If the test is enabled on CI systems that use KCOV, they should catch
> any issues fast.
>
> Signed-off-by: Dmitry Vyukov <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
Reviewed-by: Alexander Potapenko <[email protected]>

2024-06-05 09:10:29

by Marco Elver

[permalink] [raw]

Subject: Re: [PATCH 2/4] kcov: add interrupt handling self test

On Tue, 4 Jun 2024 at 15:45, Dmitry Vyukov <[email protected]> wrote:
>
> Add a boot self test that can catch sprious coverage from interrupts.
> The coverage callback filters out interrupt code, but only after the
> handler updates preempt count. Some code periodically leaks out
> of that section and leads to spurious coverage.
> Add a best-effort (but simple) test that is likely to catch such bugs.
> If the test is enabled on CI systems that use KCOV, they should catch
> any issues fast.
>
> Signed-off-by: Dmitry Vyukov <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
>
> ---
>
> In my local testing w/o the previous fix,
> it immidiatly produced the following splat:
>
> kcov: running selftest
> BUG: TASK stack guard page was hit at ffffc90000147ff8
> Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN PTI
> ...
> kvm_set_cpu_l1tf_flush_l1d+0x5/0x20
> sysvec_call_function+0x15/0xb0
> asm_sysvec_call_function+0x1a/0x20
> kcov_init+0xe4/0x130
> do_one_initcall+0xbc/0x470
> kernel_init_freeable+0x4fc/0x930
> kernel_init+0x1c/0x2b0
> ---
> kernel/kcov.c | 28 ++++++++++++++++++++++++++++
> lib/Kconfig.debug | 9 +++++++++
> 2 files changed, 37 insertions(+)
>
> diff --git a/kernel/kcov.c b/kernel/kcov.c
> index c3124f6d5536..04136f80042f 100644
> --- a/kernel/kcov.c
> +++ b/kernel/kcov.c
> @@ -1057,6 +1057,30 @@ u64 kcov_common_handle(void)
> }
> EXPORT_SYMBOL(kcov_common_handle);
>
> +#ifdef CONFIG_KCOV_TEST
> +static void __init selftest(void)
> +{
> + volatile int i;
> +
> + pr_err("running self test\n");
> + /*
> + * Test that interrupts don't produce spurious coverage.
> + * The coverage callback filters out interrupt code, but only
> + * after the handler updates preempt count. Some code periodically
> + * leaks out of that section and leads to spurious coverage.
> + * It's hard to call the actual interrupt handler directly,
> + * so we just loop here for ~400 ms waiting for a timer interrupt.

Where do the 400 ms come from? I only see that it loops a long time,
but that the timing is entirely dependent on how fast the CPU executes
the loop.

> + * We set kcov_mode to enable tracing, but don't setup the area,
> + * so any attempt to trace will crash.
> + */
> + current->kcov_mode = KCOV_MODE_TRACE_PC;
> + for (i = 0; i < (1 << 28); i++)
> + ;

Can't you check jiffies, and e.g. check that actual ~100-500ms have elapsed?

timeout = jiffies + msecs_to_jiffies(300);
while (!time_after(jiffies, timeout)) {
cpu_relax();
}

> + current->kcov_mode = 0;
> + pr_err("done running self test\n");
> +}
> +#endif
> +
> static int __init kcov_init(void)
> {
> int cpu;
> @@ -1076,6 +1100,10 @@ static int __init kcov_init(void)
> */
> debugfs_create_file_unsafe("kcov", 0600, NULL, NULL, &kcov_fops);
>
> +#ifdef CONFIG_KCOV_TEST
> + selftest();
> +#endif
> +
> return 0;
> }
>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 59b6765d86b8..79836a15b6cb 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2171,6 +2171,15 @@ config KCOV_IRQ_AREA_SIZE
> soft interrupts. This specifies the size of those areas in the
> number of unsigned long words.
>
> +config KCOV_TEST

s/TEST/SELFTEST/

It may be confused with a longer standalone test.

> + bool "Test CONFIG_KCOV feature"

Maybe "Perform short selftests on boot" (similar to CONFIG_KCSAN_SELFTEST).

> + depends on KCOV
> + help
> + Sanity check for KCOV coverage collection.
> + Runs built-in self test on boot to detect some common issues.
> +
> + If unsure, say N.
> +
> menuconfig RUNTIME_TESTING_MENU
> bool "Runtime Testing"
> default y
> --
> 2.45.1.467.gbab1589fc0-goog
>

2024-06-05 09:19:35

by Dmitry Vyukov

[permalink] [raw]

Subject: Re: [PATCH 2/4] kcov: add interrupt handling self test

On Wed, 5 Jun 2024 at 11:10, Marco Elver <[email protected]> wrote:
>
> > Add a boot self test that can catch sprious coverage from interrupts.
> > The coverage callback filters out interrupt code, but only after the
> > handler updates preempt count. Some code periodically leaks out
> > of that section and leads to spurious coverage.
> > Add a best-effort (but simple) test that is likely to catch such bugs.
> > If the test is enabled on CI systems that use KCOV, they should catch
> > any issues fast.
> >
> > Signed-off-by: Dmitry Vyukov <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> >
> > ---
> >
> > In my local testing w/o the previous fix,
> > it immidiatly produced the following splat:
> >
> > kcov: running selftest
> > BUG: TASK stack guard page was hit at ffffc90000147ff8
> > Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN PTI
> > ...
> > kvm_set_cpu_l1tf_flush_l1d+0x5/0x20
> > sysvec_call_function+0x15/0xb0
> > asm_sysvec_call_function+0x1a/0x20
> > kcov_init+0xe4/0x130
> > do_one_initcall+0xbc/0x470
> > kernel_init_freeable+0x4fc/0x930
> > kernel_init+0x1c/0x2b0
> > ---
> > kernel/kcov.c | 28 ++++++++++++++++++++++++++++
> > lib/Kconfig.debug | 9 +++++++++
> > 2 files changed, 37 insertions(+)
> >
> > diff --git a/kernel/kcov.c b/kernel/kcov.c
> > index c3124f6d5536..04136f80042f 100644
> > --- a/kernel/kcov.c
> > +++ b/kernel/kcov.c
> > @@ -1057,6 +1057,30 @@ u64 kcov_common_handle(void)
> > }
> > EXPORT_SYMBOL(kcov_common_handle);
> >
> > +#ifdef CONFIG_KCOV_TEST
> > +static void __init selftest(void)
> > +{
> > + volatile int i;
> > +
> > + pr_err("running self test\n");
> > + /*
> > + * Test that interrupts don't produce spurious coverage.
> > + * The coverage callback filters out interrupt code, but only
> > + * after the handler updates preempt count. Some code periodically
> > + * leaks out of that section and leads to spurious coverage.
> > + * It's hard to call the actual interrupt handler directly,
> > + * so we just loop here for ~400 ms waiting for a timer interrupt.
>
> Where do the 400 ms come from? I only see that it loops a long time,
> but that the timing is entirely dependent on how fast the CPU executes
> the loop.
>
> > + * We set kcov_mode to enable tracing, but don't setup the area,
> > + * so any attempt to trace will crash.
> > + */
> > + current->kcov_mode = KCOV_MODE_TRACE_PC;
> > + for (i = 0; i < (1 << 28); i++)
> > + ;
>
> Can't you check jiffies, and e.g. check that actual ~100-500ms have elapsed?
>
> timeout = jiffies + msecs_to_jiffies(300);
> while (!time_after(jiffies, timeout)) {
> cpu_relax();
> }

We can't call any functions. If anything is instrumented, the kernel crashes.

But just reading jiffies should be fine, so we can do:

unsigned long start = jiffies;
while ((jiffies - start) * MSEC_PER_SEC / HZ < 500)
;

> > + current->kcov_mode = 0;
> > + pr_err("done running self test\n");
> > +}
> > +#endif
> > +
> > static int __init kcov_init(void)
> > {
> > int cpu;
> > @@ -1076,6 +1100,10 @@ static int __init kcov_init(void)
> > */
> > debugfs_create_file_unsafe("kcov", 0600, NULL, NULL, &kcov_fops);
> >
> > +#ifdef CONFIG_KCOV_TEST
> > + selftest();
> > +#endif
> > +
> > return 0;
> > }
> >
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 59b6765d86b8..79836a15b6cb 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -2171,6 +2171,15 @@ config KCOV_IRQ_AREA_SIZE
> > soft interrupts. This specifies the size of those areas in the
> > number of unsigned long words.
> >
> > +config KCOV_TEST
>
> s/TEST/SELFTEST/
>
> It may be confused with a longer standalone test.
>
> > + bool "Test CONFIG_KCOV feature"
>
> Maybe "Perform short selftests on boot" (similar to CONFIG_KCSAN_SELFTEST).
>
> > + depends on KCOV
> > + help
> > + Sanity check for KCOV coverage collection.
> > + Runs built-in self test on boot to detect some common issues.
> > +
> > + If unsure, say N.
> > +
> > menuconfig RUNTIME_TESTING_MENU
> > bool "Runtime Testing"
> > default y
> > --
> > 2.45.1.467.gbab1589fc0-goog
> >

2024-06-05 09:38:21

by Marco Elver

[permalink] [raw]

Subject: Re: [PATCH 2/4] kcov: add interrupt handling self test

On Wed, 5 Jun 2024 at 11:18, Dmitry Vyukov <[email protected]> wrote:
>
> On Wed, 5 Jun 2024 at 11:10, Marco Elver <[email protected]> wrote:
> >
> > > Add a boot self test that can catch sprious coverage from interrupts.
> > > The coverage callback filters out interrupt code, but only after the
> > > handler updates preempt count. Some code periodically leaks out
> > > of that section and leads to spurious coverage.
> > > Add a best-effort (but simple) test that is likely to catch such bugs.
> > > If the test is enabled on CI systems that use KCOV, they should catch
> > > any issues fast.
> > >
> > > Signed-off-by: Dmitry Vyukov <[email protected]>
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > Cc: [email protected]
> > >
> > > ---
> > >
> > > In my local testing w/o the previous fix,
> > > it immidiatly produced the following splat:
> > >
> > > kcov: running selftest
> > > BUG: TASK stack guard page was hit at ffffc90000147ff8
> > > Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN PTI
> > > ...
> > > kvm_set_cpu_l1tf_flush_l1d+0x5/0x20
> > > sysvec_call_function+0x15/0xb0
> > > asm_sysvec_call_function+0x1a/0x20
> > > kcov_init+0xe4/0x130
> > > do_one_initcall+0xbc/0x470
> > > kernel_init_freeable+0x4fc/0x930
> > > kernel_init+0x1c/0x2b0
> > > ---
> > > kernel/kcov.c | 28 ++++++++++++++++++++++++++++
> > > lib/Kconfig.debug | 9 +++++++++
> > > 2 files changed, 37 insertions(+)
> > >
> > > diff --git a/kernel/kcov.c b/kernel/kcov.c
> > > index c3124f6d5536..04136f80042f 100644
> > > --- a/kernel/kcov.c
> > > +++ b/kernel/kcov.c
> > > @@ -1057,6 +1057,30 @@ u64 kcov_common_handle(void)
> > > }
> > > EXPORT_SYMBOL(kcov_common_handle);
> > >
> > > +#ifdef CONFIG_KCOV_TEST
> > > +static void __init selftest(void)
> > > +{
> > > + volatile int i;
> > > +
> > > + pr_err("running self test\n");
> > > + /*
> > > + * Test that interrupts don't produce spurious coverage.
> > > + * The coverage callback filters out interrupt code, but only
> > > + * after the handler updates preempt count. Some code periodically
> > > + * leaks out of that section and leads to spurious coverage.
> > > + * It's hard to call the actual interrupt handler directly,
> > > + * so we just loop here for ~400 ms waiting for a timer interrupt.
> >
> > Where do the 400 ms come from? I only see that it loops a long time,
> > but that the timing is entirely dependent on how fast the CPU executes
> > the loop.
> >
> > > + * We set kcov_mode to enable tracing, but don't setup the area,
> > > + * so any attempt to trace will crash.
> > > + */
> > > + current->kcov_mode = KCOV_MODE_TRACE_PC;
> > > + for (i = 0; i < (1 << 28); i++)
> > > + ;
> >
> > Can't you check jiffies, and e.g. check that actual ~100-500ms have elapsed?
> >
> > timeout = jiffies + msecs_to_jiffies(300);
> > while (!time_after(jiffies, timeout)) {
> > cpu_relax();
> > }
>
> We can't call any functions. If anything is instrumented, the kernel crashes.
>
> But just reading jiffies should be fine, so we can do:
>
> unsigned long start = jiffies;
> while ((jiffies - start) * MSEC_PER_SEC / HZ < 500)
> ;

I'm quite sure that those helpers are macros, but who knows if that
will ever change.

The above open-coded version looks reasonable.

2024-06-11 08:04:18

by Dmitry Vyukov

[permalink] [raw]

Subject: Re: [PATCH 2/4] kcov: add interrupt handling self test

On Wed, 5 Jun 2024 at 11:33, Marco Elver <[email protected]> wrote:
>
> On Wed, 5 Jun 2024 at 11:18, Dmitry Vyukov <[email protected]> wrote:
> >
> > On Wed, 5 Jun 2024 at 11:10, Marco Elver <[email protected]> wrote:
> > >
> > > > Add a boot self test that can catch sprious coverage from interrupts.
> > > > The coverage callback filters out interrupt code, but only after the
> > > > handler updates preempt count. Some code periodically leaks out
> > > > of that section and leads to spurious coverage.
> > > > Add a best-effort (but simple) test that is likely to catch such bugs.
> > > > If the test is enabled on CI systems that use KCOV, they should catch
> > > > any issues fast.
> > > >
> > > > Signed-off-by: Dmitry Vyukov <[email protected]>
> > > > Cc: [email protected]
> > > > Cc: [email protected]
> > > > Cc: [email protected]
> > > >
> > > > ---
> > > >
> > > > In my local testing w/o the previous fix,
> > > > it immidiatly produced the following splat:
> > > >
> > > > kcov: running selftest
> > > > BUG: TASK stack guard page was hit at ffffc90000147ff8
> > > > Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN PTI
> > > > ...
> > > > kvm_set_cpu_l1tf_flush_l1d+0x5/0x20
> > > > sysvec_call_function+0x15/0xb0
> > > > asm_sysvec_call_function+0x1a/0x20
> > > > kcov_init+0xe4/0x130
> > > > do_one_initcall+0xbc/0x470
> > > > kernel_init_freeable+0x4fc/0x930
> > > > kernel_init+0x1c/0x2b0
> > > > ---
> > > > kernel/kcov.c | 28 ++++++++++++++++++++++++++++
> > > > lib/Kconfig.debug | 9 +++++++++
> > > > 2 files changed, 37 insertions(+)
> > > >
> > > > diff --git a/kernel/kcov.c b/kernel/kcov.c
> > > > index c3124f6d5536..04136f80042f 100644
> > > > --- a/kernel/kcov.c
> > > > +++ b/kernel/kcov.c
> > > > @@ -1057,6 +1057,30 @@ u64 kcov_common_handle(void)
> > > > }
> > > > EXPORT_SYMBOL(kcov_common_handle);
> > > >
> > > > +#ifdef CONFIG_KCOV_TEST
> > > > +static void __init selftest(void)
> > > > +{
> > > > + volatile int i;
> > > > +
> > > > + pr_err("running self test\n");
> > > > + /*
> > > > + * Test that interrupts don't produce spurious coverage.
> > > > + * The coverage callback filters out interrupt code, but only
> > > > + * after the handler updates preempt count. Some code periodically
> > > > + * leaks out of that section and leads to spurious coverage.
> > > > + * It's hard to call the actual interrupt handler directly,
> > > > + * so we just loop here for ~400 ms waiting for a timer interrupt.
> > >
> > > Where do the 400 ms come from? I only see that it loops a long time,
> > > but that the timing is entirely dependent on how fast the CPU executes
> > > the loop.
> > >
> > > > + * We set kcov_mode to enable tracing, but don't setup the area,
> > > > + * so any attempt to trace will crash.
> > > > + */
> > > > + current->kcov_mode = KCOV_MODE_TRACE_PC;
> > > > + for (i = 0; i < (1 << 28); i++)
> > > > + ;
> > >
> > > Can't you check jiffies, and e.g. check that actual ~100-500ms have elapsed?
> > >
> > > timeout = jiffies + msecs_to_jiffies(300);
> > > while (!time_after(jiffies, timeout)) {
> > > cpu_relax();
> > > }
> >
> > We can't call any functions. If anything is instrumented, the kernel crashes.
> >
> > But just reading jiffies should be fine, so we can do:
> >
> > unsigned long start = jiffies;
> > while ((jiffies - start) * MSEC_PER_SEC / HZ < 500)
> > ;
>
> I'm quite sure that those helpers are macros, but who knows if that
> will ever change.
>
> The above open-coded version looks reasonable.

Sent v2 with fixes, PTAL.
https://lore.kernel.org/all/[email protected]/T/#t