2024-06-11 08:16:32

by Dmitry Vyukov

[permalink] [raw]
Subject: [PATCH v2 0/4] KCOV fixes

Fix spurious KCOV coverage from interrupts and add a test.
Ignore some additional files that lead to large amounts
of uninteresting coverage.
As a reference point, tracing a simple open system call
produces ~10K PCs with these changes instead of ~45K PCs.

Dmitry Vyukov (4):
x86/entry: Remove unwanted instrumentation in common_interrupt()
kcov: add interrupt handling self test
module: Fix KCOV-ignored file name
x86: Ignore stack unwinding in KCOV

arch/x86/include/asm/hardirq.h | 8 ++++++--
arch/x86/include/asm/idtentry.h | 6 +++---
arch/x86/kernel/Makefile | 8 ++++++++
kernel/kcov.c | 31 +++++++++++++++++++++++++++++++
kernel/module/Makefile | 2 +-
lib/Kconfig.debug | 8 ++++++++
6 files changed, 57 insertions(+), 6 deletions(-)


base-commit: 83a7eefedc9b56fe7bfeff13b6c7356688ffa670
--
2.45.2.505.gda0bf45e8d-goog



2024-06-11 08:17:44

by Dmitry Vyukov

[permalink] [raw]
Subject: [PATCH v2 2/4] kcov: add interrupt handling self test

Add a boot self test that can catch sprious coverage from interrupts.
The coverage callback filters out interrupt code, but only after the
handler updates preempt count. Some code periodically leaks out
of that section and leads to spurious coverage.
Add a best-effort (but simple) test that is likely to catch such bugs.
If the test is enabled on CI systems that use KCOV, they should catch
any issues fast.

Signed-off-by: Dmitry Vyukov <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

---

Changed since v1:
- renamed KCOV_TEST to KCOV_SELFTEST
- improved the config description
- loop for exactly 300ms in the test

In my local testing w/o the previous fix,
it immidiatly produced the following splat:

kcov: running selftest
BUG: TASK stack guard page was hit at ffffc90000147ff8
Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN PTI
...
kvm_set_cpu_l1tf_flush_l1d+0x5/0x20
sysvec_call_function+0x15/0xb0
asm_sysvec_call_function+0x1a/0x20
kcov_init+0xe4/0x130
do_one_initcall+0xbc/0x470
kernel_init_freeable+0x4fc/0x930
kernel_init+0x1c/0x2b0
---
kernel/kcov.c | 31 +++++++++++++++++++++++++++++++
lib/Kconfig.debug | 8 ++++++++
2 files changed, 39 insertions(+)

diff --git a/kernel/kcov.c b/kernel/kcov.c
index c3124f6d5536..72a5bf55107f 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -11,6 +11,7 @@
#include <linux/fs.h>
#include <linux/hashtable.h>
#include <linux/init.h>
+#include <linux/jiffies.h>
#include <linux/kmsan-checks.h>
#include <linux/mm.h>
#include <linux/preempt.h>
@@ -1057,6 +1058,32 @@ u64 kcov_common_handle(void)
}
EXPORT_SYMBOL(kcov_common_handle);

+#ifdef CONFIG_KCOV_SELFTEST
+static void __init selftest(void)
+{
+ unsigned long start;
+
+ pr_err("running self test\n");
+ /*
+ * Test that interrupts don't produce spurious coverage.
+ * The coverage callback filters out interrupt code, but only
+ * after the handler updates preempt count. Some code periodically
+ * leaks out of that section and leads to spurious coverage.
+ * It's hard to call the actual interrupt handler directly,
+ * so we just loop here for a bit waiting for a timer interrupt.
+ * We set kcov_mode to enable tracing, but don't setup the area,
+ * so any attempt to trace will crash. Note: we must not call any
+ * potentially traced functions in this region.
+ */
+ start = jiffies;
+ current->kcov_mode = KCOV_MODE_TRACE_PC;
+ while ((jiffies - start) * MSEC_PER_SEC / HZ < 300)
+ ;
+ current->kcov_mode = 0;
+ pr_err("done running self test\n");
+}
+#endif
+
static int __init kcov_init(void)
{
int cpu;
@@ -1076,6 +1103,10 @@ static int __init kcov_init(void)
*/
debugfs_create_file_unsafe("kcov", 0600, NULL, NULL, &kcov_fops);

+#ifdef CONFIG_KCOV_SELFTEST
+ selftest();
+#endif
+
return 0;
}

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 59b6765d86b8..695a437a52d9 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2171,6 +2171,14 @@ config KCOV_IRQ_AREA_SIZE
soft interrupts. This specifies the size of those areas in the
number of unsigned long words.

+config KCOV_SELFTEST
+ bool "Perform short selftests on boot"
+ depends on KCOV
+ help
+ Run short KCOV coverage collection selftests on boot.
+ On test failure, causes the kernel to panic. Recommended to be
+ enabled, ensuring critical functionality works as intended.
+
menuconfig RUNTIME_TESTING_MENU
bool "Runtime Testing"
default y
--
2.45.2.505.gda0bf45e8d-goog


2024-06-11 09:35:07

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] KCOV fixes

On Tue, 11 Jun 2024 at 09:50, Dmitry Vyukov <[email protected]> wrote:
>
> Fix spurious KCOV coverage from interrupts and add a test.
> Ignore some additional files that lead to large amounts
> of uninteresting coverage.
> As a reference point, tracing a simple open system call
> produces ~10K PCs with these changes instead of ~45K PCs.
>
> Dmitry Vyukov (4):
> x86/entry: Remove unwanted instrumentation in common_interrupt()
> kcov: add interrupt handling self test
> module: Fix KCOV-ignored file name
> x86: Ignore stack unwinding in KCOV
>
> arch/x86/include/asm/hardirq.h | 8 ++++++--
> arch/x86/include/asm/idtentry.h | 6 +++---
> arch/x86/kernel/Makefile | 8 ++++++++
> kernel/kcov.c | 31 +++++++++++++++++++++++++++++++
> kernel/module/Makefile | 2 +-
> lib/Kconfig.debug | 8 ++++++++
> 6 files changed, 57 insertions(+), 6 deletions(-)
>
>
> base-commit: 83a7eefedc9b56fe7bfeff13b6c7356688ffa670
> --
> 2.45.2.505.gda0bf45e8d-goog

Thomas, Ingo, Borislav, Dave,

Can you take this via x86 tree please?

2024-06-11 09:55:51

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v2 2/4] kcov: add interrupt handling self test

On Tue, 11 Jun 2024 at 09:50, Dmitry Vyukov <[email protected]> wrote:
>
> Add a boot self test that can catch sprious coverage from interrupts.
> The coverage callback filters out interrupt code, but only after the
> handler updates preempt count. Some code periodically leaks out
> of that section and leads to spurious coverage.
> Add a best-effort (but simple) test that is likely to catch such bugs.
> If the test is enabled on CI systems that use KCOV, they should catch
> any issues fast.
>
> Signed-off-by: Dmitry Vyukov <[email protected]>
> Reviewed-by: Alexander Potapenko <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]

Reviewed-by: Marco Elver <[email protected]>

> ---
>
> Changed since v1:
> - renamed KCOV_TEST to KCOV_SELFTEST
> - improved the config description
> - loop for exactly 300ms in the test
>
> In my local testing w/o the previous fix,
> it immidiatly produced the following splat:
>
> kcov: running selftest
> BUG: TASK stack guard page was hit at ffffc90000147ff8
> Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN PTI
> ...
> kvm_set_cpu_l1tf_flush_l1d+0x5/0x20
> sysvec_call_function+0x15/0xb0
> asm_sysvec_call_function+0x1a/0x20
> kcov_init+0xe4/0x130
> do_one_initcall+0xbc/0x470
> kernel_init_freeable+0x4fc/0x930
> kernel_init+0x1c/0x2b0
> ---
> kernel/kcov.c | 31 +++++++++++++++++++++++++++++++
> lib/Kconfig.debug | 8 ++++++++
> 2 files changed, 39 insertions(+)
>
> diff --git a/kernel/kcov.c b/kernel/kcov.c
> index c3124f6d5536..72a5bf55107f 100644
> --- a/kernel/kcov.c
> +++ b/kernel/kcov.c
> @@ -11,6 +11,7 @@
> #include <linux/fs.h>
> #include <linux/hashtable.h>
> #include <linux/init.h>
> +#include <linux/jiffies.h>
> #include <linux/kmsan-checks.h>
> #include <linux/mm.h>
> #include <linux/preempt.h>
> @@ -1057,6 +1058,32 @@ u64 kcov_common_handle(void)
> }
> EXPORT_SYMBOL(kcov_common_handle);
>
> +#ifdef CONFIG_KCOV_SELFTEST
> +static void __init selftest(void)
> +{
> + unsigned long start;
> +
> + pr_err("running self test\n");
> + /*
> + * Test that interrupts don't produce spurious coverage.
> + * The coverage callback filters out interrupt code, but only
> + * after the handler updates preempt count. Some code periodically
> + * leaks out of that section and leads to spurious coverage.
> + * It's hard to call the actual interrupt handler directly,
> + * so we just loop here for a bit waiting for a timer interrupt.
> + * We set kcov_mode to enable tracing, but don't setup the area,
> + * so any attempt to trace will crash. Note: we must not call any
> + * potentially traced functions in this region.
> + */
> + start = jiffies;
> + current->kcov_mode = KCOV_MODE_TRACE_PC;
> + while ((jiffies - start) * MSEC_PER_SEC / HZ < 300)
> + ;
> + current->kcov_mode = 0;
> + pr_err("done running self test\n");
> +}
> +#endif
> +
> static int __init kcov_init(void)
> {
> int cpu;
> @@ -1076,6 +1103,10 @@ static int __init kcov_init(void)
> */
> debugfs_create_file_unsafe("kcov", 0600, NULL, NULL, &kcov_fops);
>
> +#ifdef CONFIG_KCOV_SELFTEST
> + selftest();
> +#endif
> +
> return 0;
> }
>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 59b6765d86b8..695a437a52d9 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2171,6 +2171,14 @@ config KCOV_IRQ_AREA_SIZE
> soft interrupts. This specifies the size of those areas in the
> number of unsigned long words.
>
> +config KCOV_SELFTEST
> + bool "Perform short selftests on boot"
> + depends on KCOV
> + help
> + Run short KCOV coverage collection selftests on boot.
> + On test failure, causes the kernel to panic. Recommended to be
> + enabled, ensuring critical functionality works as intended.
> +
> menuconfig RUNTIME_TESTING_MENU
> bool "Runtime Testing"
> default y
> --
> 2.45.2.505.gda0bf45e8d-goog
>

2024-06-13 23:01:39

by Andrey Konovalov

[permalink] [raw]
Subject: Re: [PATCH v2 2/4] kcov: add interrupt handling self test

On Tue, Jun 11, 2024 at 9:50 AM 'Dmitry Vyukov' via syzkaller
<[email protected]> wrote:
>
> Add a boot self test that can catch sprious coverage from interrupts.
> The coverage callback filters out interrupt code, but only after the
> handler updates preempt count. Some code periodically leaks out
> of that section and leads to spurious coverage.
> Add a best-effort (but simple) test that is likely to catch such bugs.
> If the test is enabled on CI systems that use KCOV, they should catch
> any issues fast.
>
> Signed-off-by: Dmitry Vyukov <[email protected]>
> Reviewed-by: Alexander Potapenko <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
>
> ---
>
> Changed since v1:
> - renamed KCOV_TEST to KCOV_SELFTEST
> - improved the config description
> - loop for exactly 300ms in the test
>
> In my local testing w/o the previous fix,
> it immidiatly produced the following splat:
>
> kcov: running selftest
> BUG: TASK stack guard page was hit at ffffc90000147ff8
> Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN PTI
> ...
> kvm_set_cpu_l1tf_flush_l1d+0x5/0x20
> sysvec_call_function+0x15/0xb0
> asm_sysvec_call_function+0x1a/0x20
> kcov_init+0xe4/0x130
> do_one_initcall+0xbc/0x470
> kernel_init_freeable+0x4fc/0x930
> kernel_init+0x1c/0x2b0
> ---
> kernel/kcov.c | 31 +++++++++++++++++++++++++++++++
> lib/Kconfig.debug | 8 ++++++++
> 2 files changed, 39 insertions(+)
>
> diff --git a/kernel/kcov.c b/kernel/kcov.c
> index c3124f6d5536..72a5bf55107f 100644
> --- a/kernel/kcov.c
> +++ b/kernel/kcov.c
> @@ -11,6 +11,7 @@
> #include <linux/fs.h>
> #include <linux/hashtable.h>
> #include <linux/init.h>
> +#include <linux/jiffies.h>
> #include <linux/kmsan-checks.h>
> #include <linux/mm.h>
> #include <linux/preempt.h>
> @@ -1057,6 +1058,32 @@ u64 kcov_common_handle(void)
> }
> EXPORT_SYMBOL(kcov_common_handle);
>
> +#ifdef CONFIG_KCOV_SELFTEST
> +static void __init selftest(void)
> +{
> + unsigned long start;
> +
> + pr_err("running self test\n");
> + /*
> + * Test that interrupts don't produce spurious coverage.
> + * The coverage callback filters out interrupt code, but only
> + * after the handler updates preempt count. Some code periodically
> + * leaks out of that section and leads to spurious coverage.
> + * It's hard to call the actual interrupt handler directly,
> + * so we just loop here for a bit waiting for a timer interrupt.
> + * We set kcov_mode to enable tracing, but don't setup the area,
> + * so any attempt to trace will crash. Note: we must not call any
> + * potentially traced functions in this region.
> + */
> + start = jiffies;
> + current->kcov_mode = KCOV_MODE_TRACE_PC;
> + while ((jiffies - start) * MSEC_PER_SEC / HZ < 300)
> + ;
> + current->kcov_mode = 0;
> + pr_err("done running self test\n");
> +}
> +#endif
> +
> static int __init kcov_init(void)
> {
> int cpu;
> @@ -1076,6 +1103,10 @@ static int __init kcov_init(void)
> */
> debugfs_create_file_unsafe("kcov", 0600, NULL, NULL, &kcov_fops);
>
> +#ifdef CONFIG_KCOV_SELFTEST
> + selftest();
> +#endif
> +
> return 0;
> }
>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 59b6765d86b8..695a437a52d9 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2171,6 +2171,14 @@ config KCOV_IRQ_AREA_SIZE
> soft interrupts. This specifies the size of those areas in the
> number of unsigned long words.
>
> +config KCOV_SELFTEST
> + bool "Perform short selftests on boot"
> + depends on KCOV
> + help
> + Run short KCOV coverage collection selftests on boot.
> + On test failure, causes the kernel to panic. Recommended to be

Nit: "causes the kernel to panic" => "causes a kernel panic" or "panic
the kernel"

> + enabled, ensuring critical functionality works as intended.
> +
> menuconfig RUNTIME_TESTING_MENU
> bool "Runtime Testing"
> default y
> --
> 2.45.2.505.gda0bf45e8d-goog

Reviewed-by: Andrey Konovalov <[email protected]>