2023-02-02 05:28:06

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 00/16] Backport oops_limit to 4.19

This series backports the patchset
"exit: Put an upper limit on how often we can oops"
(https://lore.kernel.org/linux-mm/[email protected]/T/#u)
to 4.19, as recommended at
https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-in-linux.html
This follows the backports to 5.10 and 5.15 which already released, and
the backport to 5.4 that I just sent out too.

This required backporting various prerequisite patches.

I've tested that oops_limit and warn_limit work correctly on x86_64.

David Gow (1):
mm: kasan: do not panic if both panic_on_warn and kasan_multishot set

Eric W. Biederman (2):
exit: Add and use make_task_dead.
objtool: Add a missing comma to avoid string concatenation

Jann Horn (1):
exit: Put an upper limit on how often we can oops

Kees Cook (7):
exit: Expose "oops_count" to sysfs
exit: Allow oops_limit to be disabled
panic: Consolidate open-coded panic_on_warn checks
panic: Introduce warn_limit
panic: Expose "warn_count" to sysfs
docs: Fix path paste-o for /sys/kernel/warn_count
exit: Use READ_ONCE() for all oops/warn limit reads

Nathan Chancellor (2):
hexagon: Fix function name in die()
h8300: Fix build errors from do_exit() to make_task_dead() transition

Randy Dunlap (1):
ia64: make IA64_MCA_RECOVERY bool instead of tristate

Tiezhu Yang (1):
panic: unset panic_on_warn inside panic()

Xiaoming Ni (1):
sysctl: add a new register_sysctl_init() interface

.../ABI/testing/sysfs-kernel-oops_count | 6 ++
.../ABI/testing/sysfs-kernel-warn_count | 6 ++
Documentation/sysctl/kernel.txt | 20 +++++
arch/alpha/kernel/traps.c | 6 +-
arch/alpha/mm/fault.c | 2 +-
arch/arm/kernel/traps.c | 2 +-
arch/arm/mm/fault.c | 2 +-
arch/arm64/kernel/traps.c | 2 +-
arch/arm64/mm/fault.c | 2 +-
arch/h8300/kernel/traps.c | 3 +-
arch/h8300/mm/fault.c | 2 +-
arch/hexagon/kernel/traps.c | 2 +-
arch/ia64/Kconfig | 2 +-
arch/ia64/kernel/mca_drv.c | 2 +-
arch/ia64/kernel/traps.c | 2 +-
arch/ia64/mm/fault.c | 2 +-
arch/m68k/kernel/traps.c | 2 +-
arch/m68k/mm/fault.c | 2 +-
arch/microblaze/kernel/exceptions.c | 4 +-
arch/mips/kernel/traps.c | 2 +-
arch/nds32/kernel/traps.c | 8 +-
arch/nios2/kernel/traps.c | 4 +-
arch/openrisc/kernel/traps.c | 2 +-
arch/parisc/kernel/traps.c | 2 +-
arch/powerpc/kernel/traps.c | 2 +-
arch/riscv/kernel/traps.c | 2 +-
arch/riscv/mm/fault.c | 2 +-
arch/s390/kernel/dumpstack.c | 2 +-
arch/s390/kernel/nmi.c | 2 +-
arch/sh/kernel/traps.c | 2 +-
arch/sparc/kernel/traps_32.c | 4 +-
arch/sparc/kernel/traps_64.c | 4 +-
arch/x86/entry/entry_32.S | 6 +-
arch/x86/entry/entry_64.S | 6 +-
arch/x86/kernel/dumpstack.c | 4 +-
arch/xtensa/kernel/traps.c | 2 +-
fs/proc/proc_sysctl.c | 33 ++++++++
include/linux/kernel.h | 1 +
include/linux/sched/task.h | 1 +
include/linux/sysctl.h | 3 +
kernel/exit.c | 72 ++++++++++++++++++
kernel/panic.c | 75 ++++++++++++++++---
kernel/sched/core.c | 3 +-
mm/kasan/report.c | 4 +-
tools/objtool/check.c | 3 +-
45 files changed, 258 insertions(+), 64 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-kernel-oops_count
create mode 100644 Documentation/ABI/testing/sysfs-kernel-warn_count


base-commit: b17faf2c4e88ac0deb894f068bda67ace57e9c0a
--
2.39.1



2023-02-02 05:28:12

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 05/16] objtool: Add a missing comma to avoid string concatenation

From: "Eric W. Biederman" <[email protected]>

commit 1fb466dff904e4a72282af336f2c355f011eec61 upstream.

Recently the kbuild robot reported two new errors:

>> lib/kunit/kunit-example-test.o: warning: objtool: .text.unlikely: unexpected end of section
>> arch/x86/kernel/dumpstack.o: warning: objtool: oops_end() falls through to next function show_opcodes()

I don't know why they did not occur in my test setup but after digging
it I realized I had accidentally dropped a comma in
tools/objtool/check.c when I renamed rewind_stack_do_exit to
rewind_stack_and_make_dead.

Add that comma back to fix objtool errors.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0e25498f8cd4 ("exit: Add and use make_task_dead.")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
tools/objtool/check.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 0477190557107..55f3e7fa1c5d0 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -168,7 +168,7 @@ static int __dead_end_function(struct objtool_file *file, struct symbol *func,
"fortify_panic",
"usercopy_abort",
"machine_real_restart",
- "rewind_stack_and_make_dead"
+ "rewind_stack_and_make_dead",
};

if (func->bind == STB_WEAK)
--
2.39.1


2023-02-02 05:28:16

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 07/16] h8300: Fix build errors from do_exit() to make_task_dead() transition

From: Nathan Chancellor <[email protected]>

commit ab4ababdf77ccc56c7301c751dff49c79709c51c upstream.

When building ARCH=h8300 defconfig:

arch/h8300/kernel/traps.c: In function 'die':
arch/h8300/kernel/traps.c:109:2: error: implicit declaration of function
'make_dead_task' [-Werror=implicit-function-declaration]
109 | make_dead_task(SIGSEGV);
| ^~~~~~~~~~~~~~

arch/h8300/mm/fault.c: In function 'do_page_fault':
arch/h8300/mm/fault.c:54:2: error: implicit declaration of function
'make_dead_task' [-Werror=implicit-function-declaration]
54 | make_dead_task(SIGKILL);
| ^~~~~~~~~~~~~~

The function's name is make_task_dead(), change it so there is no more
build error.

Additionally, include linux/sched/task.h in arch/h8300/kernel/traps.c
to avoid the same error because do_exit()'s declaration is in kernel.h
but make_task_dead()'s is in task.h, which is not included in traps.c.

Fixes: 0e25498f8cd4 ("exit: Add and use make_task_dead.")
Signed-off-by: Nathan Chancellor <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/h8300/kernel/traps.c | 3 ++-
arch/h8300/mm/fault.c | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/h8300/kernel/traps.c b/arch/h8300/kernel/traps.c
index a284c126f07a6..090adaee4b84c 100644
--- a/arch/h8300/kernel/traps.c
+++ b/arch/h8300/kernel/traps.c
@@ -17,6 +17,7 @@
#include <linux/types.h>
#include <linux/sched.h>
#include <linux/sched/debug.h>
+#include <linux/sched/task.h>
#include <linux/mm_types.h>
#include <linux/kernel.h>
#include <linux/errno.h>
@@ -110,7 +111,7 @@ void die(const char *str, struct pt_regs *fp, unsigned long err)
dump(fp);

spin_unlock_irq(&die_lock);
- make_dead_task(SIGSEGV);
+ make_task_dead(SIGSEGV);
}

static int kstack_depth_to_print = 24;
diff --git a/arch/h8300/mm/fault.c b/arch/h8300/mm/fault.c
index a8d8fc63780e4..573825c3cb708 100644
--- a/arch/h8300/mm/fault.c
+++ b/arch/h8300/mm/fault.c
@@ -52,7 +52,7 @@ asmlinkage int do_page_fault(struct pt_regs *regs, unsigned long address,
printk(" at virtual address %08lx\n", address);
if (!user_mode(regs))
die("Oops", regs, error_code);
- make_dead_task(SIGKILL);
+ make_task_dead(SIGKILL);

return 1;
}
--
2.39.1


2023-02-02 05:28:20

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 02/16] panic: unset panic_on_warn inside panic()

From: Tiezhu Yang <[email protected]>

commit 1a2383e8b84c0451fd9b1eec3b9aab16f30b597c upstream.

In the current code, the following three places need to unset
panic_on_warn before calling panic() to avoid recursive panics:

kernel/kcsan/report.c: print_report()
kernel/sched/core.c: __schedule_bug()
mm/kfence/report.c: kfence_report_error()

In order to avoid copy-pasting "panic_on_warn = 0" all over the places,
it is better to move it inside panic() and then remove it from the other
places.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tiezhu Yang <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Xuefeng Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
kernel/panic.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 8138a676fb7d1..a078d413042f2 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -142,6 +142,16 @@ void panic(const char *fmt, ...)
int old_cpu, this_cpu;
bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;

+ if (panic_on_warn) {
+ /*
+ * This thread may hit another WARN() in the panic path.
+ * Resetting this prevents additional WARN() from panicking the
+ * system on this thread. Other threads are blocked by the
+ * panic_mutex in panic().
+ */
+ panic_on_warn = 0;
+ }
+
/*
* Disable local interrupts. This will prevent panic_smp_self_stop
* from deadlocking the first cpu that invokes the panic, since
@@ -530,16 +540,8 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
if (args)
vprintk(args->fmt, args->args);

- if (panic_on_warn) {
- /*
- * This thread may hit another WARN() in the panic path.
- * Resetting this prevents additional WARN() from panicking the
- * system on this thread. Other threads are blocked by the
- * panic_mutex in panic().
- */
- panic_on_warn = 0;
+ if (panic_on_warn)
panic("panic_on_warn set ...\n");
- }

print_modules();

--
2.39.1


2023-02-02 05:28:25

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 01/16] sysctl: add a new register_sysctl_init() interface

From: Xiaoming Ni <[email protected]>

commit 3ddd9a808cee7284931312f2f3e854c9617f44b2 upstream.

Patch series "sysctl: first set of kernel/sysctl cleanups", v2.

Finally had time to respin the series of the work we had started last
year on cleaning up the kernel/sysct.c kitchen sink. People keeps
stuffing their sysctls in that file and this creates a maintenance
burden. So this effort is aimed at placing sysctls where they actually
belong.

I'm going to split patches up into series as there is quite a bit of
work.

This first set adds register_sysctl_init() for uses of registerting a
sysctl on the init path, adds const where missing to a few places,
generalizes common values so to be more easy to share, and starts the
move of a few kernel/sysctl.c out where they belong.

The majority of rework on v2 in this first patch set is 0-day fixes.
Eric Biederman's feedback is later addressed in subsequent patch sets.

I'll only post the first two patch sets for now. We can address the
rest once the first two patch sets get completely reviewed / Acked.

This patch (of 9):

The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

Today though folks heavily rely on tables on kernel/sysctl.c so they can
easily just extend this table with their needed sysctls. In order to
help users move their sysctls out we need to provide a helper which can
be used during code initialization.

We special-case the initialization use of register_sysctl() since it
*is* safe to fail, given all that sysctls do is provide a dynamic
interface to query or modify at runtime an existing variable. So the
use case of register_sysctl() on init should *not* stop if the sysctls
don't end up getting registered. It would be counter productive to stop
boot if a simple sysctl registration failed.

Provide a helper for init then, and document the recommended init levels
to use for callers of this routine. We will later use this in
subsequent patches to start slimming down kernel/sysctl.c tables and
moving sysctl registration to the code which actually needs these
sysctls.

[[email protected]: major commit log and documentation rephrasing also moved to fs/proc/proc_sysctl.c ]

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiaoming Ni <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Sebastian Reichel <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Qing Wang <[email protected]>
Cc: Benjamin LaHaise <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Amir Goldstein <[email protected]>
Cc: Stephen Kitt <[email protected]>
Cc: Antti Palosaari <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Clemens Ladisch <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Julia Lawall <[email protected]>
Cc: Lukas Middendorf <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Phillip Potter <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Douglas Gilbert <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
fs/proc/proc_sysctl.c | 33 +++++++++++++++++++++++++++++++++
include/linux/sysctl.h | 3 +++
2 files changed, 36 insertions(+)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index c95f32b83a942..7c62a526506c1 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -13,6 +13,7 @@
#include <linux/namei.h>
#include <linux/mm.h>
#include <linux/module.h>
+#include <linux/kmemleak.h>
#include "internal.h"

static const struct dentry_operations proc_sys_dentry_operations;
@@ -1376,6 +1377,38 @@ struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *tab
}
EXPORT_SYMBOL(register_sysctl);

+/**
+ * __register_sysctl_init() - register sysctl table to path
+ * @path: path name for sysctl base
+ * @table: This is the sysctl table that needs to be registered to the path
+ * @table_name: The name of sysctl table, only used for log printing when
+ * registration fails
+ *
+ * The sysctl interface is used by userspace to query or modify at runtime
+ * a predefined value set on a variable. These variables however have default
+ * values pre-set. Code which depends on these variables will always work even
+ * if register_sysctl() fails. If register_sysctl() fails you'd just loose the
+ * ability to query or modify the sysctls dynamically at run time. Chances of
+ * register_sysctl() failing on init are extremely low, and so for both reasons
+ * this function does not return any error as it is used by initialization code.
+ *
+ * Context: Can only be called after your respective sysctl base path has been
+ * registered. So for instance, most base directories are registered early on
+ * init before init levels are processed through proc_sys_init() and
+ * sysctl_init().
+ */
+void __init __register_sysctl_init(const char *path, struct ctl_table *table,
+ const char *table_name)
+{
+ struct ctl_table_header *hdr = register_sysctl(path, table);
+
+ if (unlikely(!hdr)) {
+ pr_err("failed when register_sysctl %s to %s\n", table_name, path);
+ return;
+ }
+ kmemleak_not_leak(hdr);
+}
+
static char *append_path(const char *path, char *pos, const char *name)
{
int namelen;
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index b769ecfcc3bd4..0a980aecc8f02 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -198,6 +198,9 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
void unregister_sysctl_table(struct ctl_table_header * table);

extern int sysctl_init(void);
+extern void __register_sysctl_init(const char *path, struct ctl_table *table,
+ const char *table_name);
+#define register_sysctl_init(path, table) __register_sysctl_init(path, table, #table)

extern struct ctl_table sysctl_mount_point[];

--
2.39.1


2023-02-02 05:28:29

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 06/16] hexagon: Fix function name in die()

From: Nathan Chancellor <[email protected]>

commit 4f0712ccec09c071e221242a2db9a6779a55a949 upstream.

When building ARCH=hexagon defconfig:

arch/hexagon/kernel/traps.c:217:2: error: implicit declaration of
function 'make_dead_task' [-Werror,-Wimplicit-function-declaration]
make_dead_task(err);
^

The function's name is make_task_dead(), change it so there is no more
build error.

Fixes: 0e25498f8cd4 ("exit: Add and use make_task_dead.")
Signed-off-by: Nathan Chancellor <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/hexagon/kernel/traps.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/hexagon/kernel/traps.c b/arch/hexagon/kernel/traps.c
index ac8154992f3eb..34a74f73f1690 100644
--- a/arch/hexagon/kernel/traps.c
+++ b/arch/hexagon/kernel/traps.c
@@ -234,7 +234,7 @@ int die(const char *str, struct pt_regs *regs, long err)
panic("Fatal exception");

oops_exit();
- make_dead_task(err);
+ make_task_dead(err);
return 0;
}

--
2.39.1


2023-02-02 05:28:33

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 03/16] mm: kasan: do not panic if both panic_on_warn and kasan_multishot set

From: David Gow <[email protected]>

commit be4f1ae978ffe98cc95ec49ceb95386fb4474974 upstream.

KASAN errors will currently trigger a panic when panic_on_warn is set.
This renders kasan_multishot useless, as further KASAN errors won't be
reported if the kernel has already paniced. By making kasan_multishot
disable this behaviour for KASAN errors, we can still have the benefits of
panic_on_warn for non-KASAN warnings, yet be able to use kasan_multishot.

This is particularly important when running KASAN tests, which need to
trigger multiple KASAN errors: previously these would panic the system if
panic_on_warn was set, now they can run (and will panic the system should
non-KASAN warnings show up).

Signed-off-by: David Gow <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Reviewed-by: Brendan Higgins <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Patricia Alfonso <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vincent Guittot <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
mm/kasan/report.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 5c169aa688fde..90fdb261a5e2d 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -176,7 +176,7 @@ static void kasan_end_report(unsigned long *flags)
pr_err("==================================================================\n");
add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
spin_unlock_irqrestore(&report_lock, *flags);
- if (panic_on_warn)
+ if (panic_on_warn && !test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags))
panic("panic_on_warn set ...\n");
kasan_enable_current();
}
--
2.39.1


2023-02-02 05:28:39

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 04/16] exit: Add and use make_task_dead.

From: "Eric W. Biederman" <[email protected]>

commit 0e25498f8cd43c1b5aa327f373dd094e9a006da7 upstream.

There are two big uses of do_exit. The first is it's design use to be
the guts of the exit(2) system call. The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.

Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure. In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.

Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.

As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.

Signed-off-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/alpha/kernel/traps.c | 6 +++---
arch/alpha/mm/fault.c | 2 +-
arch/arm/kernel/traps.c | 2 +-
arch/arm/mm/fault.c | 2 +-
arch/arm64/kernel/traps.c | 2 +-
arch/arm64/mm/fault.c | 2 +-
arch/h8300/kernel/traps.c | 2 +-
arch/h8300/mm/fault.c | 2 +-
arch/hexagon/kernel/traps.c | 2 +-
arch/ia64/kernel/mca_drv.c | 2 +-
arch/ia64/kernel/traps.c | 2 +-
arch/ia64/mm/fault.c | 2 +-
arch/m68k/kernel/traps.c | 2 +-
arch/m68k/mm/fault.c | 2 +-
arch/microblaze/kernel/exceptions.c | 4 ++--
arch/mips/kernel/traps.c | 2 +-
arch/nds32/kernel/traps.c | 8 ++++----
arch/nios2/kernel/traps.c | 4 ++--
arch/openrisc/kernel/traps.c | 2 +-
arch/parisc/kernel/traps.c | 2 +-
arch/powerpc/kernel/traps.c | 2 +-
arch/riscv/kernel/traps.c | 2 +-
arch/riscv/mm/fault.c | 2 +-
arch/s390/kernel/dumpstack.c | 2 +-
arch/s390/kernel/nmi.c | 2 +-
arch/sh/kernel/traps.c | 2 +-
arch/sparc/kernel/traps_32.c | 4 +---
arch/sparc/kernel/traps_64.c | 4 +---
arch/x86/entry/entry_32.S | 6 +++---
arch/x86/entry/entry_64.S | 6 +++---
arch/x86/kernel/dumpstack.c | 4 ++--
arch/xtensa/kernel/traps.c | 2 +-
include/linux/sched/task.h | 1 +
kernel/exit.c | 9 +++++++++
tools/objtool/check.c | 3 ++-
35 files changed, 56 insertions(+), 49 deletions(-)

diff --git a/arch/alpha/kernel/traps.c b/arch/alpha/kernel/traps.c
index bc9627698796e..22f5c27b96b7c 100644
--- a/arch/alpha/kernel/traps.c
+++ b/arch/alpha/kernel/traps.c
@@ -192,7 +192,7 @@ die_if_kernel(char * str, struct pt_regs *regs, long err, unsigned long *r9_15)
local_irq_enable();
while (1);
}
- do_exit(SIGSEGV);
+ make_task_dead(SIGSEGV);
}

#ifndef CONFIG_MATHEMU
@@ -577,7 +577,7 @@ do_entUna(void * va, unsigned long opcode, unsigned long reg,

printk("Bad unaligned kernel access at %016lx: %p %lx %lu\n",
pc, va, opcode, reg);
- do_exit(SIGSEGV);
+ make_task_dead(SIGSEGV);

got_exception:
/* Ok, we caught the exception, but we don't want it. Is there
@@ -632,7 +632,7 @@ do_entUna(void * va, unsigned long opcode, unsigned long reg,
local_irq_enable();
while (1);
}
- do_exit(SIGSEGV);
+ make_task_dead(SIGSEGV);
}

/*
diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index 188fc9256baf1..6ce67b05412ec 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -206,7 +206,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
printk(KERN_ALERT "Unable to handle kernel paging request at "
"virtual address %016lx\n", address);
die_if_kernel("Oops", regs, cause, (unsigned long*)regs - 16);
- do_exit(SIGKILL);
+ make_task_dead(SIGKILL);

/* We ran out of memory, or some other thing happened to us that
made us unable to handle the page fault gracefully. */
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index d49fafd2b865f..02a4aea52fb05 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -344,7 +344,7 @@ static void oops_end(unsigned long flags, struct pt_regs *regs, int signr)
if (panic_on_oops)
panic("Fatal exception");
if (signr)
- do_exit(signr);
+ make_task_dead(signr);
}

/*
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index a9ee0d9dc740a..8457384139cb8 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -149,7 +149,7 @@ __do_kernel_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
show_pte(mm, addr);
die("Oops", regs, fsr);
bust_spinlocks(0);
- do_exit(SIGKILL);
+ make_task_dead(SIGKILL);
}

/*
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 965595fe68045..20f896b27a447 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -224,7 +224,7 @@ void die(const char *str, struct pt_regs *regs, int err)
raw_spin_unlock_irqrestore(&die_lock, flags);

if (ret != NOTIFY_STOP)
- do_exit(SIGSEGV);
+ make_task_dead(SIGSEGV);
}

static bool show_unhandled_signals_ratelimited(void)
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index b046006a387ff..0d2be8eb87ec8 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -268,7 +268,7 @@ static void die_kernel_fault(const char *msg, unsigned long addr,
show_pte(addr);
die("Oops", regs, esr);
bust_spinlocks(0);
- do_exit(SIGKILL);
+ make_task_dead(SIGKILL);
}

static void __do_kernel_fault(unsigned long addr, unsigned int esr,
diff --git a/arch/h8300/kernel/traps.c b/arch/h8300/kernel/traps.c
index e47a9e0dc278f..a284c126f07a6 100644
--- a/arch/h8300/kernel/traps.c
+++ b/arch/h8300/kernel/traps.c
@@ -110,7 +110,7 @@ void die(const char *str, struct pt_regs *fp, unsigned long err)
dump(fp);

spin_unlock_irq(&die_lock);
- do_exit(SIGSEGV);
+ make_dead_task(SIGSEGV);
}

static int kstack_depth_to_print = 24;
diff --git a/arch/h8300/mm/fault.c b/arch/h8300/mm/fault.c
index fabffb83930af..a8d8fc63780e4 100644
--- a/arch/h8300/mm/fault.c
+++ b/arch/h8300/mm/fault.c
@@ -52,7 +52,7 @@ asmlinkage int do_page_fault(struct pt_regs *regs, unsigned long address,
printk(" at virtual address %08lx\n", address);
if (!user_mode(regs))
die("Oops", regs, error_code);
- do_exit(SIGKILL);
+ make_dead_task(SIGKILL);

return 1;
}
diff --git a/arch/hexagon/kernel/traps.c b/arch/hexagon/kernel/traps.c
index 91ee04842c22c..ac8154992f3eb 100644
--- a/arch/hexagon/kernel/traps.c
+++ b/arch/hexagon/kernel/traps.c
@@ -234,7 +234,7 @@ int die(const char *str, struct pt_regs *regs, long err)
panic("Fatal exception");

oops_exit();
- do_exit(err);
+ make_dead_task(err);
return 0;
}

diff --git a/arch/ia64/kernel/mca_drv.c b/arch/ia64/kernel/mca_drv.c
index 06419a95af309..05885ad68cccd 100644
--- a/arch/ia64/kernel/mca_drv.c
+++ b/arch/ia64/kernel/mca_drv.c
@@ -176,7 +176,7 @@ mca_handler_bh(unsigned long paddr, void *iip, unsigned long ipsr)
spin_unlock(&mca_bh_lock);

/* This process is about to be killed itself */
- do_exit(SIGKILL);
+ make_task_dead(SIGKILL);
}

/**
diff --git a/arch/ia64/kernel/traps.c b/arch/ia64/kernel/traps.c
index c6f4932073a18..32c2877b3c0ad 100644
--- a/arch/ia64/kernel/traps.c
+++ b/arch/ia64/kernel/traps.c
@@ -85,7 +85,7 @@ die (const char *str, struct pt_regs *regs, long err)
if (panic_on_oops)
panic("Fatal exception");

- do_exit(SIGSEGV);
+ make_task_dead(SIGSEGV);
return 0;
}

diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index a9d55ad8d67be..0b072af7e20d8 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -302,7 +302,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
regs = NULL;
bust_spinlocks(0);
if (regs)
- do_exit(SIGKILL);
+ make_task_dead(SIGKILL);
return;

out_of_memory:
diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c
index b2fd000b92857..9b70a7f5e7058 100644
--- a/arch/m68k/kernel/traps.c
+++ b/arch/m68k/kernel/traps.c
@@ -1139,7 +1139,7 @@ void die_if_kernel (char *str, struct pt_regs *fp, int nr)
pr_crit("%s: %08x\n", str, nr);
show_registers(fp);
add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
- do_exit(SIGSEGV);
+ make_task_dead(SIGSEGV);
}

asmlinkage void set_esp0(unsigned long ssp)
diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index 9b6163c05a754..fc3aea6dcaa77 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -48,7 +48,7 @@ int send_fault_sig(struct pt_regs *regs)
pr_alert("Unable to handle kernel access");
pr_cont(" at virtual address %p\n", addr);
die_if_kernel("Oops", regs, 0 /*error_code*/);
- do_exit(SIGKILL);
+ make_task_dead(SIGKILL);
}

return 1;
diff --git a/arch/microblaze/kernel/exceptions.c b/arch/microblaze/kernel/exceptions.c
index eafff21fcb0e6..182402db6b043 100644
--- a/arch/microblaze/kernel/exceptions.c
+++ b/arch/microblaze/kernel/exceptions.c
@@ -44,10 +44,10 @@ void die(const char *str, struct pt_regs *fp, long err)
pr_warn("Oops: %s, sig: %ld\n", str, err);
show_regs(fp);
spin_unlock_irq(&die_lock);
- /* do_exit() should take care of panic'ing from an interrupt
+ /* make_task_dead() should take care of panic'ing from an interrupt
* context so we don't handle it here
*/
- do_exit(err);
+ make_task_dead(err);
}

/* for user application debugging */
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 0ca4185cc5e38..01c85d37c47ab 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -412,7 +412,7 @@ void __noreturn die(const char *str, struct pt_regs *regs)
if (regs && kexec_should_crash(current))
crash_kexec(regs);

- do_exit(sig);
+ make_task_dead(sig);
}

extern struct exception_table_entry __start___dbe_table[];
diff --git a/arch/nds32/kernel/traps.c b/arch/nds32/kernel/traps.c
index 1496aab489988..a811f286bc85c 100644
--- a/arch/nds32/kernel/traps.c
+++ b/arch/nds32/kernel/traps.c
@@ -183,7 +183,7 @@ void die(const char *str, struct pt_regs *regs, int err)

bust_spinlocks(0);
spin_unlock_irq(&die_lock);
- do_exit(SIGSEGV);
+ make_task_dead(SIGSEGV);
}

EXPORT_SYMBOL(die);
@@ -286,7 +286,7 @@ void unhandled_interruption(struct pt_regs *regs)
pr_emerg("unhandled_interruption\n");
show_regs(regs);
if (!user_mode(regs))
- do_exit(SIGKILL);
+ make_task_dead(SIGKILL);
force_sig(SIGKILL, current);
}

@@ -297,7 +297,7 @@ void unhandled_exceptions(unsigned long entry, unsigned long addr,
addr, type);
show_regs(regs);
if (!user_mode(regs))
- do_exit(SIGKILL);
+ make_task_dead(SIGKILL);
force_sig(SIGKILL, current);
}

@@ -324,7 +324,7 @@ void do_revinsn(struct pt_regs *regs)
pr_emerg("Reserved Instruction\n");
show_regs(regs);
if (!user_mode(regs))
- do_exit(SIGILL);
+ make_task_dead(SIGILL);
force_sig(SIGILL, current);
}

diff --git a/arch/nios2/kernel/traps.c b/arch/nios2/kernel/traps.c
index 3bc3cd22b750e..dc6c270a355b6 100644
--- a/arch/nios2/kernel/traps.c
+++ b/arch/nios2/kernel/traps.c
@@ -37,10 +37,10 @@ void die(const char *str, struct pt_regs *regs, long err)
show_regs(regs);
spin_unlock_irq(&die_lock);
/*
- * do_exit() should take care of panic'ing from an interrupt
+ * make_task_dead() should take care of panic'ing from an interrupt
* context so we don't handle it here
*/
- do_exit(err);
+ make_task_dead(err);
}

void _exception(int signo, struct pt_regs *regs, int code, unsigned long addr)
diff --git a/arch/openrisc/kernel/traps.c b/arch/openrisc/kernel/traps.c
index d8981cbb852a5..dfa3db1bccef4 100644
--- a/arch/openrisc/kernel/traps.c
+++ b/arch/openrisc/kernel/traps.c
@@ -224,7 +224,7 @@ void die(const char *str, struct pt_regs *regs, long err)
__asm__ __volatile__("l.nop 1");
do {} while (1);
#endif
- do_exit(SIGSEGV);
+ make_task_dead(SIGSEGV);
}

/* This is normally the 'Oops' routine */
diff --git a/arch/parisc/kernel/traps.c b/arch/parisc/kernel/traps.c
index d7a66d8525091..5d9cf726e4ecd 100644
--- a/arch/parisc/kernel/traps.c
+++ b/arch/parisc/kernel/traps.c
@@ -265,7 +265,7 @@ void die_if_kernel(char *str, struct pt_regs *regs, long err)
panic("Fatal exception");

oops_exit();
- do_exit(SIGSEGV);
+ make_task_dead(SIGSEGV);
}

/* gdb uses break 4,8 */
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 2379c4bf3979e..63c751ce130a5 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -251,7 +251,7 @@ static void oops_end(unsigned long flags, struct pt_regs *regs,
panic("Fatal exception in interrupt");
if (panic_on_oops)
panic("Fatal exception");
- do_exit(signr);
+ make_task_dead(signr);
}
NOKPROBE_SYMBOL(oops_end);

diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 7c65750508f25..9b736e616131d 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -64,7 +64,7 @@ void die(struct pt_regs *regs, const char *str)
if (panic_on_oops)
panic("Fatal exception");
if (ret != NOTIFY_STOP)
- do_exit(SIGSEGV);
+ make_task_dead(SIGSEGV);
}

void do_trap(struct pt_regs *regs, int signo, int code,
diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 523dbfbac03dd..7b5f110f8191a 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -200,7 +200,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
(addr < PAGE_SIZE) ? "NULL pointer dereference" :
"paging request", addr);
die(regs, "Oops");
- do_exit(SIGKILL);
+ make_task_dead(SIGKILL);

/*
* We ran out of memory, call the OOM killer, and return the userspace
diff --git a/arch/s390/kernel/dumpstack.c b/arch/s390/kernel/dumpstack.c
index 5b23c4f6e50cd..7b21019096f44 100644
--- a/arch/s390/kernel/dumpstack.c
+++ b/arch/s390/kernel/dumpstack.c
@@ -187,5 +187,5 @@ void die(struct pt_regs *regs, const char *str)
if (panic_on_oops)
panic("Fatal exception: panic_on_oops");
oops_exit();
- do_exit(SIGSEGV);
+ make_task_dead(SIGSEGV);
}
diff --git a/arch/s390/kernel/nmi.c b/arch/s390/kernel/nmi.c
index 8c867b43c8ebc..471a965ae75a4 100644
--- a/arch/s390/kernel/nmi.c
+++ b/arch/s390/kernel/nmi.c
@@ -179,7 +179,7 @@ void s390_handle_mcck(void)
"malfunction (code 0x%016lx).\n", mcck.mcck_code);
printk(KERN_EMERG "mcck: task: %s, pid: %d.\n",
current->comm, current->pid);
- do_exit(SIGSEGV);
+ make_task_dead(SIGSEGV);
}
}
EXPORT_SYMBOL_GPL(s390_handle_mcck);
diff --git a/arch/sh/kernel/traps.c b/arch/sh/kernel/traps.c
index 8b49cced663dc..5fafbef7849b1 100644
--- a/arch/sh/kernel/traps.c
+++ b/arch/sh/kernel/traps.c
@@ -57,7 +57,7 @@ void die(const char *str, struct pt_regs *regs, long err)
if (panic_on_oops)
panic("Fatal exception");

- do_exit(SIGSEGV);
+ make_task_dead(SIGSEGV);
}

void die_if_kernel(const char *str, struct pt_regs *regs, long err)
diff --git a/arch/sparc/kernel/traps_32.c b/arch/sparc/kernel/traps_32.c
index bcdfc6168dd58..ec7fcfbad06c0 100644
--- a/arch/sparc/kernel/traps_32.c
+++ b/arch/sparc/kernel/traps_32.c
@@ -86,9 +86,7 @@ void __noreturn die_if_kernel(char *str, struct pt_regs *regs)
}
printk("Instruction DUMP:");
instruction_dump ((unsigned long *) regs->pc);
- if(regs->psr & PSR_PS)
- do_exit(SIGKILL);
- do_exit(SIGSEGV);
+ make_task_dead((regs->psr & PSR_PS) ? SIGKILL : SIGSEGV);
}

void do_hw_interrupt(struct pt_regs *regs, unsigned long type)
diff --git a/arch/sparc/kernel/traps_64.c b/arch/sparc/kernel/traps_64.c
index 86879c28910b7..1a338784a37c8 100644
--- a/arch/sparc/kernel/traps_64.c
+++ b/arch/sparc/kernel/traps_64.c
@@ -2565,9 +2565,7 @@ void __noreturn die_if_kernel(char *str, struct pt_regs *regs)
}
if (panic_on_oops)
panic("Fatal exception");
- if (regs->tstate & TSTATE_PRIV)
- do_exit(SIGKILL);
- do_exit(SIGSEGV);
+ make_task_dead((regs->tstate & TSTATE_PRIV)? SIGKILL : SIGSEGV);
}
EXPORT_SYMBOL(die_if_kernel);

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 78b308f2f2ea6..e6c258bf95116 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1500,13 +1500,13 @@ ENTRY(async_page_fault)
END(async_page_fault)
#endif

-ENTRY(rewind_stack_do_exit)
+ENTRY(rewind_stack_and_make_dead)
/* Prevent any naive code from trying to unwind to our caller. */
xorl %ebp, %ebp

movl PER_CPU_VAR(cpu_current_top_of_stack), %esi
leal -TOP_OF_KERNEL_STACK_PADDING-PTREGS_SIZE(%esi), %esp

- call do_exit
+ call make_task_dead
1: jmp 1b
-END(rewind_stack_do_exit)
+END(rewind_stack_and_make_dead)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3f418aedef8d7..1ebb70694d721 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1762,7 +1762,7 @@ ENTRY(ignore_sysret)
sysret
END(ignore_sysret)

-ENTRY(rewind_stack_do_exit)
+ENTRY(rewind_stack_and_make_dead)
UNWIND_HINT_FUNC
/* Prevent any naive code from trying to unwind to our caller. */
xorl %ebp, %ebp
@@ -1771,5 +1771,5 @@ ENTRY(rewind_stack_do_exit)
leaq -PTREGS_SIZE(%rax), %rsp
UNWIND_HINT_REGS

- call do_exit
-END(rewind_stack_do_exit)
+ call make_task_dead
+END(rewind_stack_and_make_dead)
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 2b5886401e5f4..2b17a5cec0997 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -326,7 +326,7 @@ unsigned long oops_begin(void)
}
NOKPROBE_SYMBOL(oops_begin);

-void __noreturn rewind_stack_do_exit(int signr);
+void __noreturn rewind_stack_and_make_dead(int signr);

void oops_end(unsigned long flags, struct pt_regs *regs, int signr)
{
@@ -361,7 +361,7 @@ void oops_end(unsigned long flags, struct pt_regs *regs, int signr)
* reuse the task stack and that existing poisons are invalid.
*/
kasan_unpoison_task_stack(current);
- rewind_stack_do_exit(signr);
+ rewind_stack_and_make_dead(signr);
}
NOKPROBE_SYMBOL(oops_end);

diff --git a/arch/xtensa/kernel/traps.c b/arch/xtensa/kernel/traps.c
index 86507fa7c2d7c..2ae5c505894bd 100644
--- a/arch/xtensa/kernel/traps.c
+++ b/arch/xtensa/kernel/traps.c
@@ -542,5 +542,5 @@ void die(const char * str, struct pt_regs * regs, long err)
if (panic_on_oops)
panic("Fatal exception");

- do_exit(err);
+ make_task_dead(err);
}
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 91401309b1aa2..0ee9aab3e3091 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -36,6 +36,7 @@ extern int sched_fork(unsigned long clone_flags, struct task_struct *p);
extern void sched_dead(struct task_struct *p);

void __noreturn do_task_dead(void);
+void __noreturn make_task_dead(int signr);

extern void proc_caches_init(void);

diff --git a/kernel/exit.c b/kernel/exit.c
index 908e7a33e1fcb..2147bbc2c7c32 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -922,6 +922,15 @@ void __noreturn do_exit(long code)
}
EXPORT_SYMBOL_GPL(do_exit);

+void __noreturn make_task_dead(int signr)
+{
+ /*
+ * Take the task off the cpu after something catastrophic has
+ * happened.
+ */
+ do_exit(signr);
+}
+
void complete_and_exit(struct completion *comp, long code)
{
if (comp)
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index c0ab27368a345..0477190557107 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -159,6 +159,7 @@ static int __dead_end_function(struct objtool_file *file, struct symbol *func,
"panic",
"do_exit",
"do_task_dead",
+ "make_task_dead",
"__module_put_and_exit",
"complete_and_exit",
"kvm_spurious_fault",
@@ -167,7 +168,7 @@ static int __dead_end_function(struct objtool_file *file, struct symbol *func,
"fortify_panic",
"usercopy_abort",
"machine_real_restart",
- "rewind_stack_do_exit",
+ "rewind_stack_and_make_dead"
};

if (func->bind == STB_WEAK)
--
2.39.1


2023-02-02 05:29:08

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 10/16] exit: Expose "oops_count" to sysfs

From: Kees Cook <[email protected]>

commit 9db89b41117024f80b38b15954017fb293133364 upstream.

Since Oops count is now tracked and is a fairly interesting signal, add
the entry /sys/kernel/oops_count to expose it to userspace.

Cc: "Eric W. Biederman" <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Eric Biggers <[email protected]>
---
.../ABI/testing/sysfs-kernel-oops_count | 6 +++++
kernel/exit.c | 22 +++++++++++++++++--
2 files changed, 26 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-kernel-oops_count

diff --git a/Documentation/ABI/testing/sysfs-kernel-oops_count b/Documentation/ABI/testing/sysfs-kernel-oops_count
new file mode 100644
index 0000000000000..156cca9dbc960
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-oops_count
@@ -0,0 +1,6 @@
+What: /sys/kernel/oops_count
+Date: November 2022
+KernelVersion: 6.2.0
+Contact: Linux Kernel Hardening List <[email protected]>
+Description:
+ Shows how many times the system has Oopsed since last boot.
diff --git a/kernel/exit.c b/kernel/exit.c
index fa3004ff33362..5cd8a34257650 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -62,6 +62,7 @@
#include <linux/random.h>
#include <linux/rcuwait.h>
#include <linux/compat.h>
+#include <linux/sysfs.h>

#include <linux/uaccess.h>
#include <asm/unistd.h>
@@ -95,6 +96,25 @@ static __init int kernel_exit_sysctls_init(void)
late_initcall(kernel_exit_sysctls_init);
#endif

+static atomic_t oops_count = ATOMIC_INIT(0);
+
+#ifdef CONFIG_SYSFS
+static ssize_t oops_count_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *page)
+{
+ return sysfs_emit(page, "%d\n", atomic_read(&oops_count));
+}
+
+static struct kobj_attribute oops_count_attr = __ATTR_RO(oops_count);
+
+static __init int kernel_exit_sysfs_init(void)
+{
+ sysfs_add_file_to_group(kernel_kobj, &oops_count_attr.attr, NULL);
+ return 0;
+}
+late_initcall(kernel_exit_sysfs_init);
+#endif
+
static void __unhash_process(struct task_struct *p, bool group_dead)
{
nr_threads--;
@@ -951,8 +971,6 @@ EXPORT_SYMBOL_GPL(do_exit);

void __noreturn make_task_dead(int signr)
{
- static atomic_t oops_count = ATOMIC_INIT(0);
-
/*
* Take the task off the cpu after something catastrophic has
* happened.
--
2.39.1


2023-02-02 05:29:08

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 08/16] ia64: make IA64_MCA_RECOVERY bool instead of tristate

From: Randy Dunlap <[email protected]>

commit dbecf9b8b8ce580f4e11afed9d61e8aa294cddd2 upstream.

In linux-next, IA64_MCA_RECOVERY uses the (new) function
make_task_dead(), which is not exported for use by modules. Instead of
exporting it for one user, convert IA64_MCA_RECOVERY to be a bool
Kconfig symbol.

In a config file from "kernel test robot <[email protected]>" for a
different problem, this linker error was exposed when
CONFIG_IA64_MCA_RECOVERY=m.

Fixes this build error:

ERROR: modpost: "make_task_dead" [arch/ia64/kernel/mca_recovery.ko] undefined!

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0e25498f8cd4 ("exit: Add and use make_task_dead.")
Signed-off-by: Randy Dunlap <[email protected]>
Suggested-by: Christoph Hellwig <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: "Eric W. Biederman" <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/ia64/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 8b4a0c1748c03..0d56b19b7511a 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -445,7 +445,7 @@ config ARCH_PROC_KCORE_TEXT
depends on PROC_KCORE

config IA64_MCA_RECOVERY
- tristate "MCA recovery from errors other than TLB."
+ bool "MCA recovery from errors other than TLB."

config PERFMON
bool "Performance monitor support"
--
2.39.1


2023-02-02 05:29:08

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 09/16] exit: Put an upper limit on how often we can oops

From: Jann Horn <[email protected]>

commit d4ccd54d28d3c8598e2354acc13e28c060961dbb upstream.

Many Linux systems are configured to not panic on oops; but allowing an
attacker to oops the system **really** often can make even bugs that look
completely unexploitable exploitable (like NULL dereferences and such) if
each crash elevates a refcount by one or a lock is taken in read mode, and
this causes a counter to eventually overflow.

The most interesting counters for this are 32 bits wide (like open-coded
refcounts that don't use refcount_t). (The ldsem reader count on 32-bit
platforms is just 16 bits, but probably nobody cares about 32-bit platforms
that much nowadays.)

So let's panic the system if the kernel is constantly oopsing.

The speed of oopsing 2^32 times probably depends on several factors, like
how long the stack trace is and which unwinder you're using; an empirically
important one is whether your console is showing a graphical environment or
a text console that oopses will be printed to.
In a quick single-threaded benchmark, it looks like oopsing in a vfork()
child with a very short stack trace only takes ~510 microseconds per run
when a graphical console is active; but switching to a text console that
oopses are printed to slows it down around 87x, to ~45 milliseconds per
run.
(Adding more threads makes this faster, but the actual oops printing
happens under &die_lock on x86, so you can maybe speed this up by a factor
of around 2 and then any further improvement gets eaten up by lock
contention.)

It looks like it would take around 8-12 days to overflow a 32-bit counter
with repeated oopsing on a multi-core X86 system running a graphical
environment; both me (in an X86 VM) and Seth (with a distro kernel on
normal hardware in a standard configuration) got numbers in that ballpark.

12 days aren't *that* short on a desktop system, and you'd likely need much
longer on a typical server system (assuming that people don't run graphical
desktop environments on their servers), and this is a *very* noisy and
violent approach to exploiting the kernel; and it also seems to take orders
of magnitude longer on some machines, probably because stuff like EFI
pstore will slow it down a ton if that's active.

Signed-off-by: Jann Horn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Luis Chamberlain <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/sysctl/kernel.txt | 9 +++++++
kernel/exit.c | 43 +++++++++++++++++++++++++++++++++
2 files changed, 52 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index db1676525ca35..fd65f4e651d55 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -51,6 +51,7 @@ show up in /proc/sys/kernel:
- msgmnb
- msgmni
- nmi_watchdog
+- oops_limit
- osrelease
- ostype
- overflowgid
@@ -555,6 +556,14 @@ scanned for a given scan.

==============================================================

+oops_limit:
+
+Number of kernel oopses after which the kernel should panic when
+``panic_on_oops`` is not set. Setting this to 0 or 1 has the same effect
+as setting ``panic_on_oops=1``.
+
+==============================================================
+
osrelease, ostype & version:

# cat osrelease
diff --git a/kernel/exit.c b/kernel/exit.c
index 2147bbc2c7c32..fa3004ff33362 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -68,6 +68,33 @@
#include <asm/pgtable.h>
#include <asm/mmu_context.h>

+/*
+ * The default value should be high enough to not crash a system that randomly
+ * crashes its kernel from time to time, but low enough to at least not permit
+ * overflowing 32-bit refcounts or the ldsem writer count.
+ */
+static unsigned int oops_limit = 10000;
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table kern_exit_table[] = {
+ {
+ .procname = "oops_limit",
+ .data = &oops_limit,
+ .maxlen = sizeof(oops_limit),
+ .mode = 0644,
+ .proc_handler = proc_douintvec,
+ },
+ { }
+};
+
+static __init int kernel_exit_sysctls_init(void)
+{
+ register_sysctl_init("kernel", kern_exit_table);
+ return 0;
+}
+late_initcall(kernel_exit_sysctls_init);
+#endif
+
static void __unhash_process(struct task_struct *p, bool group_dead)
{
nr_threads--;
@@ -924,10 +951,26 @@ EXPORT_SYMBOL_GPL(do_exit);

void __noreturn make_task_dead(int signr)
{
+ static atomic_t oops_count = ATOMIC_INIT(0);
+
/*
* Take the task off the cpu after something catastrophic has
* happened.
*/
+
+ /*
+ * Every time the system oopses, if the oops happens while a reference
+ * to an object was held, the reference leaks.
+ * If the oops doesn't also leak memory, repeated oopsing can cause
+ * reference counters to wrap around (if they're not using refcount_t).
+ * This means that repeated oopsing can make unexploitable-looking bugs
+ * exploitable through repeated oopsing.
+ * To make sure this can't happen, place an upper bound on how often the
+ * kernel may oops without panic().
+ */
+ if (atomic_inc_return(&oops_count) >= READ_ONCE(oops_limit))
+ panic("Oopsed too often (kernel.oops_limit is %d)", oops_limit);
+
do_exit(signr);
}

--
2.39.1


2023-02-02 05:29:08

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 12/16] panic: Consolidate open-coded panic_on_warn checks

From: Kees Cook <[email protected]>

commit 79cc1ba7badf9e7a12af99695a557e9ce27ee967 upstream.

Several run-time checkers (KASAN, UBSAN, KFENCE, KCSAN, sched) roll
their own warnings, and each check "panic_on_warn". Consolidate this
into a single function so that future instrumentation can be added in
a single location.

Cc: Marco Elver <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: David Gow <[email protected]>
Cc: tangmeng <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: "Guilherme G. Piccoli" <[email protected]>
Cc: Tiezhu Yang <[email protected]>
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Luis Chamberlain <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Eric Biggers <[email protected]>
---
include/linux/kernel.h | 1 +
kernel/panic.c | 9 +++++++--
kernel/sched/core.c | 3 +--
mm/kasan/report.c | 4 ++--
4 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 50733abbe548e..a28ec4c2f3f5a 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -327,6 +327,7 @@ extern long (*panic_blink)(int state);
__printf(1, 2)
void panic(const char *fmt, ...) __noreturn __cold;
void nmi_panic(struct pt_regs *regs, const char *msg);
+void check_panic_on_warn(const char *origin);
extern void oops_enter(void);
extern void oops_exit(void);
void print_oops_end_marker(void);
diff --git a/kernel/panic.c b/kernel/panic.c
index a078d413042f2..08b8adc55b2bf 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -125,6 +125,12 @@ void nmi_panic(struct pt_regs *regs, const char *msg)
}
EXPORT_SYMBOL(nmi_panic);

+void check_panic_on_warn(const char *origin)
+{
+ if (panic_on_warn)
+ panic("%s: panic_on_warn set ...\n", origin);
+}
+
/**
* panic - halt the system
* @fmt: The text string to print
@@ -540,8 +546,7 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
if (args)
vprintk(args->fmt, args->args);

- if (panic_on_warn)
- panic("panic_on_warn set ...\n");
+ check_panic_on_warn("kernel");

print_modules();

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a034642497718..46227cc48124d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3316,8 +3316,7 @@ static noinline void __schedule_bug(struct task_struct *prev)
print_ip_sym(preempt_disable_ip);
pr_cont("\n");
}
- if (panic_on_warn)
- panic("scheduling while atomic\n");
+ check_panic_on_warn("scheduling while atomic");

dump_stack();
add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 90fdb261a5e2d..29fd88ddddaeb 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -176,8 +176,8 @@ static void kasan_end_report(unsigned long *flags)
pr_err("==================================================================\n");
add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
spin_unlock_irqrestore(&report_lock, *flags);
- if (panic_on_warn && !test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags))
- panic("panic_on_warn set ...\n");
+ if (!test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags))
+ check_panic_on_warn("KASAN");
kasan_enable_current();
}

--
2.39.1


2023-02-02 05:29:40

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 13/16] panic: Introduce warn_limit

From: Kees Cook <[email protected]>

commit 9fc9e278a5c0b708eeffaf47d6eb0c82aa74ed78 upstream.

Like oops_limit, add warn_limit for limiting the number of warnings when
panic_on_warn is not set.

Cc: Jonathan Corbet <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: "Jason A. Donenfeld" <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: tangmeng <[email protected]>
Cc: "Guilherme G. Piccoli" <[email protected]>
Cc: Tiezhu Yang <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: [email protected]
Reviewed-by: Luis Chamberlain <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/sysctl/kernel.txt | 10 ++++++++++
kernel/panic.c | 27 +++++++++++++++++++++++++++
2 files changed, 37 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index e1d375df4f286..c8d3dbda3c1e2 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -97,6 +97,7 @@ show up in /proc/sys/kernel:
- threads-max
- unprivileged_bpf_disabled
- unknown_nmi_panic
+- warn_limit
- watchdog
- watchdog_thresh
- version
@@ -1114,6 +1115,15 @@ example. If a system hangs up, try pressing the NMI switch.

==============================================================

+warn_limit:
+
+Number of kernel warnings after which the kernel should panic when
+``panic_on_warn`` is not set. Setting this to 0 disables checking
+the warning count. Setting this to 1 has the same effect as setting
+``panic_on_warn=1``. The default value is 0.
+
+==============================================================
+
watchdog:

This parameter can be used to disable or enable the soft lockup detector
diff --git a/kernel/panic.c b/kernel/panic.c
index 08b8adc55b2bf..d1a5360262f04 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -42,6 +42,7 @@ static int pause_on_oops_flag;
static DEFINE_SPINLOCK(pause_on_oops_lock);
bool crash_kexec_post_notifiers;
int panic_on_warn __read_mostly;
+static unsigned int warn_limit __read_mostly;

int panic_timeout = CONFIG_PANIC_TIMEOUT;
EXPORT_SYMBOL_GPL(panic_timeout);
@@ -50,6 +51,26 @@ ATOMIC_NOTIFIER_HEAD(panic_notifier_list);

EXPORT_SYMBOL(panic_notifier_list);

+#ifdef CONFIG_SYSCTL
+static struct ctl_table kern_panic_table[] = {
+ {
+ .procname = "warn_limit",
+ .data = &warn_limit,
+ .maxlen = sizeof(warn_limit),
+ .mode = 0644,
+ .proc_handler = proc_douintvec,
+ },
+ { }
+};
+
+static __init int kernel_panic_sysctls_init(void)
+{
+ register_sysctl_init("kernel", kern_panic_table);
+ return 0;
+}
+late_initcall(kernel_panic_sysctls_init);
+#endif
+
static long no_blink(int state)
{
return 0;
@@ -127,8 +148,14 @@ EXPORT_SYMBOL(nmi_panic);

void check_panic_on_warn(const char *origin)
{
+ static atomic_t warn_count = ATOMIC_INIT(0);
+
if (panic_on_warn)
panic("%s: panic_on_warn set ...\n", origin);
+
+ if (atomic_inc_return(&warn_count) >= READ_ONCE(warn_limit) && warn_limit)
+ panic("%s: system warned too often (kernel.warn_limit is %d)",
+ origin, warn_limit);
}

/**
--
2.39.1


2023-02-02 05:29:40

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 15/16] docs: Fix path paste-o for /sys/kernel/warn_count

From: Kees Cook <[email protected]>

commit 00dd027f721e0458418f7750d8a5a664ed3e5994 upstream.

Running "make htmldocs" shows that "/sys/kernel/oops_count" was
duplicated. This should have been "warn_count":

Warning: /sys/kernel/oops_count is defined 2 times:
./Documentation/ABI/testing/sysfs-kernel-warn_count:0
./Documentation/ABI/testing/sysfs-kernel-oops_count:0

Fix the typo.

Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/linux-doc/[email protected]
Fixes: 8b05aa263361 ("panic: Expose "warn_count" to sysfs")
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/ABI/testing/sysfs-kernel-warn_count | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-warn_count b/Documentation/ABI/testing/sysfs-kernel-warn_count
index 08f083d2fd51b..90a029813717d 100644
--- a/Documentation/ABI/testing/sysfs-kernel-warn_count
+++ b/Documentation/ABI/testing/sysfs-kernel-warn_count
@@ -1,4 +1,4 @@
-What: /sys/kernel/oops_count
+What: /sys/kernel/warn_count
Date: November 2022
KernelVersion: 6.2.0
Contact: Linux Kernel Hardening List <[email protected]>
--
2.39.1


2023-02-02 05:29:40

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 11/16] exit: Allow oops_limit to be disabled

From: Kees Cook <[email protected]>

commit de92f65719cd672f4b48397540b9f9eff67eca40 upstream.

In preparation for keeping oops_limit logic in sync with warn_limit,
have oops_limit == 0 disable checking the Oops counter.

Cc: Jann Horn <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: "Jason A. Donenfeld" <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/sysctl/kernel.txt | 5 +++--
kernel/exit.c | 2 +-
2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index fd65f4e651d55..e1d375df4f286 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -559,8 +559,9 @@ scanned for a given scan.
oops_limit:

Number of kernel oopses after which the kernel should panic when
-``panic_on_oops`` is not set. Setting this to 0 or 1 has the same effect
-as setting ``panic_on_oops=1``.
+``panic_on_oops`` is not set. Setting this to 0 disables checking
+the count. Setting this to 1 has the same effect as setting
+``panic_on_oops=1``. The default value is 10000.

==============================================================

diff --git a/kernel/exit.c b/kernel/exit.c
index 5cd8a34257650..b2f0aaf6bee78 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -986,7 +986,7 @@ void __noreturn make_task_dead(int signr)
* To make sure this can't happen, place an upper bound on how often the
* kernel may oops without panic().
*/
- if (atomic_inc_return(&oops_count) >= READ_ONCE(oops_limit))
+ if (atomic_inc_return(&oops_count) >= READ_ONCE(oops_limit) && oops_limit)
panic("Oopsed too often (kernel.oops_limit is %d)", oops_limit);

do_exit(signr);
--
2.39.1


2023-02-02 05:29:41

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 14/16] panic: Expose "warn_count" to sysfs

From: Kees Cook <[email protected]>

commit 8b05aa26336113c4cea25f1c333ee8cd4fc212a6 upstream.

Since Warn count is now tracked and is a fairly interesting signal, add
the entry /sys/kernel/warn_count to expose it to userspace.

Cc: Petr Mladek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: tangmeng <[email protected]>
Cc: "Guilherme G. Piccoli" <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Tiezhu Yang <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Eric Biggers <[email protected]>
---
.../ABI/testing/sysfs-kernel-warn_count | 6 +++++
kernel/panic.c | 22 +++++++++++++++++--
2 files changed, 26 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-kernel-warn_count

diff --git a/Documentation/ABI/testing/sysfs-kernel-warn_count b/Documentation/ABI/testing/sysfs-kernel-warn_count
new file mode 100644
index 0000000000000..08f083d2fd51b
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-warn_count
@@ -0,0 +1,6 @@
+What: /sys/kernel/oops_count
+Date: November 2022
+KernelVersion: 6.2.0
+Contact: Linux Kernel Hardening List <[email protected]>
+Description:
+ Shows how many times the system has Warned since last boot.
diff --git a/kernel/panic.c b/kernel/panic.c
index d1a5360262f04..dbb6e27d33e10 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -29,6 +29,7 @@
#include <linux/bug.h>
#include <linux/ratelimit.h>
#include <linux/debugfs.h>
+#include <linux/sysfs.h>
#include <asm/sections.h>

#define PANIC_TIMER_STEP 100
@@ -71,6 +72,25 @@ static __init int kernel_panic_sysctls_init(void)
late_initcall(kernel_panic_sysctls_init);
#endif

+static atomic_t warn_count = ATOMIC_INIT(0);
+
+#ifdef CONFIG_SYSFS
+static ssize_t warn_count_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *page)
+{
+ return sysfs_emit(page, "%d\n", atomic_read(&warn_count));
+}
+
+static struct kobj_attribute warn_count_attr = __ATTR_RO(warn_count);
+
+static __init int kernel_panic_sysfs_init(void)
+{
+ sysfs_add_file_to_group(kernel_kobj, &warn_count_attr.attr, NULL);
+ return 0;
+}
+late_initcall(kernel_panic_sysfs_init);
+#endif
+
static long no_blink(int state)
{
return 0;
@@ -148,8 +168,6 @@ EXPORT_SYMBOL(nmi_panic);

void check_panic_on_warn(const char *origin)
{
- static atomic_t warn_count = ATOMIC_INIT(0);
-
if (panic_on_warn)
panic("%s: panic_on_warn set ...\n", origin);

--
2.39.1


2023-02-02 05:29:40

by Eric Biggers

[permalink] [raw]
Subject: [PATCH 4.19 16/16] exit: Use READ_ONCE() for all oops/warn limit reads

From: Kees Cook <[email protected]>

commit 7535b832c6399b5ebfc5b53af5c51dd915ee2538 upstream.

Use a temporary variable to take full advantage of READ_ONCE() behavior.
Without this, the report (and even the test) might be out of sync with
the initial test.

Reported-by: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Fixes: 9fc9e278a5c0 ("panic: Introduce warn_limit")
Fixes: d4ccd54d28d3 ("exit: Put an upper limit on how often we can oops")
Cc: "Eric W. Biederman" <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: tangmeng <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Tiezhu Yang <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
kernel/exit.c | 6 ++++--
kernel/panic.c | 7 +++++--
2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index b2f0aaf6bee78..02360ec3b1225 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -975,6 +975,7 @@ void __noreturn make_task_dead(int signr)
* Take the task off the cpu after something catastrophic has
* happened.
*/
+ unsigned int limit;

/*
* Every time the system oopses, if the oops happens while a reference
@@ -986,8 +987,9 @@ void __noreturn make_task_dead(int signr)
* To make sure this can't happen, place an upper bound on how often the
* kernel may oops without panic().
*/
- if (atomic_inc_return(&oops_count) >= READ_ONCE(oops_limit) && oops_limit)
- panic("Oopsed too often (kernel.oops_limit is %d)", oops_limit);
+ limit = READ_ONCE(oops_limit);
+ if (atomic_inc_return(&oops_count) >= limit && limit)
+ panic("Oopsed too often (kernel.oops_limit is %d)", limit);

do_exit(signr);
}
diff --git a/kernel/panic.c b/kernel/panic.c
index dbb6e27d33e10..982ecba286c08 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -168,12 +168,15 @@ EXPORT_SYMBOL(nmi_panic);

void check_panic_on_warn(const char *origin)
{
+ unsigned int limit;
+
if (panic_on_warn)
panic("%s: panic_on_warn set ...\n", origin);

- if (atomic_inc_return(&warn_count) >= READ_ONCE(warn_limit) && warn_limit)
+ limit = READ_ONCE(warn_limit);
+ if (atomic_inc_return(&warn_count) >= limit && limit)
panic("%s: system warned too often (kernel.warn_limit is %d)",
- origin, warn_limit);
+ origin, limit);
}

/**
--
2.39.1


2023-02-02 19:37:30

by SeongJae Park

[permalink] [raw]
Subject: Re: [PATCH 4.19 03/16] mm: kasan: do not panic if both panic_on_warn and kasan_multishot set

On Wed, 1 Feb 2023 21:25:51 -0800 Eric Biggers <[email protected]> wrote:

> From: David Gow <[email protected]>
>
> commit be4f1ae978ffe98cc95ec49ceb95386fb4474974 upstream.
>
> KASAN errors will currently trigger a panic when panic_on_warn is set.
> This renders kasan_multishot useless, as further KASAN errors won't be
> reported if the kernel has already paniced. By making kasan_multishot
> disable this behaviour for KASAN errors, we can still have the benefits of
> panic_on_warn for non-KASAN warnings, yet be able to use kasan_multishot.
>
> This is particularly important when running KASAN tests, which need to
> trigger multiple KASAN errors: previously these would panic the system if
> panic_on_warn was set, now they can run (and will panic the system should
> non-KASAN warnings show up).
>
> Signed-off-by: David Gow <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> Tested-by: Andrey Konovalov <[email protected]>
> Reviewed-by: Andrey Konovalov <[email protected]>
> Reviewed-by: Brendan Higgins <[email protected]>
> Cc: Andrey Ryabinin <[email protected]>
> Cc: Dmitry Vyukov <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Juri Lelli <[email protected]>
> Cc: Patricia Alfonso <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Shuah Khan <[email protected]>
> Cc: Vincent Guittot <[email protected]>
> Link: https://lkml.kernel.org/r/[email protected]
> Link: https://lkml.kernel.org/r/[email protected]
> Signed-off-by: Linus Torvalds <[email protected]>
> Signed-off-by: Eric Biggers <[email protected]>
> ---
> mm/kasan/report.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
> index 5c169aa688fde..90fdb261a5e2d 100644
> --- a/mm/kasan/report.c
> +++ b/mm/kasan/report.c
> @@ -176,7 +176,7 @@ static void kasan_end_report(unsigned long *flags)
> pr_err("==================================================================\n");
> add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
> spin_unlock_irqrestore(&report_lock, *flags);
> - if (panic_on_warn)
> + if (panic_on_warn && !test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags))
> panic("panic_on_warn set ...\n");
> kasan_enable_current();


Seems this introduced a build failure when CONFIG_KASAN is enabled, as also
reported by Sasha[1].

mm/kasan/report.c: In function ‘kasan_end_report’:
mm/kasan/report.c:179:16: error: ‘KASAN_BIT_MULTI_SHOT’ undeclared (first use in this function)
179 | if (!test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags))
| ^~~~~~~~~~~~~~~~~~~~
arch/x86/include/asm/bitops.h:342:25: note: in definition of macro ‘test_bit’
342 | (__builtin_constant_p((nr)) \
| ^~
mm/kasan/report.c:179:16: note: each undeclared identifier is reported only once for each function it appears in
179 | if (!test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags))
| ^~~~~~~~~~~~~~~~~~~~
arch/x86/include/asm/bitops.h:342:25: note: in definition of macro ‘test_bit’
342 | (__builtin_constant_p((nr)) \
| ^~
mm/kasan/report.c:179:39: error: ‘kasan_flags’ undeclared (first use in this function)
179 | if (!test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags))
| ^~~~~~~~~~~
arch/x86/include/asm/bitops.h:343:30: note: in definition of macro ‘test_bit’
343 | ? constant_test_bit((nr), (addr)) \
| ^~~~

I confirmed dropping this patch fixes the build failure. It causes a conflict
to a following patch[1], but seems it's not that difficult to resolve. I
updated kernel/panic.c part of the patch like attached below to resolve the
conflict.

[1] https://lore.kernel.org/stable/Y9v3G6UantaCo29G@sashalap/
[2] https://lore.kernel.org/stable/[email protected]/


Thanks,
SJ


================================ >8 ===========================================

diff --git a/kernel/panic.c b/kernel/panic.c
index a078d413042f..08b8adc55b2b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -125,6 +125,12 @@ void nmi_panic(struct pt_regs *regs, const char *msg)
}
EXPORT_SYMBOL(nmi_panic);

+void check_panic_on_warn(const char *origin)
+{
+ if (panic_on_warn)
+ panic("%s: panic_on_warn set ...\n", origin);
+}
+
/**
* panic - halt the system
* @fmt: The text string to print
@@ -540,8 +546,7 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
if (args)
vprintk(args->fmt, args->args);

- if (panic_on_warn)
- panic("panic_on_warn set ...\n");
+ check_panic_on_warn("kernel");

print_modules();