We have a number of systems industry-wide that have a subset of their
functionality that works as follows:
1. Receive a message from local kmsg, serial console, or netconsole;
2. Apply a set of rules to classify the message;
3. Do something based on this classification (like scheduling a
remediation for the machine), rinse, and repeat.
As a couple of examples of places we have this implemented just inside
Facebook, although this isn't a Facebook-specific problem, we have this
inside our netconsole processing (for alarm classification), and as part
of our machine health checking. We use these messages to determine
fairly important metrics around production health, and it's important
that we get them right.
While for some kinds of issues we have counters, tracepoints, or metrics
with a stable interface which can reliably indicate the issue, in order
to react to production issues quickly we need to work with the interface
which most kernel developers naturally use when developing: printk.
Most production issues come from unexpected phenomena, and as such
usually the code in question doesn't have easily usable tracepoints or
other counters available for the specific problem being mitigated. We
have a number of lines of monitoring defence against problems in
production (host metrics, process metrics, service metrics, etc), and
where it's not feasible to reliably monitor at another level, this kind
of pragmatic netconsole monitoring is essential.
As one would expect, monitoring using printk is rather brittle for a
number of reasons -- most notably that the message might disappear
entirely in a new version of the kernel, or that the message may change
in some way that the regex or other classification methods start to
silently fail.
One factor that makes this even harder is that, under normal operation,
many of these messages are never expected to be hit. For example, there
may be a rare hardware bug which one wants to detect if it was to ever
happen again, but its recurrence is not likely or anticipated. This
precludes using something like checking whether the printk in question
was printed somewhere fleetwide recently to determine whether the
message in question is still present or not, since we don't anticipate
that it should be printed anywhere, but still need to monitor for its
future presence in the long-term.
This class of issue has happened on a number of occasions, causing
unhealthy machines with hardware issues to remain in production for
longer than ideal. As a recent example, some monitoring around
blk_update_request fell out of date and caused semi-broken machines to
remain in production for longer than would be desirable.
Searching through the codebase to find the message is also extremely
fragile, because many of the messages are further constructed beyond
their callsite (eg. btrfs_printk and other module-specific wrappers,
each with their own functionality). Even if they aren't, guessing the
format and formulation of the underlying message based on the aesthetics
of the message emitted is not a recipe for success at scale, and our
previous issues with fleetwide machine health checking demonstrate as
much.
This patch provides a solution to the issue of silently changed or
deleted printks: we record pointers to all printk format strings known
at compile time into a new .printk_index section, both in vmlinux and
modules. At runtime, this can then be iterated by looking at
<debugfs>/printk/index/<module>, which emits the following format, both
readable by humans and able to be parsed by machines:
$ head -1 vmlinux; shuf -n 5 vmlinux
# <level[,flags]> filename:line function "format"
<5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
<4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
<6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
<6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
<6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"
This mitigates the majority of cases where we have a highly-specific
printk which we want to match on, as we can now enumerate and check
whether the format changed or the printk callsite disappeared entirely
in userspace. This allows us to catch changes to printks we monitor
earlier and decide what to do about it before it becomes problematic.
There is no additional runtime cost for printk callers or printk itself,
and the assembly generated is exactly the same.
Signed-off-by: Chris Down <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Andrew Morton <[email protected]>
---
v2:
- Use seq_printf instead of step by step accumulation
- Scope fptr closer to its use
- Prevent seq_file from needing to alloc a new buffer
- Always provide KERN_SOH + level, even if caller omitted it
- Provide one file per module
- Update changelog to show ease of parsing
- Provide printk -> _printk for ARCH=um (thanks, LKP robot)
- Move to debugfs (thanks, Steven)
---
v3:
- Reduce locking during open by moving size lifetime to the inode
- Don't explicitly check for debugfs errors (thanks, gregkh)
---
v4:
- Fix a missed `extern int printk` fixup in ia64 (thanks, lkp)
---
v5:
Thanks Petr for the extensive feedback.
- Move all module handling from module notifier to module.[ch] directly
- Change to readable output format
- Handled by new string_escape(ESCAPE_PRINTF)
- Add file/line/function information
- Mass rename everything to printk_index/CONFIG_PRINTK_INDEX/pi_*
- As a result, this version does away with the mutex/hashtable
- Use seq_file iteration API instead of simple_open
- Remove debugfs file first to avoid ENOENT
- Tear down backing datastructure if debugfs file creation fails
- Move code under ifdef guard to kernel/printk/index.c
- Add pi_sec (formerly printk_fmt_sec) documentation
- Handle coexisting LOG_CONT + level
- Add header to debugfs output
- ...and probably some other stuff I forgot. :-)
---
MAINTAINERS | 5 +
arch/arm/kernel/entry-v7m.S | 2 +-
arch/arm/lib/backtrace-clang.S | 2 +-
arch/arm/lib/backtrace.S | 2 +-
arch/arm/mach-rpc/io-acorn.S | 2 +-
arch/arm/vfp/vfphw.S | 6 +-
arch/ia64/include/uapi/asm/cmpxchg.h | 4 +-
arch/openrisc/kernel/entry.S | 6 +-
arch/powerpc/kernel/head_fsl_booke.S | 2 +-
arch/um/include/shared/user.h | 3 +-
arch/x86/kernel/head_32.S | 2 +-
fs/seq_file.c | 21 +++
include/asm-generic/vmlinux.lds.h | 13 ++
include/linux/module.h | 6 +
include/linux/printk.h | 72 ++++++++++-
include/linux/seq_file.h | 1 +
include/linux/string_helpers.h | 2 +
init/Kconfig | 14 ++
kernel/module.c | 14 +-
kernel/printk/Makefile | 1 +
kernel/printk/index.c | 183 +++++++++++++++++++++++++++
kernel/printk/printk.c | 20 ++-
lib/string_helpers.c | 29 ++++-
lib/test-string_helpers.c | 6 +
24 files changed, 386 insertions(+), 32 deletions(-)
create mode 100644 kernel/printk/index.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 3353de0c4bc8..328b3e822223 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14314,6 +14314,11 @@ S: Maintained
F: include/linux/printk.h
F: kernel/printk/
+PRINTK INDEXING
+R: Chris Down <[email protected]>
+S: Maintained
+F: kernel/printk/index.c
+
PRISM54 WIRELESS DRIVER
M: Luis Chamberlain <[email protected]>
L: [email protected]
diff --git a/arch/arm/kernel/entry-v7m.S b/arch/arm/kernel/entry-v7m.S
index d0e898608d30..7bde93c10962 100644
--- a/arch/arm/kernel/entry-v7m.S
+++ b/arch/arm/kernel/entry-v7m.S
@@ -23,7 +23,7 @@ __invalid_entry:
adr r0, strerr
mrs r1, ipsr
mov r2, lr
- bl printk
+ bl _printk
#endif
mov r0, sp
bl show_regs
diff --git a/arch/arm/lib/backtrace-clang.S b/arch/arm/lib/backtrace-clang.S
index 6174c45f53a5..5b2cdb1003e3 100644
--- a/arch/arm/lib/backtrace-clang.S
+++ b/arch/arm/lib/backtrace-clang.S
@@ -202,7 +202,7 @@ finished_setup:
1006: adr r0, .Lbad
mov r1, loglvl
mov r2, frame
- bl printk
+ bl _printk
no_frame: ldmfd sp!, {r4 - r9, fp, pc}
ENDPROC(c_backtrace)
.pushsection __ex_table,"a"
diff --git a/arch/arm/lib/backtrace.S b/arch/arm/lib/backtrace.S
index 872f658638d9..e8408f22d4dc 100644
--- a/arch/arm/lib/backtrace.S
+++ b/arch/arm/lib/backtrace.S
@@ -103,7 +103,7 @@ for_each_frame: tst frame, mask @ Check for address exceptions
1006: adr r0, .Lbad
mov r1, loglvl
mov r2, frame
- bl printk
+ bl _printk
no_frame: ldmfd sp!, {r4 - r9, pc}
ENDPROC(c_backtrace)
diff --git a/arch/arm/mach-rpc/io-acorn.S b/arch/arm/mach-rpc/io-acorn.S
index b9082a2a2a01..aa9bf0d771c0 100644
--- a/arch/arm/mach-rpc/io-acorn.S
+++ b/arch/arm/mach-rpc/io-acorn.S
@@ -25,4 +25,4 @@ ENTRY(insl)
ENTRY(outsl)
adr r0, .Liosl_warning
mov r1, lr
- b printk
+ b _printk
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index d5837bf05a9a..6f7926c9c179 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -23,7 +23,7 @@
#ifdef DEBUG
stmfd sp!, {r0-r3, ip, lr}
ldr r0, =1f
- bl printk
+ bl _printk
ldmfd sp!, {r0-r3, ip, lr}
.pushsection .rodata, "a"
@@ -38,7 +38,7 @@
stmfd sp!, {r0-r3, ip, lr}
mov r1, \arg
ldr r0, =1f
- bl printk
+ bl _printk
ldmfd sp!, {r0-r3, ip, lr}
.pushsection .rodata, "a"
@@ -55,7 +55,7 @@
mov r2, \arg2
mov r1, \arg1
ldr r0, =1f
- bl printk
+ bl _printk
ldmfd sp!, {r0-r3, ip, lr}
.pushsection .rodata, "a"
diff --git a/arch/ia64/include/uapi/asm/cmpxchg.h b/arch/ia64/include/uapi/asm/cmpxchg.h
index 5d90307fd6e0..d96438322d13 100644
--- a/arch/ia64/include/uapi/asm/cmpxchg.h
+++ b/arch/ia64/include/uapi/asm/cmpxchg.h
@@ -139,9 +139,9 @@ extern long ia64_cmpxchg_called_with_bad_pointer(void);
do { \
if (_cmpxchg_bugcheck_count-- <= 0) { \
void *ip; \
- extern int printk(const char *fmt, ...); \
+ extern int _printk(const char *fmt, ...); \
ip = (void *) ia64_getreg(_IA64_REG_IP); \
- printk("CMPXCHG_BUGCHECK: stuck at %p on word %p\n", ip, (v));\
+ _printk("CMPXCHG_BUGCHECK: stuck at %p on word %p\n", ip, (v));\
break; \
} \
} while (0)
diff --git a/arch/openrisc/kernel/entry.S b/arch/openrisc/kernel/entry.S
index bc657e55c15f..947613f61d4a 100644
--- a/arch/openrisc/kernel/entry.S
+++ b/arch/openrisc/kernel/entry.S
@@ -551,7 +551,7 @@ EXCEPTION_ENTRY(_external_irq_handler)
l.movhi r3,hi(42f)
l.ori r3,r3,lo(42f)
l.sw 0x0(r1),r3
- l.jal printk
+ l.jal _printk
l.sw 0x4(r1),r4
l.addi r1,r1,0x8
@@ -681,8 +681,8 @@ _syscall_debug:
l.sw -4(r1),r27
l.sw -8(r1),r11
l.addi r1,r1,-8
- l.movhi r27,hi(printk)
- l.ori r27,r27,lo(printk)
+ l.movhi r27,hi(_printk)
+ l.ori r27,r27,lo(_printk)
l.jalr r27
l.nop
l.addi r1,r1,8
diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index fdd4d274c245..f2c7ecef9b14 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -852,7 +852,7 @@ KernelSPE:
ori r3,r3,87f@l
mr r4,r2 /* current */
lwz r5,_NIP(r1)
- bl printk
+ bl _printk
#endif
b ret_from_except
#ifdef CONFIG_PRINTK
diff --git a/arch/um/include/shared/user.h b/arch/um/include/shared/user.h
index e793e4212f0a..dd4badffdeb3 100644
--- a/arch/um/include/shared/user.h
+++ b/arch/um/include/shared/user.h
@@ -38,7 +38,8 @@ extern void panic(const char *fmt, ...)
#define UM_KERN_CONT KERN_CONT
#ifdef UML_CONFIG_PRINTK
-extern int printk(const char *fmt, ...)
+#define printk(...) _printk(__VA_ARGS__)
+extern int _printk(const char *fmt, ...)
__attribute__ ((format (printf, 1, 2)));
#else
static inline int printk(const char *fmt, ...)
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 7ed84c282233..c207b789dd22 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -446,7 +446,7 @@ SYM_FUNC_START(early_ignore_irq)
pushl 32(%esp)
pushl 40(%esp)
pushl $int_msg
- call printk
+ call _printk
call dump_stack
diff --git a/fs/seq_file.c b/fs/seq_file.c
index 71a274e7f903..0fd3ae1051d9 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -376,6 +376,27 @@ void seq_escape(struct seq_file *m, const char *s, const char *esc)
}
EXPORT_SYMBOL(seq_escape);
+/**
+ * seq_escape_printf_format - print string into buffer, escaping
+ * characters that are escaped in printf format (including '"')
+ * @m: target buffer
+ * @s: string
+ *
+ * Puts string into buffer and escape characters that are
+ * escaped in printf format.
+ * Use seq_has_overflowed() to check for errors.
+ */
+void seq_escape_printf_format(struct seq_file *m, const char *s)
+{
+ char *buf;
+ size_t size = seq_get_buf(m, &buf);
+ int ret;
+
+ ret = string_escape_str(s, buf, size, ESCAPE_PRINTF, NULL);
+ seq_commit(m, ret < size ? ret : -1);
+}
+EXPORT_SYMBOL(seq_escape_printf_format);
+
void seq_escape_mem_ascii(struct seq_file *m, const char *src, size_t isz)
{
char *buf;
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 34b7e0d2346c..d4e45714405f 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -309,6 +309,17 @@
#define ACPI_PROBE_TABLE(name)
#endif
+#ifdef CONFIG_PRINTK_INDEX
+#define PRINTK_INDEX \
+ .printk_index : AT(ADDR(.printk_index) - LOAD_OFFSET) { \
+ __start_printk_index = .; \
+ *(.printk_index) \
+ __stop_printk_index = .; \
+ }
+#else
+#define PRINTK_INDEX
+#endif
+
#ifdef CONFIG_THERMAL
#define THERMAL_TABLE(name) \
. = ALIGN(8); \
@@ -480,6 +491,8 @@
\
TRACEDATA \
\
+ PRINTK_INDEX \
+ \
/* Kernel symbol table: Normal symbols */ \
__ksymtab : AT(ADDR(__ksymtab) - LOAD_OFFSET) { \
__start___ksymtab = .; \
diff --git a/include/linux/module.h b/include/linux/module.h
index 7a0bcb5b1ffc..5d466b4a23b9 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -516,6 +516,12 @@ struct module {
struct klp_modinfo *klp_info;
#endif
+#ifdef CONFIG_PRINTK_INDEX
+ unsigned int printk_index_size;
+ struct pi_object *printk_index_start;
+ struct pi_sec *pi_sec;
+#endif
+
#ifdef CONFIG_MODULE_UNLOAD
/* What modules depend on me? */
struct list_head source_list;
diff --git a/include/linux/printk.h b/include/linux/printk.h
index d3c08095a9a3..cd2977df820e 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -164,6 +164,11 @@ static inline void printk_nmi_direct_exit(void) { }
struct dev_printk_info;
#ifdef CONFIG_PRINTK
+enum log_flags {
+ LOG_NEWLINE = 2, /* text ended with a newline */
+ LOG_CONT = 8, /* text is a fragment of a continuation line */
+};
+
asmlinkage __printf(4, 0)
int vprintk_emit(int facility, int level,
const struct dev_printk_info *dev_info,
@@ -173,12 +178,12 @@ asmlinkage __printf(1, 0)
int vprintk(const char *fmt, va_list args);
asmlinkage __printf(1, 2) __cold
-int printk(const char *fmt, ...);
+int _printk(const char *fmt, ...);
/*
* Special printk facility for scheduler/timekeeping use only, _DO_NOT_USE_ !
*/
-__printf(1, 2) __cold int printk_deferred(const char *fmt, ...);
+__printf(1, 2) __cold int _printk_deferred(const char *fmt, ...);
/*
* Please don't use printk_ratelimit(), because it shares ratelimiting state
@@ -206,6 +211,7 @@ void __init setup_log_buf(int early);
__printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
void dump_stack_print_info(const char *log_lvl);
void show_regs_print_info(const char *log_lvl);
+u16 parse_prefix(const char *text, int *level, enum log_flags *lflags);
extern asmlinkage void dump_stack(void) __cold;
extern void printk_safe_flush(void);
extern void printk_safe_flush_on_panic(void);
@@ -216,12 +222,12 @@ int vprintk(const char *s, va_list args)
return 0;
}
static inline __printf(1, 2) __cold
-int printk(const char *s, ...)
+int _printk(const char *s, ...)
{
return 0;
}
static inline __printf(1, 2) __cold
-int printk_deferred(const char *s, ...)
+int _printk_deferred(const char *s, ...)
{
return 0;
}
@@ -301,6 +307,64 @@ extern int kptr_restrict;
#define pr_fmt(fmt) fmt
#endif
+struct module;
+
+#ifdef CONFIG_PRINTK_INDEX
+extern void pi_sec_store(struct module *mod);
+extern void pi_sec_remove(struct module *mod);
+
+struct pi_object {
+ const char *fmt;
+ const char *func;
+ const char *file;
+ unsigned int line;
+};
+
+extern struct pi_object __start_printk_index[];
+extern struct pi_object __stop_printk_index[];
+
+#define pi_sec_elf_embed(_p_func, _fmt, ...) \
+ ({ \
+ int _p_ret; \
+ \
+ if (__builtin_constant_p(_fmt)) { \
+ /*
+ * The compiler may not be able to eliminate this, so
+ * we need to make sure that it doesn't see any
+ * hypothetical assignment for non-constants even
+ * though this is already inside the
+ * __builtin_constant_p guard.
+ */ \
+ static struct pi_object _pi \
+ __section(".printk_index") = { \
+ .fmt = __builtin_constant_p(_fmt) ? (_fmt) : NULL, \
+ .func = __func__, \
+ .file = __FILE__, \
+ .line = __LINE__, \
+ }; \
+ _p_ret = _p_func(_pi.fmt, ##__VA_ARGS__); \
+ } else \
+ _p_ret = _p_func(_fmt, ##__VA_ARGS__); \
+ \
+ _p_ret; \
+ })
+
+#define printk(fmt, ...) pi_sec_elf_embed(_printk, fmt, ##__VA_ARGS__)
+#define printk_deferred(fmt, ...) \
+ pi_sec_elf_embed(_printk_deferred, fmt, ##__VA_ARGS__)
+#else /* !CONFIG_PRINTK_INDEX */
+static inline void pi_sec_store(struct module *mod)
+{
+}
+
+static inline void pi_sec_remove(struct module *mod)
+{
+}
+
+#define printk(...) _printk(__VA_ARGS__)
+#define printk_deferred(...) _printk_deferred(__VA_ARGS__)
+#endif /* CONFIG_PRINTK_INDEX */
+
/**
* pr_emerg - Print an emergency-level message
* @fmt: format string
diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
index b83b3ae3c877..cfc504a30429 100644
--- a/include/linux/seq_file.h
+++ b/include/linux/seq_file.h
@@ -127,6 +127,7 @@ void seq_put_hex_ll(struct seq_file *m, const char *delimiter,
unsigned long long v, unsigned int width);
void seq_escape(struct seq_file *m, const char *s, const char *esc);
+void seq_escape_printf_format(struct seq_file *m, const char *s);
void seq_escape_mem_ascii(struct seq_file *m, const char *src, size_t isz);
void seq_hex_dump(struct seq_file *m, const char *prefix_str, int prefix_type,
diff --git a/include/linux/string_helpers.h b/include/linux/string_helpers.h
index fa06dcdc481e..95fd63ee4458 100644
--- a/include/linux/string_helpers.h
+++ b/include/linux/string_helpers.h
@@ -51,6 +51,8 @@ static inline int string_unescape_any_inplace(char *buf)
#define ESCAPE_NP 0x10
#define ESCAPE_ANY_NP (ESCAPE_ANY | ESCAPE_NP)
#define ESCAPE_HEX 0x20
+#define ESCAPE_QUOTE 0x40
+#define ESCAPE_PRINTF (ESCAPE_ANY_NP | ESCAPE_QUOTE)
int string_escape_mem(const char *src, size_t isz, char *dst, size_t osz,
unsigned int flags, const char *only);
diff --git a/init/Kconfig b/init/Kconfig
index ab2e916f7e8b..085190760b49 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -764,6 +764,20 @@ config PRINTK_SAFE_LOG_BUF_SHIFT
13 => 8 KB for each CPU
12 => 4 KB for each CPU
+config PRINTK_INDEX
+ bool "Printk indexing debugfs interface"
+ depends on PRINTK && DEBUG_FS
+ help
+ Add support for indexing of all printk formats known at compile time
+ at <debugfs>/printk/index/<module>.
+
+ This can be used as part of maintaining daemons which monitor
+ /dev/kmsg, as it permits auditing the printk formats present in a
+ kernel, allowing monitoring of cases where monitored printks are
+ changed or no longer present.
+
+ There is no additional runtime cost to printk with this enabled.
+
#
# Architectures with an unreliable sched_clock() should select this:
#
diff --git a/kernel/module.c b/kernel/module.c
index 1e5aad812310..44df2913a046 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1064,6 +1064,7 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
blocking_notifier_call_chain(&module_notify_list,
MODULE_STATE_GOING, mod);
klp_module_going(mod);
+ pi_sec_remove(mod);
ftrace_release_mod(mod);
async_synchronize_full();
@@ -3429,6 +3430,11 @@ static int find_module_sections(struct module *mod, struct load_info *info)
sizeof(unsigned long),
&mod->num_kprobe_blacklist);
#endif
+#ifdef CONFIG_PRINTK_INDEX
+ mod->printk_index_start = section_objs(info, ".printk_index",
+ sizeof(*mod->printk_index_start),
+ &mod->printk_index_size);
+#endif
#ifdef CONFIG_HAVE_STATIC_CALL_INLINE
mod->static_call_sites = section_objs(info, ".static_call_sites",
sizeof(*mod->static_call_sites),
@@ -3860,6 +3866,7 @@ static noinline int do_init_module(struct module *mod)
blocking_notifier_call_chain(&module_notify_list,
MODULE_STATE_GOING, mod);
klp_module_going(mod);
+ pi_sec_remove(mod);
ftrace_release_mod(mod);
free_module(mod);
wake_up_all(&module_wq);
@@ -3954,11 +3961,15 @@ static int prepare_coming_module(struct module *mod)
if (err)
return err;
+ pi_sec_store(mod);
+
err = blocking_notifier_call_chain_robust(&module_notify_list,
MODULE_STATE_COMING, MODULE_STATE_GOING, mod);
err = notifier_to_errno(err);
- if (err)
+ if (err) {
klp_module_going(mod);
+ pi_sec_remove(mod);
+ }
return err;
}
@@ -4172,6 +4183,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
blocking_notifier_call_chain(&module_notify_list,
MODULE_STATE_GOING, mod);
klp_module_going(mod);
+ pi_sec_remove(mod);
bug_cleanup:
mod->state = MODULE_STATE_GOING;
/* module_bug_cleanup needs module_mutex protection */
diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index eee3dc9b60a9..d118739874c0 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -3,3 +3,4 @@ obj-y = printk.o
obj-$(CONFIG_PRINTK) += printk_safe.o
obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o
obj-$(CONFIG_PRINTK) += printk_ringbuffer.o
+obj-$(CONFIG_PRINTK_INDEX) += index.o
diff --git a/kernel/printk/index.c b/kernel/printk/index.c
new file mode 100644
index 000000000000..8765d982c8d9
--- /dev/null
+++ b/kernel/printk/index.c
@@ -0,0 +1,183 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * printk/index.c - Userspace indexing of printk formats
+ */
+
+#include <linux/debugfs.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+
+/**
+ * struct pi_sec - printk index section metadata
+ *
+ * @file: The debugfs file where userspace can index these printk formats
+ * @start: Section start boundary
+ * @end: Section end boundary
+ *
+ * Allocated and populated by pi_sec_store.
+ *
+ * @mod is NULL if the printk formats in question are built in to vmlinux
+ * itself.
+ *
+ * @file may be an ERR_PTR value if the file or one of its ancestors was not
+ * successfully created.
+ */
+struct pi_sec {
+ struct dentry *file;
+ struct pi_object *start;
+ struct pi_object *end;
+};
+
+/* The base dir for module formats, typically debugfs/printk/index/ */
+struct dentry *dfs_index;
+
+#ifdef CONFIG_MODULES
+static const char *pi_get_module_name(struct module *mod)
+{
+ return mod ? mod->name : "vmlinux";
+}
+
+void pi_sec_remove(struct module *mod)
+{
+ if (!mod || !mod->pi_sec)
+ return;
+
+ debugfs_remove(mod->pi_sec->file);
+ kfree(mod->pi_sec);
+ mod->pi_sec = NULL;
+}
+#else
+static const char *pi_get_module_name(struct module *mod)
+{
+ return "vmlinux";
+}
+
+void pi_sec_remove(struct module *mod)
+{
+}
+#endif
+
+static void *pi_next(struct seq_file *s, void *v, loff_t *pos)
+{
+ const struct pi_sec *ps = s->file->f_inode->i_private;
+ struct pi_object *pi = NULL;
+ loff_t idx = *pos - 1;
+
+ ++*pos;
+
+ if (idx == -1)
+ return SEQ_START_TOKEN;
+
+ pi = ps->start + idx;
+
+ return pi < ps->end ? pi : NULL;
+}
+
+static void *pi_start(struct seq_file *s, loff_t *pos)
+{
+ return pi_next(s, NULL, pos);
+}
+
+static int pi_show(struct seq_file *s, void *v)
+{
+ const struct pi_object *pi = v;
+ int level = LOGLEVEL_DEFAULT;
+ enum log_flags lflags = 0;
+ u16 prefix_len;
+
+ if (v == SEQ_START_TOKEN) {
+ seq_puts(s,
+ "# <level[,flags]> filename:line function \"format\"\n");
+ return 0;
+ }
+
+ prefix_len = parse_prefix(pi->fmt, &level, &lflags);
+ seq_printf(s, "<%d%s> %s:%d %s \"",
+ level, lflags & LOG_CONT ? ",c" : "", pi->file,
+ pi->line, pi->func);
+ seq_escape_printf_format(s, pi->fmt + prefix_len);
+ seq_puts(s, "\"\n");
+
+ return 0;
+}
+
+static void pi_stop(struct seq_file *p, void *v)
+{
+}
+
+static const struct seq_operations dfs_index_seq_ops = {
+ .start = pi_start,
+ .next = pi_next,
+ .show = pi_show,
+ .stop = pi_stop,
+};
+
+
+static int pi_open(struct inode *inode, struct file *file)
+{
+ return seq_open(file, &dfs_index_seq_ops);
+}
+
+static const struct file_operations dfs_index_fops = {
+ .open = pi_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release
+};
+
+
+void pi_sec_store(struct module *mod)
+{
+ struct pi_sec *ps = NULL;
+ struct pi_object *start = NULL, *end = NULL;
+
+ ps = kmalloc(sizeof(struct pi_sec), GFP_KERNEL);
+ if (!ps)
+ return;
+
+#ifdef CONFIG_MODULES
+ if (mod) {
+ start = mod->printk_index_start;
+ end = start + mod->printk_index_size;
+ }
+#endif
+
+ if (!mod) {
+ /* vmlinux */
+ start = __start_printk_index;
+ end = __stop_printk_index;
+ }
+
+ ps->start = start;
+ ps->end = end;
+ ps->file = debugfs_create_file(pi_get_module_name(mod), 0444, dfs_index,
+ ps, &dfs_index_fops);
+
+ if (IS_ERR(ps->file)) {
+ pi_sec_remove(mod);
+ return;
+ }
+
+#ifdef CONFIG_MODULES
+ if (mod)
+ mod->pi_sec = ps;
+#endif
+
+ /*
+ * vmlinux's pi_sec is only accessible as private data on the inode,
+ * since we never have to free it.
+ */
+}
+
+static int __init pi_init(void)
+{
+ struct dentry *dfs_root = debugfs_create_dir("printk", NULL);
+
+ dfs_index = debugfs_create_dir("index", dfs_root);
+ pi_sec_store(NULL);
+
+ return 0;
+}
+
+core_initcall(pi_init);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5a95c688621f..7a6eaa912b3a 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -350,11 +350,6 @@ static int console_msg_format = MSG_FORMAT_DEFAULT;
* non-prinatable characters are escaped in the "\xff" notation.
*/
-enum log_flags {
- LOG_NEWLINE = 2, /* text ended with a newline */
- LOG_CONT = 8, /* text is a fragment of a continuation line */
-};
-
/*
* The logbuf_lock protects kmsg buffer, indices, counters. This can be taken
* within the scheduler's rq lock. It must be released before calling
@@ -1895,7 +1890,7 @@ static inline u32 printk_caller_id(void)
*
* Return: The length of the parsed level and control flags.
*/
-static u16 parse_prefix(char *text, int *level, enum log_flags *lflags)
+u16 parse_prefix(const char *text, int *level, enum log_flags *lflags)
{
u16 prefix_len = 0;
int kern_level;
@@ -2111,10 +2106,13 @@ int vprintk_default(const char *fmt, va_list args)
EXPORT_SYMBOL_GPL(vprintk_default);
/**
- * printk - print a kernel message
+ * _printk - print a kernel message
* @fmt: format string
*
- * This is printk(). It can be called from any context. We want it to work.
+ * This is _printk(). It can be called from any context. We want it to work.
+ *
+ * If printk indexing is enabled, _printk() is called from pi_sec_elf_embed.
+ * Otherwise, printk is simply #defined to _printk.
*
* We try to grab the console_lock. If we succeed, it's easy - we log the
* output and call the console drivers. If we fail to get the semaphore, we
@@ -2131,7 +2129,7 @@ EXPORT_SYMBOL_GPL(vprintk_default);
*
* See the vsnprintf() documentation for format string extensions over C99.
*/
-asmlinkage __visible int printk(const char *fmt, ...)
+asmlinkage __visible int _printk(const char *fmt, ...)
{
va_list args;
int r;
@@ -2142,7 +2140,7 @@ asmlinkage __visible int printk(const char *fmt, ...)
return r;
}
-EXPORT_SYMBOL(printk);
+EXPORT_SYMBOL(_printk);
#else /* CONFIG_PRINTK */
@@ -3133,7 +3131,7 @@ int vprintk_deferred(const char *fmt, va_list args)
return r;
}
-int printk_deferred(const char *fmt, ...)
+int _printk_deferred(const char *fmt, ...)
{
va_list args;
int r;
diff --git a/lib/string_helpers.c b/lib/string_helpers.c
index 7f2d5fbaf243..86bee1b4d392 100644
--- a/lib/string_helpers.c
+++ b/lib/string_helpers.c
@@ -436,6 +436,24 @@ static bool escape_hex(unsigned char c, char **dst, char *end)
return true;
}
+static bool escape_quote(unsigned char c, char **dst, char *end)
+{
+ char *out = *dst;
+
+ if (c != '"')
+ return false;
+
+ if (out < end)
+ *out = '\\';
+ ++out;
+ if (out < end)
+ *out = '"';
+ ++out;
+
+ *dst = out;
+ return true;
+}
+
/**
* string_escape_mem - quote characters in the given memory buffer
* @src: source buffer (unescaped)
@@ -487,6 +505,11 @@ static bool escape_hex(unsigned char c, char **dst, char *end)
* all previous together
* %ESCAPE_HEX:
* '\xHH' - byte with hexadecimal value HH (2 digits)
+ * %ESCAPE_QUOTE:
+ * '"' - ASCII quotation mark
+ * %ESCAPE_PRINTF:
+ * anything one would usually have to quote inside "" to printf,
+ * ie. ESCAPE_SPACE + ESCAPE_SPECIAL + ESCAPE_NP + ESCAPE_QUOTE
*
* Return:
* The total size of the escaped output that would be generated for
@@ -515,7 +538,8 @@ int string_escape_mem(const char *src, size_t isz, char *dst, size_t osz,
* In these cases we just pass through a character to the
* output buffer.
*/
- if ((flags & ESCAPE_NP && isprint(c)) ||
+ if ((flags & ESCAPE_NP && isprint(c) &&
+ (!(flags & ESCAPE_QUOTE) || c != '\"')) ||
(is_dict && !strchr(only, c))) {
/* do nothing */
} else {
@@ -525,6 +549,9 @@ int string_escape_mem(const char *src, size_t isz, char *dst, size_t osz,
if (flags & ESCAPE_SPECIAL && escape_special(c, &p, end))
continue;
+ if (flags & ESCAPE_QUOTE && escape_quote(c, &p, end))
+ continue;
+
if (flags & ESCAPE_NULL && escape_null(c, &p, end))
continue;
diff --git a/lib/test-string_helpers.c b/lib/test-string_helpers.c
index 10360d4ea273..619ddcffdd01 100644
--- a/lib/test-string_helpers.c
+++ b/lib/test-string_helpers.c
@@ -142,6 +142,9 @@ static const struct test_string_2 escape0[] __initconst = {{
.s1 = {{
.out = "\\\\h\\\\\"\\a\\e\\\\",
.flags = ESCAPE_SPECIAL,
+ },{
+ .out = "\\\\h\\\\\\\"\\a\\e\\\\",
+ .flags = ESCAPE_SPECIAL | ESCAPE_QUOTE,
},{
.out = "\\\\\\150\\\\\\042\\a\\e\\\\",
.flags = ESCAPE_SPECIAL | ESCAPE_OCTAL,
@@ -199,6 +202,9 @@ static const struct test_string_2 escape0[] __initconst = {{
},{
.out = "\\x1bb \\C\\x07\"\\x90\\x0d]",
.flags = ESCAPE_NP | ESCAPE_HEX,
+ },{
+ .out = "\\eb \\C\\a\\\"\\220\\r]",
+ .flags = ESCAPE_PRINTF,
},{
/* terminator */
}},
--
2.30.2
Hi Chris,
I love your patch! Yet something to improve:
[auto build test ERROR on jeyu/modules-next]
[also build test ERROR on linux/master soc/for-next openrisc/for-next powerpc/next uml/linux-next tip/x86/core asm-generic/master linus/master v5.12-rc2]
[cannot apply to pmladek/for-next next-20210310]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Chris-Down/printk-Userspace-format-enumeration-support/20210310-103231
base: https://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux.git modules-next
config: openrisc-randconfig-r022-20210308 (attached as .config)
compiler: or1k-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/097254e932767fc7d5ba0243a83265017d427d74
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Chris-Down/printk-Userspace-format-enumeration-support/20210310-103231
git checkout 097254e932767fc7d5ba0243a83265017d427d74
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=openrisc
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All errors (new ones prefixed by >>):
or1k-linux-ld: arch/openrisc/kernel/entry.o: in function `_external_irq_handler':
>> (.text+0x5e4): undefined reference to `_printk'
(.text+0x5e4): relocation truncated to fit: R_OR1K_INSN_REL_26 against undefined symbol `_printk'
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
On Wed, Mar 10, 2021 at 02:30:31AM +0000, Chris Down wrote:
> + ps->file = debugfs_create_file(pi_get_module_name(mod), 0444, dfs_index,
> + ps, &dfs_index_fops);
> +
> + if (IS_ERR(ps->file)) {
> + pi_sec_remove(mod);
> + return;
> + }
No need to check this and try to clean up if there is a problem, just
save the pointer off and call debugfs_remove() when you want to clean
up.
Or better yet, no need to save anything, you can always look it up when
you want to remove it, that will save you one pointer per module.
thanks,
greg k-h
Greg Kroah-Hartman writes:
>On Wed, Mar 10, 2021 at 02:30:31AM +0000, Chris Down wrote:
>> + ps->file = debugfs_create_file(pi_get_module_name(mod), 0444, dfs_index,
>> + ps, &dfs_index_fops);
>> +
>> + if (IS_ERR(ps->file)) {
>> + pi_sec_remove(mod);
>> + return;
>> + }
>
>No need to check this and try to clean up if there is a problem, just
>save the pointer off and call debugfs_remove() when you want to clean
>up.
Petr, what are your thoughts on this, since you requested the cleanup on
debugfs failure? :-)
>Or better yet, no need to save anything, you can always look it up when
>you want to remove it, that will save you one pointer per module.
That's a good point, and with that maybe we can even do away with the pi_sec
entirely then since that only leaves start/end pointers which we can calculate
on demand from existing data.
On Wed, Mar 10, 2021 at 12:12:57PM +0000, Chris Down wrote:
> Greg Kroah-Hartman writes:
> > On Wed, Mar 10, 2021 at 02:30:31AM +0000, Chris Down wrote:
> > > + ps->file = debugfs_create_file(pi_get_module_name(mod), 0444, dfs_index,
> > > + ps, &dfs_index_fops);
> > > +
> > > + if (IS_ERR(ps->file)) {
> > > + pi_sec_remove(mod);
> > > + return;
> > > + }
> >
> > No need to check this and try to clean up if there is a problem, just
> > save the pointer off and call debugfs_remove() when you want to clean
> > up.
>
> Petr, what are your thoughts on this, since you requested the cleanup on
> debugfs failure? :-)
There is nothing to "clean up" if there is a debugfs failure here so I
don't see the need.
> > Or better yet, no need to save anything, you can always look it up when
> > you want to remove it, that will save you one pointer per module.
>
> That's a good point, and with that maybe we can even do away with the pi_sec
> entirely then since that only leaves start/end pointers which we can
> calculate on demand from existing data.
Please do, that makes the code simpler overall.
thanks,
greg k-h
Hey Petr,
Chris Down writes:
> $ head -1 vmlinux; shuf -n 5 vmlinux
> # <level[,flags]> filename:line function "format"
> <5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
> <4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
> <6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
> <6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
> <6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"
Regardless of any of the internals, how does this format look to you? I ask
because the sooner we agree on the format, the sooner I can provide an interim
version of this patch to internal customers, even if the eventual
implementation changes a little :-)
Thanks,
Chris
Hi Chris,
I love your patch! Yet something to improve:
[auto build test ERROR on jeyu/modules-next]
[also build test ERROR on linux/master soc/for-next openrisc/for-next powerpc/next uml/linux-next tip/x86/core asm-generic/master linus/master v5.12-rc2]
[cannot apply to pmladek/for-next next-20210310]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Chris-Down/printk-Userspace-format-enumeration-support/20210310-103231
base: https://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux.git modules-next
config: mips-randconfig-r022-20210309 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project cd9a69289c7825d54450cb6829fef2c8e0f1963a)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install mips cross compiling tool for clang build
# apt-get install binutils-mips-linux-gnu
# https://github.com/0day-ci/linux/commit/097254e932767fc7d5ba0243a83265017d427d74
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Chris-Down/printk-Userspace-format-enumeration-support/20210310-103231
git checkout 097254e932767fc7d5ba0243a83265017d427d74
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=mips
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All errors (new ones prefixed by >>):
>> ld.lld: error: undefined symbol: printk
>>> referenced by kernel/genex.o:(handle_mcheck) in archive arch/mips/built-in.a
>>> referenced by kernel/genex.o:(handle_reserved) in archive arch/mips/built-in.a
>>> did you mean: _printk
>>> defined in: kernel/built-in.a(printk/printk.o)
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
On Wed 2021-03-10 12:17:51, Chris Down wrote:
> Hey Petr,
>
> Chris Down writes:
> > $ head -1 vmlinux; shuf -n 5 vmlinux
> > # <level[,flags]> filename:line function "format"
> > <5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
> > <4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
> > <6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
> > <6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
> > <6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"
>
> Regardless of any of the internals, how does this format look to you? I ask
> because the sooner we agree on the format, the sooner I can provide an
> interim version of this patch to internal customers, even if the eventual
> implementation changes a little :-)
It looks good to me.
I do not have any better idea. And I believe that the filename, line,
and function name might be useful.
Best Regards,
Petr
On Wed 2021-03-10 13:16:43, Greg Kroah-Hartman wrote:
> On Wed, Mar 10, 2021 at 12:12:57PM +0000, Chris Down wrote:
> > Greg Kroah-Hartman writes:
> > > On Wed, Mar 10, 2021 at 02:30:31AM +0000, Chris Down wrote:
> > > > + ps->file = debugfs_create_file(pi_get_module_name(mod), 0444, dfs_index,
> > > > + ps, &dfs_index_fops);
> > > > +
> > > > + if (IS_ERR(ps->file)) {
> > > > + pi_sec_remove(mod);
> > > > + return;
> > > > + }
> > >
> > > No need to check this and try to clean up if there is a problem, just
> > > save the pointer off and call debugfs_remove() when you want to clean
> > > up.
> >
> > Petr, what are your thoughts on this, since you requested the cleanup on
> > debugfs failure? :-)
>
> There is nothing to "clean up" if there is a debugfs failure here so I
> don't see the need.
My main concern is that the allocated struct pi_sec must not be leaked
when debugfs file was not created.
I still have to check if it would be freed even without the file
when the module is going out.
> > > Or better yet, no need to save anything, you can always look it up when
> > > you want to remove it, that will save you one pointer per module.
> >
> > That's a good point, and with that maybe we can even do away with the pi_sec
> > entirely then since that only leaves start/end pointers which we can
> > calculate on demand from existing data.
>
> Please do, that makes the code simpler overall.
Yup, that might make things even easier. Well, I still have to go and
try to better understand the new patch.
Best Regards,
Petr
On Thu, Mar 11, 2021 at 10:34:46AM +0100, Petr Mladek wrote:
> On Wed 2021-03-10 13:16:43, Greg Kroah-Hartman wrote:
> > On Wed, Mar 10, 2021 at 12:12:57PM +0000, Chris Down wrote:
> > > Greg Kroah-Hartman writes:
> > > > On Wed, Mar 10, 2021 at 02:30:31AM +0000, Chris Down wrote:
> > > > > + ps->file = debugfs_create_file(pi_get_module_name(mod), 0444, dfs_index,
> > > > > + ps, &dfs_index_fops);
> > > > > +
> > > > > + if (IS_ERR(ps->file)) {
> > > > > + pi_sec_remove(mod);
> > > > > + return;
> > > > > + }
> > > >
> > > > No need to check this and try to clean up if there is a problem, just
> > > > save the pointer off and call debugfs_remove() when you want to clean
> > > > up.
> > >
> > > Petr, what are your thoughts on this, since you requested the cleanup on
> > > debugfs failure? :-)
> >
> > There is nothing to "clean up" if there is a debugfs failure here so I
> > don't see the need.
>
> My main concern is that the allocated struct pi_sec must not be leaked
> when debugfs file was not created.
>
> I still have to check if it would be freed even without the file
> when the module is going out.
To me it looks like it still will happen as pi_sec_remove() will be
called either way.
thanks,
greg k-h
On Wed 2021-03-10 02:30:31, Chris Down wrote:
> We have a number of systems industry-wide that have a subset of their
> functionality that works as follows:
>
> 1. Receive a message from local kmsg, serial console, or netconsole;
> 2. Apply a set of rules to classify the message;
> 3. Do something based on this classification (like scheduling a
> remediation for the machine), rinse, and repeat.
>
> As a couple of examples of places we have this implemented just inside
> Facebook, although this isn't a Facebook-specific problem, we have this
> inside our netconsole processing (for alarm classification), and as part
> of our machine health checking. We use these messages to determine
> fairly important metrics around production health, and it's important
> that we get them right.
>
> While for some kinds of issues we have counters, tracepoints, or metrics
> with a stable interface which can reliably indicate the issue, in order
> to react to production issues quickly we need to work with the interface
> which most kernel developers naturally use when developing: printk.
> This patch provides a solution to the issue of silently changed or
> deleted printks: we record pointers to all printk format strings known
> at compile time into a new .printk_index section, both in vmlinux and
> modules. At runtime, this can then be iterated by looking at
> <debugfs>/printk/index/<module>, which emits the following format, both
> readable by humans and able to be parsed by machines:
>
> $ head -1 vmlinux; shuf -n 5 vmlinux
> # <level[,flags]> filename:line function "format"
> <5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
> <4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
> <6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
> <6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
> <6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"
>
> diff --git a/fs/seq_file.c b/fs/seq_file.c
> index 71a274e7f903..0fd3ae1051d9 100644
> --- a/fs/seq_file.c
> +++ b/fs/seq_file.c
> @@ -376,6 +376,27 @@ void seq_escape(struct seq_file *m, const char *s, const char *esc)
> }
> EXPORT_SYMBOL(seq_escape);
>
> +/**
> + * seq_escape_printf_format - print string into buffer, escaping
> + * characters that are escaped in printf format (including '"')
> + * @m: target buffer
> + * @s: string
> + *
> + * Puts string into buffer and escape characters that are
> + * escaped in printf format.
> + * Use seq_has_overflowed() to check for errors.
> + */
> +void seq_escape_printf_format(struct seq_file *m, const char *s)
> +{
> + char *buf;
> + size_t size = seq_get_buf(m, &buf);
> + int ret;
> +
> + ret = string_escape_str(s, buf, size, ESCAPE_PRINTF, NULL);
> + seq_commit(m, ret < size ? ret : -1);
> +}
> +EXPORT_SYMBOL(seq_escape_printf_format);
> +
Please, create this API in a separate patch and add Al Viro into CC.
> void seq_escape_mem_ascii(struct seq_file *m, const char *src, size_t isz)
> {
> char *buf;
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 34b7e0d2346c..d4e45714405f 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -309,6 +309,17 @@
> #define ACPI_PROBE_TABLE(name)
> #endif
>
> +#ifdef CONFIG_PRINTK_INDEX
> +#define PRINTK_INDEX \
> + .printk_index : AT(ADDR(.printk_index) - LOAD_OFFSET) { \
> + __start_printk_index = .; \
> + *(.printk_index) \
> + __stop_printk_index = .; \
> + }
> +#else
> +#define PRINTK_INDEX
> +#endif
Please, move this below #define TRACEDATA. We should follow the
existing ordering of these definitions.
> +
> #ifdef CONFIG_THERMAL
> #define THERMAL_TABLE(name) \
> . = ALIGN(8); \
> @@ -480,6 +491,8 @@
> \
> TRACEDATA \
> \
> + PRINTK_INDEX \
> + \
> /* Kernel symbol table: Normal symbols */ \
> __ksymtab : AT(ADDR(__ksymtab) - LOAD_OFFSET) { \
> __start___ksymtab = .; \
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 7a0bcb5b1ffc..5d466b4a23b9 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -516,6 +516,12 @@ struct module {
> struct klp_modinfo *klp_info;
> #endif
>
> +#ifdef CONFIG_PRINTK_INDEX
> + unsigned int printk_index_size;
> + struct pi_object *printk_index_start;
Sigh, you probably got somehow confused by the pi_object name
that I proposed last time. It was intended for a structure
describing the whole module or vmlinux. You are using
it here for one entry in the format table.
> + struct pi_sec *pi_sec;
It seems that it will be enough to store pointer to
the dentry.
> +#endif
So, plese use:
#ifdef CONFIG_PRINTK_INDEX
struct pi_entry *pi_entries;
unsigned int pi_num_entries;
struct dentry *pi_file;
#endif
"pi_" prefix fits the rest of the API. "entry" will get
explained below in printk.h.
Also, please, add Jessica Yu <[email protected]> into CC. She should be
aware of this change.
> +
> #ifdef CONFIG_MODULE_UNLOAD
> /* What modules depend on me? */
> struct list_head source_list;
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index d3c08095a9a3..cd2977df820e 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -164,6 +164,11 @@ static inline void printk_nmi_direct_exit(void) { }
> struct dev_printk_info;
>
> #ifdef CONFIG_PRINTK
> +enum log_flags {
> + LOG_NEWLINE = 2, /* text ended with a newline */
> + LOG_CONT = 8, /* text is a fragment of a continuation line */
> +};
Please, rename the structure to "printk_info_flags" and put it into
kernel/printk/printk_ringbuffer.h
These values are used by @flags in struct printk_info. It would even
makes sense to replace
- u8 flags:5; /* internal record flags */
+ enum printk_info_flags flags:5; /* internal record flags */
But I am not sure if enum might be a bitfield in C.
The name "log_flags" comes from the commit 084681d14e429cb6
"printk: flush continuation lines immediately to console". It was
stored in struct log.
The name was misleading. "struct log" contained meta information for
a single printk record (line). But the name "log" suggests that it is
a global flag for the entire log.
We should have renamed the structure when replacing "struct log" but
we missed it.
Please, do this in a separate patch. Also please rename the variables
using this type from "lflags" to "info_flags".
> +
> asmlinkage __printf(4, 0)
> int vprintk_emit(int facility, int level,
> const struct dev_printk_info *dev_info,
> @@ -173,12 +178,12 @@ asmlinkage __printf(1, 0)
> int vprintk(const char *fmt, va_list args);
>
> asmlinkage __printf(1, 2) __cold
> -int printk(const char *fmt, ...);
> +int _printk(const char *fmt, ...);
>
> /*
> * Special printk facility for scheduler/timekeeping use only, _DO_NOT_USE_ !
> */
> -__printf(1, 2) __cold int printk_deferred(const char *fmt, ...);
> +__printf(1, 2) __cold int _printk_deferred(const char *fmt, ...);
>
> /*
> * Please don't use printk_ratelimit(), because it shares ratelimiting state
> @@ -206,6 +211,7 @@ void __init setup_log_buf(int early);
> __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
> void dump_stack_print_info(const char *log_lvl);
> void show_regs_print_info(const char *log_lvl);
> +u16 parse_prefix(const char *text, int *level, enum log_flags *lflags);
Please, rename it to printk_parse_prefix() and define it in
kernel/printk/internal.h.
> extern asmlinkage void dump_stack(void) __cold;
> extern void printk_safe_flush(void);
> extern void printk_safe_flush_on_panic(void);
> @@ -301,6 +307,64 @@ extern int kptr_restrict;
> #define pr_fmt(fmt) fmt
> #endif
>
> +struct module;
> +
> +#ifdef CONFIG_PRINTK_INDEX
> +extern void pi_sec_store(struct module *mod);
> +extern void pi_sec_remove(struct module *mod);
> +
> +struct pi_object {
> + const char *fmt;
> + const char *func;
> + const char *file;
> + unsigned int line;
> +};
It seems that you got confused by pi_object name that I proposed last time.
"object" was supposed to be used for structure storing information
about module or vmlinux. It is "struct pi_sec" in this patch.
You used it for storing the information about the particular format.
This structure did not exist in the previous patch at all.
The name makes some sense even this way. But it confuses me because
we use it in livepatch code the other way.
Please, rename this structure to "pi_entry"?
> +
> +extern struct pi_object __start_printk_index[];
> +extern struct pi_object __stop_printk_index[];
These are used only in index.c. It should be enough to do this
declaration there.
> +
> +#define pi_sec_elf_embed(_p_func, _fmt, ...) \
> + ({ \
> + int _p_ret; \
> + \
> + if (__builtin_constant_p(_fmt)) { \
> + /*
> + * The compiler may not be able to eliminate this, so
> + * we need to make sure that it doesn't see any
> + * hypothetical assignment for non-constants even
> + * though this is already inside the
> + * __builtin_constant_p guard.
> + */ \
> + static struct pi_object _pi \
> + __section(".printk_index") = { \
> + .fmt = __builtin_constant_p(_fmt) ? (_fmt) : NULL, \
> + .func = __func__, \
> + .file = __FILE__, \
> + .line = __LINE__, \
> + }; \
> + _p_ret = _p_func(_pi.fmt, ##__VA_ARGS__); \
> + } else \
> + _p_ret = _p_func(_fmt, ##__VA_ARGS__); \
> + \
> + _p_ret; \
> + })
> +
> +#define printk(fmt, ...) pi_sec_elf_embed(_printk, fmt, ##__VA_ARGS__)
> +#define printk_deferred(fmt, ...) \
> + pi_sec_elf_embed(_printk_deferred, fmt, ##__VA_ARGS__)
> +#else /* !CONFIG_PRINTK_INDEX */
> +static inline void pi_sec_store(struct module *mod)
> +{
> +}
> +
> +static inline void pi_sec_remove(struct module *mod)
> +{
> +}
> +
> +#define printk(...) _printk(__VA_ARGS__)
> +#define printk_deferred(...) _printk_deferred(__VA_ARGS__)
> +#endif /* CONFIG_PRINTK_INDEX */
> +
> /**
> * pr_emerg - Print an emergency-level message
> * @fmt: format string
> diff --git a/init/Kconfig b/init/Kconfig
> index ab2e916f7e8b..085190760b49 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -764,6 +764,20 @@ config PRINTK_SAFE_LOG_BUF_SHIFT
> 13 => 8 KB for each CPU
> 12 => 4 KB for each CPU
>
> +config PRINTK_INDEX
> + bool "Printk indexing debugfs interface"
> + depends on PRINTK && DEBUG_FS
> + help
> + Add support for indexing of all printk formats known at compile time
> + at <debugfs>/printk/index/<module>.
> +
> + This can be used as part of maintaining daemons which monitor
> + /dev/kmsg, as it permits auditing the printk formats present in a
> + kernel, allowing monitoring of cases where monitored printks are
Nit: I would say: "allowing to detect cases where motitored printks are..."
I understand monitoring as a continuous process while the printk
formats are likely checked only once when the monitor gets started.
> + changed or no longer present.
> +
> + There is no additional runtime cost to printk with this enabled.
> +
> #
> # Architectures with an unreliable sched_clock() should select this:
> #
> diff --git a/kernel/module.c b/kernel/module.c
> index 1e5aad812310..44df2913a046 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -1064,6 +1064,7 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
> blocking_notifier_call_chain(&module_notify_list,
> MODULE_STATE_GOING, mod);
> klp_module_going(mod);
> + pi_sec_remove(mod);
Is there any particular reason why this is not done via the module
notifier, please?
Other subsystems hardcode their callbacks here only when they
require some special ordering that could not be achieved by
the notifiers.
The hardcoded callbacks complicate the error paths in
the module loader code.
Using notifiers would also help to avoid several declarations
in the global printk.h.
> ftrace_release_mod(mod);
>
> async_synchronize_full();
> diff --git a/kernel/printk/index.c b/kernel/printk/index.c
> new file mode 100644
> index 000000000000..8765d982c8d9
> --- /dev/null
> +++ b/kernel/printk/index.c
> @@ -0,0 +1,183 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * printk/index.c - Userspace indexing of printk formats
> + */
> +
> +#include <linux/debugfs.h>
> +#include <linux/module.h>
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +
> +/**
> + * struct pi_sec - printk index section metadata
> + *
> + * @file: The debugfs file where userspace can index these printk formats
> + * @start: Section start boundary
> + * @end: Section end boundary
> + *
> + * Allocated and populated by pi_sec_store.
> + *
> + * @mod is NULL if the printk formats in question are built in to vmlinux
> + * itself.
@mod is not longer in this structure.
> + *
> + * @file may be an ERR_PTR value if the file or one of its ancestors was not
> + * successfully created.
> + */
> +struct pi_sec {
> + struct dentry *file;
> + struct pi_object *start;
> + struct pi_object *end;
> +};
This structure should not be needed, see below.
> +
> +/* The base dir for module formats, typically debugfs/printk/index/ */
> +struct dentry *dfs_index;
> +
> +#ifdef CONFIG_MODULES
> +static const char *pi_get_module_name(struct module *mod)
> +{
> + return mod ? mod->name : "vmlinux";
> +}
> +
> +void pi_sec_remove(struct module *mod)
> +{
> + if (!mod || !mod->pi_sec)
> + return;
> +
> + debugfs_remove(mod->pi_sec->file);
> + kfree(mod->pi_sec);
> + mod->pi_sec = NULL;
> +}
> +#else
> +static const char *pi_get_module_name(struct module *mod)
> +{
> + return "vmlinux";
> +}
> +
> +void pi_sec_remove(struct module *mod)
> +{
> +}
> +#endif
>
> +static void *pi_next(struct seq_file *s, void *v, loff_t *pos)
> +{
> + const struct pi_sec *ps = s->file->f_inode->i_private;
> + struct pi_object *pi = NULL;
Please, call the variables by the content and not by prefix.
A variable called "pi" might include anything used by "pi" API.
> + loff_t idx = *pos - 1;
Why is -1 here, please?
Later note: I guess that it is because you wanted to reuse
this in pi_start(). Uff, it made the code really
twisted.
> +
> + ++*pos;
> +
> + if (idx == -1)
> + return SEQ_START_TOKEN;
This would not happen if you did not substraced the one above.
> +
> + pi = ps->start + idx;
> +
> + return pi < ps->end ? pi : NULL;
> +}
Honestly, the names and the coding style made the function almost
unreadable.
OK, I suggest that pi_next will return pointer to struct pi_entry.
The code might look like:
static struct pi_entry *pi_get_entry(struct module *mod, int idx)
{
struct pi_entry *entries;
int num_entries;
if (mod) {
entries = mod->entries;
num_entries = num->num_entries;
} else {
entries = vmlinux_entries;
num_entries = vmlinux_num_entries;
}
if (idx >= num_entries)
return NULL;
return entries[idx];
}
static void *pi_next(struct seq_file *s, void *v, loff_t *pos)
{
const struct module *mod = (struct module*)s->file->f_inode->i_private;
struct pi_entry *entry;
entry = pi_get_entry(mod, *pos);
*(pos)++;
return entry;
}
> +
> +static void *pi_start(struct seq_file *s, loff_t *pos)
> +{
> + return pi_next(s, NULL, pos);
Grr, you complicated pi_next() just return SEQ_START_TOKEN here
and keep pos as is? The following should do the same job:
static void *pi_start(struct seq_file *s, loff_t *pos)
{
return SEQ_START_TOKEN;
}
Do not forget to move it above pi_next() where it belongs logically.
> +}
> +
> +static int pi_show(struct seq_file *s, void *v)
> +{
We are passing struct pi_entry via "v" pointer now:
static int pi_show(struct seq_file *s, void *v)
{
const struct pi_entry *entry = v;
int level = LOGLEVEL_DEFAULT;
enum printk_info_flags info_flags = 0;
u16 prefix_len;
if (v == SEQ_START_TOKEN) {
seq_puts(s,
"# <level[,flags]> filename:line function \"format\"\n");
return 0;
}
prefix_len = printk_parse_prefix(entry->fmt, &level, &lflags);
seq_printf(s, "<%d%s> %s:%d %s \"",
level, info_flags & LOG_CONT ? ",c" : "", entry->file,
entry->line, entry->func);
seq_escape_printf_format(s, entry->fmt + prefix_len);
seq_puts(s, "\"\n");
return 0;
}
> +static void pi_stop(struct seq_file *p, void *v)
> +{
> +}
> +
> +static const struct seq_operations dfs_index_seq_ops = {
> + .start = pi_start,
> + .next = pi_next,
> + .show = pi_show,
> + .stop = pi_stop,
> +};
> +
> +
> +static int pi_open(struct inode *inode, struct file *file)
> +{
> + return seq_open(file, &dfs_index_seq_ops);
> +}
> +
> +static const struct file_operations dfs_index_fops = {
> + .open = pi_open,
> + .read = seq_read,
> + .llseek = seq_lseek,
> + .release = seq_release
> +};
> +
> +
> +void pi_sec_store(struct module *mod)
Please, rename this to pi_add_module(). We do not really store
anything here. I would write the code this way:
static struct dentry *pi_create_file(struct module *mod, cost char *name)
{
return debugfs_create_file(name, 0444, dfs_index,
mod, &dfs_index_fops);
}
static void pi_remove_file(struct dentry *dentry)
{
debugfs_remove(dentry);
}
static struct pi_entry *vmlinux_entries;
static int vmlinux_num_entries;
static int __init pi_init(void)
{
struct dentry *dfs_root;
dfs_root = debugfs_create_dir("printk", NULL);
dfs_index = debugfs_create_dir("index", dfs_root);
/* vmlinux is handled via local variables */
vmlinux_entries = __start_printk_index;
vmlinux_num_entries = __stop_printk_index - __start_printk_index;
pi_create_file(NULL, "vmlinux");
return 0;
}
The modules should be handled by notifiers:
static int pi_module_notify(struct notifier_block *nb, unsigned long op,
void *data)
struct module *mod = data;
switch (op) {
case MODULE_STATE_COMING:
mod->pi_file = pi_create_file(mod, mod->name);
break;
case MODULE_STATE_GOING:
pi_remove_file(mod->pi_file);
break;
}
}
return NOTIFY_OK;
}
> diff --git a/lib/string_helpers.c b/lib/string_helpers.c
> index 7f2d5fbaf243..86bee1b4d392 100644
> --- a/lib/string_helpers.c
> +++ b/lib/string_helpers.c
> @@ -436,6 +436,24 @@ static bool escape_hex(unsigned char c, char **dst, char *end)
> return true;
> }
>
> +static bool escape_quote(unsigned char c, char **dst, char *end)
> +{
> + char *out = *dst;
> +
> + if (c != '"')
> + return false;
> +
> + if (out < end)
> + *out = '\\';
> + ++out;
> + if (out < end)
> + *out = '"';
> + ++out;
> +
> + *dst = out;
> + return true;
Hmm, we should bundle this into escape_special(). The opposite
unescape_special() already handles the double quotes.
IMHO, the escape() and unescape() API should be in sync.
Of course, we need to make sure that it does not break some
existing users.
Please, try to udpate escape_special() and check if it is OK
for the existing users.
Anyway, please, update this API in a separate patch. Also add
Andy Shevchenko <[email protected]>
and Rasmus Villemoes <[email protected]> into CC.
> +}
> +
Please, try to put more effort into creating the function and
variable names. I know that I am probably too picky about it.
But you seem to be the other extreme.
Inconsistent, ambiguous, or meaningless names might make even few
lines of code hard to follow. It makes it write-only.
It is hard to review and maintain.
Best Regards,
Petr
Ack to all unmentioned suggestions. :-)
Petr Mladek writes:
>> + changed or no longer present.
>> +
>> + There is no additional runtime cost to printk with this enabled.
>> +
>> #
>> # Architectures with an unreliable sched_clock() should select this:
>> #
>> diff --git a/kernel/module.c b/kernel/module.c
>> index 1e5aad812310..44df2913a046 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -1064,6 +1064,7 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
>> blocking_notifier_call_chain(&module_notify_list,
>> MODULE_STATE_GOING, mod);
>> klp_module_going(mod);
>> + pi_sec_remove(mod);
>
>Is there any particular reason why this is not done via the module
>notifier, please?
>
>Other subsystems hardcode their callbacks here only when they
>require some special ordering that could not be achieved by
>the notifiers.
>
>The hardcoded callbacks complicate the error paths in
>the module loader code.
Oh! That's exactly what I feel as well, but I mistakenly thought that's what
you were asking for in the feedback for v4. Turns out I misread your statement
about storing the pointer to `struct module` (hence my message last time
querying whether it was sensible or not) as being about not using the module
notifier. Mea culpa.
>> +static void *pi_next(struct seq_file *s, void *v, loff_t *pos)
>> +{
>> + const struct pi_sec *ps = s->file->f_inode->i_private;
>> + struct pi_object *pi = NULL;
>
>Please, call the variables by the content and not by prefix.
>A variable called "pi" might include anything used by "pi" API.
>
>[...]
>
>Please, try to put more effort into creating the function and
>variable names. I know that I am probably too picky about it.
>But you seem to be the other extreme.
>
>Inconsistent, ambiguous, or meaningless names might make even few
>lines of code hard to follow. It makes it write-only.
>It is hard to review and maintain.
Hmm, I'd even say that I agree with this statement, but as I understand it a
`pi` variable always means pi_object, and `ps` always means pi_sec. I'm not
immediately seeing it as meaningless or ambiguous (although maybe your concern
was more abstractly aesthetic with overlapping the `pi_` prefix?).
The "content" here is pretty abstract, so I'm not quite sure what your
suggestion for naming them based on content is. Maybe (assuming it doesn't just
disappear, which it seems it will) a pi_sec named sec, and the pi_object named
fmt_index? I don't feel strongly that this is more clear though, so maybe you
mean something else?
On Fri 2021-03-12 13:53:20, Chris Down wrote:
> Ack to all unmentioned suggestions. :-)
>
> Petr Mladek writes:
> > > + changed or no longer present.
> > > +
> > > + There is no additional runtime cost to printk with this enabled.
> > > +
> > > #
> > > # Architectures with an unreliable sched_clock() should select this:
> > > #
> > > diff --git a/kernel/module.c b/kernel/module.c
> > > index 1e5aad812310..44df2913a046 100644
> > > --- a/kernel/module.c
> > > +++ b/kernel/module.c
> > > @@ -1064,6 +1064,7 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
> > > blocking_notifier_call_chain(&module_notify_list,
> > > MODULE_STATE_GOING, mod);
> > > klp_module_going(mod);
> > > + pi_sec_remove(mod);
> >
> > Is there any particular reason why this is not done via the module
> > notifier, please?
> >
> > Other subsystems hardcode their callbacks here only when they
> > require some special ordering that could not be achieved by
> > the notifiers.
> >
> > The hardcoded callbacks complicate the error paths in
> > the module loader code.
>
> Oh! That's exactly what I feel as well, but I mistakenly thought that's what
> you were asking for in the feedback for v4. Turns out I misread your
> statement about storing the pointer to `struct module` (hence my message
> last time querying whether it was sensible or not) as being about not using
> the module notifier. Mea culpa.
>
> > > +static void *pi_next(struct seq_file *s, void *v, loff_t *pos)
> > > +{
> > > + const struct pi_sec *ps = s->file->f_inode->i_private;
> > > + struct pi_object *pi = NULL;
> >
> > Please, call the variables by the content and not by prefix.
> > A variable called "pi" might include anything used by "pi" API.
> >
> > [...]
> >
> > Please, try to put more effort into creating the function and
> > variable names. I know that I am probably too picky about it.
> > But you seem to be the other extreme.
> >
> > Inconsistent, ambiguous, or meaningless names might make even few
> > lines of code hard to follow. It makes it write-only.
> > It is hard to review and maintain.
>
> Hmm, I'd even say that I agree with this statement, but as I understand it a
> `pi` variable always means pi_object, and `ps` always means pi_sec. I'm not
> immediately seeing it as meaningless or ambiguous (although maybe your
> concern was more abstractly aesthetic with overlapping the `pi_` prefix?).
"incosistent" was more about the previous (v4) version. v5 was fine
from this POV.
"ambiguous" and "meaningless" was primary about "pi" variable. "pi"
was used as prefix for all functions and structure names in the API.
The variable "pi" might mean any piece of information from this API.
For me it had similar meaning as "x" in the meaning of anything.
Better name would have been "object" because it was a pointer
to struct pi_object.
Another problematic name was "struct pi_sec". It makes sense for
storing start and end pointers to the elf section. But the dentry
pointer has nothing to do with an elf section.
> The "content" here is pretty abstract, so I'm not quite sure what your
> suggestion for naming them based on content is. Maybe (assuming it doesn't
> just disappear, which it seems it will) a pi_sec named sec
My problem with pi_sec is explained above. Anyway, it seems that it
will disappear.
> , and the pi_object named fmt_index?
I do not see "fmt_index" mentioned anywhere in v5 or v4. I suggested
to use "pi_entry" instead of "pi_object".
I agree that "object" and "entry" are equally abstract. As I said,
I prefer "entry" because I maintain also kernel/livepatch code
and we use "object" there in other meaning.
Alternative names would be struct pi_fmt_info or pi_fmt_rec like
metainformation or record about the printk format. They are
more specific than "entry".
> I don't feel strongly that this is more clear though, so maybe you
> mean something else?
I was pretty tired when reviewing the patch. It was not easy for me
to create the mental model of the code. I felt that some other names
would have made it easier.
Also the tricky pi_next() implementation did not help much. It looks
like you changed the code many times to get it working and did not
clean it at the end.
Best Regards,
Petr
Petr Mladek writes:
>> I don't feel strongly that this is more clear though, so maybe you
>> mean something else?
>
>I was pretty tired when reviewing the patch. It was not easy for me
>to create the mental model of the code. I felt that some other names
>would have made it easier.
>
>Also the tricky pi_next() implementation did not help much. It looks
>like you changed the code many times to get it working and did not
>clean it at the end.
No worries. I'm not totally clear on what you're asking for though: do you
meant that you'd like the SEQ_START_TOKEN logic to only be present for
pi_start, or to pull out the logic currently in pi_next into another function
and call it from both, or?
In my mind, pi_start is really just a special case of pi_next, so the code flow
makes sense to me. I'm happy to change it to whatever you like, but it's not
immediately obvious to me what that is :-)
On 10/03/2021 03.30, Chris Down wrote:
> ---
> MAINTAINERS | 5 +
> arch/arm/kernel/entry-v7m.S | 2 +-
> arch/arm/lib/backtrace-clang.S | 2 +-
> arch/arm/lib/backtrace.S | 2 +-
> arch/arm/mach-rpc/io-acorn.S | 2 +-
> arch/arm/vfp/vfphw.S | 6 +-
> arch/ia64/include/uapi/asm/cmpxchg.h | 4 +-
> arch/openrisc/kernel/entry.S | 6 +-
> arch/powerpc/kernel/head_fsl_booke.S | 2 +-
> arch/um/include/shared/user.h | 3 +-
> arch/x86/kernel/head_32.S | 2 +-
> fs/seq_file.c | 21 +++
> include/asm-generic/vmlinux.lds.h | 13 ++
> include/linux/module.h | 6 +
> include/linux/printk.h | 72 ++++++++++-
> include/linux/seq_file.h | 1 +
> include/linux/string_helpers.h | 2 +
> init/Kconfig | 14 ++
> kernel/module.c | 14 +-
> kernel/printk/Makefile | 1 +
> kernel/printk/index.c | 183 +++++++++++++++++++++++++++
> kernel/printk/printk.c | 20 ++-
> lib/string_helpers.c | 29 ++++-
> lib/test-string_helpers.c | 6 +
> 24 files changed, 386 insertions(+), 32 deletions(-)
> create mode 100644 kernel/printk/index.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3353de0c4bc8..328b3e822223 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14314,6 +14314,11 @@ S: Maintained
> F: include/linux/printk.h
> F: kernel/printk/
>
> +PRINTK INDEXING
> +R: Chris Down <[email protected]>
> +S: Maintained
> +F: kernel/printk/index.c
> +
> PRISM54 WIRELESS DRIVER
> M: Luis Chamberlain <[email protected]>
> L: [email protected]
> diff --git a/arch/arm/kernel/entry-v7m.S b/arch/arm/kernel/entry-v7m.S
> index d0e898608d30..7bde93c10962 100644
> --- a/arch/arm/kernel/entry-v7m.S
> +++ b/arch/arm/kernel/entry-v7m.S
> @@ -23,7 +23,7 @@ __invalid_entry:
> adr r0, strerr
> mrs r1, ipsr
> mov r2, lr
> - bl printk
> + bl _printk
I think it's pointless renaming the symbol to _printk, with all the
churn and reduced readability that involves (especially when reading
assembly "why are we calling _printk and not printk here?"). There's
nothing wrong with providing a macro wrapper by the same name
#define printk(bla bla) ({ do_stuff; printk(bla bla); })
Only two places would need to be updated to surround the word printk in
parentheses to suppress macro expansion: The declaration and the
definition of printk. I.e.
int (printk)(const char *s, ...)
>
> +struct module;
> +
> +#ifdef CONFIG_PRINTK_INDEX
> +extern void pi_sec_store(struct module *mod);
> +extern void pi_sec_remove(struct module *mod);
> +
> +struct pi_object {
> + const char *fmt;
> + const char *func;
> + const char *file;
> + unsigned int line;
> +};
> +
> +extern struct pi_object __start_printk_index[];
> +extern struct pi_object __stop_printk_index[];
Do you need these declarations to be visible to the whole kernel? Can't
they live in printk/index.c?
> +
> +#define pi_sec_elf_embed(_p_func, _fmt, ...) \
> + ({ \
> + int _p_ret; \
> + \
> + if (__builtin_constant_p(_fmt)) { \
> + /*
> + * The compiler may not be able to eliminate this, so
> + * we need to make sure that it doesn't see any
> + * hypothetical assignment for non-constants even
> + * though this is already inside the
> + * __builtin_constant_p guard.
> + */ \
> + static struct pi_object _pi \
static const struct pi_object?
> + __section(".printk_index") = { \
> + .fmt = __builtin_constant_p(_fmt) ? (_fmt) : NULL, \
> + .func = __func__, \
> + .file = __FILE__, \
> + .line = __LINE__, \
> + }; \
> + _p_ret = _p_func(_pi.fmt, ##__VA_ARGS__); \
Is the use of _pi.fmt here a trick to prevent gcc from eliding the _pi
object, so it is seen as "used"? That seems a bit fragile, especially if
the compiler ends up generating the same code in .text - that means gcc
does not load the format string from the _pi object (which it
shouldn't), but then I don't see why it (or the next version of gcc)
couldn't realize that _pi is indeed unused.
There's the __used attribute precisely for this kind of thing. Then you
could also eliminate
> + } else \
> + _p_ret = _p_func(_fmt, ##__VA_ARGS__); \
> + \
this and the _p_ret variable
> + _p_ret; \
and just end the ({}) with _p_func(_fmt, ##__VA_ARGS__);
That would also allow you to more easily wrap, say, dev_printk(), which
returns void - it seems that by not handling dev_printk and friends
you're missing quite a few format strings.
Rasmus
On Mon 2021-03-15 12:20:59, Chris Down wrote:
> Petr Mladek writes:
> > > I don't feel strongly that this is more clear though, so maybe you
> > > mean something else?
> >
> > I was pretty tired when reviewing the patch. It was not easy for me
> > to create the mental model of the code. I felt that some other names
> > would have made it easier.
> >
> > Also the tricky pi_next() implementation did not help much. It looks
> > like you changed the code many times to get it working and did not
> > clean it at the end.
>
> No worries. I'm not totally clear on what you're asking for though: do you
> meant that you'd like the SEQ_START_TOKEN logic to only be present for
> pi_start, or to pull out the logic currently in pi_next into another
> function and call it from both, or?
>
> In my mind, pi_start is really just a special case of pi_next, so the code
> flow makes sense to me. I'm happy to change it to whatever you like, but
> it's not immediately obvious to me what that is :-)
Good question! I have missed that pi_start() can be called also with non-zero
pos when seeking.
OK, pi_start() has to handle pos == 0 special way, so let's handle it
there. Call pi_next() only when pos != 0.
The following code should do the job. I took this from my previous reply.
It is already based on the other suggested changes:
static struct pi_entry *pi_get_entry(struct module *mod, int idx)
{
struct pi_entry *entries;
int num_entries;
if (mod) {
entries = mod->entries;
num_entries = num->num_entries;
} else {
entries = vmlinux_entries;
num_entries = vmlinux_num_entries;
}
if (idx >= num_entries)
return NULL;
return entries[idx];
}
static void *pi_next(struct seq_file *s, void *v, loff_t *pos)
{
const struct module *mod = (struct module*)s->file->f_inode->i_private;
struct pi_entry *entry;
entry = pi_get_entry(mod, *pos);
*(pos)++;
return entry;
}
static void *pi_start(struct seq_file *s, loff_t *pos)
{
/*
* Make show() print the header line. Do not update *pos
* because pi_next() still has to return entry on the index zero.
*/
if (*pos == 0)
return SEQ_START_TOKEN;
return pi_next(s, NULL, pos);
}
Best Regards,
Petr
Petr Mladek writes:
>> In my mind, pi_start is really just a special case of pi_next, so the code
>> flow makes sense to me. I'm happy to change it to whatever you like, but
>> it's not immediately obvious to me what that is :-)
>
>Good question! I have missed that pi_start() can be called also with non-zero
>pos when seeking.
>
>OK, pi_start() has to handle pos == 0 special way, so let's handle it
>there. Call pi_next() only when pos != 0.
>
>The following code should do the job. I took this from my previous reply.
>It is already based on the other suggested changes:
That also looks fine, thanks. I'll hopefully have time to send v6 this week.
Thanks for your detailed feedback! :-)
Rasmus Villemoes writes:
>I think it's pointless renaming the symbol to _printk, with all the
>churn and reduced readability that involves (especially when reading
>assembly "why are we calling _printk and not printk here?"). There's
>nothing wrong with providing a macro wrapper by the same name
>
>#define printk(bla bla) ({ do_stuff; printk(bla bla); })
>
>Only two places would need to be updated to surround the word printk in
>parentheses to suppress macro expansion: The declaration and the
>definition of printk. I.e.
>
>int (printk)(const char *s, ...)
Hmm, I'm indifferent to either. Personally I don't like the ambiguity of having
both a macro and function share the same name and having to think "what's the
preprocessor context here?".
>> +extern struct pi_object __start_printk_index[];
>> +extern struct pi_object __stop_printk_index[];
>
>Do you need these declarations to be visible to the whole kernel? Can't
>they live in printk/index.c?
Yeah, this is a leftover: already noted for rescoping in v6. :-)
>> +
>> +#define pi_sec_elf_embed(_p_func, _fmt, ...) \
>> + ({ \
>> + int _p_ret; \
>> + \
>> + if (__builtin_constant_p(_fmt)) { \
>> + /*
>> + * The compiler may not be able to eliminate this, so
>> + * we need to make sure that it doesn't see any
>> + * hypothetical assignment for non-constants even
>> + * though this is already inside the
>> + * __builtin_constant_p guard.
>> + */ \
>> + static struct pi_object _pi \
>
>static const struct pi_object?
>
>> + __section(".printk_index") = { \
>> + .fmt = __builtin_constant_p(_fmt) ? (_fmt) : NULL, \
>> + .func = __func__, \
>> + .file = __FILE__, \
>> + .line = __LINE__, \
>> + }; \
>> + _p_ret = _p_func(_pi.fmt, ##__VA_ARGS__); \
>
>Is the use of _pi.fmt here a trick to prevent gcc from eliding the _pi
>object, so it is seen as "used"? That seems a bit fragile, especially if
>the compiler ends up generating the same code in .text - that means gcc
>does not load the format string from the _pi object (which it
>shouldn't), but then I don't see why it (or the next version of gcc)
>couldn't realize that _pi is indeed unused.
>
>There's the __used attribute precisely for this kind of thing. Then you
>could also eliminate
>
>> + } else \
>> + _p_ret = _p_func(_fmt, ##__VA_ARGS__); \
>> + \
>
>this and the _p_ret variable
>
>> + _p_ret; \
>
>and just end the ({}) with _p_func(_fmt, ##__VA_ARGS__);
Oh, this is a leftover from the early days of the patch when we used to
explicitly store the formats in ._printk_fmts in order to avoid duplication.
Now that we just store a pointer instead of storing the format itself, it
probably should be fine to move to using _fmt directly and __used. Thanks, I'll
investigate this for v6.
On Tue 2021-03-16 14:28:12, Chris Down wrote:
> Rasmus Villemoes writes:
> > I think it's pointless renaming the symbol to _printk, with all the
> > churn and reduced readability that involves (especially when reading
> > assembly "why are we calling _printk and not printk here?"). There's
> > nothing wrong with providing a macro wrapper by the same name
> >
> > #define printk(bla bla) ({ do_stuff; printk(bla bla); })
> >
> > Only two places would need to be updated to surround the word printk in
> > parentheses to suppress macro expansion: The declaration and the
> > definition of printk. I.e.
> >
> > int (printk)(const char *s, ...)
>
> Hmm, I'm indifferent to either. Personally I don't like the ambiguity of
> having both a macro and function share the same name and having to think
> "what's the preprocessor context here?".
I would prefer to keep _printk. I agree that it creates some churn but
it is easier to see what is going on. Also cscope is able to
find the right thing.
Otherwise, Rasmus, thanks a lot for the review and great hints
about the macro storing the metadata into the elf section.
I am not familiar with these things.
Best Regards,
Petr
On 17/03/2021 09.40, Petr Mladek wrote:
> On Tue 2021-03-16 14:28:12, Chris Down wrote:
>> Rasmus Villemoes writes:
>>> I think it's pointless renaming the symbol to _printk, with all the
>>> churn and reduced readability that involves (especially when reading
>>> assembly "why are we calling _printk and not printk here?"). There's
>>> nothing wrong with providing a macro wrapper by the same name
>>>
>>> #define printk(bla bla) ({ do_stuff; printk(bla bla); })
>>>
>>> Only two places would need to be updated to surround the word printk in
>>> parentheses to suppress macro expansion: The declaration and the
>>> definition of printk. I.e.
>>>
>>> int (printk)(const char *s, ...)
[Of course, one could avoid that on the declaration by making sure the
macro wrapper is defined below.]
>> Hmm, I'm indifferent to either. Personally I don't like the ambiguity of
>> having both a macro and function share the same name and having to think
>> "what's the preprocessor context here?".
>
> I would prefer to keep _printk. I agree that it creates some churn but
> it is easier to see what is going on.
It is? Nobody except the few who happen to remember about the
printk_index thing are going to understand why, when looking at
disassembly, there's now calls of a _printk function. "Huh, my code just
does pr_err(), I'm not calling some internal printk function". But it's
not up to me, so I'll stop there.
Anyway, on to the other thing I mentioned on dev_err and friends: I
think it would improve readability and make it a lot easier to (probably
in a later patch) add support for all those dev_* and net_* and whatever
other subsystems have their own wrappers if one created a new header,
linux/printk-index.h, doing something like
#ifdef CONFIG_PRINTK_INDEX
#define __printk_index_emit(fmt) do {the appropriate magic} while (0)
#else
#define __printk_index_emit(fmt) do {} while (0)
#endif
#define printk_index_wrap(fmt, real_func, ...) ({ \
__printk_index_emit(fmt); \
real_func(__VA_ARGS__); \
})
and then it's a matter of doing
#define printk(fmt, ...) printk_index_wrap(fmt, _printk, fmt, ##__VA_ARGS__)
or
#define _dev_err(dev, fmt, ...) printk_index_wrap(fmt, __dev_err, dev,
fmt, ##__VA_ARGS__)
(yeah, _dev_err is the real function, dev_err is already a macro, so
doing it for dev_* would involve renaming _dev_err to __dev_err. Or one
could fold the printk_index logic into the existing dev_err macro).
That is, avoid defining two different versions of each and every
required wrapper macro depending on CONFIG_PRINTK_INDEX.
One could also record the function a format is being used with - without
that, the display probably can't show a reasonable <level> for those
dev_* function.
Rasmus
On Wed 2021-03-17 11:03:20, Rasmus Villemoes wrote:
> On 17/03/2021 09.40, Petr Mladek wrote:
> > On Tue 2021-03-16 14:28:12, Chris Down wrote:
> >> Rasmus Villemoes writes:
> >>> I think it's pointless renaming the symbol to _printk, with all the
> >>> churn and reduced readability that involves (especially when reading
> >>> assembly "why are we calling _printk and not printk here?"). There's
> >>> nothing wrong with providing a macro wrapper by the same name
> >>>
> >>> #define printk(bla bla) ({ do_stuff; printk(bla bla); })
> >>>
> >>> Only two places would need to be updated to surround the word printk in
> >>> parentheses to suppress macro expansion: The declaration and the
> >>> definition of printk. I.e.
> >>>
> >>> int (printk)(const char *s, ...)
>
> [Of course, one could avoid that on the declaration by making sure the
> macro wrapper is defined below.]
>
> >> Hmm, I'm indifferent to either. Personally I don't like the ambiguity of
> >> having both a macro and function share the same name and having to think
> >> "what's the preprocessor context here?".
> >
> > I would prefer to keep _printk. I agree that it creates some churn but
> > it is easier to see what is going on.
>
> It is? Nobody except the few who happen to remember about the
> printk_index thing are going to understand why, when looking at
> disassembly, there's now calls of a _printk function. "Huh, my code just
> does pr_err(), I'm not calling some internal printk function". But it's
> not up to me, so I'll stop there.
Sure, it makes sense. But I still think that it won't be easy to
follow where the macro is expanded and where it is not expanded.
It might make things comlicated when people debug printk or try
to understand it's code.
BTW: Is the trick with int (printk)(const char *s, ...) documented
somewhere? Is it portable?
Honestly, I was not aware of this behavior. I did some googling.
It looks like something specific for the gcc implementation,
see the section "Looking for a function-like macro’s opening
parenthesis" at
https://gcc.gnu.org/onlinedocs/cppinternals/Macro-Expansion.html
> Anyway, on to the other thing I mentioned on dev_err and friends: I
> think it would improve readability and make it a lot easier to (probably
> in a later patch) add support for all those dev_* and net_* and whatever
> other subsystems have their own wrappers
This is great point! There are many other subsystem specific wrappers,
e,g, ata_dev_printk(), netdev_printk(), snd_printk(), dprintk().
We should make it easy to index them as well.
> if one created a new header,
> linux/printk-index.h, doing something like
>
> #ifdef CONFIG_PRINTK_INDEX
> #define __printk_index_emit(fmt) do {the appropriate magic} while (0)
> #else
> #define __printk_index_emit(fmt) do {} while (0)
> #endif
>
> #define printk_index_wrap(fmt, real_func, ...) ({ \
> __printk_index_emit(fmt); \
> real_func(__VA_ARGS__); \
> })
>
> and then it's a matter of doing
>
> #define printk(fmt, ...) printk_index_wrap(fmt, _printk, fmt, ##__VA_ARGS__)
>
> or
>
> #define _dev_err(dev, fmt, ...) printk_index_wrap(fmt, __dev_err, dev,
> fmt, ##__VA_ARGS__)
>
> (yeah, _dev_err is the real function, dev_err is already a macro, so
> doing it for dev_* would involve renaming _dev_err to __dev_err. Or one
> could fold the printk_index logic into the existing dev_err macro).
>
> That is, avoid defining two different versions of each and every
> required wrapper macro depending on CONFIG_PRINTK_INDEX.
>
> One could also record the function a format is being used with - without
> that, the display probably can't show a reasonable <level> for those
> dev_* function.
Makes sense. I would do this way.
Best Regards,
Petr
On 18/03/2021 11.46, Petr Mladek wrote:
> BTW: Is the trick with int (printk)(const char *s, ...) documented
> somewhere? Is it portable?
It is completely standard and portable C, explicitly spelled out in the
C standard itself. C99:
===
6.10.3 Macro replacement
10 [...] Each subsequent instance of the
function-like macro name followed by a ( as the next preprocessing token
introduces the
sequence of preprocessing tokens that is replaced by the replacement
list in the definition
(an invocation of the macro). [...]
===
and later
===
7.1.4 Use of library functions
1 [...] one
of the techniques shown below can be used to ensure the declaration is
not affected by
such a macro. Any macro definition of a function can be suppressed
locally by enclosing
the name of the function in parentheses, because the name is then not
followed by the left
parenthesis that indicates expansion of a macro function name. For the
same syntactic
reason, it is permitted to take the address of a library function even
if it is also defined as
a macro.
===
Also, the use of printk() inside the definition of a printk()
function-like macro does not lead to infinite recursion, by
===
6.10.3.4 Rescanning and further replacement
2 If the name of the macro being replaced is found during this scan of
the replacement list
(not including the rest of the source file’s preprocessing tokens), it
is not replaced.
===
Rasmus
On Thu 2021-03-18 12:31:44, Rasmus Villemoes wrote:
> On 18/03/2021 11.46, Petr Mladek wrote:
>
> > BTW: Is the trick with int (printk)(const char *s, ...) documented
> > somewhere? Is it portable?
>
> It is completely standard and portable C, explicitly spelled out in the
> C standard itself. C99:
Thanks a lot for the detailed info.
I still prefer to avoid using this "trick". But I am not going to
block it if more people would prefer it.
Best Regards,
Petr
Hey Petr, Rasmus,
Apologies for the delay, I've been out ill for a while so I'm just coming back
to look at this.
Petr Mladek writes:
>> Anyway, on to the other thing I mentioned on dev_err and friends: I
>> think it would improve readability and make it a lot easier to (probably
>> in a later patch) add support for all those dev_* and net_* and whatever
>> other subsystems have their own wrappers
>
>This is great point! There are many other subsystem specific wrappers,
>e,g, ata_dev_printk(), netdev_printk(), snd_printk(), dprintk().
>We should make it easy to index them as well.
These would be nice to have, but we should agree about how we store things
internally.
For example, in printk we typically store the level inline as part of the
format string at compile time. However, for `dev_printk`, it's passed entirely
separately from the format string after preprocessing is already concluded (or
at least, not in a way we can easily parse it the same way we do for
printk()):
void dev_printk(const char *level, const struct device *dev, const char *fmt, ...)
One (ugly) way to handle this would be to have a new "level" field in the
printk index entry, with semantics that if it's some sentinel value, look at
the format itself for the format, otherwise if it's some other value, the level
field itself is the level.
This will work, but it's pretty ugly. Any better suggestions? :-)
Thanks,
Chris
Joe Perches writes:
>On Fri, 2021-04-16 at 14:56 +0100, Chris Down wrote:
>> Any better suggestions? :-)
>
>A gcc plugin that looks for functions marked __printf(fmt, pos)
>so any const fmt is stored.
I fail to see any way in which that can solve the problem described, which is
mobility of the level information, not the existence of the format itself.
`__printf` doesn't communicate or imply any such information, since it's
completely level agnostic.
On Fri, 2021-04-16 at 14:56 +0100, Chris Down wrote:
> Any better suggestions? :-)
A gcc plugin that looks for functions marked __printf(fmt, pos)
so any const fmt is stored.
On 16/04/2021 15.56, Chris Down wrote:
> Hey Petr, Rasmus,
>> This is great point! There are many other subsystem specific wrappers,
>> e,g, ata_dev_printk(), netdev_printk(), snd_printk(), dprintk().
>> We should make it easy to index them as well.
>
> These would be nice to have, but we should agree about how we store
> things internally.
>
> For example, in printk we typically store the level inline as part of
> the format string at compile time. However, for `dev_printk`, it's
> passed entirely separately from the format string after preprocessing is
> already concluded (or at least, not in a way we can easily parse it the
> same way we do for printk()):
>
> ????void dev_printk(const char *level, const struct device *dev, const
> char *fmt, ...)
Hm, yeah, for "naked" dev_printk() calls there's no easy way to grab the
level, for dev_err and friends it's somewhat easier as you could just
hook into the definition of the dev_err macro. I'm not saying you need
to handle everything at once, but doing dev_err and netdev_err would get
you a very long way
> One (ugly) way to handle this would be to have a new "level" field in
> the printk index entry, with semantics that if it's some sentinel value,
> look at the format itself for the format, otherwise if it's some other
> value, the level field itself is the level.
>
> This will work, but it's pretty ugly. Any better suggestions? :-)
Well, that was more or less exactly what I suggested when I wrote
> One could also record the function a format is being used with - without
> that, the display probably can't show a reasonable <level> for those
> dev_* function.
But, I think the real question is, why are we/you interested in the
level at all? Isn't the format string itself enough for the purpose of
tracking which printks have come and gone? IOW, what about, on the
display side, simply skipping over some KERN_* prefix if present?
Rasmus
On Mon 2021-04-19 09:27:43, Rasmus Villemoes wrote:
> On 16/04/2021 15.56, Chris Down wrote:
> > Hey Petr, Rasmus,
>
> >> This is great point! There are many other subsystem specific wrappers,
> >> e,g, ata_dev_printk(), netdev_printk(), snd_printk(), dprintk().
> >> We should make it easy to index them as well.
> >
> > These would be nice to have, but we should agree about how we store
> > things internally.
> >
> > For example, in printk we typically store the level inline as part of
> > the format string at compile time. However, for `dev_printk`, it's
> > passed entirely separately from the format string after preprocessing is
> > already concluded (or at least, not in a way we can easily parse it the
> > same way we do for printk()):
> >
> > ????void dev_printk(const char *level, const struct device *dev, const
> > char *fmt, ...)
>
> Hm, yeah, for "naked" dev_printk() calls there's no easy way to grab the
> level, for dev_err and friends it's somewhat easier as you could just
> hook into the definition of the dev_err macro. I'm not saying you need
> to handle everything at once, but doing dev_err and netdev_err would get
> you a very long way
It is true that there are many messages printed using
dev_printk(). For example, these rough numbers:
$> git grep pr_err | wc -l
19885
$> git grep dev_err | wc -l
58153
> > One (ugly) way to handle this would be to have a new "level" field in
> > the printk index entry, with semantics that if it's some sentinel value,
> > look at the format itself for the format, otherwise if it's some other
> > value, the level field itself is the level.
> >
> > This will work, but it's pretty ugly. Any better suggestions? :-)
We should use the same algorithm that is used in parse_prefix() called
from vprintk_store(). parse_prefix() updates @level only when
the current value is LOGLEVEL_DEFAULT.
We should set the new field in the printk index entry to
LOGLEVEL_DEFAULT by default. It should be set to a particular level
when it is defined by an extra parameter, like in dev_printk().
> Well, that was more or less exactly what I suggested when I wrote
>
> > One could also record the function a format is being used with - without
> > that, the display probably can't show a reasonable <level> for those
> > dev_* function.
>
> But, I think the real question is, why are we/you interested in the
> level at all? Isn't the format string itself enough for the purpose of
> tracking which printks have come and gone? IOW, what about, on the
> display side, simply skipping over some KERN_* prefix if present?
Messages are filtered on consoles by console_loglevel. The loglevel
might be important to decide whether the message is visible or not.
Best Regards,
Petr
On Mon, Apr 19, 2021 at 11:16:43AM +0200, Petr Mladek wrote:
> On Mon 2021-04-19 09:27:43, Rasmus Villemoes wrote:
> > On 16/04/2021 15.56, Chris Down wrote:
> > > Hey Petr, Rasmus,
> >
> > >> This is great point! There are many other subsystem specific wrappers,
> > >> e,g, ata_dev_printk(), netdev_printk(), snd_printk(), dprintk().
> > >> We should make it easy to index them as well.
> > >
> > > These would be nice to have, but we should agree about how we store
> > > things internally.
> > >
> > > For example, in printk we typically store the level inline as part of
> > > the format string at compile time. However, for `dev_printk`, it's
> > > passed entirely separately from the format string after preprocessing is
> > > already concluded (or at least, not in a way we can easily parse it the
> > > same way we do for printk()):
> > >
> > > ????void dev_printk(const char *level, const struct device *dev, const
> > > char *fmt, ...)
> >
> > Hm, yeah, for "naked" dev_printk() calls there's no easy way to grab the
> > level, for dev_err and friends it's somewhat easier as you could just
> > hook into the definition of the dev_err macro. I'm not saying you need
> > to handle everything at once, but doing dev_err and netdev_err would get
> > you a very long way
>
> It is true that there are many messages printed using
> dev_printk(). For example, these rough numbers:
>
> $> git grep pr_err | wc -l
> 19885
> $> git grep dev_err | wc -l
> 58153
You need "-w" :)
And I bet most of those pr_err() should be turned into dev_err(), if
they live in drivers/.
Hm, 12734 of the pr_err() calls do live in drivers/, so most of those
should be dev_err(). Might be something good to throw at interns...
thanks,
greg k-h
On Mon, 2021-04-19 at 11:53 +0200, Greg Kroah-Hartman wrote:
> Hm, 12734 of the pr_err() calls do live in drivers/, so most of those
> should be dev_err(). Might be something good to throw at interns...
That depends on how much churn you want to have in old drivers that
generally don't have any users because the hardware is ancient or
no longer manufactured.
I suggest not changing those.
But I believe a coccinelle script was written quite awhile ago
to convert pr_<level> to dev_<level> when a struct device * is
available.
http://btrlinux.inria.fr/staging-media-replace-pr_-with-dev_/
Rasmus Villemoes writes:
>> One (ugly) way to handle this would be to have a new "level" field in
>> the printk index entry, with semantics that if it's some sentinel value,
>> look at the format itself for the format, otherwise if it's some other
>> value, the level field itself is the level.
>>
>> This will work, but it's pretty ugly. Any better suggestions? :-)
>
>Well, that was more or less exactly what I suggested when I wrote
>
>> One could also record the function a format is being used with - without
>> that, the display probably can't show a reasonable <level> for those
>> dev_* function.
>
>But, I think the real question is, why are we/you interested in the
>level at all? Isn't the format string itself enough for the purpose of
>tracking which printks have come and gone? IOW, what about, on the
>display side, simply skipping over some KERN_* prefix if present?
Hmm, as Petr suggested, it's largely so that we can determine whether it will
be emitted at the current console loglevel. Otherwise, even if the printk site
is present, it might not ever get emitted. To that extent I am pretty convinced
it's necessary to reliably achieve the goals in the changelog.
Judging by the conversation there's no immediately obvious better way, so
unless you or Petr object, I'll send a patch in the v6 series which implements
the "ugly" way with dev_printk support as the first user. That should make it
easier to add other printk-likes in future as needed.
On Wed, 2021-04-21 at 14:14 +0100, Chris Down wrote:
> Rasmus Villemoes writes:
> > > One (ugly) way to handle this would be to have a new "level" field in
> > > the printk index entry, with semantics that if it's some sentinel value,
> > > look at the format itself for the format, otherwise if it's some other
> > > value, the level field itself is the level.
> > >
> > > This will work, but it's pretty ugly. Any better suggestions? :-)
> >
> > Well, that was more or less exactly what I suggested when I wrote
> >
> > > One could also record the function a format is being used with - without
> > > that, the display probably can't show a reasonable <level> for those
> > > dev_* function.
> >
> > But, I think the real question is, why are we/you interested in the
> > level at all? Isn't the format string itself enough for the purpose of
> > tracking which printks have come and gone? IOW, what about, on the
> > display side, simply skipping over some KERN_* prefix if present?
[]
> Judging by the conversation there's no immediately obvious better way, so
> unless you or Petr object, I'll send a patch in the v6 series which implements
> the "ugly" way with dev_printk support as the first user. That should make it
> easier to add other printk-likes in future as needed.
There are dozens of these types of printks extensions in the kernel.
Today, your test case .config uses btrfs which has its own too.
You really should evaluate the utility of log level monitoring and
reconsider adding some compiler extension use of __printf to generate
this format tracking ability.
Joe Perches writes:
>You really should evaluate the utility of log level monitoring and
>reconsider adding some compiler extension use of __printf to generate
>this format tracking ability.
Asking again, because you're repeating what you said last time without any
further clarification: how is __printf tracking supposed to be sufficient? That
tells one which arguments will have printf semantics, so sure, one can get the
format, but (for example) you don't have any access to information about the
log level, which is essential since otherwise it's not known whether the printk
is capable of being emitted or not. Without that, you're suggesting replacing a
functioning implementation with a "solution" which is not fit for the intended
purpose.
For the use case described in the changelog, more or less all of the essential
printks that must be monitored use printk() infrastructure directly. We're not
trying to monitor the world here when we have plenty of better metrics. I'm
sure there will be others which come up with time, but this doesn't have to
cover every single possible subsystem's fancy printk-like out of the gate.