2020-09-23 13:58:08

by Petr Mladek

[permalink] [raw]
Subject: [RFC 0/2] printk: Add more metadata for each record

There have been repeated requests to store and print more metadata
for each line, see the two patches for more details.

This patchset handles the storing part so that we could do it together
with the other changes of the ring buffer structure. It would be nice
to get it into the same kernel release to reduce pain on the crash
tools side.

My proposal is to make this information available via /dev/kmsg as
dictionary values. The console output would probably need to be configurable.

I would ignore syslog syscall that is hard to extend and is basically
obsoleted by /dev/kmsg.


The patchset applies on top of the current printk/linux.git,
branch printk-rework (after the dict ring removal).

You also need to merge "timekeeping-for-printk-2020-08-23" tag
from tip.git. It provides "struct ktime_timestamps" and
ktime_get_fast_timestamps().

I still need to add the gdb-related changes if there is not a strong
opposition to these changes. I just wanted to send what I had before
starting review of the logbuf_lock removal patchset.


Petr Mladek (2):
printk: Store all three timestamps
printk: Add more information about the printk caller

kernel/printk/printk.c | 129 ++++++++++++++++++++----------
kernel/printk/printk_ringbuffer.c | 43 +++++-----
kernel/printk/printk_ringbuffer.h | 42 +++++++++-
3 files changed, 148 insertions(+), 66 deletions(-)

--
2.26.2


2020-09-23 14:01:16

by Petr Mladek

[permalink] [raw]
Subject: [RFC 2/2] printk: Add more information about the printk caller

The information about the printk caller has been added by the commit
15ff2069cb7f967da ("printk: Add caller information to printk() output.").
The main motivation was to reconstruct original messages when they
longer output from different CPUs got mixed.

But there are more usecases. The number of CPUs and tasks running on
a single system is growing. It brings bigger challenges when debugging
problems. The context in which kernel reports its state is sometimes
pretty useful.

In particular, people suggest to remember the task id, cpu id, more details
about the interrupt context, and even the task name [0][1].

Prepare for extending the caller information by replacing caller_id
with struct printk_caller. And always store task id, cpu id, and
exact interrupt context.

The output to console and /dev/kmsg stays the same for now. I will
get improved by followup patches.

[0] https://lore.kernel.org/r/CAHk-=wibPWj9m1jQWeJUq+iPMG9tHaLANY-UaruakHuezFHr4w@mail.gmail.com
[1] https://lore.kernel.org/r/[email protected]

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/printk/printk.c | 107 +++++++++++++++++++-----------
kernel/printk/printk_ringbuffer.c | 43 ++++++------
kernel/printk/printk_ringbuffer.h | 39 ++++++++++-
3 files changed, 131 insertions(+), 58 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 0ed8901916f4..2f904bb744a6 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -333,7 +333,8 @@ static int console_msg_format = MSG_FORMAT_DEFAULT;
* record.info.facility = 0 (LOG_KERN)
* record.info.flags = 0
* record.info.level = 3 (LOG_ERR)
- * record.info.caller_id = 299 (task 299)
+ * record.info.caller.pid = 299 (task 299)
+ * record.info.cpu_ctx = 0x80000003 (in hardirq context on cpu 3)
* record.info.dev_info.subsystem = "pci" (terminated)
* record.info.dev_info.device = "+pci:0000:00:01.0" (terminated)
*
@@ -494,8 +495,9 @@ static void truncate_msg(u16 *text_len, u16 *trunc_msg_len)
}

/* insert record into the buffer, discard old ones, update heads */
-static int log_store(u32 caller_id, int facility, int level,
- enum log_flags flags, struct ktime_timestamps *ts,
+static int log_store(const struct printk_caller *caller,
+ int facility, int level, enum log_flags flags,
+ struct ktime_timestamps *ts,
const struct dev_printk_info *dev_info,
const char *text, u16 text_len)
{
@@ -526,7 +528,7 @@ static int log_store(u32 caller_id, int facility, int level,
r.info->ts = *ts;
else
ktime_get_fast_timestamps(&r.info->ts);
- r.info->caller_id = caller_id;
+ r.info->caller = *caller;
if (dev_info)
memcpy(&r.info->dev_info, dev_info, sizeof(r.info->dev_info));

@@ -588,26 +590,40 @@ static void append_char(char **pp, char *e, char c)
*(*pp)++ = c;
}

+static size_t info_print_caller_id(char *buf, size_t size,
+ const struct printk_caller *caller)
+{
+ enum printk_caller_ctx ctx;
+
+ ctx = printk_to_caller_ctx(caller->cpu_ctx);
+
+ if (ctx == printk_ctx_task)
+ return snprintf(buf, size, "T%u", caller->pid);
+
+ return snprintf(buf, size, "C%u", printk_to_caller_cpu(caller->cpu_ctx));
+}
+
static ssize_t info_print_ext_header(char *buf, size_t size,
struct printk_info *info)
{
u64 ts_mono_usec = info->ts.mono;
- char caller[20];
-#ifdef CONFIG_PRINTK_CALLER
- u32 id = info->caller_id;
-
- snprintf(caller, sizeof(caller), ",caller=%c%u",
- id & 0x80000000 ? 'C' : 'T', id & ~0x80000000);
-#else
- caller[0] = '\0';
-#endif
+ char caller_id[12];
+ size_t len;

do_div(ts_mono_usec, 1000);

- return scnprintf(buf, size, "%u,%llu,%llu,%c%s;",
- (info->facility << 3) | info->level, info->seq,
- ts_mono_usec, info->flags & LOG_CONT ? 'c' : '-',
- caller);
+ len = scnprintf(buf, size, "%u,%llu,%llu,%c",
+ (info->facility << 3) | info->level, info->seq,
+ ts_mono_usec, info->flags & LOG_CONT ? 'c' : '-');
+
+ if (IS_ENABLED(CONFIG_PRINTK_CALLER)) {
+ info_print_caller_id(caller_id, sizeof(caller_id), &info->caller);
+ len += scnprintf(buf + len, size - len, ",caller=%s", caller_id);
+ }
+
+ len += scnprintf(buf + len, size - len, ";");
+
+ return len;
}

static ssize_t msg_add_ext_text(char *buf, size_t size,
@@ -987,7 +1003,7 @@ void log_buf_vmcoreinfo_setup(void)
VMCOREINFO_OFFSET(printk_info, seq);
VMCOREINFO_OFFSET(printk_info, ts);
VMCOREINFO_OFFSET(printk_info, text_len);
- VMCOREINFO_OFFSET(printk_info, caller_id);
+ VMCOREINFO_OFFSET(printk_info, caller);
VMCOREINFO_OFFSET(printk_info, dev_info);

VMCOREINFO_STRUCT_SIZE(ktime_timestamps);
@@ -995,6 +1011,10 @@ void log_buf_vmcoreinfo_setup(void)
VMCOREINFO_OFFSET(ktime_timestamps, boot);
VMCOREINFO_OFFSET(ktime_timestamps, real);

+ VMCOREINFO_STRUCT_SIZE(printk_caller);
+ VMCOREINFO_OFFSET(printk_caller, pid);
+ VMCOREINFO_OFFSET(printk_caller, cpu_ctx);
+
VMCOREINFO_STRUCT_SIZE(dev_printk_info);
VMCOREINFO_OFFSET(dev_printk_info, subsystem);
VMCOREINFO_LENGTH(printk_info_subsystem, sizeof(dev_info->subsystem));
@@ -1103,7 +1123,7 @@ static unsigned int __init add_to_rb(struct printk_ringbuffer *rb,
dest_r.info->level = r->info->level;
dest_r.info->flags = r->info->flags;
dest_r.info->ts = r->info->ts;
- dest_r.info->caller_id = r->info->caller_id;
+ dest_r.info->caller = r->info->caller;
memcpy(&dest_r.info->dev_info, &r->info->dev_info, sizeof(dest_r.info->dev_info));

prb_final_commit(&e);
@@ -1306,18 +1326,13 @@ static size_t print_time(u64 ts, char *buf)
(unsigned long)ts, rem_nsec / 1000);
}

-#ifdef CONFIG_PRINTK_CALLER
-static size_t print_caller(u32 id, char *buf)
+static size_t print_caller_id(const struct printk_caller *caller, char *buf)
{
- char caller[12];
+ char caller_id[12];

- snprintf(caller, sizeof(caller), "%c%u",
- id & 0x80000000 ? 'C' : 'T', id & ~0x80000000);
- return sprintf(buf, "[%6s]", caller);
+ info_print_caller_id(caller_id, sizeof(caller_id), caller);
+ return sprintf(buf, "[%6s]", caller_id);
}
-#else
-#define print_caller(id, buf) 0
-#endif

static size_t info_print_prefix(const struct printk_info *info, bool syslog,
bool time, char *buf)
@@ -1330,7 +1345,8 @@ static size_t info_print_prefix(const struct printk_info *info, bool syslog,
if (time)
len += print_time(info->ts.mono, buf + len);

- len += print_caller(info->caller_id, buf + len);
+ if (IS_ENABLED(CONFIG_PRINTK_CALLER))
+ len += print_caller_id(&info->caller, buf + len);

if (IS_ENABLED(CONFIG_PRINTK_CALLER) || time) {
buf[len++] = ' ';
@@ -1902,24 +1918,42 @@ static inline void printk_delay(void)
}
}

-static inline u32 printk_caller_id(void)
+static enum printk_caller_ctx get_printk_caller_ctx(void)
+{
+ if (in_nmi())
+ return printk_ctx_nmi;
+
+ if (in_irq())
+ return printk_ctx_hardirq;
+
+ if (in_softirq())
+ return printk_ctx_softirq;
+
+ return printk_ctx_task;
+}
+
+static void set_current_printk_caller(struct printk_caller *caller)
{
- return in_task() ? task_pid_nr(current) :
- 0x80000000 + raw_smp_processor_id();
+ caller->pid = task_pid_nr(current);
+
+ caller->cpu_ctx = printk_to_caller_cpu_ctx(raw_smp_processor_id(),
+ get_printk_caller_ctx());
}

static size_t log_output(int facility, int level, enum log_flags lflags,
const struct dev_printk_info *dev_info,
char *text, size_t text_len)
{
- const u32 caller_id = printk_caller_id();
+ struct printk_caller caller;
+
+ set_current_printk_caller(&caller);

if (lflags & LOG_CONT) {
struct prb_reserved_entry e;
struct printk_record r;

prb_rec_init_wr(&r, text_len);
- if (prb_reserve_in_last(&e, prb, &r, caller_id)) {
+ if (prb_reserve_in_last(&e, prb, &r, &caller)) {
memcpy(&r.text_buf[r.info->text_len], text, text_len);
r.info->text_len += text_len;
if (lflags & LOG_NEWLINE) {
@@ -1933,7 +1967,7 @@ static size_t log_output(int facility, int level, enum log_flags lflags,
}

/* Store it in the record log */
- return log_store(caller_id, facility, level, lflags, 0,
+ return log_store(&caller, facility, level, lflags, 0,
dev_info, text, text_len);
}

@@ -3095,8 +3129,7 @@ EXPORT_SYMBOL(__printk_ratelimit);
* printk_timed_ratelimit - caller-controlled printk ratelimiting
* @caller_jiffies: pointer to caller's state
* @interval_msecs: minimum interval between prints
- *
- * printk_timed_ratelimit() returns true if more than @interval_msecs
+ *_timed_ratelimit() returns true if more than @interval_msecs
* milliseconds have elapsed since the last time printk_timed_ratelimit()
* returned true.
*/
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 13b94b92342e..86d98e692d75 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -175,7 +175,7 @@
*
* r.info->text_len = strlen(textstr);
* r.info->ts_nsec = local_clock();
- * r.info->caller_id = printk_caller_id();
+ * set_current_printk_caller(&r.info->caller);
*
* // commit and finalize the record
* prb_final_commit(&e);
@@ -191,7 +191,7 @@
*
* r.info->text_len = strlen(textstr);
* r.info->ts_nsec = local_clock();
- * r.info->caller_id = printk_caller_id();
+ * set_current_printk_caller(&r.info->caller);
*
* // commit the record (but do not finalize yet)
* prb_commit(&e);
@@ -202,7 +202,7 @@
* // specify additional 5 bytes text space to extend
* prb_rec_init_wr(&r, 5);
*
- * if (prb_reserve_in_last(&e, &test_rb, &r, printk_caller_id())) {
+ * if (prb_reserve_in_last(&e, &test_rb, &r, &caller)) {
* snprintf(&r.text_buf[r.info->text_len],
* r.text_buf_size - r.info->text_len, "hello");
*
@@ -424,13 +424,13 @@ static enum desc_state get_desc_state(unsigned long id,
* descriptor is in an inconsistent state (miss or reserved), the caller can
* only expect the descriptor's @state_var field to be valid.
*
- * The sequence number and caller_id can be optionally retrieved. Like all
- * non-state_var data, they are only valid if the descriptor is in a
- * consistent state.
+ * The sequence number and caller information can be optionally retrieved.
+ * Like all non-state_var data, they are only valid if the descriptor is
+ * in a consistent state.
*/
static enum desc_state desc_read(struct prb_desc_ring *desc_ring,
unsigned long id, struct prb_desc *desc_out,
- u64 *seq_out, u32 *caller_id_out)
+ u64 *seq_out, struct printk_caller *caller_out)
{
struct printk_info *info = to_info(desc_ring, id);
struct prb_desc *desc = to_desc(desc_ring, id);
@@ -477,8 +477,8 @@ static enum desc_state desc_read(struct prb_desc_ring *desc_ring,
sizeof(desc_out->text_blk_lpos)); /* LMM(desc_read:C) */
if (seq_out)
*seq_out = info->seq; /* also part of desc_read:C */
- if (caller_id_out)
- *caller_id_out = info->caller_id; /* also part of desc_read:C */
+ if (caller_out)
+ *caller_out = info->caller; /* also part of desc_read:C */

/*
* 1. Guarantee the descriptor content is loaded before re-checking
@@ -1253,24 +1253,28 @@ static const char *get_data(struct prb_data_ring *data_ring,
* if the descriptor is not yet finalized and the provided @caller_id matches.
*/
static struct prb_desc *desc_reopen_last(struct prb_desc_ring *desc_ring,
- u32 caller_id, unsigned long *id_out)
+ const struct printk_caller *caller,
+ unsigned long *id_out)
{
unsigned long prev_state_val;
+ struct printk_caller cal;
enum desc_state d_state;
struct prb_desc desc;
struct prb_desc *d;
unsigned long id;
- u32 cid;

id = atomic_long_read(&desc_ring->head_id);

/*
* To reduce unnecessarily reopening, first check if the descriptor
- * state and caller ID are correct.
+ * state and caller infromation are correct.
*/
- d_state = desc_read(desc_ring, id, &desc, NULL, &cid);
- if (d_state != desc_committed || cid != caller_id)
+ d_state = desc_read(desc_ring, id, &desc, NULL, &cal);
+ if (d_state != desc_committed ||
+ cal.pid != caller->pid ||
+ cal.cpu_ctx != caller->cpu_ctx) {
return NULL;
+ }

d = to_desc(desc_ring, id);

@@ -1308,7 +1312,7 @@ static struct prb_desc *desc_reopen_last(struct prb_desc_ring *desc_ring,
* @e: The entry structure to setup.
* @rb: The ringbuffer to re-reserve and extend data in.
* @r: The record structure to allocate buffers for.
- * @caller_id: The caller ID of the caller (reserving writer).
+ * @caller: The information about the caller (reserving writer).
*
* This is the public function available to writers to re-reserve and extend
* data.
@@ -1317,7 +1321,7 @@ static struct prb_desc *desc_reopen_last(struct prb_desc_ring *desc_ring,
* setting the @text_buf_size field of @r. To ensure proper initialization
* of @r, prb_rec_init_wr() should be used.
*
- * This function will fail if @caller_id does not match the caller ID of the
+ * This function will fail if @caller does not match the caller of the
* newest record. In that case the caller must reserve new data using
* prb_reserve().
*
@@ -1343,7 +1347,7 @@ static struct prb_desc *desc_reopen_last(struct prb_desc_ring *desc_ring,
* @r->info->text_len after concatenating.
*/
bool prb_reserve_in_last(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
- struct printk_record *r, u32 caller_id)
+ struct printk_record *r, const struct printk_caller *caller)
{
struct prb_desc_ring *desc_ring = &rb->desc_ring;
struct printk_info *info;
@@ -1354,7 +1358,7 @@ bool prb_reserve_in_last(struct prb_reserved_entry *e, struct printk_ringbuffer
local_irq_save(e->irqflags);

/* Transition the newest descriptor back to the reserved state. */
- d = desc_reopen_last(desc_ring, caller_id, &id);
+ d = desc_reopen_last(desc_ring, caller, &id);
if (!d) {
local_irq_restore(e->irqflags);
goto fail_reopen;
@@ -1376,7 +1380,8 @@ bool prb_reserve_in_last(struct prb_reserved_entry *e, struct printk_ringbuffer
* exclusive access at that point. The descriptor may have
* changed since then.
*/
- if (caller_id != info->caller_id)
+ if (caller->pid != info->caller.pid ||
+ caller->cpu_ctx != info->caller.cpu_ctx)
goto fail;

if (BLK_DATALESS(&d->text_blk_lpos)) {
diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
index 09082c8472d3..93117bd82afb 100644
--- a/kernel/printk/printk_ringbuffer.h
+++ b/kernel/printk/printk_ringbuffer.h
@@ -7,6 +7,12 @@
#include <linux/dev_printk.h>
#include <linux/timekeeping.h>

+/* Information about the process and context that adds the message */
+struct printk_caller {
+ pid_t pid; /* thread id */
+ u32 cpu_ctx; /* processor id and interrupt context */
+};
+
/*
* Meta information about each stored message.
*
@@ -20,8 +26,8 @@ struct printk_info {
u8 facility; /* syslog facility */
u8 flags:5; /* internal record flags */
u8 level:3; /* syslog level */
- u32 caller_id; /* thread id or processor id */

+ struct printk_caller caller; /* information about the caller */
struct dev_printk_info dev_info;
};

@@ -118,6 +124,35 @@ enum desc_state {
desc_reusable = 0x3, /* free, not yet used by any writer */
};

+/* Handling the shared cpu id and interupt context caller value */
+#define PRINTK_CALLER_CTX_MASK 0xc0000000
+#define PRINTK_CALLER_CPU_MASK 0x3fffffff
+
+#define PRINTK_CALLER_CTX_SHIFT (32 - 2)
+
+enum printk_caller_ctx {
+ printk_ctx_task = 0x0,
+ printk_ctx_softirq = 0x1,
+ printk_ctx_hardirq = 0x2,
+ printk_ctx_nmi = 0x3,
+};
+
+/* Get context id from the common cpu and interrupt context id value. */
+#define printk_to_caller_ctx(cpu_ctx) \
+ (enum printk_caller_ctx) \
+ (((cpu_ctx) & PRINTK_CALLER_CTX_MASK) >> PRINTK_CALLER_CTX_SHIFT)
+
+/* Get cpu id from the common cpu and interrupt context id value. */
+#define printk_to_caller_cpu(cpu_ctx) \
+ (u32)((cpu_ctx) & PRINTK_CALLER_CPU_MASK)
+
+/* Merge cpu and interrupt context id into a single value */
+#define printk_to_caller_cpu_ctx(cpu, ctx) \
+ (u32) \
+ (((cpu) & PRINTK_CALLER_CPU_MASK) | \
+ (((ctx) << PRINTK_CALLER_CTX_SHIFT) & PRINTK_CALLER_CTX_MASK))
+
+/* Ringbuffer handling */
#define _DATA_SIZE(sz_bits) (1UL << (sz_bits))
#define _DESCS_COUNT(ct_bits) (1U << (ct_bits))
#define DESC_SV_BITS (sizeof(unsigned long) * 8)
@@ -304,7 +339,7 @@ static inline void prb_rec_init_wr(struct printk_record *r,
bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
struct printk_record *r);
bool prb_reserve_in_last(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
- struct printk_record *r, u32 caller_id);
+ struct printk_record *r, const struct printk_caller *caller);
void prb_commit(struct prb_reserved_entry *e);
void prb_final_commit(struct prb_reserved_entry *e);

--
2.26.2

2020-09-23 14:01:24

by Petr Mladek

[permalink] [raw]
Subject: [PATCH 1/2] printk: Store all three timestamps

printk() historically shows the timestamps from the monotonic clock.
It is fast, available early during boot, in any context, and even
without using any lock.

There are repeated requests [1][2][3] to show the timestamps from other
clocks. The main motivation is to make it easier to correlate the kernel
and userspace logs. Where userspace logs usually use the real time
clock.

Unfortunately, it is not possible to simply replace the default clock.
Userspace tools, like journalctl, dmesg, expect to get the timestamps
from the mono via /dev/kmsg interface or syslog syscall [4].
Also administrators would be confused when logs from different
systems use different clocks depending on kernel version or
build option [5].

As a result, the mono clock has to stay as the default clock
and has to be used in the current user interfaces.

The problem is solved by storing timestamps from all three clock
sources in struct printk_info for each message. The other timestamps
can be later passed to userspace by extending the existing formats.

One might ask if the timestamp from the boot clock is really needed.
The values are mostly redundant with the mono clock. And there really
does not seems to be any strong reason. Anyway, the boot clock
is useful when suspend and resume is part of the debugged/monitored
systems. And it is still easier to handle than a full real time
format.

[1] https://lore.kernel.org/r/[email protected]
[2] https://lore.kernel.org/r/[email protected]
[3] https://lore.kernel.org/r/[email protected]
[4] https://lore.kernel.org/r/CA+55aFzzvBD4_WYm-5Bm7TeRGR8nNu1oz0oWNcRNmTNJm=M4Lw@mail.gmail.com
[5] https://lore.kernel.org/r/[email protected]

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/printk/printk.c | 30 +++++++++++++++++++-----------
kernel/printk/printk_ringbuffer.h | 3 ++-
2 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1560649cbd35..0ed8901916f4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -326,7 +326,9 @@ static int console_msg_format = MSG_FORMAT_DEFAULT;
* Example of record values:
* record.text_buf = "it's a line" (unterminated)
* record.info.seq = 56
- * record.info.ts_nsec = 36863
+ * record.info.ts.mono = 363863
+ * record.info.ts.boot = 5122562
+ * record.info.ts.real = 1599211091487260162
* record.info.text_len = 11
* record.info.facility = 0 (LOG_KERN)
* record.info.flags = 0
@@ -493,7 +495,7 @@ static void truncate_msg(u16 *text_len, u16 *trunc_msg_len)

/* insert record into the buffer, discard old ones, update heads */
static int log_store(u32 caller_id, int facility, int level,
- enum log_flags flags, u64 ts_nsec,
+ enum log_flags flags, struct ktime_timestamps *ts,
const struct dev_printk_info *dev_info,
const char *text, u16 text_len)
{
@@ -520,10 +522,10 @@ static int log_store(u32 caller_id, int facility, int level,
r.info->facility = facility;
r.info->level = level & 7;
r.info->flags = flags & 0x1f;
- if (ts_nsec > 0)
- r.info->ts_nsec = ts_nsec;
+ if (ts)
+ r.info->ts = *ts;
else
- r.info->ts_nsec = local_clock();
+ ktime_get_fast_timestamps(&r.info->ts);
r.info->caller_id = caller_id;
if (dev_info)
memcpy(&r.info->dev_info, dev_info, sizeof(r.info->dev_info));
@@ -589,7 +591,7 @@ static void append_char(char **pp, char *e, char c)
static ssize_t info_print_ext_header(char *buf, size_t size,
struct printk_info *info)
{
- u64 ts_usec = info->ts_nsec;
+ u64 ts_mono_usec = info->ts.mono;
char caller[20];
#ifdef CONFIG_PRINTK_CALLER
u32 id = info->caller_id;
@@ -600,11 +602,12 @@ static ssize_t info_print_ext_header(char *buf, size_t size,
caller[0] = '\0';
#endif

- do_div(ts_usec, 1000);
+ do_div(ts_mono_usec, 1000);

return scnprintf(buf, size, "%u,%llu,%llu,%c%s;",
(info->facility << 3) | info->level, info->seq,
- ts_usec, info->flags & LOG_CONT ? 'c' : '-', caller);
+ ts_mono_usec, info->flags & LOG_CONT ? 'c' : '-',
+ caller);
}

static ssize_t msg_add_ext_text(char *buf, size_t size,
@@ -982,11 +985,16 @@ void log_buf_vmcoreinfo_setup(void)

VMCOREINFO_STRUCT_SIZE(printk_info);
VMCOREINFO_OFFSET(printk_info, seq);
- VMCOREINFO_OFFSET(printk_info, ts_nsec);
+ VMCOREINFO_OFFSET(printk_info, ts);
VMCOREINFO_OFFSET(printk_info, text_len);
VMCOREINFO_OFFSET(printk_info, caller_id);
VMCOREINFO_OFFSET(printk_info, dev_info);

+ VMCOREINFO_STRUCT_SIZE(ktime_timestamps);
+ VMCOREINFO_OFFSET(ktime_timestamps, mono);
+ VMCOREINFO_OFFSET(ktime_timestamps, boot);
+ VMCOREINFO_OFFSET(ktime_timestamps, real);
+
VMCOREINFO_STRUCT_SIZE(dev_printk_info);
VMCOREINFO_OFFSET(dev_printk_info, subsystem);
VMCOREINFO_LENGTH(printk_info_subsystem, sizeof(dev_info->subsystem));
@@ -1094,7 +1102,7 @@ static unsigned int __init add_to_rb(struct printk_ringbuffer *rb,
dest_r.info->facility = r->info->facility;
dest_r.info->level = r->info->level;
dest_r.info->flags = r->info->flags;
- dest_r.info->ts_nsec = r->info->ts_nsec;
+ dest_r.info->ts = r->info->ts;
dest_r.info->caller_id = r->info->caller_id;
memcpy(&dest_r.info->dev_info, &r->info->dev_info, sizeof(dest_r.info->dev_info));

@@ -1320,7 +1328,7 @@ static size_t info_print_prefix(const struct printk_info *info, bool syslog,
len = print_syslog((info->facility << 3) | info->level, buf);

if (time)
- len += print_time(info->ts_nsec, buf + len);
+ len += print_time(info->ts.mono, buf + len);

len += print_caller(info->caller_id, buf + len);

diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
index 0adaa685d1ca..09082c8472d3 100644
--- a/kernel/printk/printk_ringbuffer.h
+++ b/kernel/printk/printk_ringbuffer.h
@@ -5,6 +5,7 @@

#include <linux/atomic.h>
#include <linux/dev_printk.h>
+#include <linux/timekeeping.h>

/*
* Meta information about each stored message.
@@ -14,7 +15,7 @@
*/
struct printk_info {
u64 seq; /* sequence number */
- u64 ts_nsec; /* timestamp in nanoseconds */
+ struct ktime_timestamps ts; /* timestamps */
u16 text_len; /* length of text message */
u8 facility; /* syslog facility */
u8 flags:5; /* internal record flags */
--
2.26.2

2020-09-23 22:16:23

by John Ogness

[permalink] [raw]
Subject: Re: [PATCH 1/2] printk: Store all three timestamps

On 2020-09-23, Petr Mladek <[email protected]> wrote:
> printk() historically shows the timestamps from the monotonic clock.

printk() uses the local clock, not the monotonic clock.

> It is fast, available early during boot, in any context, and even
> without using any lock.
>
> There are repeated requests [1][2][3] to show the timestamps from other
> clocks. The main motivation is to make it easier to correlate the kernel
> and userspace logs. Where userspace logs usually use the real time
> clock.
>
> Unfortunately, it is not possible to simply replace the default clock.
> Userspace tools, like journalctl, dmesg, expect to get the timestamps
> from the mono via /dev/kmsg interface or syslog syscall [4].
> Also administrators would be confused when logs from different
> systems use different clocks depending on kernel version or
> build option [5].
>
> As a result, the mono clock has to stay as the default clock
> and has to be used in the current user interfaces.

Actually this series is changing the default clock from local to
monotonic. I for one welcome this change (and wish ftrace would do it as
well), but it is a change.

> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 1560649cbd35..0ed8901916f4 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -520,10 +522,10 @@ static int log_store(u32 caller_id, int facility, int level,
> r.info->facility = facility;
> r.info->level = level & 7;
> r.info->flags = flags & 0x1f;
> - if (ts_nsec > 0)
> - r.info->ts_nsec = ts_nsec;
> + if (ts)
> + r.info->ts = *ts;
> else
> - r.info->ts_nsec = local_clock();
> + ktime_get_fast_timestamps(&r.info->ts);

I am wondering if we still want to keep the local_clock() as well (and
as the default). ftrace also uses it by default, which means traces and
printk logs could be coordinated by default until now.

The two clocks can vary quite a bit. I have a laptop where the local
clock drifts away from monotonic at about 50us per second.

> diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
> index 0adaa685d1ca..09082c8472d3 100644
> --- a/kernel/printk/printk_ringbuffer.h
> +++ b/kernel/printk/printk_ringbuffer.h
> @@ -14,7 +15,7 @@
> */
> struct printk_info {
> u64 seq; /* sequence number */
> - u64 ts_nsec; /* timestamp in nanoseconds */
> + struct ktime_timestamps ts; /* timestamps */
> u16 text_len; /* length of text message */
> u8 facility; /* syslog facility */
> u8 flags:5; /* internal record flags */

If we wanted to keep the local clock, should the local clock be a part
of struct ktime_timestamps? Or should struct printk_info maintain that
separately (either as @ts_nsec or @ts_local or whatever).

John Ogness

2020-09-24 00:02:05

by John Ogness

[permalink] [raw]
Subject: Re: [PATCH 1/2] printk: Store all three timestamps

On 2020-09-23, Petr Mladek <[email protected]> wrote:
> diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
> index 0adaa685d1ca..09082c8472d3 100644
> --- a/kernel/printk/printk_ringbuffer.h
> +++ b/kernel/printk/printk_ringbuffer.h
> @@ -14,7 +15,7 @@
> */
> struct printk_info {
> u64 seq; /* sequence number */
> - u64 ts_nsec; /* timestamp in nanoseconds */
> + struct ktime_timestamps ts; /* timestamps */

Until now struct printk_info has contained generic types. If we add
struct ktime_timestamps, we may start storing more than we need. For
example, if more (possibly internal) fields are added to struct
ktime_timestamps that printk doesn't care about. We may prefer to
generically and explicitly store the information we care about:

u64 ts_mono;
u64 ts_boot;
u64 ts_real;

Or create our own struct printk_ts to copy the fields of interest to.

John Ogness

2020-09-24 01:43:56

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [RFC 2/2] printk: Add more information about the printk caller

On (20/09/23 15:56), Petr Mladek wrote:
> The information about the printk caller has been added by the commit
> 15ff2069cb7f967da ("printk: Add caller information to printk() output.").
> The main motivation was to reconstruct original messages when they
> longer output from different CPUs got mixed.
>
> But there are more usecases. The number of CPUs and tasks running on
> a single system is growing. It brings bigger challenges when debugging
> problems. The context in which kernel reports its state is sometimes
> pretty useful.
>
> In particular, people suggest to remember the task id, cpu id, more details
> about the interrupt context, and even the task name [0][1].
>
> Prepare for extending the caller information by replacing caller_id
> with struct printk_caller. And always store task id, cpu id, and
> exact interrupt context.

Wild idea:

Currently, we have two sides to the problem, I think. There are tasks
that store messages, but then there are tasks that print those messages
on the consoles. And those tasks and contexts are not always the same.
What I found helpful in the past was not only the caller_id, but the
emitter_id (especially preemption count and irqs state of the context
that prints messages on the slow consoles).

-ss

2020-09-24 01:49:02

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH 1/2] printk: Store all three timestamps

On (20/09/24 00:18), John Ogness wrote:
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 1560649cbd35..0ed8901916f4 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -520,10 +522,10 @@ static int log_store(u32 caller_id, int facility, int level,
> > r.info->facility = facility;
> > r.info->level = level & 7;
> > r.info->flags = flags & 0x1f;
> > - if (ts_nsec > 0)
> > - r.info->ts_nsec = ts_nsec;
> > + if (ts)
> > + r.info->ts = *ts;
> > else
> > - r.info->ts_nsec = local_clock();
> > + ktime_get_fast_timestamps(&r.info->ts);
>
> I am wondering if we still want to keep the local_clock() as well (and
> as the default). ftrace also uses it by default, which means traces and
> printk logs could be coordinated by default until now.

Good point.

> If we wanted to keep the local clock, should the local clock be a part
> of struct ktime_timestamps? Or should struct printk_info maintain that
> separately (either as @ts_nsec or @ts_local or whatever).

I like the idea of having a dedicated printk_timestamps structure
with the timestamps.

-ss

2020-09-24 02:22:00

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [RFC 2/2] printk: Add more information about the printk caller

On (20/09/23 15:56), Petr Mladek wrote:
[..]
> /*
> * To reduce unnecessarily reopening, first check if the descriptor
> - * state and caller ID are correct.
> + * state and caller infromation are correct.
> */
> - d_state = desc_read(desc_ring, id, &desc, NULL, &cid);
> - if (d_state != desc_committed || cid != caller_id)
> + d_state = desc_read(desc_ring, id, &desc, NULL, &cal);
> + if (d_state != desc_committed ||
> + cal.pid != caller->pid ||
> + cal.cpu_ctx != caller->cpu_ctx) {

You probably might want to factor out ctx check into a static
inline helper. Since you use this check in several places, and
we may check more context fields in the future.

[..]
> +/* Information about the process and context that adds the message */
> +struct printk_caller {
> + pid_t pid; /* thread id */
> + u32 cpu_ctx; /* processor id and interrupt context */
> +};

A question. Suppose we have a task which does

CPU0

pr_err(...);

preempt_disable();
pr_err(...);
preempt_enable();

pr_err(...);

rcu_read_lock();
pr_info(...);
rcu_read_unlock();

Should we distinguish those as 3 different contexts?

- normal printk
- printk under disabled preemption (affects scheduling)
- printk under RCU read side lock (affects RCU grace periods)

-ss

2020-09-24 04:27:50

by Ahmed S. Darwish

[permalink] [raw]
Subject: Re: [RFC 2/2] printk: Add more information about the printk caller

On Wed, Sep 23, 2020 at 03:56:17PM +0200, Petr Mladek wrote:
...
>
> -static inline u32 printk_caller_id(void)
> +static enum printk_caller_ctx get_printk_caller_ctx(void)
> +{
> + if (in_nmi())
> + return printk_ctx_nmi;
> +
> + if (in_irq())
> + return printk_ctx_hardirq;
> +
> + if (in_softirq())
> + return printk_ctx_softirq;
> +
> + return printk_ctx_task;
> +}
> +

in_softirq() here will be true for both softirq contexts *and*
BH-disabled regions. Did you mean in_serving_softirq() instead?

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

2020-09-24 08:25:16

by John Ogness

[permalink] [raw]
Subject: Re: [RFC 2/2] printk: Add more information about the printk caller

On 2020-09-24, Sergey Senozhatsky <[email protected]> wrote:
> A question. Suppose we have a task which does
>
> CPU0
>
> pr_err(...);
>
> preempt_disable();
> pr_err(...);
> preempt_enable();
>
> pr_err(...);
>
> rcu_read_lock();
> pr_info(...);
> rcu_read_unlock();
>
> Should we distinguish those as 3 different contexts?
>
> - normal printk
> - printk under disabled preemption (affects scheduling)
> - printk under RCU read side lock (affects RCU grace periods)

The internal purpose of the printk_info's @caller_id is to support
_some_ sanity with LOG_CONT. True LOG_CONT sanity will not be available
until we provide a context API like we previously talked about [0]. But
for now it is good enough (even if it is not good).

We can also store various flags and counters to describe the current
context. But these would only exist to provide the user with information
rather than because printk itself needs it. I only mention this so that
we don't get things mixed up. We can add as much information as want,
but LOG_CONT will only be interested in a subset of that.

John Ogness

2020-09-24 10:39:19

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH 1/2] printk: Store all three timestamps

On Thu 2020-09-24 02:06:12, John Ogness wrote:
> On 2020-09-23, Petr Mladek <[email protected]> wrote:
> > diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
> > index 0adaa685d1ca..09082c8472d3 100644
> > --- a/kernel/printk/printk_ringbuffer.h
> > +++ b/kernel/printk/printk_ringbuffer.h
> > @@ -14,7 +15,7 @@
> > */
> > struct printk_info {
> > u64 seq; /* sequence number */
> > - u64 ts_nsec; /* timestamp in nanoseconds */
> > + struct ktime_timestamps ts; /* timestamps */
>
> Until now struct printk_info has contained generic types. If we add
> struct ktime_timestamps, we may start storing more than we need. For
> example, if more (possibly internal) fields are added to struct
> ktime_timestamps that printk doesn't care about. We may prefer to
> generically and explicitly store the information we care about:
>
> u64 ts_mono;
> u64 ts_boot;
> u64 ts_real;
>
> Or create our own struct printk_ts to copy the fields of interest to.

I would like to have a structure if we have more timestamps.

Honestly, printk-specific structure sounds like an overhead to me.
How big is the chance that struct ktime_timestamps ts would get
modified? It has been created for printk after all.

That said, I could live with printk-specific structure.
We might even need it if we need to store also local_clock().

Best Regards,
Petr

2020-09-24 11:53:25

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH 1/2] printk: Store all three timestamps

On Thu 2020-09-24 00:18:53, John Ogness wrote:
> On 2020-09-23, Petr Mladek <[email protected]> wrote:
> > printk() historically shows the timestamps from the monotonic clock.
>
> printk() uses the local clock, not the monotonic clock.
>
> > It is fast, available early during boot, in any context, and even
> > without using any lock.
> >
> > There are repeated requests [1][2][3] to show the timestamps from other
> > clocks. The main motivation is to make it easier to correlate the kernel
> > and userspace logs. Where userspace logs usually use the real time
> > clock.
> >
> > Unfortunately, it is not possible to simply replace the default clock.
> > Userspace tools, like journalctl, dmesg, expect to get the timestamps
> > from the mono via /dev/kmsg interface or syslog syscall [4].
> > Also administrators would be confused when logs from different
> > systems use different clocks depending on kernel version or
> > build option [5].
> >
> > As a result, the mono clock has to stay as the default clock
> > and has to be used in the current user interfaces.
>
> Actually this series is changing the default clock from local to
> monotonic. I for one welcome this change (and wish ftrace would do it as
> well), but it is a change.

Good point! I expected that it is enough when boot timestamps are
monotonic.

IMHO, it is really important to use the same clock by printk() and
ftrace! Otherwise, it would be hard to match them.

Honestly, I do not know enough about the different clocks and their
users. My view is:

1. printk() needs a clock that is:

+ monotonic (always increasing)
+ lockless access
+ updated fast enough (precision)
+ available during early boot
+ match clock used by other kernel logs (ftrace)

2. ftrace needs a clock that is:

+ same requirements as for printk()
+ fast access to reduce performance problems

3. user space developers need:

+ real time clock to match userspace logs


Now, we have 4 clocks. IMHO, storing all 4 timestamps is not worth it.
We need at least two:

+ local, mono, or boot clock as legacy kernel timestamp
+ real clock to match timestamps from userspace tools


I am not able to say what clock is the best one for kernel timestamps.
The following questions come to my mind:

Is mono or boot clock fast enough for ftrace (performance wise)?
Is mono or boot clock available early enough during boot?
Is is boot clock acceptable as the default kernel clock?


IMHO, the boot clock is interesting because it has similar semantic
as the real time (running even during sleep). But I am not sure if
some developers do not rely on the mono clock or even local clock.

I guess that this need opinion from a wide audience or developers
experienced in many areas. I do not feel quialified to make
a decision.

Best Regards,
Petr

2020-09-24 12:00:01

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC 2/2] printk: Add more information about the printk caller

On Thu 2020-09-24 10:40:10, Sergey Senozhatsky wrote:
> On (20/09/23 15:56), Petr Mladek wrote:
> > The information about the printk caller has been added by the commit
> > 15ff2069cb7f967da ("printk: Add caller information to printk() output.").
> > The main motivation was to reconstruct original messages when they
> > longer output from different CPUs got mixed.
> >
> > But there are more usecases. The number of CPUs and tasks running on
> > a single system is growing. It brings bigger challenges when debugging
> > problems. The context in which kernel reports its state is sometimes
> > pretty useful.
> >
> > In particular, people suggest to remember the task id, cpu id, more details
> > about the interrupt context, and even the task name [0][1].
> >
> > Prepare for extending the caller information by replacing caller_id
> > with struct printk_caller. And always store task id, cpu id, and
> > exact interrupt context.
>
> Wild idea:
>
> Currently, we have two sides to the problem, I think. There are tasks
> that store messages, but then there are tasks that print those messages
> on the consoles. And those tasks and contexts are not always the same.
> What I found helpful in the past was not only the caller_id, but the
> emitter_id (especially preemption count and irqs state of the context
> that prints messages on the slow consoles).

Yeah, it might be useful. But it should get discussed in a separate thread.

The information about the emitter do not need to get stored into the
metadata. It is mostly need only for debugging printk-related
problems. It should become less important if we succeed with
introducing the printk kthreads.

Best Regards,
Petr

2020-09-24 12:54:25

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC 2/2] printk: Add more information about the printk caller

On Thu 2020-09-24 06:24:14, Ahmed S. Darwish wrote:
> On Wed, Sep 23, 2020 at 03:56:17PM +0200, Petr Mladek wrote:
> ...
> >
> > -static inline u32 printk_caller_id(void)
> > +static enum printk_caller_ctx get_printk_caller_ctx(void)
> > +{
> > + if (in_nmi())
> > + return printk_ctx_nmi;
> > +
> > + if (in_irq())
> > + return printk_ctx_hardirq;
> > +
> > + if (in_softirq())
> > + return printk_ctx_softirq;
> > +
> > + return printk_ctx_task;
> > +}
> > +
>
> in_softirq() here will be true for both softirq contexts *and*
> BH-disabled regions. Did you mean in_serving_softirq() instead?

Good question!

I am not sure if people would want to distinguish these two
situations.

Otherwise, I think that is_softirq() more close to the meaning of
in_irq(). They both describe a context where a new interrupt has
to wait until the handling gets enabled again.

Best Regards,
Petr

2020-09-24 13:08:51

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC 2/2] printk: Add more information about the printk caller

On Thu 2020-09-24 11:17:56, Sergey Senozhatsky wrote:
> On (20/09/23 15:56), Petr Mladek wrote:
> [..]
> > /*
> > * To reduce unnecessarily reopening, first check if the descriptor
> > - * state and caller ID are correct.
> > + * state and caller infromation are correct.
> > */
> > - d_state = desc_read(desc_ring, id, &desc, NULL, &cid);
> > - if (d_state != desc_committed || cid != caller_id)
> > + d_state = desc_read(desc_ring, id, &desc, NULL, &cal);
> > + if (d_state != desc_committed ||
> > + cal.pid != caller->pid ||
> > + cal.cpu_ctx != caller->cpu_ctx) {
>
> You probably might want to factor out ctx check into a static
> inline helper. Since you use this check in several places, and
> we may check more context fields in the future.

Makes sense.

> [..]
> > +/* Information about the process and context that adds the message */
> > +struct printk_caller {
> > + pid_t pid; /* thread id */
> > + u32 cpu_ctx; /* processor id and interrupt context */
> > +};
>
> A question. Suppose we have a task which does
>
> CPU0
>
> pr_err(...);
>
> preempt_disable();
> pr_err(...);
> preempt_enable();
>
> pr_err(...);
>
> rcu_read_lock();
> pr_info(...);
> rcu_read_unlock();
>
> Should we distinguish those as 3 different contexts?
>
> - normal printk
> - printk under disabled preemption (affects scheduling)
> - printk under RCU read side lock (affects RCU grace periods)

Good question. Well, these contexts could not get detected in
PREEMPT_NONE or PREEMPT_VOLUNTARY. Also I wonder where it would
stop. There are several RCU flavors.

I would not distinguish them unless there is a real demand.

Best Regards,
Petr

2020-09-24 13:28:59

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC 2/2] printk: Add more information about the printk caller

On Thu 2020-09-24 10:29:31, John Ogness wrote:
> On 2020-09-24, Sergey Senozhatsky <[email protected]> wrote:
> > A question. Suppose we have a task which does
> >
> > CPU0
> >
> > pr_err(...);
> >
> > preempt_disable();
> > pr_err(...);
> > preempt_enable();
> >
> > pr_err(...);
> >
> > rcu_read_lock();
> > pr_info(...);
> > rcu_read_unlock();
> >
> > Should we distinguish those as 3 different contexts?
> >
> > - normal printk
> > - printk under disabled preemption (affects scheduling)
> > - printk under RCU read side lock (affects RCU grace periods)
>
> The internal purpose of the printk_info's @caller_id is to support
> _some_ sanity with LOG_CONT. True LOG_CONT sanity will not be available
> until we provide a context API like we previously talked about [0]. But
> for now it is good enough (even if it is not good).
>
> We can also store various flags and counters to describe the current
> context. But these would only exist to provide the user with information
> rather than because printk itself needs it. I only mention this so that
> we don't get things mixed up. We can add as much information as want,
> but LOG_CONT will only be interested in a subset of that.

Yeah, we need to keep this in mind, so that some "tiny" changes in the
context would not prevent concatenating related pieces of the message.

Best Regards,
Petr

2020-09-24 13:40:25

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC 2/2] printk: Add more information about the printk caller

On Thu 2020-09-24 14:53:01, Petr Mladek wrote:
> On Thu 2020-09-24 06:24:14, Ahmed S. Darwish wrote:
> > On Wed, Sep 23, 2020 at 03:56:17PM +0200, Petr Mladek wrote:
> > ...
> > >
> > > -static inline u32 printk_caller_id(void)
> > > +static enum printk_caller_ctx get_printk_caller_ctx(void)
> > > +{
> > > + if (in_nmi())
> > > + return printk_ctx_nmi;
> > > +
> > > + if (in_irq())
> > > + return printk_ctx_hardirq;
> > > +
> > > + if (in_softirq())
> > > + return printk_ctx_softirq;
> > > +
> > > + return printk_ctx_task;
> > > +}
> > > +
> >
> > in_softirq() here will be true for both softirq contexts *and*
> > BH-disabled regions. Did you mean in_serving_softirq() instead?
>
> Good question!
>
> I am not sure if people would want to distinguish these two
> situations.
>
> Otherwise, I think that is_softirq() more close to the meaning of
> in_irq(). They both describe a context where a new interrupt has
> to wait until the handling gets enabled again.

Grrrr, I wonder why I thought that in_irq() covered also the situation
when IRQ was disabled. It was likely my wish because disabled
interrupts are problem for printk() because the console might
cause a softlockup.

in_irq() actually behaves like in_serving_softirq().

I am confused and puzzled now. I wonder what contexts are actually
interesting for developers. It goes back to the ideas from Sergey
about preemption disabled, ...

/me feels shameful and is going to hide under a stone.

Best Regards,
Petr

2020-09-25 00:55:52

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [RFC 2/2] printk: Add more information about the printk caller

On (20/09/24 15:38), Petr Mladek wrote:
[..]
>
> Grrrr, I wonder why I thought that in_irq() covered also the situation
> when IRQ was disabled. It was likely my wish because disabled
> interrupts are problem for printk() because the console might
> cause a softlockup.

preempt_disable() can also trigger softlockup.

> in_irq() actually behaves like in_serving_softirq().
>
> I am confused and puzzled now. I wonder what contexts are actually
> interesting for developers. It goes back to the ideas from Sergey
> about preemption disabled, ...

Are we talking about context tracking for LOG_CONT or context on
the serial console and /dev/kmsg?

If the latter, then my 5 cents, is that something like preemptible(),
which checks

(preempt_count() == 0 && !irqs_disabled())

does not look completely unreasonable.

We had a rather OK context tracking in printk() before, but for a
completely different purpose:

console_may_schedule = !oops_in_progress &&
preemptible() &&
!rcu_preempt_depth();

We know that printk() can cause RCU stalls [0]. Tracking this part
of the context state is sort of meaningful.

Let's look at this from this POV - why do we add in_irq()/etc tracking
info? Perhaps because we want to connect the dots between printk() caller
state and watchdog reports. Do we cover all watchdogs? No, I don't think
so. RCU stalls, local_irq_disable(), preempt_disable() are not covered.

Do we have any technical reasons not to add those missing bits?

[0] https://lkml.org/lkml/2018/1/9/485

-ss

2020-09-25 09:55:31

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH 1/2] printk: Store all three timestamps

On Thu 2020-09-24 10:07:57, Mark Salyzyn wrote:
> I would hope for monotonic_raw, boottime and realtime as being the most
> useful for most situations.
>
> [TL;DR]
>
> Currently kernel logs actually uses monotonic_raw (no timing clock
> correction), not monotonic (timing correction).
>
> Whereas boot (timing clock adjusted, still monotonic) and realtime (timing
> clock _and_ time adjusted, non monotonic), when we try to correlate to user
> space is workable, but we will have troubles correlating monotonic (w.r.t.
> monotonic_raw) clocks if used in user space.
>
> In Android we have the option of monotonic and realtime only right now for
> the user space logger (which integrates logd, klogd and auditd, the later
> two come from the kernel). I have bugs open to consider boottime, but it is
> blocked on boottime availability from kernel space logging (this change). I
> have another bug to consider switching the logger to monotonic_raw instead
> of monotonic, to make it correlate better with existing kernel logs. But
> alas a lot of resistance for phones switching to monotonic(_raw), the only
> devices that chose monotonic(_raw) is everything else (google glass,
> watches, ...). As such, phones, and the associated developers, will
> continue to want realtime correlated in the kernel logs (yes, this change
> too).
>
> realtime sucks for the first 10 seconds on Android, since phones generally
> do not get their time correction until then from network resources, and
> many of their rtc clocks are not adjustable, they store a correction factor
> that does not get picked up from user space until userdata is mounted
> (about 20 seconds in). But only kernel developers care about this first
> part of boot, everything after that (and associated correlated kernel
> interactions) are for user space folks.

Thanks a lot for this detailed feedback.

Just to be sure that I understand it correctly. You suggest to store
three timestamps: local_clock(), boot and real clock.

It makes sense to me. I just wonder if there might be any use case
when the adjusted mono clock is needed or preferred over local_clock().

Best Regards,
Petr

2020-09-25 10:23:35

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC 2/2] printk: Add more information about the printk caller

On Fri 2020-09-25 09:54:00, Sergey Senozhatsky wrote:
> On (20/09/24 15:38), Petr Mladek wrote:
> [..]
> >
> > Grrrr, I wonder why I thought that in_irq() covered also the situation
> > when IRQ was disabled. It was likely my wish because disabled
> > interrupts are problem for printk() because the console might
> > cause a softlockup.
>
> preempt_disable() can also trigger softlockup.
>
> > in_irq() actually behaves like in_serving_softirq().
> >
> > I am confused and puzzled now. I wonder what contexts are actually
> > interesting for developers. It goes back to the ideas from Sergey
> > about preemption disabled, ...
>
> Are we talking about context tracking for LOG_CONT or context on
> the serial console and /dev/kmsg?

OK, it is clear that LOG_CONT need to know when a different code
is called suddenly. I mean task code vs. an interrupt handler.

But it was actually also the original purpose of the caller_id.
AFAIK, people wanted to sort related messages when they were mixed
with ones from other CPUs.


> If the latter, then my 5 cents, is that something like preemptible(),
> which checks
>
> (preempt_count() == 0 && !irqs_disabled())
>
> does not look completely unreasonable.
>
> We had a rather OK context tracking in printk() before, but for a
> completely different purpose:
>
> console_may_schedule = !oops_in_progress &&
> preemptible() &&
> !rcu_preempt_depth();
>
> We know that printk() can cause RCU stalls [0]. Tracking this part
> of the context state is sort of meaningful.
>
> Let's look at this from this POV - why do we add in_irq()/etc tracking
> info? Perhaps because we want to connect the dots between printk() caller
> state and watchdog reports. Do we cover all watchdogs? No, I don't think
> so. RCU stalls, local_irq_disable(), preempt_disable() are not covered.

I agree that it would be handy to see this context as well. It might
make it easier when hunting down various lockups and stall. But
I have some concerns.

First, the information is not always reliable (PREEMPT_NONE). I wonder
if it might cause more harm than good. People might get confused
or they might want to fix it by some crazy printk code.

Second, the information might not be detailed enough. Many lockups
depends on the fact that a particular lock is held. Backtraces
are likely more important. Or people would need to distinguish
many contexts. It would require another complex code.


I am not sure that this is woth it. After all, it might be enough
to distinguish the 4 basic contexts just to allow sorting mixed
messages.

Best Regards,
Petr

2020-10-21 13:52:36

by 김창기

[permalink] [raw]
Subject: RE: [RFC 2/2] printk: Add more information about the printk caller

> +static size_t info_print_caller_id(char *buf, size_t size,
> + const struct printk_caller *caller) {
> + enum printk_caller_ctx ctx;
> +
> + ctx = printk_to_caller_ctx(caller->cpu_ctx);
> +
> + if (ctx == printk_ctx_task)
> + return snprintf(buf, size, "T%u", caller->pid);
> +
> + return snprintf(buf, size, "C%u",
> +printk_to_caller_cpu(caller->cpu_ctx));
> +}
> +

When I apply and test this patch, there is no change of print format
compared with previous PRINTK_CALLER.
The patch that I made is always to print the CPU ID and process name
together.