2015-07-03 10:53:35

by Marcin Niesluchowski

Subject: [RFC 0/8] Additional kmsg devices

Dear All,

This patch series extends the kmsg interface with the ability to dynamically
create (and destroy) kmsg-like devices that user space can use for logging.
Logging to the kernel has a number of benefits, including but not limited
to: it is always available, requires no userspace, rotates automatically
and has low overhead.

User-space logging to kernel cyclic buffers has already been used
successfully in the Android logger concept, but it had certain flaws that
these commits try to address:
* drops the hardcoded number of devices and static paths in favor of
dynamic configuration from userspace through an ioctl interface
* extends the existing driver instead of creating a completely new one
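
As a rough illustration of the user-space side (a sketch, not part of this
series): a process can log by writing one record per write() call to a
kmsg-like device, optionally prefixed with a "<N>" syslog priority, which is
the form devkmsg_write() parses. The helper names and the device path below
are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Build one record in the form accepted on write: a "<N>" syslog
 * prefix followed by the message and a newline.
 * Returns the record length, or -1 if it does not fit in buf.
 */
static int kmsg_format_record(char *buf, size_t size, int level,
			      const char *msg)
{
	int n = snprintf(buf, size, "<%d>%s\n", level & 7, msg);

	return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/*
 * Write one record to a kmsg-like device. The path is supplied by the
 * caller; no particular device name is assumed here.
 * Returns 0 on success and -1 on any failure.
 */
static int kmsg_log(const char *path, int level, const char *msg)
{
	char rec[512];
	int len = kmsg_format_record(rec, sizeof(rec), level, msg);
	ssize_t written;
	int fd;

	if (len < 0)
		return -1;
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	written = write(fd, rec, (size_t)len);
	close(fd);
	return written == (ssize_t)len ? 0 : -1;
}
```

A caller would use something like kmsg_log(path, 6, "service started"),
where path names the device it was given.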

These patches apply on branch 'master'
(commit 9bdc771f2c29a11920f477fba05a58e23ee42554) of:
git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Marcin Niesluchowski (8):
printk: move code regarding log message storing format
printk: add one function for storing log in proper format
kmsg: introduce additional kmsg devices support
kmsg: add function for adding and deleting additional buffers
kmsg: device support in mem class
kmsg: add predefined _PID, _TID, _COMM keywords to kmsg* log dict
kmsg: add ioctl for adding and deleting kmsg* devices
kmsg: add ioctl for kmsg* devices operating on buffers

Documentation/ioctl/ioctl-number.txt | 1 +
drivers/char/mem.c | 154 +++-
fs/proc/kmsg.c | 4 +-
include/linux/printk.h | 6 +
include/uapi/linux/Kbuild | 1 +
include/uapi/linux/kmsg_ioctl.h | 45 ++
kernel/printk/printk.c | 1361 ++++++++++++++++++++++------------
7 files changed, 1087 insertions(+), 485 deletions(-)
create mode 100644 include/uapi/linux/kmsg_ioctl.h

--

Best Regards,
Marcin Niesluchowski


2015-07-03 10:50:27

by Marcin Niesluchowski

Subject: [RFC 1/8] printk: move code regarding log message storing format

Preparatory commit for future changes.

Moves the code responsible for storing log messages in the proper format.

Signed-off-by: Marcin Niesluchowski <[email protected]>
---
kernel/printk/printk.c | 254 ++++++++++++++++++++++++-------------------------
1 file changed, 127 insertions(+), 127 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index cf8c242..98f5af5 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -475,6 +475,133 @@ static int log_store(int facility, int level,
return msg->text_len;
}

+static bool printk_time = IS_ENABLED(CONFIG_PRINTK_TIME);
+module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
+
+static size_t print_time(u64 ts, char *buf)
+{
+ unsigned long rem_nsec;
+
+ if (!printk_time)
+ return 0;
+
+ rem_nsec = do_div(ts, 1000000000);
+
+ if (!buf)
+ return snprintf(NULL, 0, "[%5lu.000000] ", (unsigned long)ts);
+
+ return sprintf(buf, "[%5lu.%06lu] ",
+ (unsigned long)ts, rem_nsec / 1000);
+}
+
+/*
+ * Continuation lines are buffered, and not committed to the record buffer
+ * until the line is complete, or a race forces it. The line fragments
+ * though, are printed immediately to the consoles to ensure everything has
+ * reached the console in case of a kernel crash.
+ */
+static struct cont {
+ char buf[LOG_LINE_MAX];
+ size_t len; /* length == 0 means unused buffer */
+ size_t cons; /* bytes written to console */
+ struct task_struct *owner; /* task of first print*/
+ u64 ts_nsec; /* time of first print */
+ u8 level; /* log level of first message */
+ u8 facility; /* log facility of first message */
+ enum log_flags flags; /* prefix, newline flags */
+ bool flushed:1; /* buffer sealed and committed */
+} cont;
+
+static void cont_flush(enum log_flags flags)
+{
+ if (cont.flushed)
+ return;
+ if (cont.len == 0)
+ return;
+
+ if (cont.cons) {
+ /*
+ * If a fragment of this line was directly flushed to the
+ * console; wait for the console to pick up the rest of the
+ * line. LOG_NOCONS suppresses a duplicated output.
+ */
+ log_store(cont.facility, cont.level, flags | LOG_NOCONS,
+ cont.ts_nsec, NULL, 0, cont.buf, cont.len);
+ cont.flags = flags;
+ cont.flushed = true;
+ } else {
+ /*
+ * If no fragment of this line ever reached the console,
+ * just submit it to the store and free the buffer.
+ */
+ log_store(cont.facility, cont.level, flags, 0,
+ NULL, 0, cont.buf, cont.len);
+ cont.len = 0;
+ }
+}
+
+static bool cont_add(int facility, int level, const char *text, size_t len)
+{
+ if (cont.len && cont.flushed)
+ return false;
+
+ /*
+ * If ext consoles are present, flush and skip in-kernel
+ * continuation. See nr_ext_console_drivers definition. Also, if
+ * the line gets too long, split it up in separate records.
+ */
+ if (nr_ext_console_drivers || cont.len + len > sizeof(cont.buf)) {
+ cont_flush(LOG_CONT);
+ return false;
+ }
+
+ if (!cont.len) {
+ cont.facility = facility;
+ cont.level = level;
+ cont.owner = current;
+ cont.ts_nsec = local_clock();
+ cont.flags = 0;
+ cont.cons = 0;
+ cont.flushed = false;
+ }
+
+ memcpy(cont.buf + cont.len, text, len);
+ cont.len += len;
+
+ if (cont.len > (sizeof(cont.buf) * 80) / 100)
+ cont_flush(LOG_CONT);
+
+ return true;
+}
+
+static size_t cont_print_text(char *text, size_t size)
+{
+ size_t textlen = 0;
+ size_t len;
+
+ if (cont.cons == 0 && (console_prev & LOG_NEWLINE)) {
+ textlen += print_time(cont.ts_nsec, text);
+ size -= textlen;
+ }
+
+ len = cont.len - cont.cons;
+ if (len > 0) {
+ if (len+1 > size)
+ len = size-1;
+ memcpy(text + textlen, cont.buf + cont.cons, len);
+ textlen += len;
+ cont.cons = cont.len;
+ }
+
+ if (cont.flushed) {
+ if (cont.flags & LOG_NEWLINE)
+ text[textlen++] = '\n';
+ /* got everything, release buffer */
+ cont.len = 0;
+ }
+ return textlen;
+}
+
int dmesg_restrict = IS_ENABLED(CONFIG_SECURITY_DMESG_RESTRICT);

static int syslog_action_restricted(int type)
@@ -1030,25 +1157,6 @@ static inline void boot_delay_msec(int level)
}
#endif

-static bool printk_time = IS_ENABLED(CONFIG_PRINTK_TIME);
-module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
-
-static size_t print_time(u64 ts, char *buf)
-{
- unsigned long rem_nsec;
-
- if (!printk_time)
- return 0;
-
- rem_nsec = do_div(ts, 1000000000);
-
- if (!buf)
- return snprintf(NULL, 0, "[%5lu.000000] ", (unsigned long)ts);
-
- return sprintf(buf, "[%5lu.%06lu] ",
- (unsigned long)ts, rem_nsec / 1000);
-}
-
static size_t print_prefix(const struct printk_log *msg, bool syslog, char *buf)
{
size_t len = 0;
@@ -1544,114 +1652,6 @@ static inline void printk_delay(void)
}
}

-/*
- * Continuation lines are buffered, and not committed to the record buffer
- * until the line is complete, or a race forces it. The line fragments
- * though, are printed immediately to the consoles to ensure everything has
- * reached the console in case of a kernel crash.
- */
-static struct cont {
- char buf[LOG_LINE_MAX];
- size_t len; /* length == 0 means unused buffer */
- size_t cons; /* bytes written to console */
- struct task_struct *owner; /* task of first print*/
- u64 ts_nsec; /* time of first print */
- u8 level; /* log level of first message */
- u8 facility; /* log facility of first message */
- enum log_flags flags; /* prefix, newline flags */
- bool flushed:1; /* buffer sealed and committed */
-} cont;
-
-static void cont_flush(enum log_flags flags)
-{
- if (cont.flushed)
- return;
- if (cont.len == 0)
- return;
-
- if (cont.cons) {
- /*
- * If a fragment of this line was directly flushed to the
- * console; wait for the console to pick up the rest of the
- * line. LOG_NOCONS suppresses a duplicated output.
- */
- log_store(cont.facility, cont.level, flags | LOG_NOCONS,
- cont.ts_nsec, NULL, 0, cont.buf, cont.len);
- cont.flags = flags;
- cont.flushed = true;
- } else {
- /*
- * If no fragment of this line ever reached the console,
- * just submit it to the store and free the buffer.
- */
- log_store(cont.facility, cont.level, flags, 0,
- NULL, 0, cont.buf, cont.len);
- cont.len = 0;
- }
-}
-
-static bool cont_add(int facility, int level, const char *text, size_t len)
-{
- if (cont.len && cont.flushed)
- return false;
-
- /*
- * If ext consoles are present, flush and skip in-kernel
- * continuation. See nr_ext_console_drivers definition. Also, if
- * the line gets too long, split it up in separate records.
- */
- if (nr_ext_console_drivers || cont.len + len > sizeof(cont.buf)) {
- cont_flush(LOG_CONT);
- return false;
- }
-
- if (!cont.len) {
- cont.facility = facility;
- cont.level = level;
- cont.owner = current;
- cont.ts_nsec = local_clock();
- cont.flags = 0;
- cont.cons = 0;
- cont.flushed = false;
- }
-
- memcpy(cont.buf + cont.len, text, len);
- cont.len += len;
-
- if (cont.len > (sizeof(cont.buf) * 80) / 100)
- cont_flush(LOG_CONT);
-
- return true;
-}
-
-static size_t cont_print_text(char *text, size_t size)
-{
- size_t textlen = 0;
- size_t len;
-
- if (cont.cons == 0 && (console_prev & LOG_NEWLINE)) {
- textlen += print_time(cont.ts_nsec, text);
- size -= textlen;
- }
-
- len = cont.len - cont.cons;
- if (len > 0) {
- if (len+1 > size)
- len = size-1;
- memcpy(text + textlen, cont.buf + cont.cons, len);
- textlen += len;
- cont.cons = cont.len;
- }
-
- if (cont.flushed) {
- if (cont.flags & LOG_NEWLINE)
- text[textlen++] = '\n';
- /* got everything, release buffer */
- cont.len = 0;
- }
- return textlen;
-}
-
asmlinkage int vprintk_emit(int facility, int level,
const char *dict, size_t dictlen,
const char *fmt, va_list args)
--
1.9.1

2015-07-03 10:50:46

by Marcin Niesluchowski

Subject: [RFC 2/8] printk: add one function for storing log in proper format

Preparatory commit for future changes.

Separates the code responsible for storing a log message in the proper
format from console operations by moving it into a separate function.
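
For readers who want to see the formatting step in isolation, here is a
minimal user-space model of what the new function does before storing: it
marks and strips a trailing newline, then strips the level header. This is
only a sketch; the sketch_* names and flag values are hypothetical, and the
"\001" header byte is assumed from the mainline KERN_SOH encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NEWLINE 0x1	/* hypothetical stand-in for LOG_NEWLINE */
#define SKETCH_PREFIX  0x2	/* hypothetical stand-in for LOG_PREFIX */

struct sketch_msg {
	const char *text;	/* start of the text after any level header */
	size_t len;		/* text length without the trailing newline */
	int level;		/* parsed level, or -1 if none was given */
	unsigned int flags;
};

/* Model of the formatting step only; console handling stays outside. */
static struct sketch_msg sketch_format(const char *text, size_t len)
{
	struct sketch_msg m = { text, len, -1, 0 };

	/* mark and strip a trailing newline */
	if (m.len && m.text[m.len - 1] == '\n') {
		m.len--;
		m.flags |= SKETCH_NEWLINE;
	}

	/* strip a "\001<c>" header, where <c> is '0'..'7' or 'd' */
	if (m.len >= 2 && m.text[0] == '\001') {
		char c = m.text[1];

		if ((c >= '0' && c <= '7') || c == 'd') {
			if (c != 'd')
				m.level = c - '0';
			m.flags |= SKETCH_PREFIX;
			m.text += 2;
			m.len -= 2;
		}
	}
	return m;
}
```

Keeping this step free of console state is what allows the same routine to
be reused for buffers that never reach a console.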

Signed-off-by: Marcin Niesluchowski <[email protected]>
---
kernel/printk/printk.c | 183 ++++++++++++++++++++++++++-----------------------
1 file changed, 98 insertions(+), 85 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 98f5af5..105887c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -602,6 +602,102 @@ static size_t cont_print_text(char *text, size_t size)
return textlen;
}

+static int log_format_and_store(int facility, int level,
+ const char *dict, size_t dictlen,
+ const char *fmt, va_list args)
+{
+ static char textbuf[LOG_LINE_MAX];
+ char *text = textbuf;
+ size_t text_len = 0;
+ enum log_flags lflags = 0;
+ int printed_len = 0;
+
+ /*
+ * The printf needs to come first; we need the syslog
+ * prefix which might be passed-in as a parameter.
+ */
+ text_len = vscnprintf(text, sizeof(textbuf), fmt, args);
+
+ /* mark and strip a trailing newline */
+ if (text_len && text[text_len-1] == '\n') {
+ text_len--;
+ lflags |= LOG_NEWLINE;
+ }
+
+ /* strip kernel syslog prefix and extract log level or control flags */
+ if (facility == 0) {
+ int kern_level = printk_get_level(text);
+
+ if (kern_level) {
+ const char *end_of_header = printk_skip_level(text);
+
+ switch (kern_level) {
+ case '0' ... '7':
+ if (level == LOGLEVEL_DEFAULT)
+ level = kern_level - '0';
+ /* fallthrough */
+ case 'd': /* KERN_DEFAULT */
+ lflags |= LOG_PREFIX;
+ }
+ /*
+ * No need to check length here because vscnprintf
+ * put '\0' at the end of the string. Only valid and
+ * newly printed level is detected.
+ */
+ text_len -= end_of_header - text;
+ text = (char *)end_of_header;
+ }
+ }
+
+ if (level == LOGLEVEL_DEFAULT)
+ level = default_message_loglevel;
+
+ if (dict)
+ lflags |= LOG_PREFIX|LOG_NEWLINE;
+
+ if (!(lflags & LOG_NEWLINE)) {
+ /*
+ * Flush the conflicting buffer. An earlier newline was missing,
+ * or another task also prints continuation lines.
+ */
+ if (cont.len && (lflags & LOG_PREFIX || cont.owner != current))
+ cont_flush(LOG_NEWLINE);
+
+ /* buffer line if possible, otherwise store it right away */
+ if (cont_add(facility, level, text, text_len))
+ printed_len += text_len;
+ else
+ printed_len += log_store(facility, level,
+ lflags | LOG_CONT, 0,
+ dict, dictlen, text, text_len);
+ } else {
+ bool stored = false;
+
+ /*
+ * If an earlier newline was missing and it was the same task,
+ * either merge it with the current buffer and flush, or if
+ * there was a race with interrupts (prefix == true) then just
+ * flush it out and store this line separately.
+ * If the preceding printk was from a different task and missed
+ * a newline, flush and append the newline.
+ */
+ if (cont.len) {
+ if (cont.owner == current && !(lflags & LOG_PREFIX))
+ stored = cont_add(facility, level, text,
+ text_len);
+ cont_flush(LOG_NEWLINE);
+ }
+
+ if (stored)
+ printed_len += text_len;
+ else
+ printed_len += log_store(facility, level,
+ lflags, 0, dict, dictlen,
+ text, text_len);
+ }
+ return printed_len;
+}
+
int dmesg_restrict = IS_ENABLED(CONFIG_SECURITY_DMESG_RESTRICT);

static int syslog_action_restricted(int type)
@@ -1657,10 +1753,6 @@ asmlinkage int vprintk_emit(int facility, int level,
const char *fmt, va_list args)
{
static int recursion_bug;
- static char textbuf[LOG_LINE_MAX];
- char *text = textbuf;
- size_t text_len = 0;
- enum log_flags lflags = 0;
unsigned long flags;
int this_cpu;
int printed_len = 0;
@@ -1714,87 +1806,8 @@ asmlinkage int vprintk_emit(int facility, int level,
strlen(recursion_msg));
}

- /*
- * The printf needs to come first; we need the syslog
- * prefix which might be passed-in as a parameter.
- */
- text_len = vscnprintf(text, sizeof(textbuf), fmt, args);
-
- /* mark and strip a trailing newline */
- if (text_len && text[text_len-1] == '\n') {
- text_len--;
- lflags |= LOG_NEWLINE;
- }
-
- /* strip kernel syslog prefix and extract log level or control flags */
- if (facility == 0) {
- int kern_level = printk_get_level(text);
-
- if (kern_level) {
- const char *end_of_header = printk_skip_level(text);
- switch (kern_level) {
- case '0' ... '7':
- if (level == LOGLEVEL_DEFAULT)
- level = kern_level - '0';
- /* fallthrough */
- case 'd': /* KERN_DEFAULT */
- lflags |= LOG_PREFIX;
- }
- /*
- * No need to check length here because vscnprintf
- * put '\0' at the end of the string. Only valid and
- * newly printed level is detected.
- */
- text_len -= end_of_header - text;
- text = (char *)end_of_header;
- }
- }
-
- if (level == LOGLEVEL_DEFAULT)
- level = default_message_loglevel;
-
- if (dict)
- lflags |= LOG_PREFIX|LOG_NEWLINE;
-
- if (!(lflags & LOG_NEWLINE)) {
- /*
- * Flush the conflicting buffer. An earlier newline was missing,
- * or another task also prints continuation lines.
- */
- if (cont.len && (lflags & LOG_PREFIX || cont.owner != current))
- cont_flush(LOG_NEWLINE);
-
- /* buffer line if possible, otherwise store it right away */
- if (cont_add(facility, level, text, text_len))
- printed_len += text_len;
- else
- printed_len += log_store(facility, level,
- lflags | LOG_CONT, 0,
- dict, dictlen, text, text_len);
- } else {
- bool stored = false;
-
- /*
- * If an earlier newline was missing and it was the same task,
- * either merge it with the current buffer and flush, or if
- * there was a race with interrupts (prefix == true) then just
- * flush it out and store this line separately.
- * If the preceding printk was from a different task and missed
- * a newline, flush and append the newline.
- */
- if (cont.len) {
- if (cont.owner == current && !(lflags & LOG_PREFIX))
- stored = cont_add(facility, level, text,
- text_len);
- cont_flush(LOG_NEWLINE);
- }
-
- if (stored)
- printed_len += text_len;
- else
- printed_len += log_store(facility, level, lflags, 0,
- dict, dictlen, text, text_len);
- }
+ printed_len += log_format_and_store(facility, level, dict, dictlen,
+ fmt, args);

logbuf_cpu = UINT_MAX;
raw_spin_unlock(&logbuf_lock);
--
1.9.1

2015-07-03 10:52:14

by Marcin Niesluchowski

Subject: [RFC 3/8] kmsg: introduce additional kmsg devices support

The kmsg device provides operations on the cyclic logging buffer, which is
used mainly by the kernel but also by privileged userspace processes.

Additional kmsg devices keep the same log format but can be added
dynamically with a custom size.
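
To make the per-buffer bookkeeping concrete, the free-space check can be
modelled in user space as below. This is a sketch that mirrors the logic of
logbuf_has_space() once the indices live in a per-buffer structure; the
sketch_* names and the header size are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HDR 16u	/* hypothetical size of a record header */

struct sketch_buf {
	uint32_t len;		/* buffer length, chosen per device */
	uint32_t first_idx;	/* index of the first record stored */
	uint32_t next_idx;	/* index of the next record to store */
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/*
 * Return nonzero if a record of msg_size bytes fits contiguously.
 * When the writer is ahead of the reader (or the buffer is empty), the
 * record may go either at the tail or, after wrapping, at the start.
 * Otherwise only the gap between next_idx and first_idx is usable.
 */
static int sketch_has_space(const struct sketch_buf *b, uint32_t msg_size,
			    int empty)
{
	uint32_t free_bytes;

	if (b->next_idx > b->first_idx || empty)
		free_bytes = max_u32(b->len - b->next_idx, b->first_idx);
	else
		free_bytes = b->first_idx - b->next_idx;

	/* room is also needed for an empty header that marks the wrap */
	return free_bytes >= msg_size + SKETCH_HDR;
}
```

Because every value the check needs is a field of the structure, two
buffers of different sizes can be tested independently of each other.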

Signed-off-by: Marcin Niesluchowski <[email protected]>
---
drivers/char/mem.c | 8 +
fs/proc/kmsg.c | 4 +-
include/linux/printk.h | 3 +
kernel/printk/printk.c | 659 ++++++++++++++++++++++++++++++++-----------------
4 files changed, 439 insertions(+), 235 deletions(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 6b1721f..e518040 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -804,6 +804,10 @@ static const struct memdev {
#endif
};

+#ifdef CONFIG_PRINTK
+#define KMSG_MINOR 11
+#endif
+
static int memory_open(struct inode *inode, struct file *filp)
{
int minor;
@@ -851,6 +855,10 @@ static int __init chr_dev_init(void)
if (IS_ERR(mem_class))
return PTR_ERR(mem_class);

+#ifdef CONFIG_PRINTK
+ init_kmsg_minor(KMSG_MINOR);
+#endif
+
mem_class->devnode = mem_devnode;
for (minor = 1; minor < ARRAY_SIZE(devlist); minor++) {
if (!devlist[minor].name)
diff --git a/fs/proc/kmsg.c b/fs/proc/kmsg.c
index 05f8dcd..0d354e4 100644
--- a/fs/proc/kmsg.c
+++ b/fs/proc/kmsg.c
@@ -17,7 +17,7 @@
#include <asm/uaccess.h>
#include <asm/io.h>

-extern wait_queue_head_t log_wait;
+extern wait_queue_head_t *log_wait;

static int kmsg_open(struct inode * inode, struct file * file)
{
@@ -41,7 +41,7 @@ static ssize_t kmsg_read(struct file *file, char __user *buf,

static unsigned int kmsg_poll(struct file *file, poll_table *wait)
{
- poll_wait(file, &log_wait, wait);
+ poll_wait(file, log_wait, wait);
if (do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC))
return POLLIN | POLLRDNORM;
return 0;
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 58b1fec..d3b5f23 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -418,6 +418,9 @@ do { \
#endif

extern const struct file_operations kmsg_fops;
+extern void init_kmsg_minor(int minor);
+
+extern int kmsg_sys_mode(int minor, umode_t *mode);

enum {
DUMP_PREFIX_NONE,
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 105887c..7f30c8b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -234,29 +234,37 @@ struct printk_log {
u8 level:3; /* syslog level */
};

+struct log_buffer {
+#ifdef CONFIG_PRINTK
+ struct list_head list; /* kmsg as head of the list */
+ char *buf; /* cyclic log buffer */
+ u32 len; /* buffer length */
+ wait_queue_head_t wait; /* wait queue for kmsg buffer */
+#endif
/*
- * The logbuf_lock protects kmsg buffer, indices, counters. This can be taken
- * within the scheduler's rq lock. It must be released before calling
- * console_unlock() or anything else that might wake up a process.
+ * The lock protects kmsg buffer, indices, counters. This can be taken within
+ * the scheduler's rq lock. It must be released before calling console_unlock()
+ * or anything else that might wake up a process.
*/
-static DEFINE_RAW_SPINLOCK(logbuf_lock);
+ raw_spinlock_t lock;
+ u64 first_seq; /* sequence number of the first record stored */
+ u32 first_idx; /* index of the first record stored */
+/* sequence number of the next record to store */
+ u64 next_seq;
+#ifdef CONFIG_PRINTK
+ u32 next_idx; /* index of the next record to store */
+ int mode; /* mode of device (kmsg_sys only) */
+ int minor; /* minor representing buffer device */
+#endif
+};

#ifdef CONFIG_PRINTK
-DECLARE_WAIT_QUEUE_HEAD(log_wait);
/* the next printk record to read by syslog(READ) or /proc/kmsg */
static u64 syslog_seq;
static u32 syslog_idx;
static enum log_flags syslog_prev;
static size_t syslog_partial;

-/* index and sequence number of the first record stored in the buffer */
-static u64 log_first_seq;
-static u32 log_first_idx;
-
-/* index and sequence number of the next record to store in the buffer */
-static u64 log_next_seq;
-static u32 log_next_idx;
-
/* the next printk record to write to the console */
static u64 console_seq;
static u32 console_idx;
@@ -275,21 +283,35 @@ static u32 clear_idx;
#else
#define LOG_ALIGN __alignof__(struct printk_log)
#endif
-#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
-static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
-static char *log_buf = __log_buf;
-static u32 log_buf_len = __LOG_BUF_LEN;
+#define __LOG_BUF_K_LEN (1 << CONFIG_LOG_BUF_SHIFT)
+static char __log_buf_k[__LOG_BUF_K_LEN] __aligned(LOG_ALIGN);
+
+static struct log_buffer log_buf = {
+ .list = LIST_HEAD_INIT(log_buf.list),
+ .buf = __log_buf_k,
+ .len = __LOG_BUF_K_LEN,
+ .lock = __RAW_SPIN_LOCK_UNLOCKED(log_buf.lock),
+ .wait = __WAIT_QUEUE_HEAD_INITIALIZER(log_buf.wait),
+ .first_seq = 0,
+ .first_idx = 0,
+ .next_seq = 0,
+ .next_idx = 0,
+ .mode = 0,
+ .minor = 0,
+};
+
+wait_queue_head_t *log_wait = &log_buf.wait;

/* Return log buffer address */
char *log_buf_addr_get(void)
{
- return log_buf;
+ return log_buf.buf;
}

/* Return log buffer size */
u32 log_buf_len_get(void)
{
- return log_buf_len;
+ return log_buf.len;
}

/* human readable text of the record */
@@ -305,23 +327,23 @@ static char *log_dict(const struct printk_log *msg)
}

/* get record by index; idx must point to valid msg */
-static struct printk_log *log_from_idx(u32 idx)
+static struct printk_log *log_from_idx(struct log_buffer *log_b, u32 idx)
{
- struct printk_log *msg = (struct printk_log *)(log_buf + idx);
+ struct printk_log *msg = (struct printk_log *)(log_b->buf + idx);

/*
* A length == 0 record is the end of buffer marker. Wrap around and
* read the message at the start of the buffer.
*/
if (!msg->len)
- return (struct printk_log *)log_buf;
+ return (struct printk_log *)log_b->buf;
return msg;
}

/* get next record; idx must point to valid msg */
-static u32 log_next(u32 idx)
+static u32 log_next(struct log_buffer *log_b, u32 idx)
{
- struct printk_log *msg = (struct printk_log *)(log_buf + idx);
+ struct printk_log *msg = (struct printk_log *)(log_b->buf + idx);

/* length == 0 indicates the end of the buffer; wrap */
/*
@@ -330,7 +352,7 @@ static u32 log_next(u32 idx)
* return the one after that.
*/
if (!msg->len) {
- msg = (struct printk_log *)log_buf;
+ msg = (struct printk_log *)log_b->buf;
return msg->len;
}
return idx + msg->len;
@@ -345,14 +367,14 @@ static u32 log_next(u32 idx)
* If the buffer is empty, we must respect the position of the indexes.
* They cannot be reset to the beginning of the buffer.
*/
-static int logbuf_has_space(u32 msg_size, bool empty)
+static int logbuf_has_space(struct log_buffer *log_b, u32 msg_size, bool empty)
{
u32 free;

- if (log_next_idx > log_first_idx || empty)
- free = max(log_buf_len - log_next_idx, log_first_idx);
+ if (log_b->next_idx > log_b->first_idx || empty)
+ free = max(log_b->len - log_b->next_idx, log_b->first_idx);
else
- free = log_first_idx - log_next_idx;
+ free = log_b->first_idx - log_b->next_idx;

/*
* We need space also for an empty header that signalizes wrapping
@@ -361,18 +383,18 @@ static int logbuf_has_space(u32 msg_size, bool empty)
return free >= msg_size + sizeof(struct printk_log);
}

-static int log_make_free_space(u32 msg_size)
+static int log_make_free_space(struct log_buffer *log_b, u32 msg_size)
{
- while (log_first_seq < log_next_seq) {
- if (logbuf_has_space(msg_size, false))
+ while (log_b->first_seq < log_b->next_seq) {
+ if (logbuf_has_space(log_b, msg_size, false))
return 0;
/* drop old messages until we have enough contiguous space */
- log_first_idx = log_next(log_first_idx);
- log_first_seq++;
+ log_b->first_idx = log_next(log_b, log_b->first_idx);
+ log_b->first_seq++;
}

/* sequence numbers are equal, so the log buffer is empty */
- if (logbuf_has_space(msg_size, true))
+ if (logbuf_has_space(log_b, msg_size, true))
return 0;

return -ENOMEM;
@@ -398,14 +420,15 @@ static u32 msg_used_size(u16 text_len, u16 dict_len, u32 *pad_len)
#define MAX_LOG_TAKE_PART 4
static const char trunc_msg[] = "<truncated>";

-static u32 truncate_msg(u16 *text_len, u16 *trunc_msg_len,
+static u32 truncate_msg(struct log_buffer *log_b,
+ u16 *text_len, u16 *trunc_msg_len,
u16 *dict_len, u32 *pad_len)
{
/*
* The message should not take the whole buffer. Otherwise, it might
* get removed too soon.
*/
- u32 max_text_len = log_buf_len / MAX_LOG_TAKE_PART;
+ u32 max_text_len = log_b->len / MAX_LOG_TAKE_PART;
if (*text_len > max_text_len)
*text_len = max_text_len;
/* enable the warning message */
@@ -417,7 +440,8 @@ static u32 truncate_msg(u16 *text_len, u16 *trunc_msg_len,
}

/* insert record into the buffer, discard old ones, update heads */
-static int log_store(int facility, int level,
+static int log_store(struct log_buffer *log_b,
+ int facility, int level,
enum log_flags flags, u64 ts_nsec,
const char *dict, u16 dict_len,
const char *text, u16 text_len)
@@ -429,27 +453,28 @@ static int log_store(int facility, int level,
/* number of '\0' padding bytes to next message */
size = msg_used_size(text_len, dict_len, &pad_len);

- if (log_make_free_space(size)) {
+ if (log_make_free_space(log_b, size)) {
/* truncate the message if it is too long for empty buffer */
- size = truncate_msg(&text_len, &trunc_msg_len,
+ size = truncate_msg(log_b, &text_len, &trunc_msg_len,
&dict_len, &pad_len);
/* survive when the log buffer is too small for trunc_msg */
- if (log_make_free_space(size))
+ if (log_make_free_space(log_b, size))
return 0;
}

- if (log_next_idx + size + sizeof(struct printk_log) > log_buf_len) {
+ if (log_b->next_idx + size + sizeof(struct printk_log) > log_b->len) {
/*
* This message + an additional empty header does not fit
* at the end of the buffer. Add an empty header with len == 0
* to signify a wrap around.
*/
- memset(log_buf + log_next_idx, 0, sizeof(struct printk_log));
- log_next_idx = 0;
+ memset(log_b->buf + log_b->next_idx, 0,
+ sizeof(struct printk_log));
+ log_b->next_idx = 0;
}

/* fill message */
- msg = (struct printk_log *)(log_buf + log_next_idx);
+ msg = (struct printk_log *)(log_b->buf + log_b->next_idx);
memcpy(log_text(msg), text, text_len);
msg->text_len = text_len;
if (trunc_msg_len) {
@@ -469,8 +494,8 @@ static int log_store(int facility, int level,
msg->len = size;

/* insert message */
- log_next_idx += msg->len;
- log_next_seq++;
+ log_b->next_idx += msg->len;
+ log_b->next_seq++;

return msg->text_len;
}
@@ -525,8 +550,9 @@ static void cont_flush(enum log_flags flags)
* console; wait for the console to pick up the rest of the
* line. LOG_NOCONS suppresses a duplicated output.
*/
- log_store(cont.facility, cont.level, flags | LOG_NOCONS,
- cont.ts_nsec, NULL, 0, cont.buf, cont.len);
+ log_store(&log_buf, cont.facility, cont.level,
+ flags | LOG_NOCONS, cont.ts_nsec, NULL, 0,
+ cont.buf, cont.len);
cont.flags = flags;
cont.flushed = true;
} else {
@@ -534,7 +560,7 @@ static void cont_flush(enum log_flags flags)
* If no fragment of this line ever reached the console,
* just submit it to the store and free the buffer.
*/
- log_store(cont.facility, cont.level, flags, 0,
+ log_store(&log_buf, cont.facility, cont.level, flags, 0,
NULL, 0, cont.buf, cont.len);
cont.len = 0;
}
@@ -602,7 +628,8 @@ static size_t cont_print_text(char *text, size_t size)
return textlen;
}

-static int log_format_and_store(int facility, int level,
+static int log_format_and_store(struct log_buffer *log_b,
+ int facility, int level,
const char *dict, size_t dictlen,
const char *fmt, va_list args)
{
@@ -655,6 +682,10 @@ static int log_format_and_store(int facility, int level,
if (dict)
lflags |= LOG_PREFIX|LOG_NEWLINE;

+ if (log_b != &log_buf)
+ return log_store(log_b, facility, level, lflags, 0,
+ dict, dictlen, text, text_len);
+
if (!(lflags & LOG_NEWLINE)) {
/*
* Flush the conflicting buffer. An earlier newline was missing,
@@ -667,7 +698,7 @@ static int log_format_and_store(int facility, int level,
if (cont_add(facility, level, text, text_len))
printed_len += text_len;
else
- printed_len += log_store(facility, level,
+ printed_len += log_store(log_b, facility, level,
lflags | LOG_CONT, 0,
dict, dictlen, text, text_len);
} else {
@@ -691,7 +722,7 @@ static int log_format_and_store(int facility, int level,
if (stored)
printed_len += text_len;
else
- printed_len += log_store(facility, level,
+ printed_len += log_store(log_b, facility, level,
lflags, 0, dict, dictlen,
text, text_len);
}
@@ -831,6 +862,34 @@ struct devkmsg_user {
char buf[CONSOLE_EXT_LOG_MAX];
};

+static int kmsg_sys_write(int minor, int level, const char *fmt, ...)
+{
+ va_list args;
+ int ret = -ENXIO;
+ struct log_buffer *log_b;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(log_b, &log_buf.list, list) {
+ if (log_b->minor != minor)
+ continue;
+
+ raw_spin_lock(&log_b->lock);
+
+ va_start(args, fmt);
+ log_format_and_store(log_b, 1 /* LOG_USER */, level,
+ NULL, 0, fmt, args);
+ va_end(args);
+ wake_up_interruptible(&log_b->wait);
+
+ raw_spin_unlock(&log_b->lock);
+
+ ret = 0;
+ break;
+ }
+ rcu_read_unlock();
+ return ret;
+}
+
static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
{
char *buf, *line;
@@ -839,6 +898,7 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
int facility = 1; /* LOG_USER */
size_t len = iov_iter_count(from);
ssize_t ret = len;
+ int minor = iminor(iocb->ki_filp->f_inode);

if (len > LOG_LINE_MAX)
return -EINVAL;
@@ -876,51 +936,56 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
}
}

- printk_emit(facility, level, NULL, 0, "%s", line);
+ if (minor == log_buf.minor) {
+ printk_emit(facility, level, NULL, 0, "%s", line);
+ } else {
+ int error = kmsg_sys_write(minor, level, "%s", line);
+
+ if (error)
+ ret = error;
+ }
+
kfree(buf);
return ret;
}

-static ssize_t devkmsg_read(struct file *file, char __user *buf,
- size_t count, loff_t *ppos)
+static ssize_t kmsg_read(struct log_buffer *log_b, struct file *file,
+ char __user *buf, size_t count, loff_t *ppos)
{
struct devkmsg_user *user = file->private_data;
struct printk_log *msg;
size_t len;
ssize_t ret;

- if (!user)
- return -EBADF;
-
ret = mutex_lock_interruptible(&user->lock);
if (ret)
return ret;
- raw_spin_lock_irq(&logbuf_lock);
- while (user->seq == log_next_seq) {
+ raw_spin_lock_irq(&log_b->lock);
+ while (user->seq == log_b->next_seq) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&log_b->lock);
goto out;
}

- raw_spin_unlock_irq(&logbuf_lock);
- ret = wait_event_interruptible(log_wait,
- user->seq != log_next_seq);
+ raw_spin_unlock_irq(&log_b->lock);
+ ret = wait_event_interruptible(log_b->wait,
+ user->seq != log_b->next_seq);
if (ret)
goto out;
- raw_spin_lock_irq(&logbuf_lock);
+ raw_spin_lock_irq(&log_b->lock);
}

- if (user->seq < log_first_seq) {
+ if (user->seq < log_b->first_seq) {
/* our last seen message is gone, return error and reset */
- user->idx = log_first_idx;
- user->seq = log_first_seq;
+ user->idx = log_b->first_idx;
+ user->seq = log_b->first_seq;
ret = -EPIPE;
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&log_b->lock);
goto out;
}

- msg = log_from_idx(user->idx);
+ msg = log_from_idx(log_b, user->idx);
len = msg_print_ext_header(user->buf, sizeof(user->buf),
msg, user->seq, user->prev);
len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len,
@@ -928,9 +993,9 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
log_text(msg), msg->text_len);

user->prev = msg->flags;
- user->idx = log_next(user->idx);
+ user->idx = log_next(log_b, user->idx);
user->seq++;
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&log_b->lock);

if (len > count) {
ret = -EINVAL;
@@ -945,26 +1010,53 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
out:
mutex_unlock(&user->lock);
return ret;
+
}

-static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
+static ssize_t devkmsg_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
{
struct devkmsg_user *user = file->private_data;
- loff_t ret = 0;
+ ssize_t ret = -ENXIO;
+ int minor = iminor(file->f_inode);
+ struct log_buffer *log_b;

if (!user)
return -EBADF;
- if (offset)
- return -ESPIPE;

- raw_spin_lock_irq(&logbuf_lock);
+ if (minor == log_buf.minor)
+ return kmsg_read(&log_buf, file, buf, count, ppos);
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(log_b, &log_buf.list, list) {
+ if (log_b->minor == minor) {
+ ret = kmsg_read(log_b, file, buf, count, ppos);
+ break;
+ }
+ }
+ rcu_read_unlock();
+ return ret;
+}
+
+static loff_t kmsg_llseek(struct log_buffer *log_b, struct file *file,
+ int whence)
+{
+ struct devkmsg_user *user = file->private_data;
+ loff_t ret = 0;
+
+ raw_spin_lock_irq(&log_b->lock);
switch (whence) {
case SEEK_SET:
/* the first record */
- user->idx = log_first_idx;
- user->seq = log_first_seq;
+ user->idx = log_b->first_idx;
+ user->seq = log_b->first_seq;
break;
case SEEK_DATA:
+ /* no clear index for kmsg_sys buffers */
+ if (log_b != &log_buf) {
+ ret = -EINVAL;
+ break;
+ }
/*
* The first record after the last SYSLOG_ACTION_CLEAR,
* like issued by 'dmesg -c'. Reading /dev/kmsg itself
@@ -975,52 +1067,90 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
break;
case SEEK_END:
/* after the last record */
- user->idx = log_next_idx;
- user->seq = log_next_seq;
+ user->idx = log_b->next_idx;
+ user->seq = log_b->next_seq;
break;
default:
ret = -EINVAL;
}
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&log_b->lock);
return ret;
}

-static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
+static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
{
struct devkmsg_user *user = file->private_data;
- int ret = 0;
+ loff_t ret = -ENXIO;
+ int minor = iminor(file->f_inode);
+ struct log_buffer *log_b;

if (!user)
- return POLLERR|POLLNVAL;
+ return -EBADF;
+ if (offset)
+ return -ESPIPE;
+
+ if (minor == log_buf.minor)
+ return kmsg_llseek(&log_buf, file, whence);
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(log_b, &log_buf.list, list) {
+ if (log_b->minor == minor) {
+ ret = kmsg_llseek(log_b, file, whence);
+ break;
+ }
+ }
+ rcu_read_unlock();
+ return ret;
+}

- poll_wait(file, &log_wait, wait);
+static unsigned int kmsg_poll(struct log_buffer *log_b,
+ struct file *file, poll_table *wait)
+{
+ struct devkmsg_user *user = file->private_data;
+ int ret = 0;
+
+ poll_wait(file, &log_b->wait, wait);

- raw_spin_lock_irq(&logbuf_lock);
- if (user->seq < log_next_seq) {
+ raw_spin_lock_irq(&log_b->lock);
+ if (user->seq < log_b->next_seq) {
/* return error when data has vanished underneath us */
- if (user->seq < log_first_seq)
+ if (user->seq < log_b->first_seq)
ret = POLLIN|POLLRDNORM|POLLERR|POLLPRI;
else
ret = POLLIN|POLLRDNORM;
}
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&log_b->lock);

return ret;
}

-static int devkmsg_open(struct inode *inode, struct file *file)
+static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
{
- struct devkmsg_user *user;
- int err;
+ struct devkmsg_user *user = file->private_data;
+ int ret = POLLERR|POLLNVAL;
+ int minor = iminor(file->f_inode);
+ struct log_buffer *log_b;

- /* write-only does not need any file context */
- if ((file->f_flags & O_ACCMODE) == O_WRONLY)
- return 0;
+ if (!user)
+ return POLLERR|POLLNVAL;

- err = check_syslog_permissions(SYSLOG_ACTION_READ_ALL,
- SYSLOG_FROM_READER);
- if (err)
- return err;
+ if (minor == log_buf.minor)
+ return kmsg_poll(&log_buf, file, wait);
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(log_b, &log_buf.list, list) {
+ if (log_b->minor == minor) {
+ ret = kmsg_poll(log_b, file, wait);
+ break;
+ }
+ }
+ rcu_read_unlock();
+ return ret;
+}
+
+static int kmsg_open(struct log_buffer *log_b, struct file *file)
+{
+ struct devkmsg_user *user;

user = kmalloc(sizeof(struct devkmsg_user), GFP_KERNEL);
if (!user)
@@ -1028,15 +1158,45 @@ static int devkmsg_open(struct inode *inode, struct file *file)

mutex_init(&user->lock);

- raw_spin_lock_irq(&logbuf_lock);
- user->idx = log_first_idx;
- user->seq = log_first_seq;
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_lock_irq(&log_b->lock);
+ user->idx = log_b->first_idx;
+ user->seq = log_b->first_seq;
+ raw_spin_unlock_irq(&log_b->lock);

file->private_data = user;
return 0;
}

+static int devkmsg_open(struct inode *inode, struct file *file)
+{
+ int ret = -ENXIO;
+ int minor = iminor(file->f_inode);
+ struct log_buffer *log_b;
+
+ /* write-only does not need any file context */
+ if ((file->f_flags & O_ACCMODE) == O_WRONLY)
+ return 0;
+
+ if (minor == log_buf.minor) {
+ ret = check_syslog_permissions(SYSLOG_ACTION_READ_ALL,
+ SYSLOG_FROM_READER);
+ if (ret)
+ return ret;
+
+ return kmsg_open(&log_buf, file);
+ }
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(log_b, &log_buf.list, list) {
+ if (log_b->minor == minor) {
+ ret = kmsg_open(log_b, file);
+ break;
+ }
+ }
+ rcu_read_unlock();
+ return ret;
+}
+
static int devkmsg_release(struct inode *inode, struct file *file)
{
struct devkmsg_user *user = file->private_data;
@@ -1058,6 +1218,30 @@ const struct file_operations kmsg_fops = {
.release = devkmsg_release,
};

+/* Should be used before device registration */
+void init_kmsg_minor(int minor)
+{
+ log_buf.minor = minor;
+}
+
+int kmsg_sys_mode(int minor, umode_t *mode)
+{
+ int ret = -ENXIO;
+ struct log_buffer *log_b;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(log_b, &log_buf.list, list) {
+ if (log_b->minor == minor) {
+ *mode = log_b->mode;
+ ret = 0;
+ break;
+ }
+ }
+ rcu_read_unlock();
+
+ return ret;
+}
+
#ifdef CONFIG_KEXEC
/*
* This appends the listed symbols to /proc/vmcore
@@ -1069,10 +1253,10 @@ const struct file_operations kmsg_fops = {
*/
void log_buf_kexec_setup(void)
{
- VMCOREINFO_SYMBOL(log_buf);
- VMCOREINFO_SYMBOL(log_buf_len);
- VMCOREINFO_SYMBOL(log_first_idx);
- VMCOREINFO_SYMBOL(log_next_idx);
+ VMCOREINFO_SYMBOL(log_buf.buf);
+ VMCOREINFO_SYMBOL(log_buf.len);
+ VMCOREINFO_SYMBOL(log_buf.first_idx);
+ VMCOREINFO_SYMBOL(log_buf.next_idx);
/*
* Export struct printk_log size and field offsets. User space tools can
* parse it and detect any changes to structure down the line.
@@ -1093,7 +1277,7 @@ static void __init log_buf_len_update(unsigned size)
{
if (size)
size = roundup_pow_of_two(size);
- if (size > log_buf_len)
+ if (size > log_buf.len)
new_log_buf_len = size;
}

@@ -1106,7 +1290,7 @@ static int __init log_buf_len_setup(char *str)

return 0;
}
-early_param("log_buf_len", log_buf_len_setup);
+early_param("log_buf.len", log_buf_len_setup);

#ifdef CONFIG_SMP
#define __LOG_CPU_MAX_BUF_LEN (1 << CONFIG_LOG_CPU_MAX_BUF_SHIFT)
@@ -1126,16 +1310,16 @@ static void __init log_buf_add_cpu(void)
cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MAX_BUF_LEN;

/* by default this will only continue through for large > 64 CPUs */
- if (cpu_extra <= __LOG_BUF_LEN / 2)
+ if (cpu_extra <= __LOG_BUF_K_LEN / 2)
return;

- pr_info("log_buf_len individual max cpu contribution: %d bytes\n",
+ pr_info("log_buf.len individual max cpu contribution: %d bytes\n",
__LOG_CPU_MAX_BUF_LEN);
- pr_info("log_buf_len total cpu_extra contributions: %d bytes\n",
+ pr_info("log_buf.len total cpu_extra contributions: %d bytes\n",
cpu_extra);
- pr_info("log_buf_len min size: %d bytes\n", __LOG_BUF_LEN);
+ pr_info("log_buf.len min size: %d bytes\n", __LOG_BUF_K_LEN);

- log_buf_len_update(cpu_extra + __LOG_BUF_LEN);
+ log_buf_len_update(cpu_extra + __LOG_BUF_K_LEN);
}
#else /* !CONFIG_SMP */
static inline void log_buf_add_cpu(void) {}
@@ -1147,7 +1331,7 @@ void __init setup_log_buf(int early)
char *new_log_buf;
int free;

- if (log_buf != __log_buf)
+ if (log_buf.buf != __log_buf_k)
return;

if (!early && !new_log_buf_len)
@@ -1165,22 +1349,22 @@ void __init setup_log_buf(int early)
}

if (unlikely(!new_log_buf)) {
- pr_err("log_buf_len: %ld bytes not available\n",
+ pr_err("log_buf.len: %ld bytes not available\n",
new_log_buf_len);
return;
}

- raw_spin_lock_irqsave(&logbuf_lock, flags);
- log_buf_len = new_log_buf_len;
- log_buf = new_log_buf;
+ raw_spin_lock_irqsave(&log_buf.lock, flags);
+ log_buf.len = new_log_buf_len;
+ log_buf.buf = new_log_buf;
new_log_buf_len = 0;
- free = __LOG_BUF_LEN - log_next_idx;
- memcpy(log_buf, __log_buf, __LOG_BUF_LEN);
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ free = __LOG_BUF_K_LEN - log_buf.next_idx;
+ memcpy(log_buf.buf, __log_buf_k, __LOG_BUF_K_LEN);
+ raw_spin_unlock_irqrestore(&log_buf.lock, flags);

- pr_info("log_buf_len: %d bytes\n", log_buf_len);
+ pr_info("log_buf.len: %d bytes\n", log_buf.len);
pr_info("early log buf free: %d(%d%%)\n",
- free, (free * 100) / __LOG_BUF_LEN);
+ free, (free * 100) / __LOG_BUF_K_LEN);
}

static bool __read_mostly ignore_loglevel;
@@ -1349,26 +1533,26 @@ static int syslog_print(char __user *buf, int size)
size_t n;
size_t skip;

- raw_spin_lock_irq(&logbuf_lock);
- if (syslog_seq < log_first_seq) {
+ raw_spin_lock_irq(&log_buf.lock);
+ if (syslog_seq < log_buf.first_seq) {
/* messages are gone, move to first one */
- syslog_seq = log_first_seq;
- syslog_idx = log_first_idx;
+ syslog_seq = log_buf.first_seq;
+ syslog_idx = log_buf.first_idx;
syslog_prev = 0;
syslog_partial = 0;
}
- if (syslog_seq == log_next_seq) {
- raw_spin_unlock_irq(&logbuf_lock);
+ if (syslog_seq == log_buf.next_seq) {
+ raw_spin_unlock_irq(&log_buf.lock);
break;
}

skip = syslog_partial;
- msg = log_from_idx(syslog_idx);
+ msg = log_from_idx(&log_buf, syslog_idx);
n = msg_print_text(msg, syslog_prev, true, text,
LOG_LINE_MAX + PREFIX_MAX);
if (n - syslog_partial <= size) {
/* message fits into buffer, move forward */
- syslog_idx = log_next(syslog_idx);
+ syslog_idx = log_next(&log_buf, syslog_idx);
syslog_seq++;
syslog_prev = msg->flags;
n -= syslog_partial;
@@ -1379,7 +1563,7 @@ static int syslog_print(char __user *buf, int size)
syslog_partial += n;
} else
n = 0;
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&log_buf.lock);

if (!n)
break;
@@ -1408,17 +1592,17 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
if (!text)
return -ENOMEM;

- raw_spin_lock_irq(&logbuf_lock);
+ raw_spin_lock_irq(&log_buf.lock);
if (buf) {
u64 next_seq;
u64 seq;
u32 idx;
enum log_flags prev;

- if (clear_seq < log_first_seq) {
+ if (clear_seq < log_buf.first_seq) {
/* messages are gone, move to first available one */
- clear_seq = log_first_seq;
- clear_idx = log_first_idx;
+ clear_seq = log_buf.first_seq;
+ clear_idx = log_buf.first_idx;
}

/*
@@ -1428,12 +1612,12 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
seq = clear_seq;
idx = clear_idx;
prev = 0;
- while (seq < log_next_seq) {
- struct printk_log *msg = log_from_idx(idx);
+ while (seq < log_buf.next_seq) {
+ struct printk_log *msg = log_from_idx(&log_buf, idx);

len += msg_print_text(msg, prev, true, NULL, 0);
prev = msg->flags;
- idx = log_next(idx);
+ idx = log_next(&log_buf, idx);
seq++;
}

@@ -1441,21 +1625,21 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
seq = clear_seq;
idx = clear_idx;
prev = 0;
- while (len > size && seq < log_next_seq) {
- struct printk_log *msg = log_from_idx(idx);
+ while (len > size && seq < log_buf.next_seq) {
+ struct printk_log *msg = log_from_idx(&log_buf, idx);

len -= msg_print_text(msg, prev, true, NULL, 0);
prev = msg->flags;
- idx = log_next(idx);
+ idx = log_next(&log_buf, idx);
seq++;
}

/* last message fitting into this dump */
- next_seq = log_next_seq;
+ next_seq = log_buf.next_seq;

len = 0;
while (len >= 0 && seq < next_seq) {
- struct printk_log *msg = log_from_idx(idx);
+ struct printk_log *msg = log_from_idx(&log_buf, idx);
int textlen;

textlen = msg_print_text(msg, prev, true, text,
@@ -1464,31 +1648,31 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
len = textlen;
break;
}
- idx = log_next(idx);
+ idx = log_next(&log_buf, idx);
seq++;
prev = msg->flags;

- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&log_buf.lock);
if (copy_to_user(buf + len, text, textlen))
len = -EFAULT;
else
len += textlen;
- raw_spin_lock_irq(&logbuf_lock);
+ raw_spin_lock_irq(&log_buf.lock);

- if (seq < log_first_seq) {
+ if (seq < log_buf.first_seq) {
/* messages are gone, move to next one */
- seq = log_first_seq;
- idx = log_first_idx;
+ seq = log_buf.first_seq;
+ idx = log_buf.first_idx;
prev = 0;
}
}
}

if (clear) {
- clear_seq = log_next_seq;
- clear_idx = log_next_idx;
+ clear_seq = log_buf.next_seq;
+ clear_idx = log_buf.next_idx;
}
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&log_buf.lock);

kfree(text);
return len;
@@ -1520,8 +1704,8 @@ int do_syslog(int type, char __user *buf, int len, int source)
error = -EFAULT;
goto out;
}
- error = wait_event_interruptible(log_wait,
- syslog_seq != log_next_seq);
+ error = wait_event_interruptible(log_buf.wait,
+ syslog_seq != log_buf.next_seq);
if (error)
goto out;
error = syslog_print(buf, len);
@@ -1575,11 +1759,11 @@ int do_syslog(int type, char __user *buf, int len, int source)
break;
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD:
- raw_spin_lock_irq(&logbuf_lock);
- if (syslog_seq < log_first_seq) {
+ raw_spin_lock_irq(&log_buf.lock);
+ if (syslog_seq < log_buf.first_seq) {
/* messages are gone, move to first one */
- syslog_seq = log_first_seq;
- syslog_idx = log_first_idx;
+ syslog_seq = log_buf.first_seq;
+ syslog_idx = log_buf.first_idx;
syslog_prev = 0;
syslog_partial = 0;
}
@@ -1589,28 +1773,30 @@ int do_syslog(int type, char __user *buf, int len, int source)
* for pending data, not the size; return the count of
* records, not the length.
*/
- error = log_next_seq - syslog_seq;
+ error = log_buf.next_seq - syslog_seq;
} else {
u64 seq = syslog_seq;
u32 idx = syslog_idx;
enum log_flags prev = syslog_prev;

error = 0;
- while (seq < log_next_seq) {
- struct printk_log *msg = log_from_idx(idx);
+ while (seq < log_buf.next_seq) {
+ struct printk_log *msg = log_from_idx(&log_buf,
+ idx);

- error += msg_print_text(msg, prev, true, NULL, 0);
- idx = log_next(idx);
+ error += msg_print_text(msg, prev, true,
+ NULL, 0);
+ idx = log_next(&log_buf, idx);
seq++;
prev = msg->flags;
}
error -= syslog_partial;
}
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&log_buf.lock);
break;
/* Size of the log buffer */
case SYSLOG_ACTION_SIZE_BUFFER:
- error = log_buf_len;
+ error = log_buf.len;
break;
default:
error = -EINVAL;
@@ -1677,7 +1863,7 @@ static void zap_locks(void)

debug_locks_off();
/* If a crash is occurring, make sure we can't deadlock */
- raw_spin_lock_init(&logbuf_lock);
+ raw_spin_lock_init(&log_buf.lock);
/* And make sure that we print immediately */
sema_init(&console_sem, 1);
}
@@ -1757,7 +1943,7 @@ asmlinkage int vprintk_emit(int facility, int level,
int this_cpu;
int printed_len = 0;
bool in_sched = false;
- /* cpu currently holding logbuf_lock in this function */
+ /* cpu currently holding log_buf.lock in this function */
static unsigned int logbuf_cpu = UINT_MAX;

if (level == LOGLEVEL_SCHED) {
@@ -1792,7 +1978,7 @@ asmlinkage int vprintk_emit(int facility, int level,
}

lockdep_off();
- raw_spin_lock(&logbuf_lock);
+ raw_spin_lock(&log_buf.lock);
logbuf_cpu = this_cpu;

if (unlikely(recursion_bug)) {
@@ -1801,16 +1987,17 @@ asmlinkage int vprintk_emit(int facility, int level,

recursion_bug = 0;
/* emit KERN_CRIT message */
- printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
+ printed_len += log_store(&log_buf, 0, 2,
+ LOG_PREFIX|LOG_NEWLINE, 0,
NULL, 0, recursion_msg,
strlen(recursion_msg));
}

- printed_len += log_format_and_store(facility, level, dict, dictlen,
- fmt, args);
+ printed_len += log_format_and_store(&log_buf, facility, level,
+ dict, dictlen, fmt, args);

logbuf_cpu = UINT_MAX;
- raw_spin_unlock(&logbuf_lock);
+ raw_spin_unlock(&log_buf.lock);
lockdep_on();
local_irq_restore(flags);

@@ -1933,14 +2120,18 @@ EXPORT_SYMBOL(printk);
#define LOG_LINE_MAX 0
#define PREFIX_MAX 0

+static struct log_buffer log_buf = {
+ .lock = __RAW_SPIN_LOCK_UNLOCKED(log_buf.lock),
+ .first_seq = 0,
+ .first_idx = 0,
+ .next_seq = 0,
+};
+
static u64 syslog_seq;
static u32 syslog_idx;
static u64 console_seq;
static u32 console_idx;
static enum log_flags syslog_prev;
-static u64 log_first_seq;
-static u32 log_first_idx;
-static u64 log_next_seq;
static enum log_flags console_prev;
static struct cont {
size_t len;
@@ -1950,8 +2141,9 @@ static struct cont {
} cont;
static char *log_text(const struct printk_log *msg) { return NULL; }
static char *log_dict(const struct printk_log *msg) { return NULL; }
-static struct printk_log *log_from_idx(u32 idx) { return NULL; }
-static u32 log_next(u32 idx) { return 0; }
+static struct printk_log *log_from_idx(struct log_buffer *log_b,
+ u32 idx) { return NULL; }
+static u32 log_next(struct log_buffer *log_b, u32 idx) { return 0; }
static ssize_t msg_print_ext_header(char *buf, size_t size,
struct printk_log *msg, u64 seq,
enum log_flags prev_flags) { return 0; }
@@ -2197,7 +2389,7 @@ static void console_cont_flush(char *text, size_t size)
unsigned long flags;
size_t len;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
+ raw_spin_lock_irqsave(&log_buf.lock, flags);

if (!cont.len)
goto out;
@@ -2207,18 +2399,18 @@ static void console_cont_flush(char *text, size_t size)
* busy. The earlier ones need to be printed before this one, we
* did not flush any fragment so far, so just let it queue up.
*/
- if (console_seq < log_next_seq && !cont.cons)
+ if (console_seq < log_buf.next_seq && !cont.cons)
goto out;

len = cont_print_text(text, size);
- raw_spin_unlock(&logbuf_lock);
+ raw_spin_unlock(&log_buf.lock);
stop_critical_timings();
call_console_drivers(cont.level, NULL, 0, text, len);
start_critical_timings();
local_irq_restore(flags);
return;
out:
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_unlock_irqrestore(&log_buf.lock, flags);
}

/**
@@ -2260,34 +2452,34 @@ again:
size_t len;
int level;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
- if (seen_seq != log_next_seq) {
+ raw_spin_lock_irqsave(&log_buf.lock, flags);
+ if (seen_seq != log_buf.next_seq) {
wake_klogd = true;
- seen_seq = log_next_seq;
+ seen_seq = log_buf.next_seq;
}

- if (console_seq < log_first_seq) {
+ if (console_seq < log_buf.first_seq) {
len = sprintf(text, "** %u printk messages dropped ** ",
- (unsigned)(log_first_seq - console_seq));
+ (unsigned)(log_buf.first_seq - console_seq));

/* messages are gone, move to first one */
- console_seq = log_first_seq;
- console_idx = log_first_idx;
+ console_seq = log_buf.first_seq;
+ console_idx = log_buf.first_idx;
console_prev = 0;
} else {
len = 0;
}
skip:
- if (console_seq == log_next_seq)
+ if (console_seq == log_buf.next_seq)
break;

- msg = log_from_idx(console_idx);
+ msg = log_from_idx(&log_buf, console_idx);
if (msg->flags & LOG_NOCONS) {
/*
* Skip record we have buffered and already printed
* directly to the console when we received it.
*/
- console_idx = log_next(console_idx);
+ console_idx = log_next(&log_buf, console_idx);
console_seq++;
/*
* We will get here again when we register a new
@@ -2302,6 +2494,7 @@ skip:
level = msg->level;
len += msg_print_text(msg, console_prev, false,
text + len, sizeof(text) - len);
+
if (nr_ext_console_drivers) {
ext_len = msg_print_ext_header(ext_text,
sizeof(ext_text),
@@ -2311,10 +2504,10 @@ skip:
log_dict(msg), msg->dict_len,
log_text(msg), msg->text_len);
}
- console_idx = log_next(console_idx);
+ console_idx = log_next(&log_buf, console_idx);
console_seq++;
console_prev = msg->flags;
- raw_spin_unlock(&logbuf_lock);
+ raw_spin_unlock(&log_buf.lock);

stop_critical_timings(); /* don't trace print latency */
call_console_drivers(level, ext_text, ext_len, text, len);
@@ -2327,7 +2520,7 @@ skip:
if (unlikely(exclusive_console))
exclusive_console = NULL;

- raw_spin_unlock(&logbuf_lock);
+ raw_spin_unlock(&log_buf.lock);

up_console_sem();

@@ -2337,9 +2530,9 @@ skip:
* there's a new owner and the console_unlock() from them will do the
* flush, no worries.
*/
- raw_spin_lock(&logbuf_lock);
- retry = console_seq != log_next_seq;
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_lock(&log_buf.lock);
+ retry = console_seq != log_buf.next_seq;
+ raw_spin_unlock_irqrestore(&log_buf.lock, flags);

if (retry && console_trylock())
goto again;
@@ -2583,11 +2776,11 @@ void register_console(struct console *newcon)
* console_unlock(); will print out the buffered messages
* for us.
*/
- raw_spin_lock_irqsave(&logbuf_lock, flags);
+ raw_spin_lock_irqsave(&log_buf.lock, flags);
console_seq = syslog_seq;
console_idx = syslog_idx;
console_prev = syslog_prev;
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_unlock_irqrestore(&log_buf.lock, flags);
/*
* We're about to replay the log buffer. Only do this to the
* just-registered console to avoid excessive message spam to
@@ -2701,7 +2894,7 @@ static void wake_up_klogd_work_func(struct irq_work *irq_work)
}

if (pending & PRINTK_PENDING_WAKEUP)
- wake_up_interruptible(&log_wait);
+ wake_up_interruptible(&log_buf.wait);
}

static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
@@ -2712,7 +2905,7 @@ static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
void wake_up_klogd(void)
{
preempt_disable();
- if (waitqueue_active(&log_wait)) {
+ if (waitqueue_active(&log_buf.wait)) {
this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
}
@@ -2857,12 +3050,12 @@ void kmsg_dump(enum kmsg_dump_reason reason)
/* initialize iterator with data about the stored records */
dumper->active = true;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
+ raw_spin_lock_irqsave(&log_buf.lock, flags);
dumper->cur_seq = clear_seq;
dumper->cur_idx = clear_idx;
- dumper->next_seq = log_next_seq;
- dumper->next_idx = log_next_idx;
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ dumper->next_seq = log_buf.next_seq;
+ dumper->next_idx = log_buf.next_idx;
+ raw_spin_unlock_irqrestore(&log_buf.lock, flags);

/* invoke dumper which will iterate over records */
dumper->dump(dumper, reason);
@@ -2902,20 +3095,20 @@ bool kmsg_dump_get_line_nolock(struct kmsg_dumper *dumper, bool syslog,
if (!dumper->active)
goto out;

- if (dumper->cur_seq < log_first_seq) {
+ if (dumper->cur_seq < log_buf.first_seq) {
/* messages are gone, move to first available one */
- dumper->cur_seq = log_first_seq;
- dumper->cur_idx = log_first_idx;
+ dumper->cur_seq = log_buf.first_seq;
+ dumper->cur_idx = log_buf.first_idx;
}

/* last entry */
- if (dumper->cur_seq >= log_next_seq)
+ if (dumper->cur_seq >= log_buf.next_seq)
goto out;

- msg = log_from_idx(dumper->cur_idx);
+ msg = log_from_idx(&log_buf, dumper->cur_idx);
l = msg_print_text(msg, 0, syslog, line, size);

- dumper->cur_idx = log_next(dumper->cur_idx);
+ dumper->cur_idx = log_next(&log_buf, dumper->cur_idx);
dumper->cur_seq++;
ret = true;
out:
@@ -2947,9 +3140,9 @@ bool kmsg_dump_get_line(struct kmsg_dumper *dumper, bool syslog,
unsigned long flags;
bool ret;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
+ raw_spin_lock_irqsave(&log_buf.lock, flags);
ret = kmsg_dump_get_line_nolock(dumper, syslog, line, size, len);
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_unlock_irqrestore(&log_buf.lock, flags);

return ret;
}
@@ -2989,16 +3182,16 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
if (!dumper->active)
goto out;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
- if (dumper->cur_seq < log_first_seq) {
+ raw_spin_lock_irqsave(&log_buf.lock, flags);
+ if (dumper->cur_seq < log_buf.first_seq) {
/* messages are gone, move to first available one */
- dumper->cur_seq = log_first_seq;
- dumper->cur_idx = log_first_idx;
+ dumper->cur_seq = log_buf.first_seq;
+ dumper->cur_idx = log_buf.first_idx;
}

/* last entry */
if (dumper->cur_seq >= dumper->next_seq) {
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_unlock_irqrestore(&log_buf.lock, flags);
goto out;
}

@@ -3007,10 +3200,10 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
idx = dumper->cur_idx;
prev = 0;
while (seq < dumper->next_seq) {
- struct printk_log *msg = log_from_idx(idx);
+ struct printk_log *msg = log_from_idx(&log_buf, idx);

l += msg_print_text(msg, prev, true, NULL, 0);
- idx = log_next(idx);
+ idx = log_next(&log_buf, idx);
seq++;
prev = msg->flags;
}
@@ -3020,10 +3213,10 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
idx = dumper->cur_idx;
prev = 0;
while (l > size && seq < dumper->next_seq) {
- struct printk_log *msg = log_from_idx(idx);
+ struct printk_log *msg = log_from_idx(&log_buf, idx);

l -= msg_print_text(msg, prev, true, NULL, 0);
- idx = log_next(idx);
+ idx = log_next(&log_buf, idx);
seq++;
prev = msg->flags;
}
@@ -3034,10 +3227,10 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,

l = 0;
while (seq < dumper->next_seq) {
- struct printk_log *msg = log_from_idx(idx);
+ struct printk_log *msg = log_from_idx(&log_buf, idx);

l += msg_print_text(msg, prev, syslog, buf + l, size - l);
- idx = log_next(idx);
+ idx = log_next(&log_buf, idx);
seq++;
prev = msg->flags;
}
@@ -3045,7 +3238,7 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
dumper->next_seq = next_seq;
dumper->next_idx = next_idx;
ret = true;
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_unlock_irqrestore(&log_buf.lock, flags);
out:
if (len)
*len = l;
@@ -3067,8 +3260,8 @@ void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
{
dumper->cur_seq = clear_seq;
dumper->cur_idx = clear_idx;
- dumper->next_seq = log_next_seq;
- dumper->next_idx = log_next_idx;
+ dumper->next_seq = log_buf.next_seq;
+ dumper->next_idx = log_buf.next_idx;
}

/**
@@ -3083,9 +3276,9 @@ void kmsg_dump_rewind(struct kmsg_dumper *dumper)
{
unsigned long flags;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
+ raw_spin_lock_irqsave(&log_buf.lock, flags);
kmsg_dump_rewind_nolock(dumper);
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_unlock_irqrestore(&log_buf.lock, flags);
}
EXPORT_SYMBOL_GPL(kmsg_dump_rewind);

--
1.9.1

2015-07-03 10:52:04

by Marcin Niesluchowski

Subject: [RFC 4/8] kmsg: add function for adding and deleting additional buffers

Additional kmsg buffers should be created and deleted dynamically.

Add two functions:
* kmsg_sys_buffer_add() creates an additional kmsg buffer and returns its minor
* kmsg_sys_buffer_del() deletes the buffer identified by the provided minor
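
The minor returned by kmsg_sys_buffer_add() is chosen by walking the list of
existing buffers, which is kept sorted by minor, and taking the first gap
after the base kmsg minor. A minimal userspace sketch of that search follows.
It is an illustration only: first_free_minor() and SKETCH_MINOR_MAX are
hypothetical names, a plain array stands in for the RCU list, and the range
check is a simple upper bound rather than the check used in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's MINORMASK, (1U << 20) - 1. */
#define SKETCH_MINOR_MAX 0xfffff

/*
 * Return the lowest free minor above base, given the minors already in
 * use sorted in ascending order, or -1 if no minor is left.
 */
int first_free_minor(int base, const int *used, size_t n)
{
	int prev = base;
	size_t i;

	for (i = 0; i < n; i++) {
		/* The search relies on a sorted list of unique minors. */
		assert(used[i] > prev);
		if (used[i] - prev > 1)
			break;	/* gap found right after prev */
		prev = used[i];
	}

	if (prev >= SKETCH_MINOR_MAX)
		return -1;

	return prev + 1;
}
```

Because a new buffer is inserted at the position where the gap was found,
the list stays sorted and a minor freed by kmsg_sys_buffer_del() becomes
available again on the next call.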

Signed-off-by: Marcin Niesluchowski <[email protected]>
---
include/linux/printk.h | 3 ++
kernel/printk/printk.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 122 insertions(+), 3 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index d3b5f23..5806982 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -422,6 +422,9 @@ extern void init_kmsg_minor(int minor);

extern int kmsg_sys_mode(int minor, umode_t *mode);

+extern int kmsg_sys_buffer_add(size_t size, umode_t mode);
+extern void kmsg_sys_buffer_del(int minor);
+
enum {
DUMP_PREFIX_NONE,
DUMP_PREFIX_ADDRESS,
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7f30c8b..abe78c1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -46,6 +46,9 @@
#include <linux/utsname.h>
#include <linux/ctype.h>
#include <linux/uio.h>
+#include <linux/slab.h>
+#include <linux/kref.h>
+#include <linux/kdev_t.h>

#include <asm/uaccess.h>

@@ -240,6 +243,7 @@ struct log_buffer {
char *buf; /* cyclic log buffer */
u32 len; /* buffer length */
wait_queue_head_t wait; /* wait queue for kmsg buffer */
+ struct kref refcount; /* refcount for kmsg_sys buffers */
#endif
/*
* The lock protects kmsg buffer, indices, counters. This can be taken within
@@ -292,6 +296,7 @@ static struct log_buffer log_buf = {
.len = __LOG_BUF_K_LEN,
.lock = __RAW_SPIN_LOCK_UNLOCKED(log_buf.lock),
.wait = __WAIT_QUEUE_HEAD_INITIALIZER(log_buf.wait),
+ .refcount = { .refcount = { .counter = 0 } },
.first_seq = 0,
.first_idx = 0,
.next_seq = 0,
@@ -862,6 +867,15 @@ struct devkmsg_user {
char buf[CONSOLE_EXT_LOG_MAX];
};

+void log_buf_release(struct kref *ref)
+{
+ struct log_buffer *log_b = container_of(ref, struct log_buffer,
+ refcount);
+
+ kfree(log_b->buf);
+ kfree(log_b);
+}
+
static int kmsg_sys_write(int minor, int level, const char *fmt, ...)
{
va_list args;
@@ -969,10 +983,24 @@ static ssize_t kmsg_read(struct log_buffer *log_b, struct file *file,
}

raw_spin_unlock_irq(&log_b->lock);
- ret = wait_event_interruptible(log_b->wait,
- user->seq != log_b->next_seq);
+
+ if (log_b == &log_buf) {
+ ret = wait_event_interruptible(log_b->wait,
+ user->seq != log_b->next_seq);
+ } else {
+ rcu_read_unlock();
+ kref_get(&log_b->refcount);
+ ret = wait_event_interruptible(log_b->wait,
+ user->seq != log_b->next_seq);
+ if (log_b->minor == -1)
+ ret = -ENXIO;
+ if (kref_put(&log_b->refcount, log_buf_release))
+ ret = -ENXIO;
+ rcu_read_lock();
+ }
if (ret)
goto out;
+
raw_spin_lock_irq(&log_b->lock);
}

@@ -1140,8 +1168,14 @@ static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
rcu_read_lock();
list_for_each_entry_rcu(log_b, &log_buf.list, list) {
if (log_b->minor == minor) {
+ kref_get(&log_b->refcount);
+ rcu_read_unlock();
+
ret = kmsg_poll(log_b, file, wait);
- break;
+
+ if (kref_put(&log_b->refcount, log_buf_release))
+ return POLLERR|POLLNVAL;
+ return ret;
}
}
rcu_read_unlock();
@@ -1242,6 +1276,88 @@ int kmsg_sys_mode(int minor, umode_t *mode)
return ret;
}

+static DEFINE_SPINLOCK(kmsg_sys_list_lock);
+
+int kmsg_sys_buffer_add(size_t size, umode_t mode)
+{
+ unsigned long flags;
+ int minor = log_buf.minor;
+ struct log_buffer *log_b;
+ struct log_buffer *log_b_new;
+
+ if (size < LOG_LINE_MAX + PREFIX_MAX)
+ return -EINVAL;
+
+ log_b_new = kzalloc(sizeof(struct log_buffer), GFP_KERNEL);
+ if (!log_b_new)
+ return -ENOMEM;
+
+ log_b_new->buf = kmalloc(size, GFP_KERNEL);
+ if (!log_b_new->buf) {
+ kfree(log_b_new);
+ return -ENOMEM;
+ }
+
+ log_b_new->len = size;
+ log_b_new->lock = __RAW_SPIN_LOCK_UNLOCKED(log_b_new->lock);
+ init_waitqueue_head(&log_b_new->wait);
+ kref_init(&log_b_new->refcount);
+ log_b_new->mode = mode;
+
+ kref_get(&log_b_new->refcount);
+
+ spin_lock_irqsave(&kmsg_sys_list_lock, flags);
+
+ list_for_each_entry(log_b, &log_buf.list, list) {
+ if (log_b->minor - minor > 1)
+ break;
+
+ minor = log_b->minor;
+ }
+
+ if (!(minor & MINORMASK)) {
+ kref_put(&log_b->refcount, log_buf_release);
+ spin_unlock_irqrestore(&kmsg_sys_list_lock, flags);
+ return -ERANGE;
+ }
+
+ minor += 1;
+ log_b_new->minor = minor;
+
+ list_add_tail_rcu(&log_b_new->list, &log_b->list);
+
+ spin_unlock_irqrestore(&kmsg_sys_list_lock, flags);
+
+ return minor;
+}
+
+void kmsg_sys_buffer_del(int minor)
+{
+ unsigned long flags;
+ struct log_buffer *log_b;
+
+ spin_lock_irqsave(&kmsg_sys_list_lock, flags);
+
+ list_for_each_entry(log_b, &log_buf.list, list) {
+ if (log_b->minor == minor)
+ break;
+ }
+
+ if (log_b == &log_buf) {
+ spin_unlock_irqrestore(&kmsg_sys_list_lock, flags);
+ return;
+ }
+
+ list_del_rcu(&log_b->list);
+
+ spin_unlock_irqrestore(&kmsg_sys_list_lock, flags);
+
+ log_b->minor = -1;
+ wake_up_interruptible(&log_b->wait);
+
+ kref_put(&log_b->refcount, log_buf_release);
+}
+
#ifdef CONFIG_KEXEC
/*
* This appends the listed symbols to /proc/vmcore
--
1.9.1

2015-07-03 10:51:57

by Marcin Niesluchowski

Subject: [RFC 5/8] kmsg: device support in mem class

The mem class is the class that currently holds the kmsg device.

Extend the mem class to handle kmsg_sys devices as well.

Signed-off-by: Marcin Niesluchowski <[email protected]>
---
drivers/char/mem.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index e518040..8d5ba0d 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -815,7 +815,11 @@ static int memory_open(struct inode *inode, struct file *filp)

minor = iminor(inode);
if (minor >= ARRAY_SIZE(devlist))
+#ifdef CONFIG_PRINTK
+ minor = KMSG_MINOR;
+#else
return -ENXIO;
+#endif

dev = &devlist[minor];
if (!dev->fops)
@@ -837,8 +841,20 @@ static const struct file_operations memory_fops = {

static char *mem_devnode(struct device *dev, umode_t *mode)
{
- if (mode && devlist[MINOR(dev->devt)].mode)
- *mode = devlist[MINOR(dev->devt)].mode;
+ int minor;
+
+ if (!mode)
+ return NULL;
+
+ minor = MINOR(dev->devt);
+
+#ifdef CONFIG_PRINTK
+ if (minor >= ARRAY_SIZE(devlist))
+ kmsg_sys_mode(minor, mode);
+ else
+#endif
+ if (devlist[minor].mode)
+ *mode = devlist[minor].mode;
return NULL;
}

--
1.9.1

2015-07-03 10:51:40

by Marcin Niesluchowski

Subject: [RFC 6/8] kmsg: add predefined _PID, _TID, _COMM keywords to kmsg* log dict

The write operation on kmsg* devices stored no dict along with the message.
Because kmsg devices are used from userspace, add a dict that identifies
the pid, tid and comm of the writing process.

Signed-off-by: Marcin Niesluchowski <[email protected]>
---
kernel/printk/printk.c | 40 ++++++++++++++++++++++++++++++++++++----
1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index abe78c1..2a7f6a4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -876,7 +876,34 @@ void log_buf_release(struct kref *ref)
kfree(log_b);
}

-static int kmsg_sys_write(int minor, int level, const char *fmt, ...)
+#define MAX_PID_LEN 20
+#define MAX_TID_LEN 20
+/*
+ * Format below describes dict appended to message written from userspace:
+ * "_PID=<pid>\0_TID=<tid>\0_COMM=<comm>"
+ * KMSG_DICT_MAX_LEN definition represents maximal length of this dict.
+ */
+#define KMSG_DICT_MAX_LEN (5 + MAX_PID_LEN + 1 + \
+ 5 + MAX_TID_LEN + 1 + \
+ 6 + TASK_COMM_LEN)
+
+static size_t set_kmsg_dict(char *buf)
+{
+ size_t len;
+
+ len = sprintf(buf, "_PID=%d", task_tgid_nr(current)) + 1;
+ len += sprintf(buf + len, "_TID=%d", task_pid_nr(current)) + 1;
+ memcpy(buf + len, "_COMM=", 6);
+ len += 6;
+ get_task_comm(buf + len, current);
+ while (buf[len] != '\0')
+ len++;
+ return len;
+}
+
+static int kmsg_sys_write(int minor, int level,
+ const char *dict, size_t dictlen,
+ const char *fmt, ...)
{
va_list args;
int ret = -ENXIO;
@@ -891,7 +918,7 @@ static int kmsg_sys_write(int minor, int level, const char *fmt, ...)

va_start(args, fmt);
log_format_and_store(log_b, 1 /* LOG_USER */, level,
- NULL, 0, fmt, args);
+ dict, dictlen, fmt, args);
va_end(args);
wake_up_interruptible(&log_b->wait);

@@ -911,6 +938,8 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
int level = default_message_loglevel;
int facility = 1; /* LOG_USER */
size_t len = iov_iter_count(from);
+ char dict[KMSG_DICT_MAX_LEN];
+ size_t dictlen;
ssize_t ret = len;
int minor = iminor(iocb->ki_filp->f_inode);

@@ -950,10 +979,13 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
}
}

+ dictlen = set_kmsg_dict(dict);
+
if (minor == log_buf.minor) {
- printk_emit(facility, level, NULL, 0, "%s", line);
+ printk_emit(facility, level, dict, dictlen, "%s", line);
} else {
- int error = kmsg_sys_write(minor, level, "%s", line);
+ int error = kmsg_sys_write(minor, level, dict, dictlen,
+ "%s", line);

if (error)
ret = error;
--
1.9.1
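
The dict layout described in this patch ("_PID=<pid>\0_TID=<tid>\0_COMM=<comm>") can be modelled in userspace. The following is a minimal sketch, not the kernel code: build_kmsg_dict() and COMM_LEN are names made up for illustration, and it uses bounded snprintf() where the patch uses sprintf() on a buffer sized by KMSG_DICT_MAX_LEN.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Illustrative stand-in for the kernel's TASK_COMM_LEN (includes the NUL). */
#define COMM_LEN 16

/*
 * Hypothetical userspace model of set_kmsg_dict() from the patch.
 * Writes "_PID=<pid>\0_TID=<tid>\0_COMM=<comm>" into buf and returns the
 * number of bytes used, not counting the final NUL. Returns 0 if buf is
 * too small for the whole dict.
 */
static size_t build_kmsg_dict(char *buf, size_t size,
			      pid_t pid, pid_t tid, const char *comm)
{
	size_t len = 0;
	int n;

	n = snprintf(buf, size, "_PID=%d", (int)pid);
	if (n < 0 || (size_t)n + 1 > size)
		return 0;
	len += (size_t)n + 1;	/* keep the NUL: it separates the fields */

	n = snprintf(buf + len, size - len, "_TID=%d", (int)tid);
	if (n < 0 || (size_t)n + 1 > size - len)
		return 0;
	len += (size_t)n + 1;

	n = snprintf(buf + len, size - len, "_COMM=%.*s", COMM_LEN - 1, comm);
	if (n < 0 || (size_t)n + 1 > size - len)
		return 0;
	len += (size_t)n;	/* last field: trailing NUL is not counted */

	return len;
}
```

A reader of such a record would split the dict on NUL bytes to recover the three key=value pairs.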

2015-07-03 10:51:49

by Marcin Niesluchowski

Subject: [RFC 7/8] kmsg: add ioctl for adding and deleting kmsg* devices

Userspace currently has no way to add or delete kmsg* buffers.

Add the following ioctls on the main kmsg device for adding and deleting
additional kmsg devices:
* KMSG_CMD_BUFFER_ADD
* KMSG_CMD_BUFFER_DEL

Signed-off-by: Marcin Niesluchowski <[email protected]>
---
Documentation/ioctl/ioctl-number.txt | 1 +
drivers/char/mem.c | 134 +++++++++++++++++++++++++++++++++--
include/uapi/linux/Kbuild | 1 +
include/uapi/linux/kmsg_ioctl.h | 30 ++++++++
4 files changed, 159 insertions(+), 7 deletions(-)
create mode 100644 include/uapi/linux/kmsg_ioctl.h

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 611c522..26c0e53 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -312,6 +312,7 @@ Code Seq#(hex) Include File Comments
<mailto:[email protected]>
0xB1 00-1F PPPoX <mailto:[email protected]>
0xB3 00 linux/mmc/ioctl.h
+0xBB 00-02 uapi/linux/kmsg_ioctl.h
0xC0 00-0F linux/usb/iowarrior.h
0xCA 00-0F uapi/misc/cxl.h
0xCB 00-1F CBM serial IEC bus in development:
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 8d5ba0d..2893d8e 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -34,8 +34,14 @@
# include <linux/efi.h>
#endif

+#ifdef CONFIG_PRINTK
+#include <linux/kmsg_ioctl.h>
+#endif
+
#define DEVPORT_MINOR 4

+static struct class *mem_class;
+
static inline unsigned long size_inside_page(unsigned long start,
unsigned long size)
{
@@ -715,6 +721,113 @@ static int open_port(struct inode *inode, struct file *filp)
return capable(CAP_SYS_RAWIO) ? 0 : -EPERM;
}

+#ifdef CONFIG_PRINTK
+#define KMSG_MINOR 11
+
+#define MAX_MINOR_LEN 20
+
+static int kmsg_open_ext(struct inode *inode, struct file *file)
+{
+ return kmsg_fops.open(inode, file);
+}
+
+static ssize_t kmsg_write_iter_ext(struct kiocb *iocb, struct iov_iter *from)
+{
+ return kmsg_fops.write_iter(iocb, from);
+}
+
+static ssize_t kmsg_read_ext(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ return kmsg_fops.read(file, buf, count, ppos);
+}
+
+static loff_t kmsg_llseek_ext(struct file *file, loff_t offset, int whence)
+{
+ return kmsg_fops.llseek(file, offset, whence);
+}
+
+static unsigned int kmsg_poll_ext(struct file *file,
+ struct poll_table_struct *wait)
+{
+ return kmsg_fops.poll(file, wait);
+}
+
+static long kmsg_ioctl_buffers(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ void __user *argp = (void __user *)arg;
+ size_t size;
+ umode_t mode;
+ char name[4 + MAX_MINOR_LEN + 1];
+ struct device *dev;
+ int minor;
+
+ if (iminor(file->f_inode) != KMSG_MINOR)
+ return -ENOTTY;
+
+ switch (cmd) {
+ case KMSG_CMD_BUFFER_ADD:
+ if (copy_from_user(&size, argp, sizeof(size)))
+ return -EFAULT;
+ argp += sizeof(size);
+ if (copy_from_user(&mode, argp, sizeof(mode)))
+ return -EFAULT;
+ argp += sizeof(mode);
+ minor = kmsg_sys_buffer_add(size, mode);
+ if (minor < 0)
+ return minor;
+ sprintf(name, "kmsg%d", minor);
+ dev = device_create(mem_class, NULL, MKDEV(MEM_MAJOR, minor),
+ NULL, name);
+ if (IS_ERR(dev)) {
+ kmsg_sys_buffer_del(minor);
+ return PTR_ERR(dev);
+ }
+ if (copy_to_user(argp, &minor, sizeof(minor))) {
+ device_destroy(mem_class, MKDEV(MEM_MAJOR, minor));
+ kmsg_sys_buffer_del(minor);
+ return -EFAULT;
+ }
+ return 0;
+ case KMSG_CMD_BUFFER_DEL:
+ if (copy_from_user(&minor, argp, sizeof(minor)))
+ return -EFAULT;
+ if (minor <= KMSG_MINOR)
+ return -EINVAL;
+ device_destroy(mem_class, MKDEV(MEM_MAJOR, minor));
+ kmsg_sys_buffer_del(minor);
+ return 0;
+ }
+ return -ENOTTY;
+}
+
+static long kmsg_unlocked_ioctl_ext(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ long ret = kmsg_ioctl_buffers(file, cmd, arg);
+
+ if (ret == -ENOTTY)
+ return kmsg_fops.unlocked_ioctl(file, cmd, arg);
+ return ret;
+}
+
+static long kmsg_compat_ioctl_ext(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ long ret = kmsg_ioctl_buffers(file, cmd, arg);
+
+ if (ret == -ENOTTY)
+ return kmsg_fops.compat_ioctl(file, cmd, arg);
+ return ret;
+}
+
+static int kmsg_release_ext(struct inode *inode, struct file *file)
+{
+ return kmsg_fops.release(inode, file);
+}
+#endif
+
#define zero_lseek null_lseek
#define full_lseek null_lseek
#define write_zero write_null
@@ -779,6 +892,19 @@ static const struct file_operations full_fops = {
.write = write_full,
};

+#ifdef CONFIG_PRINTK
+static const struct file_operations kmsg_fops_ext = {
+ .open = kmsg_open_ext,
+ .read = kmsg_read_ext,
+ .write_iter = kmsg_write_iter_ext,
+ .llseek = kmsg_llseek_ext,
+ .poll = kmsg_poll_ext,
+ .unlocked_ioctl = kmsg_unlocked_ioctl_ext,
+ .compat_ioctl = kmsg_compat_ioctl_ext,
+ .release = kmsg_release_ext,
+};
+#endif
+
static const struct memdev {
const char *name;
umode_t mode;
@@ -800,14 +926,10 @@ static const struct memdev {
[8] = { "random", 0666, &random_fops, 0 },
[9] = { "urandom", 0666, &urandom_fops, 0 },
#ifdef CONFIG_PRINTK
- [11] = { "kmsg", 0644, &kmsg_fops, 0 },
+ [11] = { "kmsg", 0644, &kmsg_fops_ext, 0 },
#endif
};

-#ifdef CONFIG_PRINTK
-#define KMSG_MINOR 11
-#endif
-
static int memory_open(struct inode *inode, struct file *filp)
{
int minor;
@@ -858,8 +980,6 @@ static char *mem_devnode(struct device *dev, umode_t *mode)
return NULL;
}

-static struct class *mem_class;
-
static int __init chr_dev_init(void)
{
int minor;
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 1ff9942..faf13a8 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -224,6 +224,7 @@ header-y += kernel-page-flags.h
header-y += kexec.h
header-y += keyboard.h
header-y += keyctl.h
+header-y += kmsg_ioctl.h

ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/include/uapi/asm/kvm.h \
$(srctree)/arch/$(SRCARCH)/include/asm/kvm.h),)
diff --git a/include/uapi/linux/kmsg_ioctl.h b/include/uapi/linux/kmsg_ioctl.h
new file mode 100644
index 0000000..89c0c61
--- /dev/null
+++ b/include/uapi/linux/kmsg_ioctl.h
@@ -0,0 +1,30 @@
+/*
+ * This is ioctl include for kmsg* devices
+ */
+
+#ifndef _KMSG_IOCTL_H_
+#define _KMSG_IOCTL_H_
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+struct kmsg_cmd_buffer_add {
+ size_t size;
+ unsigned short mode;
+ int minor;
+} __attribute__((packed));
+
+#define KMSG_IOCTL_MAGIC 0xBB
+
+/*
+ * An ioctl interface for the kmsg device.
+ *
+ * KMSG_CMD_BUFFER_ADD: Creates an additional kmsg device with the given size
+ * and mode. The minor of the created device is written back.
+ * KMSG_CMD_BUFFER_DEL: Removes additional kmsg device based on its minor
+ */
+#define KMSG_CMD_BUFFER_ADD _IOWR(KMSG_IOCTL_MAGIC, 0x00, \
+ struct kmsg_cmd_buffer_add)
+#define KMSG_CMD_BUFFER_DEL _IOW(KMSG_IOCTL_MAGIC, 0x01, int)
+
+#endif
--
1.9.1
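
For readers who want to see how userspace would drive the interface proposed in this patch, here is a minimal sketch. It only exists on a kernel carrying this RFC series: the struct and ioctl numbers are copied from the proposed kmsg_ioctl.h, while the wrappers kmsg_buffer_add() and kmsg_buffer_del() are hypothetical helper names introduced for illustration.

```c
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Copied from the proposed include/uapi/linux/kmsg_ioctl.h (RFC only). */
struct kmsg_cmd_buffer_add {
	size_t size;
	unsigned short mode;
	int minor;
} __attribute__((packed));

#define KMSG_IOCTL_MAGIC	0xBB
#define KMSG_CMD_BUFFER_ADD	_IOWR(KMSG_IOCTL_MAGIC, 0x00, \
				      struct kmsg_cmd_buffer_add)
#define KMSG_CMD_BUFFER_DEL	_IOW(KMSG_IOCTL_MAGIC, 0x01, int)

/*
 * Hypothetical helper: fd must be an open descriptor for /dev/kmsg.
 * Returns the minor of the new kmsg<minor> device, or -1 with errno set.
 */
static int kmsg_buffer_add(int fd, size_t size, unsigned short mode)
{
	struct kmsg_cmd_buffer_add cmd = {
		.size = size,
		.mode = mode,
		.minor = -1,
	};

	if (ioctl(fd, KMSG_CMD_BUFFER_ADD, &cmd) < 0)
		return -1;
	return cmd.minor;
}

/* Hypothetical helper: removes the additional device with this minor. */
static int kmsg_buffer_del(int fd, int minor)
{
	return ioctl(fd, KMSG_CMD_BUFFER_DEL, &minor);
}
```

The packed layout matters here: the kernel side in this patch reads size, then mode, then writes minor at consecutive offsets, so any padding in the userspace struct would misplace the fields.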

2015-07-03 10:51:15

by Marcin Niesluchowski

Subject: [RFC 8/8] kmsg: add ioctl for kmsg* devices operating on buffers

There is no way to clear additional kmsg buffers, to get their size,
or to know what size should be passed to the read file operation
(too small a size causes it to return -EINVAL).

Add the following ioctls, which solve those issues:
* KMSG_CMD_GET_BUF_SIZE
* KMSG_CMD_GET_READ_SIZE_MAX
* KMSG_CMD_CLEAR

Signed-off-by: Marcin Niesluchowski <[email protected]>
---
Documentation/ioctl/ioctl-number.txt | 2 +-
include/uapi/linux/kmsg_ioctl.h | 15 +++++
kernel/printk/printk.c | 103 ++++++++++++++++++++++++++---------
3 files changed, 93 insertions(+), 27 deletions(-)

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 26c0e53..4b5f715 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -312,7 +312,7 @@ Code Seq#(hex) Include File Comments
<mailto:[email protected]>
0xB1 00-1F PPPoX <mailto:[email protected]>
0xB3 00 linux/mmc/ioctl.h
-0xBB 00-02 uapi/linux/kmsg_ioctl.h
+0xBB 00-83 uapi/linux/kmsg_ioctl.h
0xC0 00-0F linux/usb/iowarrior.h
0xCA 00-0F uapi/misc/cxl.h
0xCB 00-1F CBM serial IEC bus in development:
diff --git a/include/uapi/linux/kmsg_ioctl.h b/include/uapi/linux/kmsg_ioctl.h
index 89c0c61..2389d9f 100644
--- a/include/uapi/linux/kmsg_ioctl.h
+++ b/include/uapi/linux/kmsg_ioctl.h
@@ -27,4 +27,19 @@ struct kmsg_cmd_buffer_add {
struct kmsg_cmd_buffer_add)
#define KMSG_CMD_BUFFER_DEL _IOW(KMSG_IOCTL_MAGIC, 0x01, int)

+/*
+ * An ioctl interface for kmsg* devices.
+ *
+ * KMSG_CMD_GET_BUF_SIZE: Retrieve cyclic log buffer size associated with
+ * device.
+ * KMSG_CMD_GET_READ_SIZE_MAX: Retrieve max size of data read by kmsg read
+ * operation.
+ * KMSG_CMD_CLEAR: Clears cyclic log buffer. After that operation
+ * there is no data to read from buffer unless
+ * logs are written.
+ */
+#define KMSG_CMD_GET_BUF_SIZE _IOR(KMSG_IOCTL_MAGIC, 0x80, __u32)
+#define KMSG_CMD_GET_READ_SIZE_MAX _IOR(KMSG_IOCTL_MAGIC, 0x81, __u32)
+#define KMSG_CMD_CLEAR _IO(KMSG_IOCTL_MAGIC, 0x82)
+
#endif
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2a7f6a4..740ba79 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -49,6 +49,7 @@
#include <linux/slab.h>
#include <linux/kref.h>
#include <linux/kdev_t.h>
+#include <linux/kmsg_ioctl.h>

#include <asm/uaccess.h>

@@ -257,6 +258,10 @@ struct log_buffer {
u64 next_seq;
#ifdef CONFIG_PRINTK
u32 next_idx; /* index of the next record to store */
+/* sequence number of the next record to read after last 'clear' command */
+ u64 clear_seq;
+/* index of the next record to read after last 'clear' command */
+ u32 clear_idx;
int mode; /* mode of device (kmsg_sys only) */
int minor; /* minor representing buffer device */
#endif
@@ -274,10 +279,6 @@ static u64 console_seq;
static u32 console_idx;
static enum log_flags console_prev;

-/* the next printk record to read after the last 'clear' command */
-static u64 clear_seq;
-static u32 clear_idx;
-
#define PREFIX_MAX 32
#define LOG_LINE_MAX (1024 - PREFIX_MAX)

@@ -301,6 +302,8 @@ static struct log_buffer log_buf = {
.first_idx = 0,
.next_seq = 0,
.next_idx = 0,
+ .clear_seq = 0,
+ .clear_idx = 0,
.mode = 0,
.minor = 0,
};
@@ -1112,18 +1115,14 @@ static loff_t kmsg_llseek(struct log_buffer *log_b, struct file *file,
user->seq = log_b->first_seq;
break;
case SEEK_DATA:
- /* no clear index for kmsg_sys buffers */
- if (log_b != &log_buf) {
- ret = -EINVAL;
- break;
- }
/*
* The first record after the last SYSLOG_ACTION_CLEAR,
- * like issued by 'dmesg -c'. Reading /dev/kmsg itself
- * changes no global state, and does not clear anything.
+ * like issued by 'dmesg -c' or KMSG_CMD_CLEAR ioctl
+ * command. Reading /dev/kmsg itself changes no global
+ * state, and does not clear anything.
*/
- user->idx = clear_idx;
- user->seq = clear_seq;
+ user->idx = log_b->clear_idx;
+ user->seq = log_b->clear_seq;
break;
case SEEK_END:
/* after the last record */
@@ -1263,6 +1262,56 @@ static int devkmsg_open(struct inode *inode, struct file *file)
return ret;
}

+static long kmsg_ioctl(struct log_buffer *log_b, unsigned int cmd,
+ unsigned long arg)
+{
+ void __user *argp = (void __user *)arg;
+ static const u32 read_size_max = CONSOLE_EXT_LOG_MAX;
+
+ switch (cmd) {
+ case KMSG_CMD_GET_BUF_SIZE:
+ if (copy_to_user(argp, &log_b->len, sizeof(u32)))
+ return -EFAULT;
+ break;
+ case KMSG_CMD_GET_READ_SIZE_MAX:
+ if (copy_to_user(argp, &read_size_max, sizeof(u32)))
+ return -EFAULT;
+ break;
+ case KMSG_CMD_CLEAR:
+ if (!capable(CAP_SYSLOG))
+ return -EPERM;
+ raw_spin_lock_irq(&log_b->lock);
+ log_b->clear_seq = log_b->next_seq;
+ log_b->clear_idx = log_b->next_idx;
+ raw_spin_unlock_irq(&log_b->lock);
+ break;
+ default:
+ return -ENOTTY;
+ }
+ return 0;
+}
+
+static long devkmsg_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ long ret = -ENXIO;
+ int minor = iminor(file->f_inode);
+ struct log_buffer *log_b;
+
+ if (minor == log_buf.minor)
+ return kmsg_ioctl(&log_buf, cmd, arg);
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(log_b, &log_buf.list, list) {
+ if (log_b->minor == minor) {
+ ret = kmsg_ioctl(log_b, cmd, arg);
+ break;
+ }
+ }
+ rcu_read_unlock();
+ return ret;
+}
+
static int devkmsg_release(struct inode *inode, struct file *file)
{
struct devkmsg_user *user = file->private_data;
@@ -1281,6 +1330,8 @@ const struct file_operations kmsg_fops = {
.write_iter = devkmsg_write,
.llseek = devkmsg_llseek,
.poll = devkmsg_poll,
+ .unlocked_ioctl = devkmsg_ioctl,
+ .compat_ioctl = devkmsg_ioctl,
.release = devkmsg_release,
};

@@ -1747,18 +1798,18 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
u32 idx;
enum log_flags prev;

- if (clear_seq < log_buf.first_seq) {
+ if (log_buf.clear_seq < log_buf.first_seq) {
/* messages are gone, move to first available one */
- clear_seq = log_buf.first_seq;
- clear_idx = log_buf.first_idx;
+ log_buf.clear_seq = log_buf.first_seq;
+ log_buf.clear_idx = log_buf.first_idx;
}

/*
* Find first record that fits, including all following records,
* into the user-provided buffer for this dump.
*/
- seq = clear_seq;
- idx = clear_idx;
+ seq = log_buf.clear_seq;
+ idx = log_buf.clear_idx;
prev = 0;
while (seq < log_buf.next_seq) {
struct printk_log *msg = log_from_idx(&log_buf, idx);
@@ -1770,8 +1821,8 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
}

/* move first record forward until length fits into the buffer */
- seq = clear_seq;
- idx = clear_idx;
+ seq = log_buf.clear_seq;
+ idx = log_buf.clear_idx;
prev = 0;
while (len > size && seq < log_buf.next_seq) {
struct printk_log *msg = log_from_idx(&log_buf, idx);
@@ -1817,8 +1868,8 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
}

if (clear) {
- clear_seq = log_buf.next_seq;
- clear_idx = log_buf.next_idx;
+ log_buf.clear_seq = log_buf.next_seq;
+ log_buf.clear_idx = log_buf.next_idx;
}
raw_spin_unlock_irq(&log_buf.lock);

@@ -3199,8 +3250,8 @@ void kmsg_dump(enum kmsg_dump_reason reason)
dumper->active = true;

raw_spin_lock_irqsave(&log_buf.lock, flags);
- dumper->cur_seq = clear_seq;
- dumper->cur_idx = clear_idx;
+ dumper->cur_seq = log_buf.clear_seq;
+ dumper->cur_idx = log_buf.clear_idx;
dumper->next_seq = log_buf.next_seq;
dumper->next_idx = log_buf.next_idx;
raw_spin_unlock_irqrestore(&log_buf.lock, flags);
@@ -3406,8 +3457,8 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
*/
void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
{
- dumper->cur_seq = clear_seq;
- dumper->cur_idx = clear_idx;
+ dumper->cur_seq = log_buf.clear_seq;
+ dumper->cur_idx = log_buf.clear_idx;
dumper->next_seq = log_buf.next_seq;
dumper->next_idx = log_buf.next_idx;
}
--
1.9.1
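
As an illustration of why KMSG_CMD_GET_READ_SIZE_MAX is useful, the sketch below sizes a read buffer from the ioctl before calling read(), so the read cannot fail with -EINVAL for being too small. It assumes a kernel carrying this RFC series; the ioctl number is copied from the patch, and kmsg_read_record() is a hypothetical helper name.

```c
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Copied from the proposed include/uapi/linux/kmsg_ioctl.h (RFC only). */
#define KMSG_IOCTL_MAGIC		0xBB
#define KMSG_CMD_GET_READ_SIZE_MAX	_IOR(KMSG_IOCTL_MAGIC, 0x81, __u32)

/*
 * Hypothetical helper: reads one record from an open kmsg* descriptor.
 * On success stores a malloc()ed buffer in *out (caller frees it) and
 * returns the number of bytes read. Returns -1 on any failure and
 * leaves *out untouched.
 */
static ssize_t kmsg_read_record(int fd, char **out)
{
	__u32 max = 0;
	char *buf;
	ssize_t n;

	/* Ask the device for the largest record a read can return. */
	if (ioctl(fd, KMSG_CMD_GET_READ_SIZE_MAX, &max) < 0 || max == 0)
		return -1;

	buf = malloc(max);
	if (!buf)
		return -1;

	n = read(fd, buf, max);
	if (n < 0) {
		free(buf);
		return -1;
	}

	*out = buf;
	return n;
}
```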

2015-07-03 11:21:52

by Richard Weinberger

Subject: Re: [RFC 0/8] Additional kmsg devices

On Fri, Jul 3, 2015 at 12:49 PM, Marcin Niesluchowski
<[email protected]> wrote:
> Dear All,
>
> This series of patches extends kmsg interface with ability to dynamicaly
> create (and destroy) kmsg-like devices which can be used by user space
> for logging. Logging to kernel has number of benefits, including but not
> limited to - always available, requiring no userspace, automatically
> rotating and low overhead.
>
> User-space logging to kernel cyclic buffers was already successfully used
> in android logger concept but it had certain flaws that this commits try
> to address:
> * drops hardcoded number of devices and static paths in favor for dynamic
> configuration by ioctl interface in userspace
> * extends existing driver instead of creating completely new one

So, now we start moving syslogd into kernel land because userspace is
too broken to provide
decent logging?

I can understand the systemd is using kmsg if no other logging service
is available
but I really don't think we should encourage other programs to do so.

Why can't you just make sure that your target has a working
syslogd/rsyslogd/journald/whatever?
All can be done perfectly fine in userspace.

Just my two cents.

--
Thanks,
//richard

2015-07-03 15:09:16

by Marcin Niesluchowski

Subject: Re: [RFC 0/8] Additional kmsg devices

On 07/03/2015 01:21 PM, Richard Weinberger wrote:
> On Fri, Jul 3, 2015 at 12:49 PM, Marcin Niesluchowski
> <[email protected]> wrote:
>> Dear All,
>>
>> This series of patches extends kmsg interface with ability to dynamicaly
>> create (and destroy) kmsg-like devices which can be used by user space
>> for logging. Logging to kernel has number of benefits, including but not
>> limited to - always available, requiring no userspace, automatically
>> rotating and low overhead.
>>
>> User-space logging to kernel cyclic buffers was already successfully used
>> in android logger concept but it had certain flaws that this commits try
>> to address:
>> * drops hardcoded number of devices and static paths in favor for dynamic
>> configuration by ioctl interface in userspace
>> * extends existing driver instead of creating completely new one
> So, now we start moving syslogd into kernel land because userspace is
> too broken to provide
> decent logging?
>
> I can understand the systemd is using kmsg if no other logging service
> is available
> but I really don't think we should encourage other programs to do so.
>
> Why can't you just make sure that your target has a working
> syslogd/rsyslogd/journald/whatever?
> All can be done perfectly fine in userspace.
* Message credibility: Let's imagine a simple service which collects logs
via unix sockets. There is no reliable way of identifying the logging
process. getsockopt() with the SO_PEERCRED option would give the pid from
the cred structure, but according to the manual it may not be that of the
actual logging process:
"The returned credentials are those that were in effect at the time
of the call to connect(2) or socketpair(2)."
- socket(7)

* Early userspace tool: Helpful especially for embedded systems.

* Reliability: A userspace service may be killed due to out of memory
(OOM). This is a kernel cyclic buffer, whose size can be specified
differently according to the situation.

* Possibility of using it with pstore: This code could be extended to
log additional buffers to persistent storage the same way the main (kmsg)
log buffer is.

* Use case of attaching a file descriptor to stdout/stderr: Especially in
early userspace.

* Performance: The services you mentioned are weaker solutions in
that respect. systemd-journald in particular is much too heavy a solution.

--

Best Regards,
Marcin Niesluchowski
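
The SO_PEERCRED behaviour mentioned in the first point can be observed with a short userspace sketch. This is only an illustration on Linux with glibc; peer_pid() and peercred_is_creator() are hypothetical names. It shows that the credentials are fixed when the socket pair is created, not when data is later written.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* exposes struct ucred in <sys/socket.h> on glibc */
#endif
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the peer PID recorded for a connected AF_UNIX socket, or -1. */
static pid_t peer_pid(int sock)
{
	struct ucred cred;
	socklen_t len = sizeof(cred);

	if (getsockopt(sock, SOL_SOCKET, SO_PEERCRED, &cred, &len) < 0)
		return -1;
	return cred.pid;
}

/*
 * Creates a socket pair and checks whose PID SO_PEERCRED reports.
 * Returns 1 if it is the creating process, 0 if not, -1 on error.
 */
static int peercred_is_creator(void)
{
	int sv[2];
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	pid = peer_pid(sv[0]);
	close(sv[0]);
	close(sv[1]);

	if (pid < 0)
		return -1;
	return pid == getpid();
}
```

If one end of such a pair were handed to another process, SO_PEERCRED on the other end would still report the creator, which is the limitation described above.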

2015-07-03 15:39:09

by Greg Kroah-Hartman

Subject: Re: [RFC 5/8] kmsg: device support in mem class

On Fri, Jul 03, 2015 at 12:49:52PM +0200, Marcin Niesluchowski wrote:
> Mem class is current class in which kmsg device is holded in.
>
> Mem class is exteded by kmsg_sys devices handling.
>
> Signed-off-by: Marcin Niesluchowski <[email protected]>
> ---
> drivers/char/mem.c | 20 ++++++++++++++++++--
> 1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> index e518040..8d5ba0d 100644
> --- a/drivers/char/mem.c
> +++ b/drivers/char/mem.c
> @@ -815,7 +815,11 @@ static int memory_open(struct inode *inode, struct file *filp)
>
> minor = iminor(inode);
> if (minor >= ARRAY_SIZE(devlist))
> +#ifdef CONFIG_PRINTK
> + minor = KMSG_MINOR;
> +#else
> return -ENXIO;
> +#endif

Ick, you are going to have to come up with a better api if you want this
to be able to be merged. I don't want to see #ifdef in .c files, as
these changes just make things much messier.

> @@ -837,8 +841,20 @@ static const struct file_operations memory_fops = {
>
> static char *mem_devnode(struct device *dev, umode_t *mode)
> {
> - if (mode && devlist[MINOR(dev->devt)].mode)
> - *mode = devlist[MINOR(dev->devt)].mode;
> + int minor;
> +
> + if (!mode)
> + return NULL;
> +
> + minor = MINOR(dev->devt);
> +
> +#ifdef CONFIG_PRINTK
> + if (minor >= ARRAY_SIZE(devlist))
> + kmsg_sys_mode(minor, mode);
> + else
> +#endif

Ick, not ok.

greg k-h
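
One common way to keep #ifdef out of .c files is to put the conditional into a header as a pair of static inline helpers, so the caller looks the same in both configurations. The following userspace sketch only models that pattern and is not what any later revision of this series necessarily did: kmsg_sys_resolve_minor(), memory_open_minor() and DEVLIST_SIZE are hypothetical names, and the constants stand in for ARRAY_SIZE(devlist) and KMSG_MINOR.

```c
#include <errno.h>

#define DEVLIST_SIZE	12	/* illustrative stand-in for ARRAY_SIZE(devlist) */
#define KMSG_MINOR	11

/*
 * Header side (hypothetical): the only place that knows about the
 * configuration option. Both variants share one signature.
 */
#ifdef CONFIG_PRINTK
static inline int kmsg_sys_resolve_minor(int minor)
{
	(void)minor;
	return KMSG_MINOR;	/* additional kmsg devices share the kmsg entry */
}
#else
static inline int kmsg_sys_resolve_minor(int minor)
{
	(void)minor;
	return -ENXIO;		/* no such device without printk support */
}
#endif

/*
 * .c side: models the minor lookup in memory_open() without any #ifdef.
 * Returns the devlist index to use, or a negative errno.
 */
static int memory_open_minor(int minor)
{
	if (minor >= DEVLIST_SIZE)
		return kmsg_sys_resolve_minor(minor);
	return minor;
}
```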

2015-07-03 16:58:20

by Andy Lutomirski

Subject: Re: [RFC 0/8] Additional kmsg devices

On Fri, Jul 3, 2015 at 8:09 AM, Marcin Niesluchowski
<[email protected]> wrote:
>
> * Message credibility: Lets imagine simple service which collects logs via
> unix sockets. There is no reliable way of identifying logging process.
> getsockopt() with SO_PEERCRED option would give pid form cred structure, but
> according to manual it may not be of actual logging process:
> "The returned credentials are those that were in effect at the time of the
> call to connect(2) or socketpair(2)."
> - select(7)

There's SCM_CREDENTIALS, which is dangerous, but it's dangerous in
exactly the same way that your patches are dangerous. You're
collecting PID/TID when write(2) is called, and it's very easy to get
another process to call write(2) on your behalf, because write(2)
isn't supposed to collect credentials.

--Andy
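
The point about credentials being collected at write time can be illustrated with existing interfaces. In the sketch below (Linux with glibc; sender_pid_is_writer() is a hypothetical name), the receiving socket enables SO_PASSCRED, and a forked child that merely inherited the descriptor performs the write. The SCM_CREDENTIALS message then carries the child's PID, not that of the process which created the socket.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* exposes struct ucred and SCM_CREDENTIALS on glibc */
#endif
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the credentials attached to the received byte name the
 * child that called write(), 0 if they do not, -1 on error.
 */
static int sender_pid_is_writer(void)
{
	union {
		char buf[CMSG_SPACE(sizeof(struct ucred))];
		struct cmsghdr align;	/* forces correct alignment */
	} ctrl;
	char data;
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl.buf,
		.msg_controllen = sizeof(ctrl.buf),
	};
	struct cmsghdr *cmsg;
	struct ucred cred;
	int sv[2], on = 1, result = 0;
	ssize_t n;
	pid_t child;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (setsockopt(sv[0], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) < 0 ||
	    (child = fork()) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (child == 0)		/* child: writes on an inherited descriptor */
		_exit(write(sv[1], "x", 1) == 1 ? 0 : 1);

	close(sv[1]);		/* recvmsg() sees EOF if the child wrote nothing */
	n = recvmsg(sv[0], &msg, 0);
	close(sv[0]);
	waitpid(child, NULL, 0);
	if (n != 1)
		return -1;

	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_CREDENTIALS) {
			memcpy(&cred, CMSG_DATA(cmsg), sizeof(cred));
			result = (cred.pid == child);
		}
	}
	return result;
}
```

Any scheme that stamps identity at write(2) time behaves the same way: whoever ends up calling write() on the descriptor is the one recorded.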

2015-07-07 13:11:38

by Petr Mladek

Subject: Re: [RFC 0/8] Additional kmsg devices

On Fri 2015-07-03 17:09:03, Marcin Niesluchowski wrote:
> On 07/03/2015 01:21 PM, Richard Weinberger wrote:
> >On Fri, Jul 3, 2015 at 12:49 PM, Marcin Niesluchowski
> ><[email protected]> wrote:
> >>Dear All,
> >>
> >>This series of patches extends kmsg interface with ability to dynamicaly
> >>create (and destroy) kmsg-like devices which can be used by user space
> >>for logging. Logging to kernel has number of benefits, including but not
> >>limited to - always available, requiring no userspace, automatically
> >>rotating and low overhead.
> >>
> >>User-space logging to kernel cyclic buffers was already successfully used
> >>in android logger concept but it had certain flaws that this commits try
> >>to address:
> >>* drops hardcoded number of devices and static paths in favor for dynamic
> >> configuration by ioctl interface in userspace
> >>* extends existing driver instead of creating completely new one
> >So, now we start moving syslogd into kernel land because userspace is
> >too broken to provide
> >decent logging?
> >
> >I can understand the systemd is using kmsg if no other logging service
> >is available
> >but I really don't think we should encourage other programs to do so.
> >
> >Why can't you just make sure that your target has a working
> >syslogd/rsyslogd/journald/whatever?
> >All can be done perfectly fine in userspace.
> * Message credibility: Lets imagine simple service which collects
> logs via unix sockets. There is no reliable way of identifying
> logging process. getsockopt() with SO_PEERCRED option would give pid
> form cred structure, but according to manual it may not be of actual
> logging process:
> "The returned credentials are those that were in effect at the
> time of the call to connect(2) or socketpair(2)."
> - select(7)
>
> * Early userspace tool: Helpful especially for embeded systems.
>
> * Reliability: Userspace service may be killed due to out of memory
> (OOM). This is kernel cyclic buffer, which size can be specified
> differently according to situation.

But then many services will fight for the space in the kernel ring
buffer. We will need a mechanism to guarantee a space for each
service. We will need priorities to throttle various services
in various ways. It will be easier to lose messages. It might be
harder to get the important messages on the console when
the system is going down. It will be harder to handle continuous
lines. I am not sure that we want to go this way.

Best Regards,
Petr

2015-07-07 17:12:33

by Karol Lewandowski

Subject: Re: [RFC 0/8] Additional kmsg devices

On 2015-07-07 15:11, Petr Mladek wrote:
> On Fri 2015-07-03 17:09:03, Marcin Niesluchowski wrote:
>> On 07/03/2015 01:21 PM, Richard Weinberger wrote:
>>> On Fri, Jul 3, 2015 at 12:49 PM, Marcin Niesluchowski
>>> <[email protected]> wrote:
>>>> Dear All,
>>>>
>>>> This series of patches extends kmsg interface with ability to dynamicaly
>>>> create (and destroy) kmsg-like devices which can be used by user space
>>>> for logging. Logging to kernel has number of benefits, including but not
>>>> limited to - always available, requiring no userspace, automatically
>>>> rotating and low overhead.
>>>>
>>>> User-space logging to kernel cyclic buffers was already successfully used
>>>> in android logger concept but it had certain flaws that this commits try
>>>> to address:
>>>> * drops hardcoded number of devices and static paths in favor for dynamic
>>>> configuration by ioctl interface in userspace
>>>> * extends existing driver instead of creating completely new one
>>> So, now we start moving syslogd into kernel land because userspace is
>>> too broken to provide
>>> decent logging?
>>>
>>> I can understand the systemd is using kmsg if no other logging service
>>> is available
>>> but I really don't think we should encourage other programs to do so.
>>>
>>> Why can't you just make sure that your target has a working
>>> syslogd/rsyslogd/journald/whatever?
>>> All can be done perfectly fine in userspace.
>> * Message credibility: Lets imagine simple service which collects
>> logs via unix sockets. There is no reliable way of identifying
>> logging process. getsockopt() with SO_PEERCRED option would give pid
>> form cred structure, but according to manual it may not be of actual
>> logging process:
>> "The returned credentials are those that were in effect at the
>> time of the call to connect(2) or socketpair(2)."
>> - select(7)
>>
>> * Early userspace tool: Helpful especially for embeded systems.
>>
>> * Reliability: Userspace service may be killed due to out of memory
>> (OOM). This is kernel cyclic buffer, which size can be specified
>> differently according to situation.
> But then many services will fight for the space in the kernel ring
> buffer.

Yes. Please note however that the problems you describe are also valid for
/dev/kmsg today.

User space has used (one) writeable kmsg for some time already, which
has caused a number of interesting problems because messages from
different domains (kernel, userspace) are written to one place (e.g. the
systemd "debug" problem). One of the goals is to avoid this particular
problem and let userspace create, destroy and use their own
buffers at will.

> We will need a mechanism to guarantee a space for each service.

We preserve the semantics of kmsg, so I don't see why we would need to give
any more guarantees than what was provided there.

> We will need priorities to throttle various services various ways.

Appropriate buffer size will throttle messages automatically - I don't
think we need anything more than that. See also below.

> It will be easier to lost messages.

The ability to lose messages is one of the goals - if we exceed the buffer
size and there is no one actively reading the buffers, this is what we want
to happen.

Say we have one buffer where debug messages go - we are not at all
interested in the content and are very happy to lose these... except in
the case of a crash, where we will have all of it dumped to persistent
storage by pstore (this is when the content might be interesting and helpful).

This is just one of many possible scenarios.

> It might be harder to get the important messages on the console when
> the system is going down.

Messages from additional buffers are not intended to be written to console.

> It will be harder to handle continuous lines.

I don't see how it would be different from what we have today.

> I am not sure that we want to go this way.

This is why this thread has RFC tag anyway :^)


Thanks

--
Karol Lewandowski, Samsung R&D Institute Poland

2015-07-08 08:30:26

by Marcin Niesluchowski

Subject: Re: [RFC 0/8] Additional kmsg devices

On 07/03/2015 05:19 PM, Richard Weinberger wrote:
> Am 03.07.2015 um 17:09 schrieb Marcin Niesluchowski:
>>> Why can't you just make sure that your target has a working
>>> syslogd/rsyslogd/journald/whatever?
>>> All can be done perfectly fine in userspace.
>> * Message credibility: Lets imagine simple service which collects logs via unix sockets. There is no reliable way of identifying logging process. getsockopt() with SO_PEERCRED
>> option would give pid form cred structure, but according to manual it may not be of actual logging process:
>> "The returned credentials are those that were in effect at the time of the call to connect(2) or socketpair(2)."
>> - select(7)
> This interface can be improved. Should be easy.

What kind of improvement do you have in mind?

>> * Early userspace tool: Helpful especially for embeded systems.
> This is what we do already. In early user space spawn your logger as early as possible.
> "embedded Linux is special" is not an excuse btw. ;)

I would say "embedded Linux is a real use case" instead of "special". What
I meant is that it requires only one ioctl and no additional
resources.

>> * Reliability: Userspace service may be killed due to out of memory (OOM). This is kernel cyclic buffer, which size can be specified differently according to situation.
> This is what we have /proc/<pid>/oom_adj and /proc/<pid>/oom_score_adj for.

You are right, but additional resources and complexity are required.

>> * Possibility of using it with pstore: This code could be extended to log additional buffers to persistent storage same way main (kmsg) log buffer is.
> pstorefs and friends?

The pstore filesystem is used to access already stored kernel data (e.g.
the kmsg buffer), but it does not provide a mechanism for storing
userspace memory.

>> * Use case of attaching file descriptor to stdout/stderr: Especially in early userspace.
> You can redirect these also in userspace.

True, but as I said in my first argument, there is no way to identify
the logging process when sockets are used.


>> * Performance: Those services mentioned by You are weeker solutions in that case. Especially systemd-journald is much too heavy soulution.
> Do you have numbers? I agree systemd-journald is heavy wight. But it is by far not the only logging daemon we have...

I compared write operations on the kmsg buffer against write/read
operations on a SOCK_STREAM socket and sendmsg/recv on a SOCK_DGRAM
socket. Compared to the SOCK_STREAM socket it was about 39% slower, but
compared to the SOCK_DGRAM socket it was about 326% faster. syslog, for
example, uses SOCK_DGRAM sockets. In all cases there were 2^20 (1048576)
write/sendmsg operations of 2^8 (256) bytes.
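
For readers who want to repeat this kind of measurement, the SOCK_DGRAM side could look roughly like the sketch below. It is not the benchmark that produced the numbers above: the function name dgram_bench() is hypothetical, the sender and receiver share one process and alternate strictly, and it uses an AF_UNIX socket pair. It only illustrates the shape of the loop (count sendmsg/recv operations of 256 bytes each, timed with a monotonic clock).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* clock_gettime(), CLOCK_MONOTONIC */
#endif
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <time.h>
#include <unistd.h>

#define MSG_SIZE 256		/* 2^8 bytes per message */

/*
 * Hypothetical helper: push 'count' datagrams of MSG_SIZE bytes through an
 * AF_UNIX SOCK_DGRAM socket pair. Returns the number of bytes received
 * (or -1 if the socket pair cannot be created); *seconds gets the elapsed
 * wall-clock time of the loop.
 */
static long long dgram_bench(long count, double *seconds)
{
	int sv[2];
	char out[MSG_SIZE], in[MSG_SIZE];
	struct iovec iov = { .iov_base = out, .iov_len = sizeof(out) };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
	struct timespec t0, t1;
	long long total = 0;
	ssize_t n;
	long i;

	assert(count > 0 && seconds != NULL);
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;
	memset(out, 'x', sizeof(out));

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < count; i++) {
		/* One datagram out, one datagram in, so the queue never fills. */
		if (sendmsg(sv[1], &msg, 0) != MSG_SIZE)
			break;
		n = recv(sv[0], in, sizeof(in), 0);
		if (n != MSG_SIZE)
			break;
		total += n;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	close(sv[0]);
	close(sv[1]);
	*seconds = (double)(t1.tv_sec - t0.tv_sec) +
		   (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	return total;
}
```

Running it with count set to 1048576 matches the message count and size quoted above; a comparable loop around write(2) on a kmsg-like device would give the other half of the comparison.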

Best Regards,
Marcin Niesluchowski

2015-07-08 10:45:29

by Petr Mladek

Subject: Re: [RFC 0/8] Additional kmsg devices

On Tue 2015-07-07 19:10:56, Karol Lewandowski wrote:
> On 2015-07-07 15:11, Petr Mladek wrote:
> > On Fri 2015-07-03 17:09:03, Marcin Niesluchowski wrote:
> >> On 07/03/2015 01:21 PM, Richard Weinberger wrote:
> >>> On Fri, Jul 3, 2015 at 12:49 PM, Marcin Niesluchowski
> >>> <[email protected]> wrote:
> >>>> Dear All,
> >>>>
> >>>> This series of patches extends kmsg interface with ability to dynamicaly
> >>>> create (and destroy) kmsg-like devices which can be used by user space
> >>>> for logging. Logging to kernel has number of benefits, including but not
> >>>> limited to - always available, requiring no userspace, automatically
> >>>> rotating and low overhead.
> >>>>
> >>>> User-space logging to kernel cyclic buffers was already successfully used
> >>>> in android logger concept but it had certain flaws that this commits try
> >>>> to address:
> >>>> * drops hardcoded number of devices and static paths in favor for dynamic
> >>>> configuration by ioctl interface in userspace
> >>>> * extends existing driver instead of creating completely new one
> >>> So, now we start moving syslogd into kernel land because userspace is
> >>> too broken to provide
> >>> decent logging?
> >>>
> >>> I can understand the systemd is using kmsg if no other logging service
> >>> is available
> >>> but I really don't think we should encourage other programs to do so.
> >>>
> >>> Why can't you just make sure that your target has a working
> >>> syslogd/rsyslogd/journald/whatever?
> >>> All can be done perfectly fine in userspace.
> >> * Message credibility: Lets imagine simple service which collects
> >> logs via unix sockets. There is no reliable way of identifying
> >> logging process. getsockopt() with SO_PEERCRED option would give pid
> >> form cred structure, but according to manual it may not be of actual
> >> logging process:
> >> "The returned credentials are those that were in effect at the
> >> time of the call to connect(2) or socketpair(2)."
> >> - select(7)
> >>
> >> * Early userspace tool: Helpful especially for embeded systems.
> >>
> >> * Reliability: Userspace service may be killed due to out of memory
> >> (OOM). This is kernel cyclic buffer, which size can be specified
> >> differently according to situation.
> > But then many services will fight for the space in the kernel ring
> > buffer.
>
> Yes. Please note however that problems you describe are also valid for
> /dev/kmsg today.

Mea culpa, I should not write when I do not have enough time to read
the patches. I somehow missed the patch that added more buffers, so
many opinions were misleading.

> > It will be harder to handle continuous lines.
>
> I don't see how it would be different from what we have today.

There is currently only one buffer for continuous lines. It is flushed
when you write from another CPU even before the current message is completed.
It needs to be flushed also when you write from another devkmsg.

I thought that there would be a much bigger chance of mixing parts of
messages in a single buffer, but that is not true after all.


Best Regards,
Petr

2015-07-08 11:11:10

by Petr Mladek

Subject: Re: [RFC 3/8] kmsg: introduce additional kmsg devices support

On Fri 2015-07-03 12:49:50, Marcin Niesluchowski wrote:
> kmsg device provides operations on cyclic logging buffer used mainly
> by kernel but also in userspace by privileged processes.
>
> Additional kmsg devices keep the same log format but may be added
> dynamically with custom size.
>
> Signed-off-by: Marcin Niesluchowski <[email protected]>

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -234,29 +234,37 @@ struct printk_log {
> u8 level:3; /* syslog level */
> };

In case this gets accepted: since you are already touching the API, I
would suggest renaming struct printk_log to printk_msg. The current name
is pretty misleading.

> +struct log_buffer {
> +#ifdef CONFIG_PRINTK
> + struct list_head list; /* kmsg as head of the list */
> + char *buf; /* cyclic log buffer */
> + u32 len; /* buffer length */
> + wait_queue_head_t wait; /* wait queue for kmsg buffer */
> +#endif
> /*
> - * The logbuf_lock protects kmsg buffer, indices, counters. This can be taken
> - * within the scheduler's rq lock. It must be released before calling
> - * console_unlock() or anything else that might wake up a process.
> + * The lock protects kmsg buffer, indices, counters. This can be taken within
> + * the scheduler's rq lock. It must be released before calling console_unlock()
> + * or anything else that might wake up a process.
> */
> -static DEFINE_RAW_SPINLOCK(logbuf_lock);
> + raw_spinlock_t lock;
> + u64 first_seq; /* sequence number of the first record stored */
> + u32 first_idx; /* index of the first record stored */
> +/* sequence number of the next record to store */
> + u64 next_seq;
> +#ifdef CONFIG_PRINTK
> + u32 next_idx; /* index of the next record to store */
> + int mode; /* mode of device (kmsg_sys only) */
> + int minor; /* minor representing buffer device */
> +#endif
> +};


> @@ -1069,10 +1253,10 @@ const struct file_operations kmsg_fops = {
> */
> void log_buf_kexec_setup(void)
> {
> - VMCOREINFO_SYMBOL(log_buf);
> - VMCOREINFO_SYMBOL(log_buf_len);
> - VMCOREINFO_SYMBOL(log_first_idx);
> - VMCOREINFO_SYMBOL(log_next_idx);
> + VMCOREINFO_SYMBOL(log_buf.buf);
> + VMCOREINFO_SYMBOL(log_buf.len);
> + VMCOREINFO_SYMBOL(log_buf.first_idx);
> + VMCOREINFO_SYMBOL(log_buf.next_idx);

Have you tried using this with crash or some other utility?
I guess that it will need to be exported in a similar way to
struct printk_log, something like:

VMCOREINFO_SYMBOL(log_buf);
VMCOREINFO_STRUCT_SIZE(log_buffer);
VMCOREINFO_OFFSET(log_buffer, buf);
VMCOREINFO_OFFSET(log_buffer, len);
VMCOREINFO_OFFSET(log_buffer, first_idx);
VMCOREINFO_OFFSET(log_buffer, next_idx);
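
To see what a dump tool would receive from the export suggested above, here is a user-space imitation. Everything in it is an assumption for illustration: the DEMO_* macros are hypothetical stand-ins modelled on the kernel's VMCOREINFO_* macros (the exact text the kernel emits may differ), struct log_buffer is cut down to four fields, and note[] stands in for the vmcoreinfo note. The point is that one symbol address plus the struct size and field offsets are enough for a tool to locate each member.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Cut-down stand-in for the struct from the patch (illustration only). */
struct log_buffer {
	char *buf;
	unsigned int len;
	unsigned int first_idx;
	unsigned int next_idx;
};

static struct log_buffer log_buf;
static char note[512];		/* stands in for the vmcoreinfo note */
static size_t note_len;

/* Append one formatted line to note[], dropping it if it does not fit. */
static void note_append(const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(fmt != NULL);
	va_start(ap, fmt);
	n = vsnprintf(note + note_len, sizeof(note) - note_len, fmt, ap);
	va_end(ap);
	if (n > 0 && (size_t)n < sizeof(note) - note_len)
		note_len += (size_t)n;
	else
		note[note_len] = '\0';
}

/* Hypothetical imitations of the kernel macros. */
#define DEMO_SYMBOL(name) \
	note_append("SYMBOL(%s)=%lx\n", #name, (unsigned long)&name)
#define DEMO_STRUCT_SIZE(name) \
	note_append("SIZE(%s)=%lu\n", #name, (unsigned long)sizeof(struct name))
#define DEMO_OFFSET(name, field) \
	note_append("OFFSET(%s.%s)=%lu\n", #name, #field, \
		    (unsigned long)offsetof(struct name, field))

/* Export the buffer the way the suggestion above describes. */
static void export_log_buffer(void)
{
	note_len = 0;
	note[0] = '\0';
	DEMO_SYMBOL(log_buf);
	DEMO_STRUCT_SIZE(log_buffer);
	DEMO_OFFSET(log_buffer, buf);
	DEMO_OFFSET(log_buffer, len);
	DEMO_OFFSET(log_buffer, first_idx);
	DEMO_OFFSET(log_buffer, next_idx);
}
```

After export_log_buffer(), note[] holds one SYMBOL line for log_buf, one SIZE line and four OFFSET lines, which is the information a tool needs to read the fields out of a memory dump.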

Best Regards,
Petr

Subject: Re: [RFC 0/8] Additional kmsg devices


Hi,

On Wednesday, July 08, 2015 10:36:32 AM Richard Weinberger wrote:
> Am 08.07.2015 um 10:30 schrieb Marcin Niesluchowski:
> > On 07/03/2015 05:19 PM, Richard Weinberger wrote:
> >> Am 03.07.2015 um 17:09 schrieb Marcin Niesluchowski:
> >>>> Why can't you just make sure that your target has a working
> >>>> syslogd/rsyslogd/journald/whatever?
> >>>> All can be done perfectly fine in userspace.
> >>> * Message credibility: Lets imagine simple service which collects logs via unix sockets. There is no reliable way of identifying logging process. getsockopt() with SO_PEERCRED
> >>> option would give pid form cred structure, but according to manual it may not be of actual logging process:
> >>> "The returned credentials are those that were in effect at the time of the call to connect(2) or socketpair(2)."
> >>> - select(7)
> >> This interface can be improved. Should be easy.
> >
> > What kind of improvement do you have in mind?
>
> I was wrong, we have the needed functionality already.
> See Andy's reply.

Please note that Andy has pointed out that the existing interface
(SCM_CREDENTIALS) is dangerous (=> should not be used).

Unfortunately his code for SCM_IDENTITY (which would replace
SCM_CREDENTIALS) has not materialized beyond the initial 10% done
a year ago during the SCM_PROCINFO discussion (it also has not been
explained well enough to allow implementation by someone else).

> >>> * Early userspace tool: Helpful especially for embeded systems.
> >> This is what we do already. In early user space spawn your logger as early as possible.
> >> "embedded Linux is special" is not an excuse btw. ;)
> >
> > I would say "embedded Linux is real use case"instead of "special". What I meant that it does only require one ioctl and no additional resources are needed.
> >
> >>> * Reliability: Userspace service may be killed due to out of memory (OOM). This is kernel cyclic buffer, which size can be specified differently according to situation.
> >> This is what we have /proc/<pid>/oom_adj and /proc/<pid>/oom_score_adj for.
> >
> > You are right, but additional resources and complexity is required.
>
> A few "echo foo > /proc/xy/bar" commands are far less complexity than adding a pseudo syslogd to kernel land...

Please read the actual patches. In roughly 600 new LOC they do
mainly two things:

* add the possibility to have more than one /dev/kmsg device & kernel
log buffer (~200 LOC)

* add a user interface for managing these additional devices/buffers
(~400 LOC)

I actually imagine that some time in the future we may also want to
have separate kernel log buffers for kernel usage itself..

> >>> * Possibility of using it with pstore: This code could be extended to log additional buffers to persistent storage same way main (kmsg) log buffer is.
> >> pstorefs and friends?
> >
> > pstore filesystem is used to access already stored kernel data (e.g. kmsg buffer). But does not provide mechanism of storing userspace memory.
>
> Which can be easily improved. Again, it will be less complex than your current approach.
>
> >>> * Use case of attaching file descriptor to stdout/stderr: Especially in early userspace.
> >> You can redirect these also in userspace.
> >
> > True for that, but as I said in my first argument there is no possibility of logging process identification in case of sockets.
> >
> >
> >>> * Performance: Those services mentioned by You are weeker solutions in that case. Especially systemd-journald is much too heavy soulution.
> >> Do you have numbers? I agree systemd-journald is heavy wight. But it is by far not the only logging daemon we have...
> >
> > I compared write operations on kmsg buffer via write/read operations on socket on SOCK_STREAM socket and sendmsg/recv on SOCK_DGRAM socket. Compared to SOCK_STREAM socket it was about
> > 39% slower but compared to SOCK_DGRAM socket it was about 326% faster. syslog for example uses SOCK_DGRAM sockets. In all cases there were 2^20 (1048576) write/sendmsg operations of 2^8
> > (256) bytes.
>
> I still think the whole approach is wrong. Instead of giving up and going to kernel land, come up with a minimal userspace ringbuffer-syslogd.
> If the kernel lacks some support you need, add it. But don't move the whole thing int the kernel.

As for the possibility of logging things from user space to the
kernel log buffer (through /dev/kmsg), it was added 3 years
ago in v3.5.
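
For reference, logging through the existing interface amounts to one write(2) per record, optionally prefixed with a syslog-style "<level>" tag. The sketch below is an illustration only: kmsg_log() is a hypothetical helper, the 256-byte record limit is an arbitrary choice for the example, and the file descriptor is passed in so the same code can be pointed at /dev/kmsg or, for testing, at a pipe.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: format "<level>text\n" and hand it to the kernel
 * in a single write(), so that it ends up as one record. Returns the
 * number of bytes written, or -1 if the record does not fit or the
 * write fails.
 */
static ssize_t kmsg_log(int fd, int level, const char *text)
{
	char rec[256];
	int n;

	assert(text != NULL && level >= 0 && level <= 7);
	n = snprintf(rec, sizeof(rec), "<%d>%s\n", level, text);
	if (n < 0 || (size_t)n >= sizeof(rec))
		return -1;
	return write(fd, rec, (size_t)n);
}
```

With fd obtained from open("/dev/kmsg", O_WRONLY) (which normally requires sufficient privileges), kmsg_log(fd, 6, "hello") submits an informational record to the kernel log buffer.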

The changes being proposed are not doing what you are trying to
imply: this is not a kernel syslogd (the way kdbus is a kernel dbus
implementation). They merely enhance the existing /dev/kmsg
interface and may also be useful for kernel logging purposes some
time in the future.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics