2013-08-07 07:37:35

by Rui Xiang

Subject: [PATCH v3 00/11] Add namespace support for syslog

This patchset introduces a system log namespace.

In a container scenario, all logs are exported to the
host's ring buffer, so logs that belong to a container
cannot be distinguished. In addition, some of our guests
are administered by other sysadmins, who should not have
access to this information.

A syslog namespace provides an independent log buffer
owned by a container. Logs that need to be displayed in
the container are stored in that buffer, and all others
are exported to the host.

The 1st version is at http://lwn.net/Articles/525728/.
In that version, syslog_namespace was added to nsproxy
and created through a new clone flag, CLONE_SYSLOG, when
cloning a process. There was some discussion of the 1st
version last November. The 2nd version
(http://lwn.net/Articles/561271/) took that advice into
account and referred to Serge's patch
(http://lwn.net/Articles/525629/).

In this patchset, patches 01-10 implement the mechanism
for syslog_ns. Iptables is a real use case for syslog ns,
so patch 11 uses the ns_printk interface to isolate its
logs.

A syslog namespace is tied to a user namespace. A new
user ns must be created before a new syslog ns, because
cloning a new user ns gives the user full capabilities
in that userns. The syslog namespace is then created
through a new command (11) to the __NR_syslog syscall,
using the new syslog action SYSLOG_ACTION_NEW_NS.
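
From user space, the creation sequence described above
might look like the following sketch. It assumes a kernel
with this patchset applied: SYSLOG_ACTION_NEW_NS and its
value 11 are taken from this cover letter and are not
defined by mainline headers, so on an unpatched kernel the
second step is expected to fail. The helper name
create_syslog_ns() is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/sched.h>	/* CLONE_NEWUSER */

/* Proposed by this patchset only; not defined by mainline headers. */
#ifndef SYSLOG_ACTION_NEW_NS
#define SYSLOG_ACTION_NEW_NS 11
#endif

/*
 * Hypothetical helper: enter a new user namespace, then ask the
 * kernel for a new syslog namespace.
 * Returns 0 on success or a negative errno value on failure.
 */
static int create_syslog_ns(void)
{
	/* Step 1: a new user ns must exist before the syslog ns. */
	if (syscall(SYS_unshare, CLONE_NEWUSER) == -1)
		return -errno;

	/* Step 2: command 11 to the syslog syscall. */
	if (syscall(SYS_syslog, SYSLOG_ACTION_NEW_NS, NULL, 0) == -1)
		return -errno;

	return 0;
}
```

A caller would normally run this in the container's init
process before any log readers are started, and treat a
negative return value as "no private syslog buffer".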

In syslog_namespace, the identifiers needed for handling
the syslog buffer are containerized. When a container
creates a new syslog ns, a separate buffer is allocated
to store the logs owned by this container.

A new interface, ns_printk, is added to print the logs
that we want to see in the container. Through ns_printk,
we can get more logs related to a specific net ns, for
instance iptables. Here we use it to report iptables
logs per container. The default printk, targeted at
init_syslog_ns, continues to print most kernel logs to
the host.
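
The split between printk and ns_printk can be pictured
with a small user-space model. This is only a sketch: the
model_* names are hypothetical, and a flat text buffer
stands in for the kernel's record ring. It keeps two
behaviours from the series: printk always targets the
initial namespace, and ns_printk falls back to the
caller's namespace when ns is NULL.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* User-space stand-in for struct syslog_namespace (illustrative only). */
struct model_syslog_ns {
	char buf[256];	/* flat text buffer, not the kernel's record ring */
	size_t len;	/* bytes used in buf, always < sizeof(buf) */
};

static struct model_syslog_ns model_init_ns;	/* the host's log */
static struct model_syslog_ns *model_current_ns = &model_init_ns;

/* Append a formatted message to one namespace's buffer. */
static int model_vprintk(struct model_syslog_ns *ns,
			 const char *fmt, va_list args)
{
	size_t room = sizeof(ns->buf) - ns->len;
	int n = vsnprintf(ns->buf + ns->len, room, fmt, args);

	if (n < 0)
		return n;
	if ((size_t)n >= room)		/* truncated: keep what fit */
		n = (int)(room - 1);
	ns->len += (size_t)n;
	return n;
}

/* Model of ns_printk(): a NULL ns means the caller's namespace. */
static int model_ns_printk(struct model_syslog_ns *ns, const char *fmt, ...)
{
	va_list args;
	int r;

	if (!ns)
		ns = model_current_ns;
	va_start(args, fmt);
	r = model_vprintk(ns, fmt, args);
	va_end(args);
	return r;
}

/* Model of printk(): always targets the initial (host) namespace. */
static int model_printk(const char *fmt, ...)
{
	va_list args;
	int r;

	va_start(args, fmt);
	r = model_vprintk(&model_init_ns, fmt, args);
	va_end(args);
	return r;
}
```

In this model a message sent with model_ns_printk() to a
container's namespace never appears in model_init_ns,
which is the isolation the iptables patch relies on.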

A task in a new syslog ns can affect only its own
container through "dmesg", "dmesg -c" and /dev/kmsg
actions. The read/write interfaces such as /dev/kmsg,
/proc/kmsg and the syslog syscall remain usable for
container users.

This patchset is based on Linus' Linux tree.

v1 --> v2:
-- Add syslog_ns to user namespace instead of nsproxy.
-- Create syslog_ns through a new command(11) to __NR_syslog
syscall instead of CLONE_SYSLOG.
-- Alter related interfaces and parameters.

v2 --> v3:
-- Add some changelogs to illustrate the purpose of syslog ns.
-- Add ns_printk_emit for namespace and make devkmsg_writev
per namespace.
-- Add ns_console_unlock for namespace.
-- Put user ns while freeing syslog_ns.
-- Use net instead of skb->dev->nd_net in sb_close.
-- Clean up.

Rui Xiang (11):
syslog_ns: add syslog_namespace and put/get_syslog_ns
syslog_ns: add syslog_ns into user_namespace
syslog_ns: add init syslog_ns for global syslog
syslog_ns: make syslog handling per namespace
syslog_ns: make permission check per user namespace
syslog_ns: use init syslog_ns for console action
syslog_ns: implement function for creating syslog ns
syslog_ns: implement ns_printk for specific syslog_ns
syslog_ns: implement ns_printk_emit for specific syslog_ns
syslog_ns: implement ns_console_unlock for specific syslog_ns
netfilter: use ns_printk in iptable context

fs/proc/kmsg.c | 17 +-
include/linux/console.h | 1 +
include/linux/printk.h | 11 +-
include/linux/syslog.h | 81 ++++-
include/linux/user_namespace.h | 2 +
include/net/netfilter/xt_log.h | 6 +-
kernel/printk.c | 784 +++++++++++++++++++++++++++--------------
kernel/sysctl.c | 3 +-
kernel/user.c | 3 +
kernel/user_namespace.c | 4 +
net/netfilter/xt_LOG.c | 4 +-
11 files changed, 636 insertions(+), 280 deletions(-)

--
1.8.2.2


2013-08-07 07:37:42

by Rui Xiang

Subject: [PATCH v3 06/11] syslog_ns: use init syslog_ns for console action

When the console actions SYSLOG_ACTION_CONSOLE_ON/OFF/LEVEL
are used in the syslog syscall, the related handling
should target the host through init_syslog_ns.
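
As a rough user-space model of this rule (the model_*
names are hypothetical; the action numbers are the ones
from include/linux/syslog.h), the namespace used for the
permission check is chosen like this:

```c
#include <stddef.h>

/* Action numbers as defined in include/linux/syslog.h. */
#define SYSLOG_ACTION_READ_ALL		3
#define SYSLOG_ACTION_CONSOLE_OFF	6
#define SYSLOG_ACTION_CONSOLE_ON	7
#define SYSLOG_ACTION_CONSOLE_LEVEL	8

/* Stand-in for struct syslog_namespace (illustrative only). */
struct model_syslog_ns {
	const char *name;
};

static struct model_syslog_ns model_init_ns = { "init" };

/*
 * Return the namespace that the permission check runs against.
 * Console actions are always checked against the initial namespace;
 * every other action keeps the caller's namespace.
 */
static struct model_syslog_ns *
model_permission_ns(int type, struct model_syslog_ns *ns)
{
	switch (type) {
	case SYSLOG_ACTION_CONSOLE_OFF:
	case SYSLOG_ACTION_CONSOLE_ON:
	case SYSLOG_ACTION_CONSOLE_LEVEL:
		return &model_init_ns;
	default:
		return ns;
	}
}
```

The effect is that a task holding CAP_SYSLOG only in its
own user ns cannot change the host console settings.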

Signed-off-by: Rui Xiang <[email protected]>
---
kernel/printk.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/printk.c b/kernel/printk.c
index ca951e7..bdb7ed4 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -373,6 +373,11 @@ static int check_syslog_permissions(int type, bool from_file,
if (from_file && type != SYSLOG_ACTION_OPEN)
return 0;

+ if (type == SYSLOG_ACTION_CONSOLE_OFF
+ || type == SYSLOG_ACTION_CONSOLE_ON
+ || type == SYSLOG_ACTION_CONSOLE_LEVEL)
+ ns = &init_syslog_ns;
+
if (syslog_action_restricted(type, ns)) {
if (ns_capable(ns->owner, CAP_SYSLOG))
return 0;
--
1.8.2.2

2013-08-07 07:37:49

by Rui Xiang

Subject: [PATCH v3 08/11] syslog_ns: implement ns_printk for specific syslog_ns

Add a new interface named ns_printk that takes a syslog
namespace parameter, ns. Logs that belong to a container
can be printed with ns_printk.

Signed-off-by: Rui Xiang <[email protected]>
---
include/linux/printk.h | 4 ++++
kernel/printk.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 29e3f85..bf83ad9 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -6,6 +6,7 @@
#include <linux/kern_levels.h>
#include <linux/linkage.h>

+struct syslog_namespace;
extern const char linux_banner[];
extern const char linux_proc_banner[];

@@ -123,6 +124,9 @@ asmlinkage int printk_emit(int facility, int level,
asmlinkage __printf(1, 2) __cold
int printk(const char *fmt, ...);

+asmlinkage __printf(2, 3) __cold
+int ns_printk(struct syslog_namespace *ns, const char *fmt, ...);
+
/*
* Special printk facility for scheduler use only, _DO_NOT_USE_ !
*/
diff --git a/kernel/printk.c b/kernel/printk.c
index a812a88..38e8869 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1548,9 +1548,10 @@ static size_t cont_print_text(char *text, size_t size)
return textlen;
}

-asmlinkage int vprintk_emit(int facility, int level,
- const char *dict, size_t dictlen,
- const char *fmt, va_list args)
+static int ns_vprintk_emit(int facility, int level,
+ const char *dict, size_t dictlen,
+ const char *fmt, va_list args,
+ struct syslog_namespace *ns)
{
static int recursion_bug;
static char textbuf[LOG_LINE_MAX];
@@ -1560,7 +1561,6 @@ asmlinkage int vprintk_emit(int facility, int level,
unsigned long flags;
int this_cpu;
int printed_len = 0;
- struct syslog_namespace *ns = &init_syslog_ns;

boot_delay_msec(level);
printk_delay();
@@ -1691,6 +1691,14 @@ out_restore_irqs:

return printed_len;
}
+
+asmlinkage int vprintk_emit(int facility, int level,
+ const char *dict, size_t dictlen,
+ const char *fmt, va_list args)
+{
+ return ns_vprintk_emit(facility, level, dict, dictlen, fmt, args,
+ &init_syslog_ns);
+}
EXPORT_SYMBOL(vprintk_emit);

asmlinkage int vprintk(const char *fmt, va_list args)
@@ -1756,6 +1764,43 @@ asmlinkage int printk(const char *fmt, ...)
}
EXPORT_SYMBOL(printk);

+/**
+ * ns_printk - print a kernel message in syslog_ns
+ * @ns: syslog namespace
+ * @fmt: format string
+ *
+ * This is ns_printk().
+ * It can be called from container context. The ns parameter
+ * identifies the target syslog namespace, because some logs
+ * are generated by a container rather than by the host.
+ *
+ * See the vsnprintf() documentation for format string extensions over C99.
+ **/
+asmlinkage int ns_printk(struct syslog_namespace *ns,
+ const char *fmt, ...)
+{
+ va_list args;
+ int r;
+
+ if (!ns)
+ ns = current_user_ns()->syslog_ns;
+
+#ifdef CONFIG_KGDB_KDB
+ if (unlikely(kdb_trap_printk)) {
+ va_start(args, fmt);
+ r = vkdb_printf(fmt, args);
+ va_end(args);
+ return r;
+ }
+#endif
+ va_start(args, fmt);
+ r = ns_vprintk_emit(0, -1, NULL, 0, fmt, args, ns);
+ va_end(args);
+
+ return r;
+}
+EXPORT_SYMBOL(ns_printk);
+
#else /* CONFIG_PRINTK */

#define LOG_LINE_MAX 0
--
1.8.2.2

2013-08-07 07:38:04

by Rui Xiang

Subject: [PATCH v3 05/11] syslog_ns: make permission check per user namespace

Use ns_capable() instead of capable() to check
capabilities in the user ns that owns the current
syslog ns.

Signed-off-by: Rui Xiang <[email protected]>
---
kernel/printk.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index e508ab2..ca951e7 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -374,13 +374,13 @@ static int check_syslog_permissions(int type, bool from_file,
return 0;

if (syslog_action_restricted(type, ns)) {
- if (capable(CAP_SYSLOG))
+ if (ns_capable(ns->owner, CAP_SYSLOG))
return 0;
/*
* For historical reasons, accept CAP_SYS_ADMIN too, with
* a warning.
*/
- if (capable(CAP_SYS_ADMIN)) {
+ if (ns_capable(ns->owner, CAP_SYS_ADMIN)) {
pr_warn_once("%s (%d): Attempt to access syslog with "
"CAP_SYS_ADMIN but no CAP_SYSLOG "
"(deprecated).\n",
--
1.8.2.2

2013-08-07 07:38:19

by Rui Xiang

Subject: [PATCH v3 04/11] syslog_ns: make syslog handling per namespace

This patch makes the syslog buffer and its related
fields per namespace.

The ns-> fields (log_buf, log_buf_len, logbuf_lock,
log_first_seq, and so on) are used instead of the
global ones to handle syslog.

Syslog interfaces such as /dev/kmsg, /proc/kmsg,
and the syslog syscall are all containerized for
container users.
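
How each interface finds its namespace can be sketched
in user space as follows. The file-based interfaces use
the credentials of whoever opened the file
(file->f_cred->user_ns->syslog_ns), while the syslog
syscall uses the caller's current user ns. The model_*
types are hypothetical stand-ins for the kernel
structures.

```c
#include <stddef.h>

/* Illustrative stand-ins for the kernel structures involved. */
struct model_syslog_ns {
	int id;
};

struct model_user_ns {
	struct model_syslog_ns *syslog_ns;
};

struct model_cred {
	struct model_user_ns *user_ns;
};

struct model_file {
	const struct model_cred *f_cred;	/* credentials of the opener */
};

/* /dev/kmsg and /proc/kmsg: namespace of whoever opened the file. */
static struct model_syslog_ns *
model_file_syslog_ns(const struct model_file *file)
{
	return file->f_cred->user_ns->syslog_ns;
}

/* syslog(2): namespace of the task making the call. */
static struct model_syslog_ns *
model_syscall_syslog_ns(const struct model_cred *current_cred)
{
	return current_cred->user_ns->syslog_ns;
}
```

One consequence in this model is that a file descriptor
opened inside a container keeps reading the container's
buffer even if it is later passed to a task on the host.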

Signed-off-by: Rui Xiang <[email protected]>
---
fs/proc/kmsg.c | 17 +-
include/linux/printk.h | 1 -
include/linux/syslog.h | 3 +-
kernel/printk.c | 513 +++++++++++++++++++++++++------------------------
kernel/sysctl.c | 3 +-
5 files changed, 273 insertions(+), 264 deletions(-)

diff --git a/fs/proc/kmsg.c b/fs/proc/kmsg.c
index bdfabda..cb98431 100644
--- a/fs/proc/kmsg.c
+++ b/fs/proc/kmsg.c
@@ -13,6 +13,8 @@
#include <linux/proc_fs.h>
#include <linux/fs.h>
#include <linux/syslog.h>
+#include <linux/cred.h>
+#include <linux/user_namespace.h>

#include <asm/uaccess.h>
#include <asm/io.h>
@@ -21,12 +23,14 @@ extern wait_queue_head_t log_wait;

static int kmsg_open(struct inode * inode, struct file * file)
{
- return do_syslog(SYSLOG_ACTION_OPEN, NULL, 0, SYSLOG_FROM_PROC);
+ return do_syslog(SYSLOG_ACTION_OPEN, NULL, 0, SYSLOG_FROM_PROC,
+ file->f_cred->user_ns->syslog_ns);
}

static int kmsg_release(struct inode * inode, struct file * file)
{
- (void) do_syslog(SYSLOG_ACTION_CLOSE, NULL, 0, SYSLOG_FROM_PROC);
+ (void) do_syslog(SYSLOG_ACTION_CLOSE, NULL, 0, SYSLOG_FROM_PROC,
+ file->f_cred->user_ns->syslog_ns);
return 0;
}

@@ -34,15 +38,18 @@ static ssize_t kmsg_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
if ((file->f_flags & O_NONBLOCK) &&
- !do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC))
+ !do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC,
+ file->f_cred->user_ns->syslog_ns))
return -EAGAIN;
- return do_syslog(SYSLOG_ACTION_READ, buf, count, SYSLOG_FROM_PROC);
+ return do_syslog(SYSLOG_ACTION_READ, buf, count, SYSLOG_FROM_PROC,
+ file->f_cred->user_ns->syslog_ns);
}

static unsigned int kmsg_poll(struct file *file, poll_table *wait)
{
poll_wait(file, &log_wait, wait);
- if (do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC))
+ if (do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC,
+ file->f_cred->user_ns->syslog_ns))
return POLLIN | POLLRDNORM;
return 0;
}
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 22c7052..29e3f85 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -139,7 +139,6 @@ extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);

extern int printk_delay_msec;
-extern int dmesg_restrict;
extern int kptr_restrict;

extern void wake_up_klogd(void);
diff --git a/include/linux/syslog.h b/include/linux/syslog.h
index 363bc56..fbf0cb6 100644
--- a/include/linux/syslog.h
+++ b/include/linux/syslog.h
@@ -120,7 +120,8 @@ static inline void put_syslog_ns(struct syslog_namespace *ns)
kref_put(&ns->kref, free_syslog_ns);
}

-int do_syslog(int type, char __user *buf, int count, bool from_file);
+int do_syslog(int type, char __user *buf, int count, bool from_file,
+ struct syslog_namespace *ns);

extern struct syslog_namespace init_syslog_ns;
#endif /* _LINUX_SYSLOG_H */
diff --git a/kernel/printk.c b/kernel/printk.c
index f288934..e508ab2 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -205,37 +205,10 @@ struct log {
u8 level:3; /* syslog level */
};

-/*
- * The logbuf_lock protects kmsg buffer, indices, counters. It is also
- * used in interesting ways to provide interlocking in console_unlock();
- */
-static DEFINE_RAW_SPINLOCK(logbuf_lock);
-
#ifdef CONFIG_PRINTK
DECLARE_WAIT_QUEUE_HEAD(log_wait);
-/* the next printk record to read by syslog(READ) or /proc/kmsg */
-static u64 syslog_seq;
-static u32 syslog_idx;
-static enum log_flags syslog_prev;
-static size_t syslog_partial;
-
-/* index and sequence number of the first record stored in the buffer */
-static u64 log_first_seq;
-static u32 log_first_idx;
-
-/* index and sequence number of the next record to store in the buffer */
-static u64 log_next_seq;
-static u32 log_next_idx;
-
-/* the next printk record to write to the console */
-static u64 console_seq;
-static u32 console_idx;
static enum log_flags console_prev;

-/* the next printk record to read after the last 'clear' command */
-static u64 clear_seq;
-static u32 clear_idx;
-
#define PREFIX_MAX 32
#define LOG_LINE_MAX 1024 - PREFIX_MAX

@@ -246,12 +219,8 @@ static u32 clear_idx;
#define LOG_ALIGN __alignof__(struct log)
#endif
#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
+/* this buf only for init_syslog_ns */
static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
-static char *log_buf = __log_buf;
-static u32 log_buf_len = __LOG_BUF_LEN;
-
-/* cpu currently holding logbuf_lock */
-static volatile unsigned int logbuf_cpu = UINT_MAX;

struct syslog_namespace init_syslog_ns = {
.kref = {
@@ -282,23 +251,23 @@ static char *log_dict(const struct log *msg)
}

/* get record by index; idx must point to valid msg */
-static struct log *log_from_idx(u32 idx)
+static struct log *log_from_idx(u32 idx, struct syslog_namespace *ns)
{
- struct log *msg = (struct log *)(log_buf + idx);
+ struct log *msg = (struct log *)(ns->log_buf + idx);

/*
* A length == 0 record is the end of buffer marker. Wrap around and
* read the message at the start of the buffer.
*/
if (!msg->len)
- return (struct log *)log_buf;
+ return (struct log *)ns->log_buf;
return msg;
}

/* get next record; idx must point to valid msg */
-static u32 log_next(u32 idx)
+static u32 log_next(u32 idx, struct syslog_namespace *ns)
{
- struct log *msg = (struct log *)(log_buf + idx);
+ struct log *msg = (struct log *)(ns->log_buf + idx);

/* length == 0 indicates the end of the buffer; wrap */
/*
@@ -307,7 +276,7 @@ static u32 log_next(u32 idx)
* return the one after that.
*/
if (!msg->len) {
- msg = (struct log *)log_buf;
+ msg = (struct log *)ns->log_buf;
return msg->len;
}
return idx + msg->len;
@@ -317,7 +286,8 @@ static u32 log_next(u32 idx)
static void log_store(int facility, int level,
enum log_flags flags, u64 ts_nsec,
const char *dict, u16 dict_len,
- const char *text, u16 text_len)
+ const char *text, u16 text_len,
+ struct syslog_namespace *ns)
{
struct log *msg;
u32 size, pad_len;
@@ -327,34 +297,40 @@ static void log_store(int facility, int level,
pad_len = (-size) & (LOG_ALIGN - 1);
size += pad_len;

- while (log_first_seq < log_next_seq) {
+ while (ns->log_first_seq < ns->log_next_seq) {
u32 free;

- if (log_next_idx > log_first_idx)
- free = max(log_buf_len - log_next_idx, log_first_idx);
+ if (ns->log_next_idx > ns->log_first_idx)
+ free = max(ns->log_buf_len -
+ ns->log_next_idx,
+ ns->log_first_idx);
else
- free = log_first_idx - log_next_idx;
+ free = ns->log_first_idx -
+ ns->log_next_idx;

if (free > size + sizeof(struct log))
break;

/* drop old messages until we have enough contiuous space */
- log_first_idx = log_next(log_first_idx);
- log_first_seq++;
+ ns->log_first_idx =
+ log_next(ns->log_first_idx, ns);
+ ns->log_first_seq++;
}

- if (log_next_idx + size + sizeof(struct log) >= log_buf_len) {
+ if (ns->log_next_idx + size + sizeof(struct log) >=
+ ns->log_buf_len) {
/*
* This message + an additional empty header does not fit
* at the end of the buffer. Add an empty header with len == 0
* to signify a wrap around.
*/
- memset(log_buf + log_next_idx, 0, sizeof(struct log));
- log_next_idx = 0;
+ memset(ns->log_buf + ns->log_next_idx,
+ 0, sizeof(struct log));
+ ns->log_next_idx = 0;
}

/* fill message */
- msg = (struct log *)(log_buf + log_next_idx);
+ msg = (struct log *)(ns->log_buf + ns->log_next_idx);
memcpy(log_text(msg), text, text_len);
msg->text_len = text_len;
memcpy(log_dict(msg), dict, dict_len);
@@ -370,19 +346,14 @@ static void log_store(int facility, int level,
msg->len = sizeof(struct log) + text_len + dict_len + pad_len;

/* insert message */
- log_next_idx += msg->len;
- log_next_seq++;
+ ns->log_next_idx += msg->len;
+ ns->log_next_seq++;
}

-#ifdef CONFIG_SECURITY_DMESG_RESTRICT
-int dmesg_restrict = 1;
-#else
-int dmesg_restrict;
-#endif
-
-static int syslog_action_restricted(int type)
+static int syslog_action_restricted(int type,
+ struct syslog_namespace *ns)
{
- if (dmesg_restrict)
+ if (ns->dmesg_restrict)
return 1;
/*
* Unless restricted, we allow "read all" and "get buffer size"
@@ -392,7 +363,8 @@ static int syslog_action_restricted(int type)
type != SYSLOG_ACTION_SIZE_BUFFER;
}

-static int check_syslog_permissions(int type, bool from_file)
+static int check_syslog_permissions(int type, bool from_file,
+ struct syslog_namespace *ns)
{
/*
* If this is from /proc/kmsg and we've already opened it, then we've
@@ -401,7 +373,7 @@ static int check_syslog_permissions(int type, bool from_file)
if (from_file && type != SYSLOG_ACTION_OPEN)
return 0;

- if (syslog_action_restricted(type)) {
+ if (syslog_action_restricted(type, ns)) {
if (capable(CAP_SYSLOG))
return 0;
/*
@@ -496,6 +468,8 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
char cont = '-';
size_t len;
ssize_t ret;
+ struct syslog_namespace *ns =
+ file->f_cred->user_ns->syslog_ns;

if (!user)
return -EBADF;
@@ -503,32 +477,32 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
ret = mutex_lock_interruptible(&user->lock);
if (ret)
return ret;
- raw_spin_lock_irq(&logbuf_lock);
- while (user->seq == log_next_seq) {
+ raw_spin_lock_irq(&ns->logbuf_lock);
+ while (user->seq == ns->log_next_seq) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);
goto out;
}

- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);
ret = wait_event_interruptible(log_wait,
- user->seq != log_next_seq);
+ user->seq != ns->log_next_seq);
if (ret)
goto out;
- raw_spin_lock_irq(&logbuf_lock);
+ raw_spin_lock_irq(&ns->logbuf_lock);
}

- if (user->seq < log_first_seq) {
+ if (user->seq < ns->log_first_seq) {
/* our last seen message is gone, return error and reset */
- user->idx = log_first_idx;
- user->seq = log_first_seq;
+ user->idx = ns->log_first_idx;
+ user->seq = ns->log_first_seq;
ret = -EPIPE;
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);
goto out;
}

- msg = log_from_idx(user->idx);
+ msg = log_from_idx(user->idx, ns);
ts_usec = msg->ts_nsec;
do_div(ts_usec, 1000);

@@ -589,9 +563,9 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
user->buf[len++] = '\n';
}

- user->idx = log_next(user->idx);
+ user->idx = log_next(user->idx, ns);
user->seq++;
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);

if (len > count) {
ret = -EINVAL;
@@ -612,18 +586,19 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
{
struct devkmsg_user *user = file->private_data;
loff_t ret = 0;
+ struct syslog_namespace *ns = file->f_cred->user_ns->syslog_ns;

if (!user)
return -EBADF;
if (offset)
return -ESPIPE;

- raw_spin_lock_irq(&logbuf_lock);
+ raw_spin_lock_irq(&ns->logbuf_lock);
switch (whence) {
case SEEK_SET:
/* the first record */
- user->idx = log_first_idx;
- user->seq = log_first_seq;
+ user->idx = ns->log_first_idx;
+ user->seq = ns->log_first_seq;
break;
case SEEK_DATA:
/*
@@ -631,18 +606,18 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
* like issued by 'dmesg -c'. Reading /dev/kmsg itself
* changes no global state, and does not clear anything.
*/
- user->idx = clear_idx;
- user->seq = clear_seq;
+ user->idx = ns->clear_idx;
+ user->seq = ns->clear_seq;
break;
case SEEK_END:
/* after the last record */
- user->idx = log_next_idx;
- user->seq = log_next_seq;
+ user->idx = ns->log_next_idx;
+ user->seq = ns->log_next_seq;
break;
default:
ret = -EINVAL;
}
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);
return ret;
}

@@ -650,21 +625,22 @@ static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
{
struct devkmsg_user *user = file->private_data;
int ret = 0;
+ struct syslog_namespace *ns = file->f_cred->user_ns->syslog_ns;

if (!user)
return POLLERR|POLLNVAL;

poll_wait(file, &log_wait, wait);

- raw_spin_lock_irq(&logbuf_lock);
- if (user->seq < log_next_seq) {
+ raw_spin_lock_irq(&ns->logbuf_lock);
+ if (user->seq < ns->log_next_seq) {
/* return error when data has vanished underneath us */
- if (user->seq < log_first_seq)
+ if (user->seq < ns->log_first_seq)
ret = POLLIN|POLLRDNORM|POLLERR|POLLPRI;
else
ret = POLLIN|POLLRDNORM;
}
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);

return ret;
}
@@ -673,13 +649,14 @@ static int devkmsg_open(struct inode *inode, struct file *file)
{
struct devkmsg_user *user;
int err;
+ struct syslog_namespace *ns = file->f_cred->user_ns->syslog_ns;

/* write-only does not need any file context */
if ((file->f_flags & O_ACCMODE) == O_WRONLY)
return 0;

err = check_syslog_permissions(SYSLOG_ACTION_READ_ALL,
- SYSLOG_FROM_READER);
+ SYSLOG_FROM_READER, ns);
if (err)
return err;

@@ -689,10 +666,10 @@ static int devkmsg_open(struct inode *inode, struct file *file)

mutex_init(&user->lock);

- raw_spin_lock_irq(&logbuf_lock);
- user->idx = log_first_idx;
- user->seq = log_first_seq;
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_lock_irq(&ns->logbuf_lock);
+ user->idx = ns->log_first_idx;
+ user->seq = ns->log_first_seq;
+ raw_spin_unlock_irq(&ns->logbuf_lock);

file->private_data = user;
return 0;
@@ -730,10 +707,11 @@ const struct file_operations kmsg_fops = {
*/
void log_buf_kexec_setup(void)
{
- VMCOREINFO_SYMBOL(log_buf);
- VMCOREINFO_SYMBOL(log_buf_len);
- VMCOREINFO_SYMBOL(log_first_idx);
- VMCOREINFO_SYMBOL(log_next_idx);
+ struct syslog_namespace *ns = &init_syslog_ns;
+ VMCOREINFO_SYMBOL(ns->log_buf);
+ VMCOREINFO_SYMBOL(ns->log_buf_len);
+ VMCOREINFO_SYMBOL(ns->log_first_idx);
+ VMCOREINFO_SYMBOL(ns->log_next_idx);
/*
* Export struct log size and field offsets. User space tools can
* parse it and detect any changes to structure down the line.
@@ -753,10 +731,11 @@ static unsigned long __initdata new_log_buf_len;
static int __init log_buf_len_setup(char *str)
{
unsigned size = memparse(str, &str);
+ struct syslog_namespace *ns = &init_syslog_ns;

if (size)
size = roundup_pow_of_two(size);
- if (size > log_buf_len)
+ if (size > ns->log_buf_len)
new_log_buf_len = size;

return 0;
@@ -768,6 +747,7 @@ void __init setup_log_buf(int early)
unsigned long flags;
char *new_log_buf;
int free;
+ struct syslog_namespace *ns = &init_syslog_ns;

if (!new_log_buf_len)
return;
@@ -789,15 +769,15 @@ void __init setup_log_buf(int early)
return;
}

- raw_spin_lock_irqsave(&logbuf_lock, flags);
- log_buf_len = new_log_buf_len;
- log_buf = new_log_buf;
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
+ memcpy(new_log_buf, ns->log_buf, __LOG_BUF_LEN);
+ ns->log_buf_len = new_log_buf_len;
+ ns->log_buf = new_log_buf;
new_log_buf_len = 0;
- free = __LOG_BUF_LEN - log_next_idx;
- memcpy(log_buf, __log_buf, __LOG_BUF_LEN);
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ free = __LOG_BUF_LEN - ns->log_next_idx;
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);

- pr_info("log_buf_len: %d\n", log_buf_len);
+ pr_info("log_buf_len: %d\n", ns->log_buf_len);
pr_info("early log buf free: %d(%d%%)\n",
free, (free * 100) / __LOG_BUF_LEN);
}
@@ -977,7 +957,8 @@ static size_t msg_print_text(const struct log *msg, enum log_flags prev,
return len;
}

-static int syslog_print(char __user *buf, int size)
+static int syslog_print(char __user *buf, int size,
+ struct syslog_namespace *ns)
{
char *text;
struct log *msg;
@@ -991,37 +972,38 @@ static int syslog_print(char __user *buf, int size)
size_t n;
size_t skip;

- raw_spin_lock_irq(&logbuf_lock);
- if (syslog_seq < log_first_seq) {
+ raw_spin_lock_irq(&ns->logbuf_lock);
+ if (ns->syslog_seq < ns->log_first_seq) {
/* messages are gone, move to first one */
- syslog_seq = log_first_seq;
- syslog_idx = log_first_idx;
- syslog_prev = 0;
- syslog_partial = 0;
+ ns->syslog_seq = ns->log_first_seq;
+ ns->syslog_idx = ns->log_first_idx;
+ ns->syslog_prev = 0;
+ ns->syslog_partial = 0;
}
- if (syslog_seq == log_next_seq) {
- raw_spin_unlock_irq(&logbuf_lock);
+ if (ns->syslog_seq == ns->log_next_seq) {
+ raw_spin_unlock_irq(&ns->logbuf_lock);
break;
}

- skip = syslog_partial;
- msg = log_from_idx(syslog_idx);
- n = msg_print_text(msg, syslog_prev, true, text,
+ skip = ns->syslog_partial;
+ msg = log_from_idx(ns->syslog_idx, ns);
+ n = msg_print_text(msg, ns->syslog_prev, true, text,
LOG_LINE_MAX + PREFIX_MAX);
- if (n - syslog_partial <= size) {
+ if (n - ns->syslog_partial <= size) {
/* message fits into buffer, move forward */
- syslog_idx = log_next(syslog_idx);
- syslog_seq++;
- syslog_prev = msg->flags;
- n -= syslog_partial;
- syslog_partial = 0;
+ ns->syslog_idx =
+ log_next(ns->syslog_idx, ns);
+ ns->syslog_seq++;
+ ns->syslog_prev = msg->flags;
+ n -= ns->syslog_partial;
+ ns->syslog_partial = 0;
} else if (!len){
/* partial read(), remember position */
n = size;
- syslog_partial += n;
+ ns->syslog_partial += n;
} else
n = 0;
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);

if (!n)
break;
@@ -1041,7 +1023,8 @@ static int syslog_print(char __user *buf, int size)
return len;
}

-static int syslog_print_all(char __user *buf, int size, bool clear)
+static int syslog_print_all(char __user *buf, int size, bool clear,
+ struct syslog_namespace *ns)
{
char *text;
int len = 0;
@@ -1050,55 +1033,55 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
if (!text)
return -ENOMEM;

- raw_spin_lock_irq(&logbuf_lock);
+ raw_spin_lock_irq(&ns->logbuf_lock);
if (buf) {
u64 next_seq;
u64 seq;
u32 idx;
enum log_flags prev;

- if (clear_seq < log_first_seq) {
+ if (ns->clear_seq < ns->log_first_seq) {
/* messages are gone, move to first available one */
- clear_seq = log_first_seq;
- clear_idx = log_first_idx;
+ ns->clear_seq = ns->log_first_seq;
+ ns->clear_idx = ns->log_first_idx;
}

/*
* Find first record that fits, including all following records,
* into the user-provided buffer for this dump.
*/
- seq = clear_seq;
- idx = clear_idx;
+ seq = ns->clear_seq;
+ idx = ns->clear_idx;
prev = 0;
- while (seq < log_next_seq) {
- struct log *msg = log_from_idx(idx);
+ while (seq < ns->log_next_seq) {
+ struct log *msg = log_from_idx(idx, ns);

len += msg_print_text(msg, prev, true, NULL, 0);
prev = msg->flags;
- idx = log_next(idx);
+ idx = log_next(idx, ns);
seq++;
}

/* move first record forward until length fits into the buffer */
- seq = clear_seq;
- idx = clear_idx;
+ seq = ns->clear_seq;
+ idx = ns->clear_idx;
prev = 0;
- while (len > size && seq < log_next_seq) {
- struct log *msg = log_from_idx(idx);
+ while (len > size && seq < ns->log_next_seq) {
+ struct log *msg = log_from_idx(idx, ns);

len -= msg_print_text(msg, prev, true, NULL, 0);
prev = msg->flags;
- idx = log_next(idx);
+ idx = log_next(idx, ns);
seq++;
}

/* last message fitting into this dump */
- next_seq = log_next_seq;
+ next_seq = ns->log_next_seq;

len = 0;
prev = 0;
while (len >= 0 && seq < next_seq) {
- struct log *msg = log_from_idx(idx);
+ struct log *msg = log_from_idx(idx, ns);
int textlen;

textlen = msg_print_text(msg, prev, true, text,
@@ -1107,43 +1090,44 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
len = textlen;
break;
}
- idx = log_next(idx);
+ idx = log_next(idx, ns);
seq++;
prev = msg->flags;

- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);
if (copy_to_user(buf + len, text, textlen))
len = -EFAULT;
else
len += textlen;
- raw_spin_lock_irq(&logbuf_lock);
+ raw_spin_lock_irq(&ns->logbuf_lock);

- if (seq < log_first_seq) {
+ if (seq < ns->log_first_seq) {
/* messages are gone, move to next one */
- seq = log_first_seq;
- idx = log_first_idx;
+ seq = ns->log_first_seq;
+ idx = ns->log_first_idx;
prev = 0;
}
}
}

if (clear) {
- clear_seq = log_next_seq;
- clear_idx = log_next_idx;
+ ns->clear_seq = ns->log_next_seq;
+ ns->clear_idx = ns->log_next_idx;
}
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);

kfree(text);
return len;
}

-int do_syslog(int type, char __user *buf, int len, bool from_file)
+int do_syslog(int type, char __user *buf, int len, bool from_file,
+ struct syslog_namespace *ns)
{
bool clear = false;
static int saved_console_loglevel = -1;
int error;

- error = check_syslog_permissions(type, from_file);
+ error = check_syslog_permissions(type, from_file, ns);
if (error)
goto out;

@@ -1168,10 +1152,10 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
goto out;
}
error = wait_event_interruptible(log_wait,
- syslog_seq != log_next_seq);
+ ns->syslog_seq != ns->log_next_seq);
if (error)
goto out;
- error = syslog_print(buf, len);
+ error = syslog_print(buf, len, ns);
break;
/* Read/clear last kernel messages */
case SYSLOG_ACTION_READ_CLEAR:
@@ -1189,11 +1173,11 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
error = -EFAULT;
goto out;
}
- error = syslog_print_all(buf, len, clear);
+ error = syslog_print_all(buf, len, clear, ns);
break;
/* Clear ring buffer */
case SYSLOG_ACTION_CLEAR:
- syslog_print_all(NULL, 0, true);
+ syslog_print_all(NULL, 0, true, ns);
break;
/* Disable logging to console */
case SYSLOG_ACTION_CONSOLE_OFF:
@@ -1222,13 +1206,13 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
break;
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD:
- raw_spin_lock_irq(&logbuf_lock);
- if (syslog_seq < log_first_seq) {
+ raw_spin_lock_irq(&ns->logbuf_lock);
+ if (ns->syslog_seq < ns->log_first_seq) {
/* messages are gone, move to first one */
- syslog_seq = log_first_seq;
- syslog_idx = log_first_idx;
- syslog_prev = 0;
- syslog_partial = 0;
+ ns->syslog_seq = ns->log_first_seq;
+ ns->syslog_idx = ns->log_first_idx;
+ ns->syslog_prev = 0;
+ ns->syslog_partial = 0;
}
if (from_file) {
/*
@@ -1236,28 +1220,28 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
* for pending data, not the size; return the count of
* records, not the length.
*/
- error = log_next_idx - syslog_idx;
+ error = ns->log_next_idx - ns->syslog_idx;
} else {
- u64 seq = syslog_seq;
- u32 idx = syslog_idx;
- enum log_flags prev = syslog_prev;
+ u64 seq = ns->syslog_seq;
+ u32 idx = ns->syslog_idx;
+ enum log_flags prev = ns->syslog_prev;

error = 0;
- while (seq < log_next_seq) {
- struct log *msg = log_from_idx(idx);
+ while (seq < ns->log_next_seq) {
+ struct log *msg = log_from_idx(idx, ns);

error += msg_print_text(msg, prev, true, NULL, 0);
- idx = log_next(idx);
+ idx = log_next(idx, ns);
seq++;
prev = msg->flags;
}
- error -= syslog_partial;
+ error -= ns->syslog_partial;
}
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);
break;
/* Size of the log buffer */
case SYSLOG_ACTION_SIZE_BUFFER:
- error = log_buf_len;
+ error = ns->log_buf_len;
break;
default:
error = -EINVAL;
@@ -1269,7 +1253,8 @@ out:

SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
{
- return do_syslog(type, buf, len, SYSLOG_FROM_READER);
+ return do_syslog(type, buf, len, SYSLOG_FROM_READER,
+ current_user_ns()->syslog_ns);
}

/*
@@ -1307,7 +1292,7 @@ static void call_console_drivers(int level, const char *text, size_t len)
* every 10 seconds, to leave time for slow consoles to print a
* full oops.
*/
-static void zap_locks(void)
+static void zap_locks(struct syslog_namespace *ns)
{
static unsigned long oops_timestamp;

@@ -1319,7 +1304,7 @@ static void zap_locks(void)

debug_locks_off();
/* If a crash is occurring, make sure we can't deadlock */
- raw_spin_lock_init(&logbuf_lock);
+ raw_spin_lock_init(&ns->logbuf_lock);
/* And make sure that we print immediately */
sema_init(&console_sem, 1);
}
@@ -1359,8 +1344,9 @@ static inline int can_use_console(unsigned int cpu)
* interrupts disabled. It should return with 'lockbuf_lock'
* released but interrupts still disabled.
*/
-static int console_trylock_for_printk(unsigned int cpu)
- __releases(&logbuf_lock)
+static int console_trylock_for_printk(unsigned int cpu,
+ struct syslog_namespace *ns)
+ __releases(&ns->logbuf_lock)
{
int retval = 0, wake = 0;

@@ -1379,8 +1365,8 @@ static int console_trylock_for_printk(unsigned int cpu)
retval = 0;
}
}
- logbuf_cpu = UINT_MAX;
- raw_spin_unlock(&logbuf_lock);
+ ns->logbuf_cpu = UINT_MAX;
+ raw_spin_unlock(&ns->logbuf_lock);
if (wake)
up(&console_sem);
return retval;
@@ -1418,7 +1404,7 @@ static struct cont {
bool flushed:1; /* buffer sealed and committed */
} cont;

-static void cont_flush(enum log_flags flags)
+static void cont_flush(enum log_flags flags, struct syslog_namespace *ns)
{
if (cont.flushed)
return;
@@ -1432,7 +1418,7 @@ static void cont_flush(enum log_flags flags)
* line. LOG_NOCONS suppresses a duplicated output.
*/
log_store(cont.facility, cont.level, flags | LOG_NOCONS,
- cont.ts_nsec, NULL, 0, cont.buf, cont.len);
+ cont.ts_nsec, NULL, 0, cont.buf, cont.len, ns);
cont.flags = flags;
cont.flushed = true;
} else {
@@ -1441,19 +1427,20 @@ static void cont_flush(enum log_flags flags)
* just submit it to the store and free the buffer.
*/
log_store(cont.facility, cont.level, flags, 0,
- NULL, 0, cont.buf, cont.len);
+ NULL, 0, cont.buf, cont.len, ns);
cont.len = 0;
}
}

-static bool cont_add(int facility, int level, const char *text, size_t len)
+static bool cont_add(int facility, int level, const char *text, size_t len,
+ struct syslog_namespace *ns)
{
if (cont.len && cont.flushed)
return false;

if (cont.len + len > sizeof(cont.buf)) {
/* the line gets too long, split it up in separate records */
- cont_flush(LOG_CONT);
+ cont_flush(LOG_CONT, ns);
return false;
}

@@ -1471,7 +1458,7 @@ static bool cont_add(int facility, int level, const char *text, size_t len)
cont.len += len;

if (cont.len > (sizeof(cont.buf) * 80) / 100)
- cont_flush(LOG_CONT);
+ cont_flush(LOG_CONT, ns);

return true;
}
@@ -1516,6 +1503,7 @@ asmlinkage int vprintk_emit(int facility, int level,
unsigned long flags;
int this_cpu;
int printed_len = 0;
+ struct syslog_namespace *ns = &init_syslog_ns;

boot_delay_msec(level);
printk_delay();
@@ -1527,7 +1515,7 @@ asmlinkage int vprintk_emit(int facility, int level,
/*
* Ouch, printk recursed into itself!
*/
- if (unlikely(logbuf_cpu == this_cpu)) {
+ if (unlikely(ns->logbuf_cpu == this_cpu)) {
/*
* If a crash is occurring during printk() on this CPU,
* then try to get the crash message out but make sure
@@ -1539,12 +1527,12 @@ asmlinkage int vprintk_emit(int facility, int level,
recursion_bug = 1;
goto out_restore_irqs;
}
- zap_locks();
+ zap_locks(ns);
}

lockdep_off();
- raw_spin_lock(&logbuf_lock);
- logbuf_cpu = this_cpu;
+ raw_spin_lock(&ns->logbuf_lock);
+ ns->logbuf_cpu = this_cpu;

if (recursion_bug) {
static const char recursion_msg[] =
@@ -1554,7 +1542,7 @@ asmlinkage int vprintk_emit(int facility, int level,
printed_len += strlen(recursion_msg);
/* emit KERN_CRIT message */
log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
- NULL, 0, recursion_msg, printed_len);
+ NULL, 0, recursion_msg, printed_len, ns);
}

/*
@@ -1601,12 +1589,12 @@ asmlinkage int vprintk_emit(int facility, int level,
* or another task also prints continuation lines.
*/
if (cont.len && (lflags & LOG_PREFIX || cont.owner != current))
- cont_flush(LOG_NEWLINE);
+ cont_flush(LOG_NEWLINE, ns);

/* buffer line if possible, otherwise store it right away */
- if (!cont_add(facility, level, text, text_len))
+ if (!cont_add(facility, level, text, text_len, ns))
log_store(facility, level, lflags | LOG_CONT, 0,
- dict, dictlen, text, text_len);
+ dict, dictlen, text, text_len, ns);
} else {
bool stored = false;

@@ -1618,13 +1606,14 @@ asmlinkage int vprintk_emit(int facility, int level,
*/
if (cont.len && cont.owner == current) {
if (!(lflags & LOG_PREFIX))
- stored = cont_add(facility, level, text, text_len);
- cont_flush(LOG_NEWLINE);
+ stored = cont_add(facility, level, text,
+ text_len, ns);
+ cont_flush(LOG_NEWLINE, ns);
}

if (!stored)
log_store(facility, level, lflags, 0,
- dict, dictlen, text, text_len);
+ dict, dictlen, text, text_len, ns);
}
printed_len += text_len;

@@ -1636,7 +1625,7 @@ asmlinkage int vprintk_emit(int facility, int level,
* The console_trylock_for_printk() function will release 'logbuf_lock'
* regardless of whether it actually gets the console semaphore or not.
*/
- if (console_trylock_for_printk(this_cpu))
+ if (console_trylock_for_printk(this_cpu, ns))
console_unlock();

lockdep_on();
@@ -1995,12 +1984,13 @@ int is_console_locked(void)
return console_locked;
}

-static void console_cont_flush(char *text, size_t size)
+static void console_cont_flush(char *text, size_t size,
+ struct syslog_namespace *ns)
{
unsigned long flags;
size_t len;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);

if (!cont.len)
goto out;
@@ -2010,18 +2000,18 @@ static void console_cont_flush(char *text, size_t size)
* busy. The earlier ones need to be printed before this one, we
* did not flush any fragment so far, so just let it queue up.
*/
- if (console_seq < log_next_seq && !cont.cons)
+ if (ns->console_seq < ns->log_next_seq && !cont.cons)
goto out;

len = cont_print_text(text, size);
- raw_spin_unlock(&logbuf_lock);
+ raw_spin_unlock(&ns->logbuf_lock);
stop_critical_timings();
call_console_drivers(cont.level, text, len);
start_critical_timings();
local_irq_restore(flags);
return;
out:
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
}

/**
@@ -2045,6 +2035,7 @@ void console_unlock(void)
unsigned long flags;
bool wake_klogd = false;
bool retry;
+ struct syslog_namespace *ns = &init_syslog_ns;

if (console_suspended) {
up(&console_sem);
@@ -2054,37 +2045,38 @@ void console_unlock(void)
console_may_schedule = 0;

/* flush buffered message fragment immediately to console */
- console_cont_flush(text, sizeof(text));
+ console_cont_flush(text, sizeof(text), ns);
again:
for (;;) {
struct log *msg;
size_t len;
int level;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
- if (seen_seq != log_next_seq) {
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
+ if (seen_seq != ns->log_next_seq) {
wake_klogd = true;
- seen_seq = log_next_seq;
+ seen_seq = ns->log_next_seq;
}

- if (console_seq < log_first_seq) {
+ if (ns->console_seq < ns->log_first_seq) {
/* messages are gone, move to first one */
- console_seq = log_first_seq;
- console_idx = log_first_idx;
+ ns->console_seq = ns->log_first_seq;
+ ns->console_idx = ns->log_first_idx;
console_prev = 0;
}
skip:
- if (console_seq == log_next_seq)
+ if (ns->console_seq == ns->log_next_seq)
break;

- msg = log_from_idx(console_idx);
+ msg = log_from_idx(ns->console_idx, ns);
if (msg->flags & LOG_NOCONS) {
/*
* Skip record we have buffered and already printed
* directly to the console when we received it.
*/
- console_idx = log_next(console_idx);
- console_seq++;
+ ns->console_idx =
+ log_next(ns->console_idx, ns);
+ ns->console_seq++;
/*
* We will get here again when we register a new
* CON_PRINTBUFFER console. Clear the flag so we
@@ -2098,10 +2090,11 @@ skip:
level = msg->level;
len = msg_print_text(msg, console_prev, false,
text, sizeof(text));
- console_idx = log_next(console_idx);
- console_seq++;
+ ns->console_idx =
+ log_next(ns->console_idx, ns);
+ ns->console_seq++;
console_prev = msg->flags;
- raw_spin_unlock(&logbuf_lock);
+ raw_spin_unlock(&ns->logbuf_lock);

stop_critical_timings(); /* don't trace print latency */
call_console_drivers(level, text, len);
@@ -2115,7 +2108,7 @@ skip:
if (unlikely(exclusive_console))
exclusive_console = NULL;

- raw_spin_unlock(&logbuf_lock);
+ raw_spin_unlock(&ns->logbuf_lock);

up(&console_sem);

@@ -2125,9 +2118,9 @@ skip:
* there's a new owner and the console_unlock() from them will do the
* flush, no worries.
*/
- raw_spin_lock(&logbuf_lock);
- retry = console_seq != log_next_seq;
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_lock(&ns->logbuf_lock);
+ retry = ns->console_seq != ns->log_next_seq;
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);

if (retry && console_trylock())
goto again;
@@ -2252,6 +2245,7 @@ void register_console(struct console *newcon)
int i;
unsigned long flags;
struct console *bcon = NULL;
+ struct syslog_namespace *ns = &init_syslog_ns;

/*
* before we register a new CON_BOOT console, make sure we don't
@@ -2361,11 +2355,11 @@ void register_console(struct console *newcon)
* console_unlock(); will print out the buffered messages
* for us.
*/
- raw_spin_lock_irqsave(&logbuf_lock, flags);
- console_seq = syslog_seq;
- console_idx = syslog_idx;
- console_prev = syslog_prev;
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
+ ns->console_seq = ns->syslog_seq;
+ ns->console_idx = ns->syslog_idx;
+ console_prev = ns->syslog_prev;
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
/*
* We're about to replay the log buffer. Only do this to the
* just-registered console to avoid excessive message spam to
@@ -2627,6 +2621,7 @@ void kmsg_dump(enum kmsg_dump_reason reason)
{
struct kmsg_dumper *dumper;
unsigned long flags;
+ struct syslog_namespace *ns = &init_syslog_ns;

if ((reason > KMSG_DUMP_OOPS) && !always_kmsg_dump)
return;
@@ -2639,12 +2634,12 @@ void kmsg_dump(enum kmsg_dump_reason reason)
/* initialize iterator with data about the stored records */
dumper->active = true;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
- dumper->cur_seq = clear_seq;
- dumper->cur_idx = clear_idx;
- dumper->next_seq = log_next_seq;
- dumper->next_idx = log_next_idx;
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
+ dumper->cur_seq = ns->clear_seq;
+ dumper->cur_idx = ns->clear_idx;
+ dumper->next_seq = ns->log_next_seq;
+ dumper->next_idx = ns->log_next_idx;
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);

/* invoke dumper which will iterate over records */
dumper->dump(dumper, reason);
@@ -2680,24 +2675,25 @@ bool kmsg_dump_get_line_nolock(struct kmsg_dumper *dumper, bool syslog,
struct log *msg;
size_t l = 0;
bool ret = false;
+ struct syslog_namespace *ns = &init_syslog_ns;

if (!dumper->active)
goto out;

- if (dumper->cur_seq < log_first_seq) {
+ if (dumper->cur_seq < ns->log_first_seq) {
/* messages are gone, move to first available one */
- dumper->cur_seq = log_first_seq;
- dumper->cur_idx = log_first_idx;
+ dumper->cur_seq = ns->log_first_seq;
+ dumper->cur_idx = ns->log_first_idx;
}

/* last entry */
- if (dumper->cur_seq >= log_next_seq)
+ if (dumper->cur_seq >= ns->log_next_seq)
goto out;

- msg = log_from_idx(dumper->cur_idx);
+ msg = log_from_idx(dumper->cur_idx, ns);
l = msg_print_text(msg, 0, syslog, line, size);

- dumper->cur_idx = log_next(dumper->cur_idx);
+ dumper->cur_idx = log_next(dumper->cur_idx, ns);
dumper->cur_seq++;
ret = true;
out:
@@ -2728,10 +2724,11 @@ bool kmsg_dump_get_line(struct kmsg_dumper *dumper, bool syslog,
{
unsigned long flags;
bool ret;
+ struct syslog_namespace *ns = &init_syslog_ns;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
ret = kmsg_dump_get_line_nolock(dumper, syslog, line, size, len);
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);

return ret;
}
@@ -2767,20 +2764,21 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
enum log_flags prev;
size_t l = 0;
bool ret = false;
+ struct syslog_namespace *ns = &init_syslog_ns;

if (!dumper->active)
goto out;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
- if (dumper->cur_seq < log_first_seq) {
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
+ if (dumper->cur_seq < ns->log_first_seq) {
/* messages are gone, move to first available one */
- dumper->cur_seq = log_first_seq;
- dumper->cur_idx = log_first_idx;
+ dumper->cur_seq = ns->log_first_seq;
+ dumper->cur_idx = ns->log_first_idx;
}

/* last entry */
if (dumper->cur_seq >= dumper->next_seq) {
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
goto out;
}

@@ -2789,10 +2787,10 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
idx = dumper->cur_idx;
prev = 0;
while (seq < dumper->next_seq) {
- struct log *msg = log_from_idx(idx);
+ struct log *msg = log_from_idx(idx, ns);

l += msg_print_text(msg, prev, true, NULL, 0);
- idx = log_next(idx);
+ idx = log_next(idx, ns);
seq++;
prev = msg->flags;
}
@@ -2802,10 +2800,10 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
idx = dumper->cur_idx;
prev = 0;
while (l > size && seq < dumper->next_seq) {
- struct log *msg = log_from_idx(idx);
+ struct log *msg = log_from_idx(idx, ns);

l -= msg_print_text(msg, prev, true, NULL, 0);
- idx = log_next(idx);
+ idx = log_next(idx, ns);
seq++;
prev = msg->flags;
}
@@ -2817,10 +2815,10 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
l = 0;
prev = 0;
while (seq < dumper->next_seq) {
- struct log *msg = log_from_idx(idx);
+ struct log *msg = log_from_idx(idx, ns);

l += msg_print_text(msg, prev, syslog, buf + l, size - l);
- idx = log_next(idx);
+ idx = log_next(idx, ns);
seq++;
prev = msg->flags;
}
@@ -2828,7 +2826,7 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
dumper->next_seq = next_seq;
dumper->next_idx = next_idx;
ret = true;
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
out:
if (len)
*len = l;
@@ -2848,10 +2846,12 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
*/
void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
{
- dumper->cur_seq = clear_seq;
- dumper->cur_idx = clear_idx;
- dumper->next_seq = log_next_seq;
- dumper->next_idx = log_next_idx;
+ struct syslog_namespace *ns = &init_syslog_ns;
+
+ dumper->cur_seq = ns->clear_seq;
+ dumper->cur_idx = ns->clear_idx;
+ dumper->next_seq = ns->log_next_seq;
+ dumper->next_idx = ns->log_next_idx;
}

/**
@@ -2865,10 +2865,11 @@ void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
void kmsg_dump_rewind(struct kmsg_dumper *dumper)
{
unsigned long flags;
+ struct syslog_namespace *ns = &init_syslog_ns;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
kmsg_dump_rewind_nolock(dumper);
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
}
EXPORT_SYMBOL_GPL(kmsg_dump_rewind);

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ac09d98..0954b09 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -62,6 +62,7 @@
#include <linux/capability.h>
#include <linux/binfmts.h>
#include <linux/sched/sysctl.h>
+#include <linux/syslog.h>

#include <asm/uaccess.h>
#include <asm/processor.h>
@@ -773,7 +774,7 @@ static struct ctl_table kern_table[] = {
},
{
.procname = "dmesg_restrict",
- .data = &dmesg_restrict,
+ .data = &init_syslog_ns.dmesg_restrict,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax_sysadmin,
--
1.8.2.2

2013-08-07 07:38:28

by Rui Xiang

Subject: [PATCH v3 10/11] syslog_ns: implement ns_console_unlock for specific syslog_ns

While monitoring embedded devices that provide access
to the console over a serial port, in order to obtain
kernel logs from containers, it is necessary to include
consoles in the syslog_ns.

This patch adds a new interface named ns_console_unlock,
which takes a syslog ns as a parameter and displays the
logs of that syslog namespace on consoles, not just the
logs of init_syslog_ns.
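
As a rough illustration of the per-namespace console cursor
described above, here is a user-space sketch. It is not the
kernel code: struct model_ns, MODEL_RECORDS, model_store and
model_console_unlock are hypothetical names, and locking, the
console semaphore and continuation lines are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_RECORDS 8   /* hypothetical ring capacity, in records */

/* Toy stand-in for struct syslog_namespace: only the console cursor. */
struct model_ns {
	const char *records[MODEL_RECORDS];
	uint64_t log_first_seq;  /* oldest record still stored */
	uint64_t log_next_seq;   /* sequence number of the next record to store */
	uint64_t console_seq;    /* next record to write to the console */
};

static void model_store(struct model_ns *ns, const char *text)
{
	assert(ns != NULL);
	ns->records[ns->log_next_seq % MODEL_RECORDS] = text;
	ns->log_next_seq++;
	/* Once the ring is full, the oldest record has been overwritten. */
	if (ns->log_next_seq - ns->log_first_seq > MODEL_RECORDS)
		ns->log_first_seq++;
}

/* Drain everything this namespace's console has not seen yet. */
static int model_console_unlock(struct model_ns *ns, FILE *console)
{
	int emitted = 0;

	/* Messages are gone, move to the first one still available. */
	if (ns->console_seq < ns->log_first_seq)
		ns->console_seq = ns->log_first_seq;

	while (ns->console_seq != ns->log_next_seq) {
		fprintf(console, "%s\n",
			ns->records[ns->console_seq % MODEL_RECORDS]);
		ns->console_seq++;
		emitted++;
	}
	return emitted;
}
```

Because each model_ns carries its own console_seq, draining
one namespace leaves the cursor of every other namespace
untouched, which is the property the patch relies on.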

Signed-off-by: Rui Xiang <[email protected]>
---
include/linux/console.h | 1 +
kernel/printk.c | 116 +++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 116 insertions(+), 1 deletion(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 7571a16..4c02fe6 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -148,6 +148,7 @@ extern struct console *console_drivers;
extern void console_lock(void);
extern int console_trylock(void);
extern void console_unlock(void);
+extern void ns_console_unlock(struct syslog_namespace *ns);
extern void console_conditional_schedule(void);
extern void console_unblank(void);
extern struct tty_driver *console_device(int *);
diff --git a/kernel/printk.c b/kernel/printk.c
index b60c1d4..39bb9db 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1684,7 +1684,7 @@ static int ns_vprintk_emit(int facility, int level,
* regardless of whether it actually gets the console semaphore or not.
*/
if (console_trylock_for_printk(this_cpu, ns))
- console_unlock();
+ ns_console_unlock(ns);

lockdep_on();
out_restore_irqs:
@@ -2135,6 +2135,120 @@ out:
}

/**
+ * ns_console_unlock - unlock the console system for syslog_namespace
+ *
+ * Releases the console_lock which the caller holds on the console system
+ * and the console driver list.
+ *
+ * While the console_lock was held, console output may have been buffered
+ * by printk(). If this is the case, ns_console_unlock(); emits
+ * the output prior to releasing the lock.
+ *
+ * If there is output waiting, we wake /dev/kmsg and syslog() users.
+ *
+ * ns_console_unlock(); may be called from any context.
+ */
+void ns_console_unlock(struct syslog_namespace *ns)
+{
+ static char text[LOG_LINE_MAX + PREFIX_MAX];
+ static u64 seen_seq;
+ unsigned long flags;
+ bool wake_klogd = false;
+ bool retry;
+
+ if (console_suspended) {
+ up(&console_sem);
+ return;
+ }
+
+ console_may_schedule = 0;
+
+ /* flush buffered message fragment immediately to console */
+ console_cont_flush(text, sizeof(text), ns);
+again:
+ for (;;) {
+ struct log *msg;
+ size_t len;
+ int level;
+
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
+ if (seen_seq != ns->log_next_seq) {
+ wake_klogd = true;
+ seen_seq = ns->log_next_seq;
+ }
+
+ if (ns->console_seq < ns->log_first_seq) {
+ /* messages are gone, move to first one */
+ ns->console_seq = ns->log_first_seq;
+ ns->console_idx = ns->log_first_idx;
+ console_prev = 0;
+ }
+skip:
+ if (ns->console_seq == ns->log_next_seq)
+ break;
+
+ msg = log_from_idx(ns->console_idx, ns);
+ if (msg->flags & LOG_NOCONS) {
+ /*
+ * Skip record we have buffered and already printed
+ * directly to the console when we received it.
+ */
+ ns->console_idx =
+ log_next(ns->console_idx, ns);
+ ns->console_seq++;
+ /*
+ * We will get here again when we register a new
+ * CON_PRINTBUFFER console. Clear the flag so we
+ * will properly dump everything later.
+ */
+ msg->flags &= ~LOG_NOCONS;
+ console_prev = msg->flags;
+ goto skip;
+ }
+
+ level = msg->level;
+ len = msg_print_text(msg, console_prev, false,
+ text, sizeof(text));
+ ns->console_idx =
+ log_next(ns->console_idx, ns);
+ ns->console_seq++;
+ console_prev = msg->flags;
+ raw_spin_unlock(&ns->logbuf_lock);
+
+ stop_critical_timings(); /* don't trace print latency */
+ call_console_drivers(level, text, len);
+ start_critical_timings();
+ local_irq_restore(flags);
+ }
+ console_locked = 0;
+
+ /* Release the exclusive_console once it is used */
+ if (unlikely(exclusive_console))
+ exclusive_console = NULL;
+
+ raw_spin_unlock(&ns->logbuf_lock);
+
+ up(&console_sem);
+
+ /*
+ * Someone could have filled up the buffer again, so re-check if there's
+ * something to flush. In case we cannot trylock the console_sem again,
+ * there's a new owner and the console_unlock() from them will do the
+ * flush, no worries.
+ */
+ raw_spin_lock(&ns->logbuf_lock);
+ retry = ns->console_seq != ns->log_next_seq;
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
+
+ if (retry && console_trylock())
+ goto again;
+
+ if (wake_klogd)
+ wake_up_klogd();
+}
+EXPORT_SYMBOL(ns_console_unlock);
+
+/**
* console_unlock - unlock the console system
*
* Releases the console_lock which the caller holds on the console system
--
1.8.2.2

2013-08-07 07:38:15

by Rui Xiang

Subject: [PATCH v3 09/11] syslog_ns: implement ns_printk_emit for specific syslog_ns

printk_emit always uses init_syslog_ns as its namespace,
so in some contexts, for instance devkmsg_writev, logs
generated through printk_emit are exported to the host
instead of staying in the container.

Add a new interface, ns_printk_emit, that takes a syslog_ns
parameter, and use it in devkmsg_writev.
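
The split between a namespace-aware variadic front end and a
va_list worker can be sketched in user space as follows. This
is only a model under stated assumptions: struct model_ns,
model_ns_vemit, model_ns_emit and model_emit are hypothetical
names, and a flat character buffer stands in for the record
ring.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

struct model_ns {
	char buf[256];   /* hypothetical per-namespace log buffer */
	size_t len;      /* bytes used, excluding the trailing NUL */
};

static struct model_ns model_init_ns;  /* plays the role of init_syslog_ns */

/* va_list worker: format into the buffer of the given namespace. */
static int model_ns_vemit(struct model_ns *ns, const char *fmt, va_list args)
{
	size_t room;
	int r;

	assert(ns != NULL);
	room = sizeof(ns->buf) - ns->len;
	r = vsnprintf(ns->buf + ns->len, room, fmt, args);
	if (r < 0)
		return r;
	/* vsnprintf reports the untruncated length; clamp to what was stored. */
	if ((size_t)r >= room)
		r = (int)(room - 1);
	ns->len += (size_t)r;
	return r;
}

/* Namespace-aware front end, in the spirit of ns_printk_emit. */
static int model_ns_emit(struct model_ns *ns, const char *fmt, ...)
{
	va_list args;
	int r;

	va_start(args, fmt);
	r = model_ns_vemit(ns, fmt, args);
	va_end(args);
	return r;
}

/* Legacy front end: always targets the initial namespace. */
static int model_emit(const char *fmt, ...)
{
	va_list args;
	int r;

	va_start(args, fmt);
	r = model_ns_vemit(&model_init_ns, fmt, args);
	va_end(args);
	return r;
}
```

A caller that knows its namespace uses model_ns_emit; every
other caller keeps using model_emit and lands in the initial
namespace, as printk_emit does in this series.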

Signed-off-by: Rui Xiang <[email protected]>
---
include/linux/printk.h | 6 ++++++
kernel/printk.c | 20 +++++++++++++++++++-
2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index bf83ad9..4c7e2be 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -121,6 +121,12 @@ asmlinkage int printk_emit(int facility, int level,
const char *dict, size_t dictlen,
const char *fmt, ...);

+asmlinkage __printf(6, 7) __cold
+asmlinkage int ns_printk_emit(struct syslog_namespace *ns,
+ int facility, int level,
+ const char *dict, size_t dictlen,
+ const char *fmt, ...);
+
asmlinkage __printf(1, 2) __cold
int printk(const char *fmt, ...);

diff --git a/kernel/printk.c b/kernel/printk.c
index 38e8869..b60c1d4 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -420,6 +420,7 @@ static ssize_t devkmsg_writev(struct kiocb *iocb, const struct iovec *iv,
int facility = 1; /* LOG_USER */
size_t len = iov_length(iv, count);
ssize_t ret = len;
+ struct syslog_namespace *ns = current_user_ns()->syslog_ns;

if (len > LOG_LINE_MAX)
return -EINVAL;
@@ -461,7 +462,7 @@ static ssize_t devkmsg_writev(struct kiocb *iocb, const struct iovec *iv,
}
line[len] = '\0';

- printk_emit(facility, level, NULL, 0, "%s", line);
+ ns_printk_emit(ns, facility, level, NULL, 0, "%s", line);
out:
kfree(buf);
return ret;
@@ -1722,6 +1723,23 @@ asmlinkage int printk_emit(int facility, int level,
}
EXPORT_SYMBOL(printk_emit);

+asmlinkage int ns_printk_emit(struct syslog_namespace *ns,
+ int facility, int level,
+ const char *dict, size_t dictlen,
+ const char *fmt, ...)
+{
+ va_list args;
+ int r;
+
+ va_start(args, fmt);
+ r = ns_vprintk_emit(ns, facility, level, dict, dictlen, fmt,
+ args);
+ va_end(args);
+
+ return r;
+}
+EXPORT_SYMBOL(ns_printk_emit);
+
/**
* printk - print a kernel message
* @fmt: format string
--
1.8.2.2

2013-08-07 07:38:56

by Rui Xiang

Subject: [PATCH v3 11/11] netfilter: use ns_printk in iptable context

To containerise iptables logs, use ns_printk to report
them to the log buffer of the owning container, taking
the syslog_ns from net->user_ns.

Signed-off-by: Rui Xiang <[email protected]>
---
include/net/netfilter/xt_log.h | 6 +++++-
net/netfilter/xt_LOG.c | 4 ++--
2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/net/netfilter/xt_log.h b/include/net/netfilter/xt_log.h
index 9d9756c..834d972 100644
--- a/include/net/netfilter/xt_log.h
+++ b/include/net/netfilter/xt_log.h
@@ -39,10 +39,14 @@ static struct sbuff *sb_open(void)
return m;
}

-static void sb_close(struct sbuff *m)
+static void sb_close(struct sbuff *m, struct net *net)
{
m->buf[m->count] = 0;
+#ifdef CONFIG_NET_NS
+ ns_printk(net->user_ns->syslog_ns, "%s\n", m->buf);
+#else
printk("%s\n", m->buf);
+#endif

if (likely(m != &emergency))
kfree(m);
diff --git a/net/netfilter/xt_LOG.c b/net/netfilter/xt_LOG.c
index 5ab2484..e034a74 100644
--- a/net/netfilter/xt_LOG.c
+++ b/net/netfilter/xt_LOG.c
@@ -493,7 +493,7 @@ ipt_log_packet(struct net *net,

dump_ipv4_packet(m, loginfo, skb, 0);

- sb_close(m);
+ sb_close(m, net);
}

#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
@@ -824,7 +824,7 @@ ip6t_log_packet(struct net *net,

dump_ipv6_packet(m, loginfo, skb, skb_network_offset(skb), 1);

- sb_close(m);
+ sb_close(m, net);
}
#endif

--
1.8.2.2

2013-08-07 07:39:53

by Rui Xiang

Subject: [PATCH v3 03/11] syslog_ns: add init syslog_ns for global syslog

Add init_syslog_ns to manage the host log buffer, and
initialize its fields from the existing global variables.

printk in the kernel continues to target init_syslog_ns
by default, so the buffer of the init ns is the same as
the original global buffer.
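
The inheritance rule in this patch (a new user ns shares the
syslog ns of its parent until it asks for its own) can be
modelled in user space as below. All names prefixed with
model_ are hypothetical, a plain int replaces the kref, and
the starting count of 1 is an assumption of this sketch, not
the value used by the patch.

```c
#include <assert.h>
#include <stddef.h>

struct model_syslog_ns {
	int refs;  /* stand-in for the kref */
};

struct model_user_ns {
	struct model_user_ns *parent;
	struct model_syslog_ns *syslog_ns;
};

/* The initial pair, analogous to init_syslog_ns and init_user_ns. */
static struct model_syslog_ns model_init_syslog_ns = { .refs = 1 };
static struct model_user_ns model_init_user_ns = {
	.parent = NULL,
	.syslog_ns = &model_init_syslog_ns,
};

/* Like the create_user_ns hunk: a child shares its parent's syslog ns. */
static void model_create_user_ns(struct model_user_ns *child,
				 struct model_user_ns *parent)
{
	assert(child != NULL && parent != NULL);
	child->parent = parent;
	child->syslog_ns = parent->syslog_ns;
	child->syslog_ns->refs++;
}

/* Like the free_user_ns hunk: drop the reference taken at creation. */
static void model_free_user_ns(struct model_user_ns *ns)
{
	ns->syslog_ns->refs--;
	ns->syslog_ns = NULL;
}
```

With this rule, any depth of nested user namespaces still
resolves to the host buffer unless one of them explicitly
creates a new syslog ns.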

Signed-off-by: Rui Xiang <[email protected]>
---
include/linux/syslog.h | 1 +
include/linux/user_namespace.h | 1 +
kernel/printk.c | 18 ++++++++++++++++++
kernel/user.c | 3 +++
kernel/user_namespace.c | 4 ++++
5 files changed, 27 insertions(+)

diff --git a/include/linux/syslog.h b/include/linux/syslog.h
index 62ce47f..363bc56 100644
--- a/include/linux/syslog.h
+++ b/include/linux/syslog.h
@@ -122,4 +122,5 @@ static inline void put_syslog_ns(struct syslog_namespace *ns)

int do_syslog(int type, char __user *buf, int count, bool from_file);

+extern struct syslog_namespace init_syslog_ns;
#endif /* _LINUX_SYSLOG_H */
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index ce2de5b..4b5e190 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -32,6 +32,7 @@ struct user_namespace {
};

extern struct user_namespace init_user_ns;
+extern struct syslog_namespace init_syslog_ns;

#ifdef CONFIG_USER_NS

diff --git a/kernel/printk.c b/kernel/printk.c
index 665cfdc..f288934 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -45,6 +45,8 @@
#include <linux/poll.h>
#include <linux/irq_work.h>
#include <linux/utsname.h>
+#include <linux/cred.h>
+#include <linux/user_namespace.h>

#include <asm/uaccess.h>

@@ -251,6 +253,22 @@ static u32 log_buf_len = __LOG_BUF_LEN;
/* cpu currently holding logbuf_lock */
static volatile unsigned int logbuf_cpu = UINT_MAX;

+struct syslog_namespace init_syslog_ns = {
+ .kref = {
+ .refcount = ATOMIC_INIT(2),
+ },
+ .logbuf_lock = __RAW_SPIN_LOCK_UNLOCKED(init_syslog_ns.logbuf_lock),
+ .logbuf_cpu = UINT_MAX,
+ .log_buf_len = __LOG_BUF_LEN,
+ .log_buf = __log_buf,
+ .owner = &init_user_ns,
+#ifdef CONFIG_SECURITY_DMESG_RESTRICT
+ .dmesg_restrict = 1,
+#else
+ .dmesg_restrict = 0,
+#endif
+};
+
/* human readable text of the record */
static char *log_text(const struct log *msg)
{
diff --git a/kernel/user.c b/kernel/user.c
index 69b4c3d..0bbd4f7 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -18,6 +18,8 @@
#include <linux/user_namespace.h>
#include <linux/proc_ns.h>

+struct syslog_namespace;
+
/*
* userns count is 1 for root user, 1 for init_uts_ns,
* and 1 for... ?
@@ -53,6 +55,7 @@ struct user_namespace init_user_ns = {
.proc_inum = PROC_USER_INIT_INO,
.may_mount_sysfs = true,
.may_mount_proc = true,
+ .syslog_ns = &init_syslog_ns,
};
EXPORT_SYMBOL_GPL(init_user_ns);

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index d8c30db..20f402f 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -22,6 +22,7 @@
#include <linux/ctype.h>
#include <linux/projid.h>
#include <linux/fs_struct.h>
+#include <linux/syslog.h>

static struct kmem_cache *user_ns_cachep __read_mostly;

@@ -95,6 +96,8 @@ int create_user_ns(struct cred *new)
ns->owner = owner;
ns->group = group;

+ ns->syslog_ns = get_syslog_ns(parent_ns->syslog_ns);
+
set_cred_user_ns(new, ns);

update_mnt_policy(ns);
@@ -122,6 +125,7 @@ void free_user_ns(struct user_namespace *ns)
struct user_namespace *parent;

do {
+ put_syslog_ns(ns->syslog_ns);
parent = ns->parent;
proc_free_inum(ns->proc_inum);
kmem_cache_free(user_ns_cachep, ns);
--
1.8.2.2

2013-08-07 07:37:48

by Rui Xiang

Subject: [PATCH v3 07/11] syslog_ns: implement function for creating syslog ns

Add a create_syslog_ns function to create a new ns. A
new user_ns must be created before a new syslog ns can
be created. The new syslog_ns is then tied to the current
user_ns, replacing the original syslog_ns that was
inherited from the parent user_ns.

Add a new syslog flag, SYSLOG_ACTION_NEW_NS, which
implements a new command (11) of the __NR_syslog system
call. Through that command, a new syslog ns can be
created from user space.
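
From user space, the new command could be invoked roughly as
follows. This is a sketch under assumptions: command 11 only
exists on a kernel carrying this series, so the define below
is local to the example, and request_new_syslog_ns is a
hypothetical helper name. On a kernel without the series the
call is expected to fail.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Command number proposed by this patch; not in mainline headers. */
#define SYSLOG_ACTION_NEW_NS 11

/*
 * Ask the kernel for a new syslog namespace for the caller's user ns.
 * Returns 0 on success or a negative errno value on failure.
 */
static int request_new_syslog_ns(void)
{
	long rc;

	errno = 0;
	rc = syscall(SYS_syslog, SYSLOG_ACTION_NEW_NS, NULL, 0);
	if (rc < 0)
		return -errno;
	/* The patched do_syslog() returns 0 when the ns was created. */
	assert(rc == 0);
	return 0;
}
```

Per create_syslog_ns in this patch, the caller is expected to
have entered a new user ns first (for example with
unshare(CLONE_NEWUSER)); a call from the initial user ns, or
a second call from the same user ns, returns -EINVAL.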

Signed-off-by: Rui Xiang <[email protected]>
---
include/linux/syslog.h | 4 ++++
kernel/printk.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 56 insertions(+)

diff --git a/include/linux/syslog.h b/include/linux/syslog.h
index fbf0cb6..118970c 100644
--- a/include/linux/syslog.h
+++ b/include/linux/syslog.h
@@ -23,6 +23,7 @@

#include <linux/slab.h>
#include <linux/kref.h>
+#include <linux/user_namespace.h>

/* Close the log. Currently a NOP. */
#define SYSLOG_ACTION_CLOSE 0
@@ -46,6 +47,8 @@
#define SYSLOG_ACTION_SIZE_UNREAD 9
/* Return size of the log buffer */
#define SYSLOG_ACTION_SIZE_BUFFER 10
+/* Create a new syslog ns */
+#define SYSLOG_ACTION_NEW_NS 11

#define SYSLOG_FROM_READER 0
#define SYSLOG_FROM_PROC 1
@@ -110,6 +113,7 @@ static inline void free_syslog_ns(struct kref *kref)
struct syslog_namespace *ns;
ns = container_of(kref, struct syslog_namespace, kref);

+ put_user_ns(ns->owner);
kfree(ns->log_buf);
kfree(ns);
}
diff --git a/kernel/printk.c b/kernel/printk.c
index bdb7ed4..a812a88 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -378,6 +378,10 @@ static int check_syslog_permissions(int type, bool from_file,
|| type == SYSLOG_ACTION_CONSOLE_LEVEL)
ns = &init_syslog_ns;

+ /* create a new syslog ns */
+ if (type == SYSLOG_ACTION_NEW_NS)
+ return 0;
+
if (syslog_action_restricted(type, ns)) {
if (ns_capable(ns->owner, CAP_SYSLOG))
return 0;
@@ -1125,6 +1129,51 @@ static int syslog_print_all(char __user *buf, int size, bool clear,
return len;
}

+static int create_syslog_ns(void)
+{
+ struct user_namespace *userns = current_user_ns();
+ struct syslog_namespace *oldns, *newns;
+ int err;
+
+ /*
+ * syslog ns belongs to a user ns. So you can only unshare your
+ * syslog_ns if you still share a syslog_ns with your parent userns
+ */
+ if (userns == &init_user_ns ||
+ userns->syslog_ns != userns->parent->syslog_ns)
+ return -EINVAL;
+
+ if (!ns_capable(userns, CAP_SYSLOG))
+ return -EPERM;
+
+ err = -ENOMEM;
+ oldns = userns->syslog_ns;
+ newns = kzalloc(sizeof(*newns), GFP_ATOMIC);
+ if (!newns)
+ goto out;
+ newns->log_buf_len = __LOG_BUF_LEN;
+ newns->log_buf = kzalloc(newns->log_buf_len, GFP_ATOMIC);
+ if (!newns->log_buf)
+ goto out;
+
+ newns->owner = get_user_ns(userns);
+ raw_spin_lock_init(&(newns->logbuf_lock));
+ newns->logbuf_cpu = UINT_MAX;
+ newns->dmesg_restrict = oldns->dmesg_restrict;
+ put_syslog_ns(oldns);
+ kref_init(&newns->kref);
+ userns->syslog_ns = newns;
+ newns = NULL;
+
+ err = 0;
+out:
+ if (newns) {
+ kfree(newns->log_buf);
+ kfree(newns);
+ }
+ return err;
+}
+
int do_syslog(int type, char __user *buf, int len, bool from_file,
struct syslog_namespace *ns)
{
@@ -1248,6 +1297,9 @@ int do_syslog(int type, char __user *buf, int len, bool from_file,
case SYSLOG_ACTION_SIZE_BUFFER:
error = ns->log_buf_len;
break;
+ case SYSLOG_ACTION_NEW_NS:
+ error = create_syslog_ns();
+ break;
default:
error = -EINVAL;
break;
--
1.8.2.2

2013-08-07 07:40:38

by Rui Xiang

Subject: [PATCH v3 01/11] syslog_ns: add syslog_namespace and put/get_syslog_ns

Add a struct syslog_namespace which contains the necessary
members for handling syslog, and implement the get_syslog_ns
and put_syslog_ns APIs.
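
The get/put lifetime rule can be illustrated with a small
user-space model. It is a sketch, not the kernel helpers: the
model_ names are hypothetical, a plain int stands in for
struct kref (so it is not atomic), and the NULL check in
model_put_ns is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

struct model_ns {
	int refs;        /* stand-in for struct kref */
	char *log_buf;   /* per-namespace buffer, freed with the namespace */
};

static int model_ns_freed;  /* counts releases, for illustration only */

static struct model_ns *model_ns_alloc(size_t buf_len)
{
	struct model_ns *ns = calloc(1, sizeof(*ns));

	if (!ns)
		return NULL;
	ns->log_buf = calloc(1, buf_len);
	if (!ns->log_buf) {
		free(ns);
		return NULL;
	}
	ns->refs = 1;  /* the creator holds the first reference */
	return ns;
}

/* Like get_syslog_ns: tolerate NULL, return the pointer for chaining. */
static struct model_ns *model_get_ns(struct model_ns *ns)
{
	if (ns)
		ns->refs++;
	return ns;
}

/* The last put releases both the buffer and the namespace. */
static void model_put_ns(struct model_ns *ns)
{
	if (!ns)
		return;
	assert(ns->refs > 0);
	if (--ns->refs == 0) {
		free(ns->log_buf);
		free(ns);
		model_ns_freed++;
	}
}
```

Every holder of a pointer takes a reference with the get
helper and drops it with the put helper; the buffer is freed
exactly once, when the last holder lets go.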

Signed-off-by: Rui Xiang <[email protected]>
---
include/linux/syslog.h | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
kernel/printk.c | 7 ------
2 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/include/linux/syslog.h b/include/linux/syslog.h
index 98a3153..425fafe 100644
--- a/include/linux/syslog.h
+++ b/include/linux/syslog.h
@@ -21,6 +21,9 @@
#ifndef _LINUX_SYSLOG_H
#define _LINUX_SYSLOG_H

+#include <linux/slab.h>
+#include <linux/kref.h>
+
/* Close the log. Currently a NOP. */
#define SYSLOG_ACTION_CLOSE 0
/* Open the log. Currently a NOP. */
@@ -47,6 +50,71 @@
#define SYSLOG_FROM_READER 0
#define SYSLOG_FROM_PROC 1

+enum log_flags {
+ LOG_NOCONS = 1, /* already flushed, do not print to console */
+ LOG_NEWLINE = 2, /* text ended with a newline */
+ LOG_PREFIX = 4, /* text started with a prefix */
+ LOG_CONT = 8, /* text is a fragment of a continuation line */
+};
+
+struct syslog_namespace {
+ struct kref kref; /* syslog_ns reference count & control */
+
+ raw_spinlock_t logbuf_lock; /* access conflict locker */
+ /* cpu currently holding logbuf_lock of ns */
+ unsigned int logbuf_cpu;
+
+ /* index and sequence number of the first record stored in the buffer */
+ u64 log_first_seq;
+ u32 log_first_idx;
+
+ /* index and sequence number of the next record stored in the buffer */
+ u64 log_next_seq;
+ u32 log_next_idx;
+
+ /* the next printk record to read after the last 'clear' command */
+ u64 clear_seq;
+ u32 clear_idx;
+
+ char *log_buf;
+ u32 log_buf_len;
+
+ /* the next printk record to write to the console */
+ u64 console_seq;
+ u32 console_idx;
+
+ /* the next printk record to read by syslog(READ) or /proc/kmsg */
+ u64 syslog_seq;
+ u32 syslog_idx;
+ enum log_flags syslog_prev;
+ size_t syslog_partial;
+
+ int dmesg_restrict;
+};
+
+static inline struct syslog_namespace *get_syslog_ns(
+ struct syslog_namespace *ns)
+{
+ if (ns)
+ kref_get(&ns->kref);
+ return ns;
+}
+
+static inline void free_syslog_ns(struct kref *kref)
+{
+ struct syslog_namespace *ns;
+ ns = container_of(kref, struct syslog_namespace, kref);
+
+ kfree(ns->log_buf);
+ kfree(ns);
+}
+
+static inline void put_syslog_ns(struct syslog_namespace *ns)
+{
+ if (ns)
+ kref_put(&ns->kref, free_syslog_ns);
+}
+
int do_syslog(int type, char __user *buf, int count, bool from_file);

#endif /* _LINUX_SYSLOG_H */
diff --git a/kernel/printk.c b/kernel/printk.c
index 69b0890..665cfdc 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -193,13 +193,6 @@ static int console_may_schedule;
* separated by ',', and find the message after the ';' character.
*/

-enum log_flags {
- LOG_NOCONS = 1, /* already flushed, do not print to console */
- LOG_NEWLINE = 2, /* text ended with a newline */
- LOG_PREFIX = 4, /* text started with a prefix */
- LOG_CONT = 8, /* text is a fragment of a continuation line */
-};
-
struct log {
u64 ts_nsec; /* timestamp in nanoseconds */
u16 len; /* length of entire record */
--
1.8.2.2

2013-08-07 07:41:39

by Rui Xiang

Subject: [PATCH v3 02/11] syslog_ns: add syslog_ns into user_namespace

Add a syslog_ns pointer to user_namespace, making
syslog_ns per user_namespace rather than global.

Since syslog_ns is tied to user_ns, a task that has full
capabilities in a new user_ns can create a new syslog_ns.

Signed-off-by: Rui Xiang <[email protected]>
---
include/linux/syslog.h | 5 +++++
include/linux/user_namespace.h | 1 +
2 files changed, 6 insertions(+)

diff --git a/include/linux/syslog.h b/include/linux/syslog.h
index 425fafe..62ce47f 100644
--- a/include/linux/syslog.h
+++ b/include/linux/syslog.h
@@ -90,6 +90,11 @@ struct syslog_namespace {
size_t syslog_partial;

int dmesg_restrict;
+
+ /*
+ * user namespace which owns this syslog ns.
+ */
+ struct user_namespace *owner;
};

static inline struct syslog_namespace *get_syslog_ns(
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index b6b215f..ce2de5b 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -28,6 +28,7 @@ struct user_namespace {
unsigned int proc_inum;
bool may_mount_sysfs;
bool may_mount_proc;
+ struct syslog_namespace *syslog_ns;
};

extern struct user_namespace init_user_ns;
--
1.8.2.2

2013-08-07 07:55:47

by Eric W. Biederman

Subject: Re: [PATCH v3 00/11] Add namespace support for syslog


Since this still has not been addressed, I am going to repeat Andrew's
objection.

Isn't there a better way to get iptables information out than to use
syslog? I did not have time to follow up on that, but it did appear that
someone did have a better way to get the information out.

Essentially the argument against this goes: the kernel logging facility
is really not a particularly good tool to be using for anything other
than kernel debugging information, and there appear to be no substantial
uses for a separate syslog that should not be done in other ways.

That design objection must be addressed before merging this code can be
given serious consideration.

Eric

2013-08-07 09:17:33

by Pablo Neira Ayuso

Subject: Re: [PATCH v3 11/11] netfilter: use ns_printk in iptable context

Hi,

On Wed, Aug 07, 2013 at 03:37:15PM +0800, Rui Xiang wrote:
> To containerise iptables log, use ns_printk
> to report individual logs to container as
> getting syslog_ns from net->user_ns.

This patch is missing the removal of a couple of LOC at the very
beginning of ipt_log_packet and ip6t_log_packet to get this working.

Please, revamp it. Thanks.

2013-08-07 13:48:36

by Serge Hallyn

Subject: Re: [PATCH v3 00/11] Add namespace support for syslog

Quoting Eric W. Biederman ([email protected]):
>
> Since this still has not been addressed. I am going to repeat Andrews
> objection again.
>
> Isn't there a better way to get iptables information out than to use
> syslog. I did not have time to follow up on that but it did appear that

Bruno suggested NFLOG target + ulogd. That's not ideal, but doable. At
least each container should be able to do that for itself. What it
won't do is let a host admin make sure that he doesn't get corrupted
syslog entries when partial-lines get sent from several containers and
the kernel and randomly spliced together. It also would simply be
better if the information was *always* sent to userspace instead of
syslog.

> someone did have a better way to get the information out.
>
> Essentially the argument against this goes. The kernel logging facility
> is really not a particularly good tool to be using for anything other
> than kernel debugging information, and there appear to be no substantial
> uses for a separate syslog that should not be done in other ways.
>
> That design objection must be addressed before merging this code can be
> given serious consideration.
>
> Eric

2013-08-07 18:41:48

by Ben Hutchings

Subject: Re: [PATCH v3 05/11] syslog_ns: make permission check per user namespace

On Wed, 2013-08-07 at 15:37 +0800, Rui Xiang wrote:
> Use ns_capable to check capability in user ns,
> instead of capable function. The user ns is the
> owner of current syslog ns.
>
> Signed-off-by: Rui Xiang <[email protected]>
> ---
> kernel/printk.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/printk.c b/kernel/printk.c
> index e508ab2..ca951e7 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -374,13 +374,13 @@ static int check_syslog_permissions(int type, bool from_file,
> return 0;
>
> if (syslog_action_restricted(type, ns)) {
> - if (capable(CAP_SYSLOG))
> + if (ns_capable(ns->owner, CAP_SYSLOG))
> return 0;
> /*
> * For historical reasons, accept CAP_SYS_ADMIN too, with
> * a warning.
> */
> - if (capable(CAP_SYS_ADMIN)) {
> + if (ns_capable(ns->owner, CAP_SYS_ADMIN)) {
> pr_warn_once("%s (%d): Attempt to access syslog with "
> "CAP_SYS_ADMIN but no CAP_SYSLOG "
> "(deprecated).\n",

Since CAP_SYS_ADMIN is only accepted for backward compatibility, is it
really necessary to accept it as a per-namespace capability too?
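One possible reading of this question, as a user-space sketch rather than kernel code: honour CAP_SYSLOG relative to the owner of the syslog namespace, but keep the legacy CAP_SYS_ADMIN fallback for the initial namespace only. The `struct caller_model` type and `restricted_action_allowed()` are hypothetical names introduced for illustration; in the kernel the two booleans would come from `ns_capable()` calls, and nothing in the thread says this policy was adopted.

```c
#include <stdbool.h>

/* Hypothetical inputs; in the kernel these would come from ns_capable(). */
struct caller_model {
	bool has_cap_syslog;	/* CAP_SYSLOG in the syslog ns owner */
	bool has_cap_sys_admin;	/* CAP_SYS_ADMIN in the syslog ns owner */
};

/*
 * Returns true if a restricted syslog action would be allowed.
 * The legacy CAP_SYS_ADMIN fallback is honoured only for the initial
 * namespace, on the assumption that child syslog namespaces have no
 * pre-existing users that depend on it.
 */
bool restricted_action_allowed(const struct caller_model *c, bool is_init_ns)
{
	if (c->has_cap_syslog)
		return true;
	return is_init_ns && c->has_cap_sys_admin;
}
```

Under this model a caller holding only CAP_SYS_ADMIN keeps working on the host but is refused in a child syslog namespace.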

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

2013-08-07 18:53:10

by Ben Hutchings

Subject: Re: [PATCH v3 04/11] syslog_ns: make syslog handling per namespace

On Wed, 2013-08-07 at 15:37 +0800, Rui Xiang wrote:
> This patch makes syslog buf and other fields per
> namespace.
>
> Here use ns->log_buf(log_buf_len, logbuf_lock,
> log_first_seq, logbuf_lock, and so on) fields
> instead of global ones to handle syslog.
[...]
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
[...]
> }
>
> -#ifdef CONFIG_SECURITY_DMESG_RESTRICT
> -int dmesg_restrict = 1;
> -#else
> -int dmesg_restrict;
> -#endif
> -
> -static int syslog_action_restricted(int type)
> +static int syslog_action_restricted(int type,
> + struct syslog_namespace *ns)
> {
> - if (dmesg_restrict)
> + if (ns->dmesg_restrict)
> return 1;
> /*
> * Unless restricted, we allow "read all" and "get buffer size"
[...]

I don't think this should be a per-namespace setting. And it certainly
should not be possible for child namespaces to disable dmesg_restrict if
it is enabled by a parent namespace.

In later patches, it appears to be copied into child namespaces but not
made visible or controllable there. So if an administrator enables
dmesg_restrict in the initial syslog namespace after another syslog
namespace has been created, she won't be able to tell that it is still
disabled in that other namespace.
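The constraint described above can be modelled in user space, as a sketch only: treat the restriction as sticky down the hierarchy, so a namespace counts as restricted if it or any ancestor has the flag set. The `struct restrict_model` type, its `parent` pointer and `effective_dmesg_restrict()` are hypothetical; the posted `struct syslog_namespace` has no parent pointer, so a real implementation would have to find the ancestors some other way.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; not the struct from the patch. */
struct restrict_model {
	const struct restrict_model *parent;	/* NULL for the initial namespace */
	bool dmesg_restrict;			/* this namespace's own setting */
};

/*
 * A namespace is treated as restricted if it or any ancestor has
 * dmesg_restrict set. A child therefore cannot relax a parent's policy,
 * and a later change in the parent takes effect in existing children.
 */
bool effective_dmesg_restrict(const struct restrict_model *ns)
{
	for (; ns; ns = ns->parent)
		if (ns->dmesg_restrict)
			return true;
	return false;
}
```

Because the value is computed at check time instead of being copied at creation, enabling the flag in the initial namespace after a child exists still restricts the child.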

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

2013-08-07 18:59:38

by Ben Hutchings

Subject: Re: [PATCH v3 07/11] syslog_ns: implement function for creating syslog ns

On Wed, 2013-08-07 at 15:37 +0800, Rui Xiang wrote:
[...]
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
[...]
> @@ -1125,6 +1129,51 @@ static int syslog_print_all(char __user *buf, int size, bool clear,
> return len;
> }
>
> +static int create_syslog_ns(void)
> +{
> + struct user_namespace *userns = current_user_ns();
> + struct syslog_namespace *oldns, *newns;
> + int err;
> +
> + /*
> + * syslog ns belongs to a user ns. So you can only unshare your
> + * user_ns if you share a user_ns with your parent userns

It looks like this should say:
* syslog_ns if you share a syslog_ns with your parent user_ns

> + */
> + if (userns == &init_user_ns ||
> + userns->syslog_ns != userns->parent->syslog_ns)
> + return -EINVAL;
> +
> + if (!ns_capable(userns, CAP_SYSLOG))
> + return -EPERM;
> +
> + err = -ENOMEM;
> + oldns = userns->syslog_ns;
> + newns = kzalloc(sizeof(*newns), GFP_ATOMIC);

This doesn't appear to be an atomic context, so use GFP_KERNEL.

> + if (!newns)
> + goto out;
> + newns->log_buf_len = __LOG_BUF_LEN;
> + newns->log_buf = kzalloc(newns->log_buf_len, GFP_ATOMIC);
[...]

Same here.

Also, I'm not sure that __LOG_BUF_LEN is the right length. OpenVZ
certainly uses a small buffer for container syslogs (4K) rather than
using the same length as the global syslog. Maybe it should be
separately configurable.
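The two allocation comments can be illustrated with a user-space sketch, under stated assumptions: `calloc()` stands in for `kzalloc(..., GFP_KERNEL)`, the 4096-byte default simply echoes the OpenVZ figure mentioned above, and `struct logbuf_model`, `CONTAINER_LOG_BUF_LEN`, `logbuf_model_create()` and `logbuf_model_destroy()` are hypothetical names that do not appear in the patch.

```c
#include <errno.h>
#include <stdlib.h>

#define CONTAINER_LOG_BUF_LEN 4096	/* assumed default, not from the patch */

/* Hypothetical user-space model of the per-namespace buffer. */
struct logbuf_model {
	char *log_buf;
	size_t log_buf_len;
};

/*
 * Allocate a namespace with a caller-chosen buffer length; 0 selects the
 * default. calloc() models a zeroing, may-sleep allocation, which is
 * acceptable when the caller runs in ordinary process context.
 * Returns 0 on success or -ENOMEM, leaving *out untouched on failure.
 */
int logbuf_model_create(size_t buf_len, struct logbuf_model **out)
{
	struct logbuf_model *ns;

	if (buf_len == 0)
		buf_len = CONTAINER_LOG_BUF_LEN;

	ns = calloc(1, sizeof(*ns));
	if (!ns)
		return -ENOMEM;

	ns->log_buf = calloc(1, buf_len);
	if (!ns->log_buf) {
		free(ns);
		return -ENOMEM;
	}
	ns->log_buf_len = buf_len;
	*out = ns;
	return 0;
}

/* Release the buffer and the namespace; a NULL argument is a no-op. */
void logbuf_model_destroy(struct logbuf_model *ns)
{
	if (!ns)
		return;
	free(ns->log_buf);
	free(ns);
}
```

Passing the length as a parameter keeps the container default independent of the global buffer size, which is the separation suggested above.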

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

2013-08-08 01:36:20

by Gao feng

Subject: Re: [PATCH v3 00/11] Add namespace support for syslog

On 08/07/2013 03:55 PM, Eric W. Biederman wrote:
>
> Since this still has not been addressed. I am going to repeat Andrews
> objection again.
>
> Isn't there a better way to get iptables information out than to use
> syslog. I did not have time to follow up on that but it did appear that
> someone did have a better way to get the information out.
>
> Essentially the argument against this goes. The kernel logging facility
> is really not a particularly good tool to be using for anything other
> than kernel debugging information, and there appear to be no substantial
> uses for a separate syslog that should not be done in other ways.

Containerizing syslog is not only for iptables; it also isolates /dev/kmsg,
/proc/kmsg, syslog(2)... User space tools in a container may use these interfaces
to read/generate syslog.

But I don't know how important/urgent this containerizing syslog work is.
Rui Xiang, can you find an important/popular user space tool which uses these
interfaces to generate kernel syslog?

Thanks
Gao

2013-08-08 11:13:51

by Rui Xiang

Subject: Re: [PATCH v3 00/11] Add namespace support for syslog

On 2013/8/8 9:37, Gao feng wrote:
> On 08/07/2013 03:55 PM, Eric W. Biederman wrote:
>>
>> Since this still has not been addressed. I am going to repeat Andrews
>> objection again.
>>
>> Isn't there a better way to get iptables information out than to use
>> syslog. I did not have time to follow up on that but it did appear that
>> someone did have a better way to get the information out.
>>
>> Essentially the argument against this goes. The kernel logging facility
>> is really not a particularly good tool to be using for anything other
>> than kernel debugging information, and there appear to be no substantial
>> uses for a separate syslog that should not be done in other ways.
>
> containerizing syslog is not only for iptables, it also isolates the /dev/kmsg,
> /proc/kmsg, syslog(2)... user space tools in container may use this interface
> to read/generate syslog.
>
> But I don't know how important/urgent this containerizing syslog work is,
> Rui Xiang, can you find an important/popular user space tool which uses this
> interfaces to generate kernel syslog?
>

There are some other cases. Some warnings that arise in a container (bad mount
options for tmpfs, bad uid owner for many of them, etc.) should
be exported to the container. Some belong on the host, for example if they show
a corrupt superblock, which may indicate an attempt by the container
to crash the kernel. Similarly, the kernel prints warnings when
userspace in a container uses a deprecated something or other, and these
logs should be specific to the current container and invisible elsewhere.

I can't say this work is terribly compelling and important, but the
impact may be obvious, IMO.


Thanks.

2013-08-14 15:30:21

by Serge E. Hallyn

Subject: Re: [PATCH v3 00/11] Add namespace support for syslog

Quoting Rui Xiang ([email protected]):
> On 2013/8/8 9:37, Gao feng wrote:
> > On 08/07/2013 03:55 PM, Eric W. Biederman wrote:
> >>
> >> Since this still has not been addressed. I am going to repeat Andrews
> >> objection again.
> >>
> >> Isn't there a better way to get iptables information out than to use
> >> syslog. I did not have time to follow up on that but it did appear that
> >> someone did have a better way to get the information out.
> >>
> >> Essentially the argument against this goes. The kernel logging facility
> >> is really not a particularly good tool to be using for anything other
> >> than kernel debugging information, and there appear to be no substantial
> >> uses for a separate syslog that should not be done in other ways.
> >
> > containerizing syslog is not only for iptables, it also isolates the /dev/kmsg,
> > /proc/kmsg, syslog(2)... user space tools in container may use this interface
> > to read/generate syslog.
> >
> > But I don't know how important/urgent this containerizing syslog work is,
> > Rui Xiang, can you find an important/popular user space tool which uses this
> > interfaces to generate kernel syslog?
> >
>
> There are some other cases. Some warnings (bad mount options for tmpfs,
> bad uid owner for many of them, etc) emerged in the container should
> be exported to the container. Some belong on the host - if they show
> a corrupt superblock which may indicate an attempt by the container
> to crash the kernel. Like these, Kernel will print out warnings when
> userspace in container uses a deprecated something or other, and these
> logs should be invisible and specific for current container.
>
> I can't say this work is terribly compelling and important, but the
> impact may be obvious, IMO.

Aug 9 21:49:13 sergeh1 kernel: [4644829.672768] init: Failed to spawn network-interface (veth8Ehlvj) post-stop process: unable to change root directory: No such file ricr:aeohgrticr cfe rty444984 n:aetswnw-ta(ht -rrsultheoit: hlrro<4865i:i sntkta(ht ttpe btheoit: hlrrob r6ezt)nrgoadgte644915 c0pt(tyg ti wd a
Aug 9 21:49:13 sergeh1 kernel: X3f d-6:uigitra ora
Aug 9 22:19:54 sergeh1 kernel: 6[642.175 X3f d-6:mutdflsse ihodrddt oe==99 rfl=lccnanrdfutwt-etn"nm=/a/ah/x/lu-ui/m.AExdu"pi=91 om=mut sye"x3name="/devlo0" lg=r"ol pmc r=3an19pfel-nireu-tntgne/rc/cldudmHEqu d97o=otfy=x"ra=d/o/fg""8b:o vhc)nrgibdte646013 veeMzWe oso d<[4715]xr r1eMc egset
Aug

2013-08-14 19:22:23

by Eric W. Biederman

Subject: Re: [PATCH v3 00/11] Add namespace support for syslog

"Serge E. Hallyn" <[email protected]> writes:

> Quoting Rui Xiang ([email protected]):
>> On 2013/8/8 9:37, Gao feng wrote:
>> > On 08/07/2013 03:55 PM, Eric W. Biederman wrote:
>> >>
>> >> Since this still has not been addressed. I am going to repeat Andrews
>> >> objection again.
>> >>
>> >> Isn't there a better way to get iptables information out than to use
>> >> syslog. I did not have time to follow up on that but it did appear that
>> >> someone did have a better way to get the information out.
>> >>
>> >> Essentially the argument against this goes. The kernel logging facility
>> >> is really not a particularly good tool to be using for anything other
>> >> than kernel debugging information, and there appear to be no substantial
>> >> uses for a separate syslog that should not be done in other ways.
>> >
>> > containerizing syslog is not only for iptables, it also isolates the /dev/kmsg,
>> > /proc/kmsg, syslog(2)... user space tools in container may use this interface
>> > to read/generate syslog.
>> >
>> > But I don't know how important/urgent this containerizing syslog work is,
>> > Rui Xiang, can you find an important/popular user space tool which uses this
>> > interfaces to generate kernel syslog?
>> >
>>
>> There are some other cases. Some warnings (bad mount options for tmpfs,
>> bad uid owner for many of them, etc) emerged in the container should
>> be exported to the container. Some belong on the host - if they show
>> a corrupt superblock which may indicate an attempt by the container
>> to crash the kernel. Like these, Kernel will print out warnings when
>> userspace in container uses a deprecated something or other, and these
>> logs should be invisible and specific for current container.
>>
>> I can't say this work is terribly compelling and important, but the
>> impact may be obvious, IMO.
>
> Aug 9 21:49:13 sergeh1 kernel: [4644829.672768] init: Failed to spawn network-interface (veth8Ehlvj) post-stop process: unable to change root directory: No such file ricr:aeohgrticr cfe rty444984 n:aetswnw-ta(ht -rrsultheoit: hlrro<4865i:i sntkta(ht ttpe btheoit: hlrrob r6ezt)nrgoadgte644915 c0pt(tyg ti wd a
> Aug 9 21:49:13 sergeh1 kernel: X3f d-6:uigitra ora
> Aug 9 22:19:54 sergeh1 kernel: 6[642.175 X3f d-6:mutdflsse ihodrddt oe==99 rfl=lccnanrdfutwt-etn"nm=/a/ah/x/lu-ui/m.AExdu"pi=91 om=mut sye"x3name="/devlo0" lg=r"ol pmc r=3an19pfel-nireu-tntgne/rc/cldudmHEqu d97o=otfy=x"ra=d/o/fg""8b:o vhc)nrgibdte646013 veeMzWe oso d<[4715]xr r1eMc egset
> Aug

That is certainly a mess. Now I don't believe we allow processes in a
user namespace to write to the kernel's log (certainly we shouldn't),
so part of that is not a problem.

What is interleaving messages into syslog?

And to be clear my only perspective is that we need to make certain we
have this thought out.

Eric

2013-08-17 13:38:56

by Serge E. Hallyn

Subject: Re: [PATCH v3 00/11] Add namespace support for syslog

Quoting Eric W. Biederman ([email protected]):
> "Serge E. Hallyn" <[email protected]> writes:
>
> > Quoting Rui Xiang ([email protected]):
> >> On 2013/8/8 9:37, Gao feng wrote:
> >> > On 08/07/2013 03:55 PM, Eric W. Biederman wrote:
> >> >>
> >> >> Since this still has not been addressed. I am going to repeat Andrews
> >> >> objection again.
> >> >>
> >> >> Isn't there a better way to get iptables information out than to use
> >> >> syslog. I did not have time to follow up on that but it did appear that
> >> >> someone did have a better way to get the information out.
> >> >>
> >> >> Essentially the argument against this goes. The kernel logging facility
> >> >> is really not a particularly good tool to be using for anything other
> >> >> than kernel debugging information, and there appear to be no substantial
> >> >> uses for a separate syslog that should not be done in other ways.
> >> >
> >> > containerizing syslog is not only for iptables, it also isolates the /dev/kmsg,
> >> > /proc/kmsg, syslog(2)... user space tools in container may use this interface
> >> > to read/generate syslog.
> >> >
> >> > But I don't know how important/urgent this containerizing syslog work is,
> >> > Rui Xiang, can you find an important/popular user space tool which uses this
> >> > interfaces to generate kernel syslog?
> >> >
> >>
> >> There are some other cases. Some warnings (bad mount options for tmpfs,
> >> bad uid owner for many of them, etc) emerged in the container should
> >> be exported to the container. Some belong on the host - if they show
> >> a corrupt superblock which may indicate an attempt by the container
> >> to crash the kernel. Like these, Kernel will print out warnings when
> >> userspace in container uses a deprecated something or other, and these
> >> logs should be invisible and specific for current container.
> >>
> >> I can't say this work is terribly compelling and important, but the
> >> impact may be obvious, IMO.
> >
> > Aug 9 21:49:13 sergeh1 kernel: [4644829.672768] init: Failed to spawn network-interface (veth8Ehlvj) post-stop process: unable to change root directory: No such file ricr:aeohgrticr cfe rty444984 n:aetswnw-ta(ht -rrsultheoit: hlrro<4865i:i sntkta(ht ttpe btheoit: hlrrob r6ezt)nrgoadgte644915 c0pt(tyg ti wd a
> > Aug 9 21:49:13 sergeh1 kernel: X3f d-6:uigitra ora
> > Aug 9 22:19:54 sergeh1 kernel: 6[642.175 X3f d-6:mutdflsse ihodrddt oe==99 rfl=lccnanrdfutwt-etn"nm=/a/ah/x/lu-ui/m.AExdu"pi=91 om=mut sye"x3name="/devlo0" lg=r"ol pmc r=3an19pfel-nireu-tntgne/rc/cldudmHEqu d97o=otfy=x"ra=d/o/fg""8b:o vhc)nrgibdte646013 veeMzWe oso d<[4715]xr r1eMc egset
> > Aug
>
> That is certainly a mess. Now I don't believe we allow processes in a
> user namespace to write to the kernels log (certainly we shouldn't be)
> so part of that is not a problem.

I'll have to test with user namespaces (next week - don't have a
setup right now)

However user namespaces can't be used in every case, due especially to
device restrictions and cap_capable calls. So it may not be ok to say
"user namespaces will stop this."

> What is interleaving messages into syslog?

Heh, I'm actually not sure. I do know it happens every time I start
a container. I believe one of the sources is net/ipv6/addrconf.c.
Another looks like an apparmor=denied message (I say that due to the
"name=" in the garbled output), which raises the question of whether this is
actually a bug in audit. (This is on a 3.2 kernel.) Another may
be an upstart job relating to the nic going up.

> And to be clear my only perspective is that we need to make certain we
> have this thought out.

Agreed. I need to test a few things as soon as I have time.

-serge