2013-07-29 02:34:07

by Rui Xiang

Subject: [PATCH 0/9] Add namespace support for syslog v2

This patchset introduces a system log namespace.

This is the 2nd version. The 1st version is at
http://lwn.net/Articles/525728/. In that version,
syslog_namespace was added into nsproxy and created through
a new clone flag, CLONE_SYSLOG, when cloning a process.

There was some discussion last November about the 1st
version. This version takes that advice into account and
refers to Serge's patch (http://lwn.net/Articles/525629/).

Unlike the 1st version, in this patchset the syslog namespace
is tied to a user namespace. A new user ns must be created
before a new syslog ns, because cloning a new user ns gives
the user full capabilities in that new userns. The syslog
namespace is created through a new command (11) to the
__NR_syslog syscall, exposed as a new syslog action,
SYSLOG_ACTION_NEW_NS.
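
As a rough user-space illustration of the call described above, the
following sketch issues the proposed action through the raw syslog
syscall. The value 11 and the name SYSLOG_ACTION_NEW_NS come from this
cover letter only; the helper name new_syslog_ns() is hypothetical, and
a kernel without this patchset is expected to reject the request.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value proposed by this patchset only; assumed, not a mainline constant. */
#define SYSLOG_ACTION_NEW_NS 11

/*
 * Hypothetical helper: ask the kernel for a new syslog namespace.
 * Returns 0 on success or a negative errno value on failure.
 * The caller is expected to have entered a new user namespace first,
 * for example with unshare(CLONE_NEWUSER).
 */
static int new_syslog_ns(void)
{
	long ret = syscall(SYS_syslog, SYSLOG_ACTION_NEW_NS, NULL, 0);

	if (ret < 0)
		return -errno;
	return 0;
}
```

A caller would check the return value and fall back to the host log
when the request fails, since the action only exists on a kernel
carrying these patches.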

In syslog_namespace, the identifiers needed for handling the
syslog buf are containerized. When a container creates a
new syslog ns, an individual buf is allocated to store the
logs owned by this container.

A new interface, ns_printk, is added to print the logs that
we want to see in the container. Through ns_printk, we can
get more logs related to a specific net ns, for instance
iptables. Here we use it to report iptables logs per
container.

The default printk, targeted at init_syslog_ns, will
continue to print most kernel logs to the host.

A task in a new syslog ns can affect only its current
container through "dmesg", "dmesg -c" and /dev/kmsg
actions. The read/write interfaces such as /dev/kmsg,
/proc/kmsg and the syslog syscall continue to be useful for
container users.

This patchset is based on Linus' linux tree.

Rui Xiang (9):
syslog_ns: add syslog_namespace and put/get_syslog_ns
syslog_ns: add syslog_ns into user_namespace
syslog_ns: add init syslog_ns for global syslog
syslog_ns: make syslog handling per namespace
syslog_ns: make permission check per user namespace
syslog_ns: use init syslog_ns for console action
syslog_ns: implement function for creating syslog ns
syslog_ns: implement ns_printk for specific syslog_ns
netfilter: use ns_printk in iptable context

fs/proc/kmsg.c | 17 +-
include/linux/printk.h | 5 +-
include/linux/syslog.h | 79 ++++-
include/linux/user_namespace.h | 2 +
include/net/netfilter/xt_log.h | 6 +-
kernel/printk.c | 642 ++++++++++++++++++++++++-----------------
kernel/sysctl.c | 3 +-
kernel/user.c | 3 +
kernel/user_namespace.c | 4 +
net/netfilter/xt_LOG.c | 4 +-
10 files changed, 493 insertions(+), 272 deletions(-)

--
1.8.2.2


2013-07-29 02:34:13

by Rui Xiang

Subject: [PATCH 1/9] syslog_ns: add syslog_namespace and put/get_syslog_ns

Add a struct syslog_namespace, which contains the members
needed for handling syslog, and implement the get_syslog_ns
and put_syslog_ns API.
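
The get/put pairing can be illustrated with a small user-space model.
This is only a sketch: the struct and function names below are
hypothetical, a plain counter stands in for the kernel's struct kref
(which is atomic, unlike this counter), and malloc/free stand in for
the slab allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct syslog_namespace. */
struct syslog_ns_model {
	unsigned int refcount;	/* plays the role of the kref */
	char *log_buf;		/* per-namespace log buffer */
};

/* Allocate a namespace that starts with one reference held. */
static struct syslog_ns_model *syslog_ns_model_alloc(size_t buf_len)
{
	struct syslog_ns_model *ns = malloc(sizeof(*ns));

	if (!ns)
		return NULL;
	ns->log_buf = calloc(1, buf_len);
	if (!ns->log_buf) {
		free(ns);
		return NULL;
	}
	ns->refcount = 1;
	return ns;
}

/* Take a reference; tolerates NULL like get_syslog_ns() in the patch. */
static struct syslog_ns_model *syslog_ns_model_get(struct syslog_ns_model *ns)
{
	if (ns)
		ns->refcount++;
	return ns;
}

/*
 * Drop a reference and free the buffer and the namespace on the last put.
 * Returns 1 if the namespace was freed, 0 otherwise.
 */
static int syslog_ns_model_put(struct syslog_ns_model *ns)
{
	if (!ns)
		return 0;
	if (--ns->refcount > 0)
		return 0;
	free(ns->log_buf);
	free(ns);
	return 1;
}
```

As in the patch, the buffer is released together with the namespace
once the last holder drops its reference.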

Signed-off-by: Rui Xiang <[email protected]>
---
include/linux/syslog.h | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
kernel/printk.c | 7 ------
2 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/include/linux/syslog.h b/include/linux/syslog.h
index 98a3153..425fafe 100644
--- a/include/linux/syslog.h
+++ b/include/linux/syslog.h
@@ -21,6 +21,9 @@
#ifndef _LINUX_SYSLOG_H
#define _LINUX_SYSLOG_H

+#include <linux/slab.h>
+#include <linux/kref.h>
+
/* Close the log. Currently a NOP. */
#define SYSLOG_ACTION_CLOSE 0
/* Open the log. Currently a NOP. */
@@ -47,6 +50,71 @@
#define SYSLOG_FROM_READER 0
#define SYSLOG_FROM_PROC 1

+enum log_flags {
+ LOG_NOCONS = 1, /* already flushed, do not print to console */
+ LOG_NEWLINE = 2, /* text ended with a newline */
+ LOG_PREFIX = 4, /* text started with a prefix */
+ LOG_CONT = 8, /* text is a fragment of a continuation line */
+};
+
+struct syslog_namespace {
+ struct kref kref; /* syslog_ns reference count & control */
+
+ raw_spinlock_t logbuf_lock; /* access conflict locker */
+ /* cpu currently holding logbuf_lock of ns */
+ unsigned int logbuf_cpu;
+
+ /* index and sequence number of the first record stored in the buffer */
+ u64 log_first_seq;
+ u32 log_first_idx;
+
+ /* index and sequence number of the next record stored in the buffer */
+ u64 log_next_seq;
+ u32 log_next_idx;
+
+ /* the next printk record to read after the last 'clear' command */
+ u64 clear_seq;
+ u32 clear_idx;
+
+ char *log_buf;
+ u32 log_buf_len;
+
+ /* the next printk record to write to the console */
+ u64 console_seq;
+ u32 console_idx;
+
+ /* the next printk record to read by syslog(READ) or /proc/kmsg */
+ u64 syslog_seq;
+ u32 syslog_idx;
+ enum log_flags syslog_prev;
+ size_t syslog_partial;
+
+ int dmesg_restrict;
+};
+
+static inline struct syslog_namespace *get_syslog_ns(
+ struct syslog_namespace *ns)
+{
+ if (ns)
+ kref_get(&ns->kref);
+ return ns;
+}
+
+static inline void free_syslog_ns(struct kref *kref)
+{
+ struct syslog_namespace *ns;
+ ns = container_of(kref, struct syslog_namespace, kref);
+
+ kfree(ns->log_buf);
+ kfree(ns);
+}
+
+static inline void put_syslog_ns(struct syslog_namespace *ns)
+{
+ if (ns)
+ kref_put(&ns->kref, free_syslog_ns);
+}
+
int do_syslog(int type, char __user *buf, int count, bool from_file);

#endif /* _LINUX_SYSLOG_H */
diff --git a/kernel/printk.c b/kernel/printk.c
index d37d45c..7e544bf 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -193,13 +193,6 @@ static int console_may_schedule;
* separated by ',', and find the message after the ';' character.
*/

-enum log_flags {
- LOG_NOCONS = 1, /* already flushed, do not print to console */
- LOG_NEWLINE = 2, /* text ended with a newline */
- LOG_PREFIX = 4, /* text started with a prefix */
- LOG_CONT = 8, /* text is a fragment of a continuation line */
-};
-
struct log {
u64 ts_nsec; /* timestamp in nanoseconds */
u16 len; /* length of entire record */
--
1.8.2.2

2013-07-29 02:34:21

by Rui Xiang

Subject: [PATCH 4/9] syslog_ns: make syslog handling per namespace

This patch makes the syslog buf and the other log
fields per namespace.

The ns->log_buf fields (log_buf_len, logbuf_lock,
log_first_seq, and so on) are used instead of the
global ones to handle syslog.

Syslog interfaces such as /dev/kmsg, /proc/kmsg,
and the syslog syscall are all containerized for
container users.
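
The effect of passing a namespace to every log helper can be shown
with a simplified user-space model. This is a sketch under stated
assumptions, not the kernel code: the names are hypothetical, records
live in fixed-size slots rather than in the variable-length byte ring
used by printk.c, and no locking is modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SLOT_LEN 32	/* bytes per record in this model */
#define NSLOTS   4	/* records kept per namespace */

/* Hypothetical per-namespace log state; formerly global variables. */
struct log_ns_model {
	char slots[NSLOTS][SLOT_LEN];
	uint64_t first_seq;	/* oldest record still stored */
	uint64_t next_seq;	/* sequence number of the next record */
};

/* Store one record in the given namespace, dropping the oldest if full. */
static void model_log_store(struct log_ns_model *ns, const char *text)
{
	char *slot = ns->slots[ns->next_seq % NSLOTS];

	snprintf(slot, SLOT_LEN, "%s", text);
	ns->next_seq++;
	if (ns->next_seq - ns->first_seq > NSLOTS)
		ns->first_seq++;
}

/* Return the record with sequence number seq, or NULL if it is gone. */
static const char *model_log_read(const struct log_ns_model *ns, uint64_t seq)
{
	if (seq < ns->first_seq || seq >= ns->next_seq)
		return NULL;
	return ns->slots[seq % NSLOTS];
}
```

Because every helper takes the namespace as a parameter, filling or
overwriting one container's buffer leaves the sequence numbers and
records of the host buffer untouched.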

Signed-off-by: Rui Xiang <[email protected]>
---
fs/proc/kmsg.c | 17 +-
include/linux/printk.h | 1 -
include/linux/syslog.h | 3 +-
kernel/printk.c | 507 +++++++++++++++++++++++++------------------------
kernel/sysctl.c | 3 +-
5 files changed, 273 insertions(+), 258 deletions(-)

diff --git a/fs/proc/kmsg.c b/fs/proc/kmsg.c
index bdfabda..cb98431 100644
--- a/fs/proc/kmsg.c
+++ b/fs/proc/kmsg.c
@@ -13,6 +13,8 @@
#include <linux/proc_fs.h>
#include <linux/fs.h>
#include <linux/syslog.h>
+#include <linux/cred.h>
+#include <linux/user_namespace.h>

#include <asm/uaccess.h>
#include <asm/io.h>
@@ -21,12 +23,14 @@ extern wait_queue_head_t log_wait;

static int kmsg_open(struct inode * inode, struct file * file)
{
- return do_syslog(SYSLOG_ACTION_OPEN, NULL, 0, SYSLOG_FROM_PROC);
+ return do_syslog(SYSLOG_ACTION_OPEN, NULL, 0, SYSLOG_FROM_PROC,
+ file->f_cred->user_ns->syslog_ns);
}

static int kmsg_release(struct inode * inode, struct file * file)
{
- (void) do_syslog(SYSLOG_ACTION_CLOSE, NULL, 0, SYSLOG_FROM_PROC);
+ (void) do_syslog(SYSLOG_ACTION_CLOSE, NULL, 0, SYSLOG_FROM_PROC,
+ file->f_cred->user_ns->syslog_ns);
return 0;
}

@@ -34,15 +38,18 @@ static ssize_t kmsg_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
if ((file->f_flags & O_NONBLOCK) &&
- !do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC))
+ !do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC,
+ file->f_cred->user_ns->syslog_ns))
return -EAGAIN;
- return do_syslog(SYSLOG_ACTION_READ, buf, count, SYSLOG_FROM_PROC);
+ return do_syslog(SYSLOG_ACTION_READ, buf, count, SYSLOG_FROM_PROC,
+ file->f_cred->user_ns->syslog_ns);
}

static unsigned int kmsg_poll(struct file *file, poll_table *wait)
{
poll_wait(file, &log_wait, wait);
- if (do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC))
+ if (do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC,
+ file->f_cred->user_ns->syslog_ns))
return POLLIN | POLLRDNORM;
return 0;
}
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 22c7052..29e3f85 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -139,7 +139,6 @@ extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);

extern int printk_delay_msec;
-extern int dmesg_restrict;
extern int kptr_restrict;

extern void wake_up_klogd(void);
diff --git a/include/linux/syslog.h b/include/linux/syslog.h
index 363bc56..fbf0cb6 100644
--- a/include/linux/syslog.h
+++ b/include/linux/syslog.h
@@ -120,7 +120,8 @@ static inline void put_syslog_ns(struct syslog_namespace *ns)
kref_put(&ns->kref, free_syslog_ns);
}

-int do_syslog(int type, char __user *buf, int count, bool from_file);
+int do_syslog(int type, char __user *buf, int count, bool from_file,
+ struct syslog_namespace *ns);

extern struct syslog_namespace init_syslog_ns;
#endif /* _LINUX_SYSLOG_H */
diff --git a/kernel/printk.c b/kernel/printk.c
index fd83ec1..846fef5 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -213,29 +213,8 @@ static DEFINE_RAW_SPINLOCK(logbuf_lock);

#ifdef CONFIG_PRINTK
DECLARE_WAIT_QUEUE_HEAD(log_wait);
-/* the next printk record to read by syslog(READ) or /proc/kmsg */
-static u64 syslog_seq;
-static u32 syslog_idx;
-static enum log_flags syslog_prev;
-static size_t syslog_partial;
-
-/* index and sequence number of the first record stored in the buffer */
-static u64 log_first_seq;
-static u32 log_first_idx;
-
-/* index and sequence number of the next record to store in the buffer */
-static u64 log_next_seq;
-static u32 log_next_idx;
-
-/* the next printk record to write to the console */
-static u64 console_seq;
-static u32 console_idx;
static enum log_flags console_prev;

-/* the next printk record to read after the last 'clear' command */
-static u64 clear_seq;
-static u32 clear_idx;
-
#define PREFIX_MAX 32
#define LOG_LINE_MAX 1024 - PREFIX_MAX

@@ -246,12 +225,8 @@ static u32 clear_idx;
#define LOG_ALIGN __alignof__(struct log)
#endif
#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
+/* this buf only for init_syslog_ns */
static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
-static char *log_buf = __log_buf;
-static u32 log_buf_len = __LOG_BUF_LEN;
-
-/* cpu currently holding logbuf_lock */
-static volatile unsigned int logbuf_cpu = UINT_MAX;

struct syslog_namespace init_syslog_ns = {
.kref = {
@@ -282,23 +257,23 @@ static char *log_dict(const struct log *msg)
}

/* get record by index; idx must point to valid msg */
-static struct log *log_from_idx(u32 idx)
+static struct log *log_from_idx(u32 idx, struct syslog_namespace *ns)
{
- struct log *msg = (struct log *)(log_buf + idx);
+ struct log *msg = (struct log *)(ns->log_buf + idx);

/*
* A length == 0 record is the end of buffer marker. Wrap around and
* read the message at the start of the buffer.
*/
if (!msg->len)
- return (struct log *)log_buf;
+ return (struct log *)ns->log_buf;
return msg;
}

/* get next record; idx must point to valid msg */
-static u32 log_next(u32 idx)
+static u32 log_next(u32 idx, struct syslog_namespace *ns)
{
- struct log *msg = (struct log *)(log_buf + idx);
+ struct log *msg = (struct log *)(ns->log_buf + idx);

/* length == 0 indicates the end of the buffer; wrap */
/*
@@ -307,7 +282,7 @@ static u32 log_next(u32 idx)
* return the one after that.
*/
if (!msg->len) {
- msg = (struct log *)log_buf;
+ msg = (struct log *)ns->log_buf;
return msg->len;
}
return idx + msg->len;
@@ -317,7 +292,8 @@ static u32 log_next(u32 idx)
static void log_store(int facility, int level,
enum log_flags flags, u64 ts_nsec,
const char *dict, u16 dict_len,
- const char *text, u16 text_len)
+ const char *text, u16 text_len,
+ struct syslog_namespace *ns)
{
struct log *msg;
u32 size, pad_len;
@@ -327,34 +303,40 @@ static void log_store(int facility, int level,
pad_len = (-size) & (LOG_ALIGN - 1);
size += pad_len;

- while (log_first_seq < log_next_seq) {
+ while (ns->log_first_seq < ns->log_next_seq) {
u32 free;

- if (log_next_idx > log_first_idx)
- free = max(log_buf_len - log_next_idx, log_first_idx);
+ if (ns->log_next_idx > ns->log_first_idx)
+ free = max(ns->log_buf_len -
+ ns->log_next_idx,
+ ns->log_first_idx);
else
- free = log_first_idx - log_next_idx;
+ free = ns->log_first_idx -
+ ns->log_next_idx;

if (free > size + sizeof(struct log))
break;

/* drop old messages until we have enough contiuous space */
- log_first_idx = log_next(log_first_idx);
- log_first_seq++;
+ ns->log_first_idx =
+ log_next(ns->log_first_idx, ns);
+ ns->log_first_seq++;
}

- if (log_next_idx + size + sizeof(struct log) >= log_buf_len) {
+ if (ns->log_next_idx + size + sizeof(struct log) >=
+ ns->log_buf_len) {
/*
* This message + an additional empty header does not fit
* at the end of the buffer. Add an empty header with len == 0
* to signify a wrap around.
*/
- memset(log_buf + log_next_idx, 0, sizeof(struct log));
- log_next_idx = 0;
+ memset(ns->log_buf + ns->log_next_idx,
+ 0, sizeof(struct log));
+ ns->log_next_idx = 0;
}

/* fill message */
- msg = (struct log *)(log_buf + log_next_idx);
+ msg = (struct log *)(ns->log_buf + ns->log_next_idx);
memcpy(log_text(msg), text, text_len);
msg->text_len = text_len;
memcpy(log_dict(msg), dict, dict_len);
@@ -370,19 +352,14 @@ static void log_store(int facility, int level,
msg->len = sizeof(struct log) + text_len + dict_len + pad_len;

/* insert message */
- log_next_idx += msg->len;
- log_next_seq++;
+ ns->log_next_idx += msg->len;
+ ns->log_next_seq++;
}

-#ifdef CONFIG_SECURITY_DMESG_RESTRICT
-int dmesg_restrict = 1;
-#else
-int dmesg_restrict;
-#endif
-
-static int syslog_action_restricted(int type)
+static int syslog_action_restricted(int type,
+ struct syslog_namespace *ns)
{
- if (dmesg_restrict)
+ if (ns->dmesg_restrict)
return 1;
/*
* Unless restricted, we allow "read all" and "get buffer size"
@@ -392,7 +369,8 @@ static int syslog_action_restricted(int type)
type != SYSLOG_ACTION_SIZE_BUFFER;
}

-static int check_syslog_permissions(int type, bool from_file)
+static int check_syslog_permissions(int type, bool from_file,
+ struct syslog_namespace *ns)
{
/*
* If this is from /proc/kmsg and we've already opened it, then we've
@@ -401,7 +379,7 @@ static int check_syslog_permissions(int type, bool from_file)
if (from_file && type != SYSLOG_ACTION_OPEN)
return 0;

- if (syslog_action_restricted(type)) {
+ if (syslog_action_restricted(type, ns)) {
if (capable(CAP_SYSLOG))
return 0;
/*
@@ -496,6 +474,8 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
char cont = '-';
size_t len;
ssize_t ret;
+ struct syslog_namespace *ns =
+ file->f_cred->user_ns->syslog_ns;

if (!user)
return -EBADF;
@@ -503,32 +483,32 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
ret = mutex_lock_interruptible(&user->lock);
if (ret)
return ret;
- raw_spin_lock_irq(&logbuf_lock);
- while (user->seq == log_next_seq) {
+ raw_spin_lock_irq(&ns->logbuf_lock);
+ while (user->seq == ns->log_next_seq) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);
goto out;
}

- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);
ret = wait_event_interruptible(log_wait,
- user->seq != log_next_seq);
+ user->seq != ns->log_next_seq);
if (ret)
goto out;
- raw_spin_lock_irq(&logbuf_lock);
+ raw_spin_lock_irq(&ns->logbuf_lock);
}

- if (user->seq < log_first_seq) {
+ if (user->seq < ns->log_first_seq) {
/* our last seen message is gone, return error and reset */
- user->idx = log_first_idx;
- user->seq = log_first_seq;
+ user->idx = ns->log_first_idx;
+ user->seq = ns->log_first_seq;
ret = -EPIPE;
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);
goto out;
}

- msg = log_from_idx(user->idx);
+ msg = log_from_idx(user->idx, ns);
ts_usec = msg->ts_nsec;
do_div(ts_usec, 1000);

@@ -589,9 +569,9 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
user->buf[len++] = '\n';
}

- user->idx = log_next(user->idx);
+ user->idx = log_next(user->idx, ns);
user->seq++;
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);

if (len > count) {
ret = -EINVAL;
@@ -612,18 +592,19 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
{
struct devkmsg_user *user = file->private_data;
loff_t ret = 0;
+ struct syslog_namespace *ns = file->f_cred->user_ns->syslog_ns;

if (!user)
return -EBADF;
if (offset)
return -ESPIPE;

- raw_spin_lock_irq(&logbuf_lock);
+ raw_spin_lock_irq(&ns->logbuf_lock);
switch (whence) {
case SEEK_SET:
/* the first record */
- user->idx = log_first_idx;
- user->seq = log_first_seq;
+ user->idx = ns->log_first_idx;
+ user->seq = ns->log_first_seq;
break;
case SEEK_DATA:
/*
@@ -631,18 +612,18 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
* like issued by 'dmesg -c'. Reading /dev/kmsg itself
* changes no global state, and does not clear anything.
*/
- user->idx = clear_idx;
- user->seq = clear_seq;
+ user->idx = ns->clear_idx;
+ user->seq = ns->clear_seq;
break;
case SEEK_END:
/* after the last record */
- user->idx = log_next_idx;
- user->seq = log_next_seq;
+ user->idx = ns->log_next_idx;
+ user->seq = ns->log_next_seq;
break;
default:
ret = -EINVAL;
}
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);
return ret;
}

@@ -650,21 +631,22 @@ static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
{
struct devkmsg_user *user = file->private_data;
int ret = 0;
+ struct syslog_namespace *ns = file->f_cred->user_ns->syslog_ns;

if (!user)
return POLLERR|POLLNVAL;

poll_wait(file, &log_wait, wait);

- raw_spin_lock_irq(&logbuf_lock);
- if (user->seq < log_next_seq) {
+ raw_spin_lock_irq(&ns->logbuf_lock);
+ if (user->seq < ns->log_next_seq) {
/* return error when data has vanished underneath us */
- if (user->seq < log_first_seq)
+ if (user->seq < ns->log_first_seq)
ret = POLLIN|POLLRDNORM|POLLERR|POLLPRI;
else
ret = POLLIN|POLLRDNORM;
}
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);

return ret;
}
@@ -673,13 +655,14 @@ static int devkmsg_open(struct inode *inode, struct file *file)
{
struct devkmsg_user *user;
int err;
+ struct syslog_namespace *ns = file->f_cred->user_ns->syslog_ns;

/* write-only does not need any file context */
if ((file->f_flags & O_ACCMODE) == O_WRONLY)
return 0;

err = check_syslog_permissions(SYSLOG_ACTION_READ_ALL,
- SYSLOG_FROM_READER);
+ SYSLOG_FROM_READER, ns);
if (err)
return err;

@@ -689,10 +672,10 @@ static int devkmsg_open(struct inode *inode, struct file *file)

mutex_init(&user->lock);

- raw_spin_lock_irq(&logbuf_lock);
- user->idx = log_first_idx;
- user->seq = log_first_seq;
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_lock_irq(&ns->logbuf_lock);
+ user->idx = ns->log_first_idx;
+ user->seq = ns->log_first_seq;
+ raw_spin_unlock_irq(&ns->logbuf_lock);

file->private_data = user;
return 0;
@@ -730,10 +713,11 @@ const struct file_operations kmsg_fops = {
*/
void log_buf_kexec_setup(void)
{
- VMCOREINFO_SYMBOL(log_buf);
- VMCOREINFO_SYMBOL(log_buf_len);
- VMCOREINFO_SYMBOL(log_first_idx);
- VMCOREINFO_SYMBOL(log_next_idx);
+ struct syslog_namespace *ns = &init_syslog_ns;
+ VMCOREINFO_SYMBOL(ns->log_buf);
+ VMCOREINFO_SYMBOL(ns->log_buf_len);
+ VMCOREINFO_SYMBOL(ns->log_first_idx);
+ VMCOREINFO_SYMBOL(ns->log_next_idx);
/*
* Export struct log size and field offsets. User space tools can
* parse it and detect any changes to structure down the line.
@@ -753,10 +737,11 @@ static unsigned long __initdata new_log_buf_len;
static int __init log_buf_len_setup(char *str)
{
unsigned size = memparse(str, &str);
+ struct syslog_namespace *ns = &init_syslog_ns;

if (size)
size = roundup_pow_of_two(size);
- if (size > log_buf_len)
+ if (size > ns->log_buf_len)
new_log_buf_len = size;

return 0;
@@ -768,6 +753,7 @@ void __init setup_log_buf(int early)
unsigned long flags;
char *new_log_buf;
int free;
+ struct syslog_namespace *ns = &init_syslog_ns;

if (!new_log_buf_len)
return;
@@ -789,15 +775,15 @@ void __init setup_log_buf(int early)
return;
}

- raw_spin_lock_irqsave(&logbuf_lock, flags);
- log_buf_len = new_log_buf_len;
- log_buf = new_log_buf;
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
+ memcpy(new_log_buf, ns->log_buf, __LOG_BUF_LEN);
+ ns->log_buf_len = new_log_buf_len;
+ ns->log_buf = new_log_buf;
new_log_buf_len = 0;
- free = __LOG_BUF_LEN - log_next_idx;
- memcpy(log_buf, __log_buf, __LOG_BUF_LEN);
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ free = __LOG_BUF_LEN - ns->log_next_idx;
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);

- pr_info("log_buf_len: %d\n", log_buf_len);
+ pr_info("log_buf_len: %d\n", ns->log_buf_len);
pr_info("early log buf free: %d(%d%%)\n",
free, (free * 100) / __LOG_BUF_LEN);
}
@@ -977,7 +963,8 @@ static size_t msg_print_text(const struct log *msg, enum log_flags prev,
return len;
}

-static int syslog_print(char __user *buf, int size)
+static int syslog_print(char __user *buf, int size,
+ struct syslog_namespace *ns)
{
char *text;
struct log *msg;
@@ -991,37 +978,38 @@ static int syslog_print(char __user *buf, int size)
size_t n;
size_t skip;

- raw_spin_lock_irq(&logbuf_lock);
- if (syslog_seq < log_first_seq) {
+ raw_spin_lock_irq(&ns->logbuf_lock);
+ if (ns->syslog_seq < ns->log_first_seq) {
/* messages are gone, move to first one */
- syslog_seq = log_first_seq;
- syslog_idx = log_first_idx;
- syslog_prev = 0;
- syslog_partial = 0;
+ ns->syslog_seq = ns->log_first_seq;
+ ns->syslog_idx = ns->log_first_idx;
+ ns->syslog_prev = 0;
+ ns->syslog_partial = 0;
}
- if (syslog_seq == log_next_seq) {
- raw_spin_unlock_irq(&logbuf_lock);
+ if (ns->syslog_seq == ns->log_next_seq) {
+ raw_spin_unlock_irq(&ns->logbuf_lock);
break;
}

- skip = syslog_partial;
- msg = log_from_idx(syslog_idx);
- n = msg_print_text(msg, syslog_prev, true, text,
+ skip = ns->syslog_partial;
+ msg = log_from_idx(ns->syslog_idx, ns);
+ n = msg_print_text(msg, ns->syslog_prev, true, text,
LOG_LINE_MAX + PREFIX_MAX);
- if (n - syslog_partial <= size) {
+ if (n - ns->syslog_partial <= size) {
/* message fits into buffer, move forward */
- syslog_idx = log_next(syslog_idx);
- syslog_seq++;
- syslog_prev = msg->flags;
- n -= syslog_partial;
- syslog_partial = 0;
+ ns->syslog_idx =
+ log_next(ns->syslog_idx, ns);
+ ns->syslog_seq++;
+ ns->syslog_prev = msg->flags;
+ n -= ns->syslog_partial;
+ ns->syslog_partial = 0;
} else if (!len){
/* partial read(), remember position */
n = size;
- syslog_partial += n;
+ ns->syslog_partial += n;
} else
n = 0;
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);

if (!n)
break;
@@ -1041,7 +1029,8 @@ static int syslog_print(char __user *buf, int size)
return len;
}

-static int syslog_print_all(char __user *buf, int size, bool clear)
+static int syslog_print_all(char __user *buf, int size, bool clear,
+ struct syslog_namespace *ns)
{
char *text;
int len = 0;
@@ -1050,55 +1039,55 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
if (!text)
return -ENOMEM;

- raw_spin_lock_irq(&logbuf_lock);
+ raw_spin_lock_irq(&ns->logbuf_lock);
if (buf) {
u64 next_seq;
u64 seq;
u32 idx;
enum log_flags prev;

- if (clear_seq < log_first_seq) {
+ if (ns->clear_seq < ns->log_first_seq) {
/* messages are gone, move to first available one */
- clear_seq = log_first_seq;
- clear_idx = log_first_idx;
+ ns->clear_seq = ns->log_first_seq;
+ ns->clear_idx = ns->log_first_idx;
}

/*
* Find first record that fits, including all following records,
* into the user-provided buffer for this dump.
*/
- seq = clear_seq;
- idx = clear_idx;
+ seq = ns->clear_seq;
+ idx = ns->clear_idx;
prev = 0;
- while (seq < log_next_seq) {
- struct log *msg = log_from_idx(idx);
+ while (seq < ns->log_next_seq) {
+ struct log *msg = log_from_idx(idx, ns);

len += msg_print_text(msg, prev, true, NULL, 0);
prev = msg->flags;
- idx = log_next(idx);
+ idx = log_next(idx, ns);
seq++;
}

/* move first record forward until length fits into the buffer */
- seq = clear_seq;
- idx = clear_idx;
+ seq = ns->clear_seq;
+ idx = ns->clear_idx;
prev = 0;
- while (len > size && seq < log_next_seq) {
- struct log *msg = log_from_idx(idx);
+ while (len > size && seq < ns->log_next_seq) {
+ struct log *msg = log_from_idx(idx, ns);

len -= msg_print_text(msg, prev, true, NULL, 0);
prev = msg->flags;
- idx = log_next(idx);
+ idx = log_next(idx, ns);
seq++;
}

/* last message fitting into this dump */
- next_seq = log_next_seq;
+ next_seq = ns->log_next_seq;

len = 0;
prev = 0;
while (len >= 0 && seq < next_seq) {
- struct log *msg = log_from_idx(idx);
+ struct log *msg = log_from_idx(idx, ns);
int textlen;

textlen = msg_print_text(msg, prev, true, text,
@@ -1107,43 +1096,44 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
len = textlen;
break;
}
- idx = log_next(idx);
+ idx = log_next(idx, ns);
seq++;
prev = msg->flags;

- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);
if (copy_to_user(buf + len, text, textlen))
len = -EFAULT;
else
len += textlen;
- raw_spin_lock_irq(&logbuf_lock);
+ raw_spin_lock_irq(&ns->logbuf_lock);

- if (seq < log_first_seq) {
+ if (seq < ns->log_first_seq) {
/* messages are gone, move to next one */
- seq = log_first_seq;
- idx = log_first_idx;
+ seq = ns->log_first_seq;
+ idx = ns->log_first_idx;
prev = 0;
}
}
}

if (clear) {
- clear_seq = log_next_seq;
- clear_idx = log_next_idx;
+ ns->clear_seq = ns->log_next_seq;
+ ns->clear_idx = ns->log_next_idx;
}
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);

kfree(text);
return len;
}

-int do_syslog(int type, char __user *buf, int len, bool from_file)
+int do_syslog(int type, char __user *buf, int len, bool from_file,
+ struct syslog_namespace *ns)
{
bool clear = false;
static int saved_console_loglevel = -1;
int error;

- error = check_syslog_permissions(type, from_file);
+ error = check_syslog_permissions(type, from_file, ns);
if (error)
goto out;

@@ -1168,10 +1158,10 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
goto out;
}
error = wait_event_interruptible(log_wait,
- syslog_seq != log_next_seq);
+ ns->syslog_seq != ns->log_next_seq);
if (error)
goto out;
- error = syslog_print(buf, len);
+ error = syslog_print(buf, len, ns);
break;
/* Read/clear last kernel messages */
case SYSLOG_ACTION_READ_CLEAR:
@@ -1189,11 +1179,11 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
error = -EFAULT;
goto out;
}
- error = syslog_print_all(buf, len, clear);
+ error = syslog_print_all(buf, len, clear, ns);
break;
/* Clear ring buffer */
case SYSLOG_ACTION_CLEAR:
- syslog_print_all(NULL, 0, true);
+ syslog_print_all(NULL, 0, true, ns);
break;
/* Disable logging to console */
case SYSLOG_ACTION_CONSOLE_OFF:
@@ -1222,13 +1212,13 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
break;
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD:
- raw_spin_lock_irq(&logbuf_lock);
- if (syslog_seq < log_first_seq) {
+ raw_spin_lock_irq(&ns->logbuf_lock);
+ if (ns->syslog_seq < ns->log_first_seq) {
/* messages are gone, move to first one */
- syslog_seq = log_first_seq;
- syslog_idx = log_first_idx;
- syslog_prev = 0;
- syslog_partial = 0;
+ ns->syslog_seq = ns->log_first_seq;
+ ns->syslog_idx = ns->log_first_idx;
+ ns->syslog_prev = 0;
+ ns->syslog_partial = 0;
}
if (from_file) {
/*
@@ -1236,28 +1226,28 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
* for pending data, not the size; return the count of
* records, not the length.
*/
- error = log_next_idx - syslog_idx;
+ error = ns->log_next_idx - ns->syslog_idx;
} else {
- u64 seq = syslog_seq;
- u32 idx = syslog_idx;
- enum log_flags prev = syslog_prev;
+ u64 seq = ns->syslog_seq;
+ u32 idx = ns->syslog_idx;
+ enum log_flags prev = ns->syslog_prev;

error = 0;
- while (seq < log_next_seq) {
- struct log *msg = log_from_idx(idx);
+ while (seq < ns->log_next_seq) {
+ struct log *msg = log_from_idx(idx, ns);

error += msg_print_text(msg, prev, true, NULL, 0);
- idx = log_next(idx);
+ idx = log_next(idx, ns);
seq++;
prev = msg->flags;
}
- error -= syslog_partial;
+ error -= ns->syslog_partial;
}
- raw_spin_unlock_irq(&logbuf_lock);
+ raw_spin_unlock_irq(&ns->logbuf_lock);
break;
/* Size of the log buffer */
case SYSLOG_ACTION_SIZE_BUFFER:
- error = log_buf_len;
+ error = ns->log_buf_len;
break;
default:
error = -EINVAL;
@@ -1269,7 +1259,8 @@ out:

SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
{
- return do_syslog(type, buf, len, SYSLOG_FROM_READER);
+ return do_syslog(type, buf, len, SYSLOG_FROM_READER,
+ current_user_ns()->syslog_ns);
}

/*
@@ -1307,7 +1298,7 @@ static void call_console_drivers(int level, const char *text, size_t len)
* every 10 seconds, to leave time for slow consoles to print a
* full oops.
*/
-static void zap_locks(void)
+static void zap_locks(struct syslog_namespace *ns)
{
static unsigned long oops_timestamp;

@@ -1319,7 +1310,7 @@ static void zap_locks(void)

debug_locks_off();
/* If a crash is occurring, make sure we can't deadlock */
- raw_spin_lock_init(&logbuf_lock);
+ raw_spin_lock_init(&ns->logbuf_lock);
/* And make sure that we print immediately */
sema_init(&console_sem, 1);
}
@@ -1359,8 +1350,9 @@ static inline int can_use_console(unsigned int cpu)
* interrupts disabled. It should return with 'lockbuf_lock'
* released but interrupts still disabled.
*/
-static int console_trylock_for_printk(unsigned int cpu)
- __releases(&logbuf_lock)
+static int console_trylock_for_printk(unsigned int cpu,
+ struct syslog_namespace *ns)
+ __releases(&ns->logbuf_lock)
{
int retval = 0, wake = 0;

@@ -1379,8 +1371,8 @@ static int console_trylock_for_printk(unsigned int cpu)
retval = 0;
}
}
- logbuf_cpu = UINT_MAX;
- raw_spin_unlock(&logbuf_lock);
+ ns->logbuf_cpu = UINT_MAX;
+ raw_spin_unlock(&ns->logbuf_lock);
if (wake)
up(&console_sem);
return retval;
@@ -1418,7 +1410,7 @@ static struct cont {
bool flushed:1; /* buffer sealed and committed */
} cont;

-static void cont_flush(enum log_flags flags)
+static void cont_flush(enum log_flags flags, struct syslog_namespace *ns)
{
if (cont.flushed)
return;
@@ -1432,7 +1424,7 @@ static void cont_flush(enum log_flags flags)
* line. LOG_NOCONS suppresses a duplicated output.
*/
log_store(cont.facility, cont.level, flags | LOG_NOCONS,
- cont.ts_nsec, NULL, 0, cont.buf, cont.len);
+ cont.ts_nsec, NULL, 0, cont.buf, cont.len, ns);
cont.flags = flags;
cont.flushed = true;
} else {
@@ -1441,19 +1433,20 @@ static void cont_flush(enum log_flags flags)
* just submit it to the store and free the buffer.
*/
log_store(cont.facility, cont.level, flags, 0,
- NULL, 0, cont.buf, cont.len);
+ NULL, 0, cont.buf, cont.len, ns);
cont.len = 0;
}
}

-static bool cont_add(int facility, int level, const char *text, size_t len)
+static bool cont_add(int facility, int level, const char *text, size_t len,
+ struct syslog_namespace *ns)
{
if (cont.len && cont.flushed)
return false;

if (cont.len + len > sizeof(cont.buf)) {
/* the line gets too long, split it up in separate records */
- cont_flush(LOG_CONT);
+ cont_flush(LOG_CONT, ns);
return false;
}

@@ -1471,7 +1464,7 @@ static bool cont_add(int facility, int level, const char *text, size_t len)
cont.len += len;

if (cont.len > (sizeof(cont.buf) * 80) / 100)
- cont_flush(LOG_CONT);
+ cont_flush(LOG_CONT, ns);

return true;
}
@@ -1516,6 +1509,7 @@ asmlinkage int vprintk_emit(int facility, int level,
unsigned long flags;
int this_cpu;
int printed_len = 0;
+ struct syslog_namespace *ns = &init_syslog_ns;

boot_delay_msec(level);
printk_delay();
@@ -1527,7 +1521,7 @@ asmlinkage int vprintk_emit(int facility, int level,
/*
* Ouch, printk recursed into itself!
*/
- if (unlikely(logbuf_cpu == this_cpu)) {
+ if (unlikely(ns->logbuf_cpu == this_cpu)) {
/*
* If a crash is occurring during printk() on this CPU,
* then try to get the crash message out but make sure
@@ -1539,12 +1533,12 @@ asmlinkage int vprintk_emit(int facility, int level,
recursion_bug = 1;
goto out_restore_irqs;
}
- zap_locks();
+ zap_locks(ns);
}

lockdep_off();
- raw_spin_lock(&logbuf_lock);
- logbuf_cpu = this_cpu;
+ raw_spin_lock(&ns->logbuf_lock);
+ ns->logbuf_cpu = this_cpu;

if (recursion_bug) {
static const char recursion_msg[] =
@@ -1554,7 +1548,7 @@ asmlinkage int vprintk_emit(int facility, int level,
printed_len += strlen(recursion_msg);
/* emit KERN_CRIT message */
log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
- NULL, 0, recursion_msg, printed_len);
+ NULL, 0, recursion_msg, printed_len, ns);
}

/*
@@ -1601,12 +1595,12 @@ asmlinkage int vprintk_emit(int facility, int level,
* or another task also prints continuation lines.
*/
if (cont.len && (lflags & LOG_PREFIX || cont.owner != current))
- cont_flush(LOG_NEWLINE);
+ cont_flush(LOG_NEWLINE, ns);

/* buffer line if possible, otherwise store it right away */
- if (!cont_add(facility, level, text, text_len))
+ if (!cont_add(facility, level, text, text_len, ns))
log_store(facility, level, lflags | LOG_CONT, 0,
- dict, dictlen, text, text_len);
+ dict, dictlen, text, text_len, ns);
} else {
bool stored = false;

@@ -1618,13 +1612,14 @@ asmlinkage int vprintk_emit(int facility, int level,
*/
if (cont.len && cont.owner == current) {
if (!(lflags & LOG_PREFIX))
- stored = cont_add(facility, level, text, text_len);
- cont_flush(LOG_NEWLINE);
+ stored = cont_add(facility, level, text,
+ text_len, ns);
+ cont_flush(LOG_NEWLINE, ns);
}

if (!stored)
log_store(facility, level, lflags, 0,
- dict, dictlen, text, text_len);
+ dict, dictlen, text, text_len, ns);
}
printed_len += text_len;

@@ -1636,7 +1631,7 @@ asmlinkage int vprintk_emit(int facility, int level,
* The console_trylock_for_printk() function will release 'logbuf_lock'
* regardless of whether it actually gets the console semaphore or not.
*/
- if (console_trylock_for_printk(this_cpu))
+ if (console_trylock_for_printk(this_cpu, ns))
console_unlock();

lockdep_on();
@@ -1995,12 +1990,13 @@ int is_console_locked(void)
return console_locked;
}

-static void console_cont_flush(char *text, size_t size)
+static void console_cont_flush(char *text, size_t size,
+ struct syslog_namespace *ns)
{
unsigned long flags;
size_t len;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);

if (!cont.len)
goto out;
@@ -2010,18 +2006,18 @@ static void console_cont_flush(char *text, size_t size)
* busy. The earlier ones need to be printed before this one, we
* did not flush any fragment so far, so just let it queue up.
*/
- if (console_seq < log_next_seq && !cont.cons)
+ if (ns->console_seq < ns->log_next_seq && !cont.cons)
goto out;

len = cont_print_text(text, size);
- raw_spin_unlock(&logbuf_lock);
+ raw_spin_unlock(&ns->logbuf_lock);
stop_critical_timings();
call_console_drivers(cont.level, text, len);
start_critical_timings();
local_irq_restore(flags);
return;
out:
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
}

/**
@@ -2045,6 +2041,7 @@ void console_unlock(void)
unsigned long flags;
bool wake_klogd = false;
bool retry;
+ struct syslog_namespace *ns = &init_syslog_ns;

if (console_suspended) {
up(&console_sem);
@@ -2054,37 +2051,38 @@ void console_unlock(void)
console_may_schedule = 0;

/* flush buffered message fragment immediately to console */
- console_cont_flush(text, sizeof(text));
+ console_cont_flush(text, sizeof(text), ns);
again:
for (;;) {
struct log *msg;
size_t len;
int level;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
- if (seen_seq != log_next_seq) {
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
+ if (seen_seq != ns->log_next_seq) {
wake_klogd = true;
- seen_seq = log_next_seq;
+ seen_seq = ns->log_next_seq;
}

- if (console_seq < log_first_seq) {
+ if (ns->console_seq < ns->log_first_seq) {
/* messages are gone, move to first one */
- console_seq = log_first_seq;
- console_idx = log_first_idx;
+ ns->console_seq = ns->log_first_seq;
+ ns->console_idx = ns->log_first_idx;
console_prev = 0;
}
skip:
- if (console_seq == log_next_seq)
+ if (ns->console_seq == ns->log_next_seq)
break;

- msg = log_from_idx(console_idx);
+ msg = log_from_idx(ns->console_idx, ns);
if (msg->flags & LOG_NOCONS) {
/*
* Skip record we have buffered and already printed
* directly to the console when we received it.
*/
- console_idx = log_next(console_idx);
- console_seq++;
+ ns->console_idx =
+ log_next(ns->console_idx, ns);
+ ns->console_seq++;
/*
* We will get here again when we register a new
* CON_PRINTBUFFER console. Clear the flag so we
@@ -2098,10 +2096,11 @@ skip:
level = msg->level;
len = msg_print_text(msg, console_prev, false,
text, sizeof(text));
- console_idx = log_next(console_idx);
- console_seq++;
+ ns->console_idx =
+ log_next(ns->console_idx, ns);
+ ns->console_seq++;
console_prev = msg->flags;
- raw_spin_unlock(&logbuf_lock);
+ raw_spin_unlock(&ns->logbuf_lock);

stop_critical_timings(); /* don't trace print latency */
call_console_drivers(level, text, len);
@@ -2115,7 +2114,7 @@ skip:
if (unlikely(exclusive_console))
exclusive_console = NULL;

- raw_spin_unlock(&logbuf_lock);
+ raw_spin_unlock(&ns->logbuf_lock);

up(&console_sem);

@@ -2125,9 +2124,9 @@ skip:
* there's a new owner and the console_unlock() from them will do the
* flush, no worries.
*/
- raw_spin_lock(&logbuf_lock);
- retry = console_seq != log_next_seq;
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_lock(&ns->logbuf_lock);
+ retry = ns->console_seq != ns->log_next_seq;
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);

if (retry && console_trylock())
goto again;
@@ -2252,6 +2251,7 @@ void register_console(struct console *newcon)
int i;
unsigned long flags;
struct console *bcon = NULL;
+ struct syslog_namespace *ns = &init_syslog_ns;

/*
* before we register a new CON_BOOT console, make sure we don't
@@ -2361,11 +2361,11 @@ void register_console(struct console *newcon)
* console_unlock(); will print out the buffered messages
* for us.
*/
- raw_spin_lock_irqsave(&logbuf_lock, flags);
- console_seq = syslog_seq;
- console_idx = syslog_idx;
- console_prev = syslog_prev;
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
+ ns->console_seq = ns->syslog_seq;
+ ns->console_idx = ns->syslog_idx;
+ console_prev = ns->syslog_prev;
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
/*
* We're about to replay the log buffer. Only do this to the
* just-registered console to avoid excessive message spam to
@@ -2627,6 +2627,7 @@ void kmsg_dump(enum kmsg_dump_reason reason)
{
struct kmsg_dumper *dumper;
unsigned long flags;
+ struct syslog_namespace *ns = &init_syslog_ns;

if ((reason > KMSG_DUMP_OOPS) && !always_kmsg_dump)
return;
@@ -2639,12 +2640,12 @@ void kmsg_dump(enum kmsg_dump_reason reason)
/* initialize iterator with data about the stored records */
dumper->active = true;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
- dumper->cur_seq = clear_seq;
- dumper->cur_idx = clear_idx;
- dumper->next_seq = log_next_seq;
- dumper->next_idx = log_next_idx;
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
+ dumper->cur_seq = ns->clear_seq;
+ dumper->cur_idx = ns->clear_idx;
+ dumper->next_seq = ns->log_next_seq;
+ dumper->next_idx = ns->log_next_idx;
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);

/* invoke dumper which will iterate over records */
dumper->dump(dumper, reason);
@@ -2680,24 +2681,25 @@ bool kmsg_dump_get_line_nolock(struct kmsg_dumper *dumper, bool syslog,
struct log *msg;
size_t l = 0;
bool ret = false;
+ struct syslog_namespace *ns = &init_syslog_ns;

if (!dumper->active)
goto out;

- if (dumper->cur_seq < log_first_seq) {
+ if (dumper->cur_seq < ns->log_first_seq) {
/* messages are gone, move to first available one */
- dumper->cur_seq = log_first_seq;
- dumper->cur_idx = log_first_idx;
+ dumper->cur_seq = ns->log_first_seq;
+ dumper->cur_idx = ns->log_first_idx;
}

/* last entry */
- if (dumper->cur_seq >= log_next_seq)
+ if (dumper->cur_seq >= ns->log_next_seq)
goto out;

- msg = log_from_idx(dumper->cur_idx);
+ msg = log_from_idx(dumper->cur_idx, ns);
l = msg_print_text(msg, 0, syslog, line, size);

- dumper->cur_idx = log_next(dumper->cur_idx);
+ dumper->cur_idx = log_next(dumper->cur_idx, ns);
dumper->cur_seq++;
ret = true;
out:
@@ -2728,10 +2730,11 @@ bool kmsg_dump_get_line(struct kmsg_dumper *dumper, bool syslog,
{
unsigned long flags;
bool ret;
+ struct syslog_namespace *ns = &init_syslog_ns;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
ret = kmsg_dump_get_line_nolock(dumper, syslog, line, size, len);
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);

return ret;
}
@@ -2767,20 +2770,21 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
enum log_flags prev;
size_t l = 0;
bool ret = false;
+ struct syslog_namespace *ns = &init_syslog_ns;

if (!dumper->active)
goto out;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
- if (dumper->cur_seq < log_first_seq) {
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
+ if (dumper->cur_seq < ns->log_first_seq) {
/* messages are gone, move to first available one */
- dumper->cur_seq = log_first_seq;
- dumper->cur_idx = log_first_idx;
+ dumper->cur_seq = ns->log_first_seq;
+ dumper->cur_idx = ns->log_first_idx;
}

/* last entry */
if (dumper->cur_seq >= dumper->next_seq) {
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
goto out;
}

@@ -2789,10 +2793,10 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
idx = dumper->cur_idx;
prev = 0;
while (seq < dumper->next_seq) {
- struct log *msg = log_from_idx(idx);
+ struct log *msg = log_from_idx(idx, ns);

l += msg_print_text(msg, prev, true, NULL, 0);
- idx = log_next(idx);
+ idx = log_next(idx, ns);
seq++;
prev = msg->flags;
}
@@ -2802,10 +2806,10 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
idx = dumper->cur_idx;
prev = 0;
while (l > size && seq < dumper->next_seq) {
- struct log *msg = log_from_idx(idx);
+ struct log *msg = log_from_idx(idx, ns);

l -= msg_print_text(msg, prev, true, NULL, 0);
- idx = log_next(idx);
+ idx = log_next(idx, ns);
seq++;
prev = msg->flags;
}
@@ -2817,10 +2821,10 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
l = 0;
prev = 0;
while (seq < dumper->next_seq) {
- struct log *msg = log_from_idx(idx);
+ struct log *msg = log_from_idx(idx, ns);

l += msg_print_text(msg, prev, syslog, buf + l, size - l);
- idx = log_next(idx);
+ idx = log_next(idx, ns);
seq++;
prev = msg->flags;
}
@@ -2828,7 +2832,7 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
dumper->next_seq = next_seq;
dumper->next_idx = next_idx;
ret = true;
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
out:
if (len)
*len = l;
@@ -2848,10 +2852,12 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
*/
void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
{
- dumper->cur_seq = clear_seq;
- dumper->cur_idx = clear_idx;
- dumper->next_seq = log_next_seq;
- dumper->next_idx = log_next_idx;
+ struct syslog_namespace *ns = &init_syslog_ns;
+
+ dumper->cur_seq = ns->clear_seq;
+ dumper->cur_idx = ns->clear_idx;
+ dumper->next_seq = ns->log_next_seq;
+ dumper->next_idx = ns->log_next_idx;
}

/**
@@ -2865,10 +2871,11 @@ void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
void kmsg_dump_rewind(struct kmsg_dumper *dumper)
{
unsigned long flags;
+ struct syslog_namespace *ns = &init_syslog_ns;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
+ raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
kmsg_dump_rewind_nolock(dumper);
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
}
EXPORT_SYMBOL_GPL(kmsg_dump_rewind);

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ac09d98..0954b09 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -62,6 +62,7 @@
#include <linux/capability.h>
#include <linux/binfmts.h>
#include <linux/sched/sysctl.h>
+#include <linux/syslog.h>

#include <asm/uaccess.h>
#include <asm/processor.h>
@@ -773,7 +774,7 @@ static struct ctl_table kern_table[] = {
},
{
.procname = "dmesg_restrict",
- .data = &dmesg_restrict,
+ .data = &init_syslog_ns.dmesg_restrict,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax_sysadmin,
--
1.8.2.2

2013-07-29 02:34:32

by Rui Xiang

Subject: [PATCH 8/9] syslog_ns: implement ns_printk for specific syslog_ns

Add a new interface named ns_printk, which takes a
syslog namespace parameter, ns. Logs that belong to a
container can be printed through ns_printk.

Signed-off-by: Rui Xiang <[email protected]>
---
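For illustration only, the interface described in the changelog above can
be modelled in user space roughly as follows. This is a minimal sketch, not
the kernel code: every *_model name is hypothetical, the flat buffer stands
in for the real record-based log_buf, and current_ns_model stands in for
current_user_ns()->syslog_ns.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for struct syslog_namespace. */
struct syslog_ns_model {
	char buf[256];	/* stand-in for the per-namespace log_buf */
	size_t len;	/* bytes currently stored in buf */
};

/* Stand-in for init_syslog_ns (the host log). */
static struct syslog_ns_model host_ns_model;

/*
 * Stand-in for current_user_ns()->syslog_ns. The caller sets it by hand
 * in this model; the patch derives it from the current task.
 */
static struct syslog_ns_model *current_ns_model = &host_ns_model;

/* Model of ns_printk(): format into the buffer of the chosen namespace. */
static int ns_printk_model(struct syslog_ns_model *ns, const char *fmt, ...)
{
	va_list args;
	size_t room;
	int n;

	if (!ns)	/* NULL selects the caller's namespace */
		ns = current_ns_model;

	room = sizeof(ns->buf) - ns->len;	/* always >= 1 */

	va_start(args, fmt);
	n = vsnprintf(ns->buf + ns->len, room, fmt, args);
	va_end(args);

	if (n < 0)
		return n;
	/* vsnprintf reports the untruncated length; clamp to what fit. */
	if ((size_t)n >= room)
		n = (int)(room - 1);
	ns->len += (size_t)n;
	return n;
}
```

A message sent with an explicit namespace lands only in that namespace's
buffer; the host buffer is untouched.
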
include/linux/printk.h | 4 ++++
kernel/printk.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 29e3f85..bf83ad9 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -6,6 +6,7 @@
#include <linux/kern_levels.h>
#include <linux/linkage.h>

+struct syslog_namespace;
extern const char linux_banner[];
extern const char linux_proc_banner[];

@@ -123,6 +124,9 @@ asmlinkage int printk_emit(int facility, int level,
asmlinkage __printf(1, 2) __cold
int printk(const char *fmt, ...);

+asmlinkage __printf(2, 3) __cold
+int ns_printk(struct syslog_namespace *ns, const char *fmt, ...);
+
/*
* Special printk facility for scheduler use only, _DO_NOT_USE_ !
*/
diff --git a/kernel/printk.c b/kernel/printk.c
index 6b561db..56a8b27 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1554,9 +1554,10 @@ static size_t cont_print_text(char *text, size_t size)
return textlen;
}

-asmlinkage int vprintk_emit(int facility, int level,
- const char *dict, size_t dictlen,
- const char *fmt, va_list args)
+static int ns_vprintk_emit(int facility, int level,
+ const char *dict, size_t dictlen,
+ const char *fmt, va_list args,
+ struct syslog_namespace *ns)
{
static int recursion_bug;
static char textbuf[LOG_LINE_MAX];
@@ -1566,7 +1567,6 @@ asmlinkage int vprintk_emit(int facility, int level,
unsigned long flags;
int this_cpu;
int printed_len = 0;
- struct syslog_namespace *ns = &init_syslog_ns;

boot_delay_msec(level);
printk_delay();
@@ -1697,6 +1697,14 @@ out_restore_irqs:

return printed_len;
}
+
+asmlinkage int vprintk_emit(int facility, int level,
+ const char *dict, size_t dictlen,
+ const char *fmt, va_list args)
+{
+ return ns_vprintk_emit(facility, level, dict, dictlen, fmt, args,
+ &init_syslog_ns);
+}
EXPORT_SYMBOL(vprintk_emit);

asmlinkage int vprintk(const char *fmt, va_list args)
@@ -1762,6 +1770,43 @@ asmlinkage int printk(const char *fmt, ...)
}
EXPORT_SYMBOL(printk);

+/**
+ * ns_printk - print a kernel message in syslog_ns
+ * @ns: syslog namespace
+ * @fmt: format string
+ *
+ * This is ns_printk().
+ * It can be called from container context. The ns parameter names the
+ * target syslog namespace (NULL selects the caller's), because some
+ * logs are generated not by the host but by a container.
+ *
+ * See the vsnprintf() documentation for format string extensions over C99.
+ **/
+asmlinkage int ns_printk(struct syslog_namespace *ns,
+ const char *fmt, ...)
+{
+ va_list args;
+ int r;
+
+ if (!ns)
+ ns = current_user_ns()->syslog_ns;
+
+#ifdef CONFIG_KGDB_KDB
+ if (unlikely(kdb_trap_printk)) {
+ va_start(args, fmt);
+ r = vkdb_printf(fmt, args);
+ va_end(args);
+ return r;
+ }
+#endif
+ va_start(args, fmt);
+ r = ns_vprintk_emit(0, -1, NULL, 0, fmt, args, ns);
+ va_end(args);
+
+ return r;
+}
+EXPORT_SYMBOL(ns_printk);
+
#else /* CONFIG_PRINTK */

#define LOG_LINE_MAX 0
--
1.8.2.2

2013-07-29 02:34:18

by Rui Xiang

Subject: [PATCH 3/9] syslog_ns: add init syslog_ns for global syslog

Add init_syslog_ns to manage the host log buffer, and
initialize its fields to match the global variables.

By default, printk in the kernel continues to target
init_syslog_ns, so the buffer of the init ns is the
same as the original global buffer.

Signed-off-by: Rui Xiang <[email protected]>
---
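For illustration only, the reference flow this patch sets up (a new user
ns takes a reference on its parent's syslog ns, and drops it when freed)
can be modelled in user space as below. A minimal sketch: all *_model
names are hypothetical, and a plain int replaces struct kref.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct syslog_namespace. */
struct syslog_ns_model {
	int refcount;	/* stand-in for struct kref */
};

/* Hypothetical stand-in for struct user_namespace. */
struct user_ns_model {
	struct user_ns_model *parent;
	struct syslog_ns_model *syslog_ns;
};

/* Starts at 2, mirroring ATOMIC_INIT(2) in the patch. */
static struct syslog_ns_model init_syslog_model = { .refcount = 2 };

static struct user_ns_model init_user_model = {
	.parent = NULL,
	.syslog_ns = &init_syslog_model,
};

static struct syslog_ns_model *get_syslog_model(struct syslog_ns_model *ns)
{
	if (ns)
		ns->refcount++;
	return ns;
}

static void put_syslog_model(struct syslog_ns_model *ns)
{
	if (ns)
		ns->refcount--;	/* the real code frees the ns at zero */
}

/* Models the line added to create_user_ns(). */
static void create_user_model(struct user_ns_model *child,
			      struct user_ns_model *parent)
{
	child->parent = parent;
	child->syslog_ns = get_syslog_model(parent->syslog_ns);
}

/* Models the line added to free_user_ns(). */
static void free_user_model(struct user_ns_model *ns)
{
	put_syslog_model(ns->syslog_ns);
	ns->syslog_ns = NULL;
}
```
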
include/linux/syslog.h | 1 +
include/linux/user_namespace.h | 1 +
kernel/printk.c | 18 ++++++++++++++++++
kernel/user.c | 3 +++
kernel/user_namespace.c | 4 ++++
5 files changed, 27 insertions(+)

diff --git a/include/linux/syslog.h b/include/linux/syslog.h
index 62ce47f..363bc56 100644
--- a/include/linux/syslog.h
+++ b/include/linux/syslog.h
@@ -122,4 +122,5 @@ static inline void put_syslog_ns(struct syslog_namespace *ns)

int do_syslog(int type, char __user *buf, int count, bool from_file);

+extern struct syslog_namespace init_syslog_ns;
#endif /* _LINUX_SYSLOG_H */
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index ce2de5b..4b5e190 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -32,6 +32,7 @@ struct user_namespace {
};

extern struct user_namespace init_user_ns;
+extern struct syslog_namespace init_syslog_ns;

#ifdef CONFIG_USER_NS

diff --git a/kernel/printk.c b/kernel/printk.c
index 7e544bf..fd83ec1 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -45,6 +45,8 @@
#include <linux/poll.h>
#include <linux/irq_work.h>
#include <linux/utsname.h>
+#include <linux/cred.h>
+#include <linux/user_namespace.h>

#include <asm/uaccess.h>

@@ -251,6 +253,22 @@ static u32 log_buf_len = __LOG_BUF_LEN;
/* cpu currently holding logbuf_lock */
static volatile unsigned int logbuf_cpu = UINT_MAX;

+struct syslog_namespace init_syslog_ns = {
+ .kref = {
+ .refcount = ATOMIC_INIT(2),
+ },
+ .logbuf_lock = __RAW_SPIN_LOCK_UNLOCKED(init_syslog_ns.logbuf_lock),
+ .logbuf_cpu = UINT_MAX,
+ .log_buf_len = __LOG_BUF_LEN,
+ .log_buf = __log_buf,
+ .owner = &init_user_ns,
+#ifdef CONFIG_SECURITY_DMESG_RESTRICT
+ .dmesg_restrict = 1,
+#else
+ .dmesg_restrict = 0,
+#endif
+};
+
/* human readable text of the record */
static char *log_text(const struct log *msg)
{
diff --git a/kernel/user.c b/kernel/user.c
index 69b4c3d..0bbd4f7 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -18,6 +18,8 @@
#include <linux/user_namespace.h>
#include <linux/proc_ns.h>

+struct syslog_namespace;
+
/*
* userns count is 1 for root user, 1 for init_uts_ns,
* and 1 for... ?
@@ -53,6 +55,7 @@ struct user_namespace init_user_ns = {
.proc_inum = PROC_USER_INIT_INO,
.may_mount_sysfs = true,
.may_mount_proc = true,
+ .syslog_ns = &init_syslog_ns,
};
EXPORT_SYMBOL_GPL(init_user_ns);

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index d8c30db..20f402f 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -22,6 +22,7 @@
#include <linux/ctype.h>
#include <linux/projid.h>
#include <linux/fs_struct.h>
+#include <linux/syslog.h>

static struct kmem_cache *user_ns_cachep __read_mostly;

@@ -95,6 +96,8 @@ int create_user_ns(struct cred *new)
ns->owner = owner;
ns->group = group;

+ ns->syslog_ns = get_syslog_ns(parent_ns->syslog_ns);
+
set_cred_user_ns(new, ns);

update_mnt_policy(ns);
@@ -122,6 +125,7 @@ void free_user_ns(struct user_namespace *ns)
struct user_namespace *parent;

do {
+ put_syslog_ns(ns->syslog_ns);
parent = ns->parent;
proc_free_inum(ns->proc_inum);
kmem_cache_free(user_ns_cachep, ns);
--
1.8.2.2

2013-07-29 02:34:30

by Rui Xiang

Subject: [PATCH 7/9] syslog_ns: implement function for creating syslog ns

Add a create_syslog_ns function to create a new syslog ns.
A new user_ns must be created before a new syslog ns can
be created. The new syslog_ns is then tied to the current
user_ns, replacing the original syslog_ns inherited from
the parent user_ns.

Add a new syslog flag, SYSLOG_ACTION_NEW_NS, which
implements a new command (11) of the __NR_syslog system
call. Through that command, a new syslog ns can be
created from user space.

Signed-off-by: Rui Xiang <[email protected]>
---
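For illustration only, the checks and the swap described in the changelog
above can be modelled in user space as below. A minimal sketch under stated
assumptions: all *_model names are hypothetical, "parent == NULL" stands in
for the comparison with &init_user_ns, a boolean flag stands in for
ns_capable(userns, CAP_SYSLOG), and calloc() stands in for kzalloc().

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct syslog_namespace. */
struct syslog_ns_model {
	int refcount;
	char *log_buf;
	size_t log_buf_len;
};

/* Hypothetical stand-in for struct user_namespace. */
struct user_ns_model {
	struct user_ns_model *parent;	/* NULL marks the init user ns */
	struct syslog_ns_model *syslog_ns;
	int has_cap_syslog;		/* stand-in for ns_capable() */
};

static int create_syslog_model(struct user_ns_model *userns, size_t buf_len)
{
	struct syslog_ns_model *newns;

	/* Only a non-init user ns that still shares its parent's log may split. */
	if (!userns->parent ||
	    userns->syslog_ns != userns->parent->syslog_ns)
		return -EINVAL;

	if (!userns->has_cap_syslog)
		return -EPERM;

	newns = calloc(1, sizeof(*newns));
	if (!newns)
		return -ENOMEM;

	newns->log_buf = calloc(1, buf_len);
	if (!newns->log_buf) {
		free(newns);
		return -ENOMEM;
	}
	newns->log_buf_len = buf_len;
	newns->refcount = 1;

	/* Drop the reference on the inherited ns, then install the new one. */
	userns->syslog_ns->refcount--;
	userns->syslog_ns = newns;
	return 0;
}
```

In this model a second call on the same user ns returns -EINVAL, because
the user ns no longer shares a syslog ns with its parent.
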
include/linux/syslog.h | 2 ++
kernel/printk.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+)

diff --git a/include/linux/syslog.h b/include/linux/syslog.h
index fbf0cb6..df57c21 100644
--- a/include/linux/syslog.h
+++ b/include/linux/syslog.h
@@ -46,6 +46,8 @@
#define SYSLOG_ACTION_SIZE_UNREAD 9
/* Return size of the log buffer */
#define SYSLOG_ACTION_SIZE_BUFFER 10
+/* Create a new syslog ns */
+#define SYSLOG_ACTION_NEW_NS 11

#define SYSLOG_FROM_READER 0
#define SYSLOG_FROM_PROC 1
diff --git a/kernel/printk.c b/kernel/printk.c
index fd2d600..6b561db 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -384,6 +384,10 @@ static int check_syslog_permissions(int type, bool from_file,
|| type == SYSLOG_ACTION_CONSOLE_LEVEL)
ns = &init_syslog_ns;

+ /* create a new syslog ns */
+ if (type == SYSLOG_ACTION_NEW_NS)
+ return 0;
+
if (syslog_action_restricted(type, ns)) {
if (ns_capable(ns->owner, CAP_SYSLOG))
return 0;
@@ -1131,6 +1135,51 @@ static int syslog_print_all(char __user *buf, int size, bool clear,
return len;
}

+static int create_syslog_ns(void)
+{
+ struct user_namespace *userns = current_user_ns();
+ struct syslog_namespace *oldns, *newns;
+ int err;
+
+ /*
+ * syslog ns belongs to a user ns. So you can only unshare your
+ * syslog_ns if you share a syslog_ns with your parent userns
+ */
+ if (userns == &init_user_ns ||
+ userns->syslog_ns != userns->parent->syslog_ns)
+ return -EINVAL;
+
+ if (!ns_capable(userns, CAP_SYSLOG))
+ return -EPERM;
+
+ err = -ENOMEM;
+ oldns = userns->syslog_ns;
+ newns = kzalloc(sizeof(*newns), GFP_ATOMIC);
+ if (!newns)
+ goto out;
+ newns->log_buf_len = __LOG_BUF_LEN;
+ newns->log_buf = kzalloc(newns->log_buf_len, GFP_ATOMIC);
+ if (!newns->log_buf)
+ goto out;
+
+ newns->owner = get_user_ns(userns);
+ raw_spin_lock_init(&(newns->logbuf_lock));
+ newns->logbuf_cpu = UINT_MAX;
+ newns->dmesg_restrict = oldns->dmesg_restrict;
+ put_syslog_ns(oldns);
+ kref_init(&newns->kref);
+ userns->syslog_ns = newns;
+ newns = NULL;
+
+ err = 0;
+out:
+ if (newns) {
+ kfree(newns->log_buf);
+ kfree(newns);
+ }
+ return err;
+}
+
int do_syslog(int type, char __user *buf, int len, bool from_file,
struct syslog_namespace *ns)
{
@@ -1254,6 +1303,9 @@ int do_syslog(int type, char __user *buf, int len, bool from_file,
case SYSLOG_ACTION_SIZE_BUFFER:
error = ns->log_buf_len;
break;
+ case SYSLOG_ACTION_NEW_NS:
+ error = create_syslog_ns();
+ break;
default:
error = -EINVAL;
break;
--
1.8.2.2

2013-07-29 02:34:51

by Rui Xiang

Subject: [PATCH 9/9] netfilter: use ns_printk in iptable context

To containerise the iptables log, use ns_printk to
report each log to its own container, getting the
syslog_ns from skb->dev->nd_net->user_ns.

Signed-off-by: Rui Xiang <[email protected]>
---
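For illustration only, the lookup chain named in the changelog above can be
modelled in user space as below. A minimal sketch: all *_model names are
hypothetical. Note that the patch dereferences the chain directly; falling
back to the host log when a link is missing is an assumption of this sketch,
not something the patch does.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures in the chain. */
struct syslog_ns_model { const char *name; };
struct user_ns_model { struct syslog_ns_model *syslog_ns; };
struct net_model { struct user_ns_model *user_ns; };
struct net_device_model { struct net_model *nd_net; };
struct sk_buff_model { struct net_device_model *dev; };

/* Stand-in for init_syslog_ns. */
static struct syslog_ns_model host_syslog_model = { "host" };

/* Walk skb -> dev -> net -> user ns -> syslog ns, one link at a time. */
static struct syslog_ns_model *
skb_syslog_model(const struct sk_buff_model *skb)
{
	const struct user_ns_model *user_ns;

	/* Assumption of this sketch: a missing link means "log to host". */
	if (!skb || !skb->dev || !skb->dev->nd_net)
		return &host_syslog_model;

	user_ns = skb->dev->nd_net->user_ns;
	if (!user_ns || !user_ns->syslog_ns)
		return &host_syslog_model;

	return user_ns->syslog_ns;
}
```
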
include/net/netfilter/xt_log.h | 6 +++++-
net/netfilter/xt_LOG.c | 4 ++--
2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/net/netfilter/xt_log.h b/include/net/netfilter/xt_log.h
index 9d9756c..5222cba 100644
--- a/include/net/netfilter/xt_log.h
+++ b/include/net/netfilter/xt_log.h
@@ -39,10 +39,14 @@ static struct sbuff *sb_open(void)
return m;
}

-static void sb_close(struct sbuff *m)
+static void sb_close(struct sbuff *m, struct sk_buff *skb)
{
m->buf[m->count] = 0;
+#ifdef CONFIG_NET_NS
+ ns_printk(skb->dev->nd_net->user_ns->syslog_ns, "%s\n", m->buf);
+#else
printk("%s\n", m->buf);
+#endif

if (likely(m != &emergency))
kfree(m);
diff --git a/net/netfilter/xt_LOG.c b/net/netfilter/xt_LOG.c
index 5ab2484..f2cd2fa3 100644
--- a/net/netfilter/xt_LOG.c
+++ b/net/netfilter/xt_LOG.c
@@ -493,7 +493,7 @@ ipt_log_packet(struct net *net,

dump_ipv4_packet(m, loginfo, skb, 0);

- sb_close(m);
+ sb_close(m, skb);
}

#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
@@ -824,7 +824,7 @@ ip6t_log_packet(struct net *net,

dump_ipv6_packet(m, loginfo, skb, skb_network_offset(skb), 1);

- sb_close(m);
+ sb_close(m, skb);
}
#endif

--
1.8.2.2

2013-07-29 02:35:17

by Rui Xiang

Subject: [PATCH 5/9] syslog_ns: make permission check per user namespace

Use ns_capable instead of capable to check
capabilities in the user ns. That user ns is the
owner of the current syslog ns.

Signed-off-by: Rui Xiang <[email protected]>
---
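For illustration only, the difference between the two checks can be
modelled in user space as below. A deliberately simplified sketch: all
*_model names are hypothetical, and a task is assumed to hold CAP_SYSLOG
only in its own user ns (the kernel's rule that the owner of a child user
ns is also capable in it is ignored here). capable() is modelled as a check
against the init user ns.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct user_ns_model { struct user_ns_model *parent; };
struct syslog_ns_model { struct user_ns_model *owner; };
struct task_model {
	struct user_ns_model *user_ns;
	bool cap_syslog;	/* CAP_SYSLOG held in user_ns */
};

/* Stand-in for init_user_ns. */
static struct user_ns_model init_user_model = { NULL };

/* Simplified ns_capable(): the capability counts only in the task's own ns. */
static bool ns_capable_model(const struct task_model *t,
			     const struct user_ns_model *target)
{
	return t->cap_syslog && t->user_ns == target;
}

/* capable() is the same check, made against the init user ns. */
static bool capable_model(const struct task_model *t)
{
	return ns_capable_model(t, &init_user_model);
}

/* The check after this patch: against the owner of the syslog ns. */
static bool may_access_syslog_model(const struct task_model *t,
				    const struct syslog_ns_model *ns)
{
	return ns_capable_model(t, ns->owner);
}
```

Under these assumptions a container root passes the check for its own
syslog ns but not for the host's, while the old capable() check would
reject it for both.
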
kernel/printk.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 846fef5..c5c65a8 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -380,13 +380,13 @@ static int check_syslog_permissions(int type, bool from_file,
return 0;

if (syslog_action_restricted(type, ns)) {
- if (capable(CAP_SYSLOG))
+ if (ns_capable(ns->owner, CAP_SYSLOG))
return 0;
/*
* For historical reasons, accept CAP_SYS_ADMIN too, with
* a warning.
*/
- if (capable(CAP_SYS_ADMIN)) {
+ if (ns_capable(ns->owner, CAP_SYS_ADMIN)) {
pr_warn_once("%s (%d): Attempt to access syslog with "
"CAP_SYS_ADMIN but no CAP_SYSLOG "
"(deprecated).\n",
--
1.8.2.2

2013-07-29 02:35:20

by Rui Xiang

Subject: [PATCH 6/9] syslog_ns: use init syslog_ns for console action

When the console action flags SYSLOG_ACTION_CONSOLE_ON/OFF/LEVEL
are used in the syslog syscall, the related handling
should target the host through init_syslog_ns.

Signed-off-by: Rui Xiang <[email protected]>
---
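For illustration only, the redirection described in the changelog above can
be modelled in user space as below. A minimal sketch: the *_model names are
hypothetical, and the three action values are the existing syslog command
numbers.

```c
#include <stddef.h>

/* Existing syslog command numbers for the console actions. */
#define SYSLOG_ACTION_CONSOLE_OFF	6
#define SYSLOG_ACTION_CONSOLE_ON	7
#define SYSLOG_ACTION_CONSOLE_LEVEL	8

/* Hypothetical stand-in for struct syslog_namespace. */
struct syslog_ns_model { const char *name; };

/* Stand-in for init_syslog_ns. */
static struct syslog_ns_model init_syslog_model = { "host" };

/*
 * Pick the namespace an action is checked against: console actions
 * always go to the host, everything else stays with the caller's ns.
 */
static struct syslog_ns_model *
effective_ns_model(int type, struct syslog_ns_model *ns)
{
	switch (type) {
	case SYSLOG_ACTION_CONSOLE_OFF:
	case SYSLOG_ACTION_CONSOLE_ON:
	case SYSLOG_ACTION_CONSOLE_LEVEL:
		return &init_syslog_model;
	default:
		return ns;
	}
}
```
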
kernel/printk.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/printk.c b/kernel/printk.c
index c5c65a8..fd2d600 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -379,6 +379,11 @@ static int check_syslog_permissions(int type, bool from_file,
if (from_file && type != SYSLOG_ACTION_OPEN)
return 0;

+ if (type == SYSLOG_ACTION_CONSOLE_OFF
+ || type == SYSLOG_ACTION_CONSOLE_ON
+ || type == SYSLOG_ACTION_CONSOLE_LEVEL)
+ ns = &init_syslog_ns;
+
if (syslog_action_restricted(type, ns)) {
if (ns_capable(ns->owner, CAP_SYSLOG))
return 0;
--
1.8.2.2

2013-07-29 02:36:13

by Rui Xiang

Subject: [PATCH 2/9] syslog_ns: add syslog_ns into user_namespace

Add a syslog_ns pointer to user_namespace, making
syslog_ns per user_namespace rather than global.

Since syslog_ns is tied to a user_ns, a task has full
capabilities in a new user_ns and can therefore create
a new syslog_ns there.

Signed-off-by: Rui Xiang <[email protected]>
---
include/linux/syslog.h | 5 +++++
include/linux/user_namespace.h | 1 +
2 files changed, 6 insertions(+)

diff --git a/include/linux/syslog.h b/include/linux/syslog.h
index 425fafe..62ce47f 100644
--- a/include/linux/syslog.h
+++ b/include/linux/syslog.h
@@ -90,6 +90,11 @@ struct syslog_namespace {
size_t syslog_partial;

int dmesg_restrict;
+
+ /*
+ * user namespace which owns this syslog ns.
+ */
+ struct user_namespace *owner;
};

static inline struct syslog_namespace *get_syslog_ns(
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index b6b215f..ce2de5b 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -28,6 +28,7 @@ struct user_namespace {
unsigned int proc_inum;
bool may_mount_sysfs;
bool may_mount_proc;
+ struct syslog_namespace *syslog_ns;
};

extern struct user_namespace init_user_ns;
--
1.8.2.2

2013-07-29 09:37:15

by Gu Zheng

Subject: Re: [PATCH 0/9] Add namespace support for syslog v2

Hi Rui,

On 07/29/2013 10:31 AM, Rui Xiang wrote:

> This patchset introduces a system log namespace.
>
> It is the 2nd version. The link of the 1st version is
> http://lwn.net/Articles/525728/. In that version, syslog_
> namespace was added into nsproxy and created through a new
> clone flag CLONE_SYSLOG when cloning a process.
>
> There were some discussion in last November about the 1st
> version. This version used these important advice, and
> referred to Serge's patch(http://lwn.net/Articles/525629/).
>
> Unlike the 1st version, in this patchset, syslog namespace
> is tied to a user namespace. Add we must create a new user
> ns before create a new syslog ns, because that will make
> users have full capabilities in this new userns after
> cloning a new user ns. The syslog namespace can be created
> through a new command(11) to __NR_syslog syscall. That owe
> to a new syslog flag SYSLOG_ACTION_NEW_NS.
>
> In syslog_namespace, some necessary identifiers for handling
> syslog buf are containerized. When one container creates a
> new syslog ns, individual buf will be allocated to store log
> ownned this container.
>
> A new interface ns_printk is added to print the logs which
> we want to see in the container. Through ns_printk, we can
> get more logs related to a specific net ns, for instance,
> iptables. Here we use it to report iptable logs per
> contianer.
>
> Then default printk targeted at the init_syslog_ns will
> continue to print out most kernel log to host.
>
> One task in a new syslog ns could affect only current
> container through "dmesg", "dmesg -c" and /dev/kmsg
> actions. The read/write interface such as /dev/kmsg,
> /pro/kmsg and syslog syscall continue to be useful for
> container users.
>
> This patchset is based on linus' linux tree.

A detailed changelog between V1 and V2 is seriously needed; the inline
description is not easy for others to read.

>
> Rui Xiang (9):
> syslog_ns: add syslog_namespace and put/get_syslog_ns
> syslog_ns: add syslog_ns into user_namespace
> syslog_ns: add init syslog_ns for global syslog
> syslog_ns: make syslog handling per namespace
> syslog_ns: make permisiion check per user namespace
> syslog_ns: use init syslog_ns for console action
> syslog_ns: implement function for creating syslog ns
> syslog_ns: implement ns_printk for specific syslog_ns
> netfilter: use ns_printk in iptable context
>
> fs/proc/kmsg.c | 17 +-
> include/linux/printk.h | 5 +-
> include/linux/syslog.h | 79 ++++-
> include/linux/user_namespace.h | 2 +
> include/net/netfilter/xt_log.h | 6 +-
> kernel/printk.c | 642 ++++++++++++++++++++++++-----------------
> kernel/sysctl.c | 3 +-
> kernel/user.c | 3 +
> kernel/user_namespace.c | 4 +
> net/netfilter/xt_LOG.c | 4 +-
> 10 files changed, 493 insertions(+), 272 deletions(-)
>

2013-07-29 09:44:15

by Gu Zheng

Subject: Re: [PATCH 1/9] syslog_ns: add syslog_namespace and put/get_syslog_ns

Hi Rui,
Please see my comments inline :).

On 07/29/2013 10:31 AM, Rui Xiang wrote:

> Add a struct syslog_namespace which contains the necessary
> members for hanlding syslog and realize get_syslog_ns and
> put_syslog_ns API.
>
> Signed-off-by: Rui Xiang <[email protected]>
> ---
> include/linux/syslog.h | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
> kernel/printk.c | 7 ------
> 2 files changed, 68 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/syslog.h b/include/linux/syslog.h
> index 98a3153..425fafe 100644
> --- a/include/linux/syslog.h
> +++ b/include/linux/syslog.h
> @@ -21,6 +21,9 @@
> #ifndef _LINUX_SYSLOG_H
> #define _LINUX_SYSLOG_H
>
> +#include <linux/slab.h>
> +#include <linux/kref.h>
> +
> /* Close the log. Currently a NOP. */
> #define SYSLOG_ACTION_CLOSE 0
> /* Open the log. Currently a NOP. */
> @@ -47,6 +50,71 @@
> #define SYSLOG_FROM_READER 0
> #define SYSLOG_FROM_PROC 1
>
> +enum log_flags {
> + LOG_NOCONS = 1, /* already flushed, do not print to console */
> + LOG_NEWLINE = 2, /* text ended with a newline */
> + LOG_PREFIX = 4, /* text started with a prefix */
> + LOG_CONT = 8, /* text is a fragment of a continuation line */
> +};
> +
> +struct syslog_namespace {
> + struct kref kref; /* syslog_ns reference count & control */
> +
> + raw_spinlock_t logbuf_lock; /* access conflict locker */
> + /* cpu currently holding logbuf_lock of ns */
> + unsigned int logbuf_cpu;
> +
> + /* index and sequence number of the first record stored in the buffer */
> + u64 log_first_seq;
> + u32 log_first_idx;
> +
> + /* index and sequence number of the next record stored in the buffer */
> + u64 log_next_seq;
> + u32 log_next_idx;
> +
> + /* the next printk record to read after the last 'clear' command */
> + u64 clear_seq;
> + u32 clear_idx;
> +
> + char *log_buf;
> + u32 log_buf_len;
> +
> + /* the next printk record to write to the console */
> + u64 console_seq;
> + u32 console_idx;
> +
> + /* the next printk record to read by syslog(READ) or /proc/kmsg */
> + u64 syslog_seq;
> + u32 syslog_idx;
> + enum log_flags syslog_prev;
> + size_t syslog_partial;
> +
> + int dmesg_restrict;
> +};
> +
> +static inline struct syslog_namespace *get_syslog_ns(
> + struct syslog_namespace *ns)
> +{
> + if (ns)
> + kref_get(&ns->kref);
> + return ns;
> +}
> +
> +static inline void free_syslog_ns(struct kref *kref)
> +{
> + struct syslog_namespace *ns;
> + ns = container_of(kref, struct syslog_namespace, kref);
> +
> + kfree(ns->log_buf);
> + kfree(ns);
> +}

This interface seems a bit ugly. Why not use the same form as put_syslog_ns()?

static inline void free_syslog_ns(struct syslog_namespace *ns)
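
For context on the signature: kref_put() expects a release callback of type void (*)(struct kref *), so a function passed to it directly has to take the kref and recover the enclosing object with container_of(). One way to keep the friendlier signature suggested above is a thin adapter. The following is only a userspace sketch: struct kref, kref_init(), kref_get(), kref_put() and container_of are simplified stand-ins rather than the kernel implementations, and syslog_ns_release(), alloc_syslog_ns() and freed_count are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's kref helpers. */
struct kref { int refcount; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static inline void kref_init(struct kref *k) { k->refcount = 1; }
static inline void kref_get(struct kref *k) { k->refcount++; }

/* Returns 1 if the last reference was dropped and release() ran. */
static inline int kref_put(struct kref *k, void (*release)(struct kref *))
{
	assert(k->refcount > 0);
	if (--k->refcount == 0) {
		release(k);
		return 1;
	}
	return 0;
}

/* Reduced version of the structure from the patch. */
struct syslog_namespace {
	struct kref kref;
	char *log_buf;
};

static int freed_count;	/* bookkeeping for this example only */

/* Takes the namespace directly, as suggested in the review. */
static inline void free_syslog_ns(struct syslog_namespace *ns)
{
	free(ns->log_buf);
	free(ns);
	freed_count++;
}

/* Hypothetical adapter with the callback signature kref_put() needs. */
static inline void syslog_ns_release(struct kref *kref)
{
	free_syslog_ns(container_of(kref, struct syslog_namespace, kref));
}

static inline struct syslog_namespace *get_syslog_ns(
		struct syslog_namespace *ns)
{
	if (ns)
		kref_get(&ns->kref);
	return ns;
}

static inline void put_syslog_ns(struct syslog_namespace *ns)
{
	if (ns)
		kref_put(&ns->kref, syslog_ns_release);
}

/* Hypothetical constructor: one reference held by the caller. */
static inline struct syslog_namespace *alloc_syslog_ns(size_t len)
{
	struct syslog_namespace *ns = calloc(1, sizeof(*ns));

	if (!ns)
		return NULL;
	ns->log_buf = calloc(1, len);
	if (!ns->log_buf) {
		free(ns);
		return NULL;
	}
	kref_init(&ns->kref);
	return ns;
}
```

With this split, callers only ever see get_syslog_ns() and put_syslog_ns(); the container_of() step stays confined to the adapter.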

> +
> +static inline void put_syslog_ns(struct syslog_namespace *ns)
> +{
> + if (ns)
> + kref_put(&ns->kref, free_syslog_ns);
> +}
> +
> int do_syslog(int type, char __user *buf, int count, bool from_file);
>
> #endif /* _LINUX_SYSLOG_H */
> diff --git a/kernel/printk.c b/kernel/printk.c
> index d37d45c..7e544bf 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -193,13 +193,6 @@ static int console_may_schedule;
> * separated by ',', and find the message after the ';' character.
> */
>
> -enum log_flags {
> - LOG_NOCONS = 1, /* already flushed, do not print to console */
> - LOG_NEWLINE = 2, /* text ended with a newline */
> - LOG_PREFIX = 4, /* text started with a prefix */
> - LOG_CONT = 8, /* text is a fragment of a continuation line */
> -};
> -
> struct log {
> u64 ts_nsec; /* timestamp in nanoseconds */
> u16 len; /* length of entire record */

2013-07-29 09:47:31

by Gao feng

Subject: Re: [PATCH 9/9] netfilter: use ns_printk in iptable context

On 07/29/2013 10:31 AM, Rui Xiang wrote:
> To containerise iptables log, use ns_printk
> to report individual logs to container as
> getting syslog_ns from skb->dev->nd_net->user_ns.
>
> Signed-off-by: Rui Xiang <[email protected]>
> ---
> include/net/netfilter/xt_log.h | 6 +++++-
> net/netfilter/xt_LOG.c | 4 ++--
> 2 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/netfilter/xt_log.h b/include/net/netfilter/xt_log.h
> index 9d9756c..5222cba 100644
> --- a/include/net/netfilter/xt_log.h
> +++ b/include/net/netfilter/xt_log.h
> @@ -39,10 +39,14 @@ static struct sbuff *sb_open(void)
> return m;
> }
>
> -static void sb_close(struct sbuff *m)
> +static void sb_close(struct sbuff *m, struct sk_buff *skb)
> {
> m->buf[m->count] = 0;
> +#ifdef CONFIG_NET_NS
> + ns_printk(skb->dev->nd_net->user_ns->syslog_ns, "%s\n", m->buf);
> +#else
> printk("%s\n", m->buf);
> +#endif
>
> if (likely(m != &emergency))
> kfree(m);
> diff --git a/net/netfilter/xt_LOG.c b/net/netfilter/xt_LOG.c
> index 5ab2484..f2cd2fa3 100644
> --- a/net/netfilter/xt_LOG.c
> +++ b/net/netfilter/xt_LOG.c
> @@ -493,7 +493,7 @@ ipt_log_packet(struct net *net,
>
> dump_ipv4_packet(m, loginfo, skb, 0);
>
> - sb_close(m);
> + sb_close(m, skb);


Why don't you pass net directly to sb_close() here?

A non-init net namespace will not trigger any system log through ipt_LOG/ip6t_LOG.
You can check the FIXME in ipt_log_packet().

BTW, for this patch you should also Cc [email protected].
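
To illustrate the first point: ipt_log_packet() already receives a struct net *, so sb_close() could take that pointer instead of going through skb->dev. The following is a userspace model, not kernel code. The structures are reduced stand-ins, ns_printk() is replaced by a hypothetical stub (ns_printk_stub) that records the line in the target namespace, sb_add_str() is a helper invented for the example, and the freeing of the buffer and the CONFIG_NET_NS handling from the original are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reduced stand-ins for the kernel structures named in the patch. */
struct syslog_namespace { char last_line[128]; };
struct user_namespace { struct syslog_namespace *syslog_ns; };
struct net { struct user_namespace *user_ns; };

#define S_SIZE 64
struct sbuff {
	unsigned int count;
	char buf[S_SIZE + 1];
};

/* Stub for ns_printk(): remembers the last line sent to a namespace. */
static inline void ns_printk_stub(struct syslog_namespace *ns,
				  const char *line)
{
	snprintf(ns->last_line, sizeof(ns->last_line), "%s\n", line);
}

/* Example-only helper: append a string, truncating at S_SIZE. */
static inline void sb_add_str(struct sbuff *m, const char *s)
{
	size_t len = strlen(s);

	if (len > S_SIZE - m->count)
		len = S_SIZE - m->count;
	memcpy(m->buf + m->count, s, len);
	m->count += (unsigned int)len;
}

/* sb_close() taking the net namespace rather than the skb. */
static inline void sb_close(struct sbuff *m, struct net *net)
{
	assert(m->count <= S_SIZE);
	m->buf[m->count] = 0;
	ns_printk_stub(net->user_ns->syslog_ns, m->buf);
}
```

The call site would then become sb_close(m, net), and the skb would no longer be needed just to locate the namespace.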

2013-07-29 09:50:23

by Gu Zheng

Subject: Re: [PATCH 2/9] syslog_ns: add syslog_ns into user_namespace

Hi Rui,

On 07/29/2013 10:31 AM, Rui Xiang wrote:

> Add a syslog_ns pointer to user_namespace, and make
> syslog_ns per user_namespace, not global.
>
> Since syslog_ns is assigned to user_ns, we can have
> full capabilities in new user_ns to create a new syslog_ns.
>
> Signed-off-by: Rui Xiang <[email protected]>
> ---
> include/linux/syslog.h | 5 +++++
> include/linux/user_namespace.h | 1 +
> 2 files changed, 6 insertions(+)
>
> diff --git a/include/linux/syslog.h b/include/linux/syslog.h
> index 425fafe..62ce47f 100644
> --- a/include/linux/syslog.h
> +++ b/include/linux/syslog.h
> @@ -90,6 +90,11 @@ struct syslog_namespace {
> size_t syslog_partial;
>
> int dmesg_restrict;
> +
> + /*
> + * user namespace which owns this syslog ns.
> + */
> + struct user_namespace *owner;
> };
>
> static inline struct syslog_namespace *get_syslog_ns(
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index b6b215f..ce2de5b 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -28,6 +28,7 @@ struct user_namespace {
> unsigned int proc_inum;
> bool may_mount_sysfs;
> bool may_mount_proc;
> + struct syslog_namespace *syslog_ns;

We add a syslog_ns pointer to user_namespace to make
syslog_ns per user_namespace and for the caps check.
But why also add a pointer to syslog_namespace in
user_namespace? Am I missing something? :)

Thanks,
Gu

> };
>
> extern struct user_namespace init_user_ns;

2013-07-29 09:53:04

by Gao feng

Subject: Re: [PATCH 2/9] syslog_ns: add syslog_ns into user_namespace

On 07/29/2013 05:46 PM, Gu Zheng wrote:
> Hi Rui,
>
> On 07/29/2013 10:31 AM, Rui Xiang wrote:
>
>> Add a syslog_ns pointer to user_namespace, and make
>> syslog_ns per user_namespace, not global.
>>
>> Since syslog_ns is assigned to user_ns, we can have
>> full capabilities in new user_ns to create a new syslog_ns.
>>
>> Signed-off-by: Rui Xiang <[email protected]>
>> ---
>> include/linux/syslog.h | 5 +++++
>> include/linux/user_namespace.h | 1 +
>> 2 files changed, 6 insertions(+)
>>
>> diff --git a/include/linux/syslog.h b/include/linux/syslog.h
>> index 425fafe..62ce47f 100644
>> --- a/include/linux/syslog.h
>> +++ b/include/linux/syslog.h
>> @@ -90,6 +90,11 @@ struct syslog_namespace {
>> size_t syslog_partial;
>>
>> int dmesg_restrict;
>> +
>> + /*
>> + * user namespace which owns this syslog ns.
>> + */
>> + struct user_namespace *owner;
>> };
>>
>> static inline struct syslog_namespace *get_syslog_ns(
>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> index b6b215f..ce2de5b 100644
>> --- a/include/linux/user_namespace.h
>> +++ b/include/linux/user_namespace.h
>> @@ -28,6 +28,7 @@ struct user_namespace {
>> unsigned int proc_inum;
>> bool may_mount_sysfs;
>> bool may_mount_proc;
>> + struct syslog_namespace *syslog_ns;
>
> We add a syslog_ns pointer to user_namespace to make
> syslog_ns per user_namespace and for the caps check.
> But why also add a pointer to syslog_namespace in
> user_namespace? Am I missing something? :)
>

Yep, with this we can make sure that all the other namespace types, such as mount, net and pid,
can access syslog_ns through the user namespace.

2013-07-29 09:54:03

by Gu Zheng

Subject: Re: [PATCH 4/9] syslog_ns: make syslog handling per namespace

Hi Rui,

On 07/29/2013 10:31 AM, Rui Xiang wrote:

> This patch makes syslog buf and other fields per
> namespace.
>
> Here use ns->log_buf(log_buf_len, logbuf_lock,
> log_first_seq, logbuf_lock, and so on) fields
> instead of global ones to handle syslog.
>
> Syslog interfaces such as /dev/kmsg, /proc/kmsg,
> and syslog syscall are all containerized for
> container users.
>
> Signed-off-by: Rui Xiang <[email protected]>
> ---
> fs/proc/kmsg.c | 17 +-
> include/linux/printk.h | 1 -
> include/linux/syslog.h | 3 +-
> kernel/printk.c | 507 +++++++++++++++++++++++++------------------------
> kernel/sysctl.c | 3 +-
> 5 files changed, 273 insertions(+), 258 deletions(-)
>
> diff --git a/fs/proc/kmsg.c b/fs/proc/kmsg.c
> index bdfabda..cb98431 100644
> --- a/fs/proc/kmsg.c
> +++ b/fs/proc/kmsg.c
> @@ -13,6 +13,8 @@
> #include <linux/proc_fs.h>
> #include <linux/fs.h>
> #include <linux/syslog.h>
> +#include <linux/cred.h>
> +#include <linux/user_namespace.h>
>
> #include <asm/uaccess.h>
> #include <asm/io.h>
> @@ -21,12 +23,14 @@ extern wait_queue_head_t log_wait;
>
> static int kmsg_open(struct inode * inode, struct file * file)
> {
> - return do_syslog(SYSLOG_ACTION_OPEN, NULL, 0, SYSLOG_FROM_PROC);
> + return do_syslog(SYSLOG_ACTION_OPEN, NULL, 0, SYSLOG_FROM_PROC,
> + file->f_cred->user_ns->syslog_ns);

How about adding a helper function to get the syslog_ns that the file belongs to?
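
Such a helper might look like the sketch below. The name file_syslog_ns() is hypothetical, and the structures are reduced userspace stand-ins for the kernel's struct file, struct cred and struct user_namespace, kept only so that the snippet compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins; only the fields used by the patch are modelled. */
struct syslog_namespace { int id; };
struct user_namespace { struct syslog_namespace *syslog_ns; };
struct cred { struct user_namespace *user_ns; };
struct file { const struct cred *f_cred; };

/*
 * Hypothetical helper: return the syslog namespace associated with the
 * credentials the file was opened with.
 */
static inline struct syslog_namespace *file_syslog_ns(const struct file *file)
{
	assert(file != NULL && file->f_cred != NULL);
	return file->f_cred->user_ns->syslog_ns;
}
```

Each file->f_cred->user_ns->syslog_ns chain in kmsg_open(), kmsg_release(), kmsg_read(), kmsg_poll() and the devkmsg_*() handlers could then be written as file_syslog_ns(file).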


> }
>
> static int kmsg_release(struct inode * inode, struct file * file)
> {
> - (void) do_syslog(SYSLOG_ACTION_CLOSE, NULL, 0, SYSLOG_FROM_PROC);
> + (void) do_syslog(SYSLOG_ACTION_CLOSE, NULL, 0, SYSLOG_FROM_PROC,
> + file->f_cred->user_ns->syslog_ns);
> return 0;
> }
>
> @@ -34,15 +38,18 @@ static ssize_t kmsg_read(struct file *file, char __user *buf,
> size_t count, loff_t *ppos)
> {
> if ((file->f_flags & O_NONBLOCK) &&
> - !do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC))
> + !do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC,
> + file->f_cred->user_ns->syslog_ns))
> return -EAGAIN;
> - return do_syslog(SYSLOG_ACTION_READ, buf, count, SYSLOG_FROM_PROC);
> + return do_syslog(SYSLOG_ACTION_READ, buf, count, SYSLOG_FROM_PROC,
> + file->f_cred->user_ns->syslog_ns);
> }
>
> static unsigned int kmsg_poll(struct file *file, poll_table *wait)
> {
> poll_wait(file, &log_wait, wait);
> - if (do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC))
> + if (do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC,
> + file->f_cred->user_ns->syslog_ns))
> return POLLIN | POLLRDNORM;
> return 0;
> }
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 22c7052..29e3f85 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -139,7 +139,6 @@ extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
> unsigned int interval_msec);
>
> extern int printk_delay_msec;
> -extern int dmesg_restrict;
> extern int kptr_restrict;
>
> extern void wake_up_klogd(void);
> diff --git a/include/linux/syslog.h b/include/linux/syslog.h
> index 363bc56..fbf0cb6 100644
> --- a/include/linux/syslog.h
> +++ b/include/linux/syslog.h
> @@ -120,7 +120,8 @@ static inline void put_syslog_ns(struct syslog_namespace *ns)
> kref_put(&ns->kref, free_syslog_ns);
> }
>
> -int do_syslog(int type, char __user *buf, int count, bool from_file);
> +int do_syslog(int type, char __user *buf, int count, bool from_file,
> + struct syslog_namespace *ns);
>
> extern struct syslog_namespace init_syslog_ns;
> #endif /* _LINUX_SYSLOG_H */
> diff --git a/kernel/printk.c b/kernel/printk.c
> index fd83ec1..846fef5 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -213,29 +213,8 @@ static DEFINE_RAW_SPINLOCK(logbuf_lock);
>
> #ifdef CONFIG_PRINTK
> DECLARE_WAIT_QUEUE_HEAD(log_wait);
> -/* the next printk record to read by syslog(READ) or /proc/kmsg */
> -static u64 syslog_seq;
> -static u32 syslog_idx;
> -static enum log_flags syslog_prev;
> -static size_t syslog_partial;
> -
> -/* index and sequence number of the first record stored in the buffer */
> -static u64 log_first_seq;
> -static u32 log_first_idx;
> -
> -/* index and sequence number of the next record to store in the buffer */
> -static u64 log_next_seq;
> -static u32 log_next_idx;
> -
> -/* the next printk record to write to the console */
> -static u64 console_seq;
> -static u32 console_idx;
> static enum log_flags console_prev;
>
> -/* the next printk record to read after the last 'clear' command */
> -static u64 clear_seq;
> -static u32 clear_idx;
> -
> #define PREFIX_MAX 32
> #define LOG_LINE_MAX 1024 - PREFIX_MAX
>
> @@ -246,12 +225,8 @@ static u32 clear_idx;
> #define LOG_ALIGN __alignof__(struct log)
> #endif
> #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
> +/* this buf only for init_syslog_ns */
> static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
> -static char *log_buf = __log_buf;
> -static u32 log_buf_len = __LOG_BUF_LEN;
> -
> -/* cpu currently holding logbuf_lock */
> -static volatile unsigned int logbuf_cpu = UINT_MAX;
>
> struct syslog_namespace init_syslog_ns = {
> .kref = {
> @@ -282,23 +257,23 @@ static char *log_dict(const struct log *msg)
> }
>
> /* get record by index; idx must point to valid msg */
> -static struct log *log_from_idx(u32 idx)
> +static struct log *log_from_idx(u32 idx, struct syslog_namespace *ns)
> {
> - struct log *msg = (struct log *)(log_buf + idx);
> + struct log *msg = (struct log *)(ns->log_buf + idx);
>
> /*
> * A length == 0 record is the end of buffer marker. Wrap around and
> * read the message at the start of the buffer.
> */
> if (!msg->len)
> - return (struct log *)log_buf;
> + return (struct log *)ns->log_buf;
> return msg;
> }
>
> /* get next record; idx must point to valid msg */
> -static u32 log_next(u32 idx)
> +static u32 log_next(u32 idx, struct syslog_namespace *ns)
> {
> - struct log *msg = (struct log *)(log_buf + idx);
> + struct log *msg = (struct log *)(ns->log_buf + idx);
>
> /* length == 0 indicates the end of the buffer; wrap */
> /*
> @@ -307,7 +282,7 @@ static u32 log_next(u32 idx)
> * return the one after that.
> */
> if (!msg->len) {
> - msg = (struct log *)log_buf;
> + msg = (struct log *)ns->log_buf;
> return msg->len;
> }
> return idx + msg->len;
> @@ -317,7 +292,8 @@ static u32 log_next(u32 idx)
> static void log_store(int facility, int level,
> enum log_flags flags, u64 ts_nsec,
> const char *dict, u16 dict_len,
> - const char *text, u16 text_len)
> + const char *text, u16 text_len,
> + struct syslog_namespace *ns)
> {
> struct log *msg;
> u32 size, pad_len;
> @@ -327,34 +303,40 @@ static void log_store(int facility, int level,
> pad_len = (-size) & (LOG_ALIGN - 1);
> size += pad_len;
>
> - while (log_first_seq < log_next_seq) {
> + while (ns->log_first_seq < ns->log_next_seq) {
> u32 free;
>
> - if (log_next_idx > log_first_idx)
> - free = max(log_buf_len - log_next_idx, log_first_idx);
> + if (ns->log_next_idx > ns->log_first_idx)
> + free = max(ns->log_buf_len -
> + ns->log_next_idx,
> + ns->log_first_idx);
> else
> - free = log_first_idx - log_next_idx;
> + free = ns->log_first_idx -
> + ns->log_next_idx;
>
> if (free > size + sizeof(struct log))
> break;
>
> /* drop old messages until we have enough contiuous space */
> - log_first_idx = log_next(log_first_idx);
> - log_first_seq++;
> + ns->log_first_idx =
> + log_next(ns->log_first_idx, ns);
> + ns->log_first_seq++;
> }
>
> - if (log_next_idx + size + sizeof(struct log) >= log_buf_len) {
> + if (ns->log_next_idx + size + sizeof(struct log) >=
> + ns->log_buf_len) {
> /*
> * This message + an additional empty header does not fit
> * at the end of the buffer. Add an empty header with len == 0
> * to signify a wrap around.
> */
> - memset(log_buf + log_next_idx, 0, sizeof(struct log));
> - log_next_idx = 0;
> + memset(ns->log_buf + ns->log_next_idx,
> + 0, sizeof(struct log));
> + ns->log_next_idx = 0;
> }
>
> /* fill message */
> - msg = (struct log *)(log_buf + log_next_idx);
> + msg = (struct log *)(ns->log_buf + ns->log_next_idx);
> memcpy(log_text(msg), text, text_len);
> msg->text_len = text_len;
> memcpy(log_dict(msg), dict, dict_len);
> @@ -370,19 +352,14 @@ static void log_store(int facility, int level,
> msg->len = sizeof(struct log) + text_len + dict_len + pad_len;
>
> /* insert message */
> - log_next_idx += msg->len;
> - log_next_seq++;
> + ns->log_next_idx += msg->len;
> + ns->log_next_seq++;
> }
>
> -#ifdef CONFIG_SECURITY_DMESG_RESTRICT
> -int dmesg_restrict = 1;
> -#else
> -int dmesg_restrict;
> -#endif
> -
> -static int syslog_action_restricted(int type)
> +static int syslog_action_restricted(int type,
> + struct syslog_namespace *ns)
> {
> - if (dmesg_restrict)
> + if (ns->dmesg_restrict)
> return 1;
> /*
> * Unless restricted, we allow "read all" and "get buffer size"
> @@ -392,7 +369,8 @@ static int syslog_action_restricted(int type)
> type != SYSLOG_ACTION_SIZE_BUFFER;
> }
>
> -static int check_syslog_permissions(int type, bool from_file)
> +static int check_syslog_permissions(int type, bool from_file,
> + struct syslog_namespace *ns)
> {
> /*
> * If this is from /proc/kmsg and we've already opened it, then we've
> @@ -401,7 +379,7 @@ static int check_syslog_permissions(int type, bool from_file)
> if (from_file && type != SYSLOG_ACTION_OPEN)
> return 0;
>
> - if (syslog_action_restricted(type)) {
> + if (syslog_action_restricted(type, ns)) {
> if (capable(CAP_SYSLOG))
> return 0;
> /*
> @@ -496,6 +474,8 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
> char cont = '-';
> size_t len;
> ssize_t ret;
> + struct syslog_namespace *ns =
> + file->f_cred->user_ns->syslog_ns;
>
> if (!user)
> return -EBADF;
> @@ -503,32 +483,32 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
> ret = mutex_lock_interruptible(&user->lock);
> if (ret)
> return ret;
> - raw_spin_lock_irq(&logbuf_lock);
> - while (user->seq == log_next_seq) {
> + raw_spin_lock_irq(&ns->logbuf_lock);
> + while (user->seq == ns->log_next_seq) {
> if (file->f_flags & O_NONBLOCK) {
> ret = -EAGAIN;
> - raw_spin_unlock_irq(&logbuf_lock);
> + raw_spin_unlock_irq(&ns->logbuf_lock);
> goto out;
> }
>
> - raw_spin_unlock_irq(&logbuf_lock);
> + raw_spin_unlock_irq(&ns->logbuf_lock);
> ret = wait_event_interruptible(log_wait,
> - user->seq != log_next_seq);
> + user->seq != ns->log_next_seq);
> if (ret)
> goto out;
> - raw_spin_lock_irq(&logbuf_lock);
> + raw_spin_lock_irq(&ns->logbuf_lock);
> }
>
> - if (user->seq < log_first_seq) {
> + if (user->seq < ns->log_first_seq) {
> /* our last seen message is gone, return error and reset */
> - user->idx = log_first_idx;
> - user->seq = log_first_seq;
> + user->idx = ns->log_first_idx;
> + user->seq = ns->log_first_seq;
> ret = -EPIPE;
> - raw_spin_unlock_irq(&logbuf_lock);
> + raw_spin_unlock_irq(&ns->logbuf_lock);
> goto out;
> }
>
> - msg = log_from_idx(user->idx);
> + msg = log_from_idx(user->idx, ns);
> ts_usec = msg->ts_nsec;
> do_div(ts_usec, 1000);
>
> @@ -589,9 +569,9 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
> user->buf[len++] = '\n';
> }
>
> - user->idx = log_next(user->idx);
> + user->idx = log_next(user->idx, ns);
> user->seq++;
> - raw_spin_unlock_irq(&logbuf_lock);
> + raw_spin_unlock_irq(&ns->logbuf_lock);
>
> if (len > count) {
> ret = -EINVAL;
> @@ -612,18 +592,19 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
> {
> struct devkmsg_user *user = file->private_data;
> loff_t ret = 0;
> + struct syslog_namespace *ns = file->f_cred->user_ns->syslog_ns;
>
> if (!user)
> return -EBADF;
> if (offset)
> return -ESPIPE;
>
> - raw_spin_lock_irq(&logbuf_lock);
> + raw_spin_lock_irq(&ns->logbuf_lock);
> switch (whence) {
> case SEEK_SET:
> /* the first record */
> - user->idx = log_first_idx;
> - user->seq = log_first_seq;
> + user->idx = ns->log_first_idx;
> + user->seq = ns->log_first_seq;
> break;
> case SEEK_DATA:
> /*
> @@ -631,18 +612,18 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
> * like issued by 'dmesg -c'. Reading /dev/kmsg itself
> * changes no global state, and does not clear anything.
> */
> - user->idx = clear_idx;
> - user->seq = clear_seq;
> + user->idx = ns->clear_idx;
> + user->seq = ns->clear_seq;
> break;
> case SEEK_END:
> /* after the last record */
> - user->idx = log_next_idx;
> - user->seq = log_next_seq;
> + user->idx = ns->log_next_idx;
> + user->seq = ns->log_next_seq;
> break;
> default:
> ret = -EINVAL;
> }
> - raw_spin_unlock_irq(&logbuf_lock);
> + raw_spin_unlock_irq(&ns->logbuf_lock);
> return ret;
> }
>
> @@ -650,21 +631,22 @@ static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
> {
> struct devkmsg_user *user = file->private_data;
> int ret = 0;
> + struct syslog_namespace *ns = file->f_cred->user_ns->syslog_ns;
>
> if (!user)
> return POLLERR|POLLNVAL;
>
> poll_wait(file, &log_wait, wait);
>
> - raw_spin_lock_irq(&logbuf_lock);
> - if (user->seq < log_next_seq) {
> + raw_spin_lock_irq(&ns->logbuf_lock);
> + if (user->seq < ns->log_next_seq) {
> /* return error when data has vanished underneath us */
> - if (user->seq < log_first_seq)
> + if (user->seq < ns->log_first_seq)
> ret = POLLIN|POLLRDNORM|POLLERR|POLLPRI;
> else
> ret = POLLIN|POLLRDNORM;
> }
> - raw_spin_unlock_irq(&logbuf_lock);
> + raw_spin_unlock_irq(&ns->logbuf_lock);
>
> return ret;
> }
> @@ -673,13 +655,14 @@ static int devkmsg_open(struct inode *inode, struct file *file)
> {
> struct devkmsg_user *user;
> int err;
> + struct syslog_namespace *ns = file->f_cred->user_ns->syslog_ns;
>
> /* write-only does not need any file context */
> if ((file->f_flags & O_ACCMODE) == O_WRONLY)
> return 0;
>
> err = check_syslog_permissions(SYSLOG_ACTION_READ_ALL,
> - SYSLOG_FROM_READER);
> + SYSLOG_FROM_READER, ns);
> if (err)
> return err;
>
> @@ -689,10 +672,10 @@ static int devkmsg_open(struct inode *inode, struct file *file)
>
> mutex_init(&user->lock);
>
> - raw_spin_lock_irq(&logbuf_lock);
> - user->idx = log_first_idx;
> - user->seq = log_first_seq;
> - raw_spin_unlock_irq(&logbuf_lock);
> + raw_spin_lock_irq(&ns->logbuf_lock);
> + user->idx = ns->log_first_idx;
> + user->seq = ns->log_first_seq;
> + raw_spin_unlock_irq(&ns->logbuf_lock);
>
> file->private_data = user;
> return 0;
> @@ -730,10 +713,11 @@ const struct file_operations kmsg_fops = {
> */
> void log_buf_kexec_setup(void)
> {
> - VMCOREINFO_SYMBOL(log_buf);
> - VMCOREINFO_SYMBOL(log_buf_len);
> - VMCOREINFO_SYMBOL(log_first_idx);
> - VMCOREINFO_SYMBOL(log_next_idx);
> + struct syslog_namespace *ns = &init_syslog_ns;
> + VMCOREINFO_SYMBOL(ns->log_buf);
> + VMCOREINFO_SYMBOL(ns->log_buf_len);
> + VMCOREINFO_SYMBOL(ns->log_first_idx);
> + VMCOREINFO_SYMBOL(ns->log_next_idx);
> /*
> * Export struct log size and field offsets. User space tools can
> * parse it and detect any changes to structure down the line.
> @@ -753,10 +737,11 @@ static unsigned long __initdata new_log_buf_len;
> static int __init log_buf_len_setup(char *str)
> {
> unsigned size = memparse(str, &str);
> + struct syslog_namespace *ns = &init_syslog_ns;
>
> if (size)
> size = roundup_pow_of_two(size);
> - if (size > log_buf_len)
> + if (size > ns->log_buf_len)
> new_log_buf_len = size;
>
> return 0;
> @@ -768,6 +753,7 @@ void __init setup_log_buf(int early)
> unsigned long flags;
> char *new_log_buf;
> int free;
> + struct syslog_namespace *ns = &init_syslog_ns;
>
> if (!new_log_buf_len)
> return;
> @@ -789,15 +775,15 @@ void __init setup_log_buf(int early)
> return;
> }
>
> - raw_spin_lock_irqsave(&logbuf_lock, flags);
> - log_buf_len = new_log_buf_len;
> - log_buf = new_log_buf;
> + raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
> + memcpy(new_log_buf, ns->log_buf, __LOG_BUF_LEN);
> + ns->log_buf_len = new_log_buf_len;
> + ns->log_buf = new_log_buf;
> new_log_buf_len = 0;
> - free = __LOG_BUF_LEN - log_next_idx;
> - memcpy(log_buf, __log_buf, __LOG_BUF_LEN);
> - raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> + free = __LOG_BUF_LEN - ns->log_next_idx;
> + raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
>
> - pr_info("log_buf_len: %d\n", log_buf_len);
> + pr_info("log_buf_len: %d\n", ns->log_buf_len);
> pr_info("early log buf free: %d(%d%%)\n",
> free, (free * 100) / __LOG_BUF_LEN);
> }
> @@ -977,7 +963,8 @@ static size_t msg_print_text(const struct log *msg, enum log_flags prev,
> return len;
> }
>
> -static int syslog_print(char __user *buf, int size)
> +static int syslog_print(char __user *buf, int size,
> + struct syslog_namespace *ns)
> {
> char *text;
> struct log *msg;
> @@ -991,37 +978,38 @@ static int syslog_print(char __user *buf, int size)
> size_t n;
> size_t skip;
>
> - raw_spin_lock_irq(&logbuf_lock);
> - if (syslog_seq < log_first_seq) {
> + raw_spin_lock_irq(&ns->logbuf_lock);
> + if (ns->syslog_seq < ns->log_first_seq) {
> /* messages are gone, move to first one */
> - syslog_seq = log_first_seq;
> - syslog_idx = log_first_idx;
> - syslog_prev = 0;
> - syslog_partial = 0;
> + ns->syslog_seq = ns->log_first_seq;
> + ns->syslog_idx = ns->log_first_idx;
> + ns->syslog_prev = 0;
> + ns->syslog_partial = 0;
> }
> - if (syslog_seq == log_next_seq) {
> - raw_spin_unlock_irq(&logbuf_lock);
> + if (ns->syslog_seq == ns->log_next_seq) {
> + raw_spin_unlock_irq(&ns->logbuf_lock);
> break;
> }
>
> - skip = syslog_partial;
> - msg = log_from_idx(syslog_idx);
> - n = msg_print_text(msg, syslog_prev, true, text,
> + skip = ns->syslog_partial;
> + msg = log_from_idx(ns->syslog_idx, ns);
> + n = msg_print_text(msg, ns->syslog_prev, true, text,
> LOG_LINE_MAX + PREFIX_MAX);
> - if (n - syslog_partial <= size) {
> + if (n - ns->syslog_partial <= size) {
> /* message fits into buffer, move forward */
> - syslog_idx = log_next(syslog_idx);
> - syslog_seq++;
> - syslog_prev = msg->flags;
> - n -= syslog_partial;
> - syslog_partial = 0;
> + ns->syslog_idx =
> + log_next(ns->syslog_idx, ns);
> + ns->syslog_seq++;
> + ns->syslog_prev = msg->flags;
> + n -= ns->syslog_partial;
> + ns->syslog_partial = 0;
> } else if (!len){
> /* partial read(), remember position */
> n = size;
> - syslog_partial += n;
> + ns->syslog_partial += n;
> } else
> n = 0;
> - raw_spin_unlock_irq(&logbuf_lock);
> + raw_spin_unlock_irq(&ns->logbuf_lock);
>
> if (!n)
> break;
> @@ -1041,7 +1029,8 @@ static int syslog_print(char __user *buf, int size)
> return len;
> }
>
> -static int syslog_print_all(char __user *buf, int size, bool clear)
> +static int syslog_print_all(char __user *buf, int size, bool clear,
> + struct syslog_namespace *ns)
> {
> char *text;
> int len = 0;
> @@ -1050,55 +1039,55 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
> if (!text)
> return -ENOMEM;
>
> - raw_spin_lock_irq(&logbuf_lock);
> + raw_spin_lock_irq(&ns->logbuf_lock);
> if (buf) {
> u64 next_seq;
> u64 seq;
> u32 idx;
> enum log_flags prev;
>
> - if (clear_seq < log_first_seq) {
> + if (ns->clear_seq < ns->log_first_seq) {
> /* messages are gone, move to first available one */
> - clear_seq = log_first_seq;
> - clear_idx = log_first_idx;
> + ns->clear_seq = ns->log_first_seq;
> + ns->clear_idx = ns->log_first_idx;
> }
>
> /*
> * Find first record that fits, including all following records,
> * into the user-provided buffer for this dump.
> */
> - seq = clear_seq;
> - idx = clear_idx;
> + seq = ns->clear_seq;
> + idx = ns->clear_idx;
> prev = 0;
> - while (seq < log_next_seq) {
> - struct log *msg = log_from_idx(idx);
> + while (seq < ns->log_next_seq) {
> + struct log *msg = log_from_idx(idx, ns);
>
> len += msg_print_text(msg, prev, true, NULL, 0);
> prev = msg->flags;
> - idx = log_next(idx);
> + idx = log_next(idx, ns);
> seq++;
> }
>
> /* move first record forward until length fits into the buffer */
> - seq = clear_seq;
> - idx = clear_idx;
> + seq = ns->clear_seq;
> + idx = ns->clear_idx;
> prev = 0;
> - while (len > size && seq < log_next_seq) {
> - struct log *msg = log_from_idx(idx);
> + while (len > size && seq < ns->log_next_seq) {
> + struct log *msg = log_from_idx(idx, ns);
>
> len -= msg_print_text(msg, prev, true, NULL, 0);
> prev = msg->flags;
> - idx = log_next(idx);
> + idx = log_next(idx, ns);
> seq++;
> }
>
> /* last message fitting into this dump */
> - next_seq = log_next_seq;
> + next_seq = ns->log_next_seq;
>
> len = 0;
> prev = 0;
> while (len >= 0 && seq < next_seq) {
> - struct log *msg = log_from_idx(idx);
> + struct log *msg = log_from_idx(idx, ns);
> int textlen;
>
> textlen = msg_print_text(msg, prev, true, text,
> @@ -1107,43 +1096,44 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
> len = textlen;
> break;
> }
> - idx = log_next(idx);
> + idx = log_next(idx, ns);
> seq++;
> prev = msg->flags;
>
> - raw_spin_unlock_irq(&logbuf_lock);
> + raw_spin_unlock_irq(&ns->logbuf_lock);
> if (copy_to_user(buf + len, text, textlen))
> len = -EFAULT;
> else
> len += textlen;
> - raw_spin_lock_irq(&logbuf_lock);
> + raw_spin_lock_irq(&ns->logbuf_lock);
>
> - if (seq < log_first_seq) {
> + if (seq < ns->log_first_seq) {
> /* messages are gone, move to next one */
> - seq = log_first_seq;
> - idx = log_first_idx;
> + seq = ns->log_first_seq;
> + idx = ns->log_first_idx;
> prev = 0;
> }
> }
> }
>
> if (clear) {
> - clear_seq = log_next_seq;
> - clear_idx = log_next_idx;
> + ns->clear_seq = ns->log_next_seq;
> + ns->clear_idx = ns->log_next_idx;
> }
> - raw_spin_unlock_irq(&logbuf_lock);
> + raw_spin_unlock_irq(&ns->logbuf_lock);
>
> kfree(text);
> return len;
> }
>
> -int do_syslog(int type, char __user *buf, int len, bool from_file)
> +int do_syslog(int type, char __user *buf, int len, bool from_file,
> + struct syslog_namespace *ns)
> {
> bool clear = false;
> static int saved_console_loglevel = -1;
> int error;
>
> - error = check_syslog_permissions(type, from_file);
> + error = check_syslog_permissions(type, from_file, ns);
> if (error)
> goto out;
>
> @@ -1168,10 +1158,10 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
> goto out;
> }
> error = wait_event_interruptible(log_wait,
> - syslog_seq != log_next_seq);
> + ns->syslog_seq != ns->log_next_seq);
> if (error)
> goto out;
> - error = syslog_print(buf, len);
> + error = syslog_print(buf, len, ns);
> break;
> /* Read/clear last kernel messages */
> case SYSLOG_ACTION_READ_CLEAR:
> @@ -1189,11 +1179,11 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
> error = -EFAULT;
> goto out;
> }
> - error = syslog_print_all(buf, len, clear);
> + error = syslog_print_all(buf, len, clear, ns);
> break;
> /* Clear ring buffer */
> case SYSLOG_ACTION_CLEAR:
> - syslog_print_all(NULL, 0, true);
> + syslog_print_all(NULL, 0, true, ns);
> break;
> /* Disable logging to console */
> case SYSLOG_ACTION_CONSOLE_OFF:
> @@ -1222,13 +1212,13 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
> break;
> /* Number of chars in the log buffer */
> case SYSLOG_ACTION_SIZE_UNREAD:
> - raw_spin_lock_irq(&logbuf_lock);
> - if (syslog_seq < log_first_seq) {
> + raw_spin_lock_irq(&ns->logbuf_lock);
> + if (ns->syslog_seq < ns->log_first_seq) {
> /* messages are gone, move to first one */
> - syslog_seq = log_first_seq;
> - syslog_idx = log_first_idx;
> - syslog_prev = 0;
> - syslog_partial = 0;
> + ns->syslog_seq = ns->log_first_seq;
> + ns->syslog_idx = ns->log_first_idx;
> + ns->syslog_prev = 0;
> + ns->syslog_partial = 0;
> }
> if (from_file) {
> /*
> @@ -1236,28 +1226,28 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
> * for pending data, not the size; return the count of
> * records, not the length.
> */
> - error = log_next_idx - syslog_idx;
> + error = ns->log_next_idx - ns->syslog_idx;
> } else {
> - u64 seq = syslog_seq;
> - u32 idx = syslog_idx;
> - enum log_flags prev = syslog_prev;
> + u64 seq = ns->syslog_seq;
> + u32 idx = ns->syslog_idx;
> + enum log_flags prev = ns->syslog_prev;
>
> error = 0;
> - while (seq < log_next_seq) {
> - struct log *msg = log_from_idx(idx);
> + while (seq < ns->log_next_seq) {
> + struct log *msg = log_from_idx(idx, ns);
>
> error += msg_print_text(msg, prev, true, NULL, 0);
> - idx = log_next(idx);
> + idx = log_next(idx, ns);
> seq++;
> prev = msg->flags;
> }
> - error -= syslog_partial;
> + error -= ns->syslog_partial;
> }
> - raw_spin_unlock_irq(&logbuf_lock);
> + raw_spin_unlock_irq(&ns->logbuf_lock);
> break;
> /* Size of the log buffer */
> case SYSLOG_ACTION_SIZE_BUFFER:
> - error = log_buf_len;
> + error = ns->log_buf_len;
> break;
> default:
> error = -EINVAL;
> @@ -1269,7 +1259,8 @@ out:
>
> SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
> {
> - return do_syslog(type, buf, len, SYSLOG_FROM_READER);
> + return do_syslog(type, buf, len, SYSLOG_FROM_READER,
> + current_user_ns()->syslog_ns);
> }
>
> /*
> @@ -1307,7 +1298,7 @@ static void call_console_drivers(int level, const char *text, size_t len)
> * every 10 seconds, to leave time for slow consoles to print a
> * full oops.
> */
> -static void zap_locks(void)
> +static void zap_locks(struct syslog_namespace *ns)
> {
> static unsigned long oops_timestamp;
>
> @@ -1319,7 +1310,7 @@ static void zap_locks(void)
>
> debug_locks_off();
> /* If a crash is occurring, make sure we can't deadlock */
> - raw_spin_lock_init(&logbuf_lock);
> + raw_spin_lock_init(&ns->logbuf_lock);
> /* And make sure that we print immediately */
> sema_init(&console_sem, 1);
> }
> @@ -1359,8 +1350,9 @@ static inline int can_use_console(unsigned int cpu)
> * interrupts disabled. It should return with 'lockbuf_lock'
> * released but interrupts still disabled.
> */
> -static int console_trylock_for_printk(unsigned int cpu)
> - __releases(&logbuf_lock)
> +static int console_trylock_for_printk(unsigned int cpu,
> + struct syslog_namespace *ns)
> + __releases(&ns->logbuf_lock)
> {
> int retval = 0, wake = 0;
>
> @@ -1379,8 +1371,8 @@ static int console_trylock_for_printk(unsigned int cpu)
> retval = 0;
> }
> }
> - logbuf_cpu = UINT_MAX;
> - raw_spin_unlock(&logbuf_lock);
> + ns->logbuf_cpu = UINT_MAX;
> + raw_spin_unlock(&ns->logbuf_lock);
> if (wake)
> up(&console_sem);
> return retval;
> @@ -1418,7 +1410,7 @@ static struct cont {
> bool flushed:1; /* buffer sealed and committed */
> } cont;
>
> -static void cont_flush(enum log_flags flags)
> +static void cont_flush(enum log_flags flags, struct syslog_namespace *ns)
> {
> if (cont.flushed)
> return;
> @@ -1432,7 +1424,7 @@ static void cont_flush(enum log_flags flags)
> * line. LOG_NOCONS suppresses a duplicated output.
> */
> log_store(cont.facility, cont.level, flags | LOG_NOCONS,
> - cont.ts_nsec, NULL, 0, cont.buf, cont.len);
> + cont.ts_nsec, NULL, 0, cont.buf, cont.len, ns);
> cont.flags = flags;
> cont.flushed = true;
> } else {
> @@ -1441,19 +1433,20 @@ static void cont_flush(enum log_flags flags)
> * just submit it to the store and free the buffer.
> */
> log_store(cont.facility, cont.level, flags, 0,
> - NULL, 0, cont.buf, cont.len);
> + NULL, 0, cont.buf, cont.len, ns);
> cont.len = 0;
> }
> }
>
> -static bool cont_add(int facility, int level, const char *text, size_t len)
> +static bool cont_add(int facility, int level, const char *text, size_t len,
> + struct syslog_namespace *ns)
> {
> if (cont.len && cont.flushed)
> return false;
>
> if (cont.len + len > sizeof(cont.buf)) {
> /* the line gets too long, split it up in separate records */
> - cont_flush(LOG_CONT);
> + cont_flush(LOG_CONT, ns);
> return false;
> }
>
> @@ -1471,7 +1464,7 @@ static bool cont_add(int facility, int level, const char *text, size_t len)
> cont.len += len;
>
> if (cont.len > (sizeof(cont.buf) * 80) / 100)
> - cont_flush(LOG_CONT);
> + cont_flush(LOG_CONT, ns);
>
> return true;
> }
> @@ -1516,6 +1509,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> unsigned long flags;
> int this_cpu;
> int printed_len = 0;
> + struct syslog_namespace *ns = &init_syslog_ns;
>
> boot_delay_msec(level);
> printk_delay();
> @@ -1527,7 +1521,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> /*
> * Ouch, printk recursed into itself!
> */
> - if (unlikely(logbuf_cpu == this_cpu)) {
> + if (unlikely(ns->logbuf_cpu == this_cpu)) {
> /*
> * If a crash is occurring during printk() on this CPU,
> * then try to get the crash message out but make sure
> @@ -1539,12 +1533,12 @@ asmlinkage int vprintk_emit(int facility, int level,
> recursion_bug = 1;
> goto out_restore_irqs;
> }
> - zap_locks();
> + zap_locks(ns);
> }
>
> lockdep_off();
> - raw_spin_lock(&logbuf_lock);
> - logbuf_cpu = this_cpu;
> + raw_spin_lock(&ns->logbuf_lock);
> + ns->logbuf_cpu = this_cpu;
>
> if (recursion_bug) {
> static const char recursion_msg[] =
> @@ -1554,7 +1548,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> printed_len += strlen(recursion_msg);
> /* emit KERN_CRIT message */
> log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
> - NULL, 0, recursion_msg, printed_len);
> + NULL, 0, recursion_msg, printed_len, ns);
> }
>
> /*
> @@ -1601,12 +1595,12 @@ asmlinkage int vprintk_emit(int facility, int level,
> * or another task also prints continuation lines.
> */
> if (cont.len && (lflags & LOG_PREFIX || cont.owner != current))
> - cont_flush(LOG_NEWLINE);
> + cont_flush(LOG_NEWLINE, ns);
>
> /* buffer line if possible, otherwise store it right away */
> - if (!cont_add(facility, level, text, text_len))
> + if (!cont_add(facility, level, text, text_len, ns))
> log_store(facility, level, lflags | LOG_CONT, 0,
> - dict, dictlen, text, text_len);
> + dict, dictlen, text, text_len, ns);
> } else {
> bool stored = false;
>
> @@ -1618,13 +1612,14 @@ asmlinkage int vprintk_emit(int facility, int level,
> */
> if (cont.len && cont.owner == current) {
> if (!(lflags & LOG_PREFIX))
> - stored = cont_add(facility, level, text, text_len);
> - cont_flush(LOG_NEWLINE);
> + stored = cont_add(facility, level, text,
> + text_len, ns);
> + cont_flush(LOG_NEWLINE, ns);
> }
>
> if (!stored)
> log_store(facility, level, lflags, 0,
> - dict, dictlen, text, text_len);
> + dict, dictlen, text, text_len, ns);
> }
> printed_len += text_len;
>
> @@ -1636,7 +1631,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> * The console_trylock_for_printk() function will release 'logbuf_lock'
> * regardless of whether it actually gets the console semaphore or not.
> */
> - if (console_trylock_for_printk(this_cpu))
> + if (console_trylock_for_printk(this_cpu, ns))
> console_unlock();
>
> lockdep_on();
> @@ -1995,12 +1990,13 @@ int is_console_locked(void)
> return console_locked;
> }
>
> -static void console_cont_flush(char *text, size_t size)
> +static void console_cont_flush(char *text, size_t size,
> + struct syslog_namespace *ns)
> {
> unsigned long flags;
> size_t len;
>
> - raw_spin_lock_irqsave(&logbuf_lock, flags);
> + raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
>
> if (!cont.len)
> goto out;
> @@ -2010,18 +2006,18 @@ static void console_cont_flush(char *text, size_t size)
> * busy. The earlier ones need to be printed before this one, we
> * did not flush any fragment so far, so just let it queue up.
> */
> - if (console_seq < log_next_seq && !cont.cons)
> + if (ns->console_seq < ns->log_next_seq && !cont.cons)
> goto out;
>
> len = cont_print_text(text, size);
> - raw_spin_unlock(&logbuf_lock);
> + raw_spin_unlock(&ns->logbuf_lock);
> stop_critical_timings();
> call_console_drivers(cont.level, text, len);
> start_critical_timings();
> local_irq_restore(flags);
> return;
> out:
> - raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> + raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
> }
>
> /**
> @@ -2045,6 +2041,7 @@ void console_unlock(void)
> unsigned long flags;
> bool wake_klogd = false;
> bool retry;
> + struct syslog_namespace *ns = &init_syslog_ns;
>
> if (console_suspended) {
> up(&console_sem);
> @@ -2054,37 +2051,38 @@ void console_unlock(void)
> console_may_schedule = 0;
>
> /* flush buffered message fragment immediately to console */
> - console_cont_flush(text, sizeof(text));
> + console_cont_flush(text, sizeof(text), ns);
> again:
> for (;;) {
> struct log *msg;
> size_t len;
> int level;
>
> - raw_spin_lock_irqsave(&logbuf_lock, flags);
> - if (seen_seq != log_next_seq) {
> + raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
> + if (seen_seq != ns->log_next_seq) {
> wake_klogd = true;
> - seen_seq = log_next_seq;
> + seen_seq = ns->log_next_seq;
> }
>
> - if (console_seq < log_first_seq) {
> + if (ns->console_seq < ns->log_first_seq) {
> /* messages are gone, move to first one */
> - console_seq = log_first_seq;
> - console_idx = log_first_idx;
> + ns->console_seq = ns->log_first_seq;
> + ns->console_idx = ns->log_first_idx;
> console_prev = 0;
> }
> skip:
> - if (console_seq == log_next_seq)
> + if (ns->console_seq == ns->log_next_seq)
> break;
>
> - msg = log_from_idx(console_idx);
> + msg = log_from_idx(ns->console_idx, ns);
> if (msg->flags & LOG_NOCONS) {
> /*
> * Skip record we have buffered and already printed
> * directly to the console when we received it.
> */
> - console_idx = log_next(console_idx);
> - console_seq++;
> + ns->console_idx =
> + log_next(ns->console_idx, ns);
> + ns->console_seq++;
> /*
> * We will get here again when we register a new
> * CON_PRINTBUFFER console. Clear the flag so we
> @@ -2098,10 +2096,11 @@ skip:
> level = msg->level;
> len = msg_print_text(msg, console_prev, false,
> text, sizeof(text));
> - console_idx = log_next(console_idx);
> - console_seq++;
> + ns->console_idx =
> + log_next(ns->console_idx, ns);
> + ns->console_seq++;
> console_prev = msg->flags;
> - raw_spin_unlock(&logbuf_lock);
> + raw_spin_unlock(&ns->logbuf_lock);
>
> stop_critical_timings(); /* don't trace print latency */
> call_console_drivers(level, text, len);
> @@ -2115,7 +2114,7 @@ skip:
> if (unlikely(exclusive_console))
> exclusive_console = NULL;
>
> - raw_spin_unlock(&logbuf_lock);
> + raw_spin_unlock(&ns->logbuf_lock);
>
> up(&console_sem);
>
> @@ -2125,9 +2124,9 @@ skip:
> * there's a new owner and the console_unlock() from them will do the
> * flush, no worries.
> */
> - raw_spin_lock(&logbuf_lock);
> - retry = console_seq != log_next_seq;
> - raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> + raw_spin_lock(&ns->logbuf_lock);
> + retry = ns->console_seq != ns->log_next_seq;
> + raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
>
> if (retry && console_trylock())
> goto again;
> @@ -2252,6 +2251,7 @@ void register_console(struct console *newcon)
> int i;
> unsigned long flags;
> struct console *bcon = NULL;
> + struct syslog_namespace *ns = &init_syslog_ns;
>
> /*
> * before we register a new CON_BOOT console, make sure we don't
> @@ -2361,11 +2361,11 @@ void register_console(struct console *newcon)
> * console_unlock(); will print out the buffered messages
> * for us.
> */
> - raw_spin_lock_irqsave(&logbuf_lock, flags);
> - console_seq = syslog_seq;
> - console_idx = syslog_idx;
> - console_prev = syslog_prev;
> - raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> + raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
> + ns->console_seq = ns->syslog_seq;
> + ns->console_idx = ns->syslog_idx;
> + console_prev = ns->syslog_prev;
> + raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
> /*
> * We're about to replay the log buffer. Only do this to the
> * just-registered console to avoid excessive message spam to
> @@ -2627,6 +2627,7 @@ void kmsg_dump(enum kmsg_dump_reason reason)
> {
> struct kmsg_dumper *dumper;
> unsigned long flags;
> + struct syslog_namespace *ns = &init_syslog_ns;
>
> if ((reason > KMSG_DUMP_OOPS) && !always_kmsg_dump)
> return;
> @@ -2639,12 +2640,12 @@ void kmsg_dump(enum kmsg_dump_reason reason)
> /* initialize iterator with data about the stored records */
> dumper->active = true;
>
> - raw_spin_lock_irqsave(&logbuf_lock, flags);
> - dumper->cur_seq = clear_seq;
> - dumper->cur_idx = clear_idx;
> - dumper->next_seq = log_next_seq;
> - dumper->next_idx = log_next_idx;
> - raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> + raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
> + dumper->cur_seq = ns->clear_seq;
> + dumper->cur_idx = ns->clear_idx;
> + dumper->next_seq = ns->log_next_seq;
> + dumper->next_idx = ns->log_next_idx;
> + raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
>
> /* invoke dumper which will iterate over records */
> dumper->dump(dumper, reason);
> @@ -2680,24 +2681,25 @@ bool kmsg_dump_get_line_nolock(struct kmsg_dumper *dumper, bool syslog,
> struct log *msg;
> size_t l = 0;
> bool ret = false;
> + struct syslog_namespace *ns = &init_syslog_ns;
>
> if (!dumper->active)
> goto out;
>
> - if (dumper->cur_seq < log_first_seq) {
> + if (dumper->cur_seq < ns->log_first_seq) {
> /* messages are gone, move to first available one */
> - dumper->cur_seq = log_first_seq;
> - dumper->cur_idx = log_first_idx;
> + dumper->cur_seq = ns->log_first_seq;
> + dumper->cur_idx = ns->log_first_idx;
> }
>
> /* last entry */
> - if (dumper->cur_seq >= log_next_seq)
> + if (dumper->cur_seq >= ns->log_next_seq)
> goto out;
>
> - msg = log_from_idx(dumper->cur_idx);
> + msg = log_from_idx(dumper->cur_idx, ns);
> l = msg_print_text(msg, 0, syslog, line, size);
>
> - dumper->cur_idx = log_next(dumper->cur_idx);
> + dumper->cur_idx = log_next(dumper->cur_idx, ns);
> dumper->cur_seq++;
> ret = true;
> out:
> @@ -2728,10 +2730,11 @@ bool kmsg_dump_get_line(struct kmsg_dumper *dumper, bool syslog,
> {
> unsigned long flags;
> bool ret;
> + struct syslog_namespace *ns = &init_syslog_ns;
>
> - raw_spin_lock_irqsave(&logbuf_lock, flags);
> + raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
> ret = kmsg_dump_get_line_nolock(dumper, syslog, line, size, len);
> - raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> + raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
>
> return ret;
> }
> @@ -2767,20 +2770,21 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
> enum log_flags prev;
> size_t l = 0;
> bool ret = false;
> + struct syslog_namespace *ns = &init_syslog_ns;
>
> if (!dumper->active)
> goto out;
>
> - raw_spin_lock_irqsave(&logbuf_lock, flags);
> - if (dumper->cur_seq < log_first_seq) {
> + raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
> + if (dumper->cur_seq < ns->log_first_seq) {
> /* messages are gone, move to first available one */
> - dumper->cur_seq = log_first_seq;
> - dumper->cur_idx = log_first_idx;
> + dumper->cur_seq = ns->log_first_seq;
> + dumper->cur_idx = ns->log_first_idx;
> }
>
> /* last entry */
> if (dumper->cur_seq >= dumper->next_seq) {
> - raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> + raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
> goto out;
> }
>
> @@ -2789,10 +2793,10 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
> idx = dumper->cur_idx;
> prev = 0;
> while (seq < dumper->next_seq) {
> - struct log *msg = log_from_idx(idx);
> + struct log *msg = log_from_idx(idx, ns);
>
> l += msg_print_text(msg, prev, true, NULL, 0);
> - idx = log_next(idx);
> + idx = log_next(idx, ns);
> seq++;
> prev = msg->flags;
> }
> @@ -2802,10 +2806,10 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
> idx = dumper->cur_idx;
> prev = 0;
> while (l > size && seq < dumper->next_seq) {
> - struct log *msg = log_from_idx(idx);
> + struct log *msg = log_from_idx(idx, ns);
>
> l -= msg_print_text(msg, prev, true, NULL, 0);
> - idx = log_next(idx);
> + idx = log_next(idx, ns);
> seq++;
> prev = msg->flags;
> }
> @@ -2817,10 +2821,10 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
> l = 0;
> prev = 0;
> while (seq < dumper->next_seq) {
> - struct log *msg = log_from_idx(idx);
> + struct log *msg = log_from_idx(idx, ns);
>
> l += msg_print_text(msg, prev, syslog, buf + l, size - l);
> - idx = log_next(idx);
> + idx = log_next(idx, ns);
> seq++;
> prev = msg->flags;
> }
> @@ -2828,7 +2832,7 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
> dumper->next_seq = next_seq;
> dumper->next_idx = next_idx;
> ret = true;
> - raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> + raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
> out:
> if (len)
> *len = l;
> @@ -2848,10 +2852,12 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
> */
> void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
> {
> - dumper->cur_seq = clear_seq;
> - dumper->cur_idx = clear_idx;
> - dumper->next_seq = log_next_seq;
> - dumper->next_idx = log_next_idx;
> + struct syslog_namespace *ns = &init_syslog_ns;
> +
> + dumper->cur_seq = ns->clear_seq;
> + dumper->cur_idx = ns->clear_idx;
> + dumper->next_seq = ns->log_next_seq;
> + dumper->next_idx = ns->log_next_idx;
> }
>
> /**
> @@ -2865,10 +2871,11 @@ void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
> void kmsg_dump_rewind(struct kmsg_dumper *dumper)
> {
> unsigned long flags;
> + struct syslog_namespace *ns = &init_syslog_ns;
>
> - raw_spin_lock_irqsave(&logbuf_lock, flags);
> + raw_spin_lock_irqsave(&ns->logbuf_lock, flags);
> kmsg_dump_rewind_nolock(dumper);
> - raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> + raw_spin_unlock_irqrestore(&ns->logbuf_lock, flags);
> }
> EXPORT_SYMBOL_GPL(kmsg_dump_rewind);
>
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index ac09d98..0954b09 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -62,6 +62,7 @@
> #include <linux/capability.h>
> #include <linux/binfmts.h>
> #include <linux/sched/sysctl.h>
> +#include <linux/syslog.h>
>
> #include <asm/uaccess.h>
> #include <asm/processor.h>
> @@ -773,7 +774,7 @@ static struct ctl_table kern_table[] = {
> },
> {
> .procname = "dmesg_restrict",
> - .data = &dmesg_restrict,
> + .data = &init_syslog_ns.dmesg_restrict,
> .maxlen = sizeof(int),
> .mode = 0644,
> .proc_handler = proc_dointvec_minmax_sysadmin,

2013-07-29 10:00:27

by Gu Zheng

Subject: Re: [PATCH 2/9] syslog_ns: add syslog_ns into user_namespace

On 07/29/2013 05:54 PM, Gao feng wrote:

> On 07/29/2013 05:46 PM, Gu Zheng wrote:
>> Hi Rui,
>>
>> On 07/29/2013 10:31 AM, Rui Xiang wrote:
>>
>>> Add a syslog_ns pointer to user_namespace, and make
>>> syslog_ns per user_namespace, not global.
>>>
>>> Since syslog_ns is assigned to user_ns, we can have
>>> full capabilities in new user_ns to create a new syslog_ns.
>>>
>>> Signed-off-by: Rui Xiang <[email protected]>
>>> ---
>>> include/linux/syslog.h | 5 +++++
>>> include/linux/user_namespace.h | 1 +
>>> 2 files changed, 6 insertions(+)
>>>
>>> diff --git a/include/linux/syslog.h b/include/linux/syslog.h
>>> index 425fafe..62ce47f 100644
>>> --- a/include/linux/syslog.h
>>> +++ b/include/linux/syslog.h
>>> @@ -90,6 +90,11 @@ struct syslog_namespace {
>>> size_t syslog_partial;
>>>
>>> int dmesg_restrict;
>>> +
>>> + /*
>>> + * user namespace which owns this syslog ns.
>>> + */
>>> + struct user_namespace *owner;
>>> };
>>>
>>> static inline struct syslog_namespace *get_syslog_ns(
>>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>>> index b6b215f..ce2de5b 100644
>>> --- a/include/linux/user_namespace.h
>>> +++ b/include/linux/user_namespace.h
>>> @@ -28,6 +28,7 @@ struct user_namespace {
>>> unsigned int proc_inum;
>>> bool may_mount_sysfs;
>>> bool may_mount_proc;
>>> + struct syslog_namespace *syslog_ns;
>>
>> We add a syslog_ns pointer to user_namespace to make
>> syslog_ns per user_namespace and to support the caps check.
>> But why also add a pointer to syslog_namespace in
>> user_namespace? Am I missing something? :)
>>
>
> Yep, with this we can make sure that all the other namespace types, such as
> mount, net, and pid, can access syslog_ns through the user namespace.

Got it.:)

Thanks,
Gu

>
>
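
For readers following the thread, a minimal user-space sketch of the lookup Gao feng describes: another namespace type reaches its syslog_ns by going through the user namespace it belongs to. This is not kernel code. The struct layouts are reduced to the two pointers discussed in this patch, and struct net_model and net_syslog_ns() are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

struct syslog_namespace;

/* Reduced model: only the fields relevant to this patch. */
struct user_namespace {
    struct user_namespace *parent;
    struct syslog_namespace *syslog_ns;  /* pointer added by patch 2/9 */
};

struct syslog_namespace {
    struct user_namespace *owner;        /* user ns that owns this syslog ns */
};

/* Hypothetical stand-in for a namespace that records its user ns. */
struct net_model {
    struct user_namespace *user_ns;
};

/* Resolve the syslog ns of a network namespace via its user ns. */
static struct syslog_namespace *net_syslog_ns(const struct net_model *net)
{
    assert(net && net->user_ns);
    return net->user_ns->syslog_ns;
}
```

With this layout, a subsystem that only knows its own namespace needs no extra pointer of its own; it follows user_ns->syslog_ns, while owner points back the other way for the capability check.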

2013-07-29 10:29:12

by Gu Zheng

Subject: Re: [PATCH 7/9] syslog_ns: implement function for creating syslog ns

Hi Rui,

On 07/29/2013 10:31 AM, Rui Xiang wrote:

> Add a create_syslog_ns function to create a new ns. We
> must create a user_ns before creating a new syslog ns.
> The new syslog_ns is then tied to the current user_ns,
> replacing the original syslog_ns inherited from the
> parent user_ns.
>
> Add a new syslog action, SYSLOG_ACTION_NEW_NS, which
> implements a new command (11) of the __NR_syslog system
> call. Through that command, a new syslog ns can be
> created from user space.
>
> Signed-off-by: Rui Xiang <[email protected]>
> ---
> include/linux/syslog.h | 2 ++
> kernel/printk.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 54 insertions(+)
>
> diff --git a/include/linux/syslog.h b/include/linux/syslog.h
> index fbf0cb6..df57c21 100644
> --- a/include/linux/syslog.h
> +++ b/include/linux/syslog.h
> @@ -46,6 +46,8 @@
> #define SYSLOG_ACTION_SIZE_UNREAD 9
> /* Return size of the log buffer */
> #define SYSLOG_ACTION_SIZE_BUFFER 10
> +/* Create a new syslog ns */
> +#define SYSLOG_ACTION_NEW_NS 11
>
> #define SYSLOG_FROM_READER 0
> #define SYSLOG_FROM_PROC 1
> diff --git a/kernel/printk.c b/kernel/printk.c
> index fd2d600..6b561db 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -384,6 +384,10 @@ static int check_syslog_permissions(int type, bool from_file,
> || type == SYSLOG_ACTION_CONSOLE_LEVEL)
> ns = &init_syslog_ns;
>
> + /* create a new syslog ns */
> + if (type == SYSLOG_ACTION_NEW_NS)
> + return 0;
> +

Don't we need a further permission or capability check here? Returning success directly seems sloppy.

Thanks,
Gu
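
To make the question concrete, here is a minimal user-space sketch of the checks that the quoted create_syslog_ns() performs further down in this patch, gathered into one function. It is only a model: struct user_ns_model, its boolean fields, and check_new_ns_permission() are hypothetical stand-ins for the kernel tests named in the comments, not proposed kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the caller's user namespace. */
struct user_ns_model {
    bool is_init;                  /* models userns == &init_user_ns */
    bool shares_parent_syslog_ns;  /* models userns->syslog_ns ==
                                      userns->parent->syslog_ns */
    bool has_cap_syslog;           /* models ns_capable(userns, CAP_SYSLOG) */
};

/*
 * Same order as in the quoted create_syslog_ns(): reject the init
 * user ns and user namespaces that already have their own syslog ns
 * with -EINVAL, then require CAP_SYSLOG.
 */
static int check_new_ns_permission(const struct user_ns_model *userns)
{
    assert(userns);
    if (userns->is_init || !userns->shares_parent_syslog_ns)
        return -EINVAL;
    if (!userns->has_cap_syslog)
        return -EPERM;
    return 0;
}
```

In the patch as posted, the early return 0 only skips the generic check; the capability test still runs later inside create_syslog_ns().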

> if (syslog_action_restricted(type, ns)) {
> if (ns_capable(ns->owner, CAP_SYSLOG))
> return 0;
> @@ -1131,6 +1135,51 @@ static int syslog_print_all(char __user *buf, int size, bool clear,
> return len;
> }
>
> +static int create_syslog_ns(void)
> +{
> + struct user_namespace *userns = current_user_ns();
> + struct syslog_namespace *oldns, *newns;
> + int err;
> +
> + /*
> + * syslog ns belongs to a user ns. So you can only unshare your
> + * user_ns if you share a user_ns with your parent userns
> + */
> + if (userns == &init_user_ns ||
> + userns->syslog_ns != userns->parent->syslog_ns)
> + return -EINVAL;
> +
> + if (!ns_capable(userns, CAP_SYSLOG))
> + return -EPERM;
> +
> + err = -ENOMEM;
> + oldns = userns->syslog_ns;
> + newns = kzalloc(sizeof(*newns), GFP_ATOMIC);
> + if (!newns)
> + goto out;
> + newns->log_buf_len = __LOG_BUF_LEN;
> + newns->log_buf = kzalloc(newns->log_buf_len, GFP_ATOMIC);
> + if (!newns->log_buf)
> + goto out;
> +
> + newns->owner = get_user_ns(userns);
> + raw_spin_lock_init(&(newns->logbuf_lock));
> + newns->logbuf_cpu = UINT_MAX;
> + newns->dmesg_restrict = oldns->dmesg_restrict;
> + put_syslog_ns(oldns);
> + kref_init(&newns->kref);
> + userns->syslog_ns = newns;
> + newns = NULL;
> +
> + err = 0;
> +out:
> + if (newns) {
> + kfree(newns->log_buf);
> + kfree(newns);
> + }
> + return err;
> +}
> +
> int do_syslog(int type, char __user *buf, int len, bool from_file,
> struct syslog_namespace *ns)
> {
> @@ -1254,6 +1303,9 @@ int do_syslog(int type, char __user *buf, int len, bool from_file,
> case SYSLOG_ACTION_SIZE_BUFFER:
> error = ns->log_buf_len;
> break;
> + case SYSLOG_ACTION_NEW_NS:
> + error = create_syslog_ns();
> + break;
> default:
> error = -EINVAL;
> break;

2013-07-29 10:38:19

by Gao feng

Subject: Re: [PATCH 7/9] syslog_ns: implement function for creating syslog ns

On 07/29/2013 10:31 AM, Rui Xiang wrote:
> Add a create_syslog_ns function to create a new ns. We
> must create a user_ns before creating a new syslog ns.
> The new syslog_ns is then tied to the current user_ns,
> replacing the original syslog_ns inherited from the
> parent user_ns.
>
> Add a new syslog action, SYSLOG_ACTION_NEW_NS, which
> implements a new command (11) of the __NR_syslog system
> call. Through that command, a new syslog ns can be
> created from user space.
>
> Signed-off-by: Rui Xiang <[email protected]>
> ---
> include/linux/syslog.h | 2 ++
> kernel/printk.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 54 insertions(+)
>
> diff --git a/include/linux/syslog.h b/include/linux/syslog.h
> index fbf0cb6..df57c21 100644
> --- a/include/linux/syslog.h
> +++ b/include/linux/syslog.h
> @@ -46,6 +46,8 @@
> #define SYSLOG_ACTION_SIZE_UNREAD 9
> /* Return size of the log buffer */
> #define SYSLOG_ACTION_SIZE_BUFFER 10
> +/* Create a new syslog ns */
> +#define SYSLOG_ACTION_NEW_NS 11
>
> #define SYSLOG_FROM_READER 0
> #define SYSLOG_FROM_PROC 1
> diff --git a/kernel/printk.c b/kernel/printk.c
> index fd2d600..6b561db 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -384,6 +384,10 @@ static int check_syslog_permissions(int type, bool from_file,
> || type == SYSLOG_ACTION_CONSOLE_LEVEL)
> ns = &init_syslog_ns;
>
> + /* create a new syslog ns */
> + if (type == SYSLOG_ACTION_NEW_NS)
> + return 0;
> +
> if (syslog_action_restricted(type, ns)) {
> if (ns_capable(ns->owner, CAP_SYSLOG))
> return 0;
> @@ -1131,6 +1135,51 @@ static int syslog_print_all(char __user *buf, int size, bool clear,
> return len;
> }
>
> +static int create_syslog_ns(void)
> +{
> + struct user_namespace *userns = current_user_ns();
> + struct syslog_namespace *oldns, *newns;
> + int err;
> +
> + /*
> + * syslog ns belongs to a user ns. So you can only unshare your
> + * user_ns if you share a user_ns with your parent userns
> + */
> + if (userns == &init_user_ns ||
> + userns->syslog_ns != userns->parent->syslog_ns)
> + return -EINVAL;
> +
> + if (!ns_capable(userns, CAP_SYSLOG))
> + return -EPERM;
> +
> + err = -ENOMEM;
> + oldns = userns->syslog_ns;
> + newns = kzalloc(sizeof(*newns), GFP_ATOMIC);
> + if (!newns)
> + goto out;
> + newns->log_buf_len = __LOG_BUF_LEN;
> + newns->log_buf = kzalloc(newns->log_buf_len, GFP_ATOMIC);
> + if (!newns->log_buf)
> + goto out;
> +
> + newns->owner = get_user_ns(userns);
> + raw_spin_lock_init(&(newns->logbuf_lock));
> + newns->logbuf_cpu = UINT_MAX;
> + newns->dmesg_restrict = oldns->dmesg_restrict;
> + put_syslog_ns(oldns);
> + kref_init(&newns->kref);
> + userns->syslog_ns = newns;

It seems the user namespace references the syslog_ns and the syslog_ns
references the user namespace too. How do you deal with the release?
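
A minimal user-space sketch of the concern, assuming plain integer reference counts in place of the kernel's kref and user ns counters. All names ending in _model, plus attach_syslog_ns() and detach_syslog_ns(), are hypothetical. It shows that once each object holds a reference on the other, dropping the last outside reference frees neither; one side has to break the cycle explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct syslog_ns_model;

struct user_ns_model {
    int count;                          /* references on the user ns */
    struct syslog_ns_model *syslog_ns;  /* holds one ref on the syslog ns */
    bool freed;
};

struct syslog_ns_model {
    int count;                          /* references on the syslog ns */
    struct user_ns_model *owner;        /* holds one ref on the user ns */
    bool freed;
};

static void put_user_ns_model(struct user_ns_model *u);

static void put_syslog_ns_model(struct syslog_ns_model *s)
{
    assert(s->count > 0);
    if (--s->count == 0) {
        s->freed = true;
        put_user_ns_model(s->owner);    /* drop the owner reference */
    }
}

static void put_user_ns_model(struct user_ns_model *u)
{
    assert(u->count > 0);
    if (--u->count == 0) {
        u->freed = true;
        if (u->syslog_ns)
            put_syslog_ns_model(u->syslog_ns);
    }
}

/* Models kref_init() plus newns->owner = get_user_ns(userns). */
static void attach_syslog_ns(struct user_ns_model *u,
                             struct syslog_ns_model *s)
{
    s->count = 1;       /* reference held through u->syslog_ns */
    s->owner = u;
    u->count++;         /* reference held through s->owner */
    u->syslog_ns = s;
}

/* One possible way out: the user ns gives up its syslog ns first. */
static void detach_syslog_ns(struct user_ns_model *u)
{
    struct syslog_ns_model *s = u->syslog_ns;

    u->syslog_ns = NULL;
    if (s)
        put_syslog_ns_model(s);
}
```

The detach step is only one conceivable way to break the cycle in this model; how the real series should handle release is the open question here.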

> + newns = NULL;
> +
> + err = 0;
> +out:
> + if (newns) {
> + kfree(newns->log_buf);
> + kfree(newns);
> + }
> + return err;
> +}
> +
> int do_syslog(int type, char __user *buf, int len, bool from_file,
> struct syslog_namespace *ns)
> {
> @@ -1254,6 +1303,9 @@ int do_syslog(int type, char __user *buf, int len, bool from_file,
> case SYSLOG_ACTION_SIZE_BUFFER:
> error = ns->log_buf_len;
> break;
> + case SYSLOG_ACTION_NEW_NS:
> + error = create_syslog_ns();
> + break;
> default:
> error = -EINVAL;
> break;
>

2013-07-29 10:41:12

by Gu Zheng

Subject: Re: [PATCH 8/9] syslog_ns: implement ns_printk for specific syslog_ns

Hi Rui,

On 07/29/2013 10:31 AM, Rui Xiang wrote:

> Add a new interface named ns_printk, which takes a
> parameter ns. Logs that belong to a container can
> be printed by ns_printk.

One question: with syslog_ns in use, does the log we print with *printk* on the
host contain the logs of each syslog_ns (printed with ns_printk), or not?

Thanks,
Gu

>
> Signed-off-by: Rui Xiang <[email protected]>
> ---
> include/linux/printk.h | 4 ++++
> kernel/printk.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 53 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 29e3f85..bf83ad9 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -6,6 +6,7 @@
> #include <linux/kern_levels.h>
> #include <linux/linkage.h>
>
> +struct syslog_namespace;
> extern const char linux_banner[];
> extern const char linux_proc_banner[];
>
> @@ -123,6 +124,9 @@ asmlinkage int printk_emit(int facility, int level,
> asmlinkage __printf(1, 2) __cold
> int printk(const char *fmt, ...);
>
> +asmlinkage __printf(2, 3) __cold
> +int ns_printk(struct syslog_namespace *ns, const char *fmt, ...);
> +
> /*
> * Special printk facility for scheduler use only, _DO_NOT_USE_ !
> */
> diff --git a/kernel/printk.c b/kernel/printk.c
> index 6b561db..56a8b27 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -1554,9 +1554,10 @@ static size_t cont_print_text(char *text, size_t size)
> return textlen;
> }
>
> -asmlinkage int vprintk_emit(int facility, int level,
> - const char *dict, size_t dictlen,
> - const char *fmt, va_list args)
> +static int ns_vprintk_emit(int facility, int level,
> + const char *dict, size_t dictlen,
> + const char *fmt, va_list args,
> + struct syslog_namespace *ns)
> {
> static int recursion_bug;
> static char textbuf[LOG_LINE_MAX];
> @@ -1566,7 +1567,6 @@ asmlinkage int vprintk_emit(int facility, int level,
> unsigned long flags;
> int this_cpu;
> int printed_len = 0;
> - struct syslog_namespace *ns = &init_syslog_ns;
>
> boot_delay_msec(level);
> printk_delay();
> @@ -1697,6 +1697,14 @@ out_restore_irqs:
>
> return printed_len;
> }
> +
> +asmlinkage int vprintk_emit(int facility, int level,
> + const char *dict, size_t dictlen,
> + const char *fmt, va_list args)
> +{
> + return ns_vprintk_emit(facility, level, dict, dictlen, fmt, args,
> + &init_syslog_ns);
> +}
> EXPORT_SYMBOL(vprintk_emit);
>
> asmlinkage int vprintk(const char *fmt, va_list args)
> @@ -1762,6 +1770,43 @@ asmlinkage int printk(const char *fmt, ...)
> }
> EXPORT_SYMBOL(printk);
>
> +/**
> + * ns_printk - print a kernel message in syslog_ns
> + * @ns: syslog namespace
> + * @fmt: format string
> + *
> + * This is ns_printk().
> + * It can be called from container context. We add a param
> + * ns to record current syslog namespace, because we need to
> + * print some log which are not generated by host, but contaner.
> + *
> + * See the vsnprintf() documentation for format string extensions over C99.
> + **/
> +asmlinkage int ns_printk(struct syslog_namespace *ns,
> + const char *fmt, ...)
> +{
> + va_list args;
> + int r;
> +
> + if (!ns)
> + ns = current_user_ns()->syslog_ns;
> +
> +#ifdef CONFIG_KGDB_KDB
> + if (unlikely(kdb_trap_printk)) {
> + va_start(args, fmt);
> + r = vkdb_printf(fmt, args);
> + va_end(args);
> + return r;
> + }
> +#endif
> + va_start(args, fmt);
> + r = ns_vprintk_emit(0, -1, NULL, 0, fmt, args, ns);
> + va_end(args);
> +
> + return r;
> +}
> +EXPORT_SYMBOL(ns_printk);
> +

Can we clean up printk here by implementing it on top of ns_printk?
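
A minimal user-space sketch of that kind of cleanup, assuming a shared va_list helper, since one variadic function cannot forward its arguments directly to another. This is not the kernel implementation: struct syslog_ns_model, ns_vlog_model(), ns_log_model() and host_log_model() are hypothetical names, and the "log" here is just the most recent formatted message.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

struct syslog_ns_model {
    char last[128];     /* most recent message only, to keep this short */
};

/* Stands in for init_syslog_ns, the host log. */
static struct syslog_ns_model init_syslog_ns_model;

/* Common path: every front end ends up here with an explicit ns. */
static int ns_vlog_model(struct syslog_ns_model *ns,
                         const char *fmt, va_list args)
{
    assert(ns && fmt);
    return vsnprintf(ns->last, sizeof(ns->last), fmt, args);
}

/* Namespace-aware front end, in the role of ns_printk(). */
static int ns_log_model(struct syslog_ns_model *ns, const char *fmt, ...)
{
    va_list args;
    int r;

    va_start(args, fmt);
    r = ns_vlog_model(ns, fmt, args);
    va_end(args);
    return r;
}

/* Host front end, in the role of printk(): same path, fixed ns. */
static int host_log_model(const char *fmt, ...)
{
    va_list args;
    int r;

    va_start(args, fmt);
    r = ns_vlog_model(&init_syslog_ns_model, fmt, args);
    va_end(args);
    return r;
}
```

Both front ends differ only in which namespace they hand to the shared helper, which is the duplication the question is pointing at.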

> #else /* CONFIG_PRINTK */
>
> #define LOG_LINE_MAX 0

2013-07-29 11:47:43

by Rui Xiang

Subject: Re: [PATCH 1/9] syslog_ns: add syslog_namespace and put/get_syslog_ns

On 2013/7/29 17:40, Gu Zheng wrote:
> Hi Rui,
> See my comments inline. :)
>
Hi Gu,

Thanks for your attention.

> On 07/29/2013 10:31 AM, Rui Xiang wrote:
>
>> Add a struct syslog_namespace, which contains the members
>> needed for handling syslog, and implement the get_syslog_ns
>> and put_syslog_ns APIs.
>>
>> Signed-off-by: Rui Xiang <[email protected]>
>> ---
>> include/linux/syslog.h | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
>> kernel/printk.c | 7 ------
>> 2 files changed, 68 insertions(+), 7 deletions(-)
>>

...

>> +
>> +static inline void free_syslog_ns(struct kref *kref)
>> +{
>> + struct syslog_namespace *ns;
>> + ns = container_of(kref, struct syslog_namespace, kref);
>> +
>> + kfree(ns->log_buf);
>> + kfree(ns);
>> +}
>
> This interface seems a bit ugly. Why not use the same form as put_syslog_ns()?
>
> static inline void free_syslog_ns(struct syslog_namespace *ns)
>

free_syslog_ns is used in put_syslog_ns, and the kref_put function passes the
kref as the parameter to its release function. You can see that from its prototype:
static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)).

Thanks.
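
As an illustration of why the release function takes a struct kref pointer, here is a minimal user-space sketch of the pattern. It is not the kernel's kref: struct kref_model, container_of_model() and the other _model names are simplified, non-atomic stand-ins written for this example, and released_count exists only so the effect can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified, non-atomic stand-in for struct kref. */
struct kref_model {
    int refcount;
};

/* Recover the enclosing object from a pointer to one of its members. */
#define container_of_model(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void kref_init_model(struct kref_model *k) { k->refcount = 1; }
static void kref_get_model(struct kref_model *k)  { k->refcount++; }

/* Calls release(k) on the last put; returns 1 if released, else 0. */
static int kref_put_model(struct kref_model *k,
                          void (*release)(struct kref_model *k))
{
    assert(k->refcount > 0);
    if (--k->refcount == 0) {
        release(k);
        return 1;
    }
    return 0;
}

struct syslog_ns_model {
    char *log_buf;
    struct kref_model kref;
};

static int released_count;  /* demonstration only */

/* The callback only gets the kref, so it maps back to the ns itself. */
static void free_syslog_ns_model(struct kref_model *kref)
{
    struct syslog_ns_model *ns =
        container_of_model(kref, struct syslog_ns_model, kref);

    free(ns->log_buf);
    free(ns);
    released_count++;
}

static void put_syslog_ns_model(struct syslog_ns_model *ns)
{
    if (ns)
        kref_put_model(&ns->kref, free_syslog_ns_model);
}
```

The put helper keeps the friendlier struct pointer signature for callers, while the release callback has to match what the reference counter can pass it.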

>> +
>> +static inline void put_syslog_ns(struct syslog_namespace *ns)
>> +{
>> + if (ns)
>> + kref_put(&ns->kref, free_syslog_ns);
>> +}
>> +
>>

2013-07-29 12:18:33

by Rui Xiang

Subject: Re: [PATCH 8/9] syslog_ns: implement ns_printk for specific syslog_ns

On 2013/7/29 18:37, Gu Zheng wrote:
> Hi Rui,
>
> On 07/29/2013 10:31 AM, Rui Xiang wrote:
>
>> Add a new interface named ns_printk, which takes a
>> parameter ns. Logs that belong to a container can
>> be printed by ns_printk.
>
> One question: with syslog_ns in use, does the log we print with *printk* on the
> host contain the logs of each syslog_ns (printed with ns_printk), or not?
>

No. When using ns_printk, a parameter ns should be passed to identify the syslog_ns.
If this ns has been created, it has its own log_buf to store logs. Otherwise the ns
is taken from the current user_ns (current_user_ns()->syslog_ns in the patch). When
it is init_syslog_ns, the logs are printed on the host.
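
A minimal user-space sketch of that selection rule, assuming a small fixed text buffer in place of the real log_buf. The _model names and pick_ns() are hypothetical and only illustrate the fallback: an explicit ns wins, otherwise the caller's user namespace decides, and a message lands in exactly one buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct syslog_ns_model {
    char buf[128];      /* stands in for the per-namespace log_buf */
    size_t len;
};

struct user_ns_model {
    struct syslog_ns_model *syslog_ns;
};

/* Stands in for init_syslog_ns, i.e. the host log. */
static struct syslog_ns_model init_syslog_ns_model;

/* Explicit ns if given, else the syslog ns of the current user ns. */
static struct syslog_ns_model *pick_ns(struct syslog_ns_model *ns,
                                       const struct user_ns_model *cur)
{
    if (!ns)
        ns = cur->syslog_ns;
    return ns;
}

/* Append msg to the selected namespace only; returns bytes stored. */
static int ns_log_model(struct syslog_ns_model *ns,
                        const struct user_ns_model *cur, const char *msg)
{
    size_t n = strlen(msg);
    size_t room;

    ns = pick_ns(ns, cur);
    assert(ns);
    room = sizeof(ns->buf) - 1 - ns->len;
    if (n > room)
        n = room;       /* truncate rather than overflow */
    memcpy(ns->buf + ns->len, msg, n);
    ns->len += n;
    ns->buf[ns->len] = '\0';
    return (int)n;
}
```

In this model a message written to a container's namespace never shows up in the host buffer, which matches the "No" above.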

> Thanks,
> Gu
>
>>
>> Signed-off-by: Rui Xiang <[email protected]>
>> ---
>> include/linux/printk.h | 4 ++++
>> kernel/printk.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++----
>> 2 files changed, 53 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/printk.h b/include/linux/printk.h
>> index 29e3f85..bf83ad9 100644
>> --- a/include/linux/printk.h
>> +++ b/include/linux/printk.h
>> @@ -6,6 +6,7 @@
>> #include <linux/kern_levels.h>
>> #include <linux/linkage.h>
>>
>> +struct syslog_namespace;
>> extern const char linux_banner[];
>> extern const char linux_proc_banner[];
>>
>> @@ -123,6 +124,9 @@ asmlinkage int printk_emit(int facility, int level,
>> asmlinkage __printf(1, 2) __cold
>> int printk(const char *fmt, ...);
>>
>> +asmlinkage __printf(2, 3) __cold
>> +int ns_printk(struct syslog_namespace *ns, const char *fmt, ...);
>> +
>> /*
>> * Special printk facility for scheduler use only, _DO_NOT_USE_ !
>> */
>> diff --git a/kernel/printk.c b/kernel/printk.c
>> index 6b561db..56a8b27 100644
>> --- a/kernel/printk.c
>> +++ b/kernel/printk.c
>> @@ -1554,9 +1554,10 @@ static size_t cont_print_text(char *text, size_t size)
>> return textlen;
>> }
>>
>> -asmlinkage int vprintk_emit(int facility, int level,
>> - const char *dict, size_t dictlen,
>> - const char *fmt, va_list args)
>> +static int ns_vprintk_emit(int facility, int level,
>> + const char *dict, size_t dictlen,
>> + const char *fmt, va_list args,
>> + struct syslog_namespace *ns)
>> {
>> static int recursion_bug;
>> static char textbuf[LOG_LINE_MAX];
>> @@ -1566,7 +1567,6 @@ asmlinkage int vprintk_emit(int facility, int level,
>> unsigned long flags;
>> int this_cpu;
>> int printed_len = 0;
>> - struct syslog_namespace *ns = &init_syslog_ns;
>>
>> boot_delay_msec(level);
>> printk_delay();
>> @@ -1697,6 +1697,14 @@ out_restore_irqs:
>>
>> return printed_len;
>> }
>> +
>> +asmlinkage int vprintk_emit(int facility, int level,
>> + const char *dict, size_t dictlen,
>> + const char *fmt, va_list args)
>> +{
>> + return ns_vprintk_emit(facility, level, dict, dictlen, fmt, args,
>> + &init_syslog_ns);
>> +}
>> EXPORT_SYMBOL(vprintk_emit);
>>
>> asmlinkage int vprintk(const char *fmt, va_list args)
>> @@ -1762,6 +1770,43 @@ asmlinkage int printk(const char *fmt, ...)
>> }
>> EXPORT_SYMBOL(printk);
>>
>> +/**
>> + * ns_printk - print a kernel message in syslog_ns
>> + * @ns: syslog namespace
>> + * @fmt: format string
>> + *
>> + * This is ns_printk().
>> + * It can be called from container context. We add a param
>> + * ns to specify the syslog namespace, because we need to
>> + * print some logs which are generated not by the host, but by a container.
>> + *
>> + * See the vsnprintf() documentation for format string extensions over C99.
>> + **/
>> +asmlinkage int ns_printk(struct syslog_namespace *ns,
>> + const char *fmt, ...)
>> +{
>> + va_list args;
>> + int r;
>> +
>> + if (!ns)
>> + ns = current_user_ns()->syslog_ns;
>> +
>> +#ifdef CONFIG_KGDB_KDB
>> + if (unlikely(kdb_trap_printk)) {
>> + va_start(args, fmt);
>> + r = vkdb_printf(fmt, args);
>> + va_end(args);
>> + return r;
>> + }
>> +#endif
>> + va_start(args, fmt);
>> + r = ns_vprintk_emit(0, -1, NULL, 0, fmt, args, ns);
>> + va_end(args);
>> +
>> + return r;
>> +}
>> +EXPORT_SYMBOL(ns_printk);
>> +
>
> Here can we do some clean up to printk using ns_printk?
>

OK, I will try to do that. :)

Thanks.

2013-07-29 12:37:56

by Rui Xiang

[permalink] [raw]
Subject: Re: [PATCH 9/9] netfilter: use ns_printk in iptable context

On 2013/7/29 17:48, Gao feng wrote:
> On 07/29/2013 10:31 AM, Rui Xiang wrote:
>> To containerise iptables logs, use ns_printk
>> to report individual logs to the container,
>> getting syslog_ns from skb->dev->nd_net->user_ns.
>>
>> Signed-off-by: Rui Xiang <[email protected]>
>> ---
>> include/net/netfilter/xt_log.h | 6 +++++-
>> net/netfilter/xt_LOG.c | 4 ++--
>> 2 files changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/net/netfilter/xt_log.h b/include/net/netfilter/xt_log.h
>> index 9d9756c..5222cba 100644
>> --- a/include/net/netfilter/xt_log.h
>> +++ b/include/net/netfilter/xt_log.h
>> @@ -39,10 +39,14 @@ static struct sbuff *sb_open(void)
>> return m;
>> }
>>
>> -static void sb_close(struct sbuff *m)
>> +static void sb_close(struct sbuff *m, struct sk_buff *skb)
>> {
>> m->buf[m->count] = 0;
>> +#ifdef CONFIG_NET_NS
>> + ns_printk(skb->dev->nd_net->user_ns->syslog_ns, "%s\n", m->buf);
>> +#else
>> printk("%s\n", m->buf);
>> +#endif
>>
>> if (likely(m != &emergency))
>> kfree(m);
>> diff --git a/net/netfilter/xt_LOG.c b/net/netfilter/xt_LOG.c
>> index 5ab2484..f2cd2fa3 100644
>> --- a/net/netfilter/xt_LOG.c
>> +++ b/net/netfilter/xt_LOG.c
>> @@ -493,7 +493,7 @@ ipt_log_packet(struct net *net,
>>
>> dump_ipv4_packet(m, loginfo, skb, 0);
>>
>> - sb_close(m);
>> + sb_close(m, skb);
>
>
> why don't you pass net directly to sb_close here?
>
> A non-init net namespace will not trigger any system log through ipt_LOG/ip6t_LOG.
> You can check the FIXME in ipt_log_packet.
>
> BTW, for this patch, you should cc [email protected] too.
>
Hi Gao,

Thanks for your attention.

Yes, you are right. In the 1st version, the ipt_log_packet function had no net
parameter, and I did not update this code for it. I will change it in the next version.



Thanks.

2013-07-29 18:58:48

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/9] Add namespace support for syslog v2

Rui Xiang <[email protected]> writes:

> This patchset introduces a system log namespace.

The largest outstanding question is not answered. Can't we just fix
iptables to log somewhere better than dmesg, and would that not entirely
remove the need for this work?

That question needs to be answered before we proceed down this path.

Eric


2013-07-30 00:50:36

by Gu Zheng

[permalink] [raw]
Subject: Re: [PATCH 1/9] syslog_ns: add syslog_namespace and put/get_syslog_ns

On 07/29/2013 07:47 PM, Rui Xiang wrote:

> On 2013/7/29 17:40, Gu Zheng wrote:
>> Hi Rui,
>> Refer to inline:).
>>
> Hi Gu,
>
> Thanks for your attention.
>
>> On 07/29/2013 10:31 AM, Rui Xiang wrote:
>>
>>> Add a struct syslog_namespace which contains the necessary
>>> members for handling syslog, and implement the get_syslog_ns and
>>> put_syslog_ns API.
>>>
>>> Signed-off-by: Rui Xiang <[email protected]>
>>> ---
>>> include/linux/syslog.h | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>> kernel/printk.c | 7 ------
>>> 2 files changed, 68 insertions(+), 7 deletions(-)
>>>
>
> ...
>
>>> +
>>> +static inline void free_syslog_ns(struct kref *kref)
>>> +{
>>> + struct syslog_namespace *ns;
>>> + ns = container_of(kref, struct syslog_namespace, kref);
>>> +
>>> + kfree(ns->log_buf);
>>> + kfree(ns);
>>> +}
>>
>> This interface seems a bit ugly, why not use the format like put_syslog_ns()?
>>
>> static inline void free_syslog_ns(struct syslog_namespace *ns)
>>
>
> free_syslog_ns is used in put_syslog_ns, and the kref_put function passes the kref
> as the parameter to its release function. You can see that from
> static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)).

Got it.
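
For reference, the release-callback pattern discussed here can be sketched in userspace as follows. This is only an illustration under stated assumptions: the demo_* names are hypothetical, the reference counter is a plain int rather than the kernel's atomic kref, and demo_container_of omits the type checking done by the kernel's container_of().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified, non-atomic stand-in for the kernel's struct kref. */
struct demo_kref {
	int refcount;
};

/* Same idea as the kernel's container_of(), without type checking. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void demo_kref_init(struct demo_kref *kref)
{
	kref->refcount = 1;
}

static void demo_kref_get(struct demo_kref *kref)
{
	kref->refcount++;
}

/* Drop one reference; call release() when the last one goes away. */
static int demo_kref_put(struct demo_kref *kref,
			 void (*release)(struct demo_kref *kref))
{
	if (--kref->refcount == 0) {
		release(kref);
		return 1;
	}
	return 0;
}

struct demo_syslog_ns {
	struct demo_kref kref;
	char *log_buf;
};

static int demo_freed_count;	/* only here so the sketch can be checked */

/* The release callback receives the embedded kref, not the namespace. */
static void demo_free_syslog_ns(struct demo_kref *kref)
{
	struct demo_syslog_ns *ns =
		demo_container_of(kref, struct demo_syslog_ns, kref);

	free(ns->log_buf);
	free(ns);
	demo_freed_count++;
}

static void demo_put_syslog_ns(struct demo_syslog_ns *ns)
{
	if (ns)
		demo_kref_put(&ns->kref, demo_free_syslog_ns);
}

static struct demo_syslog_ns *demo_alloc_syslog_ns(size_t buf_len)
{
	struct demo_syslog_ns *ns = calloc(1, sizeof(*ns));

	if (!ns)
		return NULL;
	ns->log_buf = malloc(buf_len);
	if (!ns->log_buf) {
		free(ns);
		return NULL;
	}
	demo_kref_init(&ns->kref);
	return ns;
}
```

Because the put helper only hands the embedded kref to the callback, the callback has to take a kref pointer and recover the enclosing structure itself, which is why free_syslog_ns cannot simply take a struct syslog_namespace pointer.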

Regards,
Gu

>
> Thanks.
>
>>> +
>>> +static inline void put_syslog_ns(struct syslog_namespace *ns)
>>> +{
>>> + if (ns)
>>> + kref_put(&ns->kref, free_syslog_ns);
>>> +}
>>> +
>>>
>
>

2013-07-30 02:13:06

by Rui Xiang

[permalink] [raw]
Subject: Re: [PATCH 0/9] Add namespace support for syslog v2


Hi Eric,

Thanks for your attention.


On 2013/7/30 2:58, Eric W. Biederman wrote:
> Rui Xiang <[email protected]> writes:
>
>> This patchset introduces a system log namespace.
>
> The largest outstanding question is not answered. Can't we just fix
> iptables to log somewhere better than dmesg, and would that not entirely
> remove the need for this work?
>
I don't think there is any question. In this patchset, patches 1-8 implement
the syslog_ns mechanism, and iptables is a real use case for it. So we
can treat patch 9 as an example of using syslog_ns.

Sorry for my negligence, I will add some more detailed logs in the next version.


Thanks.

> That question needs to be answered before we proceed down this path.
>
> Eric

2013-07-30 03:39:25

by Rui Xiang

[permalink] [raw]
Subject: Re: [PATCH 7/9] syslog_ns: implement function for creating syslog ns

On 2013/7/29 18:25, Gu Zheng wrote:
> Hi Rui,
>
> On 07/29/2013 10:31 AM, Rui Xiang wrote:
>
>> Add create_syslog_ns function to create a new ns. We
>> must create a user_ns before creating a new syslog ns.
>> And then tie the new syslog_ns to current user_ns
>> instead of original syslog_ns which comes from
>> parent user_ns.

...

>> diff --git a/kernel/printk.c b/kernel/printk.c
>> index fd2d600..6b561db 100644
>> --- a/kernel/printk.c
>> +++ b/kernel/printk.c
>> @@ -384,6 +384,10 @@ static int check_syslog_permissions(int type, bool from_file,
>> || type == SYSLOG_ACTION_CONSOLE_LEVEL)
>> ns = &init_syslog_ns;
>>
>> + /* create a new syslog ns */
>> + if (type == SYSLOG_ACTION_NEW_NS)
>> + return 0;
>> +
>
> Don't we need further permission or caps check here? Return success directly seems sloppy.
>
CAP_SYSLOG is checked in create_syslog_ns, so I think we can return 0 temporarily.


2013-07-30 03:46:28

by Gu Zheng

[permalink] [raw]
Subject: Re: [PATCH 7/9] syslog_ns: implement function for creating syslog ns

On 07/30/2013 11:39 AM, Rui Xiang wrote:

> On 2013/7/29 18:25, Gu Zheng wrote:
>> Hi Rui,
>>
>> On 07/29/2013 10:31 AM, Rui Xiang wrote:
>>
>>> Add create_syslog_ns function to create a new ns. We
>>> must create a user_ns before creating a new syslog ns.
>>> And then tie the new syslog_ns to current user_ns
>>> instead of original syslog_ns which comes from
>>> parent user_ns.
>
> ...
>
>>> diff --git a/kernel/printk.c b/kernel/printk.c
>>> index fd2d600..6b561db 100644
>>> --- a/kernel/printk.c
>>> +++ b/kernel/printk.c
>>> @@ -384,6 +384,10 @@ static int check_syslog_permissions(int type, bool from_file,
>>> || type == SYSLOG_ACTION_CONSOLE_LEVEL)
>>> ns = &init_syslog_ns;
>>>
>>> + /* create a new syslog ns */
>>> + if (type == SYSLOG_ACTION_NEW_NS)
>>> + return 0;
>>> +
>>
>> Don't we need further permission or caps check here? Return success directly seems sloppy.
>>
> CAP_SYSLOG is checked in create_syslog_ns, so I think we can return 0 temporarily.

If so, why not move the check here? IMO, the earlier permission checks happen, the better.
What's your opinion?
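
A rough userspace sketch of this suggestion is shown below. It is an assumption-laden model, not the patch: the sketch_* names and struct sketch_cred are hypothetical, the capability is reduced to a boolean, and -EPERM is assumed as the error code. The command value 11 is the one given in the cover letter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Command number 11 is the value given in the cover letter. */
#define SKETCH_SYSLOG_ACTION_NEW_NS 11

/* Hypothetical stand-in for the caller's credentials. */
struct sketch_cred {
	bool cap_syslog;	/* models holding CAP_SYSLOG */
};

static int sketch_ns_created;	/* counts namespaces "created" */

/* Suggested variant: reject callers without the capability up front. */
static int sketch_check_syslog_permissions(int type,
					   const struct sketch_cred *cred)
{
	if (type == SKETCH_SYSLOG_ACTION_NEW_NS)
		return cred->cap_syslog ? 0 : -EPERM;

	return 0;	/* other actions are out of scope for this sketch */
}

/* With the check done earlier, creation no longer needs its own check. */
static int sketch_create_syslog_ns(void)
{
	sketch_ns_created++;
	return 0;
}

static int sketch_do_syslog(int type, const struct sketch_cred *cred)
{
	int error = sketch_check_syslog_permissions(type, cred);

	if (error)
		return error;

	if (type == SKETCH_SYSLOG_ACTION_NEW_NS)
		return sketch_create_syslog_ns();

	return 0;
}
```

With this layout an unprivileged caller is turned away before any namespace creation code runs, and all permission decisions stay in one function.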

Regards,
Gu

>
>
>
>

2013-07-30 06:13:31

by Bruno Prémont

[permalink] [raw]
Subject: Re: [PATCH 0/9] Add namespace support for syslog v2

On Mon, 29 Jul 2013 11:58:23 -0700 Eric W. Biederman wrote:
> Rui Xiang <[email protected]> writes:
>
> > This patchset introduces a system log namespace.
>
> The largest outstanding question is not answered. Can't we just fix
> iptables to log somewhere better than dmesg, and would that not entirely
> remove the need for this work?

Doesn't NFLOG target + ulogd allow for this?

Though the last time I tried, it seemed the corresponding netlink resources were not
(fully) netns aware (unless I did something wrong in my setup).

Bruno


> That question needs to be answered before we proceed down this path.
>
> Eric

2013-08-01 01:35:24

by Gao feng

[permalink] [raw]
Subject: Re: [PATCH 4/9] syslog_ns: make syslog handling per namespace

On 07/29/2013 10:31 AM, Rui Xiang wrote:
> This patch makes syslog buf and other fields per
> namespace.
>
> Here use ns->log_buf(log_buf_len, logbuf_lock,
> log_first_seq, logbuf_lock, and so on) fields
> instead of global ones to handle syslog.
>
> Syslog interfaces such as /dev/kmsg, /proc/kmsg,
> and syslog syscall are all containerized for
> container users.
>

/dev/kmsg is used by the syslog API (closelog, openlog, syslog, vsyslog), so
it should be per user namespace. But in your patch, it seems
the syslog messages generated through these APIs on the host can be exported
to the /dev/kmsg of a container. Is this what we want?

Thanks

2013-08-01 03:11:31

by Rui Xiang

[permalink] [raw]
Subject: Re: [PATCH 4/9] syslog_ns: make syslog handling per namespace

On 2013/8/1 9:36, Gao feng wrote:
> On 07/29/2013 10:31 AM, Rui Xiang wrote:
>> This patch makes syslog buf and other fields per
>> namespace.
>>
>> Here use ns->log_buf(log_buf_len, logbuf_lock,
>> log_first_seq, logbuf_lock, and so on) fields
>> instead of global ones to handle syslog.
>>
>> Syslog interfaces such as /dev/kmsg, /proc/kmsg,
>> and syslog syscall are all containerized for
>> container users.
>>
>
> /dev/kmsg is used by the syslog api closelog, openlog, syslog, vsyslog,
> this should be per user namespace, but seems in your patch,

Yes, /dev/kmsg is per user namespace, and per syslog ns, too.

> the syslog message generated through these APIs on host can be exported
> to the /dev/kmsg of container, is this what we want?
>
Ah, I think your question is about the devkmsg_writev function, right?
You have reminded me that this really is an issue. printk_emit in devkmsg_writev
should not use init_syslog_ns as its syslog_ns, but current_user_ns()->syslog_ns.

In the 1st version, current_syslog_ns was used in vprintk_emit. In this version,
the vprintk_emit interface has changed, but this patch misses that.
I will fix it.


Thanks for your reminder. :)

2013-08-01 05:37:13

by Gao feng

[permalink] [raw]
Subject: Re: [PATCH 4/9] syslog_ns: make syslog handling per namespace

On 08/01/2013 11:10 AM, Rui Xiang wrote:
> On 2013/8/1 9:36, Gao feng wrote:
>> On 07/29/2013 10:31 AM, Rui Xiang wrote:
>>> This patch makes syslog buf and other fields per
>>> namespace.
>>>
>>> Here use ns->log_buf(log_buf_len, logbuf_lock,
>>> log_first_seq, logbuf_lock, and so on) fields
>>> instead of global ones to handle syslog.
>>>
>>> Syslog interfaces such as /dev/kmsg, /proc/kmsg,
>>> and syslog syscall are all containerized for
>>> container users.
>>>
>>
>> /dev/kmsg is used by the syslog api closelog, openlog, syslog, vsyslog,
>> this should be per user namespace, but seems in your patch,
>
> Yes, /dev/kmsg is per user namespace, and per syslog ns, too.
>
>> the syslog message generated through these APIs on host can be exported
>> to the /dev/kmsg of container, is this what we want?
>>
> Ah.. I think your question targets at devkmsg_writev function, right?

Yep. Another small problem: you forgot to remove the global logbuf_lock.

> You remind me that it's really an issue. Printk_emit in devkmsg_writev
> should not use init_syslog_ns as its syslog_ns but current_user_ns->syslog_ns.
>
> In 1st version, current_syslog_ns was used in vprintk_emit. In this version,
> the interface vprintk_emit has changed, but this patch misses that.
> I will fix it.
>

Ok, thanks!

2013-08-01 06:30:25

by Rui Xiang

[permalink] [raw]
Subject: Re: [PATCH 4/9] syslog_ns: make syslog handling per namespace

On 2013/8/1 13:38, Gao feng wrote:
> On 08/01/2013 11:10 AM, Rui Xiang wrote:
>> On 2013/8/1 9:36, Gao feng wrote:
>>> On 07/29/2013 10:31 AM, Rui Xiang wrote:
>>>> This patch makes syslog buf and other fields per
>>>> namespace.
>>>>
>>>> Here use ns->log_buf(log_buf_len, logbuf_lock,
>>>> log_first_seq, logbuf_lock, and so on) fields
>>>> instead of global ones to handle syslog.
>>>>
>>>> Syslog interfaces such as /dev/kmsg, /proc/kmsg,
>>>> and syslog syscall are all containerized for
>>>> container users.
>>>>
>>>
>>> /dev/kmsg is used by the syslog api closelog, openlog, syslog, vsyslog,
>>> this should be per user namespace, but seems in your patch,
>>
>> Yes, /dev/kmsg is per user namespace, and per syslog ns, too.
>>
>>> the syslog message generated through these APIs on host can be exported
>>> to the /dev/kmsg of container, is this what we want?
>>>
>> Ah.. I think your question targets at devkmsg_writev function, right?
>
> yep, another small problem, you forgot to remove the global logbuf_lock.
>

Got it.



Thanks.