2010-01-05 02:15:32

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [0/9] SYSCTL: Use RCU to avoid races with string sysctls v2


v2 version that addresses all the earlier review comments
minus patches that were already merged.

I think this one is ready for merge now. Andrew, could you please
take it?

Signed-off-by: Andi Kleen <[email protected]>

---

With BKL-less sysctls most of the writable string sysctls are racy. There
is no locking on the reader side, so a reader could see an inconsistent
string or worse miss the terminating null and walk of beyond it.

This patch kit adds a new "rcu string" variant to avoid these
problems and convers the racy users. One the writer side the strings are
always copied to new memory and the readers use rcu_read_lock()
to get a stable view. For readers who access the string over
sleeps the reader copies the string.

This is all hidden in a new generic "rcu_string" ADT which can be also
used for other purposes.

This finally implements all the letters in RCU, most other users
leave out the 'C'.

I left some obscure users in architectures (sparc, mips) alone and audited
all of the others. The sparc reboot_cmd one has references to asm files
which I didn't want to touch and the mips variant seemd just too obscure.

All the others are not racy.

-Andi


2010-01-05 02:15:36

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [4/9] SYSCTL: Use RCU strings for core_pattern sysctl


Also saves ~220 bytes in the data segment for default kernels.

As a bonus this removes one use of the BKL.

Signed-off-by: Andi Kleen <[email protected]>

---
fs/exec.c | 11 +++++------
kernel/sysctl.c | 6 +++---
2 files changed, 8 insertions(+), 9 deletions(-)

Index: linux-2.6.33-rc2-ak/fs/exec.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/fs/exec.c
+++ linux-2.6.33-rc2-ak/fs/exec.c
@@ -62,7 +62,7 @@
#include "internal.h"

int core_uses_pid;
-char core_pattern[CORENAME_MAX_SIZE] = "core";
+char *core_pattern = "core";
unsigned int core_pipe_limit;
int suid_dumpable = 0;

@@ -1421,7 +1421,7 @@ EXPORT_SYMBOL(set_binfmt);
static int format_corename(char *corename, long signr)
{
const struct cred *cred = current_cred();
- const char *pat_ptr = core_pattern;
+ const char *pat_ptr = rcu_dereference(core_pattern);
int ispipe = (*pat_ptr == '|');
char *out_ptr = corename;
char *const out_end = corename + CORENAME_MAX_SIZE;
@@ -1825,12 +1825,11 @@ void do_coredump(long signr, int exit_co
clear_thread_flag(TIF_SIGPENDING);

/*
- * lock_kernel() because format_corename() is controlled by sysctl, which
- * uses lock_kernel()
+ * Protect corename by RCU vs proc_rcu_string()
*/
- lock_kernel();
+ rcu_read_lock();
ispipe = format_corename(corename, signr);
- unlock_kernel();
+ rcu_read_unlock();

if ((!ispipe) && (cprm.limit < binfmt->min_coredump))
goto fail_unlock;
Index: linux-2.6.33-rc2-ak/kernel/sysctl.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/sysctl.c
+++ linux-2.6.33-rc2-ak/kernel/sysctl.c
@@ -75,7 +75,7 @@ extern int sysctl_oom_dump_tasks;
extern int max_threads;
extern int core_uses_pid;
extern int suid_dumpable;
-extern char core_pattern[];
+extern char *core_pattern;
extern unsigned int core_pipe_limit;
extern int pid_max;
extern int min_free_kbytes;
@@ -399,10 +399,10 @@ static struct ctl_table kern_table[] = {
},
{
.procname = "core_pattern",
- .data = core_pattern,
+ .data = &core_pattern,
.maxlen = CORENAME_MAX_SIZE,
.mode = 0644,
- .proc_handler = proc_dostring,
+ .proc_handler = proc_rcu_string,
},
{
.procname = "core_pipe_limit",

2010-01-05 02:15:38

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [7/9] SYSCTL: Convert poweroff_command to proc_rcu_string


Avoids races with lockless sysctl.

Also saves ~220 bytes in the data segment for default kernels.

Signed-off-by: Andi Kleen <[email protected]>

---
include/linux/reboot.h | 2 +-
kernel/sys.c | 8 ++++++--
kernel/sysctl.c | 2 +-
3 files changed, 8 insertions(+), 4 deletions(-)

Index: linux-2.6.33-rc2-ak/kernel/sys.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/sys.c
+++ linux-2.6.33-rc2-ak/kernel/sys.c
@@ -1597,7 +1597,7 @@ SYSCALL_DEFINE3(getcpu, unsigned __user
return err ? -EFAULT : 0;
}

-char poweroff_cmd[POWEROFF_CMD_PATH_LEN] = "/sbin/poweroff";
+char *poweroff_cmd = "/sbin/poweroff";

static void argv_cleanup(char **argv, char **envp)
{
@@ -1614,7 +1614,7 @@ static void argv_cleanup(char **argv, ch
int orderly_poweroff(bool force)
{
int argc;
- char **argv = argv_split(GFP_ATOMIC, poweroff_cmd, &argc);
+ char **argv;
static char *envp[] = {
"HOME=/",
"PATH=/sbin:/bin:/usr/sbin:/usr/bin",
@@ -1623,6 +1623,10 @@ int orderly_poweroff(bool force)
int ret = -ENOMEM;
struct subprocess_info *info;

+ /* RCU protection for poweroff_cmd */
+ rcu_read_lock();
+ argv = argv_split(GFP_ATOMIC, rcu_dereference(poweroff_cmd), &argc);
+ rcu_read_unlock();
if (argv == NULL) {
printk(KERN_WARNING "%s failed to allocate memory for \"%s\"\n",
__func__, poweroff_cmd);
Index: linux-2.6.33-rc2-ak/include/linux/reboot.h
===================================================================
--- linux-2.6.33-rc2-ak.orig/include/linux/reboot.h
+++ linux-2.6.33-rc2-ak/include/linux/reboot.h
@@ -67,7 +67,7 @@ extern void kernel_power_off(void);
void ctrl_alt_del(void);

#define POWEROFF_CMD_PATH_LEN 256
-extern char poweroff_cmd[POWEROFF_CMD_PATH_LEN];
+extern char *poweroff_cmd;

extern int orderly_poweroff(bool force);

Index: linux-2.6.33-rc2-ak/kernel/sysctl.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/sysctl.c
+++ linux-2.6.33-rc2-ak/kernel/sysctl.c
@@ -871,7 +871,7 @@ static struct ctl_table kern_table[] = {
.data = &poweroff_cmd,
.maxlen = POWEROFF_CMD_PATH_LEN,
.mode = 0644,
- .proc_handler = proc_dostring,
+ .proc_handler = proc_rcu_string,
},
#ifdef CONFIG_KEYS
{

2010-01-05 02:16:20

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [9/9] SYSCTL: Use RCU protected sysctl for ocfs group add helper


This avoids races with unlocked sysctl()

Also saves ~220 bytes in the data segment.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>

---
fs/ocfs2/stackglue.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

Index: linux-2.6.33-rc2-ak/fs/ocfs2/stackglue.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/fs/ocfs2/stackglue.c
+++ linux-2.6.33-rc2-ak/fs/ocfs2/stackglue.c
@@ -27,6 +27,7 @@
#include <linux/kobject.h>
#include <linux/sysfs.h>
#include <linux/sysctl.h>
+#include <linux/rcustring.h>

#include "ocfs2_fs.h"

@@ -40,7 +41,7 @@ static struct ocfs2_locking_protocol *lp
static DEFINE_SPINLOCK(ocfs2_stack_lock);
static LIST_HEAD(ocfs2_stack_list);
static char cluster_stack_name[OCFS2_STACK_LABEL_LEN + 1];
-static char ocfs2_hb_ctl_path[OCFS2_MAX_HB_CTL_PATH] = "/sbin/ocfs2_hb_ctl";
+static char *ocfs2_hb_ctl_path = "/sbin/ocfs2_hb_ctl";

/*
* The stack currently in use. If not null, active_stack->sp_count > 0,
@@ -395,8 +396,15 @@ static void ocfs2_leave_group(const char
{
int ret;
char *argv[5], *envp[3];
+ char *helper;

- argv[0] = ocfs2_hb_ctl_path;
+ helper = access_rcu_string(&ocfs2_hb_ctl_path, OCFS2_MAX_HB_CTL_PATH, GFP_KERNEL);
+ if (!helper) {
+ printk(KERN_ERR "ocfs2_leave_group: no memory\n");
+ return;
+ }
+
+ argv[0] = helper;
argv[1] = "-K";
argv[2] = "-u";
argv[3] = (char *)group;
@@ -414,6 +422,7 @@ static void ocfs2_leave_group(const char
"\"%s %s %s %s\"\n",
ret, argv[0], argv[1], argv[2], argv[3]);
}
+ kfree(helper);
}

/*
@@ -621,10 +630,10 @@ error:
static ctl_table ocfs2_nm_table[] = {
{
.procname = "hb_ctl_path",
- .data = ocfs2_hb_ctl_path,
+ .data = &ocfs2_hb_ctl_path,
.maxlen = OCFS2_MAX_HB_CTL_PATH,
.mode = 0644,
- .proc_handler = proc_dostring,
+ .proc_handler = proc_rcu_string,
},
{ }
};

2010-01-05 02:15:34

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2


Add a little ADT for RCU protected strings. RCU is a convenient
way to manage modifications to read-often-write-seldom strings.
Add some helper functions to make this more straight forward.

Used by follow-on patches to implement RCU protected sysctl strings.

* General rules:
* Reader has to use rcu_read_lock() and not sleep while accessing the string,
* or alternatively get a copy with access_rcu_string()
* Writer needs an own lock against each other.
* Each modification should allocate a new string first and free the old
* one with free_rcu_string()
* In writers use rcu_assign_pointer to publicize the updated string to
* global readers.
* The size passed to access_rcu_string() must be the same as passed
* to alloc_rcu_string() and be known in advance. Don't use strlen()!
*
* For sysctls also see proc_rcu_string() as a convenient wrapper

v2: Use rcu_dereference correctly

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>

---
include/linux/rcustring.h | 20 ++++++++
lib/Makefile | 3 -
lib/rcustring.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 137 insertions(+), 1 deletion(-)

Index: linux-2.6.33-rc2-ak/include/linux/rcustring.h
===================================================================
--- /dev/null
+++ linux-2.6.33-rc2-ak/include/linux/rcustring.h
@@ -0,0 +1,20 @@
+#ifndef _RCUSTRING_H
+#define _RCUSTRING_H 1
+
+#include <linux/gfp.h>
+#include <linux/rcupdate.h>
+
+/*
+ * Simple wrapper to manage strings by RCU.
+ */
+
+extern char *alloc_rcu_string(int size, gfp_t gfp);
+extern void free_rcu_string(const char *string);
+
+/*
+ * size must be the same as alloc_rcu_string, don't
+ * use strlen on str!
+ */
+extern char *access_rcu_string(char **str, int size, gfp_t gfp);
+
+#endif
Index: linux-2.6.33-rc2-ak/lib/rcustring.c
===================================================================
--- /dev/null
+++ linux-2.6.33-rc2-ak/lib/rcustring.c
@@ -0,0 +1,115 @@
+/*
+ * Manage strings by Read-Copy-Update. This is useful for global strings
+ * that change only very rarely, but are read often.
+ *
+ * Author: Andi Kleen
+ *
+ * General rules:
+ * Reader has to use rcu_read_lock() and not sleep while accessing the string,
+ * or alternatively get a copy with access_rcu_string()
+ * Writer needs an own lock against each other.
+ * Each modification should allocate a new string first and free the old
+ * one with free_rcu_string()
+ * In writers use rcu_assign_pointer to publicize the updated string to
+ * global readers.
+ * The size passed to access_rcu_string() must be the same as passed
+ * to alloc_rcu_string() and be known in advance. Don't use strlen()!
+ * The pointer passed to access_rcu_string must be to the variable modified
+ * by the writer.
+ *
+ * For sysctls also see proc_rcu_string() as a convenient wrapper
+ *
+ * Typical example:
+ * #define MAX_GLOBAL_SIZE ...
+ * char *global = "default";
+ *
+ * Rare writer:
+ * char *old, *new;
+ * DECLARE_MUTEX(writer_lock);
+ * mutex_lock(&writer_lock);
+ * new = alloc_rcu_string(MAX_GLOBAL_SIZE, GFP_KERNEL);
+ * if (!new) {
+ * mutex_unlock(&writer_lock);
+ * return -ENOMEM;
+ * }
+ * strlcpy(new, new_value, MAX_GLOBAL_SIZE);
+ * old = global;
+ * rcu_assign_pointer(global, new);
+ * mutex_unlock(&writer_lock);
+ * free_rcu_string(old);
+ *
+ * Sleepy reader:
+ * char *str = access_rcu_string(&global, MAX_GLOBAL_SIZE, GFP_KERNEL);
+ * if (!str)
+ * return -ENOMEM;
+ * ... use str while sleeping ...
+ * kfree(string);
+ *
+ * Non sleepy reader:
+ * rcu_read_lock();
+ * str = rcu_dereference(&global);
+ * ... use str without sleeping ...
+ * rcu_read_unlock();
+ *
+ * Note this code could be relatively easily generalized for other kinds
+ * of non-atomic data, but this initial version only handles strings.
+ * Only need to change the strlcpy() below to memcpy()
+ */
+#include <linux/kernel.h>
+#include <linux/rcustring.h>
+#include <linux/slab.h>
+#include <linux/rcupdate.h>
+#include <linux/module.h>
+#include <linux/string.h>
+
+struct rcu_string {
+ struct rcu_head rcu;
+ char str[0];
+};
+
+char *alloc_rcu_string(int size, gfp_t gfp)
+{
+ struct rcu_string *rs = kmalloc(sizeof(struct rcu_string) + size, gfp);
+ if (!rs)
+ return NULL;
+ return rs->str;
+}
+EXPORT_SYMBOL(alloc_rcu_string);
+
+static void do_free_rcu_string(struct rcu_head *h)
+{
+ kfree(container_of(h, struct rcu_string, rcu));
+}
+
+static inline struct rcu_string *str_to_rcustr(const char *str)
+{
+ /*
+ * Opencoded container_of because the strict type checking
+ * in normal container_of cannot deal with char str[0] vs char *str.
+ */
+ return (struct rcu_string *)(str - offsetof(struct rcu_string, str));
+}
+
+void free_rcu_string(const char *str)
+{
+ struct rcu_string *rs = str_to_rcustr(str);
+ call_rcu(&rs->rcu, do_free_rcu_string);
+}
+EXPORT_SYMBOL(free_rcu_string);
+
+/*
+ * Get a local private copy of a RCU protected string.
+ * Mostly useful to get a string that is stable while sleeping.
+ * Caller must free returned string.
+ */
+char *access_rcu_string(char **str, int size, gfp_t gfp)
+{
+ char *copy = kmalloc(size, gfp);
+ if (!str)
+ return NULL;
+ rcu_read_lock();
+ strlcpy(copy, rcu_dereference(*str), size);
+ rcu_read_unlock();
+ return copy;
+}
+EXPORT_SYMBOL(access_rcu_string);
Index: linux-2.6.33-rc2-ak/lib/Makefile
===================================================================
--- linux-2.6.33-rc2-ak.orig/lib/Makefile
+++ linux-2.6.33-rc2-ak/lib/Makefile
@@ -12,7 +12,8 @@ lib-y := ctype.o string.o vsprintf.o cmd
idr.o int_sqrt.o extable.o prio_tree.o \
sha1.o irq_regs.o reciprocal_div.o argv_split.o \
proportions.o prio_heap.o ratelimit.o show_mem.o \
- is_single_threaded.o plist.o decompress.o flex_array.o
+ is_single_threaded.o plist.o decompress.o flex_array.o \
+ rcustring.o

lib-$(CONFIG_MMU) += ioremap.o
lib-$(CONFIG_SMP) += cpumask.o

2010-01-05 02:17:51

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [3/9] SYSCTL: Add proc_rcu_string to manage sysctls using rcu strings


Add a helper to use the new rcu strings for managing access
to text sysctls. Conversions will be in follow-on patches.

An alternative would be to use seqlocks here, but RCU seemed
cleaner.

Signed-off-by: Andi Kleen <[email protected]>

---
include/linux/sysctl.h | 2 +
kernel/sysctl.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++
kernel/sysctl_check.c | 1
3 files changed, 69 insertions(+)

Index: linux-2.6.33-rc2-ak/include/linux/sysctl.h
===================================================================
--- linux-2.6.33-rc2-ak.orig/include/linux/sysctl.h
+++ linux-2.6.33-rc2-ak/include/linux/sysctl.h
@@ -969,6 +969,8 @@ typedef int proc_handler (struct ctl_tab

extern int proc_dostring(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
+extern int proc_rcu_string(struct ctl_table *, int,
+ void __user *, size_t *, loff_t *);
extern int proc_dointvec(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
extern int proc_dointvec_minmax(struct ctl_table *, int,
Index: linux-2.6.33-rc2-ak/kernel/sysctl.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/sysctl.c
+++ linux-2.6.33-rc2-ak/kernel/sysctl.c
@@ -50,6 +50,7 @@
#include <linux/ftrace.h>
#include <linux/slow-work.h>
#include <linux/perf_event.h>
+#include <linux/rcustring.h>

#include <asm/uaccess.h>
#include <asm/processor.h>
@@ -2016,6 +2017,60 @@ static int _proc_do_string(void* data, i
}

/**
+ * proc_rcu_string - sysctl string with rcu protection
+ * @table: the sysctl table
+ * @write: %TRUE if this is a write to the sysctl file
+ * @buffer: the user buffer
+ * @lenp: the size of the user buffer
+ * @ppos: file position
+ *
+ * Handle a string sysctl similar to proc_dostring.
+ * The main difference is that the data pointer in the table
+ * points to a pointer to a string. The string should be initially
+ * pointing to a statically allocated (as a C object, not on the heap)
+ * default. When it is replaced old uses will be protected by
+ * RCU. The reader should use rcu_read_lock()/unlock() or
+ * access_rcu_string().
+ */
+int proc_rcu_string(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ int ret;
+
+ if (write) {
+ /* protect writers against each other */
+ static DEFINE_MUTEX(rcu_string_mutex);
+ char *old;
+ char *new;
+
+ new = alloc_rcu_string(table->maxlen, GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+ mutex_lock(&rcu_string_mutex);
+ old = *(char **)(table->data);
+ strcpy(new, old);
+ ret = _proc_do_string(new, table->maxlen, write, buffer, lenp, ppos);
+ rcu_assign_pointer(*(char **)(table->data), new);
+ /*
+ * For the first initialization allow constant strings.
+ */
+ if (!kernel_address((unsigned long)old))
+ free_rcu_string(old);
+ mutex_unlock(&rcu_string_mutex);
+ } else {
+ char *str;
+
+ str = access_rcu_string((char **)(table->data), table->maxlen,
+ GFP_KERNEL);
+ if (!str)
+ return -ENOMEM;
+ ret = _proc_do_string(str, table->maxlen, write, buffer, lenp, ppos);
+ kfree(str);
+ }
+ return ret;
+}
+
+/**
* proc_dostring - read a string sysctl
* @table: the sysctl table
* @write: %TRUE if this is a write to the sysctl file
@@ -2030,6 +2085,10 @@ static int _proc_do_string(void* data, i
* and a newline '\n' is added. It is truncated if the buffer is
* not large enough.
*
+ * WARNING: this should be only used for read only strings
+ * or when you have a wrapper with special locking. Otherwise
+ * use proc_rcu_string to avoid races with the consumer.
+ *
* Returns 0 on success.
*/
int proc_dostring(struct ctl_table *table, int write,
@@ -2614,6 +2673,12 @@ int proc_dostring(struct ctl_table *tabl
return -ENOSYS;
}

+int proc_rcu_string(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ return -ENOSYS;
+}
+
int proc_dointvec(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
@@ -2670,6 +2735,7 @@ EXPORT_SYMBOL(proc_dointvec_minmax);
EXPORT_SYMBOL(proc_dointvec_userhz_jiffies);
EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
EXPORT_SYMBOL(proc_dostring);
+EXPORT_SYMBOL(proc_rcu_string);
EXPORT_SYMBOL(proc_doulongvec_minmax);
EXPORT_SYMBOL(proc_doulongvec_ms_jiffies_minmax);
EXPORT_SYMBOL(register_sysctl_table);
Index: linux-2.6.33-rc2-ak/kernel/sysctl_check.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/sysctl_check.c
+++ linux-2.6.33-rc2-ak/kernel/sysctl_check.c
@@ -131,6 +131,7 @@ int sysctl_check_table(struct nsproxy *n
set_fail(&fail, table, "Directory with extra2");
} else {
if ((table->proc_handler == proc_dostring) ||
+ (table->proc_handler == proc_rcu_string) ||
(table->proc_handler == proc_dointvec) ||
(table->proc_handler == proc_dointvec_minmax) ||
(table->proc_handler == proc_dointvec_jiffies) ||

2010-01-05 02:16:57

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [5/9] SYSCTL: Add call_usermodehelper_cleanup()


This is the same as call_usermodehelper(), but with an cleanup callback.
Needed for some of the followon proc_rcu_string() users.

This avoids open coding this.

Signed-off-by: Andi Kleen <[email protected]>

---
include/linux/kmod.h | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

Index: linux-2.6.33-rc2-ak/include/linux/kmod.h
===================================================================
--- linux-2.6.33-rc2-ak.orig/include/linux/kmod.h
+++ linux-2.6.33-rc2-ak/include/linux/kmod.h
@@ -72,7 +72,8 @@ int call_usermodehelper_exec(struct subp
void call_usermodehelper_freeinfo(struct subprocess_info *info);

static inline int
-call_usermodehelper(char *path, char **argv, char **envp, enum umh_wait wait)
+call_usermodehelper_cleanup(char *path, char **argv, char **envp, enum umh_wait wait,
+ void (*cleanup)(char **, char **))
{
struct subprocess_info *info;
gfp_t gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
@@ -80,10 +81,18 @@ call_usermodehelper(char *path, char **a
info = call_usermodehelper_setup(path, argv, envp, gfp_mask);
if (info == NULL)
return -ENOMEM;
+ if (cleanup)
+ call_usermodehelper_setcleanup(info, cleanup);
return call_usermodehelper_exec(info, wait);
}

static inline int
+call_usermodehelper(char *path, char **argv, char **envp, enum umh_wait wait)
+{
+ return call_usermodehelper_cleanup(path, argv, envp, wait, NULL);
+}
+
+static inline int
call_usermodehelper_keys(char *path, char **argv, char **envp,
struct key *session_keyring, enum umh_wait wait)
{

2010-01-05 02:16:30

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [6/9] SYSCTL: Convert modprobe_path to proc_rcu_string()


Avoids races with lockless sysctl

Also saves ~220 bytes in the data segment for default kernels.

Signed-off-by: Andi Kleen <[email protected]>

---
kernel/kmod.c | 36 ++++++++++++++++++++++++++++--------
kernel/sysctl.c | 4 ++--
2 files changed, 30 insertions(+), 10 deletions(-)

Index: linux-2.6.33-rc2-ak/kernel/kmod.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/kmod.c
+++ linux-2.6.33-rc2-ak/kernel/kmod.c
@@ -35,6 +35,7 @@
#include <linux/resource.h>
#include <linux/notifier.h>
#include <linux/suspend.h>
+#include <linux/rcustring.h>
#include <asm/uaccess.h>

#include <trace/events/module.h>
@@ -48,7 +49,12 @@ static struct workqueue_struct *khelper_
/*
modprobe_path is set via /proc/sys.
*/
-char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
+char *modprobe_path = "/sbin/modprobe";
+
+static void free_arg(char **argv, char **env)
+{
+ kfree(argv[0]);
+}

/**
* __request_module - try to load a kernel module
@@ -71,7 +77,8 @@ int __request_module(bool wait, const ch
char module_name[MODULE_NAME_LEN];
unsigned int max_modprobes;
int ret;
- char *argv[] = { modprobe_path, "-q", "--", module_name, NULL };
+ char *mp_copy;
+ char *argv[] = { NULL, "-q", "--", module_name, NULL };
static char *envp[] = { "HOME=/",
"TERM=linux",
"PATH=/sbin:/usr/sbin:/bin:/usr/bin",
@@ -80,15 +87,24 @@ int __request_module(bool wait, const ch
#define MAX_KMOD_CONCURRENT 50 /* Completely arbitrary value - KAO */
static int kmod_loop_msg;

+ /* Get a stable-over-sleeps private copy of modprobe_path */
+ mp_copy = access_rcu_string(&modprobe_path, KMOD_PATH_LEN,
+ wait ? GFP_KERNEL : GFP_ATOMIC);
+ if (!mp_copy)
+ return -ENOMEM;
+ argv[0] = mp_copy;
+
va_start(args, fmt);
ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args);
va_end(args);
- if (ret >= MODULE_NAME_LEN)
- return -ENAMETOOLONG;
+ if (ret >= MODULE_NAME_LEN) {
+ ret = -ENAMETOOLONG;
+ goto error;
+ }

ret = security_kernel_module_request(module_name);
if (ret)
- return ret;
+ goto error;

/* If modprobe needs a service that is in a module, we get a recursive
* loop. Limit the number of running kmod threads to max_threads/2 or
@@ -111,14 +127,18 @@ int __request_module(bool wait, const ch
"request_module: runaway loop modprobe %s\n",
module_name);
atomic_dec(&kmod_concurrent);
- return -ENOMEM;
+ ret = -ENOMEM;
+ goto error;
}

trace_module_request(module_name, wait, _RET_IP_);

- ret = call_usermodehelper(modprobe_path, argv, envp,
- wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
+ ret = call_usermodehelper_cleanup(mp_copy, argv, envp,
+ wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC, free_arg);
+ mp_copy = NULL; /* free_arg frees */
atomic_dec(&kmod_concurrent);
+error:
+ kfree(mp_copy);
return ret;
}
EXPORT_SYMBOL(__request_module);
Index: linux-2.6.33-rc2-ak/kernel/sysctl.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/sysctl.c
+++ linux-2.6.33-rc2-ak/kernel/sysctl.c
@@ -121,7 +121,7 @@ static int min_percpu_pagelist_fract = 8
static int ngroups_max = NGROUPS_MAX;

#ifdef CONFIG_MODULES
-extern char modprobe_path[];
+extern char *modprobe_path;
extern int modules_disabled;
#endif
#ifdef CONFIG_CHR_DEV_SG
@@ -532,7 +532,7 @@ static struct ctl_table kern_table[] = {
.data = &modprobe_path,
.maxlen = KMOD_PATH_LEN,
.mode = 0644,
- .proc_handler = proc_dostring,
+ .proc_handler = proc_rcu_string,
},
{
.procname = "modules_disabled",

2010-01-05 02:16:46

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [8/9] SYSCTL: Convert hotplug helper string to proc_rcu_string()


This avoids races with lockless sysctl.

Also saves ~220 bytes in the data segment for default kernels.

I also moved the code into a separate function because the original
was very long.

Acked-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

---
include/linux/kobject.h | 2 -
kernel/sysctl.c | 2 -
lib/kobject_uevent.c | 50 ++++++++++++++++++++++++++++++------------------
3 files changed, 34 insertions(+), 20 deletions(-)

Index: linux-2.6.33-rc2-ak/include/linux/kobject.h
===================================================================
--- linux-2.6.33-rc2-ak.orig/include/linux/kobject.h
+++ linux-2.6.33-rc2-ak/include/linux/kobject.h
@@ -31,7 +31,7 @@
#define UEVENT_BUFFER_SIZE 2048 /* buffer for the variables */

/* path to the userspace helper executed on an event */
-extern char uevent_helper[];
+extern char *uevent_helper;

/* counter to tag the uevent, read only except for the kobject core */
extern u64 uevent_seqnum;
Index: linux-2.6.33-rc2-ak/kernel/sysctl.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/sysctl.c
+++ linux-2.6.33-rc2-ak/kernel/sysctl.c
@@ -551,7 +551,7 @@ static struct ctl_table kern_table[] = {
.data = &uevent_helper,
.maxlen = UEVENT_HELPER_PATH_LEN,
.mode = 0644,
- .proc_handler = proc_dostring,
+ .proc_handler = proc_rcu_string,
},
#endif
#ifdef CONFIG_CHR_DEV_SG
Index: linux-2.6.33-rc2-ak/lib/kobject_uevent.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/lib/kobject_uevent.c
+++ linux-2.6.33-rc2-ak/lib/kobject_uevent.c
@@ -22,11 +22,12 @@
#include <linux/socket.h>
#include <linux/skbuff.h>
#include <linux/netlink.h>
+#include <linux/rcustring.h>
#include <net/sock.h>


u64 uevent_seqnum;
-char uevent_helper[UEVENT_HELPER_PATH_LEN] = CONFIG_UEVENT_HELPER_PATH;
+char *uevent_helper = CONFIG_UEVENT_HELPER_PATH;
static DEFINE_SPINLOCK(sequence_lock);
#if defined(CONFIG_NET)
static struct sock *uevent_sock;
@@ -76,6 +77,34 @@ out:
return ret;
}

+/* Call an external helper executable. */
+static int uevent_call_helper(const char *subsystem, struct kobj_uevent_env *env)
+{
+ char *argv[3];
+ char *helper;
+ int retval;
+
+ helper = access_rcu_string(&uevent_helper, UEVENT_HELPER_PATH_LEN, GFP_KERNEL);
+ if (!helper)
+ return -ENOMEM;
+
+ retval = -E2BIG;
+ argv[0] = helper;
+ argv[1] = (char *)subsystem;
+ argv[2] = NULL;
+ retval = add_uevent_var(env, "HOME=/");
+ if (retval)
+ goto error;
+ retval = add_uevent_var(env, "PATH=/sbin:/bin:/usr/sbin:/usr/bin");
+ if (retval)
+ goto error;
+
+ retval = call_usermodehelper(argv[0], argv, env->envp, UMH_WAIT_EXEC);
+error:
+ kfree(helper);
+ return retval;
+}
+
/**
* kobject_uevent_env - send an uevent with environmental data
*
@@ -243,23 +272,8 @@ int kobject_uevent_env(struct kobject *k
#endif

/* call uevent_helper, usually only enabled during early boot */
- if (uevent_helper[0]) {
- char *argv [3];
-
- argv [0] = uevent_helper;
- argv [1] = (char *)subsystem;
- argv [2] = NULL;
- retval = add_uevent_var(env, "HOME=/");
- if (retval)
- goto exit;
- retval = add_uevent_var(env,
- "PATH=/sbin:/bin:/usr/sbin:/usr/bin");
- if (retval)
- goto exit;
-
- retval = call_usermodehelper(argv[0], argv,
- env->envp, UMH_WAIT_EXEC);
- }
+ if (uevent_helper[0])
+ retval = uevent_call_helper(subsystem, env);

exit:
kfree(devpath);

2010-01-05 02:17:37

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [2/9] Add a kernel_address() that works for data too


Add a variant of kernel_text_address() that includes kernel data.

Assumes kernel is _text ... _end - init section. True everywhere?

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>

---
include/linux/kernel.h | 1 +
kernel/extable.c | 18 ++++++++++++++++++
2 files changed, 19 insertions(+)

Index: linux-2.6.33-rc2-ak/include/linux/kernel.h
===================================================================
--- linux-2.6.33-rc2-ak.orig/include/linux/kernel.h
+++ linux-2.6.33-rc2-ak/include/linux/kernel.h
@@ -205,6 +205,7 @@ extern unsigned long long memparse(const
extern int core_kernel_text(unsigned long addr);
extern int __kernel_text_address(unsigned long addr);
extern int kernel_text_address(unsigned long addr);
+extern int kernel_address(unsigned long addr);
extern int func_ptr_is_kernel_text(void *ptr);

struct pid;
Index: linux-2.6.33-rc2-ak/kernel/extable.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/extable.c
+++ linux-2.6.33-rc2-ak/kernel/extable.c
@@ -72,6 +72,18 @@ int core_kernel_text(unsigned long addr)
return 0;
}

+static int core_kernel_address(unsigned long addr)
+{
+ if ((addr >= (unsigned long)_text &&
+ addr <= (unsigned long)_end)) {
+ if (addr >= (unsigned long)__init_begin &&
+ addr < (unsigned long)__init_end)
+ return system_state == SYSTEM_BOOTING;
+ return 1;
+ }
+ return 0;
+}
+
int __kernel_text_address(unsigned long addr)
{
if (core_kernel_text(addr))
@@ -98,6 +110,12 @@ int kernel_text_address(unsigned long ad
return is_module_text_address(addr);
}

+/* text or data in core kernel or module */
+int kernel_address(unsigned long addr)
+{
+ return core_kernel_address(addr) || is_module_address(addr);
+}
+
/*
* On some architectures (PPC64, IA64) function pointers
* are actually only tokens to some data that then holds the

2010-01-05 05:32:56

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2

On Tue, Jan 05, 2010 at 03:15:26AM +0100, Andi Kleen wrote:
>
> Add a little ADT for RCU protected strings. RCU is a convenient
> way to manage modifications to read-often-write-seldom strings.
> Add some helper functions to make this more straight forward.
>
> Used by follow-on patches to implement RCU protected sysctl strings.
>
> * General rules:
> * Reader has to use rcu_read_lock() and not sleep while accessing the string,
> * or alternatively get a copy with access_rcu_string()
> * Writer needs an own lock against each other.
> * Each modification should allocate a new string first and free the old
> * one with free_rcu_string()
> * In writers use rcu_assign_pointer to publicize the updated string to
> * global readers.
> * The size passed to access_rcu_string() must be the same as passed
> * to alloc_rcu_string() and be known in advance. Don't use strlen()!
> *
> * For sysctls also see proc_rcu_string() as a convenient wrapper
>
> v2: Use rcu_dereference correctly

Very good, the extra level of indirection on the first argument to
access_rcu_string() fixes the problem I complained about last time.

One additional question below.

> Cc: [email protected]
> Signed-off-by: Andi Kleen <[email protected]>
>
> ---
> include/linux/rcustring.h | 20 ++++++++
> lib/Makefile | 3 -
> lib/rcustring.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 137 insertions(+), 1 deletion(-)

[ . . . ]

> +/*
> + * Get a local private copy of a RCU protected string.
> + * Mostly useful to get a string that is stable while sleeping.
> + * Caller must free returned string.
> + */
> +char *access_rcu_string(char **str, int size, gfp_t gfp)
> +{
> + char *copy = kmalloc(size, gfp);
> + if (!str)
> + return NULL;
> + rcu_read_lock();
> + strlcpy(copy, rcu_dereference(*str), size);

What if "str" is non-NULL, but "*str" is NULL? Or is that disallowed
somehow?

If it is not disallowed, then something like the following?

if (!str)
return NULL;
rcu_read_lock();
tmp = rcu_dereference(*str);
if (!tmp) {
rcu_read_unlock();
return NULL;
}
strlcpy(copy, tmp, size);
rcu_read_unlock();
return copy;

> + rcu_read_unlock();
> + return copy;
> +}
> +EXPORT_SYMBOL(access_rcu_string);
> Index: linux-2.6.33-rc2-ak/lib/Makefile
> ===================================================================
> --- linux-2.6.33-rc2-ak.orig/lib/Makefile
> +++ linux-2.6.33-rc2-ak/lib/Makefile
> @@ -12,7 +12,8 @@ lib-y := ctype.o string.o vsprintf.o cmd
> idr.o int_sqrt.o extable.o prio_tree.o \
> sha1.o irq_regs.o reciprocal_div.o argv_split.o \
> proportions.o prio_heap.o ratelimit.o show_mem.o \
> - is_single_threaded.o plist.o decompress.o flex_array.o
> + is_single_threaded.o plist.o decompress.o flex_array.o \
> + rcustring.o
>
> lib-$(CONFIG_MMU) += ioremap.o
> lib-$(CONFIG_SMP) += cpumask.o

2010-01-05 06:57:11

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] [4/9] SYSCTL: Use RCU strings for core_pattern sysctl

Andi Kleen <[email protected]> writes:

> Also saves ~220 bytes in the data segment for default kernels.
>
> As a bonus this removes one use of the BKL.
>
> Signed-off-by: Andi Kleen <[email protected]>
>
> ---
> fs/exec.c | 11 +++++------
> kernel/sysctl.c | 6 +++---
> 2 files changed, 8 insertions(+), 9 deletions(-)
>
> Index: linux-2.6.33-rc2-ak/fs/exec.c
> ===================================================================
> --- linux-2.6.33-rc2-ak.orig/fs/exec.c
> +++ linux-2.6.33-rc2-ak/fs/exec.c
> @@ -62,7 +62,7 @@
> #include "internal.h"
>
> int core_uses_pid;
> -char core_pattern[CORENAME_MAX_SIZE] = "core";
> +char *core_pattern = "core";
> unsigned int core_pipe_limit;
> int suid_dumpable = 0;
>
> @@ -1421,7 +1421,7 @@ EXPORT_SYMBOL(set_binfmt);
> static int format_corename(char *corename, long signr)
> {
> const struct cred *cred = current_cred();
> - const char *pat_ptr = core_pattern;
> + const char *pat_ptr = rcu_dereference(core_pattern);

rcu_dereference should aways be between rcu_read_lock()
and rcu_read_unlock();

> int ispipe = (*pat_ptr == '|');
> char *out_ptr = corename;
> char *const out_end = corename + CORENAME_MAX_SIZE;
> @@ -1825,12 +1825,11 @@ void do_coredump(long signr, int exit_co
> clear_thread_flag(TIF_SIGPENDING);
>
> /*
> - * lock_kernel() because format_corename() is controlled by sysctl, which
> - * uses lock_kernel()
> + * Protect corename by RCU vs proc_rcu_string()
> */
> - lock_kernel();
> + rcu_read_lock();
> ispipe = format_corename(corename, signr);
> - unlock_kernel();
> + rcu_read_unlock();
>
> if ((!ispipe) && (cprm.limit < binfmt->min_coredump))
> goto fail_unlock;
> Index: linux-2.6.33-rc2-ak/kernel/sysctl.c
> ===================================================================
> --- linux-2.6.33-rc2-ak.orig/kernel/sysctl.c
> +++ linux-2.6.33-rc2-ak/kernel/sysctl.c
> @@ -75,7 +75,7 @@ extern int sysctl_oom_dump_tasks;
> extern int max_threads;
> extern int core_uses_pid;
> extern int suid_dumpable;
> -extern char core_pattern[];
> +extern char *core_pattern;
> extern unsigned int core_pipe_limit;
> extern int pid_max;
> extern int min_free_kbytes;
> @@ -399,10 +399,10 @@ static struct ctl_table kern_table[] = {
> },
> {
> .procname = "core_pattern",
> - .data = core_pattern,
> + .data = &core_pattern,
> .maxlen = CORENAME_MAX_SIZE,
> .mode = 0644,
> - .proc_handler = proc_dostring,
> + .proc_handler = proc_rcu_string,
> },
> {
> .procname = "core_pipe_limit",

2010-01-05 08:45:43

by Russell King

[permalink] [raw]
Subject: Re: [PATCH] [2/9] Add a kernel_address() that works for data too

On Tue, Jan 05, 2010 at 03:15:27AM +0100, Andi Kleen wrote:
>
> Add a variant of kernel_text_address() that includes kernel data.
>
> Assumes kernel is _text ... _end - init section. True everywhere?

No. XIP kernels have the text and data/bss separated into two distinct
address regions.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:

2010-01-05 09:00:15

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [PATCH] [2/9] Add a kernel_address() that works for data too

On Tue, Jan 05, 2010 at 03:15:27AM +0100, Andi Kleen wrote:
>
> Add a variant of kernel_text_address() that includes kernel data.
>
> Assumes kernel is _text ... _end - init section. True everywhere?

Tim Abbott has done a great job lately to unify the various
linker scripts used by the different architectures.

Architectures are supposed to follow the skeleton outlined
in include/asm-generic/vmlinux.lds.h

But I think the skeleton needs a small update.
It describes that we mark start of .text with _stext.
But reality is that we use _text for this.

But you can trust _etext an almost all architectures.
It is a bug if it is missing.

So [_text, _etext] is the text section.

The data section may be placed before or after - it depends on the architecture.
But again - only some architectures define _sdata.
But all? define _edata.


Sam


>
> Cc: [email protected]
> Signed-off-by: Andi Kleen <[email protected]>
>
> ---
> include/linux/kernel.h | 1 +
> kernel/extable.c | 18 ++++++++++++++++++
> 2 files changed, 19 insertions(+)
>
> Index: linux-2.6.33-rc2-ak/include/linux/kernel.h
> ===================================================================
> --- linux-2.6.33-rc2-ak.orig/include/linux/kernel.h
> +++ linux-2.6.33-rc2-ak/include/linux/kernel.h
> @@ -205,6 +205,7 @@ extern unsigned long long memparse(const
> extern int core_kernel_text(unsigned long addr);
> extern int __kernel_text_address(unsigned long addr);
> extern int kernel_text_address(unsigned long addr);
> +extern int kernel_address(unsigned long addr);
> extern int func_ptr_is_kernel_text(void *ptr);
>
> struct pid;
> Index: linux-2.6.33-rc2-ak/kernel/extable.c
> ===================================================================
> --- linux-2.6.33-rc2-ak.orig/kernel/extable.c
> +++ linux-2.6.33-rc2-ak/kernel/extable.c
> @@ -72,6 +72,18 @@ int core_kernel_text(unsigned long addr)
> return 0;
> }
>
> +static int core_kernel_address(unsigned long addr)
> +{
> + if ((addr >= (unsigned long)_text &&
> + addr <= (unsigned long)_end)) {
> + if (addr >= (unsigned long)__init_begin &&
> + addr < (unsigned long)__init_end)
> + return system_state == SYSTEM_BOOTING;
> + return 1;
> + }
> + return 0;
> +}
> +
> int __kernel_text_address(unsigned long addr)
> {
> if (core_kernel_text(addr))
> @@ -98,6 +110,12 @@ int kernel_text_address(unsigned long ad
> return is_module_text_address(addr);
> }
>
> +/* text or data in core kernel or module */
> +int kernel_address(unsigned long addr)
> +{
> + return core_kernel_address(addr) || is_module_address(addr);
> +}
> +
> /*
> * On some architectures (PPC64, IA64) function pointers
> * are actually only tokens to some data that then holds the
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arch" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2010-01-05 10:47:49

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2

On Mon, Jan 04, 2010 at 09:32:52PM -0800, Paul E. McKenney wrote:
> > +/*
> > + * Get a local private copy of a RCU protected string.
> > + * Mostly useful to get a string that is stable while sleeping.
> > + * Caller must free returned string.
> > + */
> > +char *access_rcu_string(char **str, int size, gfp_t gfp)
> > +{
> > + char *copy = kmalloc(size, gfp);
> > + if (!str)
> > + return NULL;
> > + rcu_read_lock();
> > + strlcpy(copy, rcu_dereference(*str), size);
>
> What if "str" is non-NULL, but "*str" is NULL? Or is that disallowed
> somehow?

I would consider it disallowed, all the strings have some default value.
Empty string would be ""

If we allowed it the caller couldn't easily distingush error from expected NULL.

-Andi

--
[email protected] -- Speaking for myself only.

2010-01-05 11:49:41

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [4/9] SYSCTL: Use RCU strings for core_pattern sysctl

[email protected] (Eric W. Biederman) writes:
>>
>> @@ -1421,7 +1421,7 @@ EXPORT_SYMBOL(set_binfmt);
>> static int format_corename(char *corename, long signr)
>> {
>> const struct cred *cred = current_cred();
>> - const char *pat_ptr = core_pattern;
>> + const char *pat_ptr = rcu_dereference(core_pattern);
>
> rcu_dereference should aways be between rcu_read_lock()
> and rcu_read_unlock();

It is, see the call site below:

>> - * lock_kernel() because format_corename() is controlled by sysctl, which
>> - * uses lock_kernel()
>> + * Protect corename by RCU vs proc_rcu_string()
>> */
>> - lock_kernel();
>> + rcu_read_lock();
>> ispipe = format_corename(corename, signr);
>> - unlock_kernel();
>> + rcu_read_unlock();

-Andi

--
[email protected] -- Speaking for myself only.

2010-01-05 14:12:01

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2

On Tue, Jan 05, 2010 at 11:47:46AM +0100, Andi Kleen wrote:
> On Mon, Jan 04, 2010 at 09:32:52PM -0800, Paul E. McKenney wrote:
> > > +/*
> > > + * Get a local private copy of a RCU protected string.
> > > + * Mostly useful to get a string that is stable while sleeping.
> > > + * Caller must free returned string.
> > > + */
> > > +char *access_rcu_string(char **str, int size, gfp_t gfp)
> > > +{
> > > + char *copy = kmalloc(size, gfp);
> > > + if (!str)
> > > + return NULL;
> > > + rcu_read_lock();
> > > + strlcpy(copy, rcu_dereference(*str), size);
> >
> > What if "str" is non-NULL, but "*str" is NULL? Or is that disallowed
> > somehow?
>
> I would consider it disallowed, all the strings have some default value.
> Empty string would be ""
>
> If we allowed it the caller couldn't easily distingush error from expected NULL.

OK, then:

Reviewed-by: Paul E. McKenney <[email protected]>

Thanx, Paul

2010-01-05 14:19:30

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2

"Paul E. McKenney" <[email protected]> writes:
>>
>> I would consider it disallowed, all the strings have some default value.
>> Empty string would be ""
>>
>> If we allowed it the caller couldn't easily distingush error from expected NULL.
>
> OK, then:

Thanks. I actually changed my mind and added support for NULL strings
(although I suspect they are not too useful)

I also found one more minor problem (and there was one conversion
missing in v2), will do a repost later

-Andi

--
[email protected] -- Speaking for myself only.

2010-01-05 19:05:37

by Russell King

[permalink] [raw]
Subject: Re: [PATCH] [2/9] Add a kernel_address() that works for data too

On Tue, Jan 05, 2010 at 09:58:53AM +0100, Sam Ravnborg wrote:
> But you can trust _etext an almost all architectures.
> It is a bug if it is missing.
>
> So [_text, _etext] is the text section.
>
> The data section may be placed before or after - it depends on the architecture.
> But again - only some architectures define _sdata.
> But all? define _edata.

You can not guarantee that the data segment is after the text segment,
unless you want to outlaw XIP kernels. XIP kernels have the text
segment mapped at a completely different address to the data segment.

I'd suggest the only way to identify the data segment in a generic way
is to have everyone define _sdata, or more preferably _data (to be
consistent with _text), to be the start of the data segment.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:

2010-01-05 19:15:16

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [2/9] Add a kernel_address() that works for data too

> I'd suggest the only way to identify the data segment in a generic way
> is to have everyone define _sdata, or more preferably _data (to be
> consistent with _text), to be the start of the data segment.

I agree. I'll take a look at that.

-Andi
--
[email protected] -- Speaking for myself only.

2010-01-08 23:51:09

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2

On Tue, 5 Jan 2010 03:15:26 +0100 (CET)
Andi Kleen <[email protected]> wrote:

> +extern char *alloc_rcu_string(int size, gfp_t gfp);
> +extern void free_rcu_string(const char *string);
> +
> +/*
> + * size must be the same as alloc_rcu_string, don't
> + * use strlen on str!
> + */
> +extern char *access_rcu_string(char **str, int size, gfp_t gfp);

I think better names would be rcu_string_alloc(), rcu_string_free() and
rcu_string_access().

2010-01-08 23:51:28

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] [2/9] Add a kernel_address() that works for data too

On Tue, 5 Jan 2010 20:15:03 +0100
Andi Kleen <[email protected]> wrote:

> > I'd suggest the only way to identify the data segment in a generic way
> > is to have everyone define _sdata, or more preferably _data (to be
> > consistent with _text), to be the start of the data segment.
>
> I agree. I'll take a look at that.
>

I'll merge this as-is for now. Please send updates when convenient.

2010-01-11 12:12:45

by Bert Wesarg

[permalink] [raw]
Subject: Re: [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2

On Tue, Jan 5, 2010 at 03:15, Andi Kleen <[email protected]> wrote:
> Index: linux-2.6.33-rc2-ak/lib/rcustring.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6.33-rc2-ak/lib/rcustring.c

[ . . . ]

> +/*
> + * Get a local private copy of a RCU protected string.
> + * Mostly useful to get a string that is stable while sleeping.
> + * Caller must free returned string.
> + */
> +char *access_rcu_string(char **str, int size, gfp_t gfp)
> +{
> +       char *copy = kmalloc(size, gfp);
No check of return value from kmalloc()?

> +       if (!str)
> +               return NULL;
> +       rcu_read_lock();
> +       strlcpy(copy, rcu_dereference(*str), size);
> +       rcu_read_unlock();
> +       return copy;
> +}
> +EXPORT_SYMBOL(access_rcu_string);

2010-01-11 14:27:00

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2

On Mon, Jan 11, 2010 at 01:12:41PM +0100, Bert Wesarg wrote:
> On Tue, Jan 5, 2010 at 03:15, Andi Kleen <[email protected]> wrote:
> > Index: linux-2.6.33-rc2-ak/lib/rcustring.c
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6.33-rc2-ak/lib/rcustring.c
>
> [ . . . ]
>
> > +/*
> > + * Get a local private copy of a RCU protected string.
> > + * Mostly useful to get a string that is stable while sleeping.
> > + * Caller must free returned string.
> > + */
> > +char *access_rcu_string(char **str, int size, gfp_t gfp)
> > +{
> > + ? ? ? char *copy = kmalloc(size, gfp);
> No check of return value from kmalloc()?
>
> > + ? ? ? if (!str)
> > + ? ? ? ? ? ? ? return NULL;

The !str should be !copy right. I fixed that in my version.
Thanks for catching.

-Andi