2016-12-22 17:02:01

by Davidlohr Bueso

Subject: [PATCH 0/2] sched: Introduce rcuwait

Hi,

Here's an updated version of the pcpu rwsem writer wait/wake changes
with the abstractions Oleg asked for. Patch 1 adds rcuwait (for lack
of a better name), and patch 2 trivially makes use of it.

The series has survived torture testing, which is particularly handy in
this case when run with equal numbers of reader and writer threads.

Thanks.

Davidlohr Bueso (2):
sched: Introduce rcuwait machinery
locking/percpu-rwsem: Replace waitqueue with rcuwait

include/linux/percpu-rwsem.h | 8 +++---
include/linux/rcuwait.h | 63 +++++++++++++++++++++++++++++++++++++++++++
kernel/exit.c | 29 ++++++++++++++++++++
kernel/locking/percpu-rwsem.c | 7 +++--
4 files changed, 99 insertions(+), 8 deletions(-)
create mode 100644 include/linux/rcuwait.h

--
2.6.6


2016-12-22 17:02:02

by Davidlohr Bueso

Subject: [PATCH 1/2] sched: Introduce rcuwait machinery

rcuwait provides rcu-safe wait/wake functionality for a single task,
with the caveat that it must not be used after exit_notify(). This
avoids racing with the rcu delayed_put_task_struct callbacks, since
task_struct is not rcu aware in this context. task_rcu_dereference()
exists for similar situations, but its return semantics differ and
can conflict with the wakeup side.

The interfaces are quite straightforward:

rcuwait_wait_event()
rcuwait_trywake()

More details are in the comments, but two points are worth mentioning here.
First, users must provide proper serialization when waiting on a condition, so
that a concurrent waiter is not corrupted. Second, care must be taken to order
the task and the condition when calling the wakeup, as we cannot miss wakeups.
When porting users from waitqueues, for example, this ordering was a given
because everything is done under the q->lock.
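
The "cannot miss wakeups" requirement can be checked with a small model.
The sketch below is a userspace illustration, not kernel code: it enumerates
every interleaving of the waiter's two steps (store task, load condition) and
the waker's two steps (store condition, load task), and counts the schedules
in which the waiter would go to sleep while the waker finds no task to wake.
All names in it (run_schedule, count_lost_wakeups, waker_load_first) are made
up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* What each side observed in one schedule. */
struct outcome {
	bool waiter_saw_cond;	/* false: the waiter would call schedule() */
	bool waker_saw_task;	/* false: the waker would skip the wakeup */
};

static int bits_set(unsigned int v)
{
	int n = 0;

	for (; v; v >>= 1)
		n += v & 1;
	return n;
}

/*
 * Run one interleaving. Bit i of @mask set means slot i belongs to the
 * waiter, otherwise to the waker. @waker_load_first models a waker whose
 * load of the task is allowed to pass its earlier store of the condition.
 */
static struct outcome run_schedule(unsigned int mask, bool waker_load_first)
{
	bool task = false, cond = false;
	struct outcome o = { false, false };
	int waiter_step = 0, waker_step = 0;

	for (int slot = 0; slot < 4; slot++) {
		if (mask & (1u << slot)) {
			if (waiter_step++ == 0)
				task = true;			/* [S] tsk = current */
			else
				o.waiter_saw_cond = cond;	/* [L] cond */
		} else {
			bool do_load = (waker_step++ == 0) == waker_load_first;

			if (do_load)
				o.waker_saw_task = task;	/* [L] tsk */
			else
				cond = true;			/* [S] cond = true */
		}
	}
	return o;
}

/* Count schedules where the waiter sleeps and nobody wakes it. */
int count_lost_wakeups(bool waker_load_first)
{
	int lost = 0;

	for (unsigned int mask = 0; mask < 16; mask++) {
		struct outcome o;

		if (bits_set(mask) != 2)	/* each side runs exactly two steps */
			continue;
		o = run_schedule(mask, waker_load_first);
		if (!o.waiter_saw_cond && !o.waker_saw_task)
			lost++;
	}
	return lost;
}
```

With both sides keeping their store ahead of their load,
count_lost_wakeups(false) finds no losing schedule. Letting the waker's load
pass its store, which a read-only barrier does not prevent,
count_lost_wakeups(true) finds one: load task, store task, load cond,
store cond.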

Signed-off-by: Davidlohr Bueso <[email protected]>
---
include/linux/rcuwait.h | 63 +++++++++++++++++++++++++++++++++++++++++++++++++
kernel/exit.c | 29 +++++++++++++++++++++++
2 files changed, 92 insertions(+)
create mode 100644 include/linux/rcuwait.h

diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h
new file mode 100644
index 000000000000..3e07beb14c1f
--- /dev/null
+++ b/include/linux/rcuwait.h
@@ -0,0 +1,63 @@
+#ifndef _LINUX_RCUWAIT_H_
+#define _LINUX_RCUWAIT_H_
+
+#include <linux/rcupdate.h>
+
+/*
+ * rcuwait provides a way of blocking and waking up a single
+ * task in an rcu-safe manner; it is forbidden to use it
+ * after exit_notify(). task_struct is not properly rcu protected,
+ * unless dealing with rcu-aware lists, i.e. find_task_by_*().
+ *
+ * Alternatively we have task_rcu_dereference(), but the return
+ * semantics have different implications which would break the
+ * wakeup side. The only time @task is non-NULL is when a user is
+ * blocked (or checking whether it needs to block) on a condition;
+ * it is reset as soon as we know that the condition has succeeded
+ * and we are awoken.
+ */
+struct rcuwait {
+ struct task_struct *task;
+};
+
+#define __RCUWAIT_INITIALIZER(name) \
+ { .task = NULL, }
+
+static inline void rcuwait_init(struct rcuwait *w)
+{
+ w->task = NULL;
+}
+
+extern void rcuwait_trywake(struct rcuwait *w);
+
+/*
+ * The caller is responsible for locking around rcuwait_wait_event(),
+ * such that writes to @task are properly serialized.
+ */
+#define rcuwait_wait_event(w, condition) \
+({ \
+ /* \
+ * Complain if we are called after do_exit()/exit_notify(), \
+ * as we cannot rely on the rcu critical region for the \
+ * wakeup side. \
+ */ \
+ WARN_ON(current->exit_state); \
+ \
+ rcu_assign_pointer((w)->task, current); \
+ for (;;) { \
+ /* \
+ * Implicit barrier (A) pairs with (B) in \
+ * rcuwait_trywake(). \
+ */ \
+ set_current_state(TASK_UNINTERRUPTIBLE); \
+ if (condition) \
+ break; \
+ \
+ schedule(); \
+ } \
+ \
+ WRITE_ONCE((w)->task, NULL); \
+ __set_current_state(TASK_RUNNING); \
+})
+
+#endif /* _LINUX_RCUWAIT_H_ */
diff --git a/kernel/exit.c b/kernel/exit.c
index aacff8e2aec0..6862884179a8 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -282,6 +282,35 @@ struct task_struct *task_rcu_dereference(struct task_struct **ptask)
return task;
}

+void rcuwait_trywake(struct rcuwait *w)
+{
+ struct task_struct *task;
+
+ rcu_read_lock();
+
+ /*
+ * Order condition vs @task, such that everything prior to the load
+ * of @task is visible. This is the condition as to why the user called
+ * rcuwait_trywake() in the first place. Pairs with set_current_state()
+ * barrier (A) in rcuwait_wait_event().
+ *
+ * WAIT WAKE
+ * [S] tsk = current [S] cond = true
+ * MB (A) MB (B)
+ * [L] cond [L] tsk
+ */
+ smp_mb(); /* (B) */
+
+ /*
+ * Avoid using task_rcu_dereference() magic as long as we are careful,
+ * see comment in rcuwait_wait_event() regarding ->exit_state.
+ */
+ task = rcu_dereference(w->task);
+ if (task)
+ wake_up_process(task);
+ rcu_read_unlock();
+}
+
struct task_struct *try_get_task_struct(struct task_struct **ptask)
{
struct task_struct *task;
--
2.6.6

2016-12-22 17:02:06

by Davidlohr Bueso

Subject: [PATCH 2/2] locking/percpu-rwsem: Replace waitqueue with rcuwait

The use of any kind of wait queue is overkill for pcpu-rwsems.
While one option would be to use the lighter simple (swait)
flavor, this is still more than pcpu-rwsems need. For one,
we do not care about any sort of queuing, in that the only (rare) time
writers (and readers, for that matter) are queued is when trying to
acquire the regular contended rw_sem. There cannot be any further
queuing, as writers are serialized by the rw_sem in the first place.

Given that percpu_down_write() must not be called after exit_notify(),
we can replace the bulky waitqueue with rcuwait such that a writer
can wait for its turn to take the lock. As such, we can avoid the
queue handling and locking overhead.
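
As a rough illustration of why a single slot is enough, here is a
single-threaded userspace model of the writer/reader handshake. It is only a
sketch under simplifying assumptions: the per-CPU counter is a plain array, a
"wakeup" is a flag, and every name ending in _model (plus handshake_demo) is
invented for the example rather than taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4

struct task_model {
	bool runnable;
};

struct sem_model {
	int read_count[NR_CPUS_MODEL];	/* stands in for the __percpu counter */
	struct task_model *writer;	/* stands in for struct rcuwait */
};

/* True once the sum of all per-CPU reader counts is zero. */
static bool readers_active_check_model(const struct sem_model *sem)
{
	int sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += sem->read_count[cpu];
	return sum == 0;
}

/* Reader exit: drop our count, then prod the writer if one is waiting. */
static void up_read_model(struct sem_model *sem, int cpu)
{
	sem->read_count[cpu]--;
	if (sem->writer)
		sem->writer->runnable = true;
}

/* One pass of the writer's wait loop; true once no readers remain. */
static bool writer_wait_step_model(struct sem_model *sem, struct task_model *self)
{
	sem->writer = self;		/* publish the single waiter */
	self->runnable = false;		/* like setting TASK_UNINTERRUPTIBLE */
	if (readers_active_check_model(sem)) {
		sem->writer = NULL;
		self->runnable = true;
		return true;
	}
	return false;			/* the real code would schedule() here */
}

/* Two readers are active; returns how many wakeups the writer needed. */
int handshake_demo(void)
{
	struct sem_model sem = { .read_count = { 1, 0, 1, 0 }, .writer = NULL };
	struct task_model writer = { .runnable = true };
	int wakeups = 0;

	while (!writer_wait_step_model(&sem, &writer)) {
		/* The writer is asleep; a reader on some CPU leaves. */
		int cpu = sem.read_count[0] ? 0 : 2;

		up_read_model(&sem, cpu);
		assert(writer.runnable);	/* the reader prodded the writer */
		wakeups++;
	}
	return wakeups;
}
```

In this model the writer is prodded once per departing reader and simply
rechecks the sum each time, so there is never more than one task to track.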

Signed-off-by: Davidlohr Bueso <[email protected]>
---
include/linux/percpu-rwsem.h | 8 ++++----
kernel/locking/percpu-rwsem.c | 7 +++----
2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index 5b2e6159b744..93664f022ecf 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -4,15 +4,15 @@
#include <linux/atomic.h>
#include <linux/rwsem.h>
#include <linux/percpu.h>
-#include <linux/wait.h>
+#include <linux/rcuwait.h>
#include <linux/rcu_sync.h>
#include <linux/lockdep.h>

struct percpu_rw_semaphore {
struct rcu_sync rss;
unsigned int __percpu *read_count;
- struct rw_semaphore rw_sem;
- wait_queue_head_t writer;
+ struct rw_semaphore rw_sem; /* slowpath */
+ struct rcuwait writer; /* blocked writer */
int readers_block;
};

@@ -22,7 +22,7 @@ static struct percpu_rw_semaphore name = { \
.rss = __RCU_SYNC_INITIALIZER(name.rss, RCU_SCHED_SYNC), \
.read_count = &__percpu_rwsem_rc_##name, \
.rw_sem = __RWSEM_INITIALIZER(name.rw_sem), \
- .writer = __WAIT_QUEUE_HEAD_INITIALIZER(name.writer), \
+ .writer = __RCUWAIT_INITIALIZER(name.writer), \
}

extern int __percpu_down_read(struct percpu_rw_semaphore *, int);
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index ce182599cf2e..e2502d6ec82f 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -1,7 +1,6 @@
#include <linux/atomic.h>
#include <linux/rwsem.h>
#include <linux/percpu.h>
-#include <linux/wait.h>
#include <linux/lockdep.h>
#include <linux/percpu-rwsem.h>
#include <linux/rcupdate.h>
@@ -18,7 +17,7 @@ int __percpu_init_rwsem(struct percpu_rw_semaphore *sem,
/* ->rw_sem represents the whole percpu_rw_semaphore for lockdep */
rcu_sync_init(&sem->rss, RCU_SCHED_SYNC);
__init_rwsem(&sem->rw_sem, name, rwsem_key);
- init_waitqueue_head(&sem->writer);
+ rcuwait_init(&sem->writer);
sem->readers_block = 0;
return 0;
}
@@ -103,7 +102,7 @@ void __percpu_up_read(struct percpu_rw_semaphore *sem)
__this_cpu_dec(*sem->read_count);

/* Prod writer to recheck readers_active */
- wake_up(&sem->writer);
+ rcuwait_trywake(&sem->writer);
}
EXPORT_SYMBOL_GPL(__percpu_up_read);

@@ -160,7 +159,7 @@ void percpu_down_write(struct percpu_rw_semaphore *sem)
*/

/* Wait for all now active readers to complete. */
- wait_event(sem->writer, readers_active_check(sem));
+ rcuwait_wait_event(&sem->writer, readers_active_check(sem));
}
EXPORT_SYMBOL_GPL(percpu_down_write);

--
2.6.6

2016-12-22 19:27:45

by kernel test robot

Subject: Re: [PATCH 1/2] sched: Introduce rcuwait machinery

Hi Davidlohr,

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on v4.9 next-20161222]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Davidlohr-Bueso/sched-Introduce-rcuwait/20161223-020109
config: i386-randconfig-s1-201651 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

Note: the linux-review/Davidlohr-Bueso/sched-Introduce-rcuwait/20161223-020109 HEAD 9e9d238f94d5aa8e348e7e70585533fe0dbd373b builds fine.
It only hurts bisectability.

All error/warnings (new ones prefixed by >>):

>> kernel/exit.c:285:29: warning: 'struct rcuwait' declared inside parameter list will not be visible outside of this definition or declaration
void rcuwait_trywake(struct rcuwait *w)
^~~~~~~
In file included from include/linux/srcu.h:33:0,
from include/linux/notifier.h:15,
from include/linux/memory_hotplug.h:6,
from include/linux/mmzone.h:751,
from include/linux/gfp.h:5,
from include/linux/mm.h:9,
from kernel/exit.c:7:
kernel/exit.c: In function 'rcuwait_trywake':
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type 'struct rcuwait'
task = rcu_dereference(w->task);
^
include/linux/rcupdate.h:606:10: note: in definition of macro '__rcu_dereference_check'
typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
^
include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
#define rcu_dereference(p) rcu_dereference_check(p, 0)
^~~~~~~~~~~~~~~~~~~~~
>> kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
task = rcu_dereference(w->task);
^~~~~~~~~~~~~~~

vim +308 kernel/exit.c

279 if (!sighand)
280 return NULL;
281
282 return task;
283 }
284
> 285 void rcuwait_trywake(struct rcuwait *w)
286 {
287 struct task_struct *task;
288
289 rcu_read_lock();
290
291 /*
292 * Order condition vs @task, such that everything prior to the load
293 * of @task is visible. This is the condition as to why the user called
294 * rcuwait_trywake() in the first place. Pairs with set_current_state()
295 * barrier (A) in rcuwait_wait_event().
296 *
297 * WAIT WAKE
298 * [S] tsk = current [S] cond = true
299 * MB (A) MB (B)
300 * [L] cond [L] tsk
301 */
302 smp_mb(); /* (B) */
303
304 /*
305 * Avoid using task_rcu_dereference() magic as long as we are careful,
306 * see comment in rcuwait_wait_event() regarding ->exit_state.
307 */
> 308 task = rcu_dereference(w->task);
309 if (task)
310 wake_up_process(task);
311 rcu_read_unlock();
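
For readers unfamiliar with this pair of diagnostics: they are what GCC
emits when a struct tag is first seen inside a parameter list and is then
dereferenced. That matches the patch as posted, whose kernel/exit.c diff adds
rcuwait_trywake() but no #include of linux/rcuwait.h. The snippet below is a
standalone userspace illustration with made-up names (rcuwait_demo, task_demo,
peek_task), not the kernel fix itself.

```c
#include <assert.h>
#include <stddef.h>

struct task_demo;	/* opaque is fine here: it is never dereferenced */

/*
 * The full definition must be visible before w->task is evaluated.
 * Without it, peek_task() below would draw the "declared inside
 * parameter list" warning and the "incomplete type" error.
 */
struct rcuwait_demo {
	struct task_demo *task;
};

static struct task_demo *peek_task(struct rcuwait_demo *w)
{
	return w->task;		/* needs the complete type */
}

/* Returns 1 when an empty slot reads back as NULL. */
int incomplete_type_demo(void)
{
	struct rcuwait_demo w = { .task = NULL };

	return peek_task(&w) == NULL;
}
```

In kernel terms, the equivalent of making the definition visible would be for
kernel/exit.c to see the contents of linux/rcuwait.h before the function is
defined.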

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (3.17 kB)
.config.gz (32.82 kB)

2016-12-22 19:55:45

by kernel test robot

Subject: Re: [PATCH 1/2] sched: Introduce rcuwait machinery

Hi Davidlohr,

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on v4.9 next-20161222]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Davidlohr-Bueso/sched-Introduce-rcuwait/20161223-020109
config: m68k-sun3_defconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 4.9.0
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=m68k

Note: the linux-review/Davidlohr-Bueso/sched-Introduce-rcuwait/20161223-020109 HEAD 9e9d238f94d5aa8e348e7e70585533fe0dbd373b builds fine.
It only hurts bisectability.

All error/warnings (new ones prefixed by >>):

>> kernel/exit.c:285:29: warning: 'struct rcuwait' declared inside parameter list
void rcuwait_trywake(struct rcuwait *w)
^
>> kernel/exit.c:285:29: warning: its scope is only this definition or declaration, which is probably not what you want
In file included from include/linux/srcu.h:33:0,
from include/linux/notifier.h:15,
from include/linux/memory_hotplug.h:6,
from include/linux/mmzone.h:751,
from include/linux/gfp.h:5,
from include/linux/mm.h:9,
from kernel/exit.c:7:
kernel/exit.c: In function 'rcuwait_trywake':
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type
task = rcu_dereference(w->task);
^
include/linux/rcupdate.h:606:10: note: in definition of macro '__rcu_dereference_check'
typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
^
include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
#define rcu_dereference(p) rcu_dereference_check(p, 0)
^
kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
task = rcu_dereference(w->task);
^
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type
task = rcu_dereference(w->task);
^
include/linux/rcupdate.h:606:36: note: in definition of macro '__rcu_dereference_check'
typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
^
include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
#define rcu_dereference(p) rcu_dereference_check(p, 0)
^
kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
task = rcu_dereference(w->task);
^
In file included from include/asm-generic/bug.h:4:0,
from arch/m68k/include/asm/bug.h:28,
from include/linux/bug.h:4,
from include/linux/mmdebug.h:4,
from include/linux/mm.h:8,
from kernel/exit.c:7:
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type
task = rcu_dereference(w->task);
^
include/linux/compiler.h:563:9: note: in definition of macro 'lockless_dereference'
typeof(p) _________p1 = READ_ONCE(p); \
^
include/linux/rcupdate.h:727:2: note: in expansion of macro '__rcu_dereference_check'
__rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
^
include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
#define rcu_dereference(p) rcu_dereference_check(p, 0)
^
kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
task = rcu_dereference(w->task);
^
In file included from include/asm-generic/bug.h:4:0,
from arch/m68k/include/asm/bug.h:28,
from include/linux/bug.h:4,
from include/linux/mmdebug.h:4,
from include/linux/mm.h:8,
from kernel/exit.c:7:
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type
task = rcu_dereference(w->task);
^
include/linux/compiler.h:305:17: note: in definition of macro '__READ_ONCE'
union { typeof(x) __val; char __c[1]; } __u; \
^
>> include/linux/compiler.h:563:26: note: in expansion of macro 'READ_ONCE'
typeof(p) _________p1 = READ_ONCE(p); \
^
>> include/linux/rcupdate.h:606:48: note: in expansion of macro 'lockless_dereference'
typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
^
include/linux/rcupdate.h:727:2: note: in expansion of macro '__rcu_dereference_check'
__rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
^
include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
#define rcu_dereference(p) rcu_dereference_check(p, 0)
^
kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
task = rcu_dereference(w->task);
^
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type
task = rcu_dereference(w->task);
^
include/linux/compiler.h:307:22: note: in definition of macro '__READ_ONCE'
__read_once_size(&(x), __u.__c, sizeof(x)); \
^
>> include/linux/compiler.h:563:26: note: in expansion of macro 'READ_ONCE'
typeof(p) _________p1 = READ_ONCE(p); \
^
>> include/linux/rcupdate.h:606:48: note: in expansion of macro 'lockless_dereference'
typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
^
include/linux/rcupdate.h:727:2: note: in expansion of macro '__rcu_dereference_check'
__rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
^
include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
#define rcu_dereference(p) rcu_dereference_check(p, 0)
^
kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
task = rcu_dereference(w->task);
^
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type
task = rcu_dereference(w->task);
^
include/linux/compiler.h:307:42: note: in definition of macro '__READ_ONCE'
__read_once_size(&(x), __u.__c, sizeof(x)); \
^
>> include/linux/compiler.h:563:26: note: in expansion of macro 'READ_ONCE'
typeof(p) _________p1 = READ_ONCE(p); \
^
>> include/linux/rcupdate.h:606:48: note: in expansion of macro 'lockless_dereference'
typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
^
include/linux/rcupdate.h:727:2: note: in expansion of macro '__rcu_dereference_check'
__rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
^
include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
#define rcu_dereference(p) rcu_dereference_check(p, 0)
^
kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
task = rcu_dereference(w->task);
^
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type
task = rcu_dereference(w->task);
^
include/linux/compiler.h:309:30: note: in definition of macro '__READ_ONCE'
__read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
^
>> include/linux/compiler.h:563:26: note: in expansion of macro 'READ_ONCE'
typeof(p) _________p1 = READ_ONCE(p); \
^
>> include/linux/rcupdate.h:606:48: note: in expansion of macro 'lockless_dereference'
typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
^
include/linux/rcupdate.h:727:2: note: in expansion of macro '__rcu_dereference_check'
__rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
^
include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
#define rcu_dereference(p) rcu_dereference_check(p, 0)
^
kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
task = rcu_dereference(w->task);
^
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type
task = rcu_dereference(w->task);
^
include/linux/compiler.h:309:50: note: in definition of macro '__READ_ONCE'
__read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
^
>> include/linux/compiler.h:563:26: note: in expansion of macro 'READ_ONCE'
typeof(p) _________p1 = READ_ONCE(p); \
^
>> include/linux/rcupdate.h:606:48: note: in expansion of macro 'lockless_dereference'
typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
^
include/linux/rcupdate.h:727:2: note: in expansion of macro '__rcu_dereference_check'
__rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
^
include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
#define rcu_dereference(p) rcu_dereference_check(p, 0)
^
kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
task = rcu_dereference(w->task);
^
In file included from include/asm-generic/bug.h:4:0,
from arch/m68k/include/asm/bug.h:28,
from include/linux/bug.h:4,
from include/linux/mmdebug.h:4,
from include/linux/mm.h:8,
from kernel/exit.c:7:

vim +308 kernel/exit.c

279 if (!sighand)
280 return NULL;
281
282 return task;
283 }
284
> 285 void rcuwait_trywake(struct rcuwait *w)
286 {
287 struct task_struct *task;
288
289 rcu_read_lock();
290
291 /*
292 * Order condition vs @task, such that everything prior to the load
293 * of @task is visible. This is the condition as to why the user called
294 * rcuwait_trywake() in the first place. Pairs with set_current_state()
295 * barrier (A) in rcuwait_wait_event().
296 *
297 * WAIT WAKE
298 * [S] tsk = current [S] cond = true
299 * MB (A) MB (B)
300 * [L] cond [L] tsk
301 */
302 smp_mb(); /* (B) */
303
304 /*
305 * Avoid using task_rcu_dereference() magic as long as we are careful,
306 * see comment in rcuwait_wait_event() regarding ->exit_state.
307 */
> 308 task = rcu_dereference(w->task);
309 if (task)
310 wake_up_process(task);
311 rcu_read_unlock();

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (11.20 kB)
.config.gz (11.40 kB)