Hi,
It's been a while since the last attempt by Paul to get simple wait ready
for mainline [1]. At the last realtime workshop it was discussed how
the swait implementation could be made preempt aware. Peter posted an
untested version of it here [2].
In order to test it, I used Paul's two patches which make completions
and RCU use swait instead of wait. Some small renamings were
necessary to get it working, e.g. s/swait_head/swait_queue_head/.
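For illustration, one such rename amounts to no more than the following
(just a sketch; I'm assuming Paul's original patches still used the old
-rt swait_head naming for e.g. the grace-period wait queue):

    struct swait_head gp_wq;          /* old naming in Paul's patches  */
    struct swait_queue_head gp_wq;    /* matches Peter's swait.h below */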
My test system didn't crash or show any obvious defects, so I
decided to run some benchmarks using mmtests. I have picked some
random tests (kernbench aim9 vmr-stream ebizz), which didn't require a
lot of tinkering to get them running. The results are here:
baseline: v4.2-rc5-22-ged8bbba
http://monom.org/mmtests-swait-peterz-v1/
I don't think the numbers are trustworthy yet. Maybe one could read
them as: it doesn't explode and the numbers aren't too far away from
baseline. I need to figure out which tests are a good fit for these
patches and what the 'right' parameters are for them.
Sebastian had some comments on Peter's patch. I haven't addressed them
yet [3].
cheers,
daniel
[1] https://lwn.net/Articles/616857/
[2] http://www.spinics.net/lists/linux-rt-users/msg12703.html
[3] https://www.mail-archive.com/[email protected]/msg832142.html
Paul Gortmaker (2):
sched/completion: convert completions to use simple wait queues
rcu: use simple wait queues where possible in rcutree
Peter Zijlstra (1):
KVM: use simple waitqueue for vcpu->wq
include/linux/completion.h | 8 +--
include/linux/swait.h | 172 +++++++++++++++++++++++++++++++++++++++++++++
kernel/rcu/tree.c | 13 ++--
kernel/rcu/tree.h | 6 +-
kernel/rcu/tree_plugin.h | 18 ++---
kernel/sched/Makefile | 2 +-
kernel/sched/completion.c | 32 ++++-----
kernel/sched/swait.c | 122 ++++++++++++++++++++++++++++++++
8 files changed, 334 insertions(+), 39 deletions(-)
create mode 100644 include/linux/swait.h
create mode 100644 kernel/sched/swait.c
--
2.4.3
From: Peter Zijlstra <[email protected]>
On Tue, Feb 17, 2015 at 06:44:19PM +0100, Sebastian Andrzej Siewior wrote:
> * Peter Zijlstra | 2015-01-21 16:07:16 [+0100]:
>
> >On Tue, Jan 20, 2015 at 01:16:13PM -0500, Steven Rostedt wrote:
> >> I'm actually wondering if we should just nuke the _interruptible()
> >> version of swait. As it should be either all interruptible or all not
> >> interruptible, swait_wake() should just do the wake up
> >> regardless. In which case, swait_wake() is good enough. No need to have
> >> different versions where people may think they do something special.
> >>
> >> Peter?
> >
> >Yeah, I think the latest thing I have sitting here on my disk only has
> >the swake_up() which does TASK_NORMAL, no choice there.
>
> what is the swait status in terms of mainline? This sounds like it is
> being worked on.
> I could take the series but then I would drop it again if the mainline
> implementation changes…
Well, 'worked on' might be putting too much on it; it's one of the many
many 'spare' time things that never get attention unless people bug me
;-)
Below is my current patch; I've not actually tried it yet. I (think I)
had one patch doing some conversions, but I'm having trouble locating
it.
Mostly-Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
[wagi: fixed compile issues]
---
include/linux/swait.h | 172 ++++++++++++++++++++++++++++++++++++++++++++++++++
kernel/sched/Makefile | 2 +-
kernel/sched/swait.c | 122 +++++++++++++++++++++++++++++++++++
3 files changed, 295 insertions(+), 1 deletion(-)
create mode 100644 include/linux/swait.h
create mode 100644 kernel/sched/swait.c
diff --git a/include/linux/swait.h b/include/linux/swait.h
new file mode 100644
index 0000000..c1f9c62
--- /dev/null
+++ b/include/linux/swait.h
@@ -0,0 +1,172 @@
+#ifndef _LINUX_SWAIT_H
+#define _LINUX_SWAIT_H
+
+#include <linux/list.h>
+#include <linux/stddef.h>
+#include <linux/spinlock.h>
+#include <asm/current.h>
+
+/*
+ * Simple wait queues
+ *
+ * While these are very similar to the other/complex wait queues (wait.h) the
+ * most important difference is that the simple waitqueue allows for
+ * deterministic behaviour -- IOW it has strictly bounded IRQ and lock hold
+ * times.
+ *
+ * In order to make this so, we had to drop a fair number of features of the
+ * other waitqueue code; notably:
+ *
+ * - mixing INTERRUPTIBLE and UNINTERRUPTIBLE sleeps on the same waitqueue;
+ * all wakeups are TASK_NORMAL in order to avoid O(n) lookups for the right
+ * sleeper state.
+ *
+ * - the exclusive mode; because this requires preserving the list order
+ * and this is hard.
+ *
+ * - custom wake functions; because you cannot give any guarantees about
+ * random code.
+ *
+ * As a side effect of this; the data structures are slimmer.
+ *
+ * One would recommend using this wait queue where possible.
+ */
+
+struct task_struct;
+
+struct swait_queue_head {
+ raw_spinlock_t lock;
+ struct list_head task_list;
+};
+
+struct swait_queue {
+ struct task_struct *task;
+ struct list_head task_list;
+};
+
+#define __SWAITQUEUE_INITIALIZER(name) { \
+ .task = current, \
+ .task_list = LIST_HEAD_INIT((name).task_list), \
+}
+
+#define DECLARE_SWAITQUEUE(name) \
+ struct swait_queue name = __SWAITQUEUE_INITIALIZER(name)
+
+#define __SWAIT_QUEUE_HEAD_INITIALIZER(name) { \
+ .lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock), \
+ .task_list = LIST_HEAD_INIT((name).task_list), \
+}
+
+#define DECLARE_SWAIT_QUEUE_HEAD(name) \
+ struct swait_queue_head name = __SWAIT_QUEUE_HEAD_INITIALIZER(name)
+
+extern void __init_swait_queue_head(struct swait_queue_head *q, const char *name,
+ struct lock_class_key *key);
+
+#define init_swait_queue_head(q) \
+ do { \
+ static struct lock_class_key __key; \
+ __init_swait_queue_head((q), #q, &__key); \
+ } while (0)
+
+#ifdef CONFIG_LOCKDEP
+# define __SWAIT_QUEUE_HEAD_INIT_ONSTACK(name) \
+ ({ init_swait_queue_head(&name); name; })
+# define DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(name) \
+ struct swait_queue_head name = __SWAIT_QUEUE_HEAD_INIT_ONSTACK(name)
+#else
+# define DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(name) \
+ DECLARE_SWAIT_QUEUE_HEAD(name)
+#endif
+
+static inline int swait_active(struct swait_queue_head *q)
+{
+ return !list_empty(&q->task_list);
+}
+
+extern void swake_up(struct swait_queue_head *q);
+extern void swake_up_all(struct swait_queue_head *q);
+extern void swake_up_locked(struct swait_queue_head *q);
+
+extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
+extern void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state);
+extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state);
+
+extern void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait);
+extern void finish_swait(struct swait_queue_head *q, struct swait_queue *wait);
+
+/* as per ___wait_event() but for swait, therefore "exclusive == 0" */
+#define ___swait_event(wq, condition, state, ret, cmd) \
+({ \
+ struct swait_queue __wait; \
+ long __ret = ret; \
+ \
+ INIT_LIST_HEAD(&__wait.task_list); \
+ for (;;) { \
+ long __int = prepare_to_swait_event(&wq, &__wait, state);\
+ \
+ if (condition) \
+ break; \
+ \
+ if (___wait_is_interruptible(state) && __int) { \
+ __ret = __int; \
+ break; \
+ } \
+ \
+ cmd; \
+ } \
+ finish_swait(&wq, &__wait); \
+ __ret; \
+})
+
+#define __swait_event(wq, condition) \
+ (void)___swait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, \
+ schedule())
+
+#define swait_event(wq, condition) \
+do { \
+ if (condition) \
+ break; \
+ __swait_event(wq, condition); \
+} while (0)
+
+#define __swait_event_timeout(wq, condition, timeout) \
+ ___swait_event(wq, ___wait_cond_timeout(condition), \
+ TASK_UNINTERRUPTIBLE, timeout, \
+ __ret = schedule_timeout(__ret))
+
+#define swait_event_timeout(wq, condition, timeout) \
+({ \
+ long __ret = timeout; \
+ if (!___wait_cond_timeout(condition)) \
+ __ret = __swait_event_timeout(wq, condition, timeout); \
+ __ret; \
+})
+
+#define __swait_event_interruptible(wq, condition) \
+ ___swait_event(wq, condition, TASK_INTERRUPTIBLE, 0, \
+ schedule())
+
+#define swait_event_interruptible(wq, condition) \
+({ \
+ int __ret = 0; \
+ if (!(condition)) \
+ __ret = __swait_event_interruptible(wq, condition); \
+ __ret; \
+})
+
+#define __swait_event_interruptible_timeout(wq, condition, timeout) \
+ ___swait_event(wq, ___wait_cond_timeout(condition), \
+ TASK_INTERRUPTIBLE, timeout, \
+ __ret = schedule_timeout(__ret))
+
+#define swait_event_interruptible_timeout(wq, condition, timeout) \
+({ \
+ long __ret = timeout; \
+ if (!___wait_cond_timeout(condition)) \
+ __ret = __swait_event_interruptible_timeout(wq, \
+ condition, timeout); \
+ __ret; \
+})
+
+#endif /* _LINUX_SWAIT_H */
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 6768797..7d4cba2 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -13,7 +13,7 @@ endif
obj-y += core.o loadavg.o clock.o cputime.o
obj-y += idle_task.o fair.o rt.o deadline.o stop_task.o
-obj-y += wait.o completion.o idle.o
+obj-y += wait.o swait.o completion.o idle.o
obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o
obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
obj-$(CONFIG_SCHEDSTATS) += stats.o
diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
new file mode 100644
index 0000000..533710e
--- /dev/null
+++ b/kernel/sched/swait.c
@@ -0,0 +1,122 @@
+#include <linux/sched.h>
+#include <linux/swait.h>
+
+void __init_swait_queue_head(struct swait_queue_head *q, const char *name,
+ struct lock_class_key *key)
+{
+ raw_spin_lock_init(&q->lock);
+ lockdep_set_class_and_name(&q->lock, key, name);
+ INIT_LIST_HEAD(&q->task_list);
+}
+EXPORT_SYMBOL(__init_swait_queue_head);
+
+/*
+ * The thing about the wake_up_state() return value; I think we can ignore it.
+ *
+ * If for some reason it would return 0, that means the previously waiting
+ * task is already running, so it will observe condition true (or has already).
+ */
+void swake_up_locked(struct swait_queue_head *q)
+{
+ struct swait_queue *curr;
+
+ list_for_each_entry(curr, &q->task_list, task_list) {
+ wake_up_process(curr->task);
+ list_del_init(&curr->task_list);
+ break;
+ }
+}
+EXPORT_SYMBOL(swake_up_locked);
+
+void swake_up(struct swait_queue_head *q)
+{
+ unsigned long flags;
+
+ if (!swait_active(q))
+ return;
+
+ raw_spin_lock_irqsave(&q->lock, flags);
+ swake_up_locked(q);
+ raw_spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL(swake_up);
+
+/*
+ * Does not allow usage from IRQ disabled, since we must be able to
+ * release IRQs to guarantee bounded hold time.
+ */
+void swake_up_all(struct swait_queue_head *q)
+{
+ struct swait_queue *curr;
+ LIST_HEAD(tmp);
+
+ if (!swait_active(q))
+ return;
+
+ raw_spin_lock_irq(&q->lock);
+ list_splice_init(&q->task_list, &tmp);
+ while (!list_empty(&tmp)) {
+ curr = list_first_entry(&tmp, typeof(*curr), task_list);
+
+ wake_up_state(curr->task, TASK_NORMAL);
+ list_del_init(&curr->task_list);
+
+ if (list_empty(&tmp))
+ break;
+
+ raw_spin_unlock_irq(&q->lock);
+ raw_spin_lock_irq(&q->lock);
+ }
+ raw_spin_unlock_irq(&q->lock);
+}
+EXPORT_SYMBOL(swake_up_all);
+
+void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait)
+{
+ wait->task = current;
+ if (list_empty(&wait->task_list))
+ list_add(&wait->task_list, &q->task_list);
+}
+
+void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&q->lock, flags);
+ __prepare_to_swait(q, wait);
+ set_current_state(state);
+ raw_spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL(prepare_to_swait);
+
+long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state)
+{
+ if (signal_pending_state(state, current))
+ return -ERESTARTSYS;
+
+ prepare_to_swait(q, wait, state);
+
+ return 0;
+}
+EXPORT_SYMBOL(prepare_to_swait_event);
+
+void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait)
+{
+ __set_current_state(TASK_RUNNING);
+ if (!list_empty(&wait->task_list))
+ list_del_init(&wait->task_list);
+}
+
+void finish_swait(struct swait_queue_head *q, struct swait_queue *wait)
+{
+ unsigned long flags;
+
+ __set_current_state(TASK_RUNNING);
+
+ if (!list_empty_careful(&wait->task_list)) {
+ raw_spin_lock_irqsave(&q->lock, flags);
+ list_del_init(&wait->task_list);
+ raw_spin_unlock_irqrestore(&q->lock, flags);
+ }
+}
+EXPORT_SYMBOL(finish_swait);
--
2.4.3
From: Paul Gortmaker <[email protected]>
Completions have no long-lasting callbacks and therefore do not need
the complex waitqueue variant. Use simple waitqueues, which reduces
the contention on the waitqueue lock.
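Note that this conversion is internal to the completion implementation;
a typical caller (just a sketch of the unchanged API) still looks like:

    struct completion done;

    init_completion(&done);

    /* waiting side */
    wait_for_completion(&done);

    /* completing side, e.g. another thread or an IRQ handler */
    complete(&done);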
This was a carry forward from v3.10-rt, with some RT-specific chunks
dropped, and updated to align with the names that were chosen to match
the simple waitqueue support.
Originally-by: Thomas Gleixner <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
include/linux/completion.h | 8 ++++----
kernel/sched/completion.c | 32 ++++++++++++++++----------------
2 files changed, 20 insertions(+), 20 deletions(-)
diff --git a/include/linux/completion.h b/include/linux/completion.h
index 5d5aaae..8b3bca8 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -8,7 +8,7 @@
* See kernel/sched/completion.c for details.
*/
-#include <linux/wait.h>
+#include <linux/swait.h>
/*
* struct completion - structure used to maintain state for a "completion"
@@ -24,11 +24,11 @@
*/
struct completion {
unsigned int done;
- wait_queue_head_t wait;
+ struct swait_queue_head wait;
};
#define COMPLETION_INITIALIZER(work) \
- { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+ { 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
#define COMPLETION_INITIALIZER_ONSTACK(work) \
({ init_completion(&work); work; })
@@ -73,7 +73,7 @@ struct completion {
static inline void init_completion(struct completion *x)
{
x->done = 0;
- init_waitqueue_head(&x->wait);
+ init_swait_queue_head(&x->wait);
}
/**
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index 8d0f35d..b020159 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -30,10 +30,10 @@ void complete(struct completion *x)
{
unsigned long flags;
- spin_lock_irqsave(&x->wait.lock, flags);
+ raw_spin_lock_irqsave(&x->wait.lock, flags);
x->done++;
- __wake_up_locked(&x->wait, TASK_NORMAL, 1);
- spin_unlock_irqrestore(&x->wait.lock, flags);
+ swake_up_locked(&x->wait);
+ raw_spin_unlock_irqrestore(&x->wait.lock, flags);
}
EXPORT_SYMBOL(complete);
@@ -50,10 +50,10 @@ void complete_all(struct completion *x)
{
unsigned long flags;
- spin_lock_irqsave(&x->wait.lock, flags);
+ raw_spin_lock_irqsave(&x->wait.lock, flags);
x->done += UINT_MAX/2;
- __wake_up_locked(&x->wait, TASK_NORMAL, 0);
- spin_unlock_irqrestore(&x->wait.lock, flags);
+ swake_up_locked(&x->wait);
+ raw_spin_unlock_irqrestore(&x->wait.lock, flags);
}
EXPORT_SYMBOL(complete_all);
@@ -62,20 +62,20 @@ do_wait_for_common(struct completion *x,
long (*action)(long), long timeout, int state)
{
if (!x->done) {
- DECLARE_WAITQUEUE(wait, current);
+ DECLARE_SWAITQUEUE(wait);
- __add_wait_queue_tail_exclusive(&x->wait, &wait);
+ __prepare_to_swait(&x->wait, &wait);
do {
if (signal_pending_state(state, current)) {
timeout = -ERESTARTSYS;
break;
}
__set_current_state(state);
- spin_unlock_irq(&x->wait.lock);
+ raw_spin_unlock_irq(&x->wait.lock);
timeout = action(timeout);
- spin_lock_irq(&x->wait.lock);
+ raw_spin_lock_irq(&x->wait.lock);
} while (!x->done && timeout);
- __remove_wait_queue(&x->wait, &wait);
+ __finish_swait(&x->wait, &wait);
if (!x->done)
return timeout;
}
@@ -89,9 +89,9 @@ __wait_for_common(struct completion *x,
{
might_sleep();
- spin_lock_irq(&x->wait.lock);
+ raw_spin_lock_irq(&x->wait.lock);
timeout = do_wait_for_common(x, action, timeout, state);
- spin_unlock_irq(&x->wait.lock);
+ raw_spin_unlock_irq(&x->wait.lock);
return timeout;
}
@@ -277,12 +277,12 @@ bool try_wait_for_completion(struct completion *x)
if (!READ_ONCE(x->done))
return 0;
- spin_lock_irqsave(&x->wait.lock, flags);
+ raw_spin_lock_irqsave(&x->wait.lock, flags);
if (!x->done)
ret = 0;
else
x->done--;
- spin_unlock_irqrestore(&x->wait.lock, flags);
+ raw_spin_unlock_irqrestore(&x->wait.lock, flags);
return ret;
}
EXPORT_SYMBOL(try_wait_for_completion);
@@ -311,7 +311,7 @@ bool completion_done(struct completion *x)
* after it's acquired the lock.
*/
smp_rmb();
- spin_unlock_wait(&x->wait.lock);
+ raw_spin_unlock_wait(&x->wait.lock);
return true;
}
EXPORT_SYMBOL(completion_done);
--
2.4.3
From: Paul Gortmaker <[email protected]>
As of commit dae6e64d2bcfd4b06304ab864c7e3a4f6b5fedf4 ("rcu: Introduce
proper blocking to no-CBs kthreads GP waits") the RCU subsystem started
making use of wait queues.
Here we convert all of RCU's wait queue uses to simple wait queues,
since they don't need the extra overhead of the full wait queue features.
Originally this was done for RT kernels[1], since we would get things like...
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 1, pid: 8, name: rcu_preempt
Pid: 8, comm: rcu_preempt Not tainted
Call Trace:
[<ffffffff8106c8d0>] __might_sleep+0xd0/0xf0
[<ffffffff817d77b4>] rt_spin_lock+0x24/0x50
[<ffffffff8106fcf6>] __wake_up+0x36/0x70
[<ffffffff810c4542>] rcu_gp_kthread+0x4d2/0x680
[<ffffffff8105f910>] ? __init_waitqueue_head+0x50/0x50
[<ffffffff810c4070>] ? rcu_gp_fqs+0x80/0x80
[<ffffffff8105eabb>] kthread+0xdb/0xe0
[<ffffffff8106b912>] ? finish_task_switch+0x52/0x100
[<ffffffff817e0754>] kernel_thread_helper+0x4/0x10
[<ffffffff8105e9e0>] ? __init_kthread_worker+0x60/0x60
[<ffffffff817e0750>] ? gs_change+0xb/0xb
...and hence simple wait queues were deployed on RT out of necessity
(as simple wait uses a raw lock), but mainline might as well take
advantage of the more streamlined support too.
[1] This is a carry forward of work from v3.10-rt; the original conversion
was by Thomas on an earlier -rt version, and Sebastian extended it to the
additional RCU waiters added after 3.10; here I've added a commit log,
unified the RCU changes into one patch, and uprev'd it to match mainline RCU.
Cc: Thomas Gleixner <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/rcu/tree.c | 13 +++++++------
kernel/rcu/tree.h | 6 +++---
kernel/rcu/tree_plugin.h | 18 +++++++++---------
3 files changed, 19 insertions(+), 18 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 65137bc..d424378 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1590,7 +1590,7 @@ static void rcu_gp_kthread_wake(struct rcu_state *rsp)
!READ_ONCE(rsp->gp_flags) ||
!rsp->gp_kthread)
return;
- wake_up(&rsp->gp_wq);
+ swake_up(&rsp->gp_wq);
}
/*
@@ -2038,7 +2038,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
READ_ONCE(rsp->gpnum),
TPS("reqwait"));
rsp->gp_state = RCU_GP_WAIT_GPS;
- wait_event_interruptible(rsp->gp_wq,
+ swait_event_interruptible(rsp->gp_wq,
READ_ONCE(rsp->gp_flags) &
RCU_GP_FLAG_INIT);
/* Locking provides needed memory barrier. */
@@ -2067,7 +2067,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
READ_ONCE(rsp->gpnum),
TPS("fqswait"));
rsp->gp_state = RCU_GP_WAIT_FQS;
- ret = wait_event_interruptible_timeout(rsp->gp_wq,
+ ret = swait_event_interruptible_timeout(rsp->gp_wq,
((gf = READ_ONCE(rsp->gp_flags)) &
RCU_GP_FLAG_FQS) ||
(!READ_ONCE(rnp->qsmask) &&
@@ -2191,7 +2191,7 @@ static void rcu_report_qs_rsp(struct rcu_state *rsp, unsigned long flags)
WARN_ON_ONCE(!rcu_gp_in_progress(rsp));
WRITE_ONCE(rsp->gp_flags, READ_ONCE(rsp->gp_flags) | RCU_GP_FLAG_FQS);
raw_spin_unlock_irqrestore(&rcu_get_root(rsp)->lock, flags);
- rcu_gp_kthread_wake(rsp);
+ swake_up(&rsp->gp_wq); /* Memory barrier implied by swake_up() path. */
}
/*
@@ -2852,7 +2852,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
}
WRITE_ONCE(rsp->gp_flags, READ_ONCE(rsp->gp_flags) | RCU_GP_FLAG_FQS);
raw_spin_unlock_irqrestore(&rnp_old->lock, flags);
- rcu_gp_kthread_wake(rsp);
+ swake_up(&rsp->gp_wq); /* Memory barrier implied by swake_up() path. */
}
/*
@@ -4056,7 +4056,8 @@ static void __init rcu_init_one(struct rcu_state *rsp,
}
}
- init_waitqueue_head(&rsp->gp_wq);
+ rsp->rda = rda;
+ init_swait_queue_head(&rsp->gp_wq);
rnp = rsp->level[rcu_num_lvls - 1];
for_each_possible_cpu(i) {
while (i > rnp->grphi)
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4adb7ca..a5eecc4 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -230,7 +230,7 @@ struct rcu_node {
/* Refused to boost: not sure why, though. */
/* This can happen due to race conditions. */
#ifdef CONFIG_RCU_NOCB_CPU
- wait_queue_head_t nocb_gp_wq[2];
+ struct swait_queue_head nocb_gp_wq[2];
/* Place for rcu_nocb_kthread() to wait GP. */
#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
int need_future_gp[2];
@@ -369,7 +369,7 @@ struct rcu_data {
atomic_long_t nocb_q_count_lazy; /* invocation (all stages). */
struct rcu_head *nocb_follower_head; /* CBs ready to invoke. */
struct rcu_head **nocb_follower_tail;
- wait_queue_head_t nocb_wq; /* For nocb kthreads to sleep on. */
+ struct swait_queue_head nocb_wq; /* For nocb kthreads to sleep on. */
struct task_struct *nocb_kthread;
int nocb_defer_wakeup; /* Defer wakeup of nocb_kthread. */
@@ -458,7 +458,7 @@ struct rcu_state {
unsigned long gpnum; /* Current gp number. */
unsigned long completed; /* # of last completed gp. */
struct task_struct *gp_kthread; /* Task for grace periods. */
- wait_queue_head_t gp_wq; /* Where GP task waits. */
+ struct swait_queue_head gp_wq; /* Where GP task waits. */
short gp_flags; /* Commands for GP task. */
short gp_state; /* GP kthread sleep state. */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 013485f..9ea46a8 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1869,7 +1869,7 @@ early_param("rcu_nocb_poll", parse_rcu_nocb_poll);
*/
static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
{
- wake_up_all(&rnp->nocb_gp_wq[rnp->completed & 0x1]);
+ swake_up_all(&rnp->nocb_gp_wq[rnp->completed & 0x1]);
}
/*
@@ -1887,8 +1887,8 @@ static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq)
static void rcu_init_one_nocb(struct rcu_node *rnp)
{
- init_waitqueue_head(&rnp->nocb_gp_wq[0]);
- init_waitqueue_head(&rnp->nocb_gp_wq[1]);
+ init_swait_queue_head(&rnp->nocb_gp_wq[0]);
+ init_swait_queue_head(&rnp->nocb_gp_wq[1]);
}
#ifndef CONFIG_RCU_NOCB_CPU_ALL
@@ -1913,7 +1913,7 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
if (READ_ONCE(rdp_leader->nocb_leader_sleep) || force) {
/* Prior smp_mb__after_atomic() orders against prior enqueue. */
WRITE_ONCE(rdp_leader->nocb_leader_sleep, false);
- wake_up(&rdp_leader->nocb_wq);
+ swake_up(&rdp_leader->nocb_wq);
}
}
@@ -2126,7 +2126,7 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp)
*/
trace_rcu_future_gp(rnp, rdp, c, TPS("StartWait"));
for (;;) {
- wait_event_interruptible(
+ swait_event_interruptible(
rnp->nocb_gp_wq[c & 0x1],
(d = ULONG_CMP_GE(READ_ONCE(rnp->completed), c)));
if (likely(d))
@@ -2154,7 +2154,7 @@ wait_again:
/* Wait for callbacks to appear. */
if (!rcu_nocb_poll) {
trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, "Sleep");
- wait_event_interruptible(my_rdp->nocb_wq,
+ swait_event_interruptible(my_rdp->nocb_wq,
!READ_ONCE(my_rdp->nocb_leader_sleep));
/* Memory barrier handled by smp_mb() calls below and repoll. */
} else if (firsttime) {
@@ -2229,7 +2229,7 @@ wait_again:
* List was empty, wake up the follower.
* Memory barriers supplied by atomic_long_add().
*/
- wake_up(&rdp->nocb_wq);
+ swake_up(&rdp->nocb_wq);
}
}
@@ -2250,7 +2250,7 @@ static void nocb_follower_wait(struct rcu_data *rdp)
if (!rcu_nocb_poll) {
trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu,
"FollowerSleep");
- wait_event_interruptible(rdp->nocb_wq,
+ swait_event_interruptible(rdp->nocb_wq,
READ_ONCE(rdp->nocb_follower_head));
} else if (firsttime) {
/* Don't drown trace log with "Poll"! */
@@ -2409,7 +2409,7 @@ void __init rcu_init_nohz(void)
static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
{
rdp->nocb_tail = &rdp->nocb_head;
- init_waitqueue_head(&rdp->nocb_wq);
+ init_swait_queue_head(&rdp->nocb_wq);
rdp->nocb_follower_tail = &rdp->nocb_follower_head;
}
--
2.4.3
[[RFC v0 0/3] Simple wait queue support] On 05/08/2015 (Wed 15:30) Daniel Wagner wrote:
> Hi,
>
> It's been a while since the last attempt by Paul to get simple wait ready
> for mainline [1]. At the last realtime workshop it was discussed how
> the swait implementation could be made preempt aware. Peter posted an
> untested version of it here [2].
So, from memory, here are the issues or questions that need answers
before we can consider trying mainline IMO.
1) naming: do we keep the swait, do we try and morph complex wait users
into using cwait, or some mix of the two, or ... ?
2) placement: as I think I said before, the standalone files work for
   the -rt patches because that is the lowest-maintenance solution, but
   IMO for mainline, the simple and complex versions should be right
   beside each other so they can easily be contrasted and compared, and
   so any changes to one will naturally also flow to the other.
3) barrier usage: we'd had some questions and patches in the past that
futz'd around with the use of barriers, and as a mainline requirement
we'd need someone to check, understand and document them all properly.
4) poll_wait: currently it and poll_table_entry are both hard coupled
   to wait_queue_head_t -- so any users of poll_wait are not eligible
   for conversion to simple wait; see the sketch below this list. (I
   just happened to notice that recently.) A quick grep shows ~500
   poll_wait users.
5) the aforementioned "don't do an unbounded number of callbacks while
holding the raw lock" issue.
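As a rough sketch of the coupling mentioned in item 4 (paraphrased from
include/linux/poll.h of this era, so treat the exact layout as
illustrative rather than authoritative):

    struct poll_table_entry {
            struct file *filp;
            unsigned long key;
            wait_queue_t wait;                 /* complex wait queue entry */
            wait_queue_head_t *wait_address;   /* complex wait queue head  */
    };

    static inline void poll_wait(struct file *filp,
                                 wait_queue_head_t *wait_address, poll_table *p)
    {
            if (p && p->_qproc && wait_address)
                    p->_qproc(filp, wait_address, p);
    }

Both the entry and the poll_table callback take a wait_queue_head_t, so
every poll_wait() user inherits the complex waitqueue type.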
We should solve #5 for -rt regardless; I wouldn't attempt to make a
new "for mainline" set again w/o some consensus on #1 and #2, and I
think it would take someone like peterz/paulmck/rostedt to do #3
properly. I don't know if #4 is an issue we need to worry about
right away; probably not. And I'm sure I'll think of some other
issue five seconds after I hit send...
Paul.
--
>
> In order to test it, I used Paul's two patches which make completions
> and RCU use swait instead of wait. Some small renamings were
> necessary to get it working, e.g. s/swait_head/swait_queue_head/.
>
> My test system didn't crash or show any obvious defects, so I
> decided to run some benchmarks using mmtests. I have picked some
> random tests (kernbench aim9 vmr-stream ebizz), which didn't require a
> lot of tinkering to get them running. The results are here:
>
> baseline: v4.2-rc5-22-ged8bbba
>
> http://monom.org/mmtests-swait-peterz-v1/
>
> I don't think the numbers are trustworthy yet. Maybe one could read
> them as: it doesn't explode and the numbers aren't too far away from
> baseline. I need to figure out which tests are a good fit for these
> patches and what the 'right' parameters are for them.
>
> Sebastian had some comments on Peter's patch. I haven't addressed them
> yet [3].
>
> cheers,
> daniel
>
> [1] https://lwn.net/Articles/616857/
> [2] http://www.spinics.net/lists/linux-rt-users/msg12703.html
> [3] https://www.mail-archive.com/[email protected]/msg832142.html
>
> Paul Gortmaker (2):
> sched/completion: convert completions to use simple wait queues
> rcu: use simple wait queues where possible in rcutree
>
> Peter Zijlstra (1):
> KVM: use simple waitqueue for vcpu->wq
>
> include/linux/completion.h | 8 +--
> include/linux/swait.h | 172 +++++++++++++++++++++++++++++++++++++++++++++
> kernel/rcu/tree.c | 13 ++--
> kernel/rcu/tree.h | 6 +-
> kernel/rcu/tree_plugin.h | 18 ++---
> kernel/sched/Makefile | 2 +-
> kernel/sched/completion.c | 32 ++++-----
> kernel/sched/swait.c | 122 ++++++++++++++++++++++++++++++++
> 8 files changed, 334 insertions(+), 39 deletions(-)
> create mode 100644 include/linux/swait.h
> create mode 100644 kernel/sched/swait.c
>
> --
> 2.4.3
>
On Thu, 6 Aug 2015 15:22:45 -0400
Paul Gortmaker <[email protected]> wrote:
> [[RFC v0 0/3] Simple wait queue support] On 05/08/2015 (Wed 15:30) Daniel Wagner wrote:
>
> > Hi,
> >
> > It's been a while since the last attempt by Paul to get simple wait ready
> > for mainline [1]. At the last realtime workshop it was discussed how
> > the swait implementation could be made preempt aware. Peter posted an
> > untested version of it here [2].
>
> So, from memory, here are the issues or questions that need answers
> before we can consider trying mainline IMO.
>
> 1) naming: do we keep the swait, do we try and morph complex wait users
> into using cwait, or some mix of the two, or ... ?
I would say keep swait for now. Convert a few major hitters to it to
test it out. After a release or two, create a cwait, and start
converting the complex waits over to it. Then, after a period of
stability, convert the normal wait_queues to be implemented with swait,
and remove swait from the kernel.
>
> 2) placement: as I think I said before, the standalone files works for
> the -rt patches because it is the lowest maintenance solution, but
> IMO for mainline, the simple and complex versions should be right
> beside each other so they can be easily contrasted and compared and
> so any changes to one will naturally also flow to the other.
I'm agnostic on this part.
>
> 3) barrier usage: we'd had some questions and patches in the past that
> futz'd around with the use of barriers, and as a mainline requirement
> we'd need someone to check, understand and document them all properly.
Sounds like a plan.
>
> 4) poll_wait: currently it and poll_table_entry are both hard coupled
> to wait_queue_head_t -- so any users of poll_wait are not eligible
> for conversion to simple wait. (I just happened to notice that
> recently.) A quick grep shows ~500 poll_wait users.
Looks like a good candidate to test the cwait on :-)
>
> 5) the aforementioned "don't do an unbounded number of callbacks while
> holding the raw lock" issue.
>
> We should solve #5 for -rt regardless; I wouldn't attempt to make a
> new "for mainline" set again w/o some consensus on #1 and #2, and I
> think it would take someone like peterz/paulmck/rostedt to do #3
> properly. I don't know if #4 is an issue we need to worry about
> right away; probably not. And I'm sure I'll think of some other
> issue five seconds after I hit send...
>
5... 4... 3... 2... 1... (where's the other issue?)
-- Steve
On Thu, Aug 06, 2015 at 03:31:20PM -0400, Steven Rostedt wrote:
> > 4) poll_wait: currently it and poll_table_entry are both hard coupled
> > to wait_queue_head_t -- so any users of poll_wait are not eligible
> > for conversion to simple wait. (I just happened to notice that
> > recently.) A quick grep shows ~500 poll_wait users.
>
> Looks like a good candidate to test the cwait on :-)
Poll uses custom wake functions, which disqualifies them on the spot.
On Thu, 6 Aug 2015 23:36:49 +0200
Peter Zijlstra <[email protected]> wrote:
> On Thu, Aug 06, 2015 at 03:31:20PM -0400, Steven Rostedt wrote:
> > > 4) poll_wait: currently it and poll_table_entry are both hard coupled
> > > to wait_queue_head_t -- so any users of poll_wait are not eligible
> > > for conversion to simple wait. (I just happened to notice that
> > > recently.) A quick grep shows ~500 poll_wait users.
> >
> > Looks like a good candidate to test the cwait on :-)
>
> Poll uses custom wake functions, which disqualifies them on the spot.
For cwait? That's the complex wait queue (what the current wait queues
will be called in the future).
-- Steve
On 08/05/2015 03:30 PM, Daniel Wagner wrote:
> My test system didn't crash or show any obvious defects, so I
> decided to run some benchmarks using mmtests. I have picked some
As it turns out, this is not really true. I forgot to enable lockdep:
[ 0.053193] =================================
[ 0.053490] [ INFO: inconsistent lock state ]
[ 0.053790] 4.2.0-rc5-00022-g6d2f78c #63 Not tainted
[ 0.054000] ---------------------------------
[ 0.054000] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[ 0.054000] rcu_preempt/7 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 0.054000] (rcu_node_0){+.?...}, at: [<ffffffff81133643>] rcu_gp_kthread+0xfe3/0x2390
[ 0.054000] {IN-SOFTIRQ-W} state was registered at:
[ 0.054000] [<ffffffff81108a6c>] __lock_acquire+0xbfc/0x2050
[ 0.054000] [<ffffffff8110acbf>] lock_acquire+0xdf/0x2b0
[ 0.054000] [<ffffffff81913b19>] _raw_spin_lock_irqsave+0x59/0xa0
[ 0.054000] [<ffffffff811357e9>] rcu_process_callbacks+0x109/0xf00
[ 0.054000] [<ffffffff810b195d>] __do_softirq+0x14d/0x670
[ 0.054000] [<ffffffff810b20d1>] irq_exit+0x101/0x110
[ 0.054000] [<ffffffff81916d16>] smp_apic_timer_interrupt+0x46/0x60
[ 0.054000] [<ffffffff81914cf0>] apic_timer_interrupt+0x70/0x80
[ 0.054000] [<ffffffff814d7a97>] vsnprintf+0x367/0x530
[ 0.054000] [<ffffffff814d7d03>] snprintf+0x43/0x60
[ 0.054000] [<ffffffff810d90cf>] register_sched_domain_sysctl+0xdf/0x670
[ 0.054000] [<ffffffff8217c74d>] sched_init_smp+0x524/0x5d7
[ 0.054000] [<ffffffff8215a1e8>] kernel_init_freeable+0x171/0x28f
[ 0.054000] [<ffffffff818fe3be>] kernel_init+0xe/0xe0
[ 0.054000] [<ffffffff8191419f>] ret_from_fork+0x3f/0x70
[ 0.054000] irq event stamp: 46
[ 0.054000] hardirqs last enabled at (45): [<ffffffff819131a0>] _raw_spin_unlock_irq+0x30/0x60
[ 0.054000] hardirqs last disabled at (46): [<ffffffff81912ebf>] _raw_spin_lock_irq+0x1f/0x90
[ 0.054000] softirqs last enabled at (0): [<ffffffff810a8c98>] copy_process.part.26+0x628/0x1ce0
[ 0.054000] softirqs last disabled at (0): [< (null)>] (null)
[ 0.054000]
[ 0.054000] other info that might help us debug this:
[ 0.054000] Possible unsafe locking scenario:
[ 0.054000]
[ 0.054000] CPU0
[ 0.054000] ----
[ 0.054000] lock(rcu_node_0);
[ 0.054000] <Interrupt>
[ 0.054000] lock(rcu_node_0);
[ 0.054000]
[ 0.054000] *** DEADLOCK ***
[ 0.054000]
[ 0.054000] 1 lock held by rcu_preempt/7:
[ 0.054000] #0: (rcu_node_0){+.?...}, at: [<ffffffff81133643>] rcu_gp_kthread+0xfe3/0x2390
[ 0.054000]
[ 0.054000] stack backtrace:
[ 0.054000] CPU: 0 PID: 7 Comm: rcu_preempt Not tainted 4.2.0-rc5-00022-g6d2f78c #63
[ 0.054000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
[ 0.054000] 0000000000000000 00000000be272de7 ffff88007c0e7b98 ffffffff81908ed7
[ 0.054000] 0000000000000000 ffff88007c0e8000 ffff88007c0e7bf8 ffffffff81107195
[ 0.054000] 0000000000000000 ffff880000000001 ffff880000000001 ffffffff8102fa9f
[ 0.054000] Call Trace:
[ 0.054000] [<ffffffff81908ed7>] dump_stack+0x4f/0x7b
[ 0.054000] [<ffffffff81107195>] print_usage_bug+0x215/0x240
[ 0.054000] [<ffffffff8102fa9f>] ? save_stack_trace+0x2f/0x50
[ 0.054000] [<ffffffff81107767>] mark_lock+0x5a7/0x670
[ 0.054000] [<ffffffff811066d0>] ? check_usage_forwards+0x150/0x150
[ 0.054000] [<ffffffff811078ad>] mark_held_locks+0x7d/0xb0
[ 0.054000] [<ffffffff819131a0>] ? _raw_spin_unlock_irq+0x30/0x60
[ 0.054000] [<ffffffff81107a48>] trace_hardirqs_on_caller+0x168/0x220
[ 0.054000] [<ffffffff81107b0d>] trace_hardirqs_on+0xd/0x10
[ 0.054000] [<ffffffff819131a0>] _raw_spin_unlock_irq+0x30/0x60
[ 0.054000] [<ffffffff810fc0bb>] swake_up_all+0xbb/0xe0
[ 0.054000] [<ffffffff811336cd>] rcu_gp_kthread+0x106d/0x2390
[ 0.054000] [<ffffffff819131a0>] ? _raw_spin_unlock_irq+0x30/0x60
[ 0.054000] [<ffffffff8190c575>] ? __schedule+0x885/0x9e0
[ 0.054000] [<ffffffff810d2329>] ? kthread+0xd9/0x120
[ 0.054000] [<ffffffff81132660>] ? rcu_barrier+0x20/0x20
[ 0.054000] [<ffffffff810d2354>] kthread+0x104/0x120
[ 0.054000] [<ffffffff819131a0>] ? _raw_spin_unlock_irq+0x30/0x60
[ 0.054000] [<ffffffff810d2250>] ? kthread_create_on_node+0x260/0x260
[ 0.054000] [<ffffffff8191419f>] ret_from_fork+0x3f/0x70
[ 0.054000] [<ffffffff810d2250>] ? kthread_create_on_node+0x260/0x260
If I decoded this correctly, the call to rcu_future_gp_cleanup() is
supposed to run with IRQs disabled. swake_up_all(), though, will re-enable
IRQs:
rcu_gp_cleanup()
rcu_for_each_node_breadth_first(rsp, rnp) {
raw_spin_lock_irq(&rnp->lock);
nocb += rcu_future_gp_cleanup(rsp, rnp);
raw_spin_unlock_irq(&rnp->lock);
}
rcu_future_gp_cleanup()
rcu_nocb_gp_cleanup()
swake_up_all()
With IRQs enabled again, we end up in rcu_process_callbacks()
under SOFTIRQ. rcu_process_callbacks() acquires the RCU lock again.
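The mismatch, as far as I can tell from the swake_up_all() code above
(a sketch of the nesting only, not a proposed change):

    raw_spin_lock_irq(&rnp->lock);              /* IRQs off */
      rcu_future_gp_cleanup()
        rcu_nocb_gp_cleanup()
          swake_up_all()
            raw_spin_lock_irq(&q->lock);
            ...
            raw_spin_unlock_irq(&q->lock);      /* IRQs unconditionally on again */
    raw_spin_unlock_irq(&rnp->lock);

swake_up_all() uses the _irq (not _irqsave) lock variants on its own lock,
which matches the comment above it in Peter's patch ("Does not allow usage
from IRQ disabled").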
Not sure what to do here.
cheers,
daniel
On Thu, Aug 06, 2015 at 06:14:12PM -0400, Steven Rostedt wrote:
> On Thu, 6 Aug 2015 23:36:49 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> > On Thu, Aug 06, 2015 at 03:31:20PM -0400, Steven Rostedt wrote:
> > > > 4) poll_wait: currently it and poll_table_entry are both hard coupled
> > > > to wait_queue_head_t -- so any users of poll_wait are not eligible
> > > > for conversion to simple wait. (I just happened to notice that
> > > > recently.) A quick grep shows ~500 poll_wait users.
> > >
> > > Looks like a good candidate to test the cwait on :-)
> >
> > Poll uses custom wake functions, which disqualifies them on the spot.
>
> For cwait? That's the complex wait queue (what the current wait queues
> will be called in the future).
Urgh, I don't think we should go rename that... that's going to be lots
of pain for little to no gain.
On Fri, 7 Aug 2015 11:25:41 +0200
Peter Zijlstra <[email protected]> wrote:
> > For cwait? That's the complex wait queue (what the current wait queues
> > will be called in the future).
>
> Urgh, I don't think we should go rename that.. that's going to be lots
> of pain for no to little gain.
I thought we decided that the end result would be that the default is
the simple wait queue, and those that need more complexity would have a
complex wait queue, as IIRC the simple wait queue can replace the
majority of the existing wait queues.
-- Steve
On 08/07/2015 08:42 AM, Daniel Wagner wrote:
> On 08/05/2015 03:30 PM, Daniel Wagner wrote:
>> My test system didn't crash or show any obvious defects, so I
>> decided to run some benchmarks using mmtests. I have picked some
>
> As it turns out, this is not really true. I forgot to enable lockdep:
[...]
> If I decoded this correctly, the call to rcu_future_gp_cleanup() is
> supposed to run with IRQs disabled. swake_up_all(), though, will re-enable
> IRQs:
>
> rcu_gp_cleanup()
> rcu_for_each_node_breadth_first(rsp, rnp) {
> raw_spin_lock_irq(&rnp->lock);
>
> nocb += rcu_future_gp_cleanup(rsp, rnp);
> raw_spin_unlock_irq(&rnp->lock);
> }
>
> rcu_future_gp_cleanup()
> rcu_nocb_gp_cleanup()
> swake_up_all()
>
>
> With IRQs enabled again, we end up in rcu_process_callbacks()
> under SOFTIRQ. rcu_process_callbacks() acquires the RCU lock again.
>
> Not sure what to do here.
I don't really know if this is okay, but I think the call to
rcu_nocb_gp_cleanup() inside rcu_future_gp_cleanup() doesn't need to be
protected by rnp->lock. At least lockdep and rcutorture are still happy.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d424378..9411fc3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1569,7 +1569,6 @@ static int rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
int needmore;
struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
- rcu_nocb_gp_cleanup(rsp, rnp);
rnp->need_future_gp[c & 0x1] = 0;
needmore = rnp->need_future_gp[(c + 1) & 0x1];
trace_rcu_future_gp(rnp, rdp, c,
@@ -1992,6 +1991,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
/* smp_mb() provided by prior unlock-lock pair. */
nocb += rcu_future_gp_cleanup(rsp, rnp);
raw_spin_unlock_irq(&rnp->lock);
+ rcu_nocb_gp_cleanup(rsp, rnp);
cond_resched_rcu_qs();
WRITE_ONCE(rsp->gp_activity, jiffies);
rcu_gp_slow(rsp, gp_cleanup_delay);
Hi Clark,
On 08/05/2015 03:30 PM, Daniel Wagner wrote:
> It's been a while since the last attempt by Paul to get simple wait ready
> for mainline [1]. At the last realtime workshop it was discussed how
> the swait implementation could be made preempt aware. Peter posted an
> untested version of it here [2].
>
> In order to test it, I used Paul's two patches which make completions
> and RCU use swait instead of wait. Some small renamings were
> necessary to get it working, e.g. s/swait_head/swait_queue_head/.
I have set up a git tree with the patches, as you requested:
https://git.kernel.org/cgit/linux/kernel/git/wagi/linux.git/log/?h=swait
If you have your list -> hlist patch ready, I'll gladly test it.
cheers,
daniel