From: Davidlohr Bueso
To: Peter Zijlstra, Thomas Gleixner, Ingo Molnar
Cc: Sebastian Andrzej Siewior, Linus Torvalds, Chris Mason, Steven Rostedt,
    fredrik.markstrom@windriver.com, linux-kernel@vger.kernel.org,
    Davidlohr Bueso
Subject: [PATCH 1/2] sched: lockless wake-queues
Date: Sun, 19 Apr 2015 12:17:39 -0700
Message-Id: <1429471060-21271-2-git-send-email-dave@stgolabs.net>
X-Mailer: git-send-email 2.1.4
In-Reply-To: <1429471060-21271-1-git-send-email-dave@stgolabs.net>
References: <1429471060-21271-1-git-send-email-dave@stgolabs.net>

From: Peter Zijlstra

This is useful for locking primitives that can effect multiple wakeups
per operation and want to avoid lock-internal lock contention by
delaying the wakeups until we've released the lock-internal locks.

Alternatively it can be used to avoid issuing multiple wakeups, and
thus save a few cycles, in packet processing. Queue all target tasks
and wake them up once you've processed all packets. That way you avoid
waking the target task multiple times if there were multiple packets
for the same task.

Properties of a wake_q are:
- Lockless, as queue head must reside on the stack.
- Being a queue, maintains wakeup order passed by the callers. This
  matters because, under highly contended locks, reordering wakeups
  could break any reliance on lock fairness. It also respects user
  order of wakeups.
- A queued task cannot be added again until it is woken up.

This patch adds the needed infrastructure into the scheduler code
and uses the new wake_q to delay the futex wakeups until after we've
released the corresponding user locks.

Signed-off-by: Peter Zijlstra
[tweaks, adjustments, comments, etc.]
Signed-off-by: Davidlohr Bueso
---
 include/linux/sched.h | 30 ++++++++++++++++++++++++++++++
 kernel/sched/core.c   | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8222ae4..3b20fe5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -947,6 +947,34 @@ static inline int cpu_numa_flags(void)
 }
 #endif
 
+/*
+ * Wake-queues are lists of tasks with a pending wakeup, whose
+ * callers have already marked the task as woken internally,
+ * and can thus carry on. A common use case is being able to
+ * do the wakeups once the corresponding lock has been released.
+ *
+ * We hold a reference to each task in the list across the wakeup,
+ * thus guaranteeing that the memory is still valid by the time
+ * the actual wakeups are performed in wake_up_q().
+ */
+struct wake_q_node {
+	struct wake_q_node *next;
+};
+
+struct wake_q_head {
+	struct wake_q_node *first;
+	struct wake_q_node *last;
+};
+
+#define WAKE_Q_TAIL ((struct wake_q_node *) 0x01)
+
+#define WAKE_Q(name)					\
+	struct wake_q_head name = { WAKE_Q_TAIL, WAKE_Q_TAIL }
+
+extern void wake_q_add(struct wake_q_head *head,
+		       struct task_struct *task);
+extern void wake_up_q(struct wake_q_head *head);
+
 struct sched_domain_attr {
 	int relax_domain_level;
 };
@@ -1519,6 +1547,8 @@ struct task_struct {
 	/* Protection of the PI data structures: */
 	raw_spinlock_t pi_lock;
 
+	struct wake_q_node wake_q;
+
 #ifdef CONFIG_RT_MUTEXES
 	/* PI waiters blocked on a rt_mutex held by this task */
 	struct rb_root pi_waiters;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f9123a8..ebe6890 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -541,6 +541,55 @@ static bool set_nr_if_polling(struct task_struct *p)
 #endif
 #endif
 
+void wake_q_add(struct wake_q_head *head, struct task_struct *task)
+{
+	struct wake_q_node *node = &task->wake_q;
+
+	/*
+	 * Atomically grab the task, if ->wake_q is !nil already it means
+	 * it's already queued (either by us or someone else) and will get the
+	 * wakeup due to that.
+	 *
+	 * This cmpxchg() implies a full barrier, which pairs with the write
+	 * barrier implied by the wakeup in wake_up_q().
+	 */
+	if (cmpxchg(&node->next, NULL, WAKE_Q_TAIL))
+		return;
+
+	get_task_struct(task);
+
+	/*
+	 * The head is context local, there can be no concurrency.
+	 */
+	if (head->first == WAKE_Q_TAIL)
+		head->first = node;
+	else
+		head->last->next = node;
+
+	head->last = node;
+}
+
+void wake_up_q(struct wake_q_head *head)
+{
+	struct wake_q_node *node = head->first;
+
+	while (node != WAKE_Q_TAIL) {
+		struct task_struct *task;
+
+		task = container_of(node, struct task_struct, wake_q);
+		BUG_ON(!task);
+		node = node->next;
+		task->wake_q.next = NULL; /* task can safely be re-inserted now */
+
+		/*
+		 * wake_up_process() implies a wmb() to pair with the queueing
+		 * in wake_q_add() so as not to miss wakeups.
+		 */
+		wake_up_process(task);
+		put_task_struct(task);
+	}
+}
+
 /*
  * resched_curr - mark rq's current task 'to be rescheduled now'.
  *
-- 
2.1.4
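
For anyone who wants to poke at the queueing semantics outside the
kernel, below is a minimal userspace sketch. It is not part of the
patch and is not how the kernel builds this code. It assumes C11
atomics as a stand-in for cmpxchg(), and struct mock_task (with its
refs and wakeups counters), node_to_task(), wake_order[], nr_woken and
wake_q_demo() are hypothetical names invented for illustration; the
counters only model get_task_struct()/put_task_struct() and
wake_up_process(). It tries to show the three properties from the
changelog: duplicates are ignored, order is preserved, and a task can
be queued again once it has been woken.

```c
/* Userspace model of the wake_q idea. NOT kernel code. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct wake_q_node {
	_Atomic(struct wake_q_node *) next;
};

struct wake_q_head {
	struct wake_q_node *first;
	struct wake_q_node *last;
};

#define WAKE_Q_TAIL ((struct wake_q_node *)0x01)
#define WAKE_Q(name) struct wake_q_head name = { WAKE_Q_TAIL, WAKE_Q_TAIL }

/* Hypothetical stand-in for task_struct. */
struct mock_task {
	int id;
	int wakeups;	/* times "woken" */
	int refs;	/* models the task refcount */
	struct wake_q_node wake_q;
};

/* Hypothetical container_of() replacement for this one member. */
#define node_to_task(n) \
	((struct mock_task *)((char *)(n) - offsetof(struct mock_task, wake_q)))

static int wake_order[8];	/* ids in the order they were woken */
static int nr_woken;

static void wake_q_add(struct wake_q_head *head, struct mock_task *task)
{
	struct wake_q_node *node = &task->wake_q;
	struct wake_q_node *expected = NULL;

	/* Claim the node; failure means the task is already queued. */
	if (!atomic_compare_exchange_strong(&node->next, &expected, WAKE_Q_TAIL))
		return;

	task->refs++;	/* models get_task_struct() */

	/* The head lives on the caller's stack, so no concurrency here. */
	if (head->first == WAKE_Q_TAIL)
		head->first = node;
	else
		atomic_store(&head->last->next, node);
	head->last = node;
}

static void wake_up_q(struct wake_q_head *head)
{
	struct wake_q_node *node = head->first;

	while (node != WAKE_Q_TAIL) {
		struct mock_task *task = node_to_task(node);

		node = atomic_load(&node->next);
		atomic_store(&task->wake_q.next, NULL);	/* can be re-queued */

		task->wakeups++;	/* models wake_up_process() */
		wake_order[nr_woken++] = task->id;
		task->refs--;		/* models put_task_struct() */
	}
}

/* Queue while an (imaginary) lock is held, wake after dropping it. */
static int wake_q_demo(void)
{
	struct mock_task a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };
	WAKE_Q(q);

	nr_woken = 0;
	wake_q_add(&q, &b);
	wake_q_add(&q, &a);
	wake_q_add(&q, &b);	/* duplicate, silently ignored */
	wake_q_add(&q, &c);
	wake_up_q(&q);		/* "lock" dropped: wakes b, a, c in order */

	WAKE_Q(q2);		/* b was woken, so it may be queued again */
	wake_q_add(&q2, &b);
	wake_up_q(&q2);

	assert(a.refs == 0 && b.refs == 0 && c.refs == 0);
	return b.wakeups;
}
```

Calling wake_q_demo() from a main() is enough to exercise it. Note that,
as in the patch, wake_up_q() does not reset the head, so the sketch uses
a fresh WAKE_Q() for the second round rather than reusing the first.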