Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933565AbbBIQPd (ORCPT ); Mon, 9 Feb 2015 11:15:33 -0500 Received: from mail-qc0-f178.google.com ([209.85.216.178]:37601 "EHLO mail-qc0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932496AbbBIQPa (ORCPT ); Mon, 9 Feb 2015 11:15:30 -0500 Date: Mon, 9 Feb 2015 11:15:27 -0500 From: Tejun Heo To: Rabin Vincent Cc: Jesper Nilsson , linux-kernel@vger.kernel.org Subject: [PATCH for-3.20-fixes] workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE Message-ID: <20150209161527.GH3220@htj.duckdns.org> References: <20150206171156.GA8942@axis.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150206171156.GA8942@axis.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5050 Lines: 141 Hello, Can you please verify that the following patch fixes the issue? Thanks. -------- 8< -------- cancel[_delayed]_work_sync() are implemented using __cancel_work_timer() which grabs the PENDING bit using try_to_grab_pending() and then flushes the work item with PENDING set to prevent the on-going execution of the work item from requeueing itself. try_to_grab_pending() can always grab PENDING bit without blocking except when someone else is doing the above flushing during cancelation. In that case, try_to_grab_pending() returns -ENOENT. In this case, __cancel_work_timer() currently invokes flush_work(). The assumption is that the completion of the work item is what the other canceling task would be waiting for too and thus waiting for the same condition and retrying should allow forward progress without excessive busy looping Unfortunately, this doesn't work if preemption is disabled or the latter task has real time priority. Let's say task A just got woken up from flush_work() by the completion of the target work item. If, before task A starts executing, task B gets scheduled and invokes __cancel_work_timer() on the same work item, its try_to_grab_pending() will return -ENOENT as the work item is still being canceled by task A and flush_work() will also immediately return false as the work item is no longer executing. This puts task B in a busy loop possibly preventing task A from executing and clearing the canceling state on the work item leading to a hang. task A task B worker executing work __cancel_work_timer() try_to_grab_pending() set work CANCELING flush_work() block for work completion completion, wakes up A __cancel_work_timer() while (forever) { try_to_grab_pending() -ENOENT as work is being canceled flush_work() false as work is no longer executing } This patch removes the possible hang by updating __cancel_work_timer() to explicitly wait for clearing of CANCELING rather than invoking flush_work() after try_to_grab_pending() fails with -ENOENT. The explicit wait uses the matching bit waitqueue for the CANCELING bit. Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com Signed-off-by: Tejun Heo Reported-by: Rabin Vincent Cc: stable@vger.kernel.org --- include/linux/workqueue.h | 3 ++- kernel/workqueue.c | 36 ++++++++++++++++++++++++++++++++---- 2 files changed, 34 insertions(+), 5 deletions(-) --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -70,7 +70,8 @@ enum { /* data contains off-queue information when !WORK_STRUCT_PWQ */ WORK_OFFQ_FLAG_BASE = WORK_STRUCT_COLOR_SHIFT, - WORK_OFFQ_CANCELING = (1 << WORK_OFFQ_FLAG_BASE), + __WORK_OFFQ_CANCELING = WORK_OFFQ_FLAG_BASE, + WORK_OFFQ_CANCELING = (1 << __WORK_OFFQ_CANCELING), /* * When a work item is off queue, its high bits point to the last --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -2730,17 +2730,35 @@ EXPORT_SYMBOL_GPL(flush_work); static bool __cancel_work_timer(struct work_struct *work, bool is_dwork) { + wait_queue_head_t *waitq = bit_waitqueue(&work->data, + __WORK_OFFQ_CANCELING); unsigned long flags; int ret; do { ret = try_to_grab_pending(work, is_dwork, &flags); /* - * If someone else is canceling, wait for the same event it - * would be waiting for before retrying. + * If someone else is already canceling, wait for it to + * finish. flush_work() doesn't work for PREEMPT_NONE + * because we may get scheduled between @work's completion + * and the other canceling task resuming and clearing + * CANCELING - flush_work() will return false immediately + * as @work is no longer busy, try_to_grab_pending() will + * return -ENOENT as @work is still being canceled and the + * other canceling task won't be able to clear CANCELING as + * we're hogging the CPU. + * + * Explicitly wait for completion using a bit waitqueue. + * We can't use wait_on_bit() as the CANCELING bit may get + * recycled to point to pwq if @work gets re-queued. */ - if (unlikely(ret == -ENOENT)) - flush_work(work); + if (unlikely(ret == -ENOENT)) { + DEFINE_WAIT(wait); + prepare_to_wait(waitq, &wait, TASK_UNINTERRUPTIBLE); + if (work_is_canceling(work)) + schedule(); + finish_wait(waitq, &wait); + } } while (unlikely(ret < 0)); /* tell other tasks trying to grab @work to back off */ @@ -2749,6 +2767,16 @@ static bool __cancel_work_timer(struct w flush_work(work); clear_work_data(work); + + /* + * Paired with prepare_to_wait() above so that either + * waitqueue_active() is visible here or !work_is_canceling() is + * visible there. + */ + smp_mb(); + if (waitqueue_active(waitq)) + wake_up(waitq); + return ret; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/