Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp178808ybb; Tue, 31 Mar 2020 20:25:49 -0700 (PDT) X-Google-Smtp-Source: APiQypLScZ8yI6pHhARPfjrftRspnFKCB7r5EALfuceof363EJuoBDGW45JzEAYugKD5rUz5nlA+ X-Received: by 2002:aca:4d49:: with SMTP id a70mr1494971oib.152.1585711549192; Tue, 31 Mar 2020 20:25:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1585711549; cv=none; d=google.com; s=arc-20160816; b=H01s4uOW8FDtSuO2rf9paHfCyASWEVYicIW2tKjaa5Vxhn6qK5fW0AQofHt/6APDtR H8RLLmXqDIEWBlQ/T/RnEHNJKxJyRVPfY5ZPqGz/fp685mly2J8TQ/u8rg0CBKfhq2qQ QYjgkYUBme8cSmmct7cNHzsjqPgAc8oeR3Bq1ajga3RaH0ZNQv1mK/NxA8geSkM3Z2bx R9hilItEu++gqhl7AaQkKw/C1OiO1Z6pP+H1ro7NBiT9dNvdVbuvAJhNSeVHZz7QS3qe X47vqE/Y4H57D0RYuEGWDxIgdzFNmZr8nox+RFW33qMA0YbCYi/dFByZc9oNSTkD2391 gu9w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=tKHXS6/Mnrp6srBbnR262gLigKf8aMPoVYaaS/hPS9s=; b=e4pbhOf1QxbjYHn/eVFM+edPaVJ4hH7du2+eKqlir5BKcTEFM2+UJdsYkFHVAt7210 MxihjalbJVTeJ3qD+va9jZzG7C4YO75ASRtNU9eSjHLDgEe96Lvisne3EIVAkmvo7y2F 7h0/Kvp5/0DjmtoM0SkWbQWSlxT6Nhen/DdaIMkmw0DqwQdXxtfoH1mRT1NXcod0/+yT Ww8cn6j4nHutMq80xG9Q2hqrjQ+8JViKiq9PE2sA0iSctZ9eRmg4Ls08xFnntatHZa/0 n0H3CWxa6kwvuJMYG5t5Bqvgx8sfjx3TY2QhQb6FzIg3eakPx0SZjZof50gCFKIzhuQX apaQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=eN6sShuN; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y139si216898ooa.14.2020.03.31.20.25.36; Tue, 31 Mar 2020 20:25:49 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=eN6sShuN; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731680AbgDADXE (ORCPT + 99 others); Tue, 31 Mar 2020 23:23:04 -0400 Received: from mail-io1-f65.google.com ([209.85.166.65]:35694 "EHLO mail-io1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731556AbgDADXE (ORCPT ); Tue, 31 Mar 2020 23:23:04 -0400 Received: by mail-io1-f65.google.com with SMTP id o3so18551941ioh.2 for ; Tue, 31 Mar 2020 20:23:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=tKHXS6/Mnrp6srBbnR262gLigKf8aMPoVYaaS/hPS9s=; b=eN6sShuN0c7Oei1hyyHoiDdZJ5O6nb+kKREnHlHZ6WM+V+tPSLzKBnBo8ahtmpvqH8 2vEsbSgkslrKFZsOoZoqORstlEu3nMfbJL5q6icPCKeetVzyzxA+OCYqsn5SpB+7/iTK 671amErct+hOBEzjM6LXsMuFWOx6SGXXtuwD7iAxtQazgkzySGyqoUmA3RPXvBSNQKrL 7T+fUALYIE2kcSpzJAM/Tu3r/4pBfr8Dpz73muo/mrNyxS1qysN++hWEigA77a8rwkp2 eVtJIReJoMqh+r1PXWJCDlJ0bDNg7XW7AKd3giWORc3GRU++I++rUwceVhRZYuxt0Wzx t/WA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=tKHXS6/Mnrp6srBbnR262gLigKf8aMPoVYaaS/hPS9s=; b=Wl1D6oHTLvjJk8N+1oCylX8gThrbnBK4AdmTsxb5+tDxqMki5Yi01Ofd+n7iUzo8yE FDrNaiG3mh2LJoYLbLqi5t2MMxQJNS2TgfSM6FTd96v4mjMNdBp4TfEhXmejUxEhTWa9 6trXWqmv12XCN1WI4O+G0q+0hLzuuQftaXp9dHNqNJcrwCTi0H4cZKk4+rAZxG9/mq3D rp1uOmuJdnfa2c4GOY5yjVg0J6vrTIOCVKdglJ1lFjsQY8Y6KFLprI0x+4za5+9wC/XY FqSPjDq6nyx8IDDEFHgkR480dQ3Yi/Y2pcvl4w0pFGvfkE3lkTmIJzTwGLimWgz3bQvN jfhA== X-Gm-Message-State: ANhLgQ1FR93pmGtSd4ehk/ZNZ89TKcP5wyaxgiGuVgVk/ilGr/TrrhmJ 3BYyMqids06V1t8wG8/tYgX4EQT0u3VGcGfxMpc= X-Received: by 2002:a05:6638:20d:: with SMTP id e13mr18318129jaq.56.1585711383185; Tue, 31 Mar 2020 20:23:03 -0700 (PDT) MIME-Version: 1.0 References: <20200327074308.GY11705@shao2-debian> <20200327175350.rw5gex6cwum3ohnu@linutronix.de> In-Reply-To: <20200327175350.rw5gex6cwum3ohnu@linutronix.de> From: Lai Jiangshan Date: Wed, 1 Apr 2020 11:22:51 +0800 Message-ID: Subject: Re: [PATCH] workqueue: Don't double assign worker->sleeping To: Sebastian Andrzej Siewior Cc: kernel test robot , Thomas Gleixner , Ingo Molnar , "Peter Zijlstra (Intel)" , LKML , LKP , Tejun Heo Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello If I don't miss all the issues you have listed, it is a good and straightforward fix, but I have concern that cmpxchg_local() might have performance impact on some non-x86 arch. The two issues as you have listed: 1) WARN_ON_ONCE() on valid condition when interrupted(async-page-faulted) 2) wq_worker_running() can be interrupted(async-page-faulted in virtual machine) and nr_running would be decreased twice. For fixing issue one, we can just remove WARN_ON_ONCE() as this patch. For fixing issue two, ->sleeping in wq_worker_running() can be checked&modified under irq-disabled. (we can't use preempt-disabled context here) thanks, Lai On Sat, Mar 28, 2020 at 1:53 AM Sebastian Andrzej Siewior wrote: > > The kernel test robot triggered a warning with the following race: > task-ctx interrupt-ctx > worker > -> process_one_work() > -> work_item() > -> schedule(); > -> sched_submit_work() > -> wq_worker_sleeping() > -> ->sleeping = 1 > atomic_dec_and_test(nr_running) > __schedule(); *interrupt* > async_page_fault() > -> local_irq_enable(); > -> schedule(); > -> sched_submit_work() > -> wq_worker_sleeping() > -> if (WARN_ON(->sleeping)) return > -> __schedule() > -> sched_update_worker() > -> wq_worker_running() > -> atomic_inc(nr_running); > -> ->sleeping = 0; > > -> sched_update_worker() > -> wq_worker_running() > if (!->sleeping) return > > In this context the warning is pointless everything is fine. > > However, if the interrupt occurs in wq_worker_sleeping() between reading and > setting `sleeping' i.e. > > | if (WARN_ON_ONCE(worker->sleeping)) > | return; > *interrupt* > | worker->sleeping = 1; > > then pool->nr_running will be decremented twice in wq_worker_sleeping() > but it will be incremented only once in wq_worker_running(). > > Replace the assignment of `sleeping' with a cmpxchg_local() to ensure > that there is no double assignment of the variable. The variable is only > accessed from the local CPU. Remove the WARN statement because this > condition can be valid. > > An alternative would be to move `->sleeping' to `->flags' as a new bit > but this would require to acquire the pool->lock in wq_worker_running(). > > Fixes: 6d25be5782e48 ("sched/core, workqueues: Distangle worker accounting from rq lock") > Link: https://lkml.kernel.org/r/20200327074308.GY11705@shao2-debian > Reported-by: kernel test robot > Signed-off-by: Sebastian Andrzej Siewior > --- > kernel/workqueue.c | 6 ++---- > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > index 4e01c448b4b48..dc477a2a3ce30 100644 > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -846,11 +846,10 @@ void wq_worker_running(struct task_struct *task) > { > struct worker *worker = kthread_data(task); > > - if (!worker->sleeping) > + if (cmpxchg_local(&worker->sleeping, 1, 0) == 0) > return; > if (!(worker->flags & WORKER_NOT_RUNNING)) > atomic_inc(&worker->pool->nr_running); > - worker->sleeping = 0; > } > > /** > @@ -875,10 +874,9 @@ void wq_worker_sleeping(struct task_struct *task) > > pool = worker->pool; > > - if (WARN_ON_ONCE(worker->sleeping)) > + if (cmpxchg_local(&worker->sleeping, 0, 1) == 1) > return; > > - worker->sleeping = 1; > spin_lock_irq(&pool->lock); > > /* > -- > 2.26.0 >