Date: Sun, 26 Jun 2022 15:09:11 +0900
From: Tejun Heo
To: Linus Torvalds
Cc: Petr Mladek, Lai Jiangshan, Michal Hocko, Linux Kernel Mailing List,
    Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Andrew Morton,
    Oleg Nesterov, "Eric W. Biederman"
Subject: Re: re. Spurious wakeup on a newly created kthread
References: <20220622140853.31383-1-pmladek@suse.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

On Sat, Jun 25, 2022 at 07:53:34PM -0700, Linus Torvalds wrote:
...
> Is that the _common_ pattern? It's not *uncommon*. It's maybe not the
> strictly normal wait-queue one, but if you grep for
> "wake_up_process()" you will find quite a lot of them.
...
> > I'm probably missing something. Is it about bespoke wait mechanisms?
> > Can you give a concrete example of an async wakeup scenario?
>
> Grep for wake_up_process() and just look for them.

Right, for kthreads, a custom work list plus a directed wakeup at the known
handler task is a pattern seen across the code base, and that does affect
all other waits the kthread would do.

...
> So you need to always do that in a loop. The wait_event code will do
> that loop for you, but if you do manual wait-queues you are required
> to do the looping yourself.

The reason why this bothered me sometimes is that with simple kthread use
cases, there are places where all the legitimate wakeup sources are clearly
known. In those cases, the fact that I can't think of a case where the
looping would be needed creates a subtle nagging feeling when writing them.

...
> Again, none of these are *really* spurious. They are real wakeup
> events.
> It's just that within the *local* code they look spurious,
> because the locking code, the disk IO code, whatever the code is
> doesn't know or care about all the other things that process is
> involved in.

I see. Yeah, waiting on multiple sources which may not be known to each wait
logic makes sense.

...
> But I don't think that's what's going on here. I think the workqueue
> code is just confused, and should have initialized "worker->pool" much
> earlier. Because as things are now, when worker_thread() starts
> running, and does that
>
>   static int worker_thread(void *__worker)
>   {
>         struct worker *worker = __worker;
>         struct worker_pool *pool = worker->pool;
>
> thing, that can happen *immediately* after that
>
>     kthread_create_on_node(worker_thread, worker,
>
> happens. It just so happens that *normally* the create_worker() code
> ends up finishing setup before the new worker has actually finished
> scheduling..
>
> No?

I'm not sure. Putting aside the always-loop-with-cond-check principle for
the time being, I can't yet think of a scenario where the interlocking would
break down unless there really is a wakeup which is coming from an unrelated
source.

Just experimented with commenting out that wakeup in create_worker(). Simply
commenting it out doesn't break anything, but if I wait for PF_WQ_WORKER to
be set by the new worker thread, it doesn't happen. i.e. the initial wakeup
is spurious because there is a later wakeup which comes when a work item is
actually dispatched to the new worker. But the newly created kworker won't
start executing its callback function without at least one extra wakeup,
wherever that may be coming from.

After the initial TASK_NEW handshake, the new kthread notifies
kthread_create_info->done and then goes into an UNINTERRUPTIBLE sleep and
won't start executing the callback function unless somebody wakes it up.
This being a brand new task, it's a pretty sterile wakeup environment.
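
To make the two positions concrete, here is a minimal single-threaded
userspace sketch, not kernel code. Every model_* name is hypothetical and
made up for illustration; "pool" only stands in for worker->pool and "ready"
is an assumed explicit setup-finished flag that the real code may not have.
It models a task that sleeps until woken, a creator that finishes setup
later, and an optional wakeup from an unrelated source in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum model_state { MODEL_SLEEPING, MODEL_RUNNING };

struct model_worker {
    enum model_state state; /* stands in for the task state */
    const char *pool;       /* stands in for worker->pool, NULL until setup */
    bool ready;             /* assumed explicit "setup finished" condition */
    bool saw_null_pool;     /* set if the callback ran before setup */
};

/* Any wakeup, expected or not, makes a sleeping task runnable. */
static void model_wake_up(struct model_worker *w)
{
    w->state = MODEL_RUNNING;
}

/* Fragile variant: treats "I am running" as proof that setup finished. */
static void model_step_fragile(struct model_worker *w)
{
    if (w->state != MODEL_RUNNING)
        return;                 /* still asleep, nothing happens */
    if (w->pool == NULL)
        w->saw_null_pool = true;
}

/* Checked variant: re-test the condition after every wakeup. */
static void model_step_checked(struct model_worker *w)
{
    if (w->state != MODEL_RUNNING)
        return;
    if (!w->ready) {            /* woken for some other reason */
        w->state = MODEL_SLEEPING;
        return;
    }
    assert(w->pool != NULL);    /* ready implies setup is complete */
}

/* The creator finishes setup, then sends the expected wakeup. */
static void model_finish_setup(struct model_worker *w)
{
    w->pool = "pool0";
    w->ready = true;
    model_wake_up(w);
}

/* Returns true if the callback ever observed a NULL pool. */
static bool model_run(bool checked, bool stray_wakeup)
{
    struct model_worker w = { MODEL_SLEEPING, NULL, false, false };
    void (*step)(struct model_worker *) =
        checked ? model_step_checked : model_step_fragile;

    if (stray_wakeup)
        model_wake_up(&w);      /* wakeup from an unrelated source */
    step(&w);                   /* the task runs only if it is runnable */
    model_finish_setup(&w);
    step(&w);
    return w.saw_null_pool;
}
```

In this model the fragile variant is fine as long as no stray wakeup
arrives, which matches the experiment above, and it only sees a NULL pool
when model_run() is called with stray_wakeup set. The checked variant goes
back to sleep in that case. Whether the real create_worker() path should
get such a check is exactly the open question here.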
So, there actually has to be an outside wakeup source here.

If we say that anyone should expect external wakeups that it has no direct
control over, the kthread_bind interface is broken too, because that's
making exactly the same assumption: that the task is dormant at that point
till the owner sends a wakeup, and thus its internal state can be
manipulated safely.

Petr, regardless of how this eventually gets resolved, I'm really curious
where the wakeup that you saw came from. Would it be possible for you to
find out?

Thanks.

-- 
tejun