Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp2007271rwb; Thu, 8 Dec 2022 18:51:12 -0800 (PST) X-Google-Smtp-Source: AA0mqf57ltrjWzhS8oyoeIWc9lQOSJWdrT8zjTm/63JjhJaJgCRRwUwjWWIgUYvNfdGmIVXj94rb X-Received: by 2002:a62:5241:0:b0:576:450d:6e68 with SMTP id g62-20020a625241000000b00576450d6e68mr3641392pfb.27.1670554271789; Thu, 08 Dec 2022 18:51:11 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670554271; cv=none; d=google.com; s=arc-20160816; b=sQuFGySEw9Hy9ZdMpCFJXww2awZ5AdtxpXjULjCMHpG3F9XkmdLcyOuKyMgn7qAAdA umVXXG8D4nLmxSdSfMuewCaksHKjk1GYQEqDiuBvpEQuhst6+nUQBmCp0kYOH4lQMd7j sw8mtR1c4QlMzdganxjOQrLcBotziVVeq5Cq7o346pm1PXsTcrM+pcOD1EUIqR6Ts8V0 dGR/evY7Z+nfSS52FdoSFmnagmPMs57VNbPCVcy54Ma2I8XAIdJZZfqp5IfiwHOjpiXt PIL49EAicKZ71BmnHBR7Fmk0OiW/gG4nGIZ8GugGF5CD6EonYYaC3TvZ2fFAWAXyzJKR uU+w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=1K/3GUYMBBM0JWTkpD5AOt6IFxw20f5QEF2hH5GOc2U=; b=GErRSCHKjMcxaMT2lDB/qMrFSKy8tfMEOMtftsgmhex9QYE3FO+u9S7V+bQyzT0+fR vNG/Qte/wI/8Dd9buDHXVExGEvUotVvbfdk1McD+j1PSPbFVMgELcVctk3lJXrVlSz/S IsAjIOEXTw2iugc/+1Pobvy11P/28vD84QaOpJJiUO5NGGpeC2roE1nbWAIG3XVX6463 fzDMcANGUKsTb6Pp4xQFHvhWmcXroBqLNbaw0qhd4h6Zf8YEzMsUlcuUQ4WSwXus/3jf hfuHxz72dSAwYChQ7OuNbrEiaACYh3/87Dx+02QTsQ/Bqw4Fcns+lJ9Na4YLz8p5dEFV y5xw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=givek1J3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id x64-20020a628643000000b00577617c05e2si512376pfd.376.2022.12.08.18.51.02; Thu, 08 Dec 2022 18:51:11 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=givek1J3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229797AbiLICZb (ORCPT + 73 others); Thu, 8 Dec 2022 21:25:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51140 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229517AbiLICZ2 (ORCPT ); Thu, 8 Dec 2022 21:25:28 -0500 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 48C21A4307 for ; Thu, 8 Dec 2022 18:25:27 -0800 (PST) Received: by mail-wr1-x42a.google.com with SMTP id q7so3822275wrr.8 for ; Thu, 08 Dec 2022 18:25:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=1K/3GUYMBBM0JWTkpD5AOt6IFxw20f5QEF2hH5GOc2U=; b=givek1J38a1InAvDzqJ2IJpDTNNkY/0qnKpwXJ0YWKO1lRTOvqkv2jZmvFu4JDsTl7 qpHxNx0IwT4Q8LLjxOpVyjcYbR7kkDpuZ4l1wxDyP94vsMGGsZO61IvUMyLbiyABPB71 gCzPklAqagS+oYxqhUivsCoCNLKDcDIJfypznV/ZTyVBGcLeQRBfcOz0y/aV/I+bp6Qw fzK/VhjtGOoBlnj+hlHktP/j3mjhLznjPr1GcNyJ33hYSN3twXzl/JVR0d60H18ojDvz Z1feJkTUGvzXEbojAqWFur4SO2msfJx1Lj0/X+xF/fCPQMTS0wmu9tNqBzmNHXVxKhL1 XM3g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=1K/3GUYMBBM0JWTkpD5AOt6IFxw20f5QEF2hH5GOc2U=; b=SpXwkHITaxQzYUTDUV/0sigPbRsc7dUQ0hud5/Any4tmghxkzcpmVUrazXhXsAmDL7 kNVlb1MkET+ml9DnlS6pbq8CMgFyprCA3QflGVlK9zm3dMo+nUDyPFAwRcNfrE6AJFOZ MO6hpDQq8dRUEY+5la2N5FYbpE7tRQ6ecdbONVX+KcTit+6YllKzdQzL53uZld7pERnY zog8XepWH/BsX4m2PMfBjI8s0f8EaH1a6QV0ve9HuyGUfkQRNhPeVaJ774i6pSpRZsxO aAWULR1dw7IGzZNWNy1lW3edpsbVeZtFlXoC305ScTtVv+4zsWVAZmdYPVtPDAeoRY+p q55Q== X-Gm-Message-State: ANoB5plYi8nSEKz2Cu7Xaz9vgDb0QdzszoDnIr4+KJXV9N5baGtsNkZv AhCKtismqIcCLSE0Rwuf3Gp2vXmH63uiaSbfshs+sBmu X-Received: by 2002:a5d:58e6:0:b0:242:5562:6d6 with SMTP id f6-20020a5d58e6000000b00242556206d6mr12030563wrd.541.1670552725763; Thu, 08 Dec 2022 18:25:25 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: richard clark Date: Fri, 9 Dec 2022 10:25:14 +0800 Message-ID: Subject: Re: work item still be scheduled to execute after destroy_workqueue? To: Lai Jiangshan Cc: tj@kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 8, 2022 at 3:46 PM Lai Jiangshan wrote: > > On Thu, Dec 8, 2022 at 10:44 AM richard clark > wrote: > > > > On Wed, Dec 7, 2022 at 10:38 AM Lai Jiangshan wrote: > > > > > > On Tue, Dec 6, 2022 at 5:20 PM richard clark > > > wrote: > > > > > > > > On Tue, Dec 6, 2022 at 2:23 PM Lai Jiangshan wrote: > > > > > > > > > > On Tue, Dec 6, 2022 at 12:35 PM richard clark > > > > > wrote: > > > > > > > > > > > > > > > > > > A WARN is definitely reasonable and has its benefits. Can I try to > > > > > > submit the patch and you're nice to review as maintainer? > > > > > > > > > > > > Thanks, > > > > > > Richard > > > > > > > > > > > > > > > > > Sure, go ahead. > > > > > > > > > > What's in my mind is that the following code is wrapped in a new function: > > > > > > > > > > mutex_lock(&wq->mutex); > > > > > if (!wq->nr_drainers++) > > > > > wq->flags |= __WQ_DRAINING; > > > > > mutex_unlock(&wq->mutex); > > > > > > > > > > > > > > > and the new function replaces the open code drain_workqueue() and > > > > > is also called in destroy_workqueue() (before calling drain_workqueue()). > > > > > > > > > Except that, do we need to defer the __WQ_DRAINING clean to the > > > > rcu_call, thus we still have a close-loop of the drainer's count, like > > > > this? > > > > > > No, I don't think we need it. The wq is totally freed in rcu_free_wq. > > > > > > Or we can just introduce __WQ_DESTROYING. > > > > > > It seems using __WQ_DESTROYING is better. > > > > The wq->flags will be unreliable after kfree(wq), for example, in my > > machine, the wq->flags can be 0x7ec1e1a3, 0x37cff1a3 or 0x7fa23da3 ... > > after wq be kfreed, consequently the result of queueing a new work > > item to a kfreed wq is undetermined, sometimes it's ok because the > > queue_work will return directly(e.g, the wq->flags = 0x7ec1e1a3, a > > fake __WQ_DRAINING state), sometimes it will trigger a kernel NULL > > pointer dereference BUG when the wq->flags = 0x7fa23da3(fake > > !__WQ_DRAINING state). > > The whole wq is unreliable after destroy_workqueue(). > > All we need is just adding something to help identify any > wrong usage while the wq is in RCU grace period. > OK, understood! > > > > IMO, given the above condition, we can handle this in 2 phases: > > before the rcu_call and after. > > a. before rcu_call. Using __WQ_DESTROYING to allow the chained work > > queued in or not in destroy_workqueue(...) level, __WQ_DRAINING is > > used to make the drain_workqueue(...) still can be standalone. The > > code snippet like this: > > destroy_workqueue(...) > > { > > mutex_lock(&wq->mutex); > > wq->flags |= __WQ_DESTROYING; > > mutex_lock(&wq->mutex); > > Ok, put it before calling drain_workqueue() > > > ... > > } > > > > __queue_work(...) > > { > > if (unlikely((wq->flags & __WQ_DESTROYING) || (wq->flags & > > __WQ_DRAINING)) && > > WARN_ON_ONCE(!is_chained_work(wq))) > > Ok, combine __WQ_DESTROYING and __WQ_DRAINING together as: > if (unlikely((wq->flags & (__WQ_DESTROYING | __WQ_DRAINING)) && > > > > return; > > } > > > > b. after rcu_call. What in my mind is: > > rcu_free_wq(struct rcu_head *rcu) > > { > > ... > > kfree(wq); > > wq = NULL; > > It is useless code. > > > } > > > > __queue_work(...) > > { > > if (!wq) > > return; > > It is useless code. OK, will remove the above codes in the patch... > > > ... > > } > > > > Any comments? > > > > > > > > > > > > > --- a/kernel/workqueue.c > > > > +++ b/kernel/workqueue.c > > > > > > > > @@ -3528,6 +3526,9 @@ static void rcu_free_wq(struct rcu_head *rcu) > > > > > > > > else > > > > free_workqueue_attrs(wq->unbound_attrs); > > > > > > > > + if (!--wq->nr_drainers) > > > > + wq->flags &= ~__WQ_DRAINING; > > > > + > > > > kfree(wq); > > > > > > > > > > > > > > __WQ_DRAINING will cause the needed WARN on illegally queuing items on > > > > > destroyed workqueue. > > > > > > > > I will re-test it if there are no concerns about the above fix... > > > > > > > > > > > > > > Thanks > > > > > Lai