References: <20230711233816.2187577-1-dhavale@google.com>
 <20230713003201.GA469376@google.com>
 <161f1615-3d85-cf47-d2d5-695adf1ca7d4@linux.alibaba.com>
 <0d9e7b4d-6477-47a6-b3d2-2c9d9b64903d@paulmck-laptop>
 <87292a44-cc02-4d95-940e-e4e31d0bc6f2@paulmck-laptop>
 <894a3b64-a369-7bc6-c8a8-0910843cc587@linux.alibaba.com>
In-Reply-To: <894a3b64-a369-7bc6-c8a8-0910843cc587@linux.alibaba.com>
From: Joel Fernandes
Date: Thu, 13 Jul 2023 11:33:24 -0400
Subject: Re: [PATCH v1] rcu: Fix and improve RCU read lock checks when !CONFIG_DEBUG_LOCK_ALLOC
To: Gao Xiang
Cc: paulmck@kernel.org, Sandeep Dhavale, Frederic Weisbecker,
 Neeraj Upadhyay, Josh Triplett, Boqun Feng, Steven Rostedt,
 Mathieu Desnoyers, Lai Jiangshan, Zqiang, Matthias Brugger,
 AngeloGioacchino Del Regno, linux-erofs@lists.ozlabs.org,
 xiang@kernel.org, Will Shiu, kernel-team@android.com,
 rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org,
 linux-mediatek@lists.infradead.org

On Thu, Jul 13, 2023 at 10:34 AM Gao Xiang wrote:
>
> On 2023/7/13 22:07, Joel Fernandes wrote:
> > On Thu, Jul 13, 2023 at 12:59 AM Gao Xiang wrote:
> >> On 2023/7/13 12:52, Paul E. McKenney wrote:
> >>> On Thu, Jul 13, 2023 at 12:41:09PM +0800, Gao Xiang wrote:
> >>>>
> >>
> >> ...
> >>
> >>>>
> >>>> There are lots of performance issues here and even a plumber
> >>>> topic last year to show that, see:
> >>>>
> >>>> [1] https://lore.kernel.org/r/20230519001709.2563-1-tj@kernel.org
> >>>> [2] https://lore.kernel.org/r/CAHk-=wgE9kORADrDJ4nEsHHLirqPCZ1tGaEPAZejHdZ03qCOGg@mail.gmail.com
> >>>> [3] https://lore.kernel.org/r/CAB=BE-SBtO6vcoyLNA9F-9VaN5R0t3o_Zn+FW8GbO6wyUqFneQ@mail.gmail.com
> >>>> [4] https://lpc.events/event/16/contributions/1338/
> >>>> and more.
> >>>>
> >>>> I'm not sure if it's necessary to look into all of that,
> >>>> and Sandeep knows more than I am (the scheduling issue
> >>>> becomes vital on some aarch64 platform.)
> >>>
> >>> Hmmm... Please let me try again.
> >>>
> >>> Assuming that this approach turns out to make sense, the resulting
> >>> patch will need to clearly state the performance benefits directly in
> >>> the commit log.
> >>>
> >>> And of course, for the approach to make sense, it must avoid breaking
> >>> the existing lockdep-RCU debugging code.
> >>>
> >>> Is that more clear?
> >>
> >> Personally I'm not working on Android platform any more so I don't
> >> have a way to reproduce, hopefully Sandeep could give actually
> >> number _again_ if dm-verity is enabled and trigger another
> >> workqueue here and make a comparison why the scheduling latency of
> >> the extra work becomes unacceptable.
> >>
> >
> > Question from my side, are we talking about only performance issues or
> > also a crash? It appears z_erofs_decompress_pcluster() takes
> > mutex_lock(&pcl->lock);
> >
> > So if it is either in an RCU read-side critical section or in an
> > atomic section, like the softirq path, then it may
> > schedule-while-atomic or trigger RCU warnings.
> >
> > z_erofs_decompressqueue_endio
> > -> z_erofs_decompress_kickoff
> > -> z_erofs_decompressqueue_work
> > -> z_erofs_decompress_queue
> > -> z_erofs_decompress_pcluster
> > -> mutex_lock
> >
>
> Why does the softirq path not trigger a workqueue instead?

I said "if it is". I was giving a scenario. mutex_lock() is not
allowed in softirq context or in an RCU reader.

> > Per Sandeep in [1], this stack happens under RCU read-lock in:
> >
> > #define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops) \
> > [...]
> > rcu_read_lock();
> > (dispatch_ops);
> > rcu_read_unlock();
> > [...]
> >
> > Coming from:
> > blk_mq_flush_plug_list ->
> > blk_mq_run_dispatch_ops(q,
> > __blk_mq_flush_plug_list(q, plug));
> >
> > and __blk_mq_flush_plug_list does this:
> > q->mq_ops->queue_rqs(&plug->mq_list);
> >
> > This somehow ends up calling the bio_endio and the
> > z_erofs_decompressqueue_endio which grabs the mutex.
> >
> > So... I have a question, it looks like one of the paths in
> > __blk_mq_run_dispatch_ops() uses SRCU. Whereas the alternate
> > path uses RCU. Why does this alternate want to block even if it is not
> > supposed to? Is the real issue here that the BLK_MQ_F_BLOCKING should
> > be set? It sounds like you want to block in the "else" path even
> > though BLK_MQ_F_BLOCKING is not set:
>
> BLK_MQ_F_BLOCKING is not a flag that a filesystem can do anything with.
> That is block layer and mq device driver stuffs. filesystems cannot set
> this value.
>
> As I said, as far as I understand, previously,
> .end_io() can only be called without RCU context, so it will be fine,
> but I don't know when .end_io() can be called under some RCU context
> now.

From what Sandeep described, the code path is in an RCU reader. My
question is more: why doesn't it use SRCU instead, since it clearly
does so when BLK_MQ_F_BLOCKING is set? What are the tradeoffs? IMHO, a
deeper dive needs to be made into that before concluding that the fix
is to use rcu_read_lock_any_held().

 - Joel
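
For readers following the thread, the RCU versus SRCU question above can
be illustrated with a small userspace model. This is only a sketch under
stated assumptions, not kernel code: every model_* name is hypothetical,
the counters merely stand in for read-side critical sections, and the
"blocking" argument plays the role that BLK_MQ_F_BLOCKING is described as
playing in __blk_mq_run_dispatch_ops(). It shows why a completion handler
that takes a mutex is a problem on the plain RCU path, is fine on the
SRCU path, and is also fine if the work is deferred until after the
reader has ended.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the contexts discussed in this thread. */
struct model_ctx {
	int rcu_depth;   /* nesting of non-sleepable (RCU-like) readers */
	int srcu_depth;  /* nesting of sleepable (SRCU-like) readers */
	int deferred;    /* work items queued to run outside any reader */
	int violations;  /* sleeping locks taken where sleeping is illegal */
};

typedef void (*model_op_fn)(struct model_ctx *ctx);

/* Stand-in for a sleeping lock: illegal inside a non-sleepable reader. */
static void model_mutex_lock(struct model_ctx *ctx)
{
	if (ctx->rcu_depth > 0)
		ctx->violations++;
}

/* Assumed shape of the dispatch helper: SRCU if blocking, else RCU. */
static void model_run_dispatch_ops(struct model_ctx *ctx, bool blocking,
				   model_op_fn op)
{
	if (blocking) {
		ctx->srcu_depth++;   /* sleepable reader */
		op(ctx);
		ctx->srcu_depth--;
	} else {
		ctx->rcu_depth++;    /* non-sleepable reader */
		op(ctx);
		ctx->rcu_depth--;
	}
}

/* Completion handler that takes the mutex directly in the caller's context. */
static void model_endio_inline(struct model_ctx *ctx)
{
	model_mutex_lock(ctx);
}

/* Completion handler that only queues work, as a workqueue would. */
static void model_endio_deferred(struct model_ctx *ctx)
{
	ctx->deferred++;
}

/* Run one completion and report how many illegal sleeps were recorded. */
static int model_count_violations(bool blocking, bool defer)
{
	struct model_ctx ctx = {0};

	model_run_dispatch_ops(&ctx, blocking,
			       defer ? model_endio_deferred : model_endio_inline);

	/* Deferred work runs here, after the read-side section has ended. */
	while (ctx.deferred > 0) {
		ctx.deferred--;
		model_mutex_lock(&ctx);
	}
	return ctx.violations;
}
```

In this model, model_count_violations(false, false) records a violation,
while the SRCU variant (true, false) and the deferred variant
(false, true) do not. The model says nothing about the cost of either
alternative, which is the tradeoff being asked about.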