Date: Fri, 14 Jul 2023 10:02:10 -0700
From: "Paul E. McKenney"
To: Alan Huang
Cc: Joel Fernandes, Gao Xiang, Sandeep Dhavale, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Boqun Feng, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Matthias Brugger,
	AngeloGioacchino Del Regno, linux-erofs@lists.ozlabs.org,
	xiang@kernel.org, Will Shiu, kernel-team@android.com,
	rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-mediatek@lists.infradead.org
Subject: Re: [PATCH v1] rcu: Fix and improve RCU read lock checks when
	!CONFIG_DEBUG_LOCK_ALLOC
Reply-To: paulmck@kernel.org
In-Reply-To: <6E5326AD-9A5D-4570-906A-BDE8257B6F0C@gmail.com>

On Fri, Jul 14, 2023 at 11:54:47PM +0800, Alan Huang wrote:
> 
> > On Jul 14, 2023, at 23:35, Alan Huang wrote:
> > 
> >> On Jul 14, 2023, at 10:16, Paul E. McKenney wrote:
> >> 
> >> On Thu, Jul 13, 2023 at 09:33:35AM -0700, Paul E. McKenney wrote:
> >>> On Thu, Jul 13, 2023 at 11:33:24AM -0400, Joel Fernandes wrote:
> >>>> On Thu, Jul 13, 2023 at 10:34 AM Gao Xiang wrote:
> >>>>> On 2023/7/13 22:07, Joel Fernandes wrote:
> >>>>>> On Thu, Jul 13, 2023 at 12:59 AM Gao Xiang wrote:
> >>>>>>> On 2023/7/13 12:52, Paul E. McKenney wrote:
> >>>>>>>> On Thu, Jul 13, 2023 at 12:41:09PM +0800, Gao Xiang wrote:
> >>>>>>> 
> >>>>>>> ...
> >>>>>>> 
> >>>>>>>>> There are lots of performance issues here, and there was even a
> >>>>>>>>> Plumbers topic last year showing that; see:
> >>>>>>>>> 
> >>>>>>>>> [1] https://lore.kernel.org/r/20230519001709.2563-1-tj@kernel.org
> >>>>>>>>> [2] https://lore.kernel.org/r/CAHk-=wgE9kORADrDJ4nEsHHLirqPCZ1tGaEPAZejHdZ03qCOGg@mail.gmail.com
> >>>>>>>>> [3] https://lore.kernel.org/r/CAB=BE-SBtO6vcoyLNA9F-9VaN5R0t3o_Zn+FW8GbO6wyUqFneQ@mail.gmail.com
> >>>>>>>>> [4] https://lpc.events/event/16/contributions/1338/
> >>>>>>>>> and more.
> >>>>>>>>> 
> >>>>>>>>> I'm not sure it's necessary to look into all of that, and Sandeep
> >>>>>>>>> knows more than I do (the scheduling issue becomes vital on some
> >>>>>>>>> aarch64 platforms).
> >>>>>>>> 
> >>>>>>>> Hmmm...  Please let me try again.
> >>>>>>>> 
> >>>>>>>> Assuming that this approach turns out to make sense, the resulting
> >>>>>>>> patch will need to clearly state the performance benefits directly
> >>>>>>>> in the commit log.
> >>>>>>>> 
> >>>>>>>> And of course, for the approach to make sense, it must avoid
> >>>>>>>> breaking the existing lockdep-RCU debugging code.
> >>>>>>>> 
> >>>>>>>> Is that more clear?
> >>>>>>> 
> >>>>>>> Personally, I'm not working on the Android platform any more, so I
> >>>>>>> don't have a way to reproduce. Hopefully Sandeep can give actual
> >>>>>>> numbers _again_, with dm-verity enabled and another workqueue
> >>>>>>> triggered here, making a comparison that shows why the scheduling
> >>>>>>> latency of the extra work becomes unacceptable.
> >>>>>> 
> >>>>>> Question from my side: are we talking about only performance issues,
> >>>>>> or also a crash? It appears z_erofs_decompress_pcluster() takes
> >>>>>> mutex_lock(&pcl->lock);
> >>>>>> 
> >>>>>> So if it is either in an RCU read-side critical section or in an
> >>>>>> atomic section, like the softirq path, then it may
> >>>>>> schedule-while-atomic or trigger RCU warnings:
> >>>>>> 
> >>>>>> z_erofs_decompressqueue_endio
> >>>>>>  -> z_erofs_decompress_kickoff
> >>>>>>   -> z_erofs_decompressqueue_work
> >>>>>>    -> z_erofs_decompress_queue
> >>>>>>     -> z_erofs_decompress_pcluster
> >>>>>>      -> mutex_lock
> >>>>> 
> >>>>> Why does the softirq path not trigger a workqueue instead?
> >>>> 
> >>>> I said "if it is". I was giving a scenario. mutex_lock() is not
> >>>> allowed in softirq context or in an RCU reader.
> >>>> 
> >>>>>> Per Sandeep in [1], this stack happens under rcu_read_lock() in:
> >>>>>> 
> >>>>>> #define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops) \
> >>>>>> [...]
> >>>>>>         rcu_read_lock();
> >>>>>>         (dispatch_ops);
> >>>>>>         rcu_read_unlock();
> >>>>>> [...]
> >>>>>> 
> >>>>>> Coming from:
> >>>>>> blk_mq_flush_plug_list ->
> >>>>>>         blk_mq_run_dispatch_ops(q,
> >>>>>>                 __blk_mq_flush_plug_list(q, plug));
> >>>>>> 
> >>>>>> and __blk_mq_flush_plug_list does this:
> >>>>>> q->mq_ops->queue_rqs(&plug->mq_list);
> >>>>>> 
> >>>>>> This somehow ends up calling bio_endio and
> >>>>>> z_erofs_decompressqueue_endio, which grabs the mutex.
> >>>>>> 
> >>>>>> So... I have a question: it looks like one of the paths in
> >>>>>> __blk_mq_run_dispatch_ops() uses SRCU, whereas the alternate path
> >>>>>> uses RCU. Why does this alternate path want to block even though it
> >>>>>> is not supposed to? Is the real issue here that BLK_MQ_F_BLOCKING
> >>>>>> should be set? It sounds like you want to block in the "else" path
> >>>>>> even though BLK_MQ_F_BLOCKING is not set:
> >>>>> 
> >>>>> BLK_MQ_F_BLOCKING is not a flag that a filesystem can do anything
> >>>>> with. That is block-layer and mq device driver stuff; filesystems
> >>>>> cannot set this value.
> >>>>> 
> >>>>> As I said, as far as I understand, previously .end_io() could only
> >>>>> be called outside of RCU context, so it was fine, but I don't know
> >>>>> when .end_io() can be called under some RCU context now.
> >>>> 
> >>>> From what Sandeep described, the code path is in an RCU reader. My
> >>>> question is more: why doesn't it use SRCU instead, since it clearly
> >>>> does so if BLK_MQ_F_BLOCKING is set? What are the tradeoffs? IMHO, a
> >>>> deeper dive needs to be made into that before concluding that the
> >>>> fix is to use rcu_read_lock_any_held().
> >>> 
> >>> How can this be solved?
> >>> 
> >>> 1.	Always use a workqueue.  Simple, but is said to have
> >>> 	performance issues.
> >>> 
> >>> 2.	Pass in a flag that indicates whether or not the caller is in
> >>> 	an RCU read-side critical section.  Conceptually simple, but
> >>> 	might or might not be reasonable to actually implement in the
> >>> 	code as it exists now.  (You tell me!)
> >>> 
> >>> 3.	Create a function in z_erofs that gives you a decent
> >>> 	approximation, maybe something like the following.
> >>> 
> >>> 4.	Other ideas here.
> >> 
> >> 5.	#3 plus make the corresponding Kconfig option select
> >> 	PREEMPT_COUNT, assuming that any users needing compression in
> >> 	non-preemptible kernels are OK with PREEMPT_COUNT being set.
> >> 	(Some users of non-preemptible kernels object strenuously
> >> 	to the added overhead from CONFIG_PREEMPT_COUNT=y.)
> > 
> > 6.	Set one bit in bio->bi_private, check the bit and flip it in the
> > 	rcu_read_lock() path, then in z_erofs_decompressqueue_endio check
> > 	whether the bit has changed.
> 
> Seems bad; reading and modifying bi_private is a bad idea.

Is there some other field that would work?

							Thanx, Paul

> > Not sure if this is feasible or acceptable. :)
> > 
> >> 							Thanx, Paul
> >> 
> >>> The following is untested, and is probably quite buggy, but it
> >>> should provide you with a starting point.
> >>> 
> >>> 	static bool z_erofs_wq_needed(void)
> >>> 	{
> >>> 		if (IS_ENABLED(CONFIG_PREEMPTION) && rcu_preempt_depth())
> >>> 			return true;  // RCU reader
> >>> 		if (IS_ENABLED(CONFIG_PREEMPT_COUNT) && !preemptible())
> >>> 			return true;  // non-preemptible
> >>> 		if (!IS_ENABLED(CONFIG_PREEMPT_COUNT))
> >>> 			return true;  // non-preemptible kernel, so play it safe
> >>> 		return false;
> >>> 	}
> >>> 
> >>> You break it, you buy it!  ;-)
> >>> 
> >>> 							Thanx, Paul