Message-ID: <894a3b64-a369-7bc6-c8a8-0910843cc587@linux.alibaba.com>
Date: Thu, 13 Jul 2023 22:34:45 +0800
Subject: Re: [PATCH v1] rcu: Fix and improve RCU read lock checks when !CONFIG_DEBUG_LOCK_ALLOC
To: Joel Fernandes
Cc: paulmck@kernel.org, Sandeep Dhavale, Frederic Weisbecker,
 Neeraj Upadhyay, Josh Triplett, Boqun Feng, Steven Rostedt,
 Mathieu Desnoyers, Lai Jiangshan, Zqiang, Matthias Brugger,
 AngeloGioacchino Del Regno, linux-erofs@lists.ozlabs.org,
 xiang@kernel.org, Will Shiu, kernel-team@android.com,
 rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org,
 linux-mediatek@lists.infradead.org
References: <20230711233816.2187577-1-dhavale@google.com>
 <20230713003201.GA469376@google.com>
 <161f1615-3d85-cf47-d2d5-695adf1ca7d4@linux.alibaba.com>
 <0d9e7b4d-6477-47a6-b3d2-2c9d9b64903d@paulmck-laptop>
 <87292a44-cc02-4d95-940e-e4e31d0bc6f2@paulmck-laptop>
From: Gao Xiang
X-Mailing-List: linux-kernel@vger.kernel.org

On 2023/7/13 22:07, Joel Fernandes wrote:
> On Thu, Jul 13, 2023 at 12:59 AM Gao Xiang wrote:
>> On 2023/7/13 12:52, Paul E. McKenney wrote:
>>> On Thu, Jul 13, 2023 at 12:41:09PM +0800, Gao Xiang wrote:
>>>>
>>
>> ...
>>
>>>>
>>>> There are lots of performance issues here and even a plumber
>>>> topic last year to show that, see:
>>>>
>>>> [1] https://lore.kernel.org/r/20230519001709.2563-1-tj@kernel.org
>>>> [2] https://lore.kernel.org/r/CAHk-=wgE9kORADrDJ4nEsHHLirqPCZ1tGaEPAZejHdZ03qCOGg@mail.gmail.com
>>>> [3] https://lore.kernel.org/r/CAB=BE-SBtO6vcoyLNA9F-9VaN5R0t3o_Zn+FW8GbO6wyUqFneQ@mail.gmail.com
>>>> [4] https://lpc.events/event/16/contributions/1338/
>>>> and more.
>>>>
>>>> I'm not sure if it's necessary to look into all of that,
>>>> and Sandeep knows more than I do (the scheduling issue
>>>> becomes vital on some aarch64 platforms.)
>>>
>>> Hmmm... Please let me try again.
>>>
>>> Assuming that this approach turns out to make sense, the resulting
>>> patch will need to clearly state the performance benefits directly in
>>> the commit log.
>>>
>>> And of course, for the approach to make sense, it must avoid breaking
>>> the existing lockdep-RCU debugging code.
>>>
>>> Is that more clear?
>>
>> Personally I'm not working on the Android platform any more so I don't
>> have a way to reproduce; hopefully Sandeep could give actual
>> numbers _again_ if dm-verity is enabled and triggers another
>> workqueue here, and make a comparison showing why the scheduling
>> latency of the extra work becomes unacceptable.
>>
>
> Question from my side, are we talking about only performance issues or
> also a crash? It appears z_erofs_decompress_pcluster() takes
> mutex_lock(&pcl->lock);
>
> So if it is either in an RCU read-side critical section or in an
> atomic section, like the softirq path, then it may
> schedule-while-atomic or trigger RCU warnings.
>
> z_erofs_decompressqueue_endio
>   -> z_erofs_decompress_kickoff
>     -> z_erofs_decompressqueue_work
>       -> z_erofs_decompress_queue
>         -> z_erofs_decompress_pcluster
>           -> mutex_lock
>

Why does the softirq path not trigger a workqueue instead? Why would
it trigger "schedule-while-atomic" in the softirq context here?

> Per Sandeep in [1], this stack happens under RCU read-lock in:
>
> #define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops) \
> [...]
>         rcu_read_lock();
>         (dispatch_ops);
>         rcu_read_unlock();
> [...]
>
> Coming from:
> blk_mq_flush_plug_list ->
>     blk_mq_run_dispatch_ops(q,
>         __blk_mq_flush_plug_list(q, plug));
>
> and __blk_mq_flush_plug_list does this:
>     q->mq_ops->queue_rqs(&plug->mq_list);
>
> This somehow ends up calling the bio_endio and the
> z_erofs_decompressqueue_endio which grabs the mutex.
>
> So... I have a question, it looks like one of the paths in
> __blk_mq_run_dispatch_ops() uses SRCU, whereas the alternate
> path uses RCU.
> Why does this alternate want to block even if it is not
> supposed to? Is the real issue here that the BLK_MQ_F_BLOCKING should
> be set? It sounds like you want to block in the "else" path even
> though BLK_MQ_F_BLOCKING is not set:

BLK_MQ_F_BLOCKING is not a flag that a filesystem can do anything with.
That is block layer and mq device driver stuff; filesystems cannot set
this value.

As I said, as far as I understand, .end_io() previously could only be
called outside RCU context, so that was fine, but I don't know when
.end_io() can be called under some RCU context now.

Thanks,
Gao Xiang
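
To make the concern in this exchange concrete, here is a minimal
userspace sketch in C. It is only a model under stated assumptions:
every model_* name is hypothetical and not a kernel API, the
"blocking" argument merely stands in for BLK_MQ_F_BLOCKING, and
deferring whenever sleeping is unsafe is an illustrative policy, not
necessarily what erofs or the eventual patch does. It shows why a
completion handler reached from inside the RCU branch of a dispatch
wrapper should not take a mutex directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_ctx {
	int rcu_depth;    /* nesting of simulated rcu_read_lock() calls */
	bool in_softirq;  /* simulated softirq (atomic) context */
};

enum kickoff_path {
	KICKOFF_INLINE,   /* decompress directly in the caller */
	KICKOFF_DEFERRED, /* hand off to a worker that is allowed to sleep */
};

static void model_rcu_read_lock(struct model_ctx *c)
{
	c->rcu_depth++;
}

static void model_rcu_read_unlock(struct model_ctx *c)
{
	assert(c->rcu_depth > 0); /* unbalanced unlock is a bug in the model */
	c->rcu_depth--;
}

/* Sleeping (e.g. taking a mutex) is only safe outside both contexts. */
static bool model_may_sleep(const struct model_ctx *c)
{
	return c->rcu_depth == 0 && !c->in_softirq;
}

/* Completion-side decision: run inline only when sleeping is allowed. */
static enum kickoff_path model_kickoff(const struct model_ctx *c)
{
	return model_may_sleep(c) ? KICKOFF_INLINE : KICKOFF_DEFERRED;
}

/*
 * Model of a dispatch wrapper with two branches. The non-blocking branch
 * wraps the callback in a simulated RCU read-side section, so a completion
 * invoked from inside it observes rcu_depth > 0.
 */
static enum kickoff_path model_dispatch(struct model_ctx *c, bool blocking)
{
	enum kickoff_path path;

	if (blocking) {
		/* Stand-in for the SRCU branch, where sleeping is permitted. */
		path = model_kickoff(c);
	} else {
		model_rcu_read_lock(c);
		path = model_kickoff(c);
		model_rcu_read_unlock(c);
	}
	return path;
}
```

In this model a filesystem-side caller has no say over "blocking",
which matches the point above that the flag belongs to the block layer
and the mq driver; all the completion side can do is detect the
context it was called in and defer.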