Received: by 2002:a05:6358:7058:b0:131:369:b2a3 with SMTP id 24csp2654212rwp; Fri, 14 Jul 2023 09:12:25 -0700 (PDT) X-Google-Smtp-Source: APBJJlEVCYzaGqn5Arm5jvXT4xi22SoycidlvrPp0tIltuRsUXYTdipOos0z5v+8XGWNF8VOB95Z X-Received: by 2002:a17:90b:50b:b0:262:e801:740f with SMTP id r11-20020a17090b050b00b00262e801740fmr3818493pjz.38.1689351144719; Fri, 14 Jul 2023 09:12:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689351144; cv=none; d=google.com; s=arc-20160816; b=eZUfJXGHxfs5cPQpkV7wGUcfVUGI/kHsn7Z1vYZyRDceQRF9mpmqK7m2x1u3VViVp9 IscRbPfDDzfe/snHaSMfkJvqbFCgOAc4bAKeTZgFQFNsrzndgTRbXSZ6Azb/J/XAK+V9 RefLUozYf6i9jmu16OBO8CPFHScRjCl3oVly0X36C2pYaY6IxqiAUVno0eBgOHUnJfXp wolPLYi4KJHXpVfDc2uTVNtku64LbWGyncBTFYPmq+AENd2HE3mfwBYrTwvMYnPaYvEg KQmRqP1CkNs6Fw9glvxxyanARESSijvHiJyFheHTuTusWkEUywPiR8FKU2zcb3Yh52O/ LeGg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:to:references:message-id :content-transfer-encoding:cc:date:in-reply-to:from:subject :mime-version:dkim-signature; bh=NqpmYVnbbHnWcNUmmhC8NRk/JXLwLEr9LrHoxThT/cY=; fh=2B2MnEn9gDNDTbloMs/3Iqvs6Gd02t01N20IxO+pTnQ=; b=tIU4kiBwTE9pNscx+9aZdN874F3d0FU/8f0e36jCxs0QkEt7nFX+XRrnWVPqNT2U7C X89s6qEsX7QmZxlivZi9V6otOav/mZD1kth9z7t7NBqIvWxm1smd0WVXhHNIswM/Av4g it3jH57mRC/fC8BDPnzwyhr1ZX1LF1YR9sKyqI+fo0MRZ7a6a88kZRJFQQX9huMrwhdi 1tU1NY+MksC/lMKBiOBE6P4cFdVBOh9onDZtodV1UhrAsXbcR0DV/lowtQ+78NYhXD1q nCfSPx4Z9g0SoGNuyC5sqqrY9jJb4qZ7GnayRbRI79oUeIhStzWn3JgEqVuwFk3dT6/v s/ug== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=d13J78Ha; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id iq12-20020a17090afb4c00b00262e93858f6si1475769pjb.58.2023.07.14.09.12.11; Fri, 14 Jul 2023 09:12:24 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=d13J78Ha; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236211AbjGNPz3 (ORCPT + 99 others); Fri, 14 Jul 2023 11:55:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37112 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236199AbjGNPz1 (ORCPT ); Fri, 14 Jul 2023 11:55:27 -0400 Received: from mail-pf1-x429.google.com (mail-pf1-x429.google.com [IPv6:2607:f8b0:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F264A35A5; Fri, 14 Jul 2023 08:55:24 -0700 (PDT) Received: by mail-pf1-x429.google.com with SMTP id d2e1a72fcca58-666e64e97e2so1436389b3a.1; Fri, 14 Jul 2023 08:55:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1689350124; x=1691942124; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=NqpmYVnbbHnWcNUmmhC8NRk/JXLwLEr9LrHoxThT/cY=; b=d13J78Hae+iUwr3wrXVIbKZGmYbkfAut0VdeD0DMDSWYuc5Ae+8BKpnTACIzYGzUSr xgaceyN8+odllP4SgLtjA1yGOuhSjlEklNahp9nGFh9khSJfarg4RfEGlpMjngPfDV3t uLMW+Z6G0NxBau/8Dw8UlB1Oeyq5BMqoWK2/sOjRU1m3vFWSshRKP+QpACOdTldQplni FmucbEuWy0I1djDWT32XMtsXST1X1izEqeqTKslbASkaIPNb5nnWlqkht/jdX+ZhA1xr m2aJniiHAH7WfQdM/JrTjGr9plc+V3BaE96BJ09aZjPfhFKSfdYNvE25DUNFatCwgG44 4+RA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689350124; x=1691942124; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=NqpmYVnbbHnWcNUmmhC8NRk/JXLwLEr9LrHoxThT/cY=; b=RTCPLIRZut36wBVDJWVSdLmBmRKL4D7g8XsThxBJu0TJFQIDKHh2uDLhvddaf+SaEU nLPvcyjqPj+LlHH4akDTYh9aOcMrXor4fOWZiayWGr6dCh6yL7Pu3GT+8ZAlWw4O1el4 s4LQJJI2CX5yTuqFuNRnO3YWIwUH/Zrw+TrKjt1wGRbEIqUaUWuCSpbv2DSfm331mfL5 P5Ssc+ex+FaiJCfWtPPan3Hkf25kGC8iU8hOokndIHHZ31HoDelN4GAezI3/Eby/IMSn OT7K5EVtUP03x765lvHgZkZ2kRJIS5xcYkRadq32FPa2ZNqK9d6LNsg465Xxrh8WEofH vCTw== X-Gm-Message-State: ABy/qLZPqNq43QI0sJO0wvOZuDnNpix1AvdaGHDJuRPLyWbDc51r5mGm bi+Y7YkgchdG1XjhJzxu16g= X-Received: by 2002:a05:6a20:101a:b0:12d:8d4d:27e9 with SMTP id gs26-20020a056a20101a00b0012d8d4d27e9mr3678882pzc.15.1689350124359; Fri, 14 Jul 2023 08:55:24 -0700 (PDT) Received: from smtpclient.apple ([2402:d0c0:2:a2a::1]) by smtp.gmail.com with ESMTPSA id s4-20020a63b404000000b0055adced9e13sm7098528pgf.0.2023.07.14.08.55.01 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Fri, 14 Jul 2023 08:55:23 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.400.51.1.1\)) Subject: Re: [PATCH v1] rcu: Fix and improve RCU read lock checks when !CONFIG_DEBUG_LOCK_ALLOC From: Alan Huang In-Reply-To: Date: Fri, 14 Jul 2023 23:54:47 +0800 Cc: Joel Fernandes , Gao Xiang , Sandeep Dhavale , Frederic Weisbecker , Neeraj Upadhyay , Josh Triplett , Boqun Feng , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Matthias Brugger , AngeloGioacchino Del Regno , linux-erofs@lists.ozlabs.org, xiang@kernel.org, Will Shiu , kernel-team@android.com, rcu@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org Content-Transfer-Encoding: quoted-printable Message-Id: <6E5326AD-9A5D-4570-906A-BDE8257B6F0C@gmail.com> References: <20230713003201.GA469376@google.com> <161f1615-3d85-cf47-d2d5-695adf1ca7d4@linux.alibaba.com> <0d9e7b4d-6477-47a6-b3d2-2c9d9b64903d@paulmck-laptop> <87292a44-cc02-4d95-940e-e4e31d0bc6f2@paulmck-laptop> <894a3b64-a369-7bc6-c8a8-0910843cc587@linux.alibaba.com> <58b661d0-0ebb-4b45-a10d-c5927fb791cd@paulmck-laptop> <7d433fac-a62d-4e81-b8e5-57cf5f2d1d55@paulmck-laptop> To: paulmck@kernel.org X-Mailer: Apple Mail (2.3731.400.51.1.1) X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > 2023=E5=B9=B47=E6=9C=8814=E6=97=A5 23:35=EF=BC=8CAlan Huang = =E5=86=99=E9=81=93=EF=BC=9A >=20 >>=20 >> 2023=E5=B9=B47=E6=9C=8814=E6=97=A5 10:16=EF=BC=8CPaul E. McKenney = =E5=86=99=E9=81=93=EF=BC=9A >>=20 >> On Thu, Jul 13, 2023 at 09:33:35AM -0700, Paul E. McKenney wrote: >>> On Thu, Jul 13, 2023 at 11:33:24AM -0400, Joel Fernandes wrote: >>>> On Thu, Jul 13, 2023 at 10:34=E2=80=AFAM Gao Xiang = wrote: >>>>> On 2023/7/13 22:07, Joel Fernandes wrote: >>>>>> On Thu, Jul 13, 2023 at 12:59=E2=80=AFAM Gao Xiang = wrote: >>>>>>> On 2023/7/13 12:52, Paul E. McKenney wrote: >>>>>>>> On Thu, Jul 13, 2023 at 12:41:09PM +0800, Gao Xiang wrote: >>>>>>>=20 >>>>>>> ... >>>>>>>=20 >>>>>>>>>=20 >>>>>>>>> There are lots of performance issues here and even a plumber >>>>>>>>> topic last year to show that, see: >>>>>>>>>=20 >>>>>>>>> [1] = https://lore.kernel.org/r/20230519001709.2563-1-tj@kernel.org >>>>>>>>> [2] = https://lore.kernel.org/r/CAHk-=3DwgE9kORADrDJ4nEsHHLirqPCZ1tGaEPAZejHdZ03= qCOGg@mail.gmail.com >>>>>>>>> [3] = https://lore.kernel.org/r/CAB=3DBE-SBtO6vcoyLNA9F-9VaN5R0t3o_Zn+FW8GbO6wyU= qFneQ@mail.gmail.com >>>>>>>>> [4] https://lpc.events/event/16/contributions/1338/ >>>>>>>>> and more. >>>>>>>>>=20 >>>>>>>>> I'm not sure if it's necessary to look info all of that, >>>>>>>>> andSandeep knows more than I am (the scheduling issue >>>>>>>>> becomes vital on some aarch64 platform.) >>>>>>>>=20 >>>>>>>> Hmmm... Please let me try again. >>>>>>>>=20 >>>>>>>> Assuming that this approach turns out to make sense, the = resulting >>>>>>>> patch will need to clearly state the performance benefits = directly in >>>>>>>> the commit log. >>>>>>>>=20 >>>>>>>> And of course, for the approach to make sense, it must avoid = breaking >>>>>>>> the existing lockdep-RCU debugging code. >>>>>>>>=20 >>>>>>>> Is that more clear? >>>>>>>=20 >>>>>>> Personally I'm not working on Android platform any more so I = don't >>>>>>> have a way to reproduce, hopefully Sandeep could give actually >>>>>>> number _again_ if dm-verity is enabled and trigger another >>>>>>> workqueue here and make a comparsion why the scheduling latency = of >>>>>>> the extra work becomes unacceptable. >>>>>>>=20 >>>>>>=20 >>>>>> Question from my side, are we talking about only performance = issues or >>>>>> also a crash? It appears z_erofs_decompress_pcluster() takes >>>>>> mutex_lock(&pcl->lock); >>>>>>=20 >>>>>> So if it is either in an RCU read-side critical section or in an >>>>>> atomic section, like the softirq path, then it may >>>>>> schedule-while-atomic or trigger RCU warnings. >>>>>>=20 >>>>>> z_erofs_decompressqueue_endio >>>>>> -> z_erofs_decompress_kickoff >>>>>> ->z_erofs_decompressqueue_work >>>>>> ->z_erofs_decompress_queue >>>>>> -> z_erofs_decompress_pcluster >>>>>> -> mutex_lock >>>>>>=20 >>>>>=20 >>>>> Why does the softirq path not trigger a workqueue instead? >>>>=20 >>>> I said "if it is". I was giving a scenario. mutex_lock() is not >>>> allowed in softirq context or in an RCU-reader. >>>>=20 >>>>>> Per Sandeep in [1], this stack happens under RCU read-lock in: >>>>>>=20 >>>>>> #define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops) \ >>>>>> [...] >>>>>> rcu_read_lock(); >>>>>> (dispatch_ops); >>>>>> rcu_read_unlock(); >>>>>> [...] >>>>>>=20 >>>>>> Coming from: >>>>>> blk_mq_flush_plug_list -> >>>>>> blk_mq_run_dispatch_ops(q, >>>>>> __blk_mq_flush_plug_list(q, = plug)); >>>>>>=20 >>>>>> and __blk_mq_flush_plug_list does this: >>>>>> q->mq_ops->queue_rqs(&plug->mq_list); >>>>>>=20 >>>>>> This somehow ends up calling the bio_endio and the >>>>>> z_erofs_decompressqueue_endio which grabs the mutex. >>>>>>=20 >>>>>> So... I have a question, it looks like one of the paths in >>>>>> __blk_mq_run_dispatch_ops() uses SRCU. Where are as the = alternate >>>>>> path uses RCU. Why does this alternate want to block even if it = is not >>>>>> supposed to? Is the real issue here that the BLK_MQ_F_BLOCKING = should >>>>>> be set? It sounds like you want to block in the "else" path even >>>>>> though BLK_MQ_F_BLOCKING is not set: >>>>>=20 >>>>> BLK_MQ_F_BLOCKING is not a flag that a filesystem can do anything = with. >>>>> That is block layer and mq device driver stuffs. filesystems = cannot set >>>>> this value. >>>>>=20 >>>>> As I said, as far as I understand, previously, >>>>> .end_io() can only be called without RCU context, so it will be = fine, >>>>> but I don't know when .end_io() can be called under some RCU = context >>>>> now. >>>>=20 >>>>> =46rom what Sandeep described, the code path is in an RCU reader. = My >>>> question is more, why doesn't it use SRCU instead since it clearly >>>> does so if BLK_MQ_F_BLOCKING. What are the tradeoffs? IMHO, a = deeper >>>> dive needs to be made into that before concluding that the fix is = to >>>> use rcu_read_lock_any_held(). >>>=20 >>> How can this be solved? >>>=20 >>> 1. Always use a workqueue. Simple, but is said to have performance >>> issues. >>>=20 >>> 2. Pass a flag in that indicates whether or not the caller is in an >>> RCU read-side critical section. Conceptually simple, but might >>> or might not be reasonable to actually implement in the code as >>> it exists now. (You tell me!) >>>=20 >>> 3. Create a function in z_erofs that gives you a decent >>> approximation, maybe something like the following. >>>=20 >>> 4. Other ideas here. >>=20 >> 5. #3 plus make the corresponding Kconfig option select >> PREEMPT_COUNT, assuming that any users needing compression in >> non-preemptible kernels are OK with PREEMPT_COUNT being set. >> (Some users of non-preemptible kernels object strenuously >> to the added overhead from CONFIG_PREEMPT_COUNT=3Dy.) >=20 > 6. Set one bit in bio->bi_private, check the bit and flip it in = rcu_read_lock() path, > then in z_erofs_decompressqueue_endio, check if the bit has changed. Seems bad, read and modify bi_private is a bad idea. >=20 > Not sure if this is feasible or acceptable. :) >=20 >>=20 >> Thanx, Paul >>=20 >>> The following is untested, and is probably quite buggy, but it = should >>> provide you with a starting point. >>>=20 >>> static bool z_erofs_wq_needed(void) >>> { >>> if (IS_ENABLED(CONFIG_PREEMPTION) && rcu_preempt_depth()) >>> return true; // RCU reader >>> if (IS_ENABLED(CONFIG_PREEMPT_COUNT) && !preemptible()) >>> return true; // non-preemptible >>> if (!IS_ENABLED(CONFIG_PREEMPT_COUNT)) >>> return true; // non-preeemptible kernel, so play it safe >>> return false; >>> } >>>=20 >>> You break it, you buy it! ;-) >>>=20 >>> Thanx, Paul