Date: Tue, 3 Nov 2020 11:18:22 -0800
From: "Paul E. McKenney"
Reply-To: paulmck@kernel.org
To: Uladzislau Rezki
Cc: Joel Fernandes, LKML, RCU, Andrew Morton, Peter Zijlstra, Michal Hocko,
    Thomas Gleixner, "Theodore Y. Ts'o", Sebastian Andrzej Siewior,
    Oleksiy Avramchenko, willy@infradead.org
Subject: Re: [PATCH 01/16] rcu/tree: Add a work to allocate pages from regular context
Message-ID: <20201103191822.GC3249@paulmck-ThinkPad-P72>
References: <20201029165019.14218-1-urezki@gmail.com>
 <20201103154723.GA1310511@google.com>
 <20201103163350.GA10665@pc636>
In-Reply-To: <20201103163350.GA10665@pc636>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 03, 2020 at 05:33:50PM +0100, Uladzislau Rezki wrote:
> On Tue, Nov 03, 2020 at 10:47:23AM -0500, Joel Fernandes wrote:
> > On Thu, Oct 29, 2020 at 05:50:04PM +0100, Uladzislau Rezki (Sony) wrote:
> > > The current memory-allocation interface presents the following
> > > difficulties that this patch is designed to overcome:
> > >
> > > a) If built with CONFIG_PROVE_RAW_LOCK_NESTING, lockdep will
> > >    complain about a violation ("BUG: Invalid wait context") of the
> > >    nesting rules. It performs the raw_spinlock vs. spinlock nesting
> > >    checks, i.e. it is not legal to acquire a spinlock_t while
> > >    holding a raw_spinlock_t.
> > >
> > >    Internally, kfree_rcu() uses raw_spinlock_t, whereas the page
> > >    allocator internally deals with spinlock_t to access its zones.
> > >    The code can also be broken from a higher level of view:
> > >
> > >        raw_spin_lock(&some_lock);
> > >        kfree_rcu(some_pointer, some_field_offset);
> > >
> > > b) If built with CONFIG_PREEMPT_RT. Please note, in that case
> > >    spinlock_t is converted into a sleepable variant. Invoking the
> > >    page allocator from atomic contexts leads to "BUG: scheduling
> > >    while atomic".
> > >
> > > c) call_rcu() is invoked from raw atomic context, and kfree_rcu()
> > >    and kvfree_rcu() are expected to be callable from raw atomic
> > >    context as well.
> > >
> > > Move out the page allocation from contexts that trigger the
> > > kvfree_rcu() function to a separate worker. When a k[v]free_rcu()
> > > per-cpu page cache is empty, a fallback mechanism is used and a
> > > special job is scheduled to refill the per-cpu cache.
> >
> > Looks good, still reviewing here. BTW just for my education, I was
> > wondering about Thomas's email:
> > https://lkml.org/lkml/2020/8/11/939
> >
> > If slab allocations in pure raw-atomic context on RT are not allowed
> > or recommended, should kfree_rcu() be allowed?
> >
> Thanks for reviewing, Joel :)
>
> The decision was made that we need to support kfree_rcu() from "real
> atomic contexts", to align with how it used to be before. We could go
> and just convert our local locks to the spinlock_t variant, but that
> was not Paul's goal; it may be that some users need kfree_rcu() from
> raw atomics.

People invoke call_rcu() from raw atomics, and so we should provide
the same for kfree_rcu(). Yes, people could work around a raw-atomic
prohibition, but such prohibitions incur constant costs over time in
terms of development effort, increased bug rate, and increased
complexity. Yes, this does increase all of those for RCU, but the
relative increase is negligible, RCU being what it is.

> > slab can have the same issue, right? If the per-cpu cache is drained,
> > it has to allocate a page from the buddy allocator, and there is no
> > GFP flag to tell it about the context the allocation is happening
> > from.
> >
> Sounds like that. Apart from that, it might turn out soon that we or
> somebody else will raise the question one more time about something
> like GFP_RAW or GFP_NOLOCKS. So who knows..

I would prefer that slab provide some way of dealing with raw atomic
context, but the maintainers are thus far unconvinced.

> > Or are we saying that we want to support kfree on RT from raw atomic
> > context, even though kmalloc is not supported? I hate to bring up
> > this elephant in the room, but since I am one of the people
> > maintaining this code, I believe I would rather set some rules than
> > support unsupported usages. :-\ (Once I know what is supported and
> > what isn't, that is.) If indeed raw atomic kfree_rcu() is a bogus use
> > case because of -RT, then we ought to put up a giant warning rather
> > than support it :-(.
> >
> We discussed it several times; the conclusion was that we need to
> support kfree_rcu() from raw contexts. At least that was a clear signal
> from Paul to me. I think, if we obtain a preemptible() that is
> versatile, we can drop the patch in question later on in the future.

Given a universally meaningful preemptible(), we could directly call
the allocator in some cases. It might (or might not) still make sense
to defer the allocation when preemptible() indicated that a direct call
to the allocator was unsafe.

							Thanx, Paul