Date: Tue, 3 Nov 2020 11:18:22 -0800
From: "Paul E. McKenney"
Reply-To: paulmck@kernel.org
To: Uladzislau Rezki
Cc: Joel Fernandes, LKML, RCU, Andrew Morton, Peter Zijlstra, Michal Hocko,
    Thomas Gleixner, "Theodore Y. Ts'o", Sebastian Andrzej Siewior,
    Oleksiy Avramchenko, willy@infradead.org
Subject: Re: [PATCH 01/16] rcu/tree: Add a work to allocate pages from regular context
Message-ID: <20201103191822.GC3249@paulmck-ThinkPad-P72>
References: <20201029165019.14218-1-urezki@gmail.com>
 <20201103154723.GA1310511@google.com>
 <20201103163350.GA10665@pc636>
In-Reply-To: <20201103163350.GA10665@pc636>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 03, 2020 at 05:33:50PM +0100, Uladzislau Rezki wrote:
> On Tue, Nov 03, 2020 at 10:47:23AM -0500, Joel Fernandes wrote:
> > On Thu, Oct 29, 2020 at 05:50:04PM +0100, Uladzislau Rezki (Sony) wrote:
> > > The current memory-allocation interface presents the following
> > > difficulties that this patch is designed to overcome:
> > >
> > > a) If built with CONFIG_PROVE_RAW_LOCK_NESTING, lockdep will
> > >    complain about a violation ("BUG: Invalid wait context") of the
> > >    nesting rules. It performs the raw_spinlock vs. spinlock nesting
> > >    checks, i.e. it is not legal to acquire a spinlock_t while
> > >    holding a raw_spinlock_t.
> > >
> > >    Internally, kfree_rcu() uses raw_spinlock_t, whereas the page
> > >    allocator internally deals with spinlock_t to access its zones.
> > >    The code can also be broken from a higher level of view:
> > >
> > >        raw_spin_lock(&some_lock);
> > >        kfree_rcu(some_pointer, some_field_offset);
> > >
> > > b) If built with CONFIG_PREEMPT_RT. Please note, in that case
> > >    spinlock_t is converted into a sleepable variant. Invoking the
> > >    page allocator from atomic contexts leads to "BUG: scheduling
> > >    while atomic".
> > >
> > > c) call_rcu() is invoked from raw atomic context, and kfree_rcu()
> > >    and kvfree_rcu() are expected to be callable from raw atomic
> > >    context as well.
> > >
> > > Move out the page allocation from contexts that trigger the
> > > kvfree_rcu() function to a separate worker. When a k[v]free_rcu()
> > > per-cpu page cache is empty, a fallback mechanism is used and a
> > > special job is scheduled to refill the per-cpu cache.
> >
> > Looks good, still reviewing here. BTW just for my education, I was
> > wondering about Thomas's email:
> > https://lkml.org/lkml/2020/8/11/939
> >
> > If slab allocations in pure raw-atomic context on RT are not allowed
> > or recommended, should kfree_rcu() be allowed?
> >
> Thanks for reviewing, Joel :)
>
> The decision was made that we need to support kfree_rcu() from "real
> atomic contexts", to align with how it used to be before. We could go
> and just convert our local locks to the spinlock_t variant, but that
> was not Paul's goal; it may be that some users need kfree_rcu() from
> raw atomics.

People invoke call_rcu() from raw atomics, and so we should provide
the same for kfree_rcu(). Yes, people could work around a raw-atomic
prohibition, but such prohibitions incur constant costs over time in
terms of development effort, increased bug rate, and increased
complexity. Yes, this does increase all of those for RCU, but the
relative increase is negligible, RCU being what it is.

> > slab can have the same issue, right? If the per-cpu cache is drained,
> > it has to allocate a page from the buddy allocator, and there is no
> > GFP flag to tell it about the context the allocation is happening
> > from.
> >
> Sounds like that. Apart from that, it might turn out soon that we or
> somebody else will raise the question one more time about something
> like GFP_RAW or GFP_NOLOCKS. So who knows..

I would prefer that slab provide some way of dealing with raw atomic
context, but the maintainers are thus far unconvinced.

> > Or are we saying that we want to support kfree on RT from raw atomic
> > context, even though kmalloc is not supported? I hate to bring up
> > this elephant in the room, but since I am one of the people
> > maintaining this code, I believe I would rather set some rules than
> > support unsupported usages. :-\ (Once I know what is supported and
> > what isn't, that is.) If indeed raw atomic kfree_rcu() is a bogus use
> > case because of -RT, then we ought to put up a giant warning rather
> > than support it :-(.
> >
> We discussed it several times; the conclusion was that we need to
> support kfree_rcu() from raw contexts. At least that was a clear signal
> from Paul to me. I think, if we obtain a preemptible() that is
> versatile, we can drop the patch in question later on in the future.

Given a universally meaningful preemptible(), we could directly call
the allocator in some cases. It might (or might not) still make sense
to defer the allocation when preemptible() indicated that a direct call
to the allocator was unsafe.

							Thanx, Paul