From:   Uladzislau Rezki <urezki@gmail.com>
Date:   Wed, 1 Apr 2020 14:25:50 +0200
To:     Joel Fernandes <joel@joelfernandes.org>
Cc:     Uladzislau Rezki <urezki@gmail.com>, linux-kernel@vger.kernel.org,
        linux-mm@kvack.org, rcu@vger.kernel.org, willy@infradead.org,
        peterz@infradead.org, neilb@suse.com, vbabka@suse.cz,
        mgorman@suse.de, Andrew Morton <akpm@linux-foundation.org>,
        Josh Triplett <josh@joshtriplett.org>,
        Lai Jiangshan <jiangshanlai@gmail.com>,
        Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
        "Paul E. McKenney" <paulmck@kernel.org>,
        Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH RFC] rcu/tree: Use GFP_MEMALLOC for alloc memory to free
 memory pattern
Message-ID: <20200401122550.GA32593@pc636>
References: <20200331131628.153118-1-joel@joelfernandes.org>
 <20200331140433.GA26498@pc636>
 <20200331150911.GC236678@google.com>
 <20200331160119.GA27614@pc636>
 <20200331183000.GD236678@google.com>
In-Reply-To: <20200331183000.GD236678@google.com>

> > I think GFP_ATOMIC should be used, because it has a better chance of
> > returning memory than GFP_NOWAIT. I see that Michal has the same view on it.
> 
> I don't think so because GFP_ATOMIC implies GFP_NOWAIT. I am Ok with keeping
> the GFP_ATOMIC as it is btw. Paul mentioned he prefers this. I agree with
> that as well.
> 
GFP_ATOMIC can access the reserved memory (it sets __GFP_HIGH), whereas
GFP_NOWAIT is not eligible to do so. So there is a difference between them :)

> > > 
> > > Yes, the benefit of the trace/warning is that the user can switch to a
> > > non-headless API and avoid the synchronize_rcu(), that would help them get
> > > faster kfree_rcu() performance instead of having silent slowdowns.
> > > 
> > Agree. What about just adding WARN_ON_ONCE()? I am just wondering whether
> > it could be harmful.
> 
> You mean WARN_ON_ONCE() before the synchronize_rcu() right? We could do that.
> Paul mentioned to me he prefers if this new warning can be turned off with a
> boot parameter since some future user may prefer no warning. I also agree.
> 
Yes, we can add it right before the synchronize_rcu() call. WARN_ON_ONCE()
emits the warning only once, which I think would be enough to draw
attention to the problem.

>
> If we add this then we can keep your __GFP_NOWARN flag with no additional GFP
> flag changes.
>
We could also add __GFP_RETRY_MAYFAIL to GFP_ATOMIC to make it tighter.
Basically, your patch could be modified by just adding that.

> > > It also tells us whether the headless API is worth it in the long run. I
> > > think it is, because we will likely never hit the synchronize_rcu()
> > > failsafe. But if we hit it a lot, at least it won't happen silently.
> > > 
> > Agree.
> > 
> > > Paul was concerned about following scenario with hitting synchronize_rcu():
> > > 1. Consider a system under memory pressure.
> > > 2. Consider some other subsystem X depending on another system Y which uses
> > >    kfree_rcu(). If Y doesn't complete the operation in time, X accumulates
> > >    more memory.
> > > 3. Since kfree_rcu() on Y hits synchronize_rcu() a lot, Y slows down.
> > >    This causes X to allocate even more memory, setting off a chain
> > >    reaction.
> > > Paul, please correct me if I'm wrong.
> > > 
> > I see your point and agree that in theory it can happen. So, we should
> > tighten the rcu_head attachment logic.
> 
> Right. Per discussion with Paul, it is better if we pre-allocate N array
> blocks per CPU and use them for the cache, with N defaulting to 1 and
> tunable via a boot parameter. I agree with this.
> 
As discussed before, we can use the mempool API for this purpose. But I
am not sure whether it should be one pool per CPU, or a single pool shared
by all CPUs that would contain NR_CPUS * N pre-allocated blocks.

> In the current code, we have 1 cache page per CPU, but it is allocated only
> on the first kvfree_rcu() request. So we could change this behavior as well
> to make it pre-allocated.
> 
> Does this all sound good to you?
> 
I think that makes sense :)

--
Vlad Rezki