Date: Mon, 21 Sep 2020 20:35:53 -0700
From: "Paul E. McKenney"
To: Michal Hocko
Cc: "Uladzislau Rezki (Sony)", LKML, RCU, linux-mm@kvack.org,
 Andrew Morton, Peter Zijlstra, Vlastimil Babka, Thomas Gleixner,
 "Theodore Y . Ts'o", Joel Fernandes, Sebastian Andrzej Siewior,
 Oleksiy Avramchenko
Subject: Re: [RFC-PATCH 2/4] mm: Add __rcu_alloc_page_lockless() func.
Message-ID: <20200922033553.GU29330@paulmck-ThinkPad-P72>
Reply-To: paulmck@kernel.org
References: <20200918194817.48921-1-urezki@gmail.com>
 <20200918194817.48921-3-urezki@gmail.com>
 <20200921074716.GC12990@dhcp22.suse.cz>
 <20200921154558.GD29330@paulmck-ThinkPad-P72>
 <20200921160318.GO12990@dhcp22.suse.cz>
In-Reply-To: <20200921160318.GO12990@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Sep 21, 2020 at 06:03:18PM +0200, Michal Hocko wrote:
> On Mon 21-09-20 08:45:58, Paul E. McKenney wrote:
> > On Mon, Sep 21, 2020 at 09:47:16AM +0200, Michal Hocko wrote:
> > > On Fri 18-09-20 21:48:15, Uladzislau Rezki (Sony) wrote:
> > > [...]
> > > > Proposal
> > > > ========
> > > > Introduce a lock-free function that obtain a page from the per-cpu-lists
> > > > on current CPU. It returns NULL rather than acquiring any non-raw spinlock.
> > >
> > > I was not happy about this solution when we have discussed this
> > > last time and I have to say I am still not happy. This is exposing
> > > an internal allocator optimization and allows a hard to estimate
> > > consumption of pcp free pages. IIUC this run on pcp cache can be
> > > controled directly from userspace (close(open) loop IIRC) which makes it
> > > even bigger no-no.
> >
> > Yes, I do well remember that you are unhappy with this approach.
> > Unfortunately, thus far, there is no solution that makes all developers
> > happy. You might be glad to hear that we are also looking into other
> > solutions, each of which makes some other developers unhappy. So we
> > are at least not picking on you alone. :-/
>
> No worries I do not feel like a whipping boy here. But do expect me to
> argue against the approach.
> I would also appreciate it if there was some
> more information on other attempts, why they have failed. E.g. why
> pre-allocation is not an option that works well enough in most
> reasonable workloads. I would also appreciate some more thoughts why we
> need to optimize for heavy abusers of RCU (like close(open) extremes).

Not optimizing for them, but rather defending against them. Uladzislau
gave the example of low-memory phones. And we have quite the array
of defenses against other userspace bugs including MMUs, the "limit"
command, and so on.

There have been quite a few successful attempts, starting from the
introduction of blimit and RCU-bh more than 15 years ago, continuing
through making call_rcu() jump-start grace periods, IPIing reluctant
CPUs, tuning RCU callback offloading, and many others. But these prior
approaches have only taken us so far.

Other approaches under consideration include making CONFIG_PREEMPT_COUNT
unconditional and thus allowing call_rcu() and kvfree_rcu() to determine
whether direct calls to the allocator are safe (some guy named Linus
doesn't like this one), preallocation (Uladzislau covered this, and
the amount that would need to be preallocated is excessive), deferring
allocation to RCU_SOFTIRQ (this would also need CONFIG_PREEMPT_COUNT),
and deferring to some clean context (which is the best we can do within
the confines of RCU, but which can have issues with delay).

So it is not the need to address this general problem that is new.
Far from it! What is new is the need for changes outside of RCU.

> > > I strongly agree with Thomas http://lkml.kernel.org/r/87tux4kefm.fsf@nanos.tec.linutronix.de
> > > that this optimization is not aiming at reasonable workloads. Really, go
> > > with pre-allocated buffer and fallback to whatever slow path you have
> > > already. Exposing more internals of the allocator is not going to do any
> > > good for long term maintainability.
> >
> > I suggest that you carefully re-read the thread following that email.
>
> I clearly remember Thomas not being particularly happy that you optimize
> for a corner case. I do not remember there being a consensus that this
> is the right approach. There was some consensus that this is better than
> a gfp flag. Still quite bad though if you ask me.

Again, this "optimization" is for robustness more than raw speed.

> > Given a choice between making users unhappy and making developers
> > unhappy, I will side with the users each and every time.
>
> Well, let me rephrase. It is not only about me (as a developer) being
> unhappy but also all the side effects this would have for users when
> performance of their favorite workload declines for no apparent reason
> just because pcp caches are depleted by an unrelated process.

But in the close(open()) case, wouldn't the allocations on the open()
side refill those caches? Yes, cases where one CPU is doing the
allocating and another the call_rcu()/kvfree_rcu() need additional
help, but as Uladzislau noted, we do have patches that ensure that the
refilling happens.

Thanx, Paul
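
For readers following the thread, the behavior being argued about can be
modeled in a few lines. This is only a userspace sketch under stated
assumptions: it is not kernel code and not taken from the patch. All names
(pcp_cache, pcp_alloc_lockless, pcp_refill, PCP_CAP, POOL_PAGES) are
hypothetical, and a simple counter stands in for the allocator lock. The
fast path pops a page from the local cache and returns NULL when the cache
is empty, without ever touching the lock; only the refill path does.

```c
#include <assert.h>
#include <stddef.h>

#define PCP_CAP    4   /* hypothetical per-CPU cache capacity */
#define POOL_PAGES 16  /* hypothetical size of the shared pool */

struct page { struct page *next; };

/* Shared pool; in this model a counter stands in for the allocator lock. */
static struct page pool_storage[POOL_PAGES];
static struct page *pool_head;
static int pool_lock_acquisitions;

/* Per-CPU cache, modeled as a plain struct owned by a single thread. */
struct pcp_cache {
	struct page *head;
	int count;
};

/* Put every page of the backing storage on the shared pool list. */
void pool_init(void)
{
	pool_head = NULL;
	pool_lock_acquisitions = 0;
	for (int i = 0; i < POOL_PAGES; i++) {
		pool_storage[i].next = pool_head;
		pool_head = &pool_storage[i];
	}
}

/*
 * Fast path: take a page from the local cache only. Never "acquires"
 * the pool lock; returns NULL if the cache is empty.
 */
struct page *pcp_alloc_lockless(struct pcp_cache *c)
{
	struct page *p = c->head;

	if (!p)
		return NULL;
	c->head = p->next;
	c->count--;
	return p;
}

/*
 * Slow path: "takes" the pool lock and tops the cache up to PCP_CAP.
 * Returns the number of pages moved into the cache.
 */
int pcp_refill(struct pcp_cache *c)
{
	int moved = 0;

	pool_lock_acquisitions++;	/* models lock + unlock of the pool */
	while (c->count < PCP_CAP && pool_head) {
		struct page *p = pool_head;

		pool_head = p->next;
		p->next = c->head;
		c->head = p;
		c->count++;
		moved++;
	}
	assert(c->count <= PCP_CAP);
	return moved;
}
```

In this model, a caller that only ever uses pcp_alloc_lockless() drains the
cache after PCP_CAP pages and then sees NULL, which is the depletion concern
raised above; any call to pcp_refill() from the ordinary allocation path on
the same cache makes the fast path succeed again.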