Date: Tue, 23 Apr 2019 15:42:22 +0200
From: Michal Hocko
To: Matthew Wilcox
Cc: Michel Lespinasse, Laurent Dufour, Andrew Morton, Peter Zijlstra,
    "Kirill A. Shutemov", Andi Kleen, dave@stgolabs.net, Jan Kara,
    aneesh.kumar@linux.ibm.com, Benjamin Herrenschmidt, mpe@ellerman.id.au,
    Paul Mackerras, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
    Will Deacon, Sergey Senozhatsky, sergey.senozhatsky.work@gmail.com,
    Andrea Arcangeli, Alexei Starovoitov, kemi.wang@intel.com,
    Daniel Jordan, David Rientjes, Jerome Glisse, Ganesh Mahendran,
    Minchan Kim, Punit Agrawal, vinayak menon, Yang Shi, zhong jiang,
    Haiyan Song, Balbir Singh, sj38.park@gmail.com, Mike Rapoport, LKML,
    linux-mm, haren@linux.vnet.ibm.com, Nick Piggin, "Paul E. McKenney",
    Tim Chen, linuxppc-dev@lists.ozlabs.org, x86@kernel.org
Subject: Re: [PATCH v12 00/31] Speculative page faults
Message-ID: <20190423134222.GL25106@dhcp22.suse.cz>
References: <20190416134522.17540-1-ldufour@linux.ibm.com>
    <20190423104707.GK25106@dhcp22.suse.cz>
    <20190423124148.GA19031@bombadil.infradead.org>
In-Reply-To: <20190423124148.GA19031@bombadil.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 23-04-19 05:41:48, Matthew Wilcox wrote:
> On Tue, Apr 23, 2019 at 12:47:07PM +0200, Michal Hocko wrote:
> > On Mon 22-04-19 14:29:16, Michel Lespinasse wrote:
> > [...]
> > > I want to add a note about mmap_sem. In the past there has been
> > > discussions about replacing it with an interval lock, but these never
> > > went anywhere because, mostly, of the fact that such mechanisms were
> > > too expensive to use in the page fault path. I think adding the spf
> > > mechanism would invite us to revisit this issue - interval locks may
> > > be a great way to avoid blocking between unrelated mmap_sem writers
> > > (for example, do not delay stack creation for new threads while a
> > > large mmap or munmap may be going on), and probably also to handle
> > > mmap_sem readers that can't easily use the spf mechanism (for example,
> > > gup callers which make use of the returned vmas). But again that is a
> > > separate topic to explore which doesn't have to get resolved before
> > > spf goes in.
> >
> > Well, I believe we should _really_ re-evaluate the range locking sooner
> > rather than later. Why? Because it looks like the most straightforward
> > approach to the mmap_sem contention for most usecases I have heard of
> > (mostly a mm{unm}ap, mremap standing in the way of page faults).
> > On a plus side it also makes us think about the current mmap (ab)users
> > which should lead to an overall code improvements and maintainability.
>
> Dave Chinner recently did evaluate the range lock for solving a problem
> in XFS and didn't like what he saw:
>
> https://lore.kernel.org/linux-fsdevel/20190418031013.GX29573@dread.disaster.area/T/#md981b32c12a2557a2dd0f79ad41d6c8df1f6f27c

Thank you, will have a look.

> I think scaling the lock needs to be tied to the actual data structure
> and not have a second tree on-the-side to fake-scale the locking. Anyway,
> we're going to have a session on this at LSFMM, right?

I thought we had something for the mmap_sem scaling but I do not see
it in the list of proposed topics. But we can certainly add it there.

> > SPF sounds like a good idea but it is a really big and intrusive surgery
> > to the #PF path. And more importantly without any real world usecase
> > numbers which would justify this. That being said I am not opposed to
> > this change I just think it is a large hammer while we haven't seen
> > attempts to tackle problems in a simpler way.
>
> I don't think the "no real world usecase numbers" is fair. Laurent quoted:
>
> > Ebizzy:
> > -------
> > The test is counting the number of records per second it can manage, the
> > higher is the best. I run it like this 'ebizzy -mTt '. To get
> > consistent result I repeated the test 100 times and measure the average
> > result. The number is the record processes per second, the higher is the best.
> >
> >                    BASE        SPF         delta
> > 24 CPUs x86        5492.69     9383.07     70.83%
> > 1024 CPUS P8 VM    8476.74     17144.38    102%
>
> and cited 30% improvement for you-know-what product from an earlier
> version of the patch.

Well, we are talking about
45 files changed, 1277 insertions(+), 196 deletions(-)

which is a _major_ surgery in my book. Having real-life workload
numbers is nothing unfair to ask for IMHO. And let me remind you that I
am not really opposing SPF in general.
I would just like to see a simpler approach tried before we go for such
a large change. If range locking is not really a scalable approach then
all right, but from what I've seen it should help with most of the
bottlenecks I have come across.
-- 
Michal Hocko
SUSE Labs
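
For readers who have not followed the earlier range-lock discussions, here
is a toy model of the idea Michel and Michal refer to: lock requests only
conflict when their address ranges overlap and at least one side is a
writer. This is a minimal sketch, not kernel code and not the API of any
posted patch. The names ToyRangeLock, try_lock and unlock and the example
addresses are hypothetical, and the model only reports conflicts instead of
blocking.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRangeLock:
    """Toy range lock over half-open address ranges [start, end).

    Hypothetical illustration only; it reports conflicts instead of sleeping.
    """
    held: list = field(default_factory=list)  # entries are (start, end, is_writer)

    def try_lock(self, start, end, writer):
        """Take the range unless a conflicting holder overlaps it."""
        if start >= end:
            raise ValueError("empty range")
        for h_start, h_end, h_writer in self.held:
            overlaps = start < h_end and h_start < end
            # Readers may share a range; an overlap conflicts only if
            # either side is a writer.
            if overlaps and (writer or h_writer):
                return False
        self.held.append((start, end, writer))
        return True

    def unlock(self, start, end, writer):
        """Release a range previously taken with the same arguments."""
        self.held.remove((start, end, writer))


lock = ToyRangeLock()
# A large munmap takes a write lock on one region ...
big_munmap = lock.try_lock(0x10000000, 0x50000000, writer=True)
# ... while a new thread's stack is mapped in an unrelated region.
stack_mmap = lock.try_lock(0x7F0000000000, 0x7F0000800000, writer=True)
# A page fault inside the region being unmapped still has to wait.
fault_inside = lock.try_lock(0x20000000, 0x20001000, writer=False)
print(big_munmap, stack_mmap, fault_inside)
```

With a single mmap_sem the second writer would have to wait for the first;
here only the fault that lands inside the range being unmapped conflicts.
The `held` list, scanned linearly in this sketch, plays the role of the
separate tracking structure that Matthew calls a "second tree on-the-side".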