Subject: Re: [PATCH v12 00/31] Speculative page faults
From: Laurent Dufour
To: Michel Lespinasse
Cc: Andrew Morton, Michal Hocko, Peter Zijlstra, "Kirill A. Shutemov",
 Andi Kleen, dave@stgolabs.net, Jan Kara, Matthew Wilcox,
 aneesh.kumar@linux.ibm.com, Benjamin Herrenschmidt, mpe@ellerman.id.au,
 Paul Mackerras, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
 Will Deacon, Sergey Senozhatsky, sergey.senozhatsky.work@gmail.com,
 Andrea Arcangeli, Alexei Starovoitov, kemi.wang@intel.com,
 Daniel Jordan, David Rientjes, Jerome Glisse, Ganesh Mahendran,
 Minchan Kim, Punit Agrawal, vinayak menon, Yang Shi, zhong jiang,
 Haiyan Song, Balbir Singh, sj38.park@gmail.com, Mike Rapoport, LKML,
 linux-mm, haren@linux.vnet.ibm.com, Nick Piggin, "Paul E. McKenney",
 Tim Chen, linuxppc-dev@lists.ozlabs.org, x86@kernel.org
Date: Wed, 24 Apr 2019 20:01:20 +0200
References: <20190416134522.17540-1-ldufour@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 22/04/2019 at 23:29, Michel Lespinasse wrote:
> Hi Laurent,
>
> Thanks a lot for copying me on this patchset. It took me a few days to
> go through it - I had not been following the previous iterations of
> this series so I had to catch up. I will be sending comments for
> individual commits, but before that I would like to discuss the series
> as a whole.

Hi Michel,

Thanks for reviewing this series.

> I think these changes are a big step in the right direction. My main
> reservation about them is that they are additive - adding some complexity
> for speculative page faults - and I wonder if it'd be possible, over the
> long term, to replace the existing complexity we have in mmap_sem retry
> mechanisms instead of adding to it.
> This is not something that should
> block your progress, but I think it would be good, as we introduce spf,
> to evaluate whether we could eventually get all the way to removing the
> mmap_sem retry mechanism, or if we will actually have to keep both.

Until we get rid of the mmap_sem, which seems to be a very long story, I
can't see how we could get rid of the retry mechanism.

> The proposed spf mechanism only handles anon vmas. Is there a
> fundamental reason why it couldn't handle mapped files too ?
> My understanding is that the mechanism of verifying the vma after
> taking back the ptl at the end of the fault would work there too ?
> The file has to stay referenced during the fault, but holding the vma's
> refcount could be made to cover that ? the vm_file refcount would have
> to be released in __free_vma() instead of remove_vma; I'm not quite sure
> if that has more implications than I realize ?

The only concern is the flow of operations done in the vm_ops->fault()
processing. Most file systems rely on the generic filemap_fault(), which
should be safe to use. But we need a clever way to identify the fault
handlers that are compatible with the SPF handler. This could be done
using a tag/flag in the vm_ops structure or in the vma's flags.

This would be the next step.

> The proposed spf mechanism only works at the pte level after the page
> tables have already been created. The non-spf page fault path takes the
> mm->page_table_lock to protect against concurrent page table allocation
> by multiple page faults; I think unmapping/freeing page tables could
> be done under mm->page_table_lock too so that spf could implement
> allocating new page tables by verifying the vma after taking the
> mm->page_table_lock ?

I have to admit that I didn't dig further here.
Do you have a patch? ;)

>
> The proposed spf mechanism depends on ARCH_HAS_PTE_SPECIAL.
> I am not sure what is the issue there - is this due to the vma->vm_start
> and vma->vm_pgoff reads in *__vm_normal_page() ?

Yes, that's the reason: there is no way to guarantee the value of these
fields in the SPF path.

>
> My last potential concern is about performance. The numbers you have
> look great, but I worry about potential regressions in PF performance
> for threaded processes that don't currently encounter contention
> (i.e. there may be just one thread actually doing all the work while
> the others are blocked). I think one good proxy for measuring that
> would be to measure a single threaded workload - kernbench would be
> fine - without the special-case optimization in patch 22 where
> handle_speculative_fault() immediately aborts in the single-threaded case.

I'll have to give it a try.

> Reviewed-by: Michel Lespinasse
> This is for the series as a whole; I expect to do another review pass on
> individual commits in the series when we have agreement on the toplevel
> stuff (I noticed a few things like out-of-date commit messages but that's
> really minor stuff).

Thanks a lot for reviewing this long series.

>
> I want to add a note about mmap_sem. In the past there have been
> discussions about replacing it with an interval lock, but these never
> went anywhere, mostly because such mechanisms were
> too expensive to use in the page fault path. I think adding the spf
> mechanism would invite us to revisit this issue - interval locks may
> be a great way to avoid blocking between unrelated mmap_sem writers
> (for example, do not delay stack creation for new threads while a
> large mmap or munmap may be going on), and probably also to handle
> mmap_sem readers that can't easily use the spf mechanism (for example,
> gup callers which make use of the returned vmas). But again that is a
> separate topic to explore which doesn't have to get resolved before
> spf goes in.
>
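
For illustration only, here is a rough userspace sketch of two ideas
discussed in this thread: a per-vm_ops opt-in flag for fault handlers that
are compatible with the SPF path, and re-checking the vma after the
speculative work is done. Everything in it is an assumption made for the
example: the struct names, the VM_OPS_SPF_SAFE bit, the seq field and the
helper functions are hypothetical and are not taken from the patch series
or from the kernel. It is single-threaded and ignores memory ordering, so
a real implementation would need a proper sequence count with barriers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opt-in bit: this fault handler may run without mmap_sem. */
#define VM_OPS_SPF_SAFE 0x1u

/* Hypothetical stand-in for vm_operations_struct with an added flags field. */
struct vm_ops_sketch {
	unsigned int flags;
};

/* Hypothetical stand-in for a vma. */
struct vma_sketch {
	const struct vm_ops_sketch *ops; /* NULL stands for an anonymous mapping */
	unsigned int seq;                /* odd while a writer is changing the vma */
};

/* Anonymous vmas are always eligible; file-backed ones must opt in. */
bool spf_eligible(const struct vma_sketch *vma)
{
	assert(vma != NULL);
	if (vma->ops == NULL)
		return true;
	return (vma->ops->flags & VM_OPS_SPF_SAFE) != 0;
}

/* The snapshot is valid if it was even and the counter has not moved. */
bool spf_still_valid(const struct vma_sketch *vma, unsigned int snap)
{
	return (snap & 1u) == 0 && vma->seq == snap;
}

/*
 * Returns true if the fault was handled speculatively, false if the
 * caller has to fall back to the regular mmap_sem protected path.
 */
bool try_speculative_fault(struct vma_sketch *vma,
			   void (*work)(struct vma_sketch *))
{
	unsigned int snap;

	if (!spf_eligible(vma))
		return false;

	snap = vma->seq;
	if (snap & 1u)
		return false; /* a change is in progress */

	if (work)
		work(vma); /* the speculative part of the fault */

	/* Re-check at the end, as done after taking the ptl in the series. */
	return spf_still_valid(vma, snap);
}

/* Demo helper: simulates a complete vma change racing with the fault. */
void racing_writer(struct vma_sketch *vma)
{
	vma->seq += 2;
}
```

With this sketch, an anonymous vma or a file-backed vma whose ops carry the
flag goes through the speculative attempt, and any attempt that races with
a change reports failure so that the caller retries under mmap_sem.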