Subject: Re: [PATCH v2 14/20] mm: Provide speculative fault infrastructure
To: Anshuman Khandual, Peter Zijlstra
Cc: "Kirill A. Shutemov", paulmck@linux.vnet.ibm.com,
 akpm@linux-foundation.org, ak@linux.intel.com, mhocko@kernel.org,
 dave@stgolabs.net, jack@suse.cz, Matthew Wilcox, benh@kernel.crashing.org,
 mpe@ellerman.id.au, paulus@samba.org, Thomas Gleixner, Ingo Molnar,
 hpa@zytor.com, Will Deacon, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, haren@linux.vnet.ibm.com, npiggin@gmail.com,
 bsingharora@gmail.com, Tim Chen, linuxppc-dev@lists.ozlabs.org,
 x86@kernel.org
References: <1503007519-26777-1-git-send-email-ldufour@linux.vnet.ibm.com>
 <1503007519-26777-15-git-send-email-ldufour@linux.vnet.ibm.com>
 <20170827001823.n5wgkfq36z6snvf2@node.shutemov.name>
 <507e79d5-59df-c5b5-106d-970c9353d9bc@linux.vnet.ibm.com>
 <20170829120426.4ar56rbmiupbqmio@hirez.programming.kicks-ass.net>
 <848fa2c6-dbda-9a1e-2efd-3ce9b083365e@linux.vnet.ibm.com>
 <20170829134550.t7du5zdssvlzemtk@hirez.programming.kicks-ass.net>
From: Laurent Dufour
Date: Wed, 30 Aug 2017 11:53:41 +0200
X-Mailing-List: linux-kernel@vger.kernel.org

On 30/08/2017 07:03, Anshuman Khandual wrote:
> On 08/29/2017 07:15 PM, Peter Zijlstra wrote:
>> On Tue, Aug 29, 2017 at 03:18:25PM +0200, Laurent Dufour wrote:
>>> On 29/08/2017 14:04, Peter Zijlstra wrote:
>>>> On Tue, Aug 29, 2017 at 09:59:30AM +0200, Laurent Dufour wrote:
>>>>> On 27/08/2017 02:18, Kirill A. Shutemov wrote:
>>>>>>> +
>>>>>>> +	if (unlikely(!vma->anon_vma))
>>>>>>> +		goto unlock;
>>>>>>
>>>>>> It deserves a comment.
>>>>>
>>>>> You're right I'll add it in the next version.
>>>>> For the record, the root cause is that __anon_vma_prepare() requires the
>>>>> mmap_sem to be held because vm_next and vm_prev must be safe.
>>>>
>>>> But should that test not be:
>>>>
>>>> 	if (unlikely(vma_is_anonymous(vma) && !vma->anon_vma))
>>>> 		goto unlock;
>>>>
>>>> Because !anon vmas will never have ->anon_vma set and you don't want to
>>>> exclude those.
>>>
>>> Yes in the case we later allow non anonymous vmas to be handled.
>>> Currently only anonymous vmas are supported so the check is good enough,
>>> isn't it ?
>>
>> That wasn't at all clear from reading the code. This makes it clear
>> ->anon_vma is only ever looked at for anonymous.
>>
>> And like Kirill says, we _really_ should start allowing some (if not
>> all) vm_ops. Large file based mappings aren't particularly rare.
>>
>> I'm not sure we want to introduce a white-list or just bite the bullet
>> and audit all ->fault() implementations. But either works and isn't
>> terribly difficult, auditing all is more work though.
>
> filemap_fault() is used as vma-vm_ops->fault() for most of the file
> systems.
> Changing it can enable speculative fault support for all of
> them. It will still exclude other driver based vma-vm_ops->fault()
> implementation. AFAICS, __lock_page_or_retry() function can drop
> mm->mmap_sem if the page could not be locked right away. As suggested
> by Peterz, making it understand FAULT_FLAG_SPECULATIVE should be good
> enough. The patch is lightly tested for file mappings on top of this
> series.

Hi Anshuman,

This sounds pretty good, except for the FAULT_FLAG_RETRY_NOWAIT case I
mentioned in another mail.

The next step would be to find a way to discriminate between the vm_fault()
functions. Any ideas?

Thanks,
Laurent.

>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index a497024..08f3042 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1181,6 +1181,18 @@ int __lock_page_killable(struct page *__page)
>  int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
>  			 unsigned int flags)
>  {
> +	if (flags & FAULT_FLAG_SPECULATIVE) {
> +		if (flags & FAULT_FLAG_KILLABLE) {
> +			int ret;
> +
> +			ret = __lock_page_killable(page);
> +			if (ret)
> +				return 0;
> +		} else
> +			__lock_page(page);
> +		return 1;
> +	}
> +
>  	if (flags & FAULT_FLAG_ALLOW_RETRY) {
>  		/*
>  		 * CAUTION! In this case, mmap_sem is not released
> diff --git a/mm/memory.c b/mm/memory.c
> index 549d235..02347f3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3836,8 +3836,6 @@ static int handle_pte_fault(struct vm_fault *vmf)
>  	if (!vmf->pte) {
>  		if (vma_is_anonymous(vmf->vma))
>  			return do_anonymous_page(vmf);
> -		else if (vmf->flags & FAULT_FLAG_SPECULATIVE)
> -			return VM_FAULT_RETRY;
>  		else
>  			return do_fault(vmf);
>  	}
> @@ -4012,17 +4010,7 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address,
>  		goto unlock;
>  	}
>  
> -	/*
> -	 * Can't call vm_ops service has we don't know what they would do
> -	 * with the VMA.
> -	 * This include huge page from hugetlbfs.
> -	 */
> -	if (vma->vm_ops) {
> -		trace_spf_vma_notsup(_RET_IP_, vma, address);
> -		goto unlock;
> -	}
> -
> -	if (unlikely(!vma->anon_vma)) {
> +	if (unlikely(vma_is_anonymous(vma) && !vma->anon_vma)) {
>  		trace_spf_vma_notsup(_RET_IP_, vma, address);
>  		goto unlock;
>  	}
>
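
The effect of the last quoted hunk on which VMAs the speculative path
rejects can be modelled in user space. This is only a sketch, not kernel
code: struct vma_model, model_vma_is_anonymous() and the two spf_must_bail_*()
helpers are hypothetical names, and the model assumes that an anonymous VMA
is simply one with no vm_ops.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: only the two fields the quoted checks read. */
struct vm_ops_model { int unused; };
struct anon_vma_model { int unused; };

struct vma_model {
	const struct vm_ops_model *vm_ops;	/* NULL for an anonymous mapping */
	struct anon_vma_model *anon_vma;	/* NULL until an anon_vma is attached */
};

/* Assumption of this model: "anonymous" means "no vm_ops". */
static bool model_vma_is_anonymous(const struct vma_model *vma)
{
	return vma->vm_ops == NULL;
}

/* Checks removed by the hunk: any vm_ops, or a missing anon_vma, bails out. */
static bool spf_must_bail_old(const struct vma_model *vma)
{
	if (vma->vm_ops)
		return true;
	return vma->anon_vma == NULL;
}

/* Check added by the hunk: only an anonymous VMA without anon_vma bails out. */
static bool spf_must_bail_new(const struct vma_model *vma)
{
	return model_vma_is_anonymous(vma) && vma->anon_vma == NULL;
}
```

In this model the two predicates agree for anonymous VMAs and differ only
for VMAs with vm_ops, which the old checks always rejected and the new check
lets through.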