Subject: Re: [RFC PATCH 1/8] mm: mmap: unmap large mapping by section
From: Laurent Dufour
Date: Fri, 23 Mar 2018 14:03:19 +0100
To: Yang Shi, Matthew Wilcox
Cc: Michal Hocko, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On 22/03/2018 17:46, Yang Shi wrote:
>
>
> On 3/22/18 9:18 AM, Laurent Dufour wrote:
>>
>> On 22/03/2018 17:05, Matthew Wilcox wrote:
>>> On Thu, Mar 22, 2018 at 04:54:52PM +0100, Laurent Dufour wrote:
>>>> On 22/03/2018 16:40, Matthew Wilcox wrote:
>>>>> On Thu, Mar 22, 2018 at 04:32:00PM +0100, Laurent Dufour wrote:
>>>>>> Regarding the page fault, why not rely on the PTE locking?
>>>>>>
>>>>>> When munmap() unsets the PTE it will have to hold the PTE lock, so this
>>>>>> will serialize the access.
>>>>>> If the page fault occurs before the mmap(MAP_FIXED), the page mapped will be
>>>>>> removed when mmap(MAP_FIXED) does the cleanup. Fair enough.
>>>>> The page fault handler will walk the VMA tree to find the correct
>>>>> VMA and then find that the VMA is marked as deleted.  If it assumes
>>>>> that the VMA has been deleted because of munmap(), then it can raise
>>>>> SIGSEGV immediately.  But if the VMA is marked as deleted because of
>>>>> mmap(MAP_FIXED), it must wait until the new VMA is in place.
>>>> I'm wondering if such complexity is required.
>>>> If the user space process tries to access the page being overwritten through
>>>> mmap(MAP_FIXED) by another thread, there is no guarantee whether it will
>>>> manipulate the *old* page or the *new* one.
>>> Right; but it must return one or the other, it can't segfault.
>> Good point, I missed that...
>>
>>>> I'd think this is up to the user process to handle that concurrency.
>>>> What needs to be guaranteed is that once mmap(MAP_FIXED) returns, the old pages
>>>> are no longer there, which is done through the mmap_sem and PTE locking.
>>> Yes, and allowing the fault handler to return the *old* page risks the
>>> old page being reinserted into the page tables after the unmapping task
>>> has done its work.
>> The PTE locking should prevent that.
>>
>>> It's *really* rare to page-fault on a VMA which is in the middle of
>>> being replaced.  Why are you trying to optimise it?
>> I was not trying to optimize it, but to avoid waiting in the page fault handler.
>> This could become tricky in the case where the VMA is removed once mmap(MAP_FIXED)
>> is done and before the waiting page fault is woken up. This means that the
>> removed VMA structure will have to remain until all the waiters are woken up,
>> which implies ref_count or similar.
>
> We may not need ref_count. After removing "locked-for-deletion" vmas when
> mmap(MAP_FIXED) is done, just wake up page fault to re-lookup vma, then it will
> find the new vma installed by mmap(MAP_FIXED), right?

I agree, as long as waking up does not require access to the VMA.

> I'm not sure if completion can do this or not since I'm not quite familiar with
> it :-(

I don't know either :/

Laurent.

> Yang
>
>>
>>>>> I think I was wrong to describe VMAs as being *deleted*.  I think we
>>>>> instead need the concept of a *locked* VMA that page faults will block on.
>>>>> Conceptually, it's a per-VMA rwsem, but I'd use a completion instead of
>>>>> an rwsem since the only reason to write-lock the VMA is because it is
>>>>> being deleted.
>>>> Such a lock would only make sense in the case of mmap(MAP_FIXED), since when
>>>> the VMA is removed there is no need to wait. Isn't it?
>>> I can't think of another reason.  I suppose we could mark the VMA as
>>> locked-for-deletion or locked-for-replacement and have the SIGSEGV happen
>>> early.  But I'm not sure that optimising for SIGSEGVs is a worthwhile
>>> use of our time.  Just always have the pagefault sleep for a deleted VMA.
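
To make the "wake up and re-lookup" idea concrete, here is a rough user-space model in Python. It is not kernel code and nothing in it is an existing kernel API: `AddressSpace`, `VMA.locked`, `begin_replace()` and `finish_replace()` are names made up for this sketch, and a single `threading.Condition` stands in for both mmap_sem and the wait object. The one design assumption it illustrates is that the wait object belongs to the address space rather than to the VMA, so a woken fault never has to touch the old VMA and no ref_count is needed to keep it alive.

```python
import threading


class SegFault(Exception):
    """Raised when no mapping covers the faulting address."""


class VMA:
    def __init__(self, start, end, tag):
        self.start, self.end, self.tag = start, end, tag
        # Hypothetical "locked-for-replacement" marker from the discussion.
        self.locked = False


class AddressSpace:
    def __init__(self):
        # One condition per address space (its internal lock plays the role
        # of mmap_sem in this model). Waiters sleep here, not on the VMA.
        self.cond = threading.Condition()
        self.vmas = []

    def _find(self, addr):
        for vma in self.vmas:
            if vma.start <= addr < vma.end:
                return vma
        return None

    def map(self, start, end, tag):
        with self.cond:
            self.vmas.append(VMA(start, end, tag))

    def begin_replace(self, addr):
        # First half of a modelled mmap(MAP_FIXED): mark the old VMA locked.
        with self.cond:
            self._find(addr).locked = True

    def finish_replace(self, addr, tag):
        # Second half: drop the old VMA, install the new one, wake waiters.
        with self.cond:
            old = self._find(addr)
            self.vmas.remove(old)
            self.vmas.append(VMA(old.start, old.end, tag))
            self.cond.notify_all()

    def fault(self, addr):
        with self.cond:
            while True:
                vma = self._find(addr)  # fresh lookup on every pass
                if vma is None:
                    raise SegFault(hex(addr))
                if not vma.locked:
                    return vma.tag
                # Sleep on the address space; the old VMA is not used again
                # after this point, the next pass looks it up from scratch.
                self.cond.wait()
```

In this model a fault that arrives between `begin_replace()` and `finish_replace()` blocks and then returns the new mapping, and a fault on an unmapped address raises `SegFault` straight away. Whether a kernel completion can be used the same way, without embedding it in the VMA being freed, is exactly the open question above.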